digitalmars.D.bugs - [Issue 5348] New: Variable Length Arrays
- d-bugmail puremagic.com (165/165) Dec 13 2010 http://d.puremagic.com/issues/show_bug.cgi?id=5348
- d-bugmail puremagic.com (27/43) Jan 03 2011 http://d.puremagic.com/issues/show_bug.cgi?id=5348
- d-bugmail puremagic.com (44/71) Jan 03 2011 http://d.puremagic.com/issues/show_bug.cgi?id=5348
- d-bugmail puremagic.com (27/27) Feb 13 2011 http://d.puremagic.com/issues/show_bug.cgi?id=5348
- d-bugmail puremagic.com (10/11) May 26 2011 http://d.puremagic.com/issues/show_bug.cgi?id=5348
- d-bugmail puremagic.com (17/17) Nov 15 2011 http://d.puremagic.com/issues/show_bug.cgi?id=5348
- d-bugmail puremagic.com (17/33) Nov 15 2011 http://d.puremagic.com/issues/show_bug.cgi?id=5348
- d-bugmail puremagic.com (7/7) Mar 28 2013 http://d.puremagic.com/issues/show_bug.cgi?id=5348
- d-bugmail puremagic.com (21/21) Mar 28 2013 http://d.puremagic.com/issues/show_bug.cgi?id=5348
- d-bugmail puremagic.com (53/63) Mar 28 2013 http://d.puremagic.com/issues/show_bug.cgi?id=5348
- d-bugmail puremagic.com (7/8) Mar 28 2013 Since it's different, yes, but I think this is complex enough that it sh...
- d-bugmail puremagic.com (8/10) Mar 29 2013 http://d.puremagic.com/issues/show_bug.cgi?id=5348
http://d.puremagic.com/issues/show_bug.cgi?id=5348 Summary: Variable Length Arrays Product: D Version: D2 Platform: Other OS/Version: Windows Status: NEW Severity: enhancement Priority: P2 Component: DMD AssignedTo: nobody puremagic.com ReportedBy: bearophile_hugs eml.cc This is an enhancement request for a new feature. It's a purely additive change (fully backwards compatible with D2), so it may wait for later stages of D2 development, or even for D3. I suggest to add the Variable Length Arrays (VLA) of C99 to D, using the same syntax used in C99. How D VLAs are represented on the stack: they are represented with an immutable size_t length field followed by a (possibly aligned) contiguous block of data (the pointer to the data is not necessary because the data is in-place). The length field is necessary because the contents of the variable used to define the array length may later change. Multi dimensional VLAs too are accepted. When a VLA array is allocated the D compiler tests that there is enough free stack left, and otherwise throws a runtime exception/error (like stack overflow). As in C99 "VLAarray.sizeof" is not a compile time constant, it returns the length field * item.sizeof. How VLAs are returned: they may be returned by value (creating a new VLA in the function where the return value is received), or by reference as normal dynamic array (by automatic copy on the heap). Both solutions have advantages and disadvantages. The return by dynamic array is simpler for the programmer because it changes less things in the language and it requires no syntax changes. Currently (DMD 2.050) this code: void foo(int len) { int[len] arr; } void main() { foo(1); } DMD shows four times the same error: test.d(2): Error: Integer constant expression expected instead of len test.d(2): Error: Integer constant expression expected instead of cast(uint)len test.d(2): Error: Integer constant expression expected instead of cast(uint)len test.d(2): Error: Integer constant expression expected instead of cast(uint)len Some of the disadvantages of this idea are: - It adds some complexity to both the language and the compiler. The programmer probably has to remember that there is another kind of array in D. - It may encourage more usage of the stack, but the stack space is limited compared to the heap space, so it may lead to more stack overflows. But see below for an alternative view on this. - Unlike C99, D already has dynamic arrays that cover part of the needs of dynamic length arrays. - It introduces a special case in the sizeof. - Like fixed-sized arrays, the VLAs are managed by value, so a not careful usage of them may lead to reduced performance caused by long array copies. - D already allows to use alloca() for variable length stack allocation (unlike C where alloca() is a common but non-standard function). alloca() usage is probably uncommon in D code. - I may have missed or underestimated some disadvantages of VLAs in D, in this case please add a comment below. Despite such disadvantages I think that the advantages outweigh them: - VLAs lead more elegant and readable code. - Both its syntax (and part of its semantics) is present in C99 already, and it's a natural syntax, so programmers probably will have no significant troubles in using them. - VLAs cover most usage cases of alloca(), and the introduction of VLAs don't removes the alloca() from Phobos, so this is a backward compatible change. - It is safer than alloca() because its syntax is more clean than alloca(). - Compared to alloca() it leads to safer code because a VLA is a true array, so the D compiler is able to enforce array bounds in non-release mode. This is possible with alloca() too, taking a slice of the return value of alloca(), but it requires explicit code, while in a VLA there is no need of this extra code. - VLA require less code that may contain bugs. This is an important difference for D, that's a language designed to be quite less bug-prone than C. There is no need of pointer usage, cast(), sizeof, memset() (or assignment to .init), slicing the stack pointer [], and the manual management of the situation when alloca() returns a null. This is code that shows the difference, first using alloca(): import core.stdc.stdlib: alloca; import std.stdio: writeln; import std.conv: to; struct Foo { int x = 10; string toString() { return to!string(x); } ~this() { /* something here */ } } void bar(int len) { Foo* ptr = cast(Foo*)alloca(len * Foo.sizeof); if (ptr == null) throw new Exception("alloca failed"); Foo[] arr = ptr[0 .. len]; foreach (ref item; arr) item = Foo.init; // some code here writeln(arr); foreach (ref item; arr) item.__dtor(); } void main() { bar(20); } And then equivalent code using VLAs: void bar(int len) { Foo[len] arr; // some code here writeln(arr); } As you see the D compiler is aware that 'arr' is an array, to it calls the destructors of Foo when the scope of 'arr' ends. If you use alloca() I think this needs to be done manually. If the VLA 'arr' array is 2D like this, then the code using alloca() gets more complex: Foo[len][len*2] arr; The semantics of VLA is defined better and more easy to remember, because it's the same as every other variable. The memory of the VLA is deallocated when the variable scope ends. With alloca() the situation is different and less clear, see bug 3822 . Maybe the alloca() used by DMD frees memory as soon as the current scope is left, instead of deferring all deallocation until function exit. See: http://compilers.iecc.com/comparch/article/91-12-079 Compared to dynamic arrays, the allocation of a VLA may be (and often are) more efficient, especially until D doesn't gain an advanced generational GC that has an "Eden" (that allows allocation of short lived objects in a stack-like memory area). I have seen several times people ask for, or just obliviously use (not knowing it doesn't work, DMD 2.050 doesn't refuse it), a syntax like this in D2: scope arr = new int[100]; The VLAs replace such need with a more semantically defined situation (because VLAs are more like fixed-sized arrays. While scoped dynamic arrays share some of the faults of scoped object allocation that has being deprecated in D2). Despite D has a GC, in a system language stack is an important resource, because it decreases the work of the GC reducing the garbage to clean, makes deallocation (and destructor calling) more deterministic (RAII), and often increases performance. So it's positive to offer something better that replaces the usage of alloca() and allows safer stack allocations. In some situations stack allocation leads to clean forms of code too. While encouraging too much the usage of the stack may lead to more frequent stack overflows, the usage of VLA may even reduce the stack space needed. If VLA are not available often alloca() is not used anyway because its usage is not very natural and it's not clean. So the programmer often allocates on the stack a fixed-sized array that's "large enough" for most cases. If such function is called some times in some recursive way and such large enough arrays build on each other, the total stack memory used may be higher than using VLAs that allow to use only the minimum needed memory for each function call. Reducing stack size often increases the performance too, because the less stack memory is used, the less CPU cache misses there are (I have seen that reducing the length of stack allocated arrays increases performance. Also because DMD initializes arrays, so the shorter they are the less items to initialize there are). I have implemented an efficient function that computes the Levenshtein distance between two strings, that need one or two temporary arrays to store the dynamic programming table. The arrays are allocated on the stack if they are small (it means small input strings), and on the heap if they are larger. A program calls this Levenshtein distance function millions of times because it's needed to build a particular tree that uses distances among keys to allow certain very efficient queries over the key space. I have seen that in this program using alloca() speeds up the code significantly over both dynamic allocation of arrays and initialization of a fixed-sized array of a "large enough" array for the small string case. Even defining the stack allocated array with "=void" is not as fast as using alloca) in this situation. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Dec 13 2010
http://d.puremagic.com/issues/show_bug.cgi?id=5348 Some comments by Dmitry Olshansky: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=125942As stated in this proposal they are quite useless, e.g. they are easily implemented via mixin with alloca. Plus, what's the benefit in throwing exception on alloca failure, how you suppose the user to handle this stackoverflow exception? I would have expected a better design to provide an optional parameter to fallback to GC or something ( like malloc(...) + scope(exit) free(...); ) and that's indicates a library solution, of course.Some answers to Dmitry Olshansky:As stated in this proposal they are quite useless, e.g. they are easily implemented via mixin with alloca.This is false, a mixin solution can't replace a VLA because: - The compiler knows what a VLA is, so it may copy it automatically (as it does with fixed-sized arrays) if you return a VLA from a function. - vla_array.sizeof gives the correct answer - alloca() is not pure, while a function that uses VLA can be pure. - The VLA syntax is nicer, it doesn't look like hack, this encourages their usage, and it doesn't require imports. And if you need a 2D variable length matrix, you don't need a different mixin, the VLA syntax supports multi dimensional arrays.Plus, what's the benefit in throwing exception on alloca failure, how you suppose the user to handle this stackoverflow exception?The proposal says that an error too is OK:When a VLA array is allocated the D compiler tests that there is enough free stack left, and otherwise throws a runtime exception/error (like stack overflow).I would have expected a better design to provide an optional parameter to fallback to GC or something ( like malloc(...) + scope(exit) free(...); ) and that's indicates a library solution, of course.This is a possibility to keep in mind. But it makes the VLA implementation a bit more complex: - in the current proposal the VLA is represented by just a length. If you add a fall-back mechanism then you need to keep a pointer too on the stack. - If you return a VLA the code has to copy the data contents only if the VLA is allocated on the stack. - This semantics is different from the C99 one. So people that know C99 need to read about this difference and understand it, and C99 code ported to D probably needs to keep in account the difference. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jan 03 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5348 Dmitry Olshansky <dmitry.olsh gmail.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |dmitry.olsh gmail.com 05:18:26 PST ---Introducing more magic then it's worth - one thing I really dislike about this proposal. It doesn't solve anything at all, yet uglifies language. How would function signature look like for function returning VLA? How it then interacts with other array types? Is supposed to be a slice of stack? I guess, no. But with alloca that's all : Foo* ptr = cast(Foo*)alloca(len * Foo.sizeof); Foo[] arr = ptr[0 .. len];//that's all, last time I asked Steven, I got the answer that you can even append to it (which would reallocate on first append) Making it mixin is little extra cost for an *unsafe* feature as it is.As stated in this proposal they are quite useless, e.g. they are easily implemented via mixin with alloca.This is false, a mixin solution can't replace a VLA because: - The compiler knows what a VLA is, so it may copy it automatically (as it does with fixed-sized arrays) if you return a VLA from a function.- vla_array.sizeof gives the correct answerSmall benefit since you now beforehand the size you passed to it, and can create slice out of of it. Plus, what semnatics do you propouse for VLA's on return - value type? How it fits the rest of language?- alloca() is not pure, while a function that uses VLA can be pure...but as defined VLAs subtly destroing nothrow property, and in general are leaky abstraction.- The VLA syntax is nicer, it doesn't look like hack, this encourages their usage, and it doesn't require imports. And if you need a 2D variable length matrix, you don't need a different mixin, the VLA syntax supports multi dimensional arrays.I might be wrong but matrices are usaully either small and fixed sized, or big arbitrarily sized and require allocation anyways. IMO every special case language feature should do something importantly useful to prove worthy.I don't care what proposal *stated* is OK, I care for the user of this feature, would he/she like exception he/she can't *handle* in any sensible way or some other default mechanism? I'm not seeing VLA as much safer/better then alloca in this regard (in the latter you at least have options if you are the author of this piece of code).Plus, what's the benefit in throwing exception on alloca failure, how you suppose the user to handle this stackoverflow exception?The proposal says that an error too is OK:When a VLA array is allocated the D compiler tests that there is enough free stack left, and otherwise throws a runtime exception/error (like stack overflow).Again, this just shows how VLAs as stated are not getting anything good. Library solution can be complex and have options, built-in on the contrary should be clear and lightweight. Slicing the result of alloca + some fallback on failure path is OK (at least it has clear semantics, and explicit).I would have expected a better design to provide an optional parameter to fallback to GC or something ( like malloc(...) + scope(exit) free(...); ) and that's indicates a library solution, of course.This is a possibility to keep in mind. But it makes the VLA implementation a bit more complex:- in the current proposal the VLA is represented by just a length. If you add a fall-back mechanism then you need to keep a pointer too on the stack.It's not _too_ but _instead_ of reserving stack space :)- If you return a VLA the code has to copy the data contents only if the VLA is allocated on the stack.And I yet have to see how function's return type would be defined as VLA. What's makes me sad is there is not a single thing about actually defining anything e.g. semantics of copying, interaction with the rest of language. It's all more about "kind cool to have that, something like that, you know..."- This semantics is different from the C99 one. So people that know C99 need to read about this difference and understand it, and C99 code ported to D probably needs to keep in account the difference.We should ask C99 folks if there are lots of VLAs used at all. Sorry for being too critical, but at very least the proposal needs a lot of refinement on actual implementation. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jan 03 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5348 Ada language too supports stack-allocation of 2D arrays with run-time sizes, an example: with Ada.Text_Io; use Ada.Text_Io; with Ada.Float_Text_Io; use Ada.Float_Text_Io; with Ada.Integer_Text_Io; use Ada.Integer_Text_Io; procedure Two_Dimensional_Arrays is type Matrix_Type is array(Positive range <>, Positive range <>) of Float; Dim_1 : Positive; Dim_2 : Positive; begin Get(Item => Dim_1); Get(Item => Dim_2); -- Create an inner block with the correctly sized array declare Matrix : Matrix_Type(1..Dim_1, 1..Dim_2); begin Matrix(1, Dim_2) := 3.14159; Put(Item => Matrix(1, Dim_2), Fore => 1, Aft => 5, Exp => 0); New_Line; end; -- The variable Matrix is popped off the stack automatically end Two_Dimensional_Arrays; -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Feb 13 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5348type Matrix_Type is array(Positive range <>, Positive range <>) of Float;That Matrix_Type it subtle: it also shows that Ada allows the definition of the _type_ of a _rectangular_ 2D stack-allocated array where the sizes of the sides of the matrix are not known at compile-time and are not specified into the type itself. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
May 26 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5348 Currently this works, and ct_function() is run at compile-time because fixed array length definitions is a compile-time context, so in D it forces CTFE: int ct_function(int x) { return x * 2; } void main() { int[ct_function(5)] a; } If VLA come in D, and if their definition syntax is the same as the current fixed array definition syntax, then that length definition stops being a compile-time context, and the compiler is not forced (but free any way) to run ct_function() at compile-time. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Nov 15 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5348 timon.gehr gmx.ch changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |timon.gehr gmx.chCurrently this works, and ct_function() is run at compile-time because fixed array length definitions is a compile-time context, so in D it forces CTFE: int ct_function(int x) { return x * 2; } void main() { int[ct_function(5)] a; } If VLA come in D, and if their definition syntax is the same as the current fixed array definition syntax, then that length definition stops being a compile-time context, and the compiler is not forced (but free any way) to run ct_function() at compile-time.It is not necessarily able to solve the particular halting problem instance. The best that could be done without breaking backwards compatibility would be to define the semantics of int[foo()] a; as either: if foo() can be executed at compile time, take the result as the array length, else either don't terminate compilation or introduce a VLA. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Nov 15 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5348 See also for an alternative design: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3532.html -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Mar 28 2013
http://d.puremagic.com/issues/show_bug.cgi?id=5348 Walter Bright <bugzilla digitalmars.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED CC| |bugzilla digitalmars.com Platform|Other |All Resolution| |WONTFIX OS/Version|Windows |All 13:02:08 PDT --- 1. VLAs are a failure in C99. 2. I'd prefer to deal with stack allocated arrays by optimization rather than new syntax & semantics, i.e.: int[] array = new int[5]; and determining that array[] can never leave its scope, and so can be allocated on the stack. 3. Consider that static arrays are passed by value to functions, rather than by reference. VLAs for static arrays mess this up. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Mar 28 2013
http://d.puremagic.com/issues/show_bug.cgi?id=5348 Thank you for your comments.1. VLAs are a failure in C99.I agree, let's invent something better. My ideas have changed, and now I think it's better to define DSSAA in library code that is recognized and managed in a special way by the compiler, as here for C++: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3532.html Ada2012 has added stack-allocated collections. Rust allows any thing you want to be allocated on the stack, if you want. They know that sometimes heap allocations are bad for performance. Dynamic-size stack-allocated arrays (abbreviated to DSSAA) will be a base to create several other stack-allocated collections for D, as in Ada (and in future Rust).2. I'd prefer to deal with stack allocated arrays by optimization rather than new syntax & semantics, i.e.: int[] array = new int[5]; and determining that array[] can never leave its scope, and so can be allocated on the stack.Your idea has problems: 1) Since some time Java has added escape analysis to stack-allocate some objects and reduce a little the pressure on the GC. This feature is useful in Java, but also it shows its limits, in many cases it fails, so it doesn't bring a large improvement in Java. 2) I'd like DSSAA to be able to leave the scope (the simplest way to do this is to "dup" on them, copying them on the heap. Below I show another way to do it). A solution is to invent library-defined arrays that have a semantics different from the regular dynamic arrays. See below.3. Consider that static arrays are passed by value to functions, rather than by reference. VLAs for static arrays mess this up.A solution is to add a special value array to Phobos, as in that n3532, and then let the D compiler manage it in a special way, allocating it on the stack where possible (if you use it inside a struct its storage goes on the heap, like a dynamic array). In the following case foo creates a DSSAA and returns it. A DSSAA keeps its lenght beside the data, in the stack frame. At the return point inside bar() bar allocates another DSSAA on the stack (increasing the size of the stack frame of bar) and copies the received data: import std.collections: ValArray; ValArray!int foo(int n) { auto a = ValArray!int(n); // on the stack. return a; } void bar() { ValArray!int b = foo(5); // copied on the stack. } In this case foo() creates the DSSAA and calls bar with it. D just returns pointer to the data on the stack frame plus length (so it's a kind of slice) and then under the cover the data is also copied inside the stack frame of bar: import std.collections: ValArray; void foo(int n) { auto a = ValArray!int(n); bar(a); } void bar(ValArray!int b) { } Probably I have to open an enhancement request on this. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Mar 28 2013
http://d.puremagic.com/issues/show_bug.cgi?id=5348 17:02:19 PDT ---Probably I have to open an enhancement request on this.Since it's different, yes, but I think this is complex enough that it should be done as a DIP, not a simple enhancement request. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Mar 28 2013
http://d.puremagic.com/issues/show_bug.cgi?id=5348but I think this is complex enough that it should be done as a DIP, not a simple enhancement request.I agree. But at the moment I am not good enough to write a DIP, so for now I have opened just Issue 9832 -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Mar 29 2013