www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Resizable Arrays?

reply Jason House <jason.james.house gmail.com> writes:
Are there any good reasons to allow built in arrays to be resizable?

There's already plenty of empirical evidence that building arrays by appending
is incredibly slow.

There are also bugs in bugzilla where resizing arrays after assigning one array
to another violates type safety. Those bugs can be solved by always
reallocatimg arrays when resizing, but that makes performance even worse...



Is it time to rely on standard libraries for data structures and let built in
arrays be fixed length? It's a breaking change... 
Feb 27 2009
next sibling parent reply "Tim M" <a b.com> writes:
On Sat, 28 Feb 2009 12:55:44 +1300, Jason House  
<jason.james.house gmail.com> wrote:

 Are there any good reasons to allow built in arrays to be resizable?

 There's already plenty of empirical evidence that building arrays by  
 appending is incredibly slow.

 There are also bugs in bugzilla where resizing arrays after assigning  
 one array to another violates type safety. Those bugs can be solved by  
 always reallocatimg arrays when resizing, but that makes performance  
 even worse...



 Is it time to rely on standard libraries for data structures and let  
 built in arrays be fixed length? It's a breaking change...
C++ has stl vectors to make it up for lack of resizeable arrays. Theres nothing stopping you from having all your arrays static in D. I think your crazy :)
Feb 27 2009
parent reply BCS <ao pathlink.com> writes:
Reply to tim,

 Theres nothing stopping you from having all your arrays static in D. I
 think your crazy :)
 
I think the thought is that arrays can have dynamic length but can't be changed after allocation. starting with: int[] arr = new int[15]; assigning to length arr.length = 30; would always become auto tmp = new int[30]; tmp[0..min(arr.length,$)] = arr[0..min(tmp.length,$)]; arr = tmp;
Feb 27 2009
parent reply Jason House <jason.james.house gmail.com> writes:
BCS Wrote:

 Reply to tim,
 
 Theres nothing stopping you from having all your arrays static in D. I
 think your crazy :)
 
I think the thought is that arrays can have dynamic length but can't be changed after allocation.
Right. Length is fixed until (re)allication.
 starting with:
 
 int[] arr = new int[15];
 
 assigning to length
 
 arr.length = 30;
 
 would always become
 
 auto tmp = new int[30];
 tmp[0..min(arr.length,$)] = arr[0..min(tmp.length,$)];
 arr = tmp;
Either that or a simple compile error... It's also possible a middle ground where only const(T)[] is fixed length until (re)allocation. I'm hoping there will be good discussion on the pros and cons of alternatives.
Feb 27 2009
parent "Denis Koroskin" <2korden gmail.com> writes:
On Sat, 28 Feb 2009 04:27:52 +0300, Jason House <jason.james.house gmail.com>
wrote:

 BCS Wrote:

 Reply to tim,

 Theres nothing stopping you from having all your arrays static in D. I
 think your crazy :)
I think the thought is that arrays can have dynamic length but can't be changed after allocation.
Right. Length is fixed until (re)allication.
 starting with:

 int[] arr = new int[15];

 assigning to length

 arr.length = 30;

 would always become

 auto tmp = new int[30];
 tmp[0..min(arr.length,$)] = arr[0..min(tmp.length,$)];
 arr = tmp;
Either that or a simple compile error... It's also possible a middle ground where only const(T)[] is fixed length until (re)allocation. I'm hoping there will be good discussion on the pros and cons of alternatives.
I might be wrong, but it seems that arrays are always reallocating memory when resize. I might be wrong but that's the behaviour I discovered in a recent test.
Feb 27 2009
prev sibling next sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
Jason House wrote:
 Are there any good reasons to allow built in arrays to be resizable?
http://www.digitalmars.com/d/2.0/builtin.html
 There's already plenty of empirical evidence that building arrays by 
 appending is incredibly slow.
<snip> You need to really get into programming in D and explore the many possible ways of using arrays. Stewart.
Feb 27 2009
prev sibling next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Jason House (jason.james.house gmail.com)'s article
 Are there any good reasons to allow built in arrays to be resizable?
 There's already plenty of empirical evidence that building arrays by appending
is incredibly slow.
 There are also bugs in bugzilla where resizing arrays after assigning one array
to another violates type safety. Those bugs can be solved by always reallocatimg arrays when resizing, but that makes performance even worse...

 Is it time to rely on standard libraries for data structures and let built in
arrays be fixed length? It's a breaking change... I think that builtin arrays in D are a godsend. Resizable arrays are such an important, heavily used feature that they should get special treatment from the language, to ensure that they are syntactically elegant, efficient both at runtime and in terms of compilation time, easy to use, free of odd corner cases, and in general are true first class citizens. Nonetheless, D's arrays do have some rough edges. My proposal would be as follows: There should be two array types: T[new] and T[]. This has been proposed before, though not in as much detail as what I'm suggesting. The semantics should work as follows: T[new] is considered a subtype of T[]. This works mostly like you would expect. T[new] can be implicitly converted to T[]. The twist is that T[] can also be assigned to T[new] without any explicit conversion, but a copy of the underlying data is implicitly made for safety. Assigning either a T[] or a T[new] to a T[] will result in aliasing: T[] foo; T[new] bar = new T[N]; bar[0] = 1; foo = bar; foo[0] = 2; writeln(bar[0]); // Prints 2. T[new]s are always assigned by value to make concatenation and resizing safe. The underlying data is copied implicitly when assigning from a T[] to a T[new] or from a T[new] to another T[new]. T[new] foo; T[] bar = new T[N]; // new returns T[new], using implicit conversion. bar[0] = 1; foo = bar; foo[0] = 2; writeln(bar[0]); // Prints 1. T[]s can be sliced, but cannot be enlarged or appended to. They should be laid out the same way they are now, as a struct of a pointer and a length. T[new]s can be appended to and enlarged in addition to being sliced. Their layout should be a struct of pointer, length, capacity to make appending fast. This will also make implicit conversion to T[] essentially free, since all that is necessary is slicing off the capacity field. I also wonder if this proposal could be extended to fix some of the covariance issues with arrays (see Bugzilla 2095). This might work by only allowing covariant types to be assigned to T[new], not T[]. For example: class A{} class B:A{} B[new] ba = [new B]; A[] aa = ba; // Error: Cannot assign covariant types to T[]. A[new] aa = ba; // Works, but copy is implicitly made.
Feb 27 2009
parent dsimcha <dsimcha yahoo.com> writes:
== Quote from dsimcha (dsimcha yahoo.com)'s article
 == Quote from Jason House (jason.james.house gmail.com)'s article
 Are there any good reasons to allow built in arrays to be resizable?
 There's already plenty of empirical evidence that building arrays by appending
is incredibly slow.
 There are also bugs in bugzilla where resizing arrays after assigning one array
to another violates type safety. Those bugs can be solved by always reallocatimg arrays when resizing, but that makes performance even worse...

 Is it time to rely on standard libraries for data structures and let built in
arrays be fixed length? It's a breaking change... I think that builtin arrays in D are a godsend. Resizable arrays are such an important, heavily used feature that they should get special treatment from the language, to ensure that they are syntactically elegant, efficient both at runtime and in terms of compilation time, easy to use, free of odd corner cases, and in general are true first class citizens. Nonetheless, D's arrays do have some rough edges. My proposal would be as follows: There should be two array types: T[new] and T[]. This has been proposed before, though not in as much detail as what I'm suggesting. The semantics should work as follows: T[new] is considered a subtype of T[]. This works mostly like you would expect. T[new] can be implicitly converted to T[]. The twist is that T[] can also be assigned to T[new] without any explicit conversion, but a copy of the underlying data is implicitly made for safety. Assigning either a T[] or a T[new] to a T[] will result in aliasing: T[] foo; T[new] bar = new T[N]; bar[0] = 1; foo = bar; foo[0] = 2; writeln(bar[0]); // Prints 2. T[new]s are always assigned by value to make concatenation and resizing safe. The underlying data is copied implicitly when assigning from a T[] to a T[new] or from a T[new] to another T[new]. T[new] foo; T[] bar = new T[N]; // new returns T[new], using implicit conversion. bar[0] = 1; foo = bar; foo[0] = 2; writeln(bar[0]); // Prints 1. T[]s can be sliced, but cannot be enlarged or appended to. They should be laid out the same way they are now, as a struct of a pointer and a length. T[new]s can be appended to and enlarged in addition to being sliced. Their layout should be a struct of pointer, length, capacity to make appending fast. This will also make implicit conversion to T[] essentially free, since all that is necessary is slicing off the capacity field. I also wonder if this proposal could be extended to fix some of the covariance issues with arrays (see Bugzilla 2095). This might work by only allowing covariant types to be assigned to T[new], not T[]. For example: class A{} class B:A{} B[new] ba = [new B]; A[] aa = ba; // Error: Cannot assign covariant types to T[]. A[new] aa = ba; // Works, but copy is implicitly made.
One minor detail that I thought of: When returning a T[new] from a function, the copying could be optimized away. If any escaping happened during the function call, this would be either by value or as a T[] instead of a T[new]. Therefore, at the end of a function, the only T[new] reference to the heap space in question is guaranteed to be the T[new] that is about to be returned. Since the reference in the callee is about to be destroyed, it can safely be returned without any copying.
Feb 27 2009
prev sibling next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Jason House wrote:
 Are there any good reasons to allow built in arrays to be resizable?
 
 There's already plenty of empirical evidence that building arrays by appending
is incredibly slow.
 
 There are also bugs in bugzilla where resizing arrays after assigning one
array to another violates type safety. Those bugs can be solved by always
reallocatimg arrays when resizing, but that makes performance even worse...
 

 
 Is it time to rely on standard libraries for data structures and let built in
arrays be fixed length? It's a breaking change... 
C has dynamic arrays now. If D got rid of them I might just cry.
Feb 28 2009
parent "Tim M" <a b.com> writes:
On Sun, 01 Mar 2009 05:30:28 +1300, Sean Kelly <sean invisibleduck.org>  
wrote:

 Jason House wrote:
 Are there any good reasons to allow built in arrays to be resizable?
  There's already plenty of empirical evidence that building arrays by  
 appending is incredibly slow.
  There are also bugs in bugzilla where resizing arrays after assigning  
 one array to another violates type safety. Those bugs can be solved by  
 always reallocatimg arrays when resizing, but that makes performance  
 even worse...

  Is it time to rely on standard libraries for data structures and let  
 built in arrays be fixed length? It's a breaking change...
C has dynamic arrays now. If D got rid of them I might just cry.
How do you mean? Are you talking about something like this: void someFunc(int s) { int ar[s]; }
Feb 28 2009
prev sibling next sibling parent reply Jason House <jason.james.house gmail.com> writes:
Sean Kelly Wrote:

 Jason House wrote:
 Are there any good reasons to allow built in arrays to be resizable?
 
 There's already plenty of empirical evidence that building arrays by appending
is incredibly slow.
 
 There are also bugs in bugzilla where resizing arrays after assigning one
array to another violates type safety. Those bugs can be solved by always
reallocatimg arrays when resizing, but that makes performance even worse...
 

 
 Is it time to rely on standard libraries for data structures and let built in
arrays be fixed length? It's a breaking change... 
C has dynamic arrays now. If D got rid of them I might just cry.
By dynamic, do you mean no length as part of the type or resizable in place? Array resizing can lead to hidden allocations, which is against Tango design (my closest proxy for your style preferences). My past use of arrays in C did not include calls to reallocate them in place.
Feb 28 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Jason House (jason.james.house gmail.com)'s article
 Sean Kelly Wrote:
 Jason House wrote:
 Are there any good reasons to allow built in arrays to be resizable?

 There's already plenty of empirical evidence that building arrays by
appending is incredibly slow.
 There are also bugs in bugzilla where resizing arrays after assigning one
array to another violates type safety. Those bugs can be solved by always reallocatimg arrays when resizing, but that makes performance even worse...


 Is it time to rely on standard libraries for data structures and let built
in arrays be fixed length? It's a breaking change...
 C has dynamic arrays now.  If D got rid of them I might just cry.
By dynamic, do you mean no length as part of the type or resizable in place?
Array resizing can lead to hidden allocations, which is against Tango design (my closest proxy for your style preferences). My past use of arrays in C did not include calls to reallocate them in place. IMHO, you can't have a high level language while at the same time completely avoiding hidden allocations. The whole point of D is that it's supposed to be a higher level language than the others in its niche, namely C and C++. The cliche is that we should forget about small efficiencies 97% of the time. If you're writing something that falls in that 97%, in a high level language, stuff should just work and you shouldn't care about a few allocations. If you're writing stuff in the other 3%, the onus should be on you to avoid features with hidden allocations. Writing performance critical or real-time code is hard, and cutting back on (slightly) hidden allocations isn't going to help much.
Mar 01 2009
parent Bill Baxter <wbaxter gmail.com> writes:
On Sun, Mar 1, 2009 at 11:17 PM, dsimcha <dsimcha yahoo.com> wrote:
 =3D=3D Quote from Jason House (jason.james.house gmail.com)'s article
 Sean Kelly Wrote:
 Jason House wrote:
 Are there any good reasons to allow built in arrays to be resizable?

 There's already plenty of empirical evidence that building arrays by
appending is incredibly slow.
 There are also bugs in bugzilla where resizing arrays after assignin=
g one
 array to another violates type safety. Those bugs can be solved by always
 reallocatimg arrays when resizing, but that makes performance even worse.=
..


 Is it time to rely on standard libraries for data structures and let=
built
 in arrays be fixed length? It's a breaking change...
 C has dynamic arrays now. =A0If D got rid of them I might just cry.
By dynamic, do you mean no length as part of the type or resizable in pl=
ace?
 Array resizing can lead to hidden allocations, which is against Tango des=
ign (my
 closest proxy for your style preferences). My past use of arrays in C did=
not
 include calls to reallocate them in place.

 IMHO, you can't have a high level language while at the same time complet=
ely
 avoiding hidden allocations. =A0The whole point of D is that it's suppose=
d to be a
 higher level language than the others in its niche, namely C and C++. =A0=
The cliche
 is that we should forget about small efficiencies 97% of the time. =A0If =
you're
 writing something that falls in that 97%, in a high level language, stuff=
should
 just work and you shouldn't care about a few allocations. =A0If you're wr=
iting stuff
 in the other 3%, the onus should be on you to avoid features with hidden
 allocations. =A0Writing performance critical or real-time code is hard, a=
nd cutting
 back on (slightly) hidden allocations isn't going to help much.
I pretty much agree, but it would be nice if there were some debugging flag you could toggle to cause the gc to assert() if some allocation is attempted. Like std.gc.prohibitAllocations() or somesuch. That way in the block of code where you think you are avoiding allocation it would be easy to find a place where you slipped up. --bb --bb
Mar 01 2009
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Jason House>By dynamic, do you mean no length as part of the type or resizable
in place? Array resizing can lead to hidden allocations,<

Thy aren't hidden: when you grow a dynamic array you know an allocation
generally takes place. If you don't want reallocations, then never grow or join
them, and the reallocations will not happen. There's nothing hidden.

Bye,
bearophile
Feb 28 2009
parent Lutger <lutger.blijdestijn gmail.com> writes:
bearophile wrote:

 Jason House>By dynamic, do you mean no length as part of the type or 
resizable in place? Array resizing can lead to hidden allocations,<
 
 Thy aren't hidden: when you grow a dynamic array you know an allocation 
generally takes place. If you don't want reallocations, then never grow or join them, and the reallocations will not happen. There's nothing hidden.
 
 Bye,
 bearophile
I agree, it's something that I think is quickly understood by newcomers to where the common idiom is to use StringBuilder. I have seen such builders floating around in the ng, but is it phobos or tango already?
Mar 01 2009