digitalmars.D.learn - dynamic array capacity
- spir (14/14) Dec 29 2010 Hello,
- Steven Schveighoffer (8/12) Dec 29 2010 int[] x;
- %u (2/15) Dec 29 2010 Oh, wow! D2 really kicks D1's ass here..
- spir (9/28) Dec 29 2010
- spir (70/89) Dec 29 2010
- Steven Schveighoffer (14/28) Dec 29 2010 What is the baseline for this? I.e. what is it 20% slower than? FWIW,
- spir (36/68) Dec 29 2010
- Steven Schveighoffer (35/84) Dec 29 2010 Then I would be suspect of your test. In simple tests I just ran,
Hello,

Is there a common idiom to pre-allocate a dynamic array? I mean allocating to avoid numerous re-allocations in a loop, not setting length & filling content. The differences are:
(0) no content --> no init
(1) one can simply append, instead of setting elements at defined indices
(2) one can append sub-sequences
(3) when the initially requested capacity is full, the array automagically resizes as needed (using D's builtin dynarray realloc scheme).

Denis
-- -- -- -- -- -- --
vit esse estrany
☣ spir.wikidot.com
Dec 29 2010
On Wed, 29 Dec 2010 07:29:29 -0500, spir <denis.spir gmail.com> wrote:
> Hello, Is there a common idiom to pre-allocate a dynamic array? I mean
> allocating to avoid numerous re-allocations in a loop, not setting length
> & filling content.

int[] x;
x.reserve(10000); // pre-allocate at least 10000 elements for appending

Another way to do it:

Appender!(int[]) app;
app.reserve(10000);

this will perform much better than builtin array appending.

-Steve
Dec 29 2010
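A minimal, self-contained sketch of the two approaches above, assuming the Appender struct, the appender helper and the .data property from std.array are available in the Phobos version in use; the 10_000 size is only illustrative:

import std.array : Appender, appender;
import std.stdio : writeln;

void main()
{
    // Builtin dynamic array with reserve: capacity is requested up front,
    // so appends up to that capacity need not reallocate.
    int[] x;
    x.reserve(10_000);
    foreach (i; 0 .. 10_000)
        x ~= i;

    // std.array.Appender: keeps its own capacity bookkeeping and skips the
    // per-append runtime lookup that builtin ~= performs.
    auto app = appender!(int[])();
    app.reserve(10_000);
    foreach (i; 0 .. 10_000)
        app.put(i);

    writeln(x.length, " ", app.data.length); // 10000 10000
}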
== Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
> On Wed, 29 Dec 2010 07:29:29 -0500, spir <denis.spir gmail.com> wrote:
>> Hello, Is there a common idiom to pre-allocate a dynamic array? I mean
>> allocating to avoid numerous re-allocations in a loop, not setting length
>> & filling content.
>
> int[] x;
> x.reserve(10000); // pre-allocate at least 10000 elements for appending
>
> Another way to do it:
>
> Appender!(int[]) app;
> app.reserve(10000);
>
> this will perform much better than builtin array appending.
>
> -Steve

Oh, wow! D2 really kicks D1's ass here..
Dec 29 2010
On Wed, 29 Dec 2010 11:24:01 -0500
"Steven Schveighoffer" <schveiguy yahoo.com> wrote:

> On Wed, 29 Dec 2010 07:29:29 -0500, spir <denis.spir gmail.com> wrote:
>> Hello, Is there a common idiom to pre-allocate a dynamic array? I mean
>> allocating to avoid numerous re-allocations in a loop, not setting length
>> & filling content.
>
> int[] x;
> x.reserve(10000); // pre-allocate at least 10000 elements for appending
>
> Another way to do it:
>
> Appender!(int[]) app;
> app.reserve(10000);
>
> this will perform much better than builtin array appending.
>
> -Steve

Thank you, Steven. I could not find anything like 'reserve' in the property table because it's not a property ;-)

Denis
-- -- -- -- -- -- --
vit esse estrany
☣ spir.wikidot.com
Dec 29 2010
On Wed, 29 Dec 2010 11:24:01 -0500
"Steven Schveighoffer" <schveiguy yahoo.com> wrote:

> On Wed, 29 Dec 2010 07:29:29 -0500, spir <denis.spir gmail.com> wrote:
>> Hello, Is there a common idiom to pre-allocate a dynamic array? I mean
>> allocating to avoid numerous re-allocations in a loop, not setting length
>> & filling content.
>
> int[] x;
> x.reserve(10000); // pre-allocate at least 10000 elements for appending
>
> Another way to do it:
>
> Appender!(int[]) app;
> app.reserve(10000);
>
> this will perform much better than builtin array appending.
>
> -Steve

I've done some timings using reserve and Appender. Seems not to work on my use case (decomposition of a string [actually a sequence of code points] according to NFD). (See sample code below.)
* use reserve (to source string's length) with builtin append (operator '~=') --> 20% slower
* use Appender w/o reserve --> 3 times slower
* use Appender + its own reserve --> 1.5 times slower (i.e. divide above time per 2)

I'm surprised that reserve does not speed up builtin appending, since it can only avoid numerous reallocations. How to interpret that?
I'm even more surprised of Appender's results on this use case, after having read about its performance several times on the list. Strange. Can it be due to the fact that I only append sub-sequences? (The decomposition '*p' below is also a mini-array.)

Denis

// NFD decomposition
Code[] decomposition (Code[] codes) {
    /** Decompose code sequence according to form "NFD". */
    //~ Code[] decompCodes;             // new, decomposed; pile
    Appender!(Code[]) decompCodes;
    Decomp* p;                          // pointer to single-code decomposition
    Ordinal i0 = 0;                     // reference index past last decomposed code

    // Decompose each code.
    //~ decompCodes.reserve(codes.length);  // worse time! ???
    //~ foreach (i, code ; codes) {
    //~     // case composed code
    //~     p = (code in DECOMPOSITIONS);
    //~     // Do not even check whether code is composite
    //~     // for simple ASCII & latin characters.
    //~     if (code > 0xBF)
    //~         if (p) {
    //~             // Append past simple codes, if any.
    //~             if (i > i0) decompCodes ~= codes[i0..i];
    //~             // Append current code's decomposition.
    //~             decompCodes ~= *p;
    //~             // Update reference index.
    //~             i0 = i + 1;
    //~         }
    //~ }
    //~ // Append last simple codes (if any).
    //~ decompCodes ~= codes[i0..$];

    decompCodes.reserve(codes.length);      // worse time! ???
    foreach (i, code ; codes) {
        // case composed code
        p = (code in DECOMPOSITIONS);
        // Do not even check whether code is composite
        // for simple ASCII & latin characters.
        if (code > 0xBF)
            if (p) {
                // Append past simple codes, if any.
                if (i > i0) decompCodes.put(codes[i0..i]);
                // Append current code's decomposition.
                decompCodes.put(*p);
                // Update reference index.
                i0 = i + 1;
            }
    }
    // Append last simple codes (if any).
    decompCodes.put(codes[i0..$]);

    //~ return decompCodes;
    return decompCodes.data;
}

-- -- -- -- -- -- --
vit esse estrany
☣ spir.wikidot.com
Dec 29 2010
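For reference, one way to set up the comparison above as a standalone benchmark. This is a sketch only, with an arbitrary size and an int payload instead of the NFD decomposition data; each variant is selected with a -version switch and timed externally (e.g. with the shell's time command), so no timing module is assumed:

// Build e.g.:  dmd -O -release -version=Reserve bench.d   and run under `time`.
import std.array : Appender;

enum N = 1_000_000;

void main()
{
    version (Plain)
    {
        int[] a;
        foreach (i; 0 .. N) a ~= i;        // baseline: builtin ~= without reserve
    }
    else version (Reserve)
    {
        int[] a;
        a.reserve(N);                      // pre-allocate, then builtin ~=
        foreach (i; 0 .. N) a ~= i;
    }
    else
    {
        Appender!(int[]) app;              // Appender with reserve
        app.reserve(N);
        foreach (i; 0 .. N) app.put(i);
    }
}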
On Wed, 29 Dec 2010 13:14:29 -0500, spir <denis.spir gmail.com> wrote:
> I've done some timings using reserve and Appender. Seems not to work on
> my use case (decomposition of a string [actually a sequence of code
> points] according to NFD). (See sample code below.)
> * use reserve (to source string's length) with builtin append (operator
> '~=') --> 20% slower
> * use Appender w/o reserve --> 3 times slower
> * use Appender + its own reserve --> 1.5 times slower (i.e. divide above
> time per 2)

What is the baseline for this? I.e. what is it 20% slower than? FWIW, Appender should be much faster than builtin append, even without reserve.

However, Appender has a recently fixed bug (not fixed in 2.051) where appending *arrays* of elements goes very slow. I see you are doing that in a couple spots.

> I'm surprised that reserve does not speed up builtin appending, since it
> can only avoid numerous reallocations. How to interpret that?
> I'm even more surprised of Appender's results on this use case, after
> having read about its performance several times on the list. Strange.
> Can it be due to the fact that I only append sub-sequences? (The
> decomposition '*p' below is also a mini-array.)

It should speed up appending. If it doesn't, then it's either a bug or pilot error. As I said before, Appender in 2.051 and earlier has a bug where appending an array is very slow.

But builtin appending should be faster if you reserve. Simple tests I run prove that this is true. Recent developments in how the appending array grows mitigate this quite a bit, but it certainly will result in less memory being consumed, and it always runs faster.

-Steve
Dec 29 2010
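One way to rule out pilot error in such a test is to check the builtin capacity property (which, like reserve, lives in the runtime's object module and needs no import) before and after the appends. A small sketch, with an illustrative size:

import std.stdio : writeln;

void main()
{
    int[] x;
    x.reserve(10_000);
    auto cap = x.capacity;                            // total elements storable without reallocation
    writeln("capacity after reserve: ", cap);         // expected to be >= 10000
    foreach (i; 0 .. 10_000)
        x ~= i;
    writeln("capacity after filling: ", x.capacity);  // same value => no reallocation occurred
}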
On Wed, 29 Dec 2010 13:48:48 -0500
"Steven Schveighoffer" <schveiguy yahoo.com> wrote:

> What is the baseline for this? I.e. what is it 20% slower than? FWIW,
> Appender should be much faster than builtin append, even without reserve.

The baseline is builtin append (~=) without reserving. (Sorry, thought it was clear.)

> However, Appender has a recently fixed bug (not fixed in 2.051) where
> appending *arrays* of elements goes very slow. I see you are doing that
> in a couple spots.

As explained below, I'm doing _only_ that. So (as guessed), seems it may be the reason why Appender performs so badly in my case.

> > I'm surprised that reserve does not speed up builtin appending, since it
> > can only avoid numerous reallocations. How to interpret that?

* This question remains. ??? *

> > I'm even more surprised of Appender's results on this use case, after
> > having read about its performance several times on the list. Strange.
> > Can it be due to the fact that I only append sub-sequences? (The
> > decomposition '*p' below is also a mini-array.)
>
> It should speed up appending. If it doesn't, then it's either a bug or
> pilot error. As I said before, Appender in 2.051 and earlier has a bug
> where appending an array is very slow.
>
> But builtin appending should be faster if you reserve.

Yop!

> Simple tests I run prove that this is true. Recent developments in how
> the appending array grows mitigate this quite a bit, but it certainly will
> result in less memory being consumed, and it always runs faster.

I can upload the modified version using reserve & Appender (with the timing module) if you like. (The original one is at https://bitbucket.org/denispir/denispir-d/src, files pile.d and chrono.d.)

Anyway, why reserve does not help builtin append is a mystery. In the worst case, it should not trouble.

By the way, how come I do not need to import anything to be able to use reserve (as if it were a builtin prop)?

About Appender, I have had a look at the code in std.array and could not find any sign of why -- except that appending a slice loops over its elements --calling put() for each one--, but this should not be _that_ slow, I guess. (Or should it?) Maybe a specialisation of put to take an array (or slice) would help? (Instead of using an array interface, as is done now.) The details of realloc are beyond my competence (e.g. the computation of the resizing ratio), so I can hardly search further.

Denis
-- -- -- -- -- -- --
vit esse estrany
☣ spir.wikidot.com
Dec 29 2010
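To illustrate the point about put() looping over a slice's elements, here is a toy comparison (not Appender's actual code) of appending a slice element by element versus copying it in one block; the block version does a single capacity check and one slice copy instead of one check per element:

import std.stdio : writeln;

void appendPerElement(T)(ref T[] buf, ref size_t len, const(T)[] src)
{
    foreach (e; src)                       // one capacity check + copy per element
    {
        if (len == buf.length)
            buf.length = buf.length ? buf.length * 2 : 4;
        buf[len++] = e;
    }
}

void appendSlice(T)(ref T[] buf, ref size_t len, const(T)[] src)
{
    if (len + src.length > buf.length)     // one capacity check for the whole slice
    {
        size_t cap = buf.length ? buf.length : 4;
        while (cap < len + src.length)
            cap *= 2;
        buf.length = cap;
    }
    buf[len .. len + src.length] = src[];  // one block copy
    len += src.length;
}

void main()
{
    int[] buf;
    size_t len;
    appendSlice(buf, len, [1, 2, 3]);
    appendPerElement(buf, len, [4, 5]);
    writeln(buf[0 .. len]);                // [1, 2, 3, 4, 5]
}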
On Wed, 29 Dec 2010 14:49:35 -0500, spir <denis.spir gmail.com> wrote:
> The baseline is builtin append (~=) without reserving. (Sorry, thought it
> was clear.)

Then I would be suspect of your test. In simple tests I just ran, reserving outperforms normal appending (though not by a huge amount). If it's *slower*, then there's something else wrong, as the worst case is it degenerates to non-reserved appending, since the same code paths are followed.

> As explained below, I'm doing _only_ that. So (as guessed), seems it may
> be the reason why Appender performs so badly in my case.

Yes, most likely. I mean it was like 10x slower in tests I ran. Try this version of std.array (shouldn't need to recompile phobos, it's a template):

http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/array.d?rev=2238&format=raw

> > I'm surprised that reserve does not speed up builtin appending, since it
> > can only avoid numerous reallocations. How to interpret that?
>
> * This question remains. ??? *

I would lean towards pilot error (sorry), since simple tests show reserve does speed things up. I don't have a lot of time to look into your code in depth. If you can show a simple test that proves otherwise, I'd be happy to look at it.

> I can upload the modified version using reserve & Appender (with the
> timing module) if you like. (The original one is at
> https://bitbucket.org/denispir/denispir-d/src, files pile.d and chrono.d.)

I use the 'time' command to run benchmarks usually because I've been burned in the past with using custom-built timing mechanisms that have flaws in them.

> Anyway, why reserve does not help builtin append is a mystery. In the
> worst case, it should not trouble.

I'll augment that and say "in your application", since it seems to perform expectedly in my tests.

> By the way, how come I do not need to import anything to be able to use
> reserve (as if it were a builtin prop)?

It's in object.di, always imported. I'm wondering if those should be added to the 'properties' page of the spec.

> About Appender, I have had a look at the code in std.array and could not
> find any sign of why -- except that appending a slice loops over its
> elements --calling put() for each one--, but this should not be _that_
> slow, I guess. (Or should it?) Maybe a specialisation of put to take an
> array (or slice) would help? (Instead of using an array interface, as is
> done now.) The details of realloc are beyond my competence (e.g. the
> computation of the resizing ratio), so I can hardly search further.

The issue is that appending was not amortized when appending arrays vs. appending individual elements (where appending *is* amortized). In order to achieve the appearance of O(1) appending, you must grow the array proportionally instead of linearly. In simple terms, you want to essentially double the capacity whenever you realloc. The actual algorithm uses a logarithmic growth curve that settles around 1.4x.

The flawed Appender code was just adding enough space to accommodate the data being appended, but only if an array was appended (appending single elements was using the growth curve). Not only that, but it wasn't taking advantage of the ability to extend into consecutive pages without having to move data around.

-Steve
Dec 29 2010
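To make the amortization argument concrete, a toy model contrasting the two growth policies; the numbers are illustrative only, and the runtime's real curve and page-extension tricks are not modelled:

import std.stdio : writeln;

// "Just enough" growth: every append of new data forces another reallocation.
size_t growJustEnough(size_t curCap, size_t needed)
{
    return needed;
}

// Proportional growth: capacity increases by roughly 1.5x per step, so the
// number of reallocations is logarithmic in the final length (amortized O(1)).
size_t growProportionally(size_t curCap, size_t needed)
{
    size_t cap = curCap ? curCap : 8;
    while (cap < needed)
        cap += cap / 2 + 1;
    return cap;
}

void main()
{
    size_t capA, capB, reallocsA, reallocsB;
    foreach (size_t n; 1 .. 100_001)
    {
        if (n > capA) { capA = growJustEnough(capA, n);     ++reallocsA; }
        if (n > capB) { capB = growProportionally(capB, n); ++reallocsB; }
    }
    writeln("just-enough reallocations:  ", reallocsA);     // ~100000
    writeln("proportional reallocations: ", reallocsB);     // a couple dozen
}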