digitalmars.D.learn - Appender is ... slow

reply "Philippe Sigaud" <philippe.sigaud gmail.com> writes:
 From time to time, I try to speed up some array-heavy code by 
using std.array.Appender, reserving some capacity and so on.

It never works. Never. It gives me executables that are maybe 
30-50% slower than bog-standard array code.

I don't do anything fancy: I just replace the types, use clear() 
instead of = null...

Do people here get good results from Appender? And if yes, how 
are you using it?
Aug 14 2014
next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 14 August 2014 at 17:16:42 UTC, Philippe Sigaud 
wrote:
 From time to time, I try to speed up some array-heavy code by 
 using std.array.Appender, reserving some capacity and so on.

 It never works. Never. It gives me executables that are maybe 
 30-50% slower than bog-standard array code.

 I don't do anything fancy: I just replace the types, use 
 clear() instead of = null...

 Do people here get good results from Appender? And if yes, how 
 are you using it?
I don't know much about the Phobos Appender implementation details, but the key thing with a reusable buffer is to avoid freeing it. AFAIR Appender.clear frees the allocated memory but `Appender.length = 0` does not, making it possible to just overwrite stuff again and again.

Won't promise you anything though!
Aug 14 2014
next sibling parent reply Philippe Sigaud via Digitalmars-d-learn writes:
 I don't know much about the Phobos Appender implementation details, but the key
 thing with a reusable buffer is to avoid freeing it. AFAIR Appender.clear frees
 the allocated memory but `Appender.length = 0` does not, making it possible
 to just overwrite stuff again and again.
I call .clear() only at the beginning of the computation, to avoid any strange effects with std.datetime.benchmark (using benchmark with memoizing functions can lead to strange results if one does not take care to flush any result between two calls). After that initial call to clear, I just append.

The thing is, it's not the first time it happens. For years, I've tried to use Appender to get faster code, to no avail.

btw, I saw your DConf talk yesterday, nice content! And thanks for talking about Pegged! It might interest you to know that the code I'm trying to use Appender on is a new engine for Pegged, based on GLL parsing, that should be able to produce a parser for any grammar, even ambiguous ones.
Aug 14 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 14 August 2014 at 18:55:55 UTC, Philippe Sigaud via 
Digitalmars-d-learn wrote:
 btw, I saw your DConf talk yesterday, nice content! And thanks
 for talking about Pegged!
 It might interest you to know that the code I'm trying to use
 Appender on is a new engine for Pegged, based on GLL parsing,
 that should be able to produce a parser for any grammar, even
 ambiguous ones.
Thanks! Repeating what I mentioned during the DConf talk - have you ever considered proposing Pegged for Phobos inclusion? It feels like an important bit of infrastructure to me.
Aug 14 2014
parent Philippe Sigaud via Digitalmars-d-learn writes:
 Thanks! Repeating what I mentioned during the DConf talk - have you ever
 considered proposing Pegged for Phobos inclusion? It feels like an important
 bit of infrastructure to me.
At the time, it was considered (rightfully) far too slow and memory-hogging. I think having a generic lexer and a standard D parser in Phobos would already be a great step forward.
Aug 14 2014
prev sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Thursday, 14 August 2014 at 18:31:15 UTC, Dicebot wrote:
 I don't know much about the Phobos Appender implementation details,
 but the key thing with a reusable buffer is to avoid freeing it.
 AFAIR Appender.clear frees the allocated memory but
 `Appender.length = 0` does not, making it possible to just
 overwrite stuff again and again.

 Won't promise you anything though!
Appender has no .length property, and .clear does rewind the write pointer without deallocating memory, thus allowing you to reuse the buffer.
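
A minimal sketch of that reuse pattern, assuming .clear behaves as Vladimir describes (rewinding the write pointer while keeping the buffer):

import std.array : appender;

void main()
{
    auto app = appender!(int[])();
    foreach (run; 0 .. 3)
    {
        app.clear();              // rewind the write pointer; the buffer is kept
        foreach (n; 0 .. 1_000)
            app.put(n);
        auto snapshot = app.data; // only valid until the next clear/append cycle
    }
}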
Aug 15 2014
prev sibling next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, 14 August 2014 at 17:16:42 UTC, Philippe Sigaud 
wrote:
 From time to time, I try to speed up some array-heavy code by 
 using std.array.Appender, reserving some capacity and so on.

 It never works. Never. It gives me executables that are maybe 
 30-50% slower than bog-standard array code.

 I don't do anything fancy: I just replace the types, use 
 clear() instead of = null...

 Do people here get good results from Appender? And if yes, how 
 are you using it?
I've never really tried to benchmark it, but it was my understanding that the idea behind Appender was to use it to create the array when you do that via a lot of appending, and then you use it as a normal array and stop using Appender.

It sounds like you're trying to use it as a way to manage reusing the array, and I have no idea how it works for that. But then again, I've never actually benchmarked it for just creating arrays via appending. I'd just assumed that it was faster than just using ~=, because that's what it's supposedly for. But maybe I just completely misunderstood what the point of Appender was.

- Jonathan M Davis
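
As an illustration of that build-then-discard usage, here is a minimal sketch (buildList and its comma formatting are made up):

import std.array : appender;
import std.conv : to;

string buildList(int[] values)
{
    auto app = appender!string();
    foreach (i, v; values)
    {
        if (i != 0)
            app.put(", ");    // Appender!string accepts both chars and strings
        app.put(v.to!string);
    }
    return app.data; // from here on it's a normal string; the Appender is dropped
}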
Aug 14 2014
next sibling parent reply Philippe Sigaud via Digitalmars-d-learn writes:
 I've never really tried to benchmark it, but it was my understanding that
 the idea behind Appender was to use it to create the array when you do that
 via a lot of appending, and then you use it as a normal array and stop using
 Appender.
That's how I use it, yes.
 It sounds like you're trying to use it as a way to manage reusing
 the array, and I have no idea how it works for that.
There is a misunderstanding there: I'm using clear only to flush the state at the beginning of the computation. The Appender is a class field, used by the class methods to calculate. If I do not clear it at the beginning of the methods, I keep appending new results to old computations, which is not what I want.

But really, calling clear is a minor point: I'm interested in Appender's effect on *one* (long, concatenation-intensive) computation.
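
A sketch of the setup being described, with hypothetical names (Engine and the doubling stand in for the real parser code):

import std.array : Appender;

class Engine
{
    Appender!(int[]) results; // class field, reused across computations

    void compute(int[] input)
    {
        results.clear(); // flush the previous run's state
        foreach (x; input)
            results.put(x * 2); // stand-in for the actual computation
    }
}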
 I've never actually benchmarked it for just creating arrays via
 appending. I'd just assumed that it was faster than just using ~=,
 because that's what it's supposedly for. But maybe I just
 completely misunderstood what the point of Appender was.
I don't know. People here keep telling newcomers to use it, but I'm not convinced by its results. Maybe I'm seeing worse results because my arrays do not have millions of elements, and Appender shines for long arrays?
Aug 14 2014
next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, 14 August 2014 at 19:29:28 UTC, Philippe Sigaud via 
Digitalmars-d-learn wrote:
 It sounds like you're trying to use it as a way to manage reusing
 the array, and I have no idea how it works for that.
There is a misunderstanding there: I'm using clear only to flush the state at the beginning of the computation. The Appender is a class field, used by the class methods to calculate. If I do not clear it at the beginning of the methods, I keep appending new results to old computations, which is not what I want. But really, calling clear is a minor point: I'm interested in Appender's effect on *one* (long, concatenation-intensive) computation.
Then it sounds like you're reusing the Appender. I've never done that. In fact, I would have assumed that that meant you were attempting to fill in the same array again, and I wouldn't have even thought that that was safe, because I would have assumed that Appender used assumeSafeAppend, which would mean that reusing the Appender would be highly unsafe unless you weren't using the array that you got from it anymore.

I always use Appender to construct an array, and then I get rid of the Appender. I don't think that I've ever had a member variable which was an Appender. I only ever use it for local variables or function arguments.
 I've never actually benchmarked it for just creating arrays via
 appending. I'd just assumed that it was faster than just using ~=,
 because that's what it's supposedly for. But maybe I just
 completely misunderstood what the point of Appender was.
I don't know. People here keep telling newcomers to use it, but I'm not convinced by its results. Maybe I'm seeing worse results because my arrays are do not have millions of elements and Appender shines for long arrays?
I have no idea. It was my understanding that it was faster to create an array via appending using Appender than ~=, but I've never benchmarked it or actually looked into how it works. It's quite possible that while it's _supposed_ to be faster, it's actually flawed somehow and is actually slower, and no one has noticed previously, simply assuming that it was faster because it's supposed to be.

- Jonathan M Davis
Aug 14 2014
prev sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 14 August 2014 at 19:29:28 UTC, Philippe Sigaud via 
Digitalmars-d-learn wrote:
 There is a misunderstanding there: I'm using clear only to flush
 the state at the beginning of the computation. The Appender is a
 class field, used by the class methods to calculate. If I do not
 clear it at the beginning of the methods, I keep appending new
 results to old computations, which is not what I want. But really,
 calling clear is a minor point: I'm interested in Appender's
 effect on *one* (long,
This is exactly what I propose to change - set `length` to 0 instead of calling `clear`. That way you will keep using the same memory chunk, simply abandoning its old state at the beginning of each computation.
Aug 14 2014
parent reply Philippe Sigaud via Digitalmars-d-learn writes:
 There is a misunderstanding there: I'm using clear only to flush the
 state at the beginning of the computation. The Appender is a class
 field, used by the class methods to calculate. If I do not clear it at
 the beginning of the methods, I keep appending new results to old
 computations, which is not what I want. But really, calling clear is a
 minor point: I'm interested in Appender's effect on *one* (long,
This is exactly what I propose to change - set `length` to 0 instead of calling `clear`. That way you will keep using the same memory chunk, simply abandoning its old state at the beginning of each computation.
You mean by using the shrinkTo method? (Appender does not have a length property, which is a bit of a bother, btw.) I just did, and it does not change anything. Too bad.

Heck, my code is simpler to read and use *and* faster by using a bog-standard D array. I'll keep my array for now :)
Aug 14 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 14 August 2014 at 20:42:08 UTC, Philippe Sigaud via 
Digitalmars-d-learn wrote:
 You mean by using the shrinkTo method? (Appender does not have a
 length, that's a bit of a bother btw).
 I just did, it does not change anything. Too bad.

 Heck, my code is simpler to read and use *and* faster by using a
 bog-standard D array. I'll keep my array for now :)
Oh crap, I had std.container.Array in mind, which does expose `length`. And Appender seems to indeed clear / shrink data in a horrible way:

https://github.com/D-Programming-Language/phobos/blob/master/std/array.d#L2597
https://github.com/D-Programming-Language/phobos/blob/master/std/array.d#L2611

Wow, this thing is real garbage!
Aug 14 2014
parent "Dicebot" <public dicebot.lv> writes:
On Thursday, 14 August 2014 at 20:50:37 UTC, Dicebot wrote:
 Oh crap, I had std.container.Array in mind, which does expose
 `length`. And Appender seems to indeed clear / shrink data in a
 horrible way:

 https://github.com/D-Programming-Language/phobos/blob/master/std/array.d#L2597
 https://github.com/D-Programming-Language/phobos/blob/master/std/array.d#L2611

 Wow, this thing is real garbage!
OK, looks like Appender is written to be fully GC-compatible and usable with immutable data, which kind of explains such an implementation. But that makes it inherently slow compared to plain `Array`, which owns its memory and gets it via malloc.
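
For comparison, a minimal sketch of the malloc-backed std.container.Array usage being contrasted here:

import std.container : Array;

void main()
{
    Array!int arr;          // owns its memory; allocated via malloc, not the GC
    arr.reserve(1_000);
    foreach (n; 0 .. 1_000)
        arr ~= n;
}   // arr's storage is released deterministically when it goes out of scope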
Aug 14 2014
prev sibling parent reply "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 14 August 2014 at 19:10:18 UTC, Jonathan M Davis 
wrote:
 I've never really tried to benchmark it, but it was my 
 understanding that the idea behind Appender was to use it to 
 create the array when you do that via a lot of appending, and 
 then you use it as a normal array and stop using Appender. It 
 sounds like you're trying to use it as a way to manage reusing 
 the array, and I have no idea how it works for that. But then 
 again, I've never actually benchmarked it for just creating 
 arrays via appending. I'd just assumed that it was faster than 
 just using ~=, because that's what it's supposedly for. But 
 maybe I just completely misunderstood what the point of 
 Appender was.

 - Jonathan M Davis
I too have trouble understanding what Appender does that supposedly makes it faster (at least from the documentation).

My old, naive thought was that it was something like a linked list of fixed-size arrays, so that appends didn't have to move existing elements until you were done appending, at which point it would bake it into a regular dynamic array, moving each element only once (an std::deque with a copy into a vector, in C++ terms). Looking at the code, it appears to be nothing like that.

Skimming the code, it appears to be more focused on the much more basic "~= always reallocates" performance problem. It seems it boils down to doing essentially this (someone feel free to correct me) in the form of an output range:

auto a = /* some array */;
auto b = a;
a = a.array();
for(...)
    b.assumeSafeAppend() ~= /* element */;

(assumeSafeAppend's documentation doesn't say whether or not it'll reallocate when capacity is exhausted; I assume it does.)
Aug 14 2014
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, 14 August 2014 at 19:47:33 UTC, Brad Anderson wrote:
 On Thursday, 14 August 2014 at 19:10:18 UTC, Jonathan M Davis 
 wrote:
 I've never really tried to benchmark it, but it was my 
 understanding that the idea behind Appender was to use it to 
 create the array when you do that via a lot of appending, and 
 then you use it as a normal array and stop using Appender. It 
 sounds like you're trying to use it as a way to manage reusing 
 the array, and I have no idea how it works for that. But then 
 again, I've never actually benchmarked it for just creating 
 arrays via appending. I'd just assumed that it was faster than 
 just using ~=, because that's what it's supposedly for. But 
 maybe I just completely misunderstood what the point of 
 Appender was.

 - Jonathan M Davis
I too have trouble understanding what Appender does that supposedly makes it faster (at least from the documentation). My old, naive thought was that it was something like a linked list of fixed-size arrays, so that appends didn't have to move existing elements until you were done appending, at which point it would bake it into a regular dynamic array, moving each element only once (an std::deque with a copy into a vector, in C++ terms). Looking at the code, it appears to be nothing like that.

Skimming the code, it appears to be more focused on the much more basic "~= always reallocates" performance problem. It seems it boils down to doing essentially this (someone feel free to correct me) in the form of an output range:

auto a = /* some array */;
auto b = a;
a = a.array();
for(...)
    b.assumeSafeAppend() ~= /* element */;
It was my understanding that that was essentially what it did, but I've never really studied its code, and I don't know if there are any other tricks that it's able to pull. It may be that it really doesn't do anything more than be a wrapper which handles assumeSafeAppend for you correctly. But if that's the case, I wouldn't expect operating on arrays directly to be any faster. Maybe it would be _slightly_ faster, because there are no wrapper functions to inline away, but it wouldn't be much faster; it would just ensure that you used assumeSafeAppend correctly. If it's really as much slower as Philippe is finding, then I have no idea what's going on. Certainly, it merits a bug report and further investigation.
 (assumeSafeAppend's documentation doesn't say whether or not 
 it'll reallocate when capacity is exhausted, I assume it does).
All assumeSafeAppend does is tell the runtime that this particular array is the array farthest into the memory block (i.e. that any of the slices which referred to farther in the memory block don't exist anymore). So, the value in the runtime that keeps track of the farthest point into the memory block which has been referred to by an array is then set to the end of the array that you passed to assumeSafeAppend. And because it's now the last array in that block, it's safe to append to it without reallocating.

But the appending then works the same way that it always does, and it'll reallocate when there's no more capacity. The whole idea is just to make it so that the runtime doesn't think that the memory after that array is unavailable for that array to expand into.

- Jonathan M Davis
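
A small example of that mechanism, assuming nothing else still references the tail of the block:

void main()
{
    auto arr = new int[](10);
    auto slice = arr[0 .. 5];
    assert(slice.capacity == 0); // the runtime won't let slice grow over arr[5 .. $]
    slice.assumeSafeAppend();    // promise: nothing uses arr[5 .. $] anymore
    assert(slice.capacity > 0);  // slice is now the last array in the block
    slice ~= 42;                 // appends in place, overwriting arr[5]
}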
Aug 14 2014
next sibling parent reply "safety0ff" <safety0ff.dev gmail.com> writes:
IIRC it manages the capacity information manually instead of 
calling the runtime which reduces appending overhead.
Aug 14 2014
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, 14 August 2014 at 21:11:51 UTC, safety0ff wrote:
 IIRC it manages the capacity information manually instead of 
 calling the runtime which reduces appending overhead.
That would make some sense, though it must be completely avoiding ~= then and probably is even GC-mallocing the array itself. Regardless, I clearly need to study the code if I want to know what it's actually doing.

- Jonathan M Davis
Aug 14 2014
next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, 14 August 2014 at 21:34:04 UTC, Jonathan M Davis 
wrote:
 On Thursday, 14 August 2014 at 21:11:51 UTC, safety0ff wrote:
 IIRC it manages the capacity information manually instead of 
 calling the runtime which reduces appending overhead.
That would make some sense, though it must be completely avoiding ~= then and probably is even GC-mallocing the array itself. Regardless, I clearly need to study the code if I want to know what it's actually doing.
It looks like what it does is essentially to set the array's length to the capacity that the GC gave it and then manage the capacity itself (so, basically what you were suggesting), and it essentially avoids the runtime overhead of ~= by reimplementing ~=. Whether it does it in a more efficient manner is an open question, and it also raises the question of why it would be cheaper to do it this way rather than in the GC. That's not at all obvious to me at the moment, especially because the code for ensureAddable and put in Appender are both fairly complicated.

So, I really have no idea how Appender fares in comparison to just using ~=, and I have to wonder why something similar can't be done in the runtime itself if Appender actually is faster. I'd have to spend a lot more time looking into that to figure it out.

- Jonathan M Davis
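
A toy sketch of that capacity-caching idea (hypothetical code, not the actual Phobos implementation):

struct MiniAppender(T)
{
    private T[] buf;     // the whole allocated block; buf.length acts as the capacity
    private size_t used; // number of elements actually written

    void put(T value)
    {
        if (used == buf.length)                 // only here do we go back to the runtime
            buf.length = (buf.length + 16) * 2;
        buf[used++] = value;                    // common case: one bounds check, one store
    }

    T[] data() { return buf[0 .. used]; }
}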
Aug 14 2014
prev sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 14 August 2014 at 21:34:04 UTC, Jonathan M Davis 
wrote:
 On Thursday, 14 August 2014 at 21:11:51 UTC, safety0ff wrote:
 IIRC it manages the capacity information manually instead of 
 calling the runtime which reduces appending overhead.
That would make some sense, though it must be completely avoiding ~= then and probably is even GC-mallocing the array itself. Regardless, I clearly need to study the code if I want to know what it's actually doing. - Jonathan M Davis
I wonder if using plain `Array` instead may result in better performance where immutability is not needed.
Aug 14 2014
parent reply Philippe Sigaud via Digitalmars-d-learn writes:
 I wonder if using plain `Array` instead may result in better performance
 where immutability is not needed.
Hmm, no:

module appendertest;

import std.array;
import std.datetime;
import std.stdio;
import std.container;

enum size = 1_000;

void test1()
{
    auto arr = appender!(int[])();
    foreach(n; 0 .. size)
        arr.put(n);
}

void test1prime()
{
    auto arr = appender!(int[])();
    foreach(n; 0 .. size)
        arr ~= n;
}

void test2()
{
    int[] arr;
    foreach(n; 0 .. size)
        arr ~= n;
}

void test2prime()
{
    int[] arr;
    arr.reserve(size);
    foreach(n; 0 .. size)
        arr ~= n;
}

void test3()
{
    Array!int arr;
    foreach(n; 0 .. size)
        arr ~= n;
}

void test3prime()
{
    Array!int arr;
    arr.reserve(size);
    foreach(n; 0 .. size)
        arr ~= n;
}

void test4()
{
    auto arr = new int[](size);
    foreach(n; 0 .. size)
        arr[n] = n;
}

void test5()
{
    auto arr = uninitializedArray!(int[])(size);
    foreach(n; 0 .. size)
        arr[n] = n;
}

void main()
{
    auto result = benchmark!(test1, test1prime,
                             test2, test2prime,
                             test3, test3prime,
                             test4, test5)(10_000);
    writeln("Appender.put       :", cast(Duration)result[0]);
    writeln("Appender ~=        :", cast(Duration)result[1]);
    writeln("Std array          :", cast(Duration)result[2]);
    writeln("Std array.reserve  :", cast(Duration)result[3]);
    writeln("Array              :", cast(Duration)result[4]);
    writeln("Array.reserve      :", cast(Duration)result[5]);
    writeln("new T[]()          :", cast(Duration)result[6]);
    writeln("uninitializedArray :", cast(Duration)result[7]);
}

Times:

Appender.put       :157 ms, 602 μs, and 3 hnsecs
Appender ~=        :182 ms, 807 μs, and 1 hnsec
Std array          :256 ms, 210 μs, and 7 hnsecs
Std array.reserve  :244 ms, 770 μs, and 4 hnsecs
Array              :336 ms, 207 μs, and 3 hnsecs
Array.reserve      :321 ms, 500 μs, and 6 hnsecs
new T[]()          :28 ms, 496 μs, and 6 hnsecs
uninitializedArray :26 ms and 620 μs
Aug 15 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Friday, 15 August 2014 at 08:35:41 UTC, Philippe Sigaud via 
Digitalmars-d-learn wrote:
 I wonder if using plain `Array` instead may result in better
 performance where immutability is not needed.
Hmm, no: ...
It is very different with a better compiler though:

$ ldmd2 -release -O a.d
$ ./appendertest
Appender.put       :378 ms, 794 μs, and 9 hnsecs
Appender ~=        :378 ms, 416 μs, and 3 hnsecs
Std array          :2 secs, 222 ms, 256 μs, and 2 hnsecs
Std array.reserve  :2 secs, 199 ms, 64 μs, and 5 hnsecs
Array              :349 ms and 250 μs
Array.reserve      :347 ms, 694 μs, and 1 hnsec
new T[]()          :37 ms, 989 μs, and 8 hnsecs
uninitializedArray :39 ms, 333 μs, and 3 hnsecs

(size 10_000)

I am still somewhat disappointed though, because I'd expect static Array to get close in performance to a pre-allocated array of exact size over many iterations - which doesn't happen. Need to investigate.
Aug 15 2014
next sibling parent reply Philippe Sigaud via Digitalmars-d-learn writes:
 It is very different with better compiler though :

 $ ldmd2 -release -O a.d
 $ ./appendertest
 Appender.put       :378 ms, 794 μs, and 9 hnsecs
 Appender ~=        :378 ms, 416 μs, and 3 hnsecs
 Std array          :2 secs, 222 ms, 256 μs, and 2 hnsecs
 Std array.reserve  :2 secs, 199 ms, 64 μs, and 5 hnsecs
 Array              :349 ms and 250 μs
 Array.reserve      :347 ms, 694 μs, and 1 hnsec
 new T[]()          :37 ms, 989 μs, and 8 hnsecs
 uninitializedArray :39 ms, 333 μs, and 3 hnsecs

 (size 10_000)
OK, same here, except std array gives me only 1.4 s, while the other timings are about the same (0.14 alpha2: ldmd2 -O -inline). Drat, that means testing on many different compilers. Oh well, let's start small: pre-allocating, better algorithms, then I'll do real speed instrumentation.
Aug 15 2014
parent reply "Philippe Sigaud" <philippe.sigaud gmail.com> writes:
Well, I created a wrapper around a std.array.uninitializedArray 
call, to manage the interface I need (queue behavior: pushing at 
the end, popping at the beginning). When hitting the end of the 
current array, it either reuses the current buffer or creates a 
new one, depending on the remaining capacity.

On the 'synthetic' benchmarks, it performs quite reasonably: half 
the time of Array or Appender (twice as fast), 5x faster than a 
standard array, and 3-4x slower than uninitializedArray.

And... it does not change the timings in my code; it even makes 
things slower when pre-allocating too much. Only by pre-allocating 
a few elements do I get back the original timings.

So, I guess I'm suffering from a bad case of premature 
optimization :)

I thought that, having lots of concatenation in my code, that'd 
be a bottleneck. But it appears that pre-allocation does not give 
me any speed-up.

Well, at least it got me thinking, testing LDC a bit more and 
learning things about Array and Appender ;)

Thanks for your help guys, it's now time for the -profile switch 
again...
Aug 15 2014
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 15 August 2014 at 14:40:36 UTC, Philippe Sigaud wrote:
 Well, I created a wrapper around a std.array.uninitializedArray 
 call, to manage the interface I need
Make sure you don't use that if your type has elaborate construction, or assumes a certain initial state (unless you are actually emplacing your objects of course).
 I thought that, having lots of concatenation in my code, that'd 
 be a bottleneck. But it appears that pre-allocation does not 
 give me any speed-up.
If you are using "raw GC arrays", then the "raw" append operation will outweigh the relocation cost on extension. So pre-allocation wouldn't really help in this situation (though the use of Appender *should*).
Aug 15 2014
next sibling parent reply "Philippe Sigaud" <philippe.sigaud gmail.com> writes:
On Friday, 15 August 2014 at 16:48:10 UTC, monarch_dodra wrote:
 On Friday, 15 August 2014 at 14:40:36 UTC, Philippe Sigaud 
 wrote:
 Well, I created a wrapper around a 
 std.array.uninitializedArray call, to manage the interface I 
 need
Make sure you don't use that if your type has elaborate construction, or assumes a certain initial state (unless you are actually emplacing your objects of course).
Hmm, what's elaborate construction? They are classes and have constructors, of course. I assumed that this produced only nulls in the array.
 I thought that, having lots of concatenation in my code, 
 that'd be a bottleneck. But it appears that pre-allocation 
 does not give me any speed-up.
If you are using "raw GC arrays", then the "raw" append operation will, outweigh the relocation cost on extension. So pre-allocation wouldn't really help in this situation (though the use of Appender *should*)
OK.
Aug 15 2014
parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 15 August 2014 at 16:51:20 UTC, Philippe Sigaud wrote:
 On Friday, 15 August 2014 at 16:48:10 UTC, monarch_dodra wrote:
 Make sure you don't use that if your type has elaborate 
 construction, or assumes a certain initial state (unless you 
 are actually emplacing your objects of course).
Hmm, what's elaborate construction? They are classes and have constructors, of course. I assumed that this produced only null's in the array.
Actually, my statement was inaccurate. More specifically: never use anything that wasn't first properly initialized. Note that in some cases, "operator=" is itself elaborate, meaning it will also read data, so that's not a valid method of initialization.

uninitializedArray simply creates an array with unspecified data in it. You *could* just use "new" instead. It's not really any slower, and has the advantage of being certifiably safe.
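
A short illustration of the difference (a sketch):

import std.array : uninitializedArray;

void main()
{
    auto a = uninitializedArray!(int[])(4); // contents are garbage: write before you read
    a[] = 0;                                // now safe to use

    auto b = new int[](4);                  // every element is int.init (0) from the start
}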
Aug 15 2014
prev sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, 15 August 2014 at 16:48:10 UTC, monarch_dodra wrote:
 If you are using "raw GC arrays", then the "raw" append 
 operation will outweigh the relocation cost on extension. So 
 pre-allocation wouldn't really help in this situation (though 
 the use of Appender *should*)
Is that because it's able to grab memory from the GC without actually having to allocate anything? Normally, I would have expected allocations to far outweigh the cost on extension and that preallocating would help a lot. But that would be with malloc or C++'s new rather than the GC, which has already allocated memory to reuse after it collects garbage.

- Jonathan M Davis
Aug 15 2014
parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 15 August 2014 at 21:24:25 UTC, Jonathan M Davis wrote:
 On Friday, 15 August 2014 at 16:48:10 UTC, monarch_dodra wrote:
 If you are using "raw GC arrays", then the "raw" append 
 operation will outweigh the relocation cost on extension. So 
 pre-allocation wouldn't really help in this situation (though 
 the use of Appender *should*)
Is that because it's able to grab memory from the GC without actually having to allocate anything? Normally, I would have expected allocations to far outweigh the cost on extension and that preallocating would help a lot. But that would be with malloc or C++'s new rather than the GC, which has already allocated memory to reuse after it collects garbage. - Jonathan M Davis
It's mostly just because GC-array appending is slow. A single operation is itself not that slow, but if you plan to append 10_000 elements, then the total cost will add up. A lot.
Aug 15 2014
prev sibling parent reply "Messenger" <dont shoot.me> writes:
On Friday, 15 August 2014 at 10:31:59 UTC, Dicebot wrote:
 On Friday, 15 August 2014 at 08:35:41 UTC, Philippe Sigaud via 
 Digitalmars-d-learn wrote:
 I wonder if using plain `Array` instead may result in better 
 performance where immutability is not needed.
Hmm, no: ...
It is very different with a better compiler though:

$ ldmd2 -release -O a.d
$ ./appendertest
Appender.put       :378 ms, 794 μs, and 9 hnsecs
Appender ~=        :378 ms, 416 μs, and 3 hnsecs
Std array          :2 secs, 222 ms, 256 μs, and 2 hnsecs
Std array.reserve  :2 secs, 199 ms, 64 μs, and 5 hnsecs
Array              :349 ms and 250 μs
Array.reserve      :347 ms, 694 μs, and 1 hnsec
new T[]()          :37 ms, 989 μs, and 8 hnsecs
uninitializedArray :39 ms, 333 μs, and 3 hnsecs

(size 10_000)
T[size] beats all of those on dmd head, though it is inarguably a bit limiting.
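
Roughly what that variant looks like as an extra test case in the benchmark above (test6 is a made-up name; size is the module's enum):

void test6()
{
    int[size] arr;           // fixed-size, stack-allocated: no GC, no reallocation
    foreach (n; 0 .. size)
        arr[n] = n;
}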
Aug 15 2014
next sibling parent reply Philippe Sigaud via Digitalmars-d-learn writes:
On Fri, Aug 15, 2014 at 1:57 PM, Messenger via Digitalmars-d-learn
<digitalmars-d-learn puremagic.com> wrote:

 T[size] beats all of those on dmd head, though it is inarguably a
 bit limiting.
I confirm (even with 2.065). With ldc2 it's optimized out of the way, so it gives 0 hnsecs :-) Hmm, what about a sort of linked list of static arrays, that allocates a new one when necessary?
Aug 15 2014
parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 15 August 2014 at 12:08:58 UTC, Philippe Sigaud via 
Digitalmars-d-learn wrote:
 Hmm, what about a sort of linked list of static arrays, that 
 allocates a new one when necessary?
Appender is not a container, and has no freedom on the data it manipulates. It has to be able to accept an array as input, and when it is finished, it needs to be able to return an actual array, so that's arguably out of the question.
Aug 15 2014
prev sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 15 August 2014 at 11:57:30 UTC, Messenger wrote:
 T[size] beats all of those on dmd head, though it is inarguably 
 a bit limiting.
Hey guys, just a bit of background and my own understanding of Appender, having worked on it a fair bit.

First of all, Appender was not designed as a neck-breaking, mind-bending speed object. It is merely a "tool" to offset the "slow" GC-based appending. Indeed, when doing a raw GC append, you first have to give the GC a pointer to the start of your array. The GC will then look up which block that pointer belongs to, then look up the info related to that block, check if appending is possible, and then do the append proper... and then it will do all that all over again on the next append. Appender is simply a tool to "cache" the results of that once, and then do quicker appends.

There are two other things to take into consideration with Appender: for starters, it can append to an *existing* array it is given. Second, you may destroy the Appender object at any time, and the referenced array is still valid: Appender does not *own* its buffer, and as such, is not allowed certain optimizations. Really, it's just designed for convenience and "pretty good speed".

Also, another thing to take into account when benchmarking is that Appender is a reference-semantic object: it has a payload which itself references an array. This creates a double indirection. This usually doesn't have much impact, but with the right optimizations, it can probably explain the x10 performance differences we are seeing in our *synthetic* benchmarks. I have some doubts about the validity of the results in a real application.

So TL;DR: yeah, you can probably do faster. But Appender is convenient, fast enough, and works with the GC. If you *do* need super speeds, look into something a bit more manual: Walter's ScopeBuffer would be a good choice. I also did some work on something called ScopeAppender, but didn't have time to finish it yet.

https://github.com/monarchdodra/phobos/compare/ScopeAppender

It provides better speeds and deterministic management, at the cost of partial private buffer ownership.
Aug 15 2014
prev sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, 14 August 2014 at 21:00:55 UTC, Jonathan M Davis 
wrote:
 On Thursday, 14 August 2014 at 19:47:33 UTC, Brad Anderson 
 wrote:
 On Thursday, 14 August 2014 at 19:10:18 UTC, Jonathan M Davis 
 wrote:
 I've never really tried to benchmark it, but it was my 
 understanding that the idea behind Appender was to use it to 
 create the array when you do that via a lot of appending, and 
 then you use it as a normal array and stop using Appender. It 
 sounds like you're trying to use it as a way to manage 
 reusing the array, and I have no idea how it works for that. 
 But then again, I've never actually benchmarked it for just 
 creating arrays via appending. I'd just assumed that it was 
 faster than just using ~=, because that's what it's 
 supposedly for. But maybe I just completely misunderstood 
 what the point of Appender was.

 - Jonathan M Davis
I too have trouble understanding what Appender does that supposedly makes it faster (at least from the documentation). My old, naive thought was that it was something like a linked list of fixed-size arrays, so that appends didn't have to move existing elements until you were done appending, at which point it would bake it into a regular dynamic array, moving each element only once (an std::deque with a copy into a vector, in C++ terms). Looking at the code, it appears to be nothing like that.

Skimming the code, it appears to be more focused on the much more basic "~= always reallocates" performance problem. It seems it boils down to doing essentially this (someone feel free to correct me) in the form of an output range:

auto a = /* some array */;
auto b = a;
a = a.array();
for(...)
    b.assumeSafeAppend() ~= /* element */;

It was my understanding that that was essentially what it did, but I've never really studied its code, and I don't know if there are any other tricks that it's able to pull. It may be that it really doesn't do anything more than be a wrapper which handles assumeSafeAppend for you correctly. But if that's the case, I wouldn't expect operating on arrays directly to be any faster. Maybe it would be _slightly_ faster, because there are no wrapper functions to inline away, but it wouldn't be much faster; it would just ensure that you used assumeSafeAppend correctly. If it's really as much slower as Philippe is finding, then I have no idea what's going on. Certainly, it merits a bug report and further investigation.
Okay, this makes no sense actually. You call assumeSafeAppend after you _shrink_ an array and then want to append to it, not when you're just appending to it. So, assumeSafeAppend wouldn't help with something like

auto app = appender!string();
// Append a bunch of stuff to app
auto str = app.data;

So, it must be doing something else (though it may be using assumeSafeAppend in other functions). I'll clearly have to look over the actual code to have any clue about what it's actually doing.

Though in reference to your idea of using a linked list of arrays, IIRC, someone was looking at changing it to do something like that at some point, but it would drastically change what Appender's data property would do, so I don't know if it would be a good idea to update Appender that way rather than creating a new type. But I don't recall what became of that proposal.

- Jonathan M Davis
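
For contrast, a sketch of the shrink-then-append case where assumeSafeAppend does apply:

void main()
{
    auto arr = [1, 2, 3, 4];
    arr = arr[0 .. 2];      // shrink: the runtime still considers arr[2 .. 4] in use
    arr.assumeSafeAppend(); // declare the tail abandoned
    arr ~= 99;              // extends in place, overwriting the old arr[2]
    assert(arr == [1, 2, 99]);
}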
Aug 14 2014
prev sibling next sibling parent Joseph Rushton Wakeling via Digitalmars-d-learn writes:
On 14/08/14 19:16, Philippe Sigaud via Digitalmars-d-learn wrote:
 Do people here get good results from Appender? And if yes, how are you using
it?
An example where it worked for me: http://braingam.es/2013/09/betweenness-centrality-in-dgraph/ (You will have to scroll down a bit to get to the point where appenders get introduced.)
Aug 14 2014
prev sibling next sibling parent Joseph Rushton Wakeling via Digitalmars-d-learn writes:
On 14/08/14 23:33, Joseph Rushton Wakeling via Digitalmars-d-learn wrote:
 An example where it worked for me:
 http://braingam.es/2013/09/betweenness-centrality-in-dgraph/
I should add that I don't think I ever explored the case of just using a regular array with assumeSafeAppend. Now that you've raised the question, I think I ought to try it :)
Aug 14 2014
prev sibling next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, 14 August 2014 at 17:16:42 UTC, Philippe Sigaud 
wrote:
 From time to time, I try to speed up some array-heavy code by 
 using std.array.Appender, reserving some capacity and so on.

 It never works. Never. It gives me executables that are maybe 
 30-50% slower than bog-standard array code.

 I don't do anything fancy: I just replace the types, use 
 clear() instead of = null...

 Do people here get good results from Appender? And if yes, how 
 are you using it?
Quick test...

----------------
import std.array;
import std.datetime;
import std.stdio;

enum size = 1000;

void test1()
{
    auto arr = appender!(int[])();
    foreach(n; 0 .. size)
        arr.put(n);
}

void test2()
{
    int[] arr;
    foreach(n; 0 .. size)
        arr ~= n;
}

void test3()
{
    auto arr = new int[](size);
    foreach(n; 0 .. size)
        arr[n] = n;
}

void test4()
{
    auto arr = uninitializedArray!(int[])(size);
    foreach(n; 0 .. size)
        arr[n] = n;
}

void main()
{
    auto result = benchmark!(test1, test2, test3, test4)(10_000);
    writeln(cast(Duration)result[0]);
    writeln(cast(Duration)result[1]);
    writeln(cast(Duration)result[2]);
    writeln(cast(Duration)result[3]);
}
----------------

With size being 1000, I get

178 ms, 229 μs, and 4 hnsecs
321 ms, 272 μs, and 8 hnsecs
27 ms, 138 μs, and 7 hnsecs
24 ms, 970 μs, and 9 hnsecs

With size being 100,000, I get

15 secs, 731 ms, 499 μs, and 1 hnsec
29 secs, 339 ms, 553 μs, and 8 hnsecs
2 secs, 602 ms, 385 μs, and 2 hnsecs
2 secs, 409 ms, 448 μs, and 9 hnsecs

So, it looks like using Appender to create an array of ints is about twice as fast as appending to the array directly, and unsurprisingly, creating the array at the correct size up front and filling in the values is far faster than either, whether the array's elements are default-initialized or not. And the numbers are about the same if it's an array of char rather than an array of int.

Also, curiously, if I add a test which is the same as test2 (so it's just appending to the array) except that it calls reserve on the array with size, the result is almost the same as not reserving. It's a smidgeon faster but probably not enough to matter. So, it looks like reserve doesn't buy you much for some reason. Maybe the extra work for checking whether it needs to reallocate or whatever fancy logic it has to do in ~= dwarfs the extra cost of the reallocations? That certainly seems odd to me, but bizarrely, the evidence does seem to say that reserving doesn't help much. That should probably be looked into.

In any case, from what I can see, if all you're doing is creating an array and then throwing away the Appender, it's faster to use Appender than to directly append to the array. Maybe that changes with structs? I don't know. It would be interesting to know what's different about your code that's causing Appender to be slower, but from what I can see, it's definitely faster to use Appender unless you know the target size of the array up front.

- Jonathan M Davis
Aug 14 2014
parent Philippe Sigaud via Digitalmars-d-learn writes:
 Quick test...
Ah, thanks a lot Jonathan. I kept telling myself I should probably test it on a simple case. OK, the good news is, Appender works in these cases (I mean, that's good news for Phobos). Now, I just have to find out why it's slower in my case :)
 ----------------
 import std.array;
 import std.datetime;
 import std.stdio;

 enum size = 1000;

 void test1()
 {
     auto arr = appender!(int[])();
     foreach(n; 0 .. size)
         arr.put(n);
 }
I used ~= to append to an Appender. The examples use .put, but ~= is documented, so I assumed the operator was just syntactic sugar. I didn't have a look at its implementation.
 void test2()
 {
     int[] arr;
     foreach(n; 0 .. size)
         arr ~= n;
 }

 void test3()
 {
     auto arr = new int[](size);
     foreach(n; 0 .. size)
         arr[n] = n;
 }
This one is simple and interesting. In my case I don't know the length in advance (I'm doing some parsing, and predicting the size of the parse forest based on the input length is somewhat tricky), but I can pre-allocate some, based on a simple heuristic.
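
A sketch of that heuristic pre-allocation (collect, the guess factor, and the per-item yield are all made up):

int[] collect(int[] input)
{
    auto results = new int[](input.length * 2); // heuristic guess at the final size
    size_t used;
    foreach (x; input)
    {
        foreach (y; 0 .. (x & 3) + 1) // stand-in for a hard-to-predict yield per item
        {
            if (used == results.length)
                results.length = results.length * 2 + 1; // guess was low: grow
            results[used++] = y;
        }
    }
    return results[0 .. used]; // trim to what was actually produced
}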
 void test4()
 {
     auto arr = uninitializedArray!(int[])(size);
     foreach(n; 0 .. size)
         arr[n] = n;
 }
Oh, I didn't know this one.
 With size being 1000, I get

 178 ms, 229 μs, and 4 hnsecs
 321 ms, 272 μs, and 8 hnsecs
 27 ms, 138 μs, and 7 hnsecs
 24 ms, 970 μs, and 9 hnsecs
Same here, good. Using ~= n instead of .put(n) gives me results consistently slightly slower for Appender (170 ms -> 190 ms, repeatedly, even going back and forth between the two possibilities). I created a test1prime to test that.
 With size being 100,000, I get

 15 secs, 731 ms, 499 μs, and 1 hnsec
 29 secs, 339 ms, 553 μs, and 8 hnsecs
 2 secs, 602 ms, 385 μs, and 2 hnsecs
 2 secs, 409 ms, 448 μs, and 9 hnsecs
Ditto. OK, that's good. Also, it's still slower when using Appender ~= n instead of Appender.put (18 s instead of 15, 20% slower). So, kids: don't do that.
 So, it looks like using Appender to create an array of ints is about twice
 as fast as appending to the array directly, and unsurprisingly, creating the
 array at the correct size up front and filling in the values is far faster
 than either, whether the array's elements are default-initialized or not.
 And the numbers are about the same if it's an array of char rather than an
 array of int.
Perfect. That also means my go-to method will still be using standard arrays, but with pre-allocation. I just feel stupid writing that, because it's obvious that reserving things should give it better speed.
 Also, curiously, if I add a test which is the same as test2 (so it's just
 appending to the array) except that it calls reserve on the array with size,
 the result is almost the same as not reserving. It's a smidgeon faster but
 probably not enough to matter. So, it looks like reserve doesn't buy you
 much for some reason. Maybe the extra work for checking whether it needs to
 reallocate or whatever fancy logic it has to do in ~= dwarfs the extra cost
 of the reallocations? That certainly seems odd to me, but bizarrely, the
 evidence does seem to say that reserving doesn't help much. That should
 probably be looked into.
Yeah, I get a small boost of 5% compared to not reserving at size 1000, which completely disappears for longer arrays (no difference for size 100_000).
 In any case, from what I can see, if all you're doing is creating an array
 and then throwing away the Appender, it's faster to use Appender than to
 directly append to the array.
Indeed.
 Maybe that changes with structs? I don't know.
I'm using classes, because what I'm pushing into the Appender are graph nodes, and I got fed up with tracing pointers to structs everywhere. Maybe I should change back to structs and see what it does.
 It would be interesting to know what's different about your code that's
 causing Appender to be slower, but from what I can see, it's definitely
 faster to use Appender unless you know the target size of the array up
 front.
Good conclusion. Thanks for your help. My takeaway is that I'll use arrays, but 'new T[](size)' them beforehand. If that makes heavy concatenation 10 times faster, it should have a positive effect (I'm not, of course, expecting anything near a 10x boost in my computation time).
Aug 15 2014
prev sibling next sibling parent Philippe Sigaud via Digitalmars-d-learn writes:
On Thu, Aug 14, 2014 at 11:33 PM, Joseph Rushton Wakeling via
Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:
 On 14/08/14 19:16, Philippe Sigaud via Digitalmars-d-learn wrote:
 Do people here get good results from Appender? And if yes, how are you
 using it?
An example where it worked for me: http://braingam.es/2013/09/betweenness-centrality-in-dgraph/ (You will have to scroll down a bit to get to the point where appenders get introduced.)
I remember reading this and loving it! Any news on this? Do you have new algorithms?
Aug 15 2014
prev sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Thursday, 14 August 2014 at 17:16:42 UTC, Philippe Sigaud 
wrote:
 From time to time, I try to speed up some array-heavy code by 
 using std.array.Appender, reserving some capacity and so on.

 It never works. Never. It gives me executables that are maybe 
 30-50% slower than bog-standard array code.

 I don't do anything fancy: I just replace the types, use 
 clear() instead of = null...

 Do people here get good results from Appender? And if yes, how 
 are you using it?
compiler, version, OS, architecture, flags? Have you looked at the assembly to check that all the Appender method calls are being inlined?
Aug 15 2014
parent Philippe Sigaud via Digitalmars-d-learn writes:
On Fri, Aug 15, 2014 at 10:04 PM, John Colvin via Digitalmars-d-learn
<digitalmars-d-learn puremagic.com> wrote:

 compiler, version, OS, architecture, flags?
Compiler: DMD 2.065 and LDC 0.14
OS: Linux 64 bits (8 cores, but there is no parallelism here)
Flags: -O -release -inline (and -noboundscheck for DMD)
 Have you looked at the assembly to check that all the Appender method calls
 are being inlined?
I do not know how to look at the assembly, nor do I know how to see whether Appender method calls are being inlined. I did spend some time with -profile and gained a nice 10% increase in speed with that, fighting bottlenecks in my code.
Aug 15 2014