digitalmars.D.learn - Appender is ... slow

reply "Philippe Sigaud" <philippe.sigaud gmail.com> writes:
 From time to time, I try to speed up some array-heavy code by 
using std.array.Appender, reserving some capacity and so on.

It never works. Never. It gives me executables that are maybe 
30-50% slower than bog-standard array code.

I don't do anything fancy: I just replace the types, use clear() 
instead of = null...

Do people here get good results from Appender? And if yes, how 
are you using it?
Aug 14 2014
next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 14 August 2014 at 17:16:42 UTC, Philippe Sigaud 
wrote:
 From time to time, I try to speed up some array-heavy code by 
 using std.array.Appender, reserving some capacity and so on.

 It never works. Never. It gives me executables that are maybe 
 30-50% slower than bog-standard array code.

 I don't do anything fancy: I just replace the types, use 
 clear() instead of = null...

 Do people here get good results from Appender? And if yes, how 
 are you using it?
I don't know much about the Phobos Appender implementation details, but the key thing with a reusable buffer is to avoid freeing it. AFAIR Appender.clear frees the allocated memory but `Appender.length = 0` does not, making it possible to just overwrite stuff again and again.

Won't promise you anything though!
Aug 14 2014
next sibling parent reply Philippe Sigaud via Digitalmars-d-learn writes:
 I don't know much about the Phobos Appender implementation details, but the key
 thing with a reusable buffer is to avoid freeing it. AFAIR Appender.clear frees
 the allocated memory but `Appender.length = 0` does not, making it possible
 to just overwrite stuff again and again.
I call .clear() only at the beginning of the computation, to avoid any strange effects with std.datetime.benchmark (using benchmark with memoizing functions can lead to strange results if one does not take care to flush any result between two calls). After that initial call to clear, I just append.

The thing is, it's not the first time it happens. For years, I've tried to use Appender to get faster code, to no avail.

btw, I saw your DConf talk yesterday, nice content! And thanks for talking about Pegged! It might interest you to know that the code I'm trying to use Appender on is a new engine for Pegged, based on GLL parsing, that should be able to produce a parser for any grammar, even ambiguous ones.
Aug 14 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 14 August 2014 at 18:55:55 UTC, Philippe Sigaud via 
Digitalmars-d-learn wrote:
 btw, I saw your DConf talk yesterday, nice content! And thanks
 for talking about Pegged!
 It might interest you to know that the code I'm trying to use
 Appender on is a new engine for Pegged, based on GLL parsing,
 that should be able to produce a parser for any grammar, even
 ambiguous ones.
Thanks! Repeating what I mentioned during the DConf talk - have you ever considered proposing Pegged for Phobos inclusion? It feels like an important bit of infrastructure to me.
Aug 14 2014
parent Philippe Sigaud via Digitalmars-d-learn writes:
 Thanks! Repeating what I mentioned during the DConf talk - have you ever
 considered proposing Pegged for Phobos inclusion? It feels like an important
 bit of infrastructure to me.
At the time, it was considered (rightfully) far too slow and memory-hogging. I think having a generic lexer and a standard D parser in Phobos would already be a great step forward.
Aug 14 2014
prev sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Thursday, 14 August 2014 at 18:31:15 UTC, Dicebot wrote:
 I don't know much about the Phobos Appender implementation details,
 but the key thing with a reusable buffer is to avoid freeing it.
 AFAIR Appender.clear frees the allocated memory but
 `Appender.length = 0` does not, making it possible to just
 overwrite stuff again and again.

 Won't promise you anything though!
Appender has no .length property, and .clear does rewind the write pointer without deallocating memory, thus allowing you to reuse the buffer.
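
A minimal sketch of that reuse pattern, assuming .clear behaves as Vladimir describes (rewinding the write pointer while keeping the buffer):

import std.array : appender;

void main()
{
    auto app = appender!(int[])();
    foreach (run; 0 .. 3)
    {
        app.clear();              // rewind the write pointer; the buffer is kept
        foreach (n; 0 .. 1_000)
            app.put(n);
        auto snapshot = app.data; // only valid until the next clear/append cycle
    }
}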
Aug 15 2014
prev sibling next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, 14 August 2014 at 17:16:42 UTC, Philippe Sigaud 
wrote:
 From time to time, I try to speed up some array-heavy code by 
 using std.array.Appender, reserving some capacity and so on.

 It never works. Never. It gives me executables that are maybe 
 30-50% slower than bog-standard array code.

 I don't do anything fancy: I just replace the types, use 
 clear() instead of = null...

 Do people here get good results from Appender? And if yes, how 
 are you using it?
I've never really tried to benchmark it, but it was my understanding that the idea behind Appender was to use it to create the array when you do that via a lot of appending, and then you use it as a normal array and stop using Appender.

It sounds like you're trying to use it as a way to manage reusing the array, and I have no idea how it works for that. But then again, I've never actually benchmarked it for just creating arrays via appending. I'd just assumed that it was faster than just using ~=, because that's what it's supposedly for. But maybe I just completely misunderstood what the point of Appender was.

- Jonathan M Davis
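
As an illustration of that build-then-discard usage, here is a minimal sketch (buildList and its comma formatting are made up):

import std.array : appender;
import std.conv : to;

string buildList(int[] values)
{
    auto app = appender!string();
    foreach (i, v; values)
    {
        if (i != 0)
            app.put(", ");    // Appender!string accepts both chars and strings
        app.put(v.to!string);
    }
    return app.data; // from here on it's a normal string; the Appender is dropped
}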
Aug 14 2014
next sibling parent reply Philippe Sigaud via Digitalmars-d-learn writes:
 I've never really tried to benchmark it, but it was my understanding that
 the idea behind Appender was to use it to create the array when you do that
 via a lot of appending, and then you use it as a normal array and stop using
 Appender.
That's how I use it, yes.
 It sounds like you're trying to use it as a way to manage reusing
 the array, and I have no idea how it works for that.
There is a misunderstanding there: I'm using clear only to flush the state at the beginning of the computation. The Appender is a class field, used by the class methods to calculate. If I do not clear it at the beginning of the methods, I keep appending new results to old computations, which is not what I want.

But really, calling clear is a minor point: I'm interested in Appender's effect on *one* (long, concatenation-intensive) computation.
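
A sketch of the setup being described, with hypothetical names (Engine and the doubling stand in for the real parser code):

import std.array : Appender;

class Engine
{
    Appender!(int[]) results; // class field, reused across computations

    void compute(int[] input)
    {
        results.clear(); // flush the previous run's state
        foreach (x; input)
            results.put(x * 2); // stand-in for the actual computation
    }
}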
 I've never actually benchmarked it for just creating arrays via
 appending. I'd just assumed that it was faster than just using ~=,
 because that's what it's supposedly for. But maybe I just
 completely misunderstood what the point of Appender was.
I don't know. People here keep telling newcomers to use it, but I'm not convinced by its results. Maybe I'm seeing worse results because my arrays do not have millions of elements, and Appender shines for long arrays?
Aug 14 2014
next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, 14 August 2014 at 19:29:28 UTC, Philippe Sigaud via 
Digitalmars-d-learn wrote:
 It sounds like you're trying to use it as a way to manage reusing
 the array, and I have no idea how it works for that.
There is a misunderstanding there: I'm using clear only to flush the state at the beginning of the computation. The Appender is a class field, used by the class methods to calculate. If I do not clear it at the beginning of the methods, I keep appending new results to old computations, which is not what I want. But really, calling clear is a minor point: I'm interested in Appender's effect on *one* (long, concatenation-intensive) computation.
Then it sounds like you're reusing the Appender. I've never done that. In fact, I would have assumed that that meant you were attempting to fill in the same array again, and I wouldn't have even thought that that was safe, because I would have assumed that Appender used assumeSafeAppend, which would mean that reusing the Appender would be highly unsafe unless you weren't using the array that you got from it anymore.

I always use Appender to construct an array, and then I get rid of the Appender. I don't think that I've ever had a member variable which was an Appender. I only ever use it for local variables or function arguments.
 I've never actually benchmarked it for just creating arrays via
 appending. I'd just assumed that it was faster than just using ~=,
 because that's what it's supposedly for. But maybe I just
 completely misunderstood what the point of Appender was.
I don't know. People here keep telling newcomers to use it, but I'm not convinced by its results. Maybe I'm seeing worse results because my arrays are do not have millions of elements and Appender shines for long arrays?
I have no idea. It was my understanding that it was faster to create an array via appending using Appender than ~=, but I've never benchmarked it or actually looked into how it works. It's quite possible that while it's _supposed_ to be faster, it's actually flawed somehow and is actually slower, and no one has noticed previously, simply assuming that it was faster because it's supposed to be.

- Jonathan M Davis
Aug 14 2014
prev sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 14 August 2014 at 19:29:28 UTC, Philippe Sigaud via 
Digitalmars-d-learn wrote:
 There is a misunderstanding there: I'm using clear only to flush
 the state at the beginning of the computation. The Appender is a
 class field, used by the class methods to calculate. If I do not
 clear it at the beginning of the methods, I keep appending new
 results to old computations, which is not what I want. But really,
 calling clear is a minor point: I'm interested in Appender's
 effect on *one* (long,
This is exactly what I propose to change - set `length` to 0 instead of calling `clear`. That way you will keep using the same memory chunk, simply abandoning its old state at the beginning of each computation.
Aug 14 2014
parent reply Philippe Sigaud via Digitalmars-d-learn writes:
 There is a misunderstanding there: I'm using clear only to flush the
 state at the beginning of the computation. The Appender is a class
 field, used by the class methods to calculate. If I do not clear it at
 the beginning of the methods, I keep appending new results to old
 computations, which is not what I want. But really, calling clear is a
 minor point: I'm interested in Appender's effect on *one* (long,
This is exactly what I propose to change - set `length` to 0 instead of calling `clear`. That way you will keep using the same memory chunk, simply abandoning its old state at the beginning of each computation.
You mean by using the shrinkTo method? (Appender does not have a length property, which is a bit of a bother, btw.) I just did, and it does not change anything. Too bad.

Heck, my code is simpler to read and use *and* faster by using a bog-standard D array. I'll keep my array for now :)
Aug 14 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 14 August 2014 at 20:42:08 UTC, Philippe Sigaud via 
Digitalmars-d-learn wrote:
 You mean by using the shrinkTo method? (Appender does not have a
 length, that's a bit of a bother btw).
 I just did, it does not change anything. Too bad.

 Heck, my code is simpler to read and use *and* faster by using a
 bog-standard D array. I'll keep my array for now :)
Oh crap, I had std.container.Array in mind, which does expose `length`. And Appender seems to indeed clear / shrink data in a horrible way:

https://github.com/D-Programming-Language/phobos/blob/master/std/array.d#L2597
https://github.com/D-Programming-Language/phobos/blob/master/std/array.d#L2611

Wow, this thing is real garbage!
Aug 14 2014
parent "Dicebot" <public dicebot.lv> writes:
On Thursday, 14 August 2014 at 20:50:37 UTC, Dicebot wrote:
 Oh crap, I had std.container.Array in mind, which does expose
 `length`. And Appender seems to indeed clear / shrink data in a
 horrible way:

 https://github.com/D-Programming-Language/phobos/blob/master/std/array.d#L2597
 https://github.com/D-Programming-Language/phobos/blob/master/std/array.d#L2611

 Wow, this thing is real garbage!
OK, looks like Appender is written to be fully GC-compatible and usable with immutable data, which kind of explains such an implementation. But that makes it inherently slow compared to plain `Array`, which owns its memory and gets it via malloc.
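
For comparison, a minimal sketch of the malloc-backed std.container.Array usage being contrasted here:

import std.container : Array;

void main()
{
    Array!int arr;          // owns its memory; allocated via malloc, not the GC
    arr.reserve(1_000);
    foreach (n; 0 .. 1_000)
        arr ~= n;
}   // arr's storage is released deterministically when it goes out of scope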
Aug 14 2014
prev sibling parent reply "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 14 August 2014 at 19:10:18 UTC, Jonathan M Davis 
wrote:
 I've never really tried to benchmark it, but it was my 
 understanding that the idea behind Appender was to use it to 
 create the array when you do that via a lot of appending, and 
 then you use it as a normal array and stop using Appender. It 
 sounds like you're trying to use it as a way to manage reusing 
 the array, and I have no idea how it works for that. But then 
 again, I've never actually benchmarked it for just creating 
 arrays via appending. I'd just assumed that it was faster than 
 just using ~=, because that's what it's supposedly for. But 
 maybe I just completely misunderstood what the point of 
 Appender was.

 - Jonathan M Davis
I too have trouble understanding what Appender does that supposedly makes it faster (at least from the documentation).

My old, naive thought was that it was something like a linked list of fixed-size arrays, so that appends didn't have to move existing elements until you were done appending, at which point it would bake it into a regular dynamic array, moving each element only once (an std::deque with a copy into a vector, in C++ terms). Looking at the code, it appears to be nothing like that.

Skimming the code, it appears to be more focused on the much more basic "~= always reallocates" performance problem. It seems it boils down to doing essentially this (someone feel free to correct me) in the form of an output range:

auto a = /* some array */;
auto b = a;
a = a.array();
for(...)
    b.assumeSafeAppend() ~= /* element */;

(assumeSafeAppend's documentation doesn't say whether or not it'll reallocate when capacity is exhausted; I assume it does.)
Aug 14 2014
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, 14 August 2014 at 19:47:33 UTC, Brad Anderson wrote:
 On Thursday, 14 August 2014 at 19:10:18 UTC, Jonathan M Davis 
 wrote:
 I've never really tried to benchmark it, but it was my 
 understanding that the idea behind Appender was to use it to 
 create the array when you do that via a lot of appending, and 
 then you use it as a normal array and stop using Appender. It 
 sounds like you're trying to use it as a way to manage reusing 
 the array, and I have no idea how it works for that. But then 
 again, I've never actually benchmarked it for just creating 
 arrays via appending. I'd just assumed that it was faster than 
 just using ~=, because that's what it's supposedly for. But 
 maybe I just completely misunderstood what the point of 
 Appender was.

 - Jonathan M Davis
I too have trouble understanding what Appender does that supposedly makes it faster (at least from the documentation). My old, naive thought was that it was something like a linked list of fixed-size arrays, so that appends didn't have to move existing elements until you were done appending, at which point it would bake it into a regular dynamic array, moving each element only once (an std::deque with a copy into a vector, in C++ terms). Looking at the code, it appears to be nothing like that.

Skimming the code, it appears to be more focused on the much more basic "~= always reallocates" performance problem. It seems it boils down to doing essentially this (someone feel free to correct me) in the form of an output range:

auto a = /* some array */;
auto b = a;
a = a.array();
for(...)
    b.assumeSafeAppend() ~= /* element */;
It was my understanding that that was essentially what it did, but I've never really studied its code, and I don't know if there are any other tricks that it's able to pull. It may be that it really doesn't do anything more than be a wrapper which handles assumeSafeAppend for you correctly. But if that's the case, I wouldn't expect operating on arrays directly to be any faster. Maybe it would be _slightly_ faster, because there are no wrapper functions to inline away, but it wouldn't be much faster; it would just ensure that you used assumeSafeAppend correctly. If it's really as much slower as Philippe is finding, then I have no idea what's going on. Certainly, it merits a bug report and further investigation.
 (assumeSafeAppend's documentation doesn't say whether or not 
 it'll reallocate when capacity is exhausted, I assume it does).
All assumeSafeAppend does is tell the runtime that this particular array is the array farthest into the memory block (i.e. that any of the slices which referred to farther in the memory block don't exist anymore). So, the value in the runtime that keeps track of the farthest point into the memory block which has been referred to by an array is then set to the end of the array that you passed to assumeSafeAppend. And because it's now the last array in that block, it's safe to append to it without reallocating.

But the appending then works the same way that it always does, and it'll reallocate when there's no more capacity. The whole idea is just to make it so that the runtime doesn't think that the memory after that array is unavailable for that array to expand into.

- Jonathan M Davis
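
A small example of that mechanism, assuming nothing else still references the tail of the block:

void main()
{
    auto arr = new int[](10);
    auto slice = arr[0 .. 5];
    assert(slice.capacity == 0); // the runtime won't let slice grow over arr[5 .. $]
    slice.assumeSafeAppend();    // promise: nothing uses arr[5 .. $] anymore
    assert(slice.capacity > 0);  // slice is now the last array in the block
    slice ~= 42;                 // appends in place, overwriting arr[5]
}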
Aug 14 2014
next sibling parent reply "safety0ff" <safety0ff.dev gmail.com> writes:
IIRC it manages the capacity information manually instead of 
calling the runtime which reduces appending overhead.
Aug 14 2014
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, 14 August 2014 at 21:11:51 UTC, safety0ff wrote:
 IIRC it manages the capacity information manually instead of 
 calling the runtime which reduces appending overhead.
That would make some sense, though it must be completely avoiding ~= then and probably is even GC-mallocing the array itself. Regardless, I clearly need to study the code if I want to know what it's actually doing.

- Jonathan M Davis
Aug 14 2014
next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, 14 August 2014 at 21:34:04 UTC, Jonathan M Davis 
wrote:
 On Thursday, 14 August 2014 at 21:11:51 UTC, safety0ff wrote:
 IIRC it manages the capacity information manually instead of 
 calling the runtime which reduces appending overhead.
That would make some sense, though it must be completely avoiding ~= then and probably is even GC-mallocing the array itself. Regardless, I clearly need to study the code if I want to know what it's actually doing.
It looks like what it does is essentially to set the array's length to the capacity that the GC gave it and then manage the capacity itself (so, basically what you were suggesting), and it essentially avoids the runtime overhead of ~= by reimplementing ~=. Whether it does it in a more efficient manner is an open question, and it also raises the question of why it would be cheaper to do it this way rather than in the GC. That's not at all obvious to me at the moment, especially because the code for ensureAddable and put in Appender are both fairly complicated.

So, I really have no idea how Appender fares in comparison to just using ~=, and I have to wonder why something similar can't be done in the runtime itself if Appender actually is faster. I'd have to spend a lot more time looking into that to figure it out.

- Jonathan M Davis
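
A toy sketch of that capacity-caching idea (hypothetical code, not the actual Phobos implementation):

struct MiniAppender(T)
{
    private T[] buf;     // the whole allocated block; buf.length acts as the capacity
    private size_t used; // number of elements actually written

    void put(T value)
    {
        if (used == buf.length)                 // only here do we go back to the runtime
            buf.length = (buf.length + 16) * 2;
        buf[used++] = value;                    // common case: one bounds check, one store
    }

    T[] data() { return buf[0 .. used]; }
}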
Aug 14 2014
prev sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 14 August 2014 at 21:34:04 UTC, Jonathan M Davis 
wrote:
 On Thursday, 14 August 2014 at 21:11:51 UTC, safety0ff wrote:
 IIRC it manages the capacity information manually instead of 
 calling the runtime which reduces appending overhead.
That would make some sense, though it must be completely avoiding ~= then and probably is even GC-mallocing the array itself. Regardless, I clearly need to study the code if I want to know what it's actually doing. - Jonathan M Davis
I wonder if using plain `Array` instead may result in better performance where immutability is not needed.
Aug 14 2014
parent reply Philippe Sigaud via Digitalmars-d-learn writes:
 I wonder if using plain `Array` instead may result in better performance
 where immutability is not needed.
Hmm, no:

module appendertest;

import std.array;
import std.datetime;
import std.stdio;
import std.container;

enum size = 1_000;

void test1()
{
    auto arr = appender!(int[])();
    foreach(n; 0 .. size)
        arr.put(n);
}

void test1prime()
{
    auto arr = appender!(int[])();
    foreach(n; 0 .. size)
        arr ~= n;
}

void test2()
{
    int[] arr;
    foreach(n; 0 .. size)
        arr ~= n;
}

void test2prime()
{
    int[] arr;
    arr.reserve(size);
    foreach(n; 0 .. size)
        arr ~= n;
}

void test3()
{
    Array!int arr;
    foreach(n; 0 .. size)
        arr ~= n;
}

void test3prime()
{
    Array!int arr;
    arr.reserve(size);
    foreach(n; 0 .. size)
        arr ~= n;
}

void test4()
{
    auto arr = new int[](size);
    foreach(n; 0 .. size)
        arr[n] = n;
}

void test5()
{
    auto arr = uninitializedArray!(int[])(size);
    foreach(n; 0 .. size)
        arr[n] = n;
}

void main()
{
    auto result = benchmark!(test1, test1prime,
                             test2, test2prime,
                             test3, test3prime,
                             test4, test5)(10_000);
    writeln("Appender.put       :", cast(Duration)result[0]);
    writeln("Appender ~=        :", cast(Duration)result[1]);
    writeln("Std array          :", cast(Duration)result[2]);
    writeln("Std array.reserve  :", cast(Duration)result[3]);
    writeln("Array              :", cast(Duration)result[4]);
    writeln("Array.reserve      :", cast(Duration)result[5]);
    writeln("new T[]()          :", cast(Duration)result[6]);
    writeln("uninitializedArray :", cast(Duration)result[7]);
}

Times:

Appender.put       :157 ms, 602 μs, and 3 hnsecs
Appender ~=        :182 ms, 807 μs, and 1 hnsec
Std array          :256 ms, 210 μs, and 7 hnsecs
Std array.reserve  :244 ms, 770 μs, and 4 hnsecs
Array              :336 ms, 207 μs, and 3 hnsecs
Array.reserve      :321 ms, 500 μs, and 6 hnsecs
new T[]()          :28 ms, 496 μs, and 6 hnsecs
uninitializedArray :26 ms and 620 μs
Aug 15 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Friday, 15 August 2014 at 08:35:41 UTC, Philippe Sigaud via 
Digitalmars-d-learn wrote:
 I wonder if using plain `Array` instead may result in better
 performance where immutability is not needed.
Hmm, no: ...
It is very different with a better compiler though:

$ ldmd2 -release -O a.d
$ ./appendertest
Appender.put       :378 ms, 794 μs, and 9 hnsecs
Appender ~=        :378 ms, 416 μs, and 3 hnsecs
Std array          :2 secs, 222 ms, 256 μs, and 2 hnsecs
Std array.reserve  :2 secs, 199 ms, 64 μs, and 5 hnsecs
Array              :349 ms and 250 μs
Array.reserve      :347 ms, 694 μs, and 1 hnsec
new T[]()          :37 ms, 989 μs, and 8 hnsecs
uninitializedArray :39 ms, 333 μs, and 3 hnsecs

(size 10_000)

I am still somewhat disappointed though, because I'd expect static Array to get close in performance to a pre-allocated array of exact size over many iterations - which doesn't happen. Need to investigate.
Aug 15 2014
next sibling parent reply Philippe Sigaud via Digitalmars-d-learn writes:
 It is very different with better compiler though :

 $ ldmd2 -release -O a.d
 $ ./appendertest
 Appender.put       :378 ms, 794 μs, and 9 hnsecs
 Appender ~=        :378 ms, 416 μs, and 3 hnsecs
 Std array          :2 secs, 222 ms, 256 μs, and 2 hnsecs
 Std array.reserve  :2 secs, 199 ms, 64 μs, and 5 hnsecs
 Array              :349 ms and 250 μs
 Array.reserve      :347 ms, 694 μs, and 1 hnsec
 new T[]()          :37 ms, 989 μs, and 8 hnsecs
 uninitializedArray :39 ms, 333 μs, and 3 hnsecs

 (size 10_000)
OK, same here, except std array gives me only 1.4 s, while the other timings are about the same (0.14 alpha2: ldmd2 -O -inline). Drat, that means testing on many different compilers. Oh well, let's start small: pre-allocating, better algorithms, then I'll do real speed instrumentation.
Aug 15 2014
parent reply "Philippe Sigaud" <philippe.sigaud gmail.com> writes:
Well, I created a wrapper around a std.array.uninitializedArray 
call, to manage the interface I need (queue behavior: pushing at 
the end, popping at the beginning). When hitting the end of the 
current array, it either reuses the current buffer or creates a 
new one, depending on the remaining capacity.

On the 'synthetic' benchmarks, it performs quite reasonably: half 
the time of Array or Appender (twice as fast), 5x faster than a 
standard array, and 3-4x slower than uninitializedArray.

And... it does not change the timings in my code; it even makes 
things slower when pre-allocating too much. Only by pre-allocating 
a few elements do I get back the original timings.

So, I guess I'm suffering from a bad case of premature 
optimization :)

I thought that, having lots of concatenation in my code, that'd 
be a bottleneck. But it appears that pre-allocation does not give 
me any speed-up.

Well, at least it got me thinking, testing LDC a bit more and 
learning things about Array and Appender ;)

Thanks for your help guys, it's now time for the -profile switch 
again...
Aug 15 2014
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 15 August 2014 at 14:40:36 UTC, Philippe Sigaud wrote:
 Well, I created a wrapper around a std.array.uninitializedArray 
 call, to manage the interface I need
Make sure you don't use that if your type has elaborate construction, or assumes a certain initial state (unless you are actually emplacing your objects of course).
 I thought that, having lots of concatenation in my code, that'd 
 be a bottleneck. But it appears that pre-allocation does not 
 give me any speed-up.
If you are using "raw GC arrays", then the "raw" append operation will outweigh the relocation cost on extension. So pre-allocation wouldn't really help in this situation (though the use of Appender *should*).
Aug 15 2014
next sibling parent reply "Philippe Sigaud" <philippe.sigaud gmail.com> writes:
On Friday, 15 August 2014 at 16:48:10 UTC, monarch_dodra wrote:
 On Friday, 15 August 2014 at 14:40:36 UTC, Philippe Sigaud 
 wrote:
 Well, I created a wrapper around a 
 std.array.uninitializedArray call, to manage the interface I 
 need
Make sure you don't use that if your type has elaborate construction, or assumes a certain initial state (unless you are actually emplacing your objects of course).
Hmm, what's elaborate construction? They are classes and have constructors, of course. I assumed that this produced only nulls in the array.
 I thought that, having lots of concatenation in my code, 
 that'd be a bottleneck. But it appears that pre-allocation 
 does not give me any speed-up.
If you are using "raw GC arrays", then the "raw" append operation will, outweigh the relocation cost on extension. So pre-allocation wouldn't really help in this situation (though the use of Appender *should*)
OK.
Aug 15 2014
parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 15 August 2014 at 16:51:20 UTC, Philippe Sigaud wrote:
 On Friday, 15 August 2014 at 16:48:10 UTC, monarch_dodra wrote:
 Make sure you don't use that if your type has elaborate 
 construction, or assumes a certain initial state (unless you 
 are actually emplacing your objects of course).
Hmm, what's elaborate construction? They are classes and have constructors, of course. I assumed that this produced only null's in the array.
Actually, my statement was inaccurate. More specifically: never use anything that wasn't first properly initialized. Note that in some cases, "operator=" is itself elaborate, meaning it will also read data, so that's not a valid method of initialization.

uninitializedArray simply creates an array with unspecified data in it. You *could* just use "new" instead. It's not really any slower, and has the advantage of being certifiably safe.
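
A short illustration of the difference (a sketch):

import std.array : uninitializedArray;

void main()
{
    auto a = uninitializedArray!(int[])(4); // contents are garbage: write before you read
    a[] = 0;                                // now safe to use

    auto b = new int[](4);                  // every element is int.init (0) from the start
}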
Aug 15 2014
prev sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, 15 August 2014 at 16:48:10 UTC, monarch_dodra wrote:
 If you are using "raw GC arrays", then the "raw" append 
 operation will outweigh the relocation cost on extension. So 
 pre-allocation wouldn't really help in this situation (though 
 the use of Appender *should*)
Is that because it's able to grab memory from the GC without actually having to allocate anything? Normally, I would have expected allocations to far outweigh the cost on extension and that preallocating would help a lot. But that would be with malloc or C++'s new rather than the GC, which has already allocated memory to reuse after it collects garbage.

- Jonathan M Davis
Aug 15 2014
parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 15 August 2014 at 21:24:25 UTC, Jonathan M Davis wrote:
 On Friday, 15 August 2014 at 16:48:10 UTC, monarch_dodra wrote:
 If you are using "raw GC arrays", then the "raw" append 
 operation will outweigh the relocation cost on extension. So 
 pre-allocation wouldn't really help in this situation (though 
 the use of Appender *should*)
Is that because it's able to grab memory from the GC without actually having to allocate anything? Normally, I would have expected allocations to far outweigh the cost on extension and that preallocating would help a lot. But that would be with malloc or C++'s new rather than the GC, which has already allocated memory to reuse after it collects garbage. - Jonathan M Davis
It's mostly just because GC-array appending is slow. A single operation is itself not that slow, but if you plan to append 10_000 elements, then the total cost will add up. A lot.
Aug 15 2014
prev sibling parent reply "Messenger" <dont shoot.me> writes:
On Friday, 15 August 2014 at 10:31:59 UTC, Dicebot wrote:
 On Friday, 15 August 2014 at 08:35:41 UTC, Philippe Sigaud via 
 Digitalmars-d-learn wrote:
 I wonder if using plain `Array` instead may result in better 
 performance where immutability is not needed.
Hmm, no: ...
It is very different with a better compiler though:

$ ldmd2 -release -O a.d
$ ./appendertest
Appender.put       :378 ms, 794 μs, and 9 hnsecs
Appender ~=        :378 ms, 416 μs, and 3 hnsecs
Std array          :2 secs, 222 ms, 256 μs, and 2 hnsecs
Std array.reserve  :2 secs, 199 ms, 64 μs, and 5 hnsecs
Array              :349 ms and 250 μs
Array.reserve      :347 ms, 694 μs, and 1 hnsec
new T[]()          :37 ms, 989 μs, and 8 hnsecs
uninitializedArray :39 ms, 333 μs, and 3 hnsecs

(size 10_000)
T[size] beats all of those on dmd head, though it is inarguably a bit limiting.
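
Roughly what that variant looks like as an extra test case in the benchmark above (test6 is a made-up name; size is the module's enum):

void test6()
{
    int[size] arr;           // fixed-size, stack-allocated: no GC, no reallocation
    foreach (n; 0 .. size)
        arr[n] = n;
}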
Aug 15 2014
next sibling parent reply Philippe Sigaud via Digitalmars-d-learn writes:
On Fri, Aug 15, 2014 at 1:57 PM, Messenger via Digitalmars-d-learn
<digitalmars-d-learn puremagic.com> wrote:

 T[size] beats all of those on dmd head, though it is inarguably a
 bit limiting.
I confirm (even with 2.065). With ldc2 it's optimized out of the way, so it gives 0 hnsecs :-) Hmm, what about a sort of linked list of static arrays, that allocates a new one when necessary?
Aug 15 2014
parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 15 August 2014 at 12:08:58 UTC, Philippe Sigaud via 
Digitalmars-d-learn wrote:
 Hmm, what about a sort of linked list of static arrays, that 
 allocates a new one when necessary?
Appender is not a container, and has no freedom on the data it manipulates. It has to be able to accept an array as input, and when it is finished, it needs to be able to return an actual array, so that's arguably out of the question.
Aug 15 2014
prev sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 15 August 2014 at 11:57:30 UTC, Messenger wrote:
 T[size] beats all of those on dmd head, though it is inarguably 
 a bit limiting.
Hey guys, just a bit of background and my own understanding of Appender, having worked on it a fair bit.

First of all, Appender was not designed as a neck-breaking, mind-bending speed object. It is merely a "tool" to offset the "slow" GC-based appending. Indeed, when doing a raw GC append, you first have to give the GC a pointer to the start of your array. The GC will then look up which block that pointer belongs to, then look up the info related to that block, check if appending is possible, and then do the append proper... and then it will do all that all over again on the next append. Appender is simply a tool to "cache" the results of that once, and then do quicker appends.

There are two other things to take into consideration with Appender: for starters, it can append to an *existing* array it is given. Second, you may destroy the Appender object at any time, and the referenced array is still valid: Appender does not *own* its buffer, and as such, is not allowed certain optimizations. Really, it's just designed for convenience and "pretty good speed".

Also, another thing to take into account when benchmarking is that Appender is a reference-semantic object: it has a payload which itself references an array. This creates a double indirection. This usually doesn't have much impact, but with the right optimizations, it can probably explain the x10 performance differences we are seeing in our *synthetic* benchmarks. I have some doubts about the validity of the results in a real application.

So TL;DR: yeah, you can probably do faster. But Appender is convenient, fast enough, and works with the GC. If you *do* need super speeds, look into something a bit more manual: Walter's ScopeBuffer would be a good choice. I also did some work on something called ScopeAppender, but didn't have time to finish it yet.

https://github.com/monarchdodra/phobos/compare/ScopeAppender

It provides better speeds and deterministic management, at the cost of partial private buffer ownership.
Aug 15 2014
prev sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, 14 August 2014 at 21:00:55 UTC, Jonathan M Davis 
wrote:
 On Thursday, 14 August 2014 at 19:47:33 UTC, Brad Anderson 
 wrote:
 On Thursday, 14 August 2014 at 19:10:18 UTC, Jonathan M Davis 
 wrote:
 I've never really tried to benchmark it, but it was my 
 understanding that the idea behind Appender was to use it to 
 create the array when you do that via a lot of appending, and 
 then you use it as a normal array and stop using Appender. It 
 sounds like you're trying to use it as a way to manage 
 reusing the array, and I have no idea how it works for that. 
 But then again, I've never actually benchmarked it for just 
 creating arrays via appending. I'd just assumed that it was 
 faster than just using ~=, because that's what it's 
 supposedly for. But maybe I just completely misunderstood 
 what the point of Appender was.

 - Jonathan M Davis
I too have trouble understanding what Appender does that supposedly makes it faster (at least from the documentation). My old, naive thought was that it was something like a linked list of fixed-size arrays, so that appends didn't have to move existing elements until you were done appending, at which point it would bake it into a regular dynamic array, moving each element only once (an std::deque with a copy into a vector, in C++ terms). Looking at the code, it appears to be nothing like that.

Skimming the code, it appears to be more focused on the much more basic "~= always reallocates" performance problem. It seems it boils down to doing essentially this (someone feel free to correct me) in the form of an output range:

auto a = /* some array */;
auto b = a;
a = a.array();
for(...)
    b.assumeSafeAppend() ~= /* element */;

It was my understanding that that was essentially what it did, but I've never really studied its code, and I don't know if there are any other tricks that it's able to pull. It may be that it really doesn't do anything more than be a wrapper which handles assumeSafeAppend for you correctly. But if that's the case, I wouldn't expect operating on arrays directly to be any faster. Maybe it would be _slightly_ faster, because there are no wrapper functions to inline away, but it wouldn't be much faster; it would just ensure that you used assumeSafeAppend correctly. If it's really as much slower as Philippe is finding, then I have no idea what's going on. Certainly, it merits a bug report and further investigation.
Okay, this makes no sense actually. You call assumeSafeAppend after you _shrink_ an array and then want to append to it, not when you're just appending to it. So, assumeSafeAppend wouldn't help with something like

auto app = appender!string();
// Append a bunch of stuff to app
auto str = app.data;

So, it must be doing something else (though it may be using assumeSafeAppend in other functions). I'll clearly have to look over the actual code to have any clue about what it's actually doing.

Though in reference to your idea of using a linked list of arrays, IIRC, someone was looking at changing it to do something like that at some point, but it would drastically change what Appender's data property would do, so I don't know if it would be a good idea to update Appender that way rather than creating a new type. But I don't recall what became of that proposal.

- Jonathan M Davis
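
For contrast, a sketch of the shrink-then-append case where assumeSafeAppend does apply:

void main()
{
    auto arr = [1, 2, 3, 4];
    arr = arr[0 .. 2];      // shrink: the runtime still considers arr[2 .. 4] in use
    arr.assumeSafeAppend(); // declare the tail abandoned
    arr ~= 99;              // extends in place, overwriting the old arr[2]
    assert(arr == [1, 2, 99]);
}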
Aug 14 2014
prev sibling next sibling parent Joseph Rushton Wakeling via Digitalmars-d-learn writes:
On 14/08/14 19:16, Philippe Sigaud via Digitalmars-d-learn wrote:
 Do people here get good results from Appender? And if yes, how are you using
it?
An example where it worked for me: http://braingam.es/2013/09/betweenness-centrality-in-dgraph/ (You will have to scroll down a bit to get to the point where appenders get introduced.)
Aug 14 2014
prev sibling next sibling parent Joseph Rushton Wakeling via Digitalmars-d-learn writes:
On 14/08/14 23:33, Joseph Rushton Wakeling via Digitalmars-d-learn wrote:
 An example where it worked for me:
 http://braingam.es/2013/09/betweenness-centrality-in-dgraph/
I should add that I don't think I ever explored the case of just using a regular array with assumeSafeAppend. Now that you've raised the question, I think I ought to try it :)
Aug 14 2014
prev sibling next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, 14 August 2014 at 17:16:42 UTC, Philippe Sigaud 
wrote:
 From time to time, I try to speed up some array-heavy code by 
 using std.array.Appender, reserving some capacity and so on.

 It never works. Never. It gives me executables that are maybe 
 30-50% slower than bog-standard array code.

 I don't do anything fancy: I just replace the types, use 
 clear() instead of = null...

 Do people here get good results from Appender? And if yes, how 
 are you using it?
Quick test...

----------------
import std.array;
import std.datetime;
import std.stdio;

enum size = 1000;

void test1()
{
    auto arr = appender!(int[])();
    foreach(n; 0 .. size)
        arr.put(n);
}

void test2()
{
    int[] arr;
    foreach(n; 0 .. size)
        arr ~= n;
}

void test3()
{
    auto arr = new int[](size);
    foreach(n; 0 .. size)
        arr[n] = n;
}

void test4()
{
    auto arr = uninitializedArray!(int[])(size);
    foreach(n; 0 .. size)
        arr[n] = n;
}

void main()
{
    auto result = benchmark!(test1, test2, test3, test4)(10_000);
    writeln(cast(Duration)result[0]);
    writeln(cast(Duration)result[1]);
    writeln(cast(Duration)result[2]);
    writeln(cast(Duration)result[3]);
}
----------------

With size being 1000, I get

178 ms, 229 μs, and 4 hnsecs
321 ms, 272 μs, and 8 hnsecs
27 ms, 138 μs, and 7 hnsecs
24 ms, 970 μs, and 9 hnsecs

With size being 100,000, I get

15 secs, 731 ms, 499 μs, and 1 hnsec
29 secs, 339 ms, 553 μs, and 8 hnsecs
2 secs, 602 ms, 385 μs, and 2 hnsecs
2 secs, 409 ms, 448 μs, and 9 hnsecs

So, it looks like using Appender to create an array of ints is about twice as fast as appending to the array directly, and unsurprisingly, creating the array at the correct size up front and filling in the values is far faster than either, whether the array's elements are default-initialized or not. And the numbers are about the same if it's an array of char rather than an array of int.

Also, curiously, if I add a test which is the same as test2 (so it's just appending to the array) except that it calls reserve on the array with size, the result is almost the same as not reserving. It's a smidgeon faster but probably not enough to matter. So, it looks like reserve doesn't buy you much for some reason. Maybe the extra work for checking whether it needs to reallocate or whatever fancy logic it has to do in ~= dwarfs the extra cost of the reallocations? That certainly seems odd to me, but bizarrely, the evidence does seem to say that reserving doesn't help much. That should probably be looked into.

In any case, from what I can see, if all you're doing is creating an array and then throwing away the Appender, it's faster to use Appender than to directly append to the array. Maybe that changes with structs? I don't know. It would be interesting to know what's different about your code that's causing Appender to be slower, but from what I can see, it's definitely faster to use Appender unless you know the target size of the array up front.

- Jonathan M Davis
Aug 14 2014
parent Philippe Sigaud via Digitalmars-d-learn writes:
 Quick test...
Ah, thanks a lot Jonathan. I kept telling myself I should probably test it on a simple case. OK, the good news is, Appender works in these cases (I mean, that's good news for Phobos). Now, I just have to find out why it's slower in my case :)
 ----------------
 import std.array;
 import std.datetime;
 import std.stdio;

 enum size = 1000;

 void test1()
 {
     auto arr = appender!(int[])();
     foreach(n; 0 .. size)
         arr.put(n);
 }
I used ~= to append to an Appender. The examples use .put, but ~= is documented, so I assumed the operator was just syntactic sugar. I didn't have a look at its implementation.
 void test2()
 {
     int[] arr;
     foreach(n; 0 .. size)
         arr ~= n;
 }

 void test3()
 {
     auto arr = new int[](size);
     foreach(n; 0 .. size)
         arr[n] = n;
 }
This one is simple and interesting. In my case I don't know the length in advance (I'm doing some parsing, and predicting the size of the parse forest based on the input length is somewhat tricky), but I can pre-allocate some, based on a simple heuristic.
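
A sketch of that heuristic pre-allocation (collect, the guess factor, and the per-item yield are all made up):

int[] collect(int[] input)
{
    auto results = new int[](input.length * 2); // heuristic guess at the final size
    size_t used;
    foreach (x; input)
    {
        foreach (y; 0 .. (x & 3) + 1) // stand-in for a hard-to-predict yield per item
        {
            if (used == results.length)
                results.length = results.length * 2 + 1; // guess was low: grow
            results[used++] = y;
        }
    }
    return results[0 .. used]; // trim to what was actually produced
}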
 void test4()
 {
     auto arr = uninitializedArray!(int[])(size);
     foreach(n; 0 .. size)
         arr[n] = n;
 }
Oh, I didn't know this one.
 With size being 1000, I get

 178 ms, 229 μs, and 4 hnsecs
 321 ms, 272 μs, and 8 hnsecs
 27 ms, 138 μs, and 7 hnsecs
 24 ms, 970 μs, and 9 hnsecs
Same here, good. Using ~= n instead of .put(n) gives me results consistently slightly slower for Appender (170 ms -> 190 ms, repeatedly, even going back and forth between the two possibilities). I created a test1prime to test that.
 With size being 100,000, I get

 15 secs, 731 ms, 499 μs, and 1 hnsec
 29 secs, 339 ms, 553 μs, and 8 hnsecs
 2 secs, 602 ms, 385 μs, and 2 hnsecs
 2 secs, 409 ms, 448 μs, and 9 hnsecs
Ditto. OK, that's good. Also, it's still slower when using Appender ~= n instead of Appender.put (18 s instead of 15, 20% slower). So, kids: don't do that.
 So, it looks like using Appender to create an array of ints is about twice
 as fast as appending to the array directly, and unsurprisingly, creating the
 array at the correct size up front and filling in the values is far faster
 than either, whether the array's elements are default-initialized or not.
 And the numbers are about the same if it's an array of char rather than an
 array of int.
Perfect. That also means my go-to method will still be using standard arrays, but with pre-allocation. I just feel stupid writing that, because it's obvious that reserving things should give it better speed.
 Also, curiously, if I add a test which is the same as test2 (so it's just
 appending to the array) except that it calls reserve on the array with size,
 the result is almost the same as not reserving. It's a smidgeon faster but
 probably not enough to matter. So, it looks like reserve doesn't buy you
 much for some reason. Maybe the extra work for checking whether it needs to
 reallocate or whatever fancy logic it has to do in ~= dwarfs the extra cost
 of the reallocations? That certainly seems odd to me, but bizarrely, the
 evidence does seem to say that reserving doesn't help much. That should
 probably be looked into.
Yeah, I get a small boost of 5% compared to not reserving at size 1000, which completely disappears for longer arrays (no difference for size 100_000).
 In any case, from what I can see, if all you're doing is creating an array
 and then throwing away the Appender, it's faster to use Appender than to
 directly append to the array.
Indeed.
 Maybe that changes with structs? I don't know.
I'm using classes, because what I'm pushing into the Appender are graph nodes, and I got fed up with tracing pointers to structs everywhere. Maybe I should change back to structs and see what it does.
 It would be interesting to know what's different about your code that's
 causing Appender to be slower, but from what I can see, it's definitely
 faster to use Appender unless you know the target size of the array up
 front.
Good conclusion. Thanks for your help. My takeaway is that I'll use arrays, but 'new T[](size)' them beforehand. If that makes heavy concatenation 10 times faster, it should have a positive effect (I'm not, of course, expecting anything near a 10x boost in my computation time).
Aug 15 2014
prev sibling next sibling parent Philippe Sigaud via Digitalmars-d-learn writes:
On Thu, Aug 14, 2014 at 11:33 PM, Joseph Rushton Wakeling via
Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:
 On 14/08/14 19:16, Philippe Sigaud via Digitalmars-d-learn wrote:
 Do people here get good results from Appender? And if yes, how are you
 using it?
An example where it worked for me: http://braingam.es/2013/09/betweenness-centrality-in-dgraph/ (You will have to scroll down a bit to get to the point where appenders get introduced.)
I remember reading this and loving it! Any news on this? Do you have new algorithms?
Aug 15 2014
prev sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Thursday, 14 August 2014 at 17:16:42 UTC, Philippe Sigaud 
wrote:
 From time to time, I try to speed up some array-heavy code by 
 using std.array.Appender, reserving some capacity and so on.

 It never works. Never. It gives me executables that are maybe 
 30-50% slower than bog-standard array code.

 I don't do anything fancy: I just replace the types, use 
 clear() instead of = null...

 Do people here get good results from Appender? And if yes, how 
 are you using it?
compiler, version, OS, architecture, flags? Have you looked at the assembly to check that all the Appender method calls are being inlined?
Aug 15 2014
parent Philippe Sigaud via Digitalmars-d-learn writes:
On Fri, Aug 15, 2014 at 10:04 PM, John Colvin via Digitalmars-d-learn
<digitalmars-d-learn puremagic.com> wrote:

 compiler, version, OS, architecture, flags?
Compiler: DMD 2.065 and LDC 0.14
OS: Linux 64 bits (8 cores, but there is no parallelism here)
Flags: -O -release -inline (and -noboundscheck for DMD)
 Have you looked at the assembly to check that all the Appender method calls
 are being inlined?
I do not know how to look at the assembly, nor do I know how to see whether Appender method calls are being inlined. I did spend some time with -profile and gained a nice 10% increase in speed with that, fighting bottlenecks in my code.
Aug 15 2014