
digitalmars.D - memory management of array

reply ZHOU Zhenyu <rinick goozo.net> writes:
I use D to solve the programming problems on projecteuler (
http://projecteuler.net/ )


I thought D should be as fast as C++, but it turns out that sometimes D is much
slower.

It seems that an array reallocates its memory every time the ~= operation is
invoked. If I push 10^7 numbers into an array, realloc is called 10^7 times.
That would be at least 100 times slower than the C++ STL.


So I don't use ~= and manage the length of the array myself.
When the length is not enough, I reallocate the memory with a size of
requiredSize * 9 / 8. (In the STL, it's oldSize * 2.)
9/8 can save a lot of memory, and the attached source code shows that *9/8 is
almost as fast as *2.

My question is:
Is there any other way to solve this problem?
I think this kind of optimization should be implemented in the compiler or the
D runtime.
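[A quick way to see why the growth policy dominates the cost is to simulate it. The sketch below is illustrative only; it is not the poster's attached source code, which is not reproduced here. It counts reallocations and total element copies for three policies: naive grow-to-exact-size, the requiredSize * 9/8 policy, and STL-style doubling.]

```cpp
#include <cassert>

// Simulate appending n elements under a growth policy and count how many
// reallocations occur and how many elements get copied in total.
// Illustrative sketch only -- not the original attached benchmark.
struct GrowthStats { long long reallocs; long long copies; };

// policy 0: grow to exactly the required size (realloc on every append)
// policy 1: grow to requiredSize * 9 / 8 (at least requiredSize)
// policy 2: double the old capacity (STL-style)
GrowthStats simulate(long long n, int policy) {
    GrowthStats s{0, 0};
    long long len = 0, cap = 0;
    while (len < n) {
        if (len == cap) {                     // out of room: "reallocate"
            long long need = len + 1;
            if (policy == 0)      cap = need;
            else if (policy == 1) cap = (need * 9 / 8 > need) ? need * 9 / 8 : need;
            else                  cap = cap ? cap * 2 : 1;
            s.copies += len;                  // old contents would be copied over
            ++s.reallocs;
        }
        ++len;
    }
    return s;
}
```

[For n = 10^7 the naive policy performs 10^7 reallocations and on the order of 5*10^13 element copies, while both geometric policies need only dozens to a couple of hundred reallocations and O(n) copies; 9/8 simply trades more (but still logarithmically many) reallocations for less slack memory.]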
Dec 19 2008
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from ZHOU Zhenyu (rinick goozo.net)'s article
 I use D to solve the programming problems on projecteuler (
http://projecteuler.net/ )
 I thought D should be as fast as C++, but it turns out that sometimes D is much
slower.
 It seems that an array reallocates its memory every time the ~= operation is
invoked.
 If I push 10^7 numbers into an array, realloc is called 10^7 times.
 That would be at least 100 times slower than the C++ STL.
 So I don't use ~= and manage the length of the array myself.
 When the length is not enough, I reallocate the memory with a size of
requiredSize * 9 / 8. (In the STL, it's oldSize * 2.)
 9/8 can save a lot of memory, and the attached source code shows that *9/8 is
almost as fast as *2.
 My question is:
 Is there any other way to solve this problem?
 I think this kind of optimization should be implemented in the compiler or the
D
runtime.
Internally, dynamic arrays in D are represented by something like:

struct Array(T) {
    T* ptr;
    size_t length;
}

When the ~= operator is invoked, there is no fast way to get the capacity of
the array, so the GC is queried for this information. This can be very slow
since, if the program is multithreaded, interacting with the GC means waiting
for a lock. Even in single-threaded programs it is not particularly efficient.
The GC (I believe; I haven't looked at that part of the code since the old
Phobos GC was replaced with druntime) somewhat compensates by caching the
capacity of the last pointer queried, and this helps to some degree as long as
you're only appending to one array in a single-threaded program.

There are basically three solutions to this, each with its pros and cons:

1.  Allocate the whole array at once and keep track of the last position used.
If you know in advance how big it is going to be, or at least a reasonable
upper bound, and the code is performance-critical, this is a good idea anyway.
However, it's a bad solution if you don't know this.

2.  Use something like an ArrayBuilder struct that caches the capacity. The
core of the code would be something like:

void opCatAssign(T elem) {
    if(array.length + 1 <= capacity) {
        array = array.ptr[0 .. $ + 1];
        array[$ - 1] = elem;
    } else {
        array ~= elem;
        capacity = GC.sizeOf(array.ptr) / T.sizeof;
    }
}

The problems with this are that it's a little bit ugly and that there's no
standard ArrayBuilder struct. People have written them before, and it's very
easy to write your own just the way you like it, but a standard version in
Phobos/Tango would be nice.

3.  Include the capacity field in the internal array struct. The problems with
this are that it would require changes to the compiler and the ABI, and that
it would be inefficient in a lot of cases where the appending functionality is
not used.
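[The core idea of solution 2 -- caching the capacity next to the (pointer, length) pair so the allocator never has to be asked on the hot path -- is language-independent. Below is a hypothetical C++ sketch of the same shape; the type and its names are invented for illustration, not taken from this thread.]

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical C++ sketch of solution 2: an appender that caches its
// capacity, so the common-case append never has to query the allocator.
// Sketch only: not copy-safe, and realloc errors are only asserted.
struct IntAppender {
    int*   ptr = nullptr;
    size_t len = 0;
    size_t cap = 0;   // the cached capacity that D's (ptr, length) pair lacks

    void append(int x) {
        if (len == cap) {                       // slow path: grow geometrically
            cap = cap ? cap * 2 : 4;
            ptr = static_cast<int*>(std::realloc(ptr, cap * sizeof(int)));
            assert(ptr);
        }
        ptr[len++] = x;                         // fast path: store and bump length
    }
    ~IntAppender() { std::free(ptr); }
};
```

[The fast path is one comparison, one store, and one increment -- no lock, no allocator query; that is exactly what caching the capacity buys.]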
Dec 19 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
dsimcha:
 The problems with this are that it's a little bit ugly and that there's no
 standard ArrayBuilder struct.  People have written them before, and it's very
 easy to write your own just the way you like it, but a standard version in
 Phobos/Tango would be nice.
It's neither easy nor fast to write a good version of one. You can use the
version from my dlibs; docs:
http://www.fantascienza.net/leonardo/so/dlibs/builders.html
Download code+docs here:
http://www.fantascienza.net/leonardo/so/libs_d.zip

Bye,
bearophile
Dec 19 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 dsimcha:
 The problems with this are that it's a little bit ugly and that there's no
 standard ArrayBuilder struct.  People have written them before, and it's very
 easy to write your own just the way you like it, but a standard version in
 Phobos/Tango would be nice.
 It's neither easy nor fast to write a good version of one.
I disagree on both counts. Haven't we discussed and settled this before?

OP, you may want to refer to this page: http://tinyurl.com/4ufuxg
and search for ArrayAppender.
 You can use the version from my dlibs, docs:
 http://www.fantascienza.net/leonardo/so/dlibs/builders.html
 Download code+docs here:
 http://www.fantascienza.net/leonardo/so/libs_d.zip
I believe that implementation is unnecessarily complicated, because it
reimplements from scratch a number of strategies already present in the
built-in array.

Andrei
Dec 19 2008
next sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 bearophile wrote:
 dsimcha:
 The problems with this are that it's a little bit ugly and that there's no
 standard ArrayBuilder struct.  People have written them before, and it's very
 easy to write your own just the way you like it, but a standard version in
 Phobos/Tango would be nice.
 It's neither easy nor fast to write a good version of one.
 I disagree on both counts. Haven't we discussed and settled this before? OP,
 you may want to refer to this page: http://tinyurl.com/4ufuxg and search for
 ArrayAppender.
I realized we had discussed this, but I didn't realize that this ArrayAppender
struct was going to become part of Phobos anytime soon. I think it definitely
should be, so that everyone doesn't have to search for it or reinvent the
wheel. Thanks for reminding me about it, though; I will test it.
Dec 19 2008
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 Haven't we discussed and settled this before? 
Discussed, yes, but I think not settled: no one has performed a good
performance comparison of the two implementations. But I presume it doesn't
matter much; people aren't going to use my implementation anyway :-)
 I believe that that implementation is unnecessarily complicated because 
 it reimplements from scratch a number of strategies already present in 
 the built-in array.
It's possible, I don't know; I have done my best :-)

Bye,
bearophile
Dec 19 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 Haven't we discussed and settled this before?
 Discussed, yes, but I think not settled: no one has performed a good
 performance comparison of the two implementations. But I presume it doesn't
 matter much; people aren't going to use my implementation anyway :-)
Well, this is not the time and place to feel persecuted :o). If your
implementation is technically superior, I'll be glad to include it in Phobos.

Andrei
Dec 20 2008
prev sibling next sibling parent reply "Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Dec 20, 2008 at 12:55 AM, ZHOU Zhenyu <rinick goozo.net> wrote:
 I use D to solve the programming problems  on projecteuler (
http://projecteuler.net/ )


 I thought D should be as fast as C++, but it turns out that sometimes D is
much slower.

 It seems that an array reallocates its memory every time the ~= operation
is invoked.
 If I push 10^7 numbers into an array, realloc is called 10^7 times.
 That would be at least 100 times slower than the C++ STL.


 So I don't use ~= and manage the length of the array myself.
 When the length is not enough, I reallocate the memory with a size of
requiredSize * 9 / 8. (In the STL, it's oldSize * 2.)
 9/8 can save a lot of memory, and the attached source code shows that *9/8
is almost as fast as *2.

 My question is:
 Is there any other way to solve this problem?
 I think this kind of optimization should be implemented in the compiler or the
D runtime.
In addition to what other people have said, if you know what the length of the
array will eventually be, you can preallocate by doing this:

float[] arr;
arr.length = N;
arr.length = 0;

And you can put that into a little template function called "reserve" like so:

void reserve(T)(ref T[] x, size_t N)
{
    auto len = x.length;
    if (N > len)
    {
        x.length = N;
        x.length = len;
    }
}

and use it like this:

float[] arr;
arr.reserve(N);

--bb
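[The same trick exists natively in C++ as std::vector::reserve. This small sketch, not taken from the thread, counts how many times the buffer actually moves while appending, with and without preallocation.]

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count how many times a vector's buffer is reallocated while pushing n
// elements, optionally reserving the full size up front. Each reallocation
// is detected by the data() pointer changing (the new block is allocated
// while the old one is still alive, so the addresses always differ).
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<float> v;
    if (reserve_first) v.reserve(n);      // one allocation, done
    std::size_t moves = 0;
    const float* last = v.data();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(1.0f);
        if (v.data() != last) {           // the buffer was moved
            ++moves;
            last = v.data();
        }
    }
    return moves;
}
```

[With reserve the loop never reallocates; without it, the count grows logarithmically with n, which is the behaviour the length-up/length-down trick buys in D.]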
Dec 19 2008
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2008-12-19 17:01:48 -0500, "Bill Baxter" <wbaxter gmail.com> said:

 In addition to what other people have said, if you know the length the
 array will be eventually you can preallocate by doing this:
 
 float[] arr;
 arr.length = N;
 arr.length = 0;
The downside of reserving this way is that it'll initialize all the array
elements you're reserving with a default value for no reason at all. I think
reserve should be part of the language so we can avoid this.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Dec 20 2008
parent "Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Dec 20, 2008 at 8:22 PM, Michel Fortin
<michel.fortin michelf.com> wrote:
 On 2008-12-19 17:01:48 -0500, "Bill Baxter" <wbaxter gmail.com> said:

 In addition to what other people have said, if you know the length the
 array will be eventually you can preallocate by doing this:

 float[] arr;
 arr.length = N;
 arr.length = 0;
The downside of reserving this way is that it'll initialize all the array elements you're reserving with a default value for no reason at all. I think reserve should be part of the language so we can avoid this.
+1, reserve() should be in the language.

Also, the spec isn't exactly clear that the above will always work. All I can
find is this:

"""
To maximize efficiency, the runtime always tries to resize the array in place
to avoid extra copying. It will always do a copy if the new size is larger and
the array was not allocated via the new operator or a previous resize
operation.
"""

It says the runtime "tries", but it doesn't really give any explicit
guarantees about what will happen when the new length is shorter. So I think,
from this language, a spec-conforming D compiler would be free to release the
memory when you set the length to 0 if it wanted to.

--bb
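[For contrast, C++ pins this behaviour down: shrinking a std::vector changes size() but, on every mainstream implementation (and by the usual reading of the standard), never releases the buffer or changes capacity(). A tiny illustrative check:]

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Shrinking a std::vector leaves its storage alone: clear() drops size()
// to zero but capacity() is unchanged -- the guarantee the quoted D spec
// text fails to state for "length = 0".
bool shrink_keeps_capacity(std::size_t n) {
    std::vector<int> v(n);                     // size n, capacity >= n
    std::size_t cap = v.capacity();
    v.clear();                                 // size becomes 0 ...
    return v.empty() && v.capacity() == cap;   // ... capacity stays put
}
```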
Dec 20 2008
prev sibling parent reply mastrost <titi.mastro free.fr> writes:
 the attached source code shows that *9/8 is almost as fast as *2.
No, it is not. I mean, it is certainly true for your very particular problem,
but in fact it is not even the same order of complexity.

Let's take this code:

void main(){
    int n = 100;
    int[] myArray;
    assert(myArray.length == 0);
    for(auto i = 0; i < n; ++i){
        myArray ~= 1;
    }
    assert(myArray.length == n);
}

Now assume k1 and k2 are constant numbers.

If you reallocate the memory using (requiredSize * k1), you will do about 'n'
reallocations, which means 'n' copies of the buffer, each of which costs you
about 'n'. So your algo costs about n^2 operations.

If you reallocate the memory using (oldSize * k2), you will do about 'log(n)'
reallocations. That is why the STL is using (oldSize * 2). When using
(oldSize * k2), the memory used is the same order of magnitude as with
(requiredSize * k1), but the execution time becomes n*log(n) instead of n^2.

This is just an asymptotic behaviour (it becomes more and more true as n gets
big), so in real life it is not necessarily true that oldSize*2 is really
better. But if you cannot be sure that (requiredSize * k1) is better than
(oldSize * k2), I think you should use (oldSize * k2).

I was taught this stuff a couple of years ago, so I might be mistaken.
Moreover, I did not run the experiment myself.

regards
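[The asymptotic distinction above is easy to check numerically. The sketch below, which is not code from the thread, models a genuinely additive policy (grow by a fixed chunk) against a geometric one by counting total elements copied. Note that if requiredSize itself grows with the array, multiplying it by 9/8 compounds and is effectively geometric too, which is the point KennyTM~ makes below.]

```cpp
#include <cassert>

// Total elements copied while appending n items, under additive growth
// (capacity += chunk on each reallocation). Illustrative sketch only.
long long copies_additive(long long n, long long chunk) {
    long long len = 0, cap = 0, copies = 0;
    while (len < n) {
        if (len == cap) { copies += len; cap += chunk; }
        ++len;
    }
    return copies;
}

// Total elements copied under geometric growth (capacity doubles).
long long copies_geometric(long long n) {
    long long len = 0, cap = 0, copies = 0;
    while (len < n) {
        if (len == cap) { copies += len; cap = cap ? cap * 2 : 1; }
        ++len;
    }
    return copies;
}
```

[For n = 10^6 with a chunk of 1000, the additive policy copies on the order of 5*10^8 elements while doubling copies only about 10^6, even though both end up using O(n) memory.]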
Dec 21 2008
next sibling parent ZHOU Zhenyu <rinick goozo.net> writes:
mastrost Wrote:
 If you reallocate the memory using (requiredSize*k1), you will do about 'n'
reallocations, which means 'n' copies of the buffer, each of which costs you
about 'n'. So your algo costs about n^2 operations.

 If you reallocate the memory using (oldSize*k2), you will do about 'log(n)'
reallocations. That is why the STL is using (oldSize * 2). When using
(oldSize*k2), the memory used is the same order of magnitude as with
(requiredSize*k1), but the execution time becomes n*log(n) instead of n^2.

 This is just an asymptotic behaviour (it becomes more and more true as n gets
big), so in real life it is not necessarily true that oldSize*2 is really
better. But if you cannot be sure that (requiredSize*k1) is better than
(oldSize*k2), I think you should use (oldSize*k2).

 I was taught this stuff a couple of years ago, so I might be mistaken.
Moreover, I did not run the experiment myself.
I found that (oldSize*k) is about 20% slower than (requiredSize*k).
I thought they should be the same, but the test results told me that
(requiredSize*k) is always faster.
I don't know why that happened; maybe the reason is something related to the
mechanism of the GC.
Dec 21 2008
prev sibling parent KennyTM~ <kennytm gmail.com> writes:
mastrost wrote:
 the attached source code shows that *9/8 is almost as fast as *2.
No, it is not. I mean, it is certainly true for your very particular problem,
but in fact it is not even the same order of complexity.

Let's take this code:

void main(){
    int n = 100;
    int[] myArray;
    assert(myArray.length == 0);
    for(auto i = 0; i < n; ++i){
        myArray ~= 1;
    }
    assert(myArray.length == n);
}

Now assume k1 and k2 are constant numbers.

If you reallocate the memory using (requiredSize * k1), you will do about 'n'
reallocations, which means 'n' copies of the buffer, each of which costs you
about 'n'. So your algo costs about n^2 operations.

If you reallocate the memory using (oldSize * k2), you will do about 'log(n)'
reallocations. That is why the STL is using (oldSize * 2). When using
(oldSize * k2), the memory used is the same order of magnitude as with
(requiredSize * k1), but the execution time becomes n*log(n) instead of n^2.

This is just an asymptotic behaviour (it becomes more and more true as n gets
big), so in real life it is not necessarily true that oldSize*2 is really
better. But if you cannot be sure that (requiredSize * k1) is better than
(oldSize * k2), I think you should use (oldSize * k2).

I was taught this stuff a couple of years ago, so I might be mistaken.
Moreover, I did not run the experiment myself.

regards
Given that requiredSize == oldSize+1, I don't see any difference between the schemes...
Dec 21 2008