
digitalmars.D - memory management of array

reply ZHOU Zhenyu <rinick goozo.net> writes:
I use D to solve the programming problems on projecteuler (
http://projecteuler.net/ )


I thought D should be as fast as C++, but it turns out that sometimes D is much
slower.

It seems that an array reallocates its memory every time the ~= operation is
invoked. If I push 10^7 numbers into an array, realloc is called 10^7 times.
That would be at least 100 times slower than the C++ STL.


So I don't use ~= and manage the length of the array myself.
When the length is not enough, I reallocate the memory with a size of
requiredSize * 9 / 8. (In the STL, it's oldSize * 2.)
9/8 can save a lot of memory, and the attached source code shows that *9/8 is
almost as fast as *2.

My question is:
Is there any other way to solve this problem?
I think this kind of optimization should be implemented in the compiler or the
D runtime.
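[A quick way to see why the growth policy dominates the cost is to simulate it. The sketch below is illustrative only; it is not the poster's attached source code, which is not reproduced here. It counts reallocations and total element copies for three policies: naive grow-to-exact-size, the requiredSize * 9/8 policy, and STL-style doubling.]

```cpp
#include <cassert>

// Simulate appending n elements under a growth policy and count how many
// reallocations occur and how many elements get copied in total.
// Illustrative sketch only -- not the original attached benchmark.
struct GrowthStats { long long reallocs; long long copies; };

// policy 0: grow to exactly the required size (realloc on every append)
// policy 1: grow to requiredSize * 9 / 8 (at least requiredSize)
// policy 2: double the old capacity (STL-style)
GrowthStats simulate(long long n, int policy) {
    GrowthStats s{0, 0};
    long long len = 0, cap = 0;
    while (len < n) {
        if (len == cap) {                     // out of room: "reallocate"
            long long need = len + 1;
            if (policy == 0)      cap = need;
            else if (policy == 1) cap = (need * 9 / 8 > need) ? need * 9 / 8 : need;
            else                  cap = cap ? cap * 2 : 1;
            s.copies += len;                  // old contents would be copied over
            ++s.reallocs;
        }
        ++len;
    }
    return s;
}
```

[For n = 10^7 the naive policy performs 10^7 reallocations and on the order of 5*10^13 element copies, while both geometric policies need only dozens to a couple of hundred reallocations and O(n) copies; 9/8 simply trades more (but still logarithmically many) reallocations for less slack memory.]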
Dec 19 2008
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from ZHOU Zhenyu (rinick goozo.net)'s article
 I use D to solve the programming problems on projecteuler (
http://projecteuler.net/ )
 I thought D should be as fast as C++, but it turns out that sometimes D is much
slower.
 It seems that an array reallocates its memory every time the ~= operation is
invoked.
 If I push 10^7 numbers into an array, realloc is called 10^7 times.
 That would be at least 100 times slower than the C++ STL.
 So I don't use ~= and manage the length of the array myself.
 When the length is not enough, I reallocate the memory with a size of
requiredSize * 9 / 8. (In the STL, it's oldSize * 2.)
 9/8 can save a lot of memory, and the attached source code shows that *9/8 is
almost as fast as *2.
 My question is:
 Is there any other way to solve this problem?
 I think this kind of optimization should be implemented in the compiler or the
D
runtime.
Internally, dynamic arrays in D are represented by something like:

struct Array(T) {
    T* ptr;
    size_t length;
}

When the ~= operator is invoked, there is no fast way to get the capacity of
the array, so the GC is queried for this information. This can be very slow
since, if the program is multithreaded, interacting with the GC means waiting
for a lock. Even in single-threaded programs it is not particularly efficient.
The GC (I believe; I haven't looked at that part of the code since the old
Phobos GC was replaced with druntime) somewhat compensates by caching the
capacity of the last pointer queried, and this helps to some degree as long as
you're only appending to one array in a single-threaded program.

There are basically three solutions to this, each with its pros and cons:

1.  Allocate the whole array at once and keep track of the last position used.
If you know in advance how big it is going to be, or at least a reasonable
upper bound, and the code is performance-critical, this is a good idea anyway.
However, it's a bad solution if you don't know this.

2.  Use something like an ArrayBuilder struct that caches the capacity. The
core of the code would be something like:

void opCatAssign(T elem) {
    if(array.length + 1 <= capacity) {
        array = array.ptr[0 .. $ + 1];
        array[$ - 1] = elem;
    } else {
        array ~= elem;
        capacity = GC.sizeOf(array.ptr) / T.sizeof;
    }
}

The problems with this are that it's a little bit ugly and that there's no
standard ArrayBuilder struct. People have written them before, and it's very
easy to write your own just the way you like it, but a standard version in
Phobos/Tango would be nice.

3.  Include the capacity field in the internal array struct. The problems with
this are that it would require changes to the compiler and the ABI, and that
it would be inefficient in a lot of cases where the appending functionality is
not used.
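[The core idea of solution 2 -- caching the capacity next to the (pointer, length) pair so the allocator never has to be asked on the hot path -- is language-independent. Below is a hypothetical C++ sketch of the same shape; the type and its names are invented for illustration, not taken from this thread.]

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical C++ sketch of solution 2: an appender that caches its
// capacity, so the common-case append never has to query the allocator.
// Sketch only: not copy-safe, and realloc errors are only asserted.
struct IntAppender {
    int*   ptr = nullptr;
    size_t len = 0;
    size_t cap = 0;   // the cached capacity that D's (ptr, length) pair lacks

    void append(int x) {
        if (len == cap) {                       // slow path: grow geometrically
            cap = cap ? cap * 2 : 4;
            ptr = static_cast<int*>(std::realloc(ptr, cap * sizeof(int)));
            assert(ptr);
        }
        ptr[len++] = x;                         // fast path: store and bump length
    }
    ~IntAppender() { std::free(ptr); }
};
```

[The fast path is one comparison, one store, and one increment -- no lock, no allocator query; that is exactly what caching the capacity buys.]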
Dec 19 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
dsimcha:
 The problems with this are that it's a little bit ugly and that there's no
 standard ArrayBuilder struct.  People have written them before, and it's very
 easy to write your own just the way you like it, but a standard version in
 Phobos/Tango would be nice.
It's neither easy nor fast to write a good version of one. You can use the
version from my dlibs; docs:
http://www.fantascienza.net/leonardo/so/dlibs/builders.html
Download code+docs here:
http://www.fantascienza.net/leonardo/so/libs_d.zip

Bye,
bearophile
Dec 19 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 dsimcha:
 The problems with this are that it's a little bit ugly and that there's no
 standard ArrayBuilder struct.  People have written them before, and it's very
 easy to write your own just the way you like it, but a standard version in
 Phobos/Tango would be nice.
 It's neither easy nor fast to write a good version of one.
I disagree on both counts. Haven't we discussed and settled this before?

OP, you may want to refer to this page: http://tinyurl.com/4ufuxg
and search for ArrayAppender.
 You can use the version from my dlibs, docs:
 http://www.fantascienza.net/leonardo/so/dlibs/builders.html
 Download code+docs here:
 http://www.fantascienza.net/leonardo/so/libs_d.zip
I believe that implementation is unnecessarily complicated, because it
reimplements from scratch a number of strategies already present in the
built-in array.

Andrei
Dec 19 2008
next sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 bearophile wrote:
 dsimcha:
 The problems with this are that it's a little bit ugly and that there's no
 standard ArrayBuilder struct.  People have written them before, and it's very
 easy to write your own just the way you like it, but a standard version in
 Phobos/Tango would be nice.
 It's neither easy nor fast to write a good version of one.
 I disagree on both counts. Haven't we discussed and settled this before? OP,
 you may want to refer to this page: http://tinyurl.com/4ufuxg and search for
 ArrayAppender.
I realized we had discussed this, but I didn't realize that this ArrayAppender
struct was going to become part of Phobos anytime soon. I think it definitely
should be, so that everyone doesn't have to search for it or reinvent the
wheel. Thanks for reminding me about it, though; I will test it.
Dec 19 2008
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 Haven't we discussed and settled this before? 
Discussed, yes, but I think not settled: no one has performed a good
performance comparison of the two implementations. But I presume it doesn't
matter much; people aren't going to use my implementation anyway :-)
 I believe that that implementation is unnecessarily complicated because 
 it reimplements from scratch a number of strategies already present in 
 the built-in array.
It's possible, I don't know; I have done my best :-)

Bye,
bearophile
Dec 19 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 Haven't we discussed and settled this before?
 Discussed, yes, but I think not settled: no one has performed a good
 performance comparison of the two implementations. But I presume it doesn't
 matter much; people aren't going to use my implementation anyway :-)
Well, this is not the time and place to feel persecuted :o). If your
implementation is technically superior, I'll be glad to include it in Phobos.

Andrei
Dec 20 2008
prev sibling next sibling parent reply "Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Dec 20, 2008 at 12:55 AM, ZHOU Zhenyu <rinick goozo.net> wrote:
 I use D to solve the programming problems  on projecteuler (
http://projecteuler.net/ )


 I thought D should be as fast as C++, but it turns out that sometimes D is
much slower.

 It seems that an array reallocates its memory every time the ~= operation
is invoked.
 If I push 10^7 numbers into an array, realloc is called 10^7 times.
 That would be at least 100 times slower than the C++ STL.


 So I don't use ~= and manage the length of the array myself.
 When the length is not enough, I reallocate the memory with a size of
requiredSize * 9 / 8. (In the STL, it's oldSize * 2.)
 9/8 can save a lot of memory, and the attached source code shows that *9/8
is almost as fast as *2.

 My question is:
 Is there any other way to solve this problem?
 I think this kind of optimization should be implemented in the compiler or the
D runtime.
In addition to what other people have said, if you know what the length of the
array will eventually be, you can preallocate by doing this:

float[] arr;
arr.length = N;
arr.length = 0;

And you can put that into a little template function called "reserve" like so:

void reserve(T)(ref T[] x, size_t N)
{
    auto len = x.length;
    if (N > len)
    {
        x.length = N;
        x.length = len;
    }
}

and use it like this:

float[] arr;
arr.reserve(N);

--bb
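[The same trick exists natively in C++ as std::vector::reserve. This small sketch, not taken from the thread, counts how many times the buffer actually moves while appending, with and without preallocation.]

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count how many times a vector's buffer is reallocated while pushing n
// elements, optionally reserving the full size up front. Each reallocation
// is detected by the data() pointer changing (the new block is allocated
// while the old one is still alive, so the addresses always differ).
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<float> v;
    if (reserve_first) v.reserve(n);      // one allocation, done
    std::size_t moves = 0;
    const float* last = v.data();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(1.0f);
        if (v.data() != last) {           // the buffer was moved
            ++moves;
            last = v.data();
        }
    }
    return moves;
}
```

[With reserve the loop never reallocates; without it, the count grows logarithmically with n, which is the behaviour the length-up/length-down trick buys in D.]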
Dec 19 2008
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2008-12-19 17:01:48 -0500, "Bill Baxter" <wbaxter gmail.com> said:

 In addition to what other people have said, if you know the length the
 array will be eventually you can preallocate by doing this:
 
 float[] arr;
 arr.length = N;
 arr.length = 0;
The downside of reserving this way is that it'll initialize all the array
elements you're reserving with a default value for no reason at all. I think
reserve should be part of the language so we can avoid this.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Dec 20 2008
parent "Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Dec 20, 2008 at 8:22 PM, Michel Fortin
<michel.fortin michelf.com> wrote:
 On 2008-12-19 17:01:48 -0500, "Bill Baxter" <wbaxter gmail.com> said:

 In addition to what other people have said, if you know the length the
 array will be eventually you can preallocate by doing this:

 float[] arr;
 arr.length = N;
 arr.length = 0;
The downside of reserving this way is that it'll initialize all the array elements you're reserving with a default value for no reason at all. I think reserve should be part of the language so we can avoid this.
+1, reserve() should be in the language.

Also, the spec isn't exactly clear that the above will always work. All I can
find is this:

"""
To maximize efficiency, the runtime always tries to resize the array in place
to avoid extra copying. It will always do a copy if the new size is larger and
the array was not allocated via the new operator or a previous resize
operation.
"""

It says the runtime "tries", but it doesn't really give any explicit
guarantees about what will happen when the new length is shorter. So I think,
from this language, a spec-conforming D compiler would be free to release the
memory when you set the length to 0 if it wanted to.

--bb
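[For contrast, C++ pins this behaviour down: shrinking a std::vector changes size() but, on every mainstream implementation (and by the usual reading of the standard), never releases the buffer or changes capacity(). A tiny illustrative check:]

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Shrinking a std::vector leaves its storage alone: clear() drops size()
// to zero but capacity() is unchanged -- the guarantee the quoted D spec
// text fails to state for "length = 0".
bool shrink_keeps_capacity(std::size_t n) {
    std::vector<int> v(n);                     // size n, capacity >= n
    std::size_t cap = v.capacity();
    v.clear();                                 // size becomes 0 ...
    return v.empty() && v.capacity() == cap;   // ... capacity stays put
}
```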
Dec 20 2008
prev sibling parent reply mastrost <titi.mastro free.fr> writes:
 the attached source code shows that *9/8 is almost as fast as *2.
No, it is not. I mean, it is certainly true for your very particular problem,
but in fact it is not even the same order of complexity.

Let's take this code:

void main(){
    int n = 100;
    int[] myArray;
    assert(myArray.length == 0);
    for(auto i = 0; i < n; ++i){
        myArray ~= 1;
    }
    assert(myArray.length == n);
}

Now assume k1 and k2 are constant numbers.

If you reallocate the memory using (requiredSize * k1), you will do about 'n'
reallocations, which means 'n' copies of the buffer, each of which costs you
about 'n'. So your algo costs about n^2 operations.

If you reallocate the memory using (oldSize * k2), you will do about 'log(n)'
reallocations. That is why the STL is using (oldSize * 2). When using
(oldSize * k2), the memory used is the same order of magnitude as with
(requiredSize * k1), but the execution time becomes n*log(n) instead of n^2.

This is just an asymptotic behaviour (it becomes more and more true as n gets
big), so in real life it is not necessarily true that oldSize*2 is really
better. But if you cannot be sure that (requiredSize * k1) is better than
(oldSize * k2), I think you should use (oldSize * k2).

I was taught this stuff a couple of years ago, so I might be mistaken.
Moreover, I did not run the experiment myself.

regards
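[The asymptotic distinction above is easy to check numerically. The sketch below, which is not code from the thread, models a genuinely additive policy (grow by a fixed chunk) against a geometric one by counting total elements copied. Note that if requiredSize itself grows with the array, multiplying it by 9/8 compounds and is effectively geometric too, which is the point KennyTM~ makes below.]

```cpp
#include <cassert>

// Total elements copied while appending n items, under additive growth
// (capacity += chunk on each reallocation). Illustrative sketch only.
long long copies_additive(long long n, long long chunk) {
    long long len = 0, cap = 0, copies = 0;
    while (len < n) {
        if (len == cap) { copies += len; cap += chunk; }
        ++len;
    }
    return copies;
}

// Total elements copied under geometric growth (capacity doubles).
long long copies_geometric(long long n) {
    long long len = 0, cap = 0, copies = 0;
    while (len < n) {
        if (len == cap) { copies += len; cap = cap ? cap * 2 : 1; }
        ++len;
    }
    return copies;
}
```

[For n = 10^6 with a chunk of 1000, the additive policy copies on the order of 5*10^8 elements while doubling copies only about 10^6, even though both end up using O(n) memory.]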
Dec 21 2008
next sibling parent ZHOU Zhenyu <rinick goozo.net> writes:
mastrost Wrote:
 If you reallocate the memory using (requiredSize*k1), you will do about 'n'
reallocations, which means 'n' copies of the buffer, each of which costs you
about 'n'. So your algo costs about n^2 operations.

 If you reallocate the memory using (oldSize*k2), you will do about 'log(n)'
reallocations. That is why the STL is using (oldSize * 2). When using
(oldSize*k2), the memory used is the same order of magnitude as with
(requiredSize*k1), but the execution time becomes n*log(n) instead of n^2.

 This is just an asymptotic behaviour (it becomes more and more true as n gets
big), so in real life it is not necessarily true that oldSize*2 is really
better. But if you cannot be sure that (requiredSize*k1) is better than
(oldSize*k2), I think you should use (oldSize*k2).

 I was taught this stuff a couple of years ago, so I might be mistaken.
Moreover, I did not run the experiment myself.
I found that (oldSize*k) is about 20% slower than (requiredSize*k).
I thought they should be the same, but the test results told me that
(requiredSize*k) is always faster.
I don't know why that happened; maybe the reason is something related to the
mechanism of the GC.
Dec 21 2008
prev sibling parent KennyTM~ <kennytm gmail.com> writes:
mastrost wrote:
 the attached source code shows that *9/8 is almost as fast as *2.
No, it is not. I mean, it is certainly true for your very particular problem,
but in fact it is not even the same order of complexity.

Let's take this code:

void main(){
    int n = 100;
    int[] myArray;
    assert(myArray.length == 0);
    for(auto i = 0; i < n; ++i){
        myArray ~= 1;
    }
    assert(myArray.length == n);
}

Now assume k1 and k2 are constant numbers.

If you reallocate the memory using (requiredSize * k1), you will do about 'n'
reallocations, which means 'n' copies of the buffer, each of which costs you
about 'n'. So your algo costs about n^2 operations.

If you reallocate the memory using (oldSize * k2), you will do about 'log(n)'
reallocations. That is why the STL is using (oldSize * 2). When using
(oldSize * k2), the memory used is the same order of magnitude as with
(requiredSize * k1), but the execution time becomes n*log(n) instead of n^2.

This is just an asymptotic behaviour (it becomes more and more true as n gets
big), so in real life it is not necessarily true that oldSize*2 is really
better. But if you cannot be sure that (requiredSize * k1) is better than
(oldSize * k2), I think you should use (oldSize * k2).

I was taught this stuff a couple of years ago, so I might be mistaken.
Moreover, I did not run the experiment myself.

regards
Given that requiredSize == oldSize+1, I don't see any difference between the schemes...
Dec 21 2008