digitalmars.D - memory management of array
- ZHOU Zhenyu (11/11) Dec 19 2008 I use D to solve the programming problems on projecteuler ( http://proj...
- dsimcha (43/57) Dec 19 2008 slower.
- bearophile (8/12) Dec 19 2008 It's not easy nor fast to write a good version of it.
- Andrei Alexandrescu (8/19) Dec 19 2008 I disagree on both counts. Haven't we discussed and settled this before?...
- dsimcha (7/18) Dec 19 2008 Phobos/Tango
- bearophile (5/9) Dec 19 2008 It's possible, I don't know, I have done my best :-)
- Andrei Alexandrescu (5/12) Dec 20 2008 Well this is not the time and place to feel persecuted :o). If your
- Bill Baxter (18/29) Dec 19 2008 In addition to what other people have said, if you know the length the
- Michel Fortin (9/15) Dec 20 2008 The downside of reserving this way is that it'll initialize all the
- Bill Baxter (16/26) Dec 20 2008 +1, reserve() should be in the language.
- mastrost (16/17) Dec 21 2008 No it is not. I mean it is certainly true for your very particular probl...
- ZHOU Zhenyu (4/11) Dec 21 2008 I found that (oldSize*k) will be about 20% slower than (requiredSize*k)
- KennyTM~ (3/31) Dec 21 2008 Given that requiredSize == oldSize+1, I don't see any difference between...
I use D to solve the programming problems on projecteuler ( http://projecteuler.net/ ) I thought D should be as fast as C++, but it turns out that sometimes D is much slower. It seems that array would reallocate its memory every time the ~= operation is invoked. If I push 10^7 numbers into an array, the realloc would be called for 10^7 times. That would be at least 100 times slower than C++STL. So I don't use ~= and manage the length of the array myself When the length is not enough, I reallocate the memory with the size of requiredSize * 9 / 8. ( In STL, it's oldSize * 2. ) 9/8 could save a lot of memory, and the attached source code shows that *9/8 is almost as fast as *2 . My question is: Is there any other way to solve this problem? I think this kind of optimization should be implemented in the compiler or the D runtime.
Dec 19 2008
== Quote from ZHOU Zhenyu (rinick goozo.net)'s articleThis is a multi-part message in MIME format ------------19106828741229702115 Content-Type: text/plain I use D to solve the programming problems on projecteuler (http://projecteuler.net/ )I thought D should be as fast as C++, but it turns out that sometimes D is muchslower.It seems that array would reallocate its memory every time the ~= operation isinvoked.If I push 10^7 numbers into an array, the realloc would be called for 10^7 times. That would be at least 100 times slower than C++STL. So I don't use ~= and manage the length of the array myself When the length is not enough, I reallocate the memory with the size ofrequiredSize * 9 / 8. ( In STL, it's oldSize * 2. )9/8 could save a lot of memory, and the attached source code shows that *9/8 isalmost as fast as *2 .My question is: Is there any other way to solve this problem? I think this kind of optimization should be implemented in the compiler or the Druntime. Internally, dynamic arrays in D are represented by something like: struct Array(T) { T* ptr; size_t length; } When the ~= operator is invoked, there is no fast way to get the capacity of the array. Therefore, the GC is queried for this information. This can be very slow since, if the program is multithreaded, interacting with the GC means waiting for a lock. Furthermore, even in single-threaded programs, this is not particularly efficient. The GC (I believe, I haven't looked at that part of the code since the old Phobos GC was replaced with druntime) somewhat compensates by caching the capacity of the last pointer queried, and this helps to some degree as long as you're only appending to one array in a single-threaded program. There are basically three solutions to this, each with its pros and cons: 1. Allocate the whole array at once and keep track of the last position used. If you know in advance how big it is going to be, or at least a reasonable upper bound, and the code is performance-critical, this is a good idea anyway. However, it's a bad solution if you don't know this. 2. Use something like an ArrayBuilder struct that caches the capacity. The core of the code would be something like: void opCatAssign(T elem) { if(array.length + 1 < capacity) { array = array.ptr[0..$ + 1]; array[$ - 1] = elem; } else { array ~= elem; capacity = GC.sizeOf(array.ptr) / T.sizeof; } } The problems with this are that it's a little bit ugly and that there's no standard ArrayBuilder struct. People have written them before, and it's very easy to write your own just the way you like it, but a standard version in Phobos/Tango would be nice. 3. Include the capacity field in the internal array struct. The problems with this are that it would require changes to the compiler and the ABI and that it would be inefficient in a lot of cases where the appending functionality is not used.
Dec 19 2008
dsimcha:The problems with this are that it's a little bit ugly and that there's no standard ArrayBuilder struct. People have written them before, and it's very easy to write your own just the way you like it, but a standard version in Phobos/Tango would be nice.It's not easy nor fast to write a good version of it. You can use the version from my dlibs, docs: http://www.fantascienza.net/leonardo/so/dlibs/builders.html Download code+docs here: http://www.fantascienza.net/leonardo/so/libs_d.zip Bye, bearophile
Dec 19 2008
bearophile wrote:dsimcha:I disagree on both counts. Haven't we discussed and settled this before? OP, you may want to refer to this page: http://tinyurl.com/4ufuxg and search for ArrayAppender.The problems with this are that it's a little bit ugly and that there's no standard ArrayBuilder struct. People have written them before, and it's very easy to write your own just the way you like it, but a standard version in Phobos/Tango would be nice.It's not easy nor fast to write a good version of it.You can use the version from my dlibs, docs: http://www.fantascienza.net/leonardo/so/dlibs/builders.html Download code+docs here: http://www.fantascienza.net/leonardo/so/libs_d.zipI believe that that implementation is unnecessarily complicated because it reimplements from scratch a number of strategies already present in the built-in array. Andrei
Dec 19 2008
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articlebearophile wrote:easydsimcha:The problems with this are that it's a little bit ugly and that there's no standard ArrayBuilder struct. People have written them before, and it's veryPhobos/Tangoto write your own just the way you like it, but a standard version inI realized we had discussed this, but I didn't realize that this ArrayAppender struct is going to become part of Phobos anytime soon. I think it definitely should be, so that everyone doesn't have to search for it or reinvent the wheel. Thanks for reminding me about it though, will test.I disagree on both counts. Haven't we discussed and settled this before? OP, you may want to refer to this page: http://tinyurl.com/4ufuxg and search for ArrayAppender.would be nice.It's not easy nor fast to write a good version of it.
Dec 19 2008
Andrei Alexandrescu:Haven't we discussed and settled this before?Discussed yes, but I think not settled: no one has performed a good performance comparison of the two implementations. But I presume it doesn't matter much, people aren't going to use my implementation anyway :-)I believe that that implementation is unnecessarily complicated because it reimplements from scratch a number of strategies already present in the built-in array.It's possible, I don't know, I have done my best :-) Bye, bearophile
Dec 19 2008
bearophile wrote:Andrei Alexandrescu:Well this is not the time and place to feel persecuted :o). If your implementation is technically superior, I'll be glad to include it in Phobos. AndreiHaven't we discussed and settled this before?Discussed yes, but I think not settled: no one has performed a good performance comparison of the two implementations. But I presume it doesn't matter much, people aren't going to use my implementation anyway :-)
Dec 20 2008
On Sat, Dec 20, 2008 at 12:55 AM, ZHOU Zhenyu <rinick goozo.net> wrote:I use D to solve the programming problems on projecteuler ( http://projecteuler.net/ ) I thought D should be as fast as C++, but it turns out that sometimes D is much slower. It seems that array would reallocate its memory every time the ~= operation is invoked. If I push 10^7 numbers into an array, the realloc would be called for 10^7 times. That would be at least 100 times slower than C++STL. So I don't use ~= and manage the length of the array myself When the length is not enough, I reallocate the memory with the size of requiredSize * 9 / 8. ( In STL, it's oldSize * 2. ) 9/8 could save a lot of memory, and the attached source code shows that *9/8 is almost as fast as *2 . My question is: Is there any other way to solve this problem? I think this kind of optimization should be implemented in the compiler or the D runtime.In addition to what other people have said, if you know the length the array will be eventually you can preallocate by doing this: float[] arr; arr.length = N; arr.length = 0; And you can put that into a little template function called "reserve" like so: void reserve(ref T[] x, size_t N) { auto len = x.length; if (N > len) { x.length = N; x.length = len; } } and use it like this: float[] arr; arr.reserve(N); --bb
Dec 19 2008
On 2008-12-19 17:01:48 -0500, "Bill Baxter" <wbaxter gmail.com> said:In addition to what other people have said, if you know the length the array will be eventually you can preallocate by doing this: float[] arr; arr.length = N; arr.length = 0;The downside of reserving this way is that it'll initialize all the array elements you're reserving with a default value for no reason at all. I think reserve should be part of the language so we can avoid this. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Dec 20 2008
On Sat, Dec 20, 2008 at 8:22 PM, Michel Fortin <michel.fortin michelf.com> wrote:On 2008-12-19 17:01:48 -0500, "Bill Baxter" <wbaxter gmail.com> said:+1, reserve() should be in the language. Also, the spec isn't exactly clear that the above will always work. All I can find is this: """ To maximize efficiency, the runtime always tries to resize the array in place to avoid extra copying. It will always do a copy if the new size is larger and the array was not allocated via the new operator or a previous resize operation. """ It says the runtime "tries", but the doesn't really give any explicit guarantees about what will happen when the new length is shorter. So I think from this language a spec-conforming D compiler would be free to release the memory when you set length to 0 if it wanted to. --bbIn addition to what other people have said, if you know the length the array will be eventually you can preallocate by doing this: float[] arr; arr.length = N; arr.length = 0;The downside of reserving this way is that it'll initialize all the array elements you're reserving with a default value for no reason at all. I think reserve should be part of the language so we can avoid this.
Dec 20 2008
the attached source code shows that *9/8 is almost as fast as *2 .No it is not. I mean it is certainly true for your very particular problem, but in fact it is not even the same order of complexity. Let's take this code: void main(){ int n=100; int[] myArray; assert(myArray.length == 0); for(auto i=0; i<n ; ++i){ myArray ~= 1; } assert(myArray.length == n); } Now assume k1 and k2 are constant number. If you reallocate the memory using (requiredSize*k1), you will do about 'n' reallocations, wich means 'n' copies of the buffer, which cost you 'n' each time you do so. So your algo costs about n^2 operations. If you reallocate the memory using (oldSize * K2) you will do about 'log(n)' reallocations. That is why the stl is using (oldSize * 2). When using (oldSize*k2), the memory used is the same order of magnitude than with (requiredSize * k1), but the execution time becomes n*log(n) instead of n^2. This is just an asymptotic behaviour (this becomes more and more true if n is big), so in real life, it is not necessary true that oldSize*2 is really better. But if you cannot be sure that (requiredSize * K1) is better than (oldSize * K2), I think you should use (oldSize * k2). I have been tought this stuffs a couple of years ago, so I might be mistaken. Moreover I did not make the experience myself. regards
Dec 21 2008
mastrost Wrote:If you reallocate the memory using (requiredSize*k1), you will do about 'n' reallocations, wich means 'n' copies of the buffer, which cost you 'n' each time you do so. So your algo costs about n^2 operations. If you reallocate the memory using (oldSize * K2) you will do about 'log(n)' reallocations. That is why the stl is using (oldSize * 2). When using (oldSize*k2), the memory used is the same order of magnitude than with (requiredSize * k1), but the execution time becomes n*log(n) instead of n^2. This is just an asymptotic behaviour (this becomes more and more true if n is big), so in real life, it is not necessary true that oldSize*2 is really better. But if you cannot be sure that (requiredSize * K1) is better than (oldSize * K2), I think you should use (oldSize * k2). I have been tought this stuffs a couple of years ago, so I might be mistaken. Moreover I did not make the experience myself.I found that (oldSize*k) will be about 20% slower than (requiredSize*k) I thought they should be same, be the test result told me that (requiredSize*k) is always faster. I don't know why that happened, maybe the reason is something related to the mechanism of GC.
Dec 21 2008
mastrost wrote:Given that requiredSize == oldSize+1, I don't see any difference between the schemes...the attached source code shows that *9/8 is almost as fast as *2 .No it is not. I mean it is certainly true for your very particular problem, but in fact it is not even the same order of complexity. Let's take this code: void main(){ int n=100; int[] myArray; assert(myArray.length == 0); for(auto i=0; i<n ; ++i){ myArray ~= 1; } assert(myArray.length == n); } Now assume k1 and k2 are constant number. If you reallocate the memory using (requiredSize*k1), you will do about 'n' reallocations, wich means 'n' copies of the buffer, which cost you 'n' each time you do so. So your algo costs about n^2 operations. If you reallocate the memory using (oldSize * K2) you will do about 'log(n)' reallocations. That is why the stl is using (oldSize * 2). When using (oldSize*k2), the memory used is the same order of magnitude than with (requiredSize * k1), but the execution time becomes n*log(n) instead of n^2. This is just an asymptotic behaviour (this becomes more and more true if n is big), so in real life, it is not necessary true that oldSize*2 is really better. But if you cannot be sure that (requiredSize * K1) is better than (oldSize * K2), I think you should use (oldSize * k2). I have been tought this stuffs a couple of years ago, so I might be mistaken. Moreover I did not make the experience myself. regards
Dec 21 2008