www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Often repeated array allocations

reply "Namespace" <rswhite4 googlemail.com> writes:
Let us assume we have a method of a class which is used often and 
the method is called periodically and must allocate every time a 
array between 100 and 4000 elements. What would you do?

1. Simple: float[] array;
2. Reserve: float[] array; array.reserve(N); /// N is a parameter 
value
3. Use manual memory allocation (e.g. with malloc) and free the 
memory immediately at the end of the scope.
4. Use stack allocated memory (But maybe 4000 is too big?)


I can then control when _exactly_ the memory is released. But I 
want to hear other opinions. :)
Jul 20 2013
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
21-Jul-2013 00:19, Namespace пишет:
 Let us assume we have a method of a class which is used often and the
 method is called periodically and must allocate every time a array
 between 100 and 4000 elements. What would you do?

 1. Simple: float[] array;
 2. Reserve: float[] array; array.reserve(N); /// N is a parameter value
 3. Use manual memory allocation (e.g. with malloc) and free the memory
 immediately at the end of the scope.
 4. Use stack allocated memory (But maybe 4000 is too big?)


 then control when _exactly_ the memory is released. But I want to hear
 other opinions. :)
5. Keep a TLS scratch pad buffer (static class member) for said 100-4000 floats and re-use it. -- Dmitry Olshansky
Jul 20 2013
next sibling parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 07/20/2013 01:22 PM, Dmitry Olshansky wrote:

 21-Jul-2013 00:19, Namespace пишет:
 Let us assume we have a method of a class which is used often and the
 method is called periodically and must allocate every time a array
 between 100 and 4000 elements. What would you do?
 5. Keep a TLS scratch pad buffer (static  class member) for said
 100-4000 floats and re-use it.
As long as the function finishes with that buffer, there shouldn't be reentrancy issues in general. But if the function calls the same function, say on another object perhaps indirectly, then the two objects would be sharing the same buffer: class C { void foo() { /* modify the static buffer here */ auto c = new C(); c.foo(); /* the static buffer has changed */ } } Ali
Jul 20 2013
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
21-Jul-2013 00:42, Ali Çehreli пишет:
 On 07/20/2013 01:22 PM, Dmitry Olshansky wrote:

  > 21-Jul-2013 00:19, Namespace пишет:
  >> Let us assume we have a method of a class which is used often and the
  >> method is called periodically and must allocate every time a array
  >> between 100 and 4000 elements. What would you do?

  > 5. Keep a TLS scratch pad buffer (static  class member) for said
  > 100-4000 floats and re-use it.

 As long as the function finishes with that buffer, there shouldn't be
 reentrancy issues in general. But if the function calls the same
 function, say on another object perhaps indirectly, then the two objects
 would be sharing the same buffer:
Yes, this case is problematic. It cannot be called recursively even with another instance.
 class C
 {
      void foo() {
          /* modify the static buffer here */
          auto c = new C();
          c.foo();
          /* the static buffer has changed */
      }
 }

 Ali
-- Dmitry Olshansky
Jul 21 2013
prev sibling parent reply "Namespace" <rswhite4 googlemail.com> writes:
On Saturday, 20 July 2013 at 20:22:56 UTC, Dmitry Olshansky wrote:
 21-Jul-2013 00:19, Namespace пишет:
 Let us assume we have a method of a class which is used often 
 and the
 method is called periodically and must allocate every time a 
 array
 between 100 and 4000 elements. What would you do?

 1. Simple: float[] array;
 2. Reserve: float[] array; array.reserve(N); /// N is a 
 parameter value
 3. Use manual memory allocation (e.g. with malloc) and free 
 the memory
 immediately at the end of the scope.
 4. Use stack allocated memory (But maybe 4000 is too big?)


 because I can
 then control when _exactly_ the memory is released. But I want 
 to hear
 other opinions. :)
5. Keep a TLS scratch pad buffer (static class member) for said 100-4000 floats and re-use it.
TLS scratch pad buffer?
Jul 20 2013
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
21-Jul-2013 00:46, Namespace пишет:
 On Saturday, 20 July 2013 at 20:22:56 UTC, Dmitry Olshansky wrote:
 21-Jul-2013 00:19, Namespace пишет:
 Let us assume we have a method of a class which is used often and the
 method is called periodically and must allocate every time a array
 between 100 and 4000 elements. What would you do?

 1. Simple: float[] array;
 2. Reserve: float[] array; array.reserve(N); /// N is a parameter value
 3. Use manual memory allocation (e.g. with malloc) and free the memory
 immediately at the end of the scope.
 4. Use stack allocated memory (But maybe 4000 is too big?)


 then control when _exactly_ the memory is released. But I want to hear
 other opinions. :)
5. Keep a TLS scratch pad buffer (static class member) for said 100-4000 floats and re-use it.
TLS scratch pad buffer?
My analogy goes as follows: a chunk of memory for temporary needs => scratch pad (as in sheet of paper for quick notes/sketches). Something along the lines of: class A{ static float[] buffer; static this(){ buffer = new float[as_big_as_it_gets]; } void foo(){ float[] tempAlloc = buffer[0..need_this_much]; tempAlloc[] = 0.0; ... } } As long as foo is not called recursively should just work. Other thing that may wreck this is if foo is called in Fiber context and uses yeild internally. One may as well fall back to option 3 in rare cases where scratch pad is too small to fit the bill. -- Dmitry Olshansky
Jul 21 2013
parent reply "Namespace" <rswhite4 googlemail.com> writes:
 My analogy goes as follows: a chunk of memory for temporary 
 needs => scratch pad (as in sheet of paper for quick 
 notes/sketches).

 Something along the lines of:

 class A{
 	static float[] buffer;
 	static this(){
 		buffer = new float[as_big_as_it_gets];
 	}

 	void foo(){
 		float[] tempAlloc = buffer[0..need_this_much];
 		tempAlloc[] = 0.0;
 		...
 	}	
 }

 As long as foo is not called recursively should just work. 
 Other thing that may wreck this is if foo is called in Fiber 
 context and uses yeild internally.

 One may as well fall back to option 3 in rare cases where 
 scratch pad is too small to fit the bill.
I really like the idea. I've changed it a bit: I have a float[1024] buffer which is used, as long as the requested size is less than 1024. If it's greater, I will temporary allocate the whole array with new float[Size]; Any improvements? Or is 1024 to small / big?
Jul 21 2013
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Namespace:

 I have a float[1024] buffer which is used, as long as the 
 requested size is less than 1024. If it's greater, I will 
 temporary allocate the whole array with new float[Size];
That's one of the ways I solve similar problems.
 Any improvements? Or is 1024 to small / big?
An idea is to add some debug{} code that computes statistics for the array sizes... Bye, bearophile
Jul 21 2013
parent reply "Namespace" <rswhite4 googlemail.com> writes:
On Sunday, 21 July 2013 at 21:31:08 UTC, bearophile wrote:
 Namespace:

 I have a float[1024] buffer which is used, as long as the 
 requested size is less than 1024. If it's greater, I will 
 temporary allocate the whole array with new float[Size];
That's one of the ways I solve similar problems.
 Any improvements? Or is 1024 to small / big?
An idea is to add some debug{} code that computes statistics for the array sizes... Bye, bearophile
Too much effort. But if some of you have this experiences already, I would gladly profit from them. ;)
Jul 21 2013
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Sunday, 21 July 2013 at 21:35:15 UTC, Namespace wrote:
 On Sunday, 21 July 2013 at 21:31:08 UTC, bearophile wrote:
 Namespace:

 I have a float[1024] buffer which is used, as long as the 
 requested size is less than 1024. If it's greater, I will 
 temporary allocate the whole array with new float[Size];
That's one of the ways I solve similar problems.
 Any improvements? Or is 1024 to small / big?
An idea is to add some debug{} code that computes statistics for the array sizes... Bye, bearophile
Too much effort. But if some of you have this experiences already, I would gladly profit from them. ;)
It would really help to know what your function is doing exactly. Personally, I think you are over thinking it. Just do this: //---- void foo () { static float[] scratchPadBuffer; // use scratchPadBuffer here } //---- And simply make buffer grow as it needs, and never shorten it. You could consider this to be "controlled leak" I guess. But at least, with this approach, you can't "guess wrong", and you'll only very rarely allocate.
Jul 22 2013
parent reply "Namespace" <rswhite4 googlemail.com> writes:
It's part of Dgame and it's the shape draw method. The method 
collects the vertices and give them to the vertex buffer.
I will try it with my current way: static stack buffer with a 
size of 1024 and otherwise I will temporary allocate memory. But 
how much shapes grow larger than 1024 vertices ;).
Jul 22 2013
parent reply David <d dav1d.de> writes:
Am 22.07.2013 10:41, schrieb Namespace:
 It's part of Dgame and it's the shape draw method. The method collects
 the vertices and give them to the vertex buffer.
 I will try it with my current way: static stack buffer with a size of
 1024 and otherwise I will temporary allocate memory. But how much shapes
 grow larger than 1024 vertices ;).
This will be over 1024 vertices in no time, I am currently drawing vertices with a size of ~500MiB depending on the terrain. The CPU doesn't need to know of all vertices at once, so collect them and upload them to the GPU if they go over a certain limit and begin to fill your buffer from the start again. I use a malloc allocated buffer, its size is never decreased, I initially allocate an educated guess + some extra to be safe, then if it really happens to run ot of memory I simple resize it depending on how much is left with another educated guess + some extra.
Jul 22 2013
parent "Namespace" <rswhite4 googlemail.com> writes:
On Monday, 22 July 2013 at 09:26:33 UTC, David wrote:
 Am 22.07.2013 10:41, schrieb Namespace:
 It's part of Dgame and it's the shape draw method. The method 
 collects
 the vertices and give them to the vertex buffer.
 I will try it with my current way: static stack buffer with a 
 size of
 1024 and otherwise I will temporary allocate memory. But how 
 much shapes
 grow larger than 1024 vertices ;).
This will be over 1024 vertices in no time, I am currently drawing vertices with a size of ~500MiB depending on the terrain. The CPU doesn't need to know of all vertices at once, so collect them and upload them to the GPU if they go over a certain limit and begin to fill your buffer from the start again. I use a malloc allocated buffer, its size is never decreased, I initially allocate an educated guess + some extra to be safe, then if it really happens to run ot of memory I simple resize it depending on how much is left with another educated guess + some extra.
Hm, but Dgame isn't designed for that sort of thing. It's more like SFML for C++. So do you think that such Framework will ever use more vertices than 1024? Anyway, if it's larger, I allocate with malloc enough memory and free it at the end of the scope.
Jul 22 2013
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
22-Jul-2013 01:19, Namespace пишет:

 I really like the idea.
 I've changed it a bit:
 I have a float[1024] buffer which is used, as long as the requested size
 is less than 1024. If it's greater, I will temporary allocate the whole
 array with new float[Size];
 Any improvements? Or is 1024 to small / big?
No one size fits all. Measure and profit. -- Dmitry Olshansky
Jul 21 2013
prev sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Namespace:

 But I want to hear other opinions. :)
D also supports alloca(), that for 1D arrays is almost bearable. See also: http://d.puremagic.com/issues/show_bug.cgi?id=9832 Bye, bearophile
Jul 20 2013
parent reply "Namespace" <rswhite4 googlemail.com> writes:
On Saturday, 20 July 2013 at 20:34:50 UTC, bearophile wrote:
 Namespace:

 But I want to hear other opinions. :)
D also supports alloca(), that for 1D arrays is almost bearable. See also: http://d.puremagic.com/issues/show_bug.cgi?id=9832 Bye, bearophile
Yeah, but aren't 4000 elements a bit much for a stack allocated array?
Jul 20 2013
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Namespace:

 Yeah, but aren't 4000 elements a bit much for a stack allocated 
 array?
4000 floats take about 16 KB. If your function is not recursive and it's not much transitively recursive, then I think it's acceptable. But how much stack are your other functions using? You have to estimate if you have enough stack space. Today it's very easy to have 50+ MB of stack, if necessary. DMD on Windows supports a linker command like this: -L/STACK:200000000 Bye, bearophile
Jul 20 2013
parent reply "Namespace" <rswhite4 googlemail.com> writes:
On Saturday, 20 July 2013 at 21:39:31 UTC, bearophile wrote:
 Namespace:

 Yeah, but aren't 4000 elements a bit much for a stack 
 allocated array?
4000 floats take about 16 KB. If your function is not recursive and it's not much transitively recursive, then I think it's acceptable. But how much stack are your other functions using? You have to estimate if you have enough stack space. Today it's very easy to have 50+ MB of stack, if necessary. DMD on Windows supports a linker command like this: -L/STACK:200000000 Bye, bearophile
So, you would never allocate with float[]?
Jul 20 2013
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Namespace:

 So, you would never allocate with float[]?
Generally in D I allocate on the heap with new, and once in a while with minimallyInitializedArray. Stack-allocation is left for situations where max performance is needed. Like you have to build a tree of string distances, and you use a Levenshtein distance to compute them. Usually a Levenshtein distance needs one buffer, or two. If you want to put one million of strings in such tree of distances, you can't allocate one billions of buffers, it's too much wasted time. For such case I allocate the Levenshtein buffer with an alloca (or I allocate the buffer before the function, etc, there are many ways). I think the usual D code uses too much heap allocation. Allocations on the heap are slow. If you take a look at modern Ada code (like Ada2012 code) you see very little heap allocations, or no heap allocations at all :-) Probably Rust code will avoid lot of of heap allocations. Bye, bearophile
Jul 20 2013
parent reply "Namespace" <rswhite4 googlemail.com> writes:
But D isn't like Ada. It's more like C++ and there Heap 
allocations is often used too.
It would be really cool if we had allocators already.
Something like:
----
with (AllocatorX) { /// will use malloc and free instead of 
calling the GC
     float[] arr;
     arr ~= 42;
}
----

And I still don't know what a 'TLS scratch pad buffer' is.
Jul 20 2013
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-07-21 08:45, Namespace wrote:
 But D isn't like Ada. It's more like C++ and there Heap allocations is
 often used too.
 It would be really cool if we had allocators already.
 Something like:
 ----
 with (AllocatorX) { /// will use malloc and free instead of calling the GC
      float[] arr;
      arr ~= 42;
 }
 ----

 And I still don't know what a 'TLS scratch pad buffer' is.
Perhaps: float[4000] scratchPadBuffer; void foo () { // use scratchPadBuffer here } I guess he just refers to a some temporary data you need during the execution of a function and at the end of the function you don't care about it. -- /Jacob Carlborg
Jul 21 2013
parent reply "Namespace" <rswhite4 googlemail.com> writes:
On Sunday, 21 July 2013 at 09:16:47 UTC, Jacob Carlborg wrote:
 On 2013-07-21 08:45, Namespace wrote:
 But D isn't like Ada. It's more like C++ and there Heap 
 allocations is
 often used too.
 It would be really cool if we had allocators already.
 Something like:
 ----
 with (AllocatorX) { /// will use malloc and free instead of 
 calling the GC
     float[] arr;
     arr ~= 42;
 }
 ----

 And I still don't know what a 'TLS scratch pad buffer' is.
Perhaps: float[4000] scratchPadBuffer; void foo () { // use scratchPadBuffer here } I guess he just refers to a some temporary data you need during the execution of a function and at the end of the function you don't care about it.
But then I have mostly far too much and maybe a few times a bit too less store. It's not flexible. But maybe with a smaller granule. What's about this: ---- struct Chunk(T, ushort maxSize = 1024) { public: static T[maxSize] _chunk; T* ptr; size_t length; size_t capacity; this(size_t capacity) { this.capacity = capacity; if (capacity < maxSize) this.ptr = &_chunk[0]; else this.ptr = cast(T*) .malloc(this.capacity * T.sizeof); } disable this(this); ~this() { if (this.ptr && this.capacity > maxSize) .free(this.ptr); } void ensureAddable(size_t capacity) { if (capacity > this.capacity) { this.capacity = capacity; if (this.capacity > maxSize) this.ptr = cast(T*) .realloc(this.ptr, this.capacity * T.sizeof); } } void opOpAssign(string op : "~", U)(auto ref U item) { this.ptr[this.length++] = cast(T) item; } void opOpAssign(string op : "~", U, size_t n)(auto ref U[n] items) { foreach (ref U item; items) { this.ptr[this.length++] = cast(T) item; } } } ----
Jul 21 2013
parent Jacob Carlborg <doob me.com> writes:
On 2013-07-21 12:18, Namespace wrote:

 But then I have mostly far too much and maybe a few times a bit too less
 store. It's not flexible. But maybe with a smaller granule.
 What's about this:
You said you needed between 100 and 4000 floats. My suggestion will allocate 4000 floats once per thread. Use what you need from the array. I guess your solution with Chunk encapsulate this usage pattern. -- /Jacob Carlborg
Jul 22 2013
prev sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Namespace:

 But D isn't like Ada.
But D should be a bit more like Ada, for such things :-) Bye, bearophile
Jul 21 2013
parent reply "Namespace" <rswhite4 googlemail.com> writes:
On Sunday, 21 July 2013 at 10:30:16 UTC, bearophile wrote:
 Namespace:

 But D isn't like Ada.
But D should be a bit more like Ada, for such things :-) Bye, bearophile
I agree, but how? ;) You know by yourself that Walter & Andrei are very hard to convince by such things.
Jul 21 2013
parent "bearophile" <bearophileHUGS lycos.com> writes:
Namespace:

 I agree, but how? ;)
 You know by yourself that Walter & Andrei are very hard to 
 convince by such things.
I think there is only a small number of Ada ideas worth copying, like: - Integral overflows; - Ranged integers; - A library of collections allocated in-place in the standard library (Bounded Collections of Ada2012); - More deterministic multi processing; - Nicer and more handy enumerated sets; - No undefined behaviours; - Stack-allocated fixed-length arrays; - etc. Some of such things are work in progress (no undefined behaviours), some of them can go in the D standard library (bounded collections, variable length arrays with a bit of help from the compiler), some of them can be implemented with help from the compiler (partially library-defined integral type with overflow tests). Some of them maybe can be implemented once some bugs are sorted out and alias this gets better (ranged integers). Some of them are just currently not present in D (more deterministic multi processing), but D is supposed to have something to replace them ("scope", a better "shared", Walter is introducing a bit of object ownership behind the curtains, etc). And then it becomes a matter of using some of those idioms. Bye, bearophile
Jul 21 2013