www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - why allocation of large amount of small objects so slow (x10) in D?

reply nobody <no where.com> writes:
$ g++     alloc.cpp   -o alloc
$ time ./alloc
real    0m1.946s
user    0m1.688s
sys     0m0.256s

$ dmd -O -release allocd.d
$ time ./allocd
real    0m22.734s
user    0m22.353s
sys     0m0.360s

$ cat alloc.cpp
#include <vector>

typedef std::vector<int> intvec;
typedef intvec* intvecp;

int main() {
  int i, n = 20000000;
  intvecp* iva;
  iva = new intvecp[n];
  for (i = n; i-- > 0; ) {
    iva[i] = new intvec();
  }

  return 0;
}

$ cat allocd.d
int main() {
  int i, n = 20000000;
  Object[] oa;
  oa = new Object[n];
  for (i = n; i-- > 0; ) {
    oa[i] = new Object();
  }

  return 0;
}
May 21 2009
next sibling parent Leandro Lucarella <llucax gmail.com> writes:
nobody, el 22 de mayo a las 00:08 me escribiste:
 $ g++     alloc.cpp   -o alloc
 $ time ./alloc
 real    0m1.946s
 user    0m1.688s
 sys     0m0.256s
 
 $ dmd -O -release allocd.d
 $ time ./allocd
 real    0m22.734s
 user    0m22.353s
 sys     0m0.360s
 
 $ cat alloc.cpp
 #include <vector>
 
 typedef std::vector<int> intvec;
 typedef intvec* intvecp;
 
 int main() {
   int i, n = 20000000;
   intvecp* iva;
   iva = new intvecp[n];
   for (i = n; i-- > 0; ) {
     iva[i] = new intvec();
   }
 
   return 0;
 }
 
 $ cat allocd.d
 int main() {
   int i, n = 20000000;
   Object[] oa;
   oa = new Object[n];
   for (i = n; i-- > 0; ) {
     oa[i] = new Object();
   }
 
   return 0;
 }
You mean why GCC C++ compiler is faster than DM D compiler. You are comparing 2 different compiler implementations for 2 different languages. It's really hard to get any conclusion from that. It would be much more meaningfull if you compare g++ vs gdc or dmd vs dmc or ldc vs llvm-g++ (they at least use the same backend). -- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/ ---------------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------------- Oiganmen ñatos de corazón, es más posible que un potus florezca en primavera a que un ángel pase con una remera. -- Peperino Pómoro
May 21 2009
prev sibling next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from nobody (no where.com)'s article
 $ g++     alloc.cpp   -o alloc
 $ time ./alloc
 real    0m1.946s
 user    0m1.688s
 sys     0m0.256s
 $ dmd -O -release allocd.d
 $ time ./allocd
 real    0m22.734s
 user    0m22.353s
 sys     0m0.360s
 $ cat alloc.cpp
 #include <vector>
 typedef std::vector<int> intvec;
 typedef intvec* intvecp;
 int main() {
   int i, n = 20000000;
   intvecp* iva;
   iva = new intvecp[n];
   for (i = n; i-- > 0; ) {
     iva[i] = new intvec();
   }
   return 0;
 }
 $ cat allocd.d
 int main() {
   int i, n = 20000000;
   Object[] oa;
   oa = new Object[n];
   for (i = n; i-- > 0; ) {
     oa[i] = new Object();
   }
   return 0;
 }
You've probably hit a corner case for the garbage collector. When you allocate 20,000,000 Object instances and hold onto the references, the garbage collector probably runs about a zillion times and never finds anything to delete. If this is a bottleneck, you should temporarily disable the garbage collector in these situations, until you are done with all these allocations, then re-enable it.
May 21 2009
parent reply nobody <no where.com> writes:
 You've probably hit a corner case for the garbage collector.  When you allocate
 20,000,000 Object instances and hold onto the references, the garbage collector
 probably runs about a zillion times and never finds anything to delete.  If
this
 is a bottleneck, you should temporarily disable the garbage collector in these
 situations, until you are done with all these allocations, then re-enable it.
Thanks. Disble gc improve the speed by 5x: $ dmd -O -release allocd.d $ time ./allocd real 0m4.080s user 0m3.644s sys 0m0.420s $ cat allocd.d import core.memory; int main() { core.memory.GC.disable(); int i, n = 20000000; Object[] oa; oa = new Object[n]; for (i = n; i-- > 0; ) { oa[i] = new Object(); } core.memory.GC.enable(); return 0; }
May 21 2009
parent Sean Kelly <sean invisibleduck.org> writes:
nobody wrote:
 You've probably hit a corner case for the garbage collector.  When you allocate
 20,000,000 Object instances and hold onto the references, the garbage collector
 probably runs about a zillion times and never finds anything to delete.  If
this
 is a bottleneck, you should temporarily disable the garbage collector in these
 situations, until you are done with all these allocations, then re-enable it.
Thanks. Disble gc improve the speed by 5x: $ dmd -O -release allocd.d $ time ./allocd real 0m4.080s user 0m3.644s sys 0m0.420s $ cat allocd.d import core.memory; int main() { core.memory.GC.disable(); int i, n = 20000000; Object[] oa; oa = new Object[n]; for (i = n; i-- > 0; ) { oa[i] = new Object(); } core.memory.GC.enable(); return 0; }
Try: import core.memory; int main() { core.memory.GC.reserve(80000000); core.memory.GC.disable(); int i, n = 20000000; Object[] oa; oa = new Object[n]; for (i = n; i-- > 0; ) { oa[i] = new Object(); } core.memory.GC.enable(); return 0; } If the call to reserve fails you could try breaking it up into two calls for half the space each.
May 21 2009
prev sibling next sibling parent reply Robert Fraser <fraserofthenight gmail.com> writes:
nobody wrote:
 $ g++     alloc.cpp   -o alloc
 $ time ./alloc
 real    0m1.946s
 user    0m1.688s
 sys     0m0.256s
 
 $ dmd -O -release allocd.d
 $ time ./allocd
 real    0m22.734s
 user    0m22.353s
 sys     0m0.360s
 
 $ cat alloc.cpp
 #include <vector>
 
 typedef std::vector<int> intvec;
 typedef intvec* intvecp;
 
 int main() {
   int i, n = 20000000;
   intvecp* iva;
   iva = new intvecp[n];
   for (i = n; i-- > 0; ) {
     iva[i] = new intvec();
   }
 
   return 0;
 }
 
 $ cat allocd.d
 int main() {
   int i, n = 20000000;
   Object[] oa;
   oa = new Object[n];
   for (i = n; i-- > 0; ) {
     oa[i] = new Object();
   }
 
   return 0;
 }
I use this a structure for arena-based memory allocation (attached). Example of use: import candy.util.MemPool MemStack!() stack; class MyObject { mixin MemPoolNew!(stack); } int main() { stack.push(); int i, n = 20000000; MyObject[] oa; oa = new MyObject[n]; for (i = n; i-- > 0; ) { oa[i] = new MyObject(); } stack.pop(); return 0; } The push() and pop() allows memory to be allocated and deallocated as large blocks. However, you shouldn't need to deallocate manually -- it's GCed memory, so ideally the GC should free it when it's no longer referenced. That being said, I've run some tests, and the GC will free it *eventually*, but it allocates 4-6x as much memory as it needs before it starts freeing it, even when GC.collect() is called manually. -------------------- /** * Provides a pool of GCed memory to allocate things from a block. * This maintains cache coherency for related types (i.e. tree nodes). * It doesn't garuntee any ordering, though, the array struct should be * used for that. Also, everything has to be freed at once, freeing one * portion of this has no effect. * * Based on a similar concept posted by bearophile at: * http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=88227 */ public struct MemPool(size_t BLOCK_SIZE = 1 << 14) { private void* next; // Next available block private void* end; // End of the current block private void*[] blocks; public void* alloc(size_t sz) { sz = ((sz + 7) & ~7); // Next multiple of 8 if this isn't a multiple of 8 if (this.next + sz >= this.end) { void* blk = GC.calloc(BLOCK_SIZE); this.blocks.length = this.blocks.length + 1; this.blocks[$ - 1] = blk; this.next = blk; this.end = blk + BLOCK_SIZE; } void* ret = this.next; this.next += sz; return ret; } public void free() { foreach(blk; this.blocks) GC.free(blk); this.blocks = null; this.blocks.length = 0; this.next = null; this.end = null; } } /** * Wrapper for MemPool that allocates the given struct */ public struct StructPool(T) { private MemPool!() pool; public T* alloc() { return cast(T*) pool.alloc(T.sizeof); } } public struct MemStack(size_t BLOCK_SIZE = 1 << 14) { private Stack!(MemPool!(BLOCK_SIZE)*, 16, true, true) stack; public static const size_t MAX_ALLOC = BLOCK_SIZE; public void* alloc(size_t sz) { return stack.peek().alloc(sz); } public void push() { stack.push(new MemPool!(BLOCK_SIZE)); } public void pop() { stack.pop().free(); } } /** * Placement new mixin for allocating from a memory pool. Benchmarks show this * as faster than the D new in real usage (i.e. the parser runs about 1.2x * faster using this). */ public template MemPoolNew(alias Pool) { version(NoMemPool) { } else { public final new(uint sz) { return Pool.alloc(sz); } public final delete(void *p) { } } }
May 21 2009
parent Robert Fraser <fraserofthenight gmail.com> writes:
Robert Fraser wrote:
 nobody wrote:
 $ g++     alloc.cpp   -o alloc
 $ time ./alloc
 real    0m1.946s
 user    0m1.688s
 sys     0m0.256s

 $ dmd -O -release allocd.d
 $ time ./allocd
 real    0m22.734s
 user    0m22.353s
 sys     0m0.360s

 $ cat alloc.cpp
 #include <vector>

 typedef std::vector<int> intvec;
 typedef intvec* intvecp;

 int main() {
   int i, n = 20000000;
   intvecp* iva;
   iva = new intvecp[n];
   for (i = n; i-- > 0; ) {
     iva[i] = new intvec();
   }

   return 0;
 }

 $ cat allocd.d
 int main() {
   int i, n = 20000000;
   Object[] oa;
   oa = new Object[n];
   for (i = n; i-- > 0; ) {
     oa[i] = new Object();
   }

   return 0;
 }
I use this a structure for arena-based memory allocation (attached). Example of use: import candy.util.MemPool MemStack!() stack; class MyObject { mixin MemPoolNew!(stack); } int main() { stack.push(); int i, n = 20000000; MyObject[] oa; oa = new MyObject[n]; for (i = n; i-- > 0; ) { oa[i] = new MyObject(); } stack.pop(); return 0; } The push() and pop() allows memory to be allocated and deallocated as large blocks. However, you shouldn't need to deallocate manually -- it's GCed memory, so ideally the GC should free it when it's no longer referenced. That being said, I've run some tests, and the GC will free it *eventually*, but it allocates 4-6x as much memory as it needs before it starts freeing it, even when GC.collect() is called manually. -------------------- /** * Provides a pool of GCed memory to allocate things from a block. * This maintains cache coherency for related types (i.e. tree nodes). * It doesn't garuntee any ordering, though, the array struct should be * used for that. Also, everything has to be freed at once, freeing one * portion of this has no effect. * * Based on a similar concept posted by bearophile at: * http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmar .D&article_id=88227 */ public struct MemPool(size_t BLOCK_SIZE = 1 << 14) { private void* next; // Next available block private void* end; // End of the current block private void*[] blocks; public void* alloc(size_t sz) { sz = ((sz + 7) & ~7); // Next multiple of 8 if this isn't a multiple of 8 if (this.next + sz >= this.end) { void* blk = GC.calloc(BLOCK_SIZE); this.blocks.length = this.blocks.length + 1; this.blocks[$ - 1] = blk; this.next = blk; this.end = blk + BLOCK_SIZE; } void* ret = this.next; this.next += sz; return ret; } public void free() { foreach(blk; this.blocks) GC.free(blk); this.blocks = null; this.blocks.length = 0; this.next = null; this.end = null; } } /** * Wrapper for MemPool that allocates the given struct */ public struct StructPool(T) { private MemPool!() pool; public T* alloc() { return cast(T*) pool.alloc(T.sizeof); } } public struct MemStack(size_t BLOCK_SIZE = 1 << 14) { private Stack!(MemPool!(BLOCK_SIZE)*, 16, true, true) stack; public static const size_t MAX_ALLOC = BLOCK_SIZE; public void* alloc(size_t sz) { return stack.peek().alloc(sz); } public void push() { stack.push(new MemPool!(BLOCK_SIZE)); } public void pop() { stack.pop().free(); } } /** * Placement new mixin for allocating from a memory pool. Benchmarks show this * as faster than the D new in real usage (i.e. the parser runs about 1.2x * faster using this). */ public template MemPoolNew(alias Pool) { version(NoMemPool) { } else { public final new(uint sz) { return Pool.alloc(sz); } public final delete(void *p) { } } }
Oops, that needs another module. Okay, both are attached in a compile-able form.
May 21 2009
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
nobody, you can see how the GC interacts with your code also with the following
small D1 program:

import std.c.stdlib: malloc;
struct S { int i; }
void main() {
    const N = 20_000_000;

    static if (true) // change this to false
        S** sa = cast(S**)calloc(N, (S*).sizeof);
    else
        auto sa = new S*[N];

    for (int i = N; i-- > 0; )
        sa[i] = new S;
}

(Here I have assumed that null == 0, this isn't true on all CPUs).
If you change the static if to false you will see the program become (on
Windows) about 2.5 times slower (if you use malloc it gets even faster, but
it's not a fair comparison, because new clears the array).
If you add a disable() before the array creation (and the static if is false),
the code gets significantly faster.

Bye,
bearophile
May 22 2009