digitalmars.D - GC performances
- bearophile (16/16) Nov 26 2007 This Shootout page shows an interesting comparison between Java6 and D:
- Lutger (17/17) Nov 26 2007 Another interesting thing here is that the Java version beats all other
- bearophile (80/91) Nov 26 2007 Because the MMM is done in a vey naive way. See at the bottom of the pag...
- bearophile (3/4) Nov 26 2007 Read that as:
- Lutger (85/93) Nov 26 2007 I've hacked up an array based implementation without recursion and
- Lutger (3/4) Nov 26 2007 Oops, that's wrong of course, but not used in this benchmark.
- Craig Black (97/190) Nov 26 2007 Here's an array-based C++ implementation from the shootout. It performs...
- bearophile (5/8) Nov 26 2007 Allocating huge chunks is slow on Win, look at my better version here:
- Lutger (5/16) Nov 26 2007 The code I posted uses an approach similar to the C++ version. There are...
- BCS (13/35) Nov 26 2007 I wonder how fast it would be to just malloc a huge block of ram and use...
- Craig Black (4/38) Nov 27 2007 Yes this is very fast but causes huge amounts of fragmentation and thus
- Sean Kelly (5/49) Nov 27 2007 It's not even necessarily that fast. In the research I've seen, custom
- Kenny TM~ (9/30) Nov 26 2007 Aside, if the (your) code is compiled with Tango instead the code runs
- Craig Black (7/40) Nov 26 2007 The modern streamlined Java GC may win against an archaic manual memory
- bearophile (6/11) Nov 26 2007 There is this widely used one too, by Doug Lea:
- Craig Black (4/16) Nov 26 2007 nedmalloc is based on Doug Lea's allocator I think. It improves on it a...
- janderson (4/51) Nov 26 2007 I agree. Someone requested this ages ago. I wonder why hasn't this
- Bill Baxter (3/58) Nov 26 2007 Priorities?
This Shootout page shows an interesting comparison between Java6 and D:
http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?test=binarytrees&lang=all
As you can see there are two D versions; I have written the slow one (the fast D version uses manual memory management (MMM); the slow version leaves it to the built-in GC). I think my version is essentially equal to the faster Java solution (which uses the automatic GC); you can see both versions here:
http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?test=binarytrees&lang=dlang&id=0
http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?test=binarytrees&lang=javaxx&id=2
The interesting thing is that despite the code being essentially the same, the D version runs 9.1 times slower than the Java one. Some notes:
- This shootout test is designed to stress the GC.
- From my experience with this demo, I think the speed difference (between the Java and the slow D version) is caused mostly by the GC, so the D GC is quite a bit worse than the Java6 one.
- The Java6 GC receives some hints from the programmer, those "-server -Xms64m" flags on the command line (but not -Xbatch, which disables background compilation); I presume the D GC may someday accept similar hints.
- You may take a look at the memory used too: Java6 with those GC hints: 40.3 MB (the -Xms64m flag sets the initial Java heap size to 64 MB); D MMM: 4.7 MB; D with GC: 17.6 MB.
- The Java code shows how most people write code, because manually managing memory is okay in a little benchmark like this, but in a very large system it's not easy to manage every allocation by hand.
- There are even faster ways to solve this problem in D; I have seen that using a free list the code becomes even faster than the MMM version. And there are ways to speed that up a lot, too.
- Today the source code of Java6 can be read, it's open source, so some of the ideas used by its GC can be borrowed for the D GC (though it may require some work).
Bye,
bearophile
Nov 26 2007
Another interesting thing here is that the Java version beats all other languages too, including the ones with manual memory management. But it does cost a lot of memory. I would guess that this is an ideal case for a copying garbage collector? I wonder how much of an improvement such a GC would bring in normal applications, and how hard it would be to implement in D. Another concern is what changes are needed in client code to work nicely with a copying GC.

But this example does show how relative these microbenchmarks are. After all, it costs the Java version more than ten times the memory to outperform D; how will that affect the speed of a complex application?

btw. I tried implementing this with a freelist, as suggested by the article at the digitalmars site, but it was slower than manual memory management. Might be because the code for allocation wasn't inlined for some reason. Likely not the intent of the benchmark, but an array based implementation could be faster here.
Nov 26 2007
Lutger Wrote:
> Another interesting thing here is that the Java version beats all other languages too, including the ones with manual memory management.

Because the MMM is done in a very naive way. See the bottom of the page for faster versions:
http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?test=binarytrees&lang=all#alt

> But this example does show how relative these microbenchmarks are. After all, it costs the Java version more than ten times the memory to outperform D, how will that affect the speed of a complex application?

At the moment people use Java on big-iron servers, not D. Programming efforts often cost less than the RAM, it seems.

> btw. I tried implementing this with a freelist, as suggested by the article at the digitalmars site, but it was slower than manual memory management. Might be because the code for allocation wasn't inlined for some reason.

Try beating this on the latest DMD:

static import std.gc;
import std.c.stdlib: malloc;
import std.conv: toInt;

struct TyTreeNode {
    TyTreeNode* left, right;
    int item;
}

static TyTreeNode* freelist;

TyTreeNode* newTreeNode(int item, TyTreeNode* left=null, TyTreeNode* right=null) {
    TyTreeNode* t = freelist;
    freelist = freelist.left;
    t.left = left;
    t.right = right;
    t.item = item;
    return t;
}

int itemCheck(TyTreeNode* tree) {
    if (tree.left is null)
        return tree.item;
    else
        return tree.item + itemCheck(tree.left) - itemCheck(tree.right);
}

TyTreeNode* makeTree(int item, uint depth) {
    if (depth > 0)
        return newTreeNode(item,
                           makeTree(2*item-1, depth-1),
                           makeTree(2*item, depth-1));
    else
        return newTreeNode(item);
}

void deleteTree(TyTreeNode* tree) {
    if (tree.left !is null) {
        deleteTree(tree.left);
        deleteTree(tree.right);
    }
    tree.left = freelist;
    freelist = tree;
}

void main(string[] args) {
    std.gc.disable();
    const minDepth = 4;
    int n = (args.length > 1) ? toInt(args[1]) : 2;
    int maxDepth = (minDepth + 2) > n ? minDepth + 2 : n;

    // Carve the whole node pool out of malloc'd blocks up front
    const int nnodes = ((1 << 16) - 1) / TyTreeNode.sizeof;
    const int block_len = nnodes > 0 ? nnodes : 1;
    uint m = n > 6 ? (1 << (n+2)) : 255;
    while (m > 0) {
        uint toAllocate = m > block_len ? block_len : m;
        m -= toAllocate;
        auto pool = cast(TyTreeNode*)malloc(TyTreeNode.sizeof * toAllocate);
        pool[toAllocate-1].left = freelist;
        freelist = pool;
        for (uint i = 0; i < toAllocate-1; i++)
            pool[i].left = &(pool[i+1]);
    }

    auto stretchTree = makeTree(0, maxDepth + 1);
    printf("stretch tree of depth %u\t check: %li\n", maxDepth + 1, itemCheck(stretchTree));
    deleteTree(stretchTree);

    auto longLivedTree = makeTree(0, maxDepth);

    for (int depth = minDepth; depth <= maxDepth; depth += 2) {
        int iterations = 1 << (maxDepth - depth + minDepth);
        int check = 0;
        for (int i = 1; i <= iterations; i++) {
            auto tempTree = makeTree(i, depth);
            check += itemCheck(tempTree);
            deleteTree(tempTree);
            tempTree = makeTree(-i, depth);
            check += itemCheck(tempTree);
            deleteTree(tempTree);
        }
        printf("%li\t trees of depth %u\t check: %li\n", iterations*2, depth, check);
    }

    printf("long lived tree of depth %u\t check: %li\n", maxDepth, itemCheck(longLivedTree));
}

> Likely not the intent of the benchmark, but an array based implementation could be faster here.

It seems slower.

Bye,
bearophile
Nov 26 2007
bearophile:
> Programming efforts often cost less than the RAM, it seems.

Read that as:
Programming efforts often cost more than the RAM, it seems.
Nov 26 2007
bearophile wrote:
> > Likely not the intent of the benchmark, but an array based implementation could be faster here.
> It seems slower.

I've hacked up an array based implementation without recursion, allocating whole trees at once. Although it produces the same results, it's probably flawed somewhere, or the profiling is wrong, I think. Anyway it runs much faster; perhaps you care to take a look. I'm not so good at these things:

import std.stdio, std.conv;
import std.math;

int left(int i) { return i * 2; }
int right(int i) { return ++i * 2; }
int parent(int i) { return i / 2; }

struct Tree {
    static Tree bottumUpTree(int item, int depth) {
        Tree result;
        result.depth = depth;
        if (depth > 0) {
            auto len = pow(depth, cast(real)1);
            result.children.length = cast(uint)len;
        }
        else
            result.children.length = 1;
        result.children[0] = item;
        for (int i = 1; i < result.children.length; i += 2) {
            result.children[i] = (result.children[parent(i)] * 2) - 1;
            result.children[i + 1] = result.children[parent(i)] * 2;
        }
        return result;
    }

    int depth;

    int check() {
        int result = children[0];
        auto copy = children.dup;
        for (int i = children.length - ((depth * 2) - 1); i < children.length; i += 2) {
            auto p = parent(i);
            copy[p] += children[i] - children[i+1];
        }
        for (int i = children.length - ((depth * 2) - 2); i > 0; i -= 2) {
            auto p = parent(i);
            copy[p] += children[i-1] - children[i];
        }
        result += copy[1] - copy[2];
        return result;
    }

    void deleteTree() {
        delete children;
    }

    int[] children;
    private int item;
}

void main(string[] args) {
    const minDepth = 4;
    int n = (args.length > 1) ? toInt(args[1]) : 2;
    int maxDepth = (minDepth + 2) > n ? minDepth + 2 : n;

    auto t = Tree.bottumUpTree(0, maxDepth + 1);
    writefln("stretch tree of depth ", maxDepth + 1, "\t check: ", t.check());
    t.deleteTree();

    auto longLivedTree = Tree.bottumUpTree(0, maxDepth);

    for (int depth = minDepth; depth <= maxDepth; depth += 2) {
        int check;
        int iterations = 1 << (maxDepth - depth + minDepth);
        for (int i = 1; i <= iterations; i++) {
            auto t1 = Tree.bottumUpTree(i, depth);
            check += t1.check();
            t1.deleteTree();
            auto t2 = Tree.bottumUpTree(-i, depth);
            check += t2.check();
            t2.deleteTree();
        }
        writefln(iterations * 2, "\t trees of depth ", depth, "\t check: ", check);
    }

    writefln("long lived tree of depth ", maxDepth, "\t check: ", longLivedTree.check);
    longLivedTree.deleteTree();
}
Nov 26 2007
Lutger wrote:
> int right(int i) { return ++i * 2; }

Oops, that's wrong of course, but not used in this benchmark.
Nov 26 2007
"Lutger" <lutger.blijdestijn gmail.com> wrote in message news:fiekma$otu$1 digitalmars.com...
> I've hacked up an array based implementation without recursion and allocating whole trees at once. [...]

Here's an array-based C++ implementation from the shootout. It performs the best, but was disqualified because it allocates everything in one big chunk and then uses placement new to construct each node.

/* The Computer Language Shootout
 * http://shootout.alioth.debian.org/
 * Contributed by Paul Kitchin
 */
#include <iostream>
#include <sstream>

class Tree {
    struct Node {
        Node(int value, std::size_t depth, std::size_t index, Node * nodes, std::size_t max)
            : value(value)
        {
            if (index * 2 + 2 < max) {
                new (nodes + index * 2 + 1) Node(2 * value - 1, depth - 1, index * 2 + 1, nodes, max);
                new (nodes + index * 2 + 2) Node(2 * value, depth - 1, index * 2 + 2, nodes, max);
            }
        }
        int check(std::size_t index, Node * nodes, std::size_t max) const {
            if (index * 2 + 2 < max) {
                return nodes[index * 2 + 1].check(index * 2 + 1, nodes, max)
                     + value
                     - nodes[index * 2 + 2].check(index * 2 + 2, nodes, max);
            }
            return value;
        }
        int value;
    };
public:
    Tree(int value, std::size_t depth)
        : size((2 << depth) - 1),
          nodes(static_cast< Node * >(operator new(size * sizeof(Node))))
    {
        new (nodes) Node(value, depth, 0, nodes, size);
    }
    ~Tree() {
        operator delete(nodes);
    }
    int check() const {
        return nodes->check(0, nodes, size);
    }
private:
    std::size_t size;
    Node * nodes;
};

int main(int argc, char * * argv) {
    std::size_t user_depth = 10;
    if (argc == 2) {
        std::istringstream convertor(argv[1]);
        if (!(convertor >> user_depth) || !convertor.eof()) {
            std::cerr << "Usage: " << argv[0] << " [n]\n";
            std::cerr << "\tn must be an integer\n";
            return 1;
        }
    }
    std::size_t minimum_depth = 4;
    std::size_t maximum_depth = std::max(minimum_depth + 2, user_depth);
    {
        Tree tree(0, maximum_depth + 1);
        std::cout << "stretch tree of depth " << (maximum_depth + 1) << "\t check: " << tree.check() << '\n';
    }
    Tree long_lived_tree(0, maximum_depth);
    for (std::size_t depth = minimum_depth; depth <= maximum_depth; depth += 2) {
        int iterations = 1 << (maximum_depth - depth + minimum_depth);
        int check = 0;
        for (int iteration = 1; iteration <= iterations; ++iteration) {
            Tree first(iteration, depth);
            Tree second(-iteration, depth);
            check += first.check() + second.check();
        }
        std::cout << (2 * iterations) << "\t trees of depth " << depth << "\t check: " << check << '\n';
    }
    std::cout << "long lived tree of depth " << maximum_depth << "\t check: " << long_lived_tree.check() << '\n';
}
Nov 26 2007
Craig Black:
> Here's an array-based C++ implementation from the shootout. It performs the best but was disqualified because it allocates everything in one big chunk and then uses placement new to allocate each node.

Allocating huge chunks is slow on Windows; look at my better version here:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=62373

Bye,
bearophile
Nov 26 2007
bearophile wrote:
> Allocating huge chunks is slow on Win, look at my better version here:
> http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=62373

The code I posted uses an approach similar to the C++ version. There are some mistakes in there; after fixing those, it runs about 6 times as fast as the code you posted (dmd 1.022, Windows XP, 20 iterations), and could be optimized further. But it's not the same benchmark.
Nov 26 2007
Reply to Lutger,
> btw. I tried implementing this with a freelist, as suggested by the article at the digitalmars site, but it was slower than manual memory management. [...]

I wonder how fast it would be to just malloc a huge block of ram and use this for your memory allocator:

void* hugeBlock;
//void* end;

T* Allocate(T)() {
    T* ret = cast(T*) hugeBlock;
    hugeBlock = cast(void*) (&ret[1]);
    //if(end < hugeBlock) *(cast(int*)null) = 0;
    return ret;
}

as long as you don't run out of ram, it will be real fast. :)
Nov 26 2007
"BCS" <ao pathlink.com> wrote in message news:ce0a33432639b8c9fe730ad8498a news.digitalmars.com...
> I wonder how fast it would be to just malloc a huge block of ram and use this for your memeory allocator: [...] as long as you don't run out of ram, it will be real fast. :)

Yes, this is very fast, but it causes huge amounts of fragmentation and thus inefficient use of memory.
Nov 27 2007
Craig Black wrote:
> > as long as you don't run out of ram, it will be real fast. :)
> Yes this is very fast but causes huge amounts of fragmentation and thus inefficient use of memory.

It's not even necessarily that fast. In the research I've seen, custom allocators are almost always outperformed by the default allocator for general use.

Sean
Nov 27 2007
bearophile wrote:
> This Shootout page shows an interesting comparison between Java6 and D: [...]

Aside: if the (your) code is compiled with Tango instead, the code runs 23% faster. Of course, still not comparable with the 9x performance of Java.

Time elapsed / seconds (parameter = 16)
----------------------------
D2+Phobos: 13.3828 & 13.5043
D1+Phobos: 13.5441 & 13.4825
D1+Tango : 10.1788 & 9.98871

-- Kenny.
Nov 26 2007
"bearophile" <bearophileHUGS lycos.com> wrote in message news:fie98d$8ed$1 digitalmars.com...
> This Shootout page shows an interesting comparison between Java6 and D: [...]

The modern streamlined Java GC may win against an archaic manual memory allocator. However, I don't think it could possibly outperform a modern manual allocator like nedmalloc. I think that D should include nedmalloc in the standard library. This would put D at the top of the list in this benchmark.
Nov 26 2007
Craig Black:
> The modern streamlined Java GC may win against an archaic manual memory allocator. However, I don't think it could possibly outperform a modern manual allocator like nedmalloc. I think that D should include nedmalloc in the standard library.

There is this widely used one too, by Doug Lea:
http://gee.cs.oswego.edu/pub/misc/malloc.c
http://gee.cs.oswego.edu/pub/misc/malloc.h

Bye,
bearophile
Nov 26 2007
"bearophile" <bearophileHUGS lycos.com> wrote in message news:fieqlv$12lr$1 digitalmars.com...
> There is this widely used one too, by Doug Lea:
> http://gee.cs.oswego.edu/pub/misc/malloc.c
> http://gee.cs.oswego.edu/pub/misc/malloc.h

nedmalloc is based on Doug Lea's allocator, I think. It improves on it and significantly outperforms it, especially when multiple threads are used.
Nov 26 2007
Craig Black wrote:
> I think that D should include nedmalloc in the standard library. This would put D at the top of the list in this benchmark.

I agree. Someone requested this ages ago. I wonder why this hasn't been done yet?

-Joel
Nov 26 2007
janderson wrote:
> I agree. Someone requested this ages ago. I wonder why hasn't this been done yet?

Priorities?

--bb
Nov 26 2007