digitalmars.D - Port a benchmark to D?
- Andrei Alexandrescu (7/7) Jun 03 2011 This paper:
- Timon Gehr (4/10) Jun 03 2011 OK. I'll start right away.
- bearophile (5/6) Jun 03 2011 I have independently started it too :-)
- Nick Sabalausky (4/6) Jun 03 2011 I realize you'e probably talking about content, but...Pet Peeve: Trying ...
- Timon Gehr (5/11) Jun 03 2011 Ok, we can compare the two implementations afterwards. BTW, have you alr...
- Andrei Alexandrescu (3/17) Jun 03 2011 Terrific, thanks! I think this will be very instructive.
- Andrei Alexandrescu (4/18) Jun 03 2011 One note - some of their std::list uses can be replaced by SList, and
- Andrej Mitrovic (1/1) Jun 03 2011 Speaking of SList, for some reason std_slist.html is still distributed w...
- Timon Gehr (4/25) Jun 03 2011 Yes, but built-in arrays are more similar to std::vector, therefore it m...
- Andrei Alexandrescu (6/33) Jun 03 2011 I noticed that the C++ code uses std::list without there being any need
- Timon Gehr (7/12) Jun 03 2011 Yes, but the list in FindSet is unnecessary anyways. If I start changing...
- Jonathan M Davis (7/21) Jun 03 2011 You give RedBlackTree a different predicate if you want to treat it as a...
- Timon Gehr (4/25) Jun 03 2011 Yes, thats what I had in mind, but I thought it is strange that there is...
- Jonathan M Davis (8/36) Jun 03 2011 std.container provides data structures. It's up to you to decide how to ...
- Steven Schveighoffer (13/44) Jun 06 2011 This makes it a set, not a multiset. allowDuplicates set to true would ...
- Jonathan M Davis (4/49) Jun 06 2011 Bleh. You're right. I guess that I wasn't thinking straight (or was thin...
- Andrej Mitrovic (3/5) Jun 03 2011 I've filed it (and many more):
- bearophile (14/15) Jun 03 2011 Some of those "stupid" things come from the very strict Google C++ codin...
- Timon Gehr (26/29) Jun 03 2011 standards.
- Walter Bright (5/8) Jun 04 2011 I never understood the point of such.
- Jeff Nowakowski (3/14) Jun 04 2011 I thought you were big on printed out code reviews and not requiring any...
- Walter Bright (2/4) Jun 04 2011 I don't find those comments useful in printed code either.
- bearophile (20/21) Jun 03 2011 My modified C++0x code:
- Adam D. Ruppe (15/20) Jun 03 2011 What timings did you get? On my computer, the D version ran
- Andrei Alexandrescu (4/12) Jun 03 2011 What are your timings for the Java, Scala, and Go versions?
- Adam D. Ruppe (3/4) Jun 03 2011 I don't have compilers nor the code for those languages, so I don't
- Andrej Mitrovic (3/3) Jun 03 2011 I must be doing something wrong. Do I have to pipe some files to the
- Andrej Mitrovic (9/9) Jun 03 2011 Oh I think it's that stack overflow thing bear was talking about?
- Andrej Mitrovic (2/2) Jun 03 2011 Okay, here's how to increase stack size via Optlink:
- bearophile (4/6) Jun 03 2011 Keep in mind that sets the max stack size, not the max heap size. -L/STA...
- Andrej Mitrovic (4/11) Jun 03 2011 Is that in bytes or kilobytes? Well, I guess Optlink/LD will
- Andrej Mitrovic (2/2) Jun 03 2011 Btw for those wondering, the paper's code is at
- Jonathan M Davis (3/7) Jun 03 2011 Or maybe you just have a way better machine?
- Andrej Mitrovic (9/9) Jun 03 2011 DMD debug: 50s
- Andrei Alexandrescu (4/13) Jun 03 2011 I think they do. But another good data point would be the C++ timings on...
- Andrej Mitrovic (3/3) Jun 03 2011 I've compiled the C++ example from google, but I had to remove the
- Andrei Alexandrescu (4/7) Jun 03 2011 Hmm, it would be odd if the D version is more than twice faster than the...
- Andrej Mitrovic (1/1) Jun 03 2011 SORRY, I meant gcc 4.5.2 not 4.4.0.
- Andrej Mitrovic (4/4) Jun 03 2011 I can't compile bear's C++0x example, I get a ton of undefined
- Andrej Mitrovic (3/3) Jun 03 2011 It seems I should have used g++:
- Andrej Mitrovic (1/1) Jun 03 2011 I'll give a shot at compiling on Linux, maybe this is all MinGW's fault.
- Andrej Mitrovic (1/1) Jun 03 2011 What do I use to accurately time an executable under Linux?
- Andrej Mitrovic (7/7) Jun 03 2011 Running Ubuntu v10.10 x32 on Virtualbox with hardware virtualization,
- Andrei Alexandrescu (9/16) Jun 03 2011 Interesting. So now we have a run time comparable with C++'s run time in...
- bearophile (18/20) Jun 03 2011 First version, with just classes, a bit better cleaned up:
- Andrej Mitrovic (2/4) Jun 03 2011 38secs. Cut down by 10 secs from last time.
- Caligo (19/19) Jun 03 2011 Gentoo/Linux [gcc version 4.4.5, DMD 2.52, latest GDC with GCC 4.4.5,
- dsimcha (2/21) Jun 03 2011 Why not -inline on dmd?
- Caligo (23/49) Jun 04 2011 I don't like the '-inline' option, but here it is. Besides, I usually
- Caligo (6/6) Jun 04 2011 And just to be fair to C++:
- Andrei Alexandrescu (4/10) Jun 04 2011 Thanks to all who participated to this. I've shared these results on red...
- Mafi (2/14) Jun 04 2011 Could you please give a link to the post. I can't find it.
- Andrei Alexandrescu (4/24) Jun 04 2011 My id is andralex.
- Andrej Mitrovic (3/3) Jun 04 2011 Here's another run on win32 for bear's struct version:
- Andrej Mitrovic (2/2) Jun 04 2011 AFAIK google also messed with the GC in their paper, so I'd say this
- Walter Bright (3/5) Jun 04 2011 Direct link:
- Iain Buclaw (8/40) Jun 04 2011 Well, I'm not the least bit surprised by DMD outperforming GDC here.
- dsimcha (4/23) Jun 03 2011 Oh, also, you way want to re-try the benchmark w/ 2.053. It looks
- Timon Gehr (7/27) Jun 04 2011 then I'll create one or two more optimized versions (one using a memory ...
- Timon Gehr (12/32) Jun 04 2011 then I'll create one or two more optimized versions (one using a memory ...
- Daniel Gibson (9/53) Jun 04 2011 What was your time for D without disabling the GC? Probably 40-50s?
- Timon Gehr (7/15) Jun 04 2011 Without disabling the GC, it runs through in 38.2s.
- Andrei Alexandrescu (3/37) Jun 04 2011 Better yet, feel free to contribute. The more of us participate, the bet...
- Walter Bright (3/6) Jun 04 2011 It would be nice to figure out what is different. Try using the coverage...
- bearophile (4/6) Jun 04 2011 There are little differences and inefficiencies here and there, but in t...
- dsimcha (6/12) Jun 04 2011 That's probably right, for two reasons:
- Daniel Gibson (5/28) Jun 04 2011 Besides better optimization by the compiler,
- Walter Bright (2/5) Jun 04 2011 Easy to test, simply disable the gc.
- Andrei Alexandrescu (9/26) Jun 03 2011 Just call stdout.flush().
- bearophile (54/54) Jun 04 2011 UnionFindNode struct instances probably doesn't need a memory pool, it's...
- bearophile (4/8) Jun 04 2011 Sorry, this code is wrong, it has to iterate from 0 to length.
This paper: http://www.reddit.com/r/programming/comments/hqkwk/google_paper_comparing_performance_of_c_java/ compares the efficiency of an algorithm implemented in C++, Scala, Java, and Go. I wonder how D would fare. Does anyone have the time and inclination to port the benchmark so we can take a look? Thanks, Andrei
Jun 03 2011
Andrei Alexandrescu wrote:This paper:http://www.reddit.com/r/programming/comments/hqkwk/google_paper_comparing_performance_of_c_java/compares the efficiency of an algorithm implemented in C++, Scala, Java, and Go. I wonder how D would fare. Does anyone have the time and inclination to port the benchmark so we can take a look? Thanks, AndreiOK. I'll start right away. Timon
Jun 03 2011
Timon Gehr:OK. I'll start right away.I have independently started it too :-) But this article written by Google and its benchmarking methodology are not good. Bye, bearophile
Jun 03 2011
"bearophile" <bearophileHUGS lycos.com> wrote in message news:isbd1j$1u9s$1 digitalmars.com...But this article written by Google and its benchmarking methodology are not good.I realize you'e probably talking about content, but...Pet Peeve: Trying to read a paged multi-column layout outside of dead-tree form.
Jun 03 2011
bearophile wrote:Timon Gehr:Ok, we can compare the two implementations afterwards. BTW, have you already noticed how incredibly stupid some parts of the implementation of cpp are? I think I will also do a d_pro version after porting cpp. TimonOK. I'll start right away.I have independently started it too :-) But this article written by Google and its benchmarking methodology are not good. Bye, bearophile
Jun 03 2011
On 6/3/11 2:48 PM, Timon Gehr wrote:bearophile wrote:Terrific, thanks! I think this will be very instructive. AndreiTimon Gehr:Ok, we can compare the two implementations afterwards. BTW, have you already noticed how incredibly stupid some parts of the implementation of cpp are? I think I will also do a d_pro version after porting cpp. TimonOK. I'll start right away.I have independently started it too :-) But this article written by Google and its benchmarking methodology are not good. Bye, bearophile
Jun 03 2011
On 6/3/11 2:48 PM, Timon Gehr wrote:bearophile wrote:One note - some of their std::list uses can be replaced by SList, and others by built-in arrays. AndreiTimon Gehr:Ok, we can compare the two implementations afterwards. BTW, have you already noticed how incredibly stupid some parts of the implementation of cpp are? I think I will also do a d_pro version after porting cpp. TimonOK. I'll start right away.I have independently started it too :-) But this article written by Google and its benchmarking methodology are not good. Bye, bearophile
Jun 03 2011
Speaking of SList, for some reason std_slist.html is still distributed with DMD.
Jun 03 2011
Andrei Alexandresco wrote:On 6/3/11 2:48 PM, Timon Gehr wrote:Yes, but built-in arrays are more similar to std::vector, therefore it might already be an optimization over the original benchmark. I'm not sure if that is okay? Timonbearophile wrote:One note - some of their std::list uses can be replaced by SList, and others by built-in arrays. AndreiTimon Gehr:Ok, we can compare the two implementations afterwards. BTW, have you already noticed how incredibly stupid some parts of the implementation of cpp are? I think I will also do a d_pro version after porting cpp. TimonOK. I'll start right away.I have independently started it too :-) But this article written by Google and its benchmarking methodology are not good. Bye, bearophile
Jun 03 2011
On 6/3/11 3:01 PM, Timon Gehr wrote:Andrei Alexandresco wrote:I noticed that the C++ code uses std::list without there being any need for a linked list structure. See for example the data structure used in FindSet. It's a list, but it's just appended too and then used for one iteration. AndreiOn 6/3/11 2:48 PM, Timon Gehr wrote:Yes, but built-in arrays are more similar to std::vector, therefore it might already be an optimization over the original benchmark. I'm not sure if that is okay? Timonbearophile wrote:One note - some of their std::list uses can be replaced by SList, and others by built-in arrays. AndreiTimon Gehr:Ok, we can compare the two implementations afterwards. BTW, have you already noticed how incredibly stupid some parts of the implementation of cpp are? I think I will also do a d_pro version after porting cpp. TimonOK. I'll start right away.I have independently started it too :-) But this article written by Google and its benchmarking methodology are not good. Bye, bearophile
Jun 03 2011
Andrei Alexandrescu wrote:I noticed that the C++ code uses std::list without there being any need for a linked list structure. See for example the data structure used in FindSet. It's a list, but it's just appended too and then used for one iteration. AndreiYes, but the list in FindSet is unnecessary anyways. If I start changing the original implementation, the first thing I will do is to remove that. First however, I will port the code as closely as possible. Is there any associative version of RedBlackTree (I realize it could be made associative quite easily), or should I just use built-in hash maps? Timon
Jun 03 2011
On 2011-06-03 14:08, Timon Gehr wrote:Andrei Alexandrescu wrote:You give RedBlackTree a different predicate if you want to treat it as a map. It defaults to "a < b" with allowDuplicates as false, which makes it a multiset. But If you adjust its template parameters, you can make it a set, map, multimap, or any other type of collection which uses a red-black tree as its data structure. - Jonathan M DavisI noticed that the C++ code uses std::list without there being any need for a linked list structure. See for example the data structure used in FindSet. It's a list, but it's just appended too and then used for one iteration. AndreiYes, but the list in FindSet is unnecessary anyways. If I start changing the original implementation, the first thing I will do is to remove that. First however, I will port the code as closely as possible. Is there any associative version of RedBlackTree (I realize it could be made associative quite easily), or should I just use built-in hash maps?
Jun 03 2011
Jonathan M Davis wrote:On 2011-06-03 14:08, Timon Gehr wrote:Yes, thats what I had in mind, but I thought it is strange that there is no boilerplate map in std.container. TimonAndrei Alexandrescu wrote:You give RedBlackTree a different predicate if you want to treat it as a map. It defaults to "a < b" with allowDuplicates as false, which makes it a multiset. But If you adjust its template parameters, you can make it a set, map, multimap, or any other type of collection which uses a red-black tree > as its data structure. - Jonathan M DavisI noticed that the C++ code uses std::list without there being any need for a linked list structure. See for example the data structure used in FindSet. It's a list, but it's just appended too and then used for one iteration. AndreiYes, but the list in FindSet is unnecessary anyways. If I start changing the original implementation, the first thing I will do is to remove that. First however, I will port the code as closely as possible. Is there any associative version of RedBlackTree (I realize it could be made associative quite easily), or should I just use built-in hash maps?
Jun 03 2011
On 2011-06-03 14:30, Timon Gehr wrote:Jonathan M Davis wrote:std.container provides data structures. It's up to you to decide how to use the data structures. It does not attempt to create a wrapper or anything of the sort to give you particular uses for a data structure (such as using RedBlackTree as a map). Whether it should do that sort of thing is up for debate, but its focus is on data structures not on "collections" which are organized by how they're used. - Jonathan M DavisOn 2011-06-03 14:08, Timon Gehr wrote:Yes, thats what I had in mind, but I thought it is strange that there is no boilerplate map in std.container.Andrei Alexandrescu wrote:You give RedBlackTree a different predicate if you want to treat it as a map. It defaults to "a < b" with allowDuplicates as false, which makes it a multiset. But If you adjust its template parameters, you can make it a set, map, multimap, or any other type of collection which uses a red-black tree > as its data structure. - Jonathan M DavisI noticed that the C++ code uses std::list without there being any need for a linked list structure. See for example the data structure used in FindSet. It's a list, but it's just appended too and then used for one iteration. AndreiYes, but the list in FindSet is unnecessary anyways. If I start changing the original implementation, the first thing I will do is to remove that. First however, I will port the code as closely as possible. Is there any associative version of RedBlackTree (I realize it could be made associative quite easily), or should I just use built-in hash maps?
Jun 03 2011
On Fri, 03 Jun 2011 17:30:54 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:Jonathan M Davis wrote:This makes it a set, not a multiset. allowDuplicates set to true would make it a multiset.On 2011-06-03 14:08, Timon Gehr wrote:Andrei Alexandrescu wrote:needI noticed that the C++ code uses std::list without there being anyused infor a linked list structure. See for example the data structureoneFindSet. It's a list, but it's just appended too and then used forchangingiteration. AndreiYes, but the list in FindSet is unnecessary anyways. If I startthe original implementation, the first thing I will do is to removethat.First however, I will port the code as closely as possible. Is thereanyassociative version of RedBlackTree (I realize it could be madeassociativequite easily), or should I just use built-in hash maps?You give RedBlackTree a different predicate if you want to treat it as a map. It defaults to "a < b" with allowDuplicates as false, which makes it a multiset.Yes, thats what I had in mind, but I thought it is strange that there is no boilerplate map in std.container.I think the goal is to eventually have these higher-level types based on the implementation types, but I'm uncertain how they will look or where they will go. Almost certainly, there will be a template based on a set-like template for defining a map (e.g. map!(RedBlackTree, int, int) ). If you want a cookie-cutter map type that uses the exact same implementation as RedBlackTree, with a custom allocator that helps with performance for long-lasting containers, dcollections provides such a type (TreeMap). If you do end up using it, I recommend the latest trunk, I've made some critical bug fixes (I need to release soon...). -Steve
Jun 06 2011
On 2011-06-06 06:37, Steven Schveighoffer wrote:On Fri, 03 Jun 2011 17:30:54 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:Bleh. You're right. I guess that I wasn't thinking straight (or was thinking too quickly) on that one. :( - Jonathan M DavisJonathan M Davis wrote:This makes it a set, not a multiset. allowDuplicates set to true would make it a multiset.On 2011-06-03 14:08, Timon Gehr wrote:Andrei Alexandrescu wrote:needI noticed that the C++ code uses std::list without there being anyused infor a linked list structure. See for example the data structureoneFindSet. It's a list, but it's just appended too and then used forchangingiteration. AndreiYes, but the list in FindSet is unnecessary anyways. If I startthe original implementation, the first thing I will do is to removethat.First however, I will port the code as closely as possible. Is thereanyassociative version of RedBlackTree (I realize it could be madeassociativequite easily), or should I just use built-in hash maps?You give RedBlackTree a different predicate if you want to treat it as a map. It defaults to "a < b" with allowDuplicates as false, which makes it a multiset.
Jun 06 2011
On 6/3/11, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:Speaking of SList, for some reason std_slist.html is still distributed with DMD.I've filed it (and many more): http://d.puremagic.com/issues/show_bug.cgi?id=6101
Jun 03 2011
Timon Gehr:have you already noticed how incredibly stupid some parts of the implementation of cpp are?Some of those "stupid" things come from the very strict Google C++ coding standards. Even if my C++ experience is not extensive, I find this C++ easy to understand. One part of the code: } // node_pool.size } // Step c } // FindLoops private: MaoCFG *CFG_; // current control flow graph. LoopStructureGraph *lsg_; // loop forest. }; // HavlakLoopFinder The Ada language has a syntax to write those names at the closing ends, and the Ada compiler enforces such names to be always coherent and correct. In C/C++/D unfortunately such names (written as comments) may go out of sync. Despite this risk I add similar comments at the end of functions/classe/structs/loops in my D code too, because the risk of them getting out of sync is lower than the advantages they give me. I have not written an enhancement request yet about this in the D Bugzilla yet, also because I don't know what a good C-style syntax for it could be. Bye, bearophile
Jun 03 2011
bearophile wrote:Timon Gehr:of cpp are?have you already noticed how incredibly stupid some parts of the implementationSome of those "stupid" things come from the very strict Google C++ codingstandards. sure? Eg. UnionFindNode *FindSet() { typedef std::list<UnionFindNode *> NodeListType; NodeListType nodeList; UnionFindNode *node = this; while (node != node->parent()) { if (node->parent() != node->parent()->parent()) nodeList.push_back(node); node = node->parent(); } // Path Compression, all nodes' parents point to the 1st level parent. NodeListType::iterator iter = nodeList.begin(); NodeListType::iterator end = nodeList.end(); for (; iter != end; ++iter) (*iter)->set_parent(node->parent()); return node; } They create a new linked list of nodes on the way to the root. It is not that that is just basically the same they already had when following the parent pointers. They obviously need a copy. Also, they made sure that node==node->parent(). (node is the a root) But then they call node->parent(). etc... Timon
Jun 03 2011
On 6/3/2011 1:24 PM, bearophile wrote:The Ada language has a syntax to write those names at the closing ends, and the Ada compiler enforces such names to be always coherent and correct. In C/C++/D unfortunately such names (written as comments) may go out of sync.I never understood the point of such. My editor has a single key "find matching { [ < ( ) > ] }" command, and so I never have a need for such ugly comments. In fact I usually delete such comments when I encounter them.
Jun 04 2011
On 06/04/2011 04:25 AM, Walter Bright wrote:On 6/3/2011 1:24 PM, bearophile wrote:I thought you were big on printed out code reviews and not requiring any editing features from the language?The Ada language has a syntax to write those names at the closing ends, and the Ada compiler enforces such names to be always coherent and correct. In C/C++/D unfortunately such names (written as comments) may go out of sync.I never understood the point of such. My editor has a single key "find matching { [ < ( ) > ] }" command, and so I never have a need for such ugly comments. In fact I usually delete such comments when I encounter them.
Jun 04 2011
On 6/4/2011 7:14 AM, Jeff Nowakowski wrote:I thought you were big on printed out code reviews and not requiring any editing features from the language?I don't find those comments useful in printed code either.
Jun 04 2011
Andrei:Does anyone have the time and inclination to port the benchmark so we can take a look?My modified C++0x code: http://codepad.org/iBwINTkJ First D2 translation that seems to work: http://codepad.org/sEEFNlAd Notes on the translation: - The program crashes with zero error messages if you don't increase the stack. Time ago I have written an enhancement request on this: http://d.puremagic.com/issues/show_bug.cgi?id=6088 - The final print of the "." done using write() doesn't perform the fflush. So after few tries I have used printf. I have not found a good way to flush. Suggestions welcome. - The C++ code uses classes, but they are often more like D structs. In the first D version to keep things simple I have used final classes everywhere. I will profile the code and I will create a more efficient version that uses structs where needed. - I have had to track down a null reference error (caused by the nodes array, that contains just references in the first D version). Nonnullable type modifiers avoids similar bugs at compile-time. - The C++ code is 25 KB, the D code (first version is 21 KB), the C++ code uses about 100 MB at runtime, while the D code uses about 160 MB. - The C++ loops to iterate on collections is many times worse than the D foreach. - In the main() of C++ code, there is a LoopStructureGraph lsg that shadows another one, I've replaced it with a lsg2 in both D and C++ code. - In the C++ code I have used a std::unordered_map to implement BasicBlockMap, because it's significantly faster. So now the code is C++0x. - The method names start with uppercase, I'd like to change this in a second D version. - No linked lists or sorted dictionaries seem needed in this program. But I have felt a bit of need of a set data structure in D (I have used an AA with bool values). Despite there are some associative arrays of SimpleLoop and BasicBlock objects, I think their classes don't need a hash function and opEquals. Bye, bearophile
Jun 03 2011
bearophile wrote:My modified C++0x code: First D2 translation that seems to work:What timings did you get? On my computer, the D version ran slightly faster (56 seconds vs 63s for C++) without optimizations turned on. With optimizations turned on, C++ took a nice lead (28 seconds vs 53 seconds for D). This is similar to other benchmarks I've run in the past. Standard D builds beat standard C++ builds, but gcc's optimizer takes the lead vs dmd's optimizer. However, in the past, I've found writing D style instead of C++ style lets standard D beat even optimized g++. I wonder if that's the case here too..- The final print of the "." done using write() doesn't perform the fflush. So after few tries I have used printf. I have not found a good way to flush. Suggestions welcome.To flush with D's stdout, simply call stdout.flush: write("."); stdout.flush();
Jun 03 2011
On 6/3/11 6:09 PM, Adam D. Ruppe wrote:bearophile wrote:What are your timings for the Java, Scala, and Go versions? Thanks, AndreiMy modified C++0x code: First D2 translation that seems to work:What timings did you get? On my computer, the D version ran slightly faster (56 seconds vs 63s for C++) without optimizations turned on. With optimizations turned on, C++ took a nice lead (28 seconds vs 53 seconds for D).
Jun 03 2011
Andrei Alexandrescu wrote:What are your timings for the Java, Scala, and Go versions?I don't have compilers nor the code for those languages, so I don't know.
Jun 03 2011
I must be doing something wrong. Do I have to pipe some files to the module or something? I've compiled it and ran it, it's done in 1.5 seconds. :s
Jun 03 2011
Oh I think it's that stack overflow thing bear was talking about? Because it quits after this with no message: Welcome to LoopTesterApp, D edition Constructing App... Constructing Simple CFG... 15000 dummy loops Constructing CFG... Performing Loop Recognition 1 Iteration
Jun 03 2011
Okay, here's how to increase stack size via Optlink: dmd benchmark.d -L/STACK:128000000
Jun 03 2011
Andrej Mitrovic:Okay, here's how to increase stack size via Optlink: dmd benchmark.d -L/STACK:128000000Keep in mind that sets the max stack size, not the max heap size. -L/STACK:5000000 is plenty :-) Bye, bearophile
Jun 03 2011
On 6/4/11, bearophile <bearophileHUGS lycos.com> wrote:Andrej Mitrovic:Is that in bytes or kilobytes? Well, I guess Optlink/LD will automatically limit the stack size to something sane if I've blown it out of proportion. The app ends up using about 130megs.Okay, here's how to increase stack size via Optlink: dmd benchmark.d -L/STACK:128000000Keep in mind that sets the max stack size, not the max heap size. -L/STACK:5000000 is plenty :-) Bye, bearophile
Jun 03 2011
Btw for those wondering, the paper's code is at http://code.google.com/p/multi-language-bench/
Jun 03 2011
On 2011-06-03 17:16, Andrej Mitrovic wrote:I must be doing something wrong. Do I have to pipe some files to the module or something? I've compiled it and ran it, it's done in 1.5 seconds. :sOr maybe you just have a way better machine? - Jonathan M Davis
Jun 03 2011
DMD debug: 50s DMD optimized: 49s (-release -noboundscheck -O -inline -L/STACK:1280000000) That's around 1GB of stack memory. Compiling with GDC will make the app throw an exception at runtime due to the stack being blown, the error message isn't special but it's better than no error message. GDC debug: 1m:24s GDC optimized: 1m:12s (gdc -v2 -O3 -Wl,--stack,1280000000) I haven't tested the other ones. Does Go have a Windows compiler yet?
Jun 03 2011
On 6/3/11 7:42 PM, Andrej Mitrovic wrote:DMD debug: 50s DMD optimized: 49s (-release -noboundscheck -O -inline -L/STACK:1280000000) That's around 1GB of stack memory. Compiling with GDC will make the app throw an exception at runtime due to the stack being blown, the error message isn't special but it's better than no error message. GDC debug: 1m:24s GDC optimized: 1m:12s (gdc -v2 -O3 -Wl,--stack,1280000000) I haven't tested the other ones. Does Go have a Windows compiler yet?I think they do. But another good data point would be the C++ timings on the same machine. Andrei
Jun 03 2011
I've compiled the C++ example from google, but I had to remove the "-lc" option due to errors (what is this option anyway?). gcc 4.4.0 with -O3: 1:50
Jun 03 2011
On 6/3/11 8:05 PM, Andrej Mitrovic wrote:I've compiled the C++ example from google, but I had to remove the "-lc" option due to errors (what is this option anyway?). gcc 4.4.0 with -O3: 1:50Hmm, it would be odd if the D version is more than twice faster than the C++ one. Wonder what's going on there. Andrei
Jun 03 2011
SORRY, I meant gcc 4.5.2 not 4.4.0.
Jun 03 2011
I can't compile bear's C++0x example, I get a ton of undefined reference errors. I've tried with: gcc -O3 -std=gnu++0x raw.cpp Maybe he's using gcc 4.6 or I'm missing some flags.
Jun 03 2011
It seems I should have used g++: g++ -O3 -std=gnu++0x -Wl,--stack,1280000000 raw.cpp time: 2m:12s
Jun 03 2011
I'll give a shot at compiling on Linux, maybe this is all MinGW's fault.
Jun 03 2011
What do I use to accurately time an executable under Linux?
Jun 03 2011
Running Ubuntu v10.10 x32 on Virtualbox with hardware virtualization, GCC v4.4.4 and DMD 2.053 installed. DMD debug: 54s DMD optimized: 47s Google's C++ optimized: 27s (gcc -O3 -lc -lstdc++) Bear's C++ optimized: 24s (g++ -O3 -std=gnu++0x) So it's MinGW's fault apparently.
Jun 03 2011
On 06/03/2011 08:47 PM, Andrej Mitrovic wrote:Running Ubuntu v10.10 x32 on Virtualbox with hardware virtualization, GCC v4.4.4 and DMD 2.053 installed. DMD debug: 54s DMD optimized: 47s Google's C++ optimized: 27s (gcc -O3 -lc -lstdc++) Bear's C++ optimized: 24s (g++ -O3 -std=gnu++0x) So it's MinGW's fault apparently.Interesting. So now we have a run time comparable with C++'s run time in the paper. The implementation by bearophile has 721 lines. Once we have compilation times, binary size, and memory footprint, we have a good comparison point with the study in the paper. Far as I can tell D comes in the second place after C++ at run time. With optimizations and all it could get significantly closer. Thanks, Andrei
Jun 03 2011
Andrei:Far as I can tell D comes in the second place after C++ at run time. With optimizations and all it could get significantly closer.First version, with just classes, a bit better cleaned up: http://codepad.org/DggCx26d Second version, with all structs: http://codepad.org/etsLsZV5 Tomorrow I'll de-optimize it a bit replacing some structs with classes. And then I'll create one or two more optimized versions (one using a memory pool for the nodes, and one trying to apply some of the C++ improvement ideas from the original paper). The number of instances allocated: Class instances: SimpleLoop_counter 3_936_102 LoopStructureGraph_counter 15_051 UnionFindNode_counter 13_017_663 HavlakLoopFinder_counter 15_051 BasicBlockEdge_counter 378_036 BasicBlock_counter 252_013 MaoCFG_counter 1 UnionFindNode probably will give some gain if allocated from a pool. Later, bearophile
Jun 03 2011
On 6/4/11, bearophile <bearophileHUGS lycos.com> wrote:Second version, with all structs: http://codepad.org/etsLsZV538secs. Cut down by 10 secs from last time.
Jun 03 2011
Gentoo/Linux [gcc version 4.4.5, DMD 2.52, latest GDC with GCC 4.4.5, and latest LDC2] g++ -O3 [VIRT: 185MB, RES: 174MB] real 0m28.407s user 0m28.330s sys 0m0.070s DMD -O -release [VIRT: 94MB, RES: 92MB] real 0m43.232s user 0m42.980s sys 0m0.070s GDC -O3 [VIRT: 306MB, RES: 295MB] real 1m10.788s user 1m10.570s sys 0m0.190s LDC2 segmentation fault
Jun 03 2011
On 6/4/2011 12:01 AM, Caligo wrote:Gentoo/Linux [gcc version 4.4.5, DMD 2.52, latest GDC with GCC 4.4.5, and latest LDC2] g++ -O3 [VIRT: 185MB, RES: 174MB] real 0m28.407s user 0m28.330s sys 0m0.070s DMD -O -release [VIRT: 94MB, RES: 92MB] real 0m43.232s user 0m42.980s sys 0m0.070s GDC -O3 [VIRT: 306MB, RES: 295MB] real 1m10.788s user 1m10.570s sys 0m0.190s LDC2 segmentation faultWhy not -inline on dmd?
Jun 03 2011
On Fri, Jun 3, 2011 at 11:16 PM, dsimcha <dsimcha yahoo.com> wrote:On 6/4/2011 12:01 AM, Caligo wrote:I don't like the '-inline' option, but here it is. Besides, I usually use GDC or LDC2 and I was expecting them to outperform DMD because they usually do, but not in this case. DMD-32bit v2.52 -O -release -inline [VIRT: 94MB, RES: 92MB] real 0m42.490s user 0m42.480s sys 0m0.000s DMD-32bit v2.53 -O -release -inline [VIRT: 107MB, RES: 104MB] real 0m34.011s user 0m33.930s sys 0m0.070s DMD-64bit v2.53 -O -release -inline segmentation fault DMD-64bit v2.53 -O -release [VIRT: 232MB, RES: 219MB] real 0m44.715s user 0m44.580s sys 0m0.080s P.S. It's a 64-bit system.Gentoo/Linux [gcc version 4.4.5, DMD 2.52, latest GDC with GCC 4.4.5, and latest LDC2] g++ -O3 [VIRT: 185MB, =A0RES: 174MB] real =A0 =A00m28.407s user =A0 =A00m28.330s sys =A0 =A0 0m0.070s DMD -O -release [VIRT: 94MB, =A0RES: 92MB] real =A0 =A00m43.232s user =A0 =A00m42.980s sys =A0 =A0 0m0.070s GDC -O3 [VIRT: 306MB, =A0RES: 295MB] real =A0 =A01m10.788s user =A0 =A01m10.570s sys =A0 =A0 0m0.190s LDC2 segmentation faultWhy not -inline on dmd?
Jun 04 2011
And just to be fair to C++: g++ -O2 -m32 [VIRT: 94MB, RES: 92MB] real 0m24.567s user 0m24.500s sys 0m0.060s
Jun 04 2011
On 06/04/2011 02:10 AM, Caligo wrote:And just to be fair to C++: g++ -O2 -m32 [VIRT: 94MB, RES: 92MB] real 0m24.567s user 0m24.500s sys 0m0.060sThanks to all who participated to this. I've shared these results on reddit: http://www.reddit.com/r/programming/comments/hqkwk/google_paper_comparing_performance_of_c_java/ Andrei
Jun 04 2011
Am 04.06.2011 16:37, schrieb Andrei Alexandrescu:On 06/04/2011 02:10 AM, Caligo wrote:Could you please give a link to the post. I can't find it.And just to be fair to C++: g++ -O2 -m32 [VIRT: 94MB, RES: 92MB] real 0m24.567s user 0m24.500s sys 0m0.060sThanks to all who participated to this. I've shared these results on reddit: http://www.reddit.com/r/programming/comments/hqkwk/google_paper_comparing_performance_of_c_java/ Andrei
Jun 04 2011
On 06/04/2011 09:43 AM, Mafi wrote:Am 04.06.2011 16:37, schrieb Andrei Alexandrescu:My id is andralex. http://www.reddit.com/r/programming/comments/hqkwk/google_paper_comparing_performance_of_c_java/c1xqj0w AndreiOn 06/04/2011 02:10 AM, Caligo wrote:Could you please give a link to the post. I can't find it.And just to be fair to C++: g++ -O2 -m32 [VIRT: 94MB, RES: 92MB] real 0m24.567s user 0m24.500s sys 0m0.060sThanks to all who participated to this. I've shared these results on reddit: http://www.reddit.com/r/programming/comments/hqkwk/google_paper_comparing_performance_of_c_java/ Andrei
Jun 04 2011
Here's another run on win32 for bear's struct version: DMD optimized, GC enabled: 38s.015 DMD optimized, GC disabled: 30s.890
Jun 04 2011
AFAIK google also messed with the GC in their paper, so I'd say this is fair-game.
Jun 04 2011
On 6/4/2011 7:37 AM, Andrei Alexandrescu wrote:Thanks to all who participated to this. I've shared these results on reddit: http://www.reddit.com/r/programming/comments/hqkwk/google_paper_comparing_performance_of_c_java/Direct link: http://www.reddit.com/r/programming/comments/hqkwk/google_paper_comparing_performance_of_c_java/c1xqj0w
Jun 04 2011
On 4 June 2011 08:03, Caligo <iteronvexor gmail.com> wrote:On Fri, Jun 3, 2011 at 11:16 PM, dsimcha <dsimcha yahoo.com> wrote:Well, I'm not the least bit surprised by DMD outperforming GDC here. Mostly because the program produced by GDC will have 109+ extra array bounds checks, and 300+ extra assertions (hint: the program produced by DMD will have 0). --=20 Iain Buclaw *(p < e ? p++ : p) =3D (c & 0x0f) + '0';On 6/4/2011 12:01 AM, Caligo wrote:I don't like the '-inline' option, but here it is. =A0Besides, I usually use GDC or LDC2 and I was expecting them to outperform DMD because they usually do, but not in this case.Gentoo/Linux [gcc version 4.4.5, DMD 2.52, latest GDC with GCC 4.4.5, and latest LDC2] g++ -O3 [VIRT: 185MB, =A0RES: 174MB] real =A0 =A00m28.407s user =A0 =A00m28.330s sys =A0 =A0 0m0.070s DMD -O -release [VIRT: 94MB, =A0RES: 92MB] real =A0 =A00m43.232s user =A0 =A00m42.980s sys =A0 =A0 0m0.070s GDC -O3 [VIRT: 306MB, =A0RES: 295MB] real =A0 =A01m10.788s user =A0 =A01m10.570s sys =A0 =A0 0m0.190s LDC2 segmentation faultWhy not -inline on dmd?
Jun 04 2011
On 6/4/2011 12:01 AM, Caligo wrote:Gentoo/Linux [gcc version 4.4.5, DMD 2.52, latest GDC with GCC 4.4.5, and latest LDC2] g++ -O3 [VIRT: 185MB, RES: 174MB] real 0m28.407s user 0m28.330s sys 0m0.070s DMD -O -release [VIRT: 94MB, RES: 92MB] real 0m43.232s user 0m42.980s sys 0m0.070s GDC -O3 [VIRT: 306MB, RES: 295MB] real 1m10.788s user 1m10.570s sys 0m0.190s LDC2 segmentation faultOh, also, you way want to re-try the benchmark w/ 2.053. It looks rather allocation heavy and I substantially improved the GC performance for 2.053.
Jun 03 2011
Andrei:then I'll create one or two more optimized versions (one using a memory pool for the nodes, and one trying to apply some of the C++ improvement ideas > from the original paper).Far as I can tell D comes in the second place after C++ at run time. With optimizations and all it could get significantly closer.First version, with just classes, a bit better cleaned up: http://codepad.org/DggCx26d Second version, with all structs: http://codepad.org/etsLsZV5 Tomorrow I'll de-optimize it a bit replacing some structs with classes. And >The number of instances allocated: Class instances: SimpleLoop_counter 3_936_102 LoopStructureGraph_counter 15_051 UnionFindNode_counter 13_017_663 HavlakLoopFinder_counter 15_051 BasicBlockEdge_counter 378_036 BasicBlock_counter 252_013 MaoCFG_counter 1 UnionFindNode probably will give some gain if allocated from a pool. Later, bearophileYour port segfaults DMD 2.053 with the -g flag (at least on linux). Andrei: You may want to point out on reddit that the code is approx. a 1 to 1 port of the C++ code and not specially tuned. Timon
Jun 04 2011
Andrei:then I'll create one or two more optimized versions (one using a memory pool for the nodes, and one trying to apply some of the C++ improvement ideas > from the original paper).Far as I can tell D comes in the second place after C++ at run time. With optimizations and all it could get significantly closer.First version, with just classes, a bit better cleaned up: http://codepad.org/DggCx26d Second version, with all structs: http://codepad.org/etsLsZV5 Tomorrow I'll de-optimize it a bit replacing some structs with classes. And >The number of instances allocated: Class instances: SimpleLoop_counter 3_936_102 LoopStructureGraph_counter 15_051 UnionFindNode_counter 13_017_663 HavlakLoopFinder_counter 15_051 BasicBlockEdge_counter 378_036 BasicBlock_counter 252_013 MaoCFG_counter 1 UnionFindNode probably will give some gain if allocated from a pool. Later, bearophileOne simple but very powerful optimization is to minimize the runs of the GC. I have added a call to GC.disable(); in the beginning of main and then added a GC.collect(); after each 10 test runs. Results on my machine (32bit executables): C++ (-O2): 30.7s, ~170MB. D (-release -O -inline): 29.5s, ~520MB Ds GC needs to get faster. A concurrent GC would have hidden away most of the overhead on a multi-core processor ;). Timon
Jun 04 2011
Am 04.06.2011 18:25, schrieb Timon Gehr:What was your time for D without disabling the GC? Probably 40-50s? This certainly is a big improvement, I didn't think the GC slows it down that much. What'd be really interesting is the benchmark with a D-style implementation of the code (if I understood correctly the current versions are more or less direct translations of the C++ code to D). Cheers, - DanielAndrei:then I'll create one or two more optimized versions (one using a memory pool for the nodes, and one trying to apply some of the C++ improvement ideas > from the original paper).Far as I can tell D comes in the second place after C++ at run time. With optimizations and all it could get significantly closer.First version, with just classes, a bit better cleaned up: http://codepad.org/DggCx26d Second version, with all structs: http://codepad.org/etsLsZV5 Tomorrow I'll de-optimize it a bit replacing some structs with classes. And >The number of instances allocated: Class instances: SimpleLoop_counter 3_936_102 LoopStructureGraph_counter 15_051 UnionFindNode_counter 13_017_663 HavlakLoopFinder_counter 15_051 BasicBlockEdge_counter 378_036 BasicBlock_counter 252_013 MaoCFG_counter 1 UnionFindNode probably will give some gain if allocated from a pool. Later, bearophileOne simple but very powerful optimization is to minimize the runs of the GC. I have added a call to GC.disable(); in the beginning of main and then added a GC.collect(); after each 10 test runs. Results on my machine (32bit executables): C++ (-O2): 30.7s, ~170MB. D (-release -O -inline): 29.5s, ~520MB Ds GC needs to get faster. A concurrent GC would have hidden away most of the overhead on a multi-core processor ;). Timon
Jun 04 2011
Daniel Gibson wrote:What was your time for D without disabling the GC? Probably 40-50s? This certainly is a big improvement, I didn't think the GC slows it down that much. What'd be really interesting is the benchmark with a D-style implementation of the code (if I understood correctly the current versions are more or less direct translations of the C++ code to D). Cheers, - DanielWithout disabling the GC, it runs through in 38.2s. The D-style implementation would probably be about the same, bearophile has already replaced std::map/std::set with associative arrays and std::list with dynamic arrays. The algorithm cannot take advantage of Eg. array slicing. It solves a graph problem. I will try to tune the code some more. Timon
Jun 04 2011
On 06/04/2011 11:08 AM, Timon Gehr wrote:Better yet, feel free to contribute. The more of us participate, the better. AndreiAndrei:then I'll create one or two more optimized versions (one using a memory pool for the nodes, and one trying to apply some of the C++ improvement ideas> from the original paper).Far as I can tell D comes in the second place after C++ at run time. With optimizations and all it could get significantly closer.First version, with just classes, a bit better cleaned up: http://codepad.org/DggCx26d Second version, with all structs: http://codepad.org/etsLsZV5 Tomorrow I'll de-optimize it a bit replacing some structs with classes. And>The number of instances allocated: Class instances: SimpleLoop_counter 3_936_102 LoopStructureGraph_counter 15_051 UnionFindNode_counter 13_017_663 HavlakLoopFinder_counter 15_051 BasicBlockEdge_counter 378_036 BasicBlock_counter 252_013 MaoCFG_counter 1 UnionFindNode probably will give some gain if allocated from a pool. Later, bearophileYour port segfaults DMD 2.053 with the -g flag (at least on linux). Andrei: You may want to point out on reddit that the code is approx. a 1 to 1 port of the C++ code and not specially tuned. Timon
Jun 04 2011
On 6/3/2011 4:09 PM, Adam D. Ruppe wrote:This is similar to other benchmarks I've run in the past. Standard D builds beat standard C++ builds, but gcc's optimizer takes the lead vs dmd's optimizer.It would be nice to figure out what is different. Try using the coverage analyzer and profiler for starters!
Jun 04 2011
Walter:It would be nice to figure out what is different. Try using the coverage analyzer and profiler for starters!There are little differences and inefficiencies here and there, but in the second D version I think most of the performance difference over the C++ code is caused by the GC. I will do some tests. Bye, bearophile
Jun 04 2011
On 6/4/2011 9:15 AM, bearophile wrote:Walter:That's probably right, for two reasons: 1. Other than the GC, D doesn't have any "hidden cost" features that would explain it being slower than C++ for similarly written code. 2. A few posts back, it was noted that DMD2.053, which includes my GC optimizations, was substantially faster than 2.052, which doesn't.It would be nice to figure out what is different. Try using the coverage analyzer and profiler for starters!There are little differences and inefficiencies here and there, but in the second D version I think most of the performance difference over the C++ code is caused by the GC. I will do some tests. Bye, bearophile
Jun 04 2011
Am 04.06.2011 16:54, schrieb dsimcha:On 6/4/2011 9:15 AM, bearophile wrote:Besides better optimization by the compiler, see Adam Ruppe's post, 3 posts up:Walter:That's probably right, for two reasons: 1. Other than the GC, D doesn't have any "hidden cost" features that would explain it being slower than C++ for similarly written code.It would be nice to figure out what is different. Try using the coverage analyzer and profiler for starters!There are little differences and inefficiencies here and there, but in the second D version I think most of the performance difference over the C++ code is caused by the GC. I will do some tests. Bye, bearophileOn my computer, the D version ran slightly faster (56 seconds vs 63s >for C++) without optimizations turned on.With optimizations turned on, C++ took a nice lead (28 seconds vs 53 seconds for D).So it seems like it's not all the GCs fault.2. A few posts back, it was noted that DMD2.053, which includes my GC optimizations, was substantially faster than 2.052, which doesn't.
Jun 04 2011
On 6/4/2011 6:15 AM, bearophile wrote:There are little differences and inefficiencies here and there, but in the second D version I think most of the performance difference over the C++ code is caused by the GC. I will do some tests.Easy to test, simply disable the gc.
Jun 04 2011
On 6/3/11 5:48 PM, bearophile wrote:Andrei:I think a better tail call optimization would be in order.Does anyone have the time and inclination to port the benchmark so we can take a look?My modified C++0x code: http://codepad.org/iBwINTkJ First D2 translation that seems to work: http://codepad.org/sEEFNlAd Notes on the translation: - The program crashes with zero error messages if you don't increase the stack. Time ago I have written an enhancement request on this: http://d.puremagic.com/issues/show_bug.cgi?id=6088- The final print of the "." done using write() doesn't perform the fflush. So after few tries I have used printf. I have not found a good way to flush. Suggestions welcome.Just call stdout.flush().- The C++ code uses classes, but they are often more like D structs. In the first D version to keep things simple I have used final classes everywhere. I will profile the code and I will create a more efficient version that uses structs where needed.Makes sense.- I have had to track down a null reference error (caused by the nodes array, that contains just references in the first D version). Nonnullable type modifiers avoids similar bugs at compile-time.Stay tuned. Good news coming soon. [snip] This is solid work. Looking forward to seeing measurements! Thanks, Andrei
Jun 03 2011
UnionFindNode struct instances probably doesn't need a memory pool, it's already allocated in (small) arrays. I have not found a simple enough way to allocate SimpleLoop struct instances with a memory pool. I have not found simple ways to use most of the C++ optimization hints of the original paper. The paper is badly written and its contents are worse. I think they have written this as an internal benchmark and only later have created a paper. I think they have used an old CPU because their PC farms often use similar old CPUs. I have created a D version a bit de-optimized, that uses classes instead of structs for the classes that are instantiated only few times. Generally the original C++ code is not good: it's low level where it doesn't need to be low level, and it wastes memory and time where it matters (like in tiny data structures with lot of overhead.) I do not generally write code like that. I don't have desire to work further on this code. I'd like a Set with a very handy API (copied from Python) in Phobos. I'd like the built-in associative arrays to have an optimization for very small number of items. Python dicts have an optimization for 8 items or less. I have done no attempts to tune the GC. I confirm that if you disable the GC and perform a collection only once in a while, the code gets faster. Regarding the improvements of the D GC, I remember I have found an optimization for LDC1 lot of time ago, but I don't remember it well. I think the GC scans the static memory too, etc, this slows down the GC. Google seems to care a lot about compilation times too. The D code compiles in about 1 second or less with full optimization. The C++0x code compiled with G++ doesn't come even close to that. D code compiled with DMD 2.053 with: dmd -O -release -inline -w -L/STACK:5000000 Using -g doesn't crash my code on Windows, so I suggest someone to minimize the code to find this bug. I also suggest someone to minimize the code for the 64 version to find the 64 bit DMD bug. C++0x code compiled with G++ 4.6 with: g++ -std=c++0x -Ofast -Wall -Wextra -s With LDC and GDC I suggest to look much better for the correct switches. Othrerwise this benchmark becomes even more meaningless (it's already not so meaningful. Google has not released the full C++ code). I have created a D version that uses a TinySet, following one of the C++ optimization hints, one of the D versions use it, but I don't currently have a good PC to actually take the timings: import std.typetuple: TypeTuple; extern(C) pure nothrow void exit (in int status); template Range(int stop) { static if (stop <= 0) alias TypeTuple!() Range; else alias TypeTuple!(Range!(stop-1), stop-1) Range; } struct TinySet(T, int maxLength) { T[maxLength] data; size_t length = 0; // readonly property void add(T x) { /*static*/ foreach (i; Range!maxLength) if (x == data[i]) return; if (length < maxLength) { data[this.length] = x; length++; } else exit(1); } T[] opSlice() { return data[0 .. this.length]; } } C++0x code a bit modified, uses unordered_map instead of silly map: http://codepad.org/9NEL9yPo D code with structs and classes (no manual deallocations), less ugly, probably this is the version to show to people: http://codepad.org/Xqg0Y26S Experimental D code with TinySet, with structs and classes (no manual deallocations): http://codepad.org/4ATOvFsV D code with only structs (no manual deallocations): http://codepad.org/SePFDAaT D code with only classes: http://codepad.org/k0DUzP3D Bye, bearophile
Jun 04 2011
void add(T x) { /*static*/ foreach (i; Range!maxLength) if (x == data[i]) return;Sorry, this code is wrong, it has to iterate from 0 to length. If some code doesn't have unittests then it's broken. Bye, bearophile
Jun 04 2011