digitalmars.D - D performance compared with other languages
- christopher diggins (15/15) May 11 2004 I have written an object oriented Heapsort in D at
- Andy Friesen (3/14) May 11 2004 The time decreases significantly if you use both -O and -release.
- Walter (4/19) May 11 2004 Make sure to compile with:
- christopher diggins (12/12) May 11 2004 Thanks for the feedback, Walter. I had only used -O but not -inline
- Helmut Leitner (5/13) May 11 2004 Each interface has a 4 byte/object cost.
- Ben Hinkle (8/20) May 11 2004 Every D object has a synchronized monitor that is right after the vtable...
- Walter (11/20) May 11 2004 Thanks! Could you also please hyperlink the "Digital Mars Compiler" to
- Daniel Horn (12/50) May 11 2004 That's a bit tedious, especially if your class may or may not need to be
- Walter (10/25) May 11 2004 bits
- -scooter- (11/13) May 12 2004 One would think so, but an excellent paper from OOPSLA '02 has hard evid...
- Kevin Bealer (9/22) May 12 2004 In a GC program, you can avoid GC overhead by keeping free lists of anyt...
- Scott Michel (5/18) May 13 2004 Any "modern" general-purpose allocator already does this since the late ...
- Kevin Bealer (10/28) May 13 2004 I've usually heard that phrase when talking about cloning or life/death
- -scooter- (14/31) May 17 2004 My point was that you become responsible for object lifetimes; you're
- Kevin Bealer (8/39) May 18 2004 I was thinking "locality". Okay, this makes sense. Yeah, you can use a...
- -scooter- (15/33) May 19 2004 You're thinking of customized slab allocators or separate heaps for a
- Kevin Bealer (23/56) May 20 2004 Which makes perfect sense. I was writing one because I had a neat idea ...
- Walter (8/20) May 12 2004 Storage
- christopher diggins (17/37) May 11 2004 --
- Walter (79/99) May 11 2004 msec.
- Jan-Eric Duden (4/7) May 12 2004 I am wondering if the difference of execution time is the result of usin...
- Walter (9/17) May 12 2004 I'd have to do testing to make sure, but I'm pretty sure it's due to
- Jan-Eric Duden (13/20) May 12 2004
- christopher diggins (14/34) May 12 2004 sort
- Stewart Gordon (21/28) May 12 2004 Can you give an example of an O(n) case of quicksort?
- Jan-Eric Duden (5/12) May 12 2004 Yep, that's why sometimes heapsort is preferred over quicksort.
- Stewart Gordon (12/20) May 12 2004 By 'swaps' do you mean the swapping of an element with the hole into
- Kevin Bealer (26/41) May 12 2004 O(N), O(N^2), O(N*lnN) and so on, describe how the number of operations ...
- Stewart Gordon (30/43) May 13 2004 You seldom get direct proportionality in practice. In fact, it means
- J Anderson (10/12) May 13 2004
- Walter (4/6) May 13 2004 It can accept functions as parameters - both as function pointers, and a...
- J Anderson (4/14) May 14 2004 Cool, now for operators....
- Walter (3/18) May 14 2004 You can use the equivalent name for the operator functions.
- J Anderson (5/12) May 13 2004 If your talking about the website I provided, did you read that shell
- Stewart Gordon (9/12) May 13 2004 My talking?
- J Anderson (4/12) May 13 2004 Whoops, your right. Its the swap sort that is broken.
- Sean Kelly (6/8) May 12 2004 But some versions of heapsort need a considerable memory overhead to
- J Anderson (8/16) May 12 2004 Right! Even sort algorithms like short-circuit bubble sort can be
- Stewart Gordon (9/14) May 12 2004 Even better for nearly sorted data is bidirectional short-circuit bubble...
- C (6/18) May 12 2004 Hmm, I couldn't find anything on this do you have some more info ?
- J Anderson (11/33) May 12 2004 In a bubble sort small elements are *bubbled* up to the top of the
- Stewart Gordon (19/24) May 13 2004
- resistor AT nospam DOT mac DOT com (26/33) May 12 2004 Here's something interesting for everyone. I took the heapsort.d exampl...
- J Anderson (5/41) May 12 2004 Interesting.
- resistor AT nospam DOT mac DOT com (49/56) May 12 2004 begin 0644 test.d
- Knud Sørensen (3/8) May 12 2004 Why are your not using introspective sort ??
- Walter (3/11) May 12 2004 Actually, it does just that (switch to heapsort as necessary).
- Jörg Rüppel (7/20) May 12 2004 I think I've read in the docs that the D compiler autodetects if a
- Walter (6/24) May 12 2004 usage
- Jörg Rüppel (3/13) May 13 2004 Thanks, that's all I wanted to hear ;)
- Kevin Bealer (9/17) May 11 2004 If I have a container of objects that implement interface Printable(), h...
- christopher diggins (23/40) May 11 2004 bytes
- Kevin Bealer (12/17) May 11 2004 Heron's approach of non-inheritance binding is interesting; I take it t...
- christopher diggins (15/31) May 12 2004 I
- Walter (21/24) May 12 2004 fact
- christopher diggins (19/43) May 13 2004 that
- Andy Friesen (4/15) May 11 2004 This is completely tangental, but Microsoft's C++ compiler completes the...
- Matthew (4/19) May 11 2004 Why have you not used DMC++ for the Heron and C++? That way the comparis...
- christopher diggins (13/15) May 11 2004 would
- Walter (9/20) May 11 2004 comparison
- christopher diggins (10/32) May 13 2004 compilers
- C (4/18) May 11 2004 Impressive speeds on Heron! Id like to see so more benchmarks :).
I have written an object oriented Heapsort in D at http://www.heron-language.com/benchmarks/heapsort-d.html to compare the performance of object oriented code in several languages. The results were somewhat disappointing, when compared with C++, Delphi, Java, and Heron. I was hoping that there might be some suggestions from members of this group about how I might improve the code I wrote while keeping the overall algorithm and design the same. Please note that the program is a port of the program at http://www.heron-language.com/benchmarks/heapsort-heron.html and is discussed further at http://www.heron-language.com/benchmarks/index.html TIA -- Christopher Diggins http://www.cdiggins.com http://www.heron-language.com
May 11 2004
christopher diggins wrote:I have written an object oriented Heapsort in D at http://www.heron-language.com/benchmarks/heapsort-d.html to compare the performance of object oriented code in several languages. The results were somewhat disappointing, when compared with C++, Delphi, Java, and Heron. I was hoping that there might be some suggestions from members of this group about how I might improve the code I wrote while keeping the overall algorithm and design the same. Please note that the program is a port of the program at http://www.heron-language.com/benchmarks/heapsort-heron.html and is discussed further at http://www.heron-language.com/benchmarks/index.htmlThe time decreases significantly if you use both -O and -release. -- andy
May 11 2004
Make sure to compile with: -release -O -inline "christopher diggins" <cdiggins users.sourceforge.net> wrote in message news:c7qvo6$1256$1 digitaldaemon.com...I have written an object oriented Heapsort in D at http://www.heron-language.com/benchmarks/heapsort-d.html to compare the performance of object oriented code in several languages. The results were somewhat disappointing, when compared with C++, Delphi, Java, and Heron. I was hoping that there might be some suggestions from members of this group about how I might improve the code I wrote while keeping the overall algorithm and design the same. Please note that the program is a port of the program at http://www.heron-language.com/benchmarks/heapsort-heron.html and is discussed further at http://www.heron-language.com/benchmarks/index.html TIA -- Christopher Diggins http://www.cdiggins.com http://www.heron-language.com
May 11 2004
Thanks for the feedback, Walter. I had only used -O but not -inline and -release which sped up the execution to 703 msec and the memory usage dropped to 2MB on my system. I have now updated the web page as a result. There is still 4 bytes per object that I can't account for. I count 8 bytes for the vector data, 4 bytes for the pointer to the object, 4 bytes for the vtable in the object, which only brings me to 16 bytes per object; what are the other 4 bytes for? Is there a way to allocate 100,000 objects at once rather than one by one? -- Christopher Diggins http://www.cdiggins.com http://www.heron-language.com
May 11 2004
christopher diggins wrote:Thanks for the feedback, Walter. I had only used -O but not -inline and -release which sped up the execution to 703 msec and the memory usage dropped to 2MB on my system. I have now updated the web page as a result. There is still 4 bytes per object that I can't account for. I count 8 bytes for the vector data, 4 bytes for the pointer to the object, 4 bytes for the vtable in the object, which only brings me to 16 bytes per object, what is the other 4 bytes for?Each interface has a 4 byte/object cost. -- Helmut Leitner leitner hls.via.at Graz, Austria www.hls-software.com
May 11 2004
Every D object has a synchronized monitor that is right after the vtable. That is a pointer to a platform-specific mutex of some sort. Structs don't have any vtable or monitor, so they would be the most efficient memory-wise but wouldn't support any dynamic binding or synchronization. "christopher diggins" <cdiggins users.sourceforge.net> wrote in message news:c7r63p$1bv1$1 digitaldaemon.com...Thanks for the feedback, Walter. I had only used -O but not -inline and -release which sped up the execution to 703 msec and the memory usage dropped to 2MB on my system. I have now updated the web page as a result. There is still 4 bytes per object that I can't account for. I count 8 bytes for the vector data, 4 bytes for the pointer to the object, 4 bytes for the vtable in the object, which only brings me to 16 bytes per object; what are the other 4 bytes for? Is there a way to allocate 100,000 objects at once rather than one by one? -- Christopher Diggins http://www.cdiggins.com http://www.heron-language.com
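Ben's accounting is easy to check. A tiny sketch with invented names; on a 32-bit DMD of the era one would expect 12 bytes for the class instance (vptr + monitor + int) and 4 for the struct:

class C { int x; }   // instance carries vptr + monitor + fields
struct S { int x; }  // plain value: fields only, no vtable, no monitor

void main()
{
    // classinfo.init.length is the per-instance size of a class
    printf("class instance size = %d\n", C.classinfo.init.length);
    printf("struct size = %d\n", S.sizeof);
}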
May 11 2004
"christopher diggins" <cdiggins users.sourceforge.net> wrote in message news:c7r63p$1bv1$1 digitaldaemon.com...Thanks for the feedback, Walter. I had only used -O but not -inline and -release which sped up the execution to 703 msec and the memory usage dropped to 2MB on my system. I have now updated the web page as a result.Thanks! Could you also please hyperlink the "Digital Mars Compiler" to www.digitalmars.com/d/ ? You'll also get faster results if you use 'final' on all member functions that are not overrided (essentially all the functions you don't mark as 'virtual' in the C++ version).There is still 4 bytes per object that I can't account for. I count 8bytesfor the vector data, 4 bytes for the pointer to the object, 4 bytes forthevtable in the object, which only brings me to 16 bytes per object, what is the other 4 bytes for?It's for the monitor.Is there a way to allocate 100,000 objects at once rather then one by one?Yes. Implement a new() function for the class, and have it parcel out bits from a preallocated array.
May 11 2004
Walter wrote:"christopher diggins" <cdiggins users.sourceforge.net> wrote in message news:c7r63p$1bv1$1 digitaldaemon.com...That's a bit tedius, especially if your class may or may not need to be in an array (or you may wish to have the same class in 3 different packed arrays) clearly you could make some sort of template class that inherits and make that have a "new" operator that does the parceling... but it's a hack that only lets me have a single stack per template class (what if I want to have many stacks of said objects) I could have a global "current stack" pointer and set it to my next available stack, but this is beginning to sound more and more contrived. why not just do a structure then-- Do structs support static inheritance (I guess they will with mixins?)Thanks for the feedback, Walter. I had only used -O but not -inline and -release which sped up the execution to 703 msec and the memory usage dropped to 2MB on my system. I have now updated the web page as a result.Thanks! Could you also please hyperlink the "Digital Mars Compiler" to www.digitalmars.com/d/ ? You'll also get faster results if you use 'final' on all member functions that are not overrided (essentially all the functions you don't mark as 'virtual' in the C++ version).There is still 4 bytes per object that I can't account for. I count 8bytesfor the vector data, 4 bytes for the pointer to the object, 4 bytes forthevtable in the object, which only brings me to 16 bytes per object, what is the other 4 bytes for?It's for the monitor.Is there a way to allocate 100,000 objects at once rather then one by one?Yes. Implement a new() function for the class, and have it parcel out bits from a preallocated array.
May 11 2004
"Daniel Horn" <hellcatv hotmail.com> wrote in message news:c7ra0p$1hr4$1 digitaldaemon.com...bitsIs there a way to allocate 100,000 objects at once rather then one by one?Yes. Implement a new() function for the class, and have it parcel outI do the same sorts of optimizations in C++ when I want performance. Storage allocation is always a ripe target for performance tuning.from a preallocated array.That's a bit tedius, especially if your class may or may not need to be in an array (or you may wish to have the same class in 3 different packed arrays)clearly you could make some sort of template class that inherits and make that have a "new" operator that does the parceling... but it's a hack that only lets me have a single stack per template class (what if I want to have many stacks of said objects) I could have a global "current stack" pointer and set it to my next available stack, but this is beginning to sound more and more contrived.That's why the general purpose storage allocator is the default solution. But when speed is your master, writing custom allocators can yield big benefits, and that's why D supports doing it.why not just do a structure then--Structs can't inherit from an interface.Do structs support static inheritance (I guess they will with mixins?)Mixins have the potential to do this. Stay tuned!
May 11 2004
Walter wrote:I do the same sorts of optimizations in C++ when I want performance. Storage allocation is always a ripe target for performance tuning.One would think so, but an excellent paper from OOPSLA '02 has hard evidence to the contrary: Reconsidering custom memory allocation (Berger, et al.). Paper is at http://www.cs.umass.edu/~emery/pubs/berger-oopsla2002.pdf The only time it seems that a custom memory allocator wins is in the "bulk free" case, when an entire heap of objects (like a compiler's AST for a class <hint>) gets deallocated. Otherwise, custom storage management is almost a complete waste of time. I haven't taken Berger's HeapLayers or reaps allocation libraries out for a spin yet, but it looks interesting. -scooter
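The "bulk free" case the paper singles out is essentially region allocation (Berger's reaps). A minimal sketch with invented names; individual objects are never freed, the whole region is:

class Region
{
    private byte[] block;
    private uint used;

    this(uint capacity) { block = new byte[capacity]; }

    void* alloc(uint size)           // bump-pointer cheap, no per-object header
    {
        void* p = &block[used];
        used += size;                // sketch: assumes capacity is never exceeded
        return p;
    }

    void releaseAll() { used = 0; }  // the bulk free: one O(1) operation
}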
May 12 2004
In article <c7tlva$22e8$1 digitaldaemon.com>, -scooter- says...Walter wrote:In a GC program, you can avoid GC overhead by keeping free lists of anything you allocate. The garbage collector won't run because you aren't calling new(). This was a tip for "real time" programming. Note that in avoiding the GC you also run the risk of old pointers to supposedly "new" objects, if an object wasn't actually ready to be deleted. On the plus side, if you are not sure it's garbage, don't put it on the list: the GC will eventually hunt it down if you are, in fact, leaking. KevinI do the same sorts of optimizations in C++ when I want performance. Storage allocation is always a ripe target for performance tuning.One would think so, but an excellent paper from OOPSLA '02 has hard evidence to the contrary: Reconsidering custom memory allocation (Berger, et al.). Paper is at http://www.cs.umass.edu/~emery/pubs/berger-oopsla2002.pdf The only time it seems that a custom memory allocator wins is in the "bulk free" case, when an entire heap of objects (like a compiler's AST for a class <hint>) gets deallocated. Otherwise, custom storage management is almost a complete waste of time. I haven't taken Berger's HeapLayers or reaps allocation libraries out for a spin yet, but it looks interesting. -scooter
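A sketch of the free-list trick Kevin describes, with invented names (Particle, acquire, release); recycled objects never look like garbage, so the collector has nothing to do:

class Particle
{
    private static Particle freeList;  // head of the recycle list
    private Particle next;             // intrusive link, used only while free

    static Particle acquire()
    {
        Particle p = freeList;
        if (p)                         // reuse a retired object when we have one
            freeList = p.next;
        else
            p = new Particle;          // otherwise fall back to the GC heap
        return p;
    }

    static void release(Particle p)    // only for objects known to be dead;
    {                                  // if unsure, drop the reference and let
        p.next = freeList;             // the GC hunt it down
        freeList = p;
    }
}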
May 12 2004
Kevin Bealer wrote:In a GC program, you can avoid GC overhead by keeping free lists of anything you allocate. The garbage collector won't run because you aren't calling new(). This was a tip for "real time" programming. Note that in avoiding the GC you also run the risk of old pointers to supposedly "new" objects, if an object wasn't actually ready to be deleted. On the plus side, if you are not sure it's garbage, don't put it on the list: the GC will eventually hunt it down if you are, in fact, leaking.Any "modern" general-purpose allocator already does this since the late 80s. Preventing free space fragmentation turns out to be the bigger problem. Besides, if you keep a free list, then you're into having to play "God" with object ctor and dtor. :-)
May 13 2004
In article <c8075s$1ff$1 digitaldaemon.com>, Scott Michel says...Kevin Bealer wrote:I've usually heard that phrase when talking about cloning or life/death issues... I'm not sure what rights are reserved to God when talking about writing software... If you mean that the free list can run very large and hog memory, you can of course provide a size bound. If it gets over N elements, clear the "top" pointer, or selectively drop elements down to N. Regarding fragmentation: does the D collector actually copy collect, or does it need to be conservative? I would suspect that using C pointers and copy collection are two great tastes that don't taste great together. KevinIn a GC program, you can avoid GC overhead by keeping free lists of anything you allocate. The garbage collector won't run because you aren't calling new(). This was a tip for "real time" programming. Note that in avoiding the GC you also run the risk of old pointers to supposedly "new" objects, if an object wasn't actually ready to be deleted. On the plus side, if you are not sure it's garbage, don't put it on the list: the GC will eventually hunt it down if you are, in fact, leaking.Any "modern" general-purpose allocator already does this since the late 80s. Preventing free space fragmentation turns out to be the bigger problem. Besides, if you keep a free list, then you're into having to play "God" with object ctor and dtor. :-)
May 13 2004
Kevin Bealer wrote:In article <c8075s$1ff$1 digitaldaemon.com>, Scott Michel says...My point was that you become responsible for object lifetimes; you're responsible for their proper birth and death (and possibly inadvertent resurrection... ok, bad pun sequence.) So keeping your own free list means that you might have faster allocation, but you now become responsible for a lot more than just allocation. But as I pointed out, modern mallocs already do this.Any "modern" general-purpose allocator already does this since the late 80s. Preventing free space fragmentation turns out to be the bigger problem. Besides, if you keep a free list, then you're into having to play "God" with object ctor and dtor. :-)I've usually heard that phrase when talking about cloning or life/death issues... I'm not sure what rights are reserved to God when talking about writing software... If you mean that the free list can run very large and hog memory, you can of course provide a size bound. If it gets over N elements, clear the "top" pointer, or selectively drop elements down to N.Regarding fragmentation: does the D collector actually copy collect, or does it need to be conservative? I would suspect that using C pointers and copy collection are two great tastes that don't taste great together.Wrong type of fragmentation. :-) This happens when the allocator doesn't have specifically sized arenas for certain sized allocations (usually powers of 2, n > 4). If the allocator just tries to find the best fit, breaking up previously allocated blocks without coalescing adjacent freespace, you end up with a very fragmented memory map. -scooter
May 17 2004
In article <c8bg4j$1a4b$1 digitaldaemon.com>, -scooter- says...Kevin Bealer wrote:Okay, that makes sense.In article <c8075s$1ff$1 digitaldaemon.com>, Scott Michel says...My point was that you become responsible for object lifetimes; you're responsible for their proper birth and death (and possibly inadvertant ressurection... ok, bad pun sequence.) So keeping your own free list means that you might have faster allocation, but you now become responsible for a lot more than just allocation. But as I pointed out, modern mallocs already do this.Any "modern" general-purpose allocator already does this since the late 80s. Preventing free space fragmentation turns out to be the bigger problem. Besides, if you keep a free list, then you're into having to play "God" with object ctor and dtor. :-)I've usually heard that phrase when talking about cloning or life/death issues... I'm not sure what rights are reserved to God when talking about writing software... If you mean that the free list can run very large and hog memory, you can of course provide a size bound. If it gets over N elements, clear the "top" pointer, or selectively drop elements down to N.I was thinking "locality". Okay, this makes sense. Yeah, you can use a large program allocator that uses a large number of free lists and chops up blocks (completely unsuitable for small, memory tight, programs), or you can use small-piece allocators and tolerate fragmentation. The ugliest way is to tweak the algorithm to use the same sized object (by making buffers a certain size). KevinRegarding fragmentation: does the D collector actually copy collect, or does it need to be conservative? I would suspect that using C pointers and copy collection are two great tastes that don't taste great together.Wrong type of fragmentation. :-) This happens when the allocator doesn't have specifically sized areanas for certain sized allocations (usually powers of 2, n > 4). If the allocator just tries to find the best fit, breaking up previously allocated blocks without coalescing adjacent freespace, you end up with a very fragmented memory map.-scooter
May 18 2004
Kevin Bealer wrote:You're thinking of customized slab allocators or separate heaps for a particular object type or objects of the same size. Once you've allocated from the custom heap, you return it and keep track of the free objects. Yup, modern mallocs (FreeBSD's libc version and Lea's, commonly found in Linux's glibc and usable under Win32) do this already; that's why they have arenas. The allocators you're thinking about that have really nasty allocation algorithms are the old Microsoft C compiler's malloc and the old Kingsley malloc in past BSDs. Basically, there's really not much that's exciting about memory allocation. Berger's paper is probably the most exciting research I've seen in 15 years and it solves an interesting problem when you really need custom allocation or want a region of related objects allocated, initialized, freed and destroyed as a region. The current GP allocators do a really good job and tweaking won't actually buy you much extra performance.I was thinking "locality". Okay, this makes sense. Yeah, you can use a large program allocator that uses a large number of free lists and chops up blocks (completely unsuitable for small, memory tight, programs), or you can use small-piece allocators and tolerate fragmentation. The ugliest way is to tweak the algorithm to use the same sized object (by making buffers a certain size).
May 19 2004
In article <c8hb2a$1o63$1 digitaldaemon.com>, -scooter- says...Kevin Bealer wrote:Which makes perfect sense. I was writing one because I had a neat idea vis a vis persistent objects and relative pointers. You can create data in a subheap, and mmap it in, providing an instant program "state load" effect. The relative pointers keep all objects connected. Except of course virtual pointers; also you need to rewrite every class to use the sub heap, everything on the subheap needs to be connected to a "top" pointer, etc. It works, it's good, but it's special purpose. A smart language space implementation could take advantage of such a design, but it gets messy in portable-application space. The other time you might want one is when you want a set of data (i.e. linked list) to have really good locality. But the solution to that is a vector or plex. (By plex, I mean a list-of-small-vectors, with a list-like interface. They "should" have good performance in many common cases; they were used in a GNU container library that went away when STL came out.) Vectors are probably good enough for this purpose. I guess the general philosophy of all that is that we should start treating heap storage the way we treat FS storage. If the system runs out of main memory, and all of your data is organized by memory pages, then OS page-out won't cause thrashing until every page of system memory is in-use. Locality is treated as foremost in disk container algorithm design. Swap thrashing occurs because it is not treated that way in heap container class design. To a degree, in most cases, always ski in control, etc. KevinYou're thinking of customized slab allocators or separate heaps for a particular object type or objects of the same size. Once you've allocated from the custom heap, you return it and keep track of the free objects. Yup, modern mallocs (FreeBSD's libc version and Lea's, commonly found in Linux's glibc and usable under Win32) do this already; that's why they have arenas. The allocators you're thinking about that have really nasty allocation algorithms are the old Microsoft C compiler's malloc and the old Kingsley malloc in past BSDs. Basically, there's really not much that's exciting about memory allocation. Berger's paper is probably the most exciting research I've seen in 15 years and it solves an interesting problem when you really need custom allocation or want a region of related objects allocated, initialized, freed and destroyed as a region. The current GP allocators do a really good job and tweaking won't actually buy you much extra performance.I was thinking "locality". Okay, this makes sense. Yeah, you can use a large program allocator that uses a large number of free lists and chops up blocks (completely unsuitable for small, memory tight, programs), or you can use small-piece allocators and tolerate fragmentation. The ugliest way is to tweak the algorithm to use the same sized object (by making buffers a certain size).
May 20 2004
"-scooter-" <scottm cs.ucla.edu> wrote in message news:c7tlva$22e8$1 digitaldaemon.com...Walter wrote:StorageI do the same sorts of optimizations in C++ when I want performance.evidenceallocation is always a ripe target for performance tuning.One would think so, but an excellent paper from OOPSLA '02 has hardto the contrary: Reconsidering custom memory allocation (Berger, et. al). Paper is at http://www.cs.umass.edu/~emery/pubs/berger-oopsla2002.pdf The only time it seems that a custom memory allocator wins is in the "bulk free" case, when an entire heap of objects (like a compiler's AST for a class <hint>) gets deallocated. Otherwise, custom storage management is almost a complete waste of time. I haven't taken Berger's HeapLayers or reaps allocation libraries out foraspin yet, but it looks interesting.The article is right in that sometimes performance optimizations yield unexpected results. Just goes to show one should rely on a good performance profiler when doing optimization.
May 12 2004
-- Christopher Diggins http://www.cdiggins.com http://www.heron-language.com "Walter" <newshound digitalmars.com> wrote in message news:c7r9ku$1h7b$1 digitaldaemon.com..."christopher diggins" <cdiggins users.sourceforge.net> wrote in message news:c7r63p$1bv1$1 digitaldaemon.com...Thanks for the feedback, Walter. I had only used -O but not -inline and -release which sped up the execution to 703 msec and the memory usage dropped to 2MB on my system. I have now updated the web page as a result.Thanks! Could you also please hyperlink the "Digital Mars Compiler" to www.digitalmars.com/d/ ? You'll also get faster results if you use 'final' on all member functions that are not overridden (essentially all the functions you don't mark as 'virtual' in the C++ version).Done and done. The final keywords sped it up even more from 710 to 610 msec.There is still 4 bytes per object that I can't account for. I count 8 bytes for the vector data, 4 bytes for the pointer to the object, 4 bytes for the vtable in the object, which only brings me to 16 bytes per object; what are the other 4 bytes for?It's for the monitor.What about the 4 bytes on top of that for the interfaces? What are they for? Any plans on changing this in the future, or making a compile switch?Is there a way to allocate 100,000 objects at once rather than one by one?Yes. Implement a new() function for the class, and have it parcel out bits from a preallocated array.I won't be including that in the example then, it seems like too much of a roundabout way to do things for the scope of the program I included. Thanks again Walter!
May 11 2004
"christopher diggins" <cdiggins users.sourceforge.net> wrote in message news:c7rfbm$1q4l$1 digitaldaemon.com...Done and done. The final keywords sped it up even more from 710 to 610msec.forThere is still 4 bytes per object that I can't account for. I count 8bytesfor the vector data, 4 bytes for the pointer to the object, 4 byteswhatthevtable in the object, which only brings me to 16 bytes per object,isfor? Each interface derived from has a vptr allocated for it.What about the 4 bytes on top of that for the interfaces what are theythe other 4 bytes for?It's for the monitor.Any plans on changing this in the future, or making a compile switch?No, it's needed to get reasonable performance out of interfaces.bitsIs there a way to allocate 100,000 objects at once rather then one by one?Yes. Implement a new() function for the class, and have it parcel outYou're welcome. You should also note that arrays have a built-in sort property, which uses quicksort. It'll blow away heapsort <g>. Doing that cuts the sort time on my machine from 453 ms to 63 ms. Source attached. begin 666 sort2.d M('9O:60 4V5T6%DH:6YT(' L(&EN="!Y*2![(&UX(#T >#L (&UY(#T >3L M?3L-"B 9FEN86P :6YT($=E=% H*2![(')E='5R;B!M>#L ?3L-"B 9FEN M86P :6YT($=E=%DH*2![(')E='5R;B!M>3L ?3L-"B 9FEN86P :6YT($=E M=$UA9VYI='5D92 I('L <F5T=7)N("AM>"IM>"D *R H;7DJ;7DI.R!].PT* M92A/8FIE8W0J(' I('L <F5T=7)N($=E=$UA9VYI='5D92 I("T *&-A<W0H M3V)J96-T*B!X*2![(')E='5R;B!'971-86=N:71U9&4H*2 M("AC87-T*%9E M('9O:60 4W=A<"AI;G0 :2P :6YT(&HI.PT*("!A8G-T<F%C="!I;G0 0V]M M<&%R92AI;G0 :2P :6YT(&HI.PT*("!A8G-T<F%C="!I;G0 0V]U;G0H*3L- M"GT-"F-L87-S($%R<F%Y*%0I(#H M:60 4V5T3&5N9W1H*&EN="!N*2![(&T /2!N97< 5%MN73L 9F]R*&EN="!I M/3 [(&D\;CL :2LK*2!M6VE=(#T M070H:6YT(&DL(%0 >"D >R!M6VE=(#T >#L ?3L-"B 9FEN86P =F]I9"!3 M=V%P*&EN="!I+"!I;G0 :BD >R!4('1M<" ]($=E=$%T*&DI.R!3971!="AI M;VUP87)E*&EN="!I+"!I;G0 :BD >R!R971U<FX 1V5T070H:2DN0V]M<&%R M92AC87-T*$]B:F5C="HI1V5T070H:BDI.R!].PT*("!F:6YA;"!I;G0 0V]U M;G0H*2![(')E='5R;B!M+FQE;F=T:#L M($AE87!I9GDH25-O<G1A8FQE(' L(&EN="!I+"!I;G0 ;DAE87!3:7IE*0T* M>PT*("!I;G0 ;DQE9G0 /2!I("H ,CL-"B :6YT(&Y2:6=H=" ](&Y,969T M("L ,3L-"B :6YT(&Y,87)G97-T(#T :3L-"B :68 *"AN3&5F=" \(&Y( M96%P4VEZ92D )B8 *' N0V]M<&%R92AN3&5F="P ;DQA<F=E<W0I(#X ,"DI M(#P M(#X ,"DI('L-"B ("!N3&%R9V5S=" ](&Y2:6=H=#L-"B ?0T*("!I9B H M(" 2&5A<&EF>2AX+"!N3&%R9V5S="P ;DAE87!3:7IE*3L-"B ?0T*?0T* M4V]R=&%B;&4 >"D-"GL-"B 0G5I;&1(96%P*' I.PT*("!F;W( *&EN="!I M(#T M+"!I+3$I.PT*(" (&EF("AN5&UP(#P ,"D-"B (" (')E='5R;B!F86QS M(&-O;G-T(&EN="!!4E)!65]325I%(#T M<C)$/BD /2 E9%QN(BP *'8N8VQA<W-I;F9O+FEN:70N;&5N9W1H("H 82YL M5F5C=&]R,D0 =CL 82D-"B >PT*(" ('8 /2!N97< 5F5C=&]R,D0H*3L- M,B ](&=E=%540W1I;64H*3L-"B ("!I;G0 ;G5M7W1I8VMS(#T M,3L-"B ("!F;&]A="!N=6U?;7-E8R ]("AC87-T*&9L;V%T*6YU;5]T:6-K M<R O(&-A<W0H9FQO870I5&EC:W-097)396-O;F0I("H 8V%S="AF;&]A="DQ M+R]P<FEN=&8H(F%R<F%Y(&ES('-O<G1E9" ]("5B7&XB+"!)<U-O<G1E9"AA ` endfrom a preallocated array.I won't be including that in the example then, it seems like too much of a roundabout way to do things for the scope of the program I included. Thanks again Walter!
May 11 2004
You're welcome. You should also note that arrays have a built-in sort property, which uses quicksort. It'll blow away heapsort <g>. Doing that cuts the sort time on my machine from 453 ms to 63 ms. Source attached.I am wondering if the difference in execution time is the result of using different algorithms (quicksort or heapsort). Or can the code for the sort property be better optimized than a "normal" implementation of a sort algorithm?
May 12 2004
"Jan-Eric Duden" <jeduden whisset.com> wrote in message news:c7sioo$cvh$1 digitaldaemon.com...I'd have to do testing to make sure, but I'm pretty sure it's due to quicksort being algorithmically superior. One of the reasons I built sort into arrays is because people implement inefficient sorts over and over, and I wanted to make it real easy to use the best algorithm. (A job I had once was to optimize a large program written by others; in poking around in it I found 3 different hacky implementations of bubblesort. Replaced them all with a call to C's qsort(), and I was an instant hero for the speedup <g>.)You're welcome. You should also note that arrays have a built-in sort property, which uses quicksort. It'll blow away heapsort <g>. Doing that cuts the sort time on my machine from 453 ms to 63 ms. Source attached.I am wondering if the difference of execution time is the result of using different algorithms (quicksort or heapsort). Or can the code for the sort property be better optimized than a "normal" implementation of a sort algorithm?
May 12 2004
I'd have to do testing to make sure, but I'm pretty sure it's due to quicksort being algorithmically superior. One of the reasons I built sort into arrays is because people implement inefficient sorts over and over, and I wanted to make it real easy to use the best algorithm. (A job I had once was to optimize a large program written by others; in poking around in it I found 3 different hacky implementations of bubblesort. Replaced them all with a call to C's qsort(), and I was an instant hero for the speedup <g>.) I thought qsort and heapsort work at about the same speed. I thought the most important difference between a heapsort and q-sort implementation in C is the asymptotic order of the algorithms: - heapsort has roughly the same performance for all inputs O(n*log n) - qsort is a lot slower O(n^2) on some inputs and a lot faster on almost sorted arrays O(n). The average is of course: O(n*log n) Maybe the difference is that the C q-sort implementation is highly optimized and the heapsort implementation that was used is not?
May 12 2004
"Jan-Eric Duden" <jeduden whisset.com> wrote in message news:c7spim$n4h$1 digitaldaemon.com...sortI'd have to do testing to make sure, but I'm pretty sure it's due to quicksort being algorithmically superior. One of the reasons I builtonceinto arrays is because people implement inefficient sorts over and over,andI wanted to make it real easy to use the best algorithm. (A job I haditwas to optimize a large program written by others; in poking around inIBoth quicksort and heapsort are O(n log n) on average but the constants involved in Quicksort are much better. Quicksort though depends a lot on whether the partitioning subroutine is balanced. The differences are discussed at length in Introduction to Algorithms by Cormen, Leiserson and Rivest ( http://theory.lcs.mit.edu/~clr/ ) -- Christopher Diggins http://www.cdiggins.com http://www.heron-language.comfound 3 different hacky implementations of bubblesort. Replaced them all with a call to C's qsort(), and I was an instant hero for the speedup<g>.) I thought qsort and heapsort work about at the same speed. I thought the most important difference between a heapsort and q-sort implementation in C is the order of the algorithms: - heapsort has roughly the same performance for all array instanciations O(n*log n) - qsort is a lot slower O(n^2) on some instanciations and a lot faster on almost sorted arrays O(n). the average is of course: O(n*log n) Maybe the difference is that the C q-sort implementation is highly optimizied and the heapsort implementation that was used is not?
May 12 2004
christopher diggins wrote: <snip>Can you give an example of an O(n) case of quicksort?- qsort is a lot slower O(n^2) on some instanciations and a lot faster on almost sorted arrays O(n). the average is of course: O(n*log n)<snip> Yes, but in the worst case, heapsort wins O(n log n) to O(n^2). When benchmarking a sorting algorithm, it's helpful to try a number of random cases, as well as the control cases of already sorted and already sorted backwards. If you're benchmarking algorithms, rather than compilers or languages, you should try to use the same compiler, the same language and the same compiler options. As such, it makes limited sense to compare a ready-made C library function of one algorithm with your own implementation of another. Of course, if you're comparing languages/compilers, you should compare implementations of the same algorithm, and set as similar levels of optimisation as possible (ITMS). Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.Maybe the difference is that the C q-sort implementation is highly optimizied and the heapsort implementation that was used is not?Both quicksort and heapsort are O(n log n) on average
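Stewart's suggestion (several random cases plus already-sorted and reverse-sorted controls) fits in a small harness. A sketch using the std.date timing seen elsewhere in this thread, with benchmark and sortFn as invented names:

import std.date;
import std.random;

void benchmark(void function(int[]) sortFn, int n)
{
    int[][3] cases;
    cases[0] = new int[n];             // random input
    foreach (inout int x; cases[0])
        x = cast(int) rand();
    cases[1] = cases[0].dup;
    cases[1].sort;                     // already-sorted control
    cases[2] = cases[1].dup.reverse;   // sorted-backwards control

    foreach (int[] data; cases)
    {
        int[] work = data.dup;         // keep the input intact between runs
        d_time t1 = getUTCtime();
        sortFn(work);
        d_time t2 = getUTCtime();
        printf("time elapsed (in msec) = %f\n",
               (cast(float)(t2 - t1) / cast(float) TicksPerSecond) * 1000);
    }
}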
May 12 2004
Can you give an example of an O(n) case of quicksort?A sorted array. You only get O(n) in a slightly modified version of q-sort that quits if no swaps occurred.Yep, that's why sometimes heapsort is preferred over quicksort. My question was more related to the constants. I wasn't aware that heapsort is that much slower.<snip> Yes, but in the worst case, heapsort wins O(n log n) to O(n^2).Maybe the difference is that the C q-sort implementation is highly optimized and the heapsort implementation that was used is not?Both quicksort and heapsort are O(n log n) on average
May 12 2004
Jan-Eric Duden wrote:By 'swaps' do you mean the swapping of an element with the hole into which the key element is to go? How can one element being already in the right place imply that the whole array is sorted? <snip>Can you give an example of an O(n) case of quicksort?A sorted array. You only get O(n) in a slightly modified version of q-sort that quits if no swaps occured.Yep, that's why sometimes heapsort is preferred over quicksort. My question was more related to the constants. I wasn't aware that heapsort is that much slower.What constants? Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
May 12 2004
In article <c7tbbi$1i8q$1 digitaldaemon.com>, Stewart Gordon says...Jan-Eric Duden wrote:O(N), O(N^2), O(N*lnN) and so on, describe how the number of operations required is affected by the size of the input set. This is called "big O notation". We say a sort takes O(N^2) when the time required is proportional to the square of the input set size (N). This is called "order of N squared". The real speed of the algorithm is something like K * N^2. If one algorithm takes twice as long as another for any value of N, it is said to have K=2 while the other has K=1. K is the constant. All of this analysis ignores effects like cache locality, because they are difficult to summarize in any simple and general way. Normally, O(...), or "order", is the most important criterion. If O(...) is the same for both designs, then the secondary effects apply. For large data sets, cache and locality issues are most important, whereas for small data sets, the "constant" is probably more important. "Small" here probably means "fits, or almost fits, in memory". Special data sets can trip up some algorithms, for example, carelessly written qsort will take forever on already-sorted data. Bubble sort is the classic "bad" sorting algorithm. It still has its defenders, for special cases. Like insertion sort, it probably survives because it is easy to code for small hand written linked lists in languages like C. Often large, old, programs in C will have dozens or hundreds of data structures of the type "small-structs-in-a-short-list", half of these will have a ten-line hand-coded insertion sort (shudder). "Don't get out of the boat". I always liked "shell sorts" and "natural merge sorts" but no one uses those any more. Probably because they're really slow... oh well to each his own I guess. KevinBy 'swaps' do you mean the swapping of an element with the hole into which the key element is to go? How can one element being already in the right place imply that the whole array is sorted? <snip>Can you give an example of an O(n) case of quicksort?A sorted array. You only get O(n) in a slightly modified version of q-sort that quits if no swaps occurred.Yep, that's why sometimes heapsort is preferred over quicksort. My question was more related to the constants. I wasn't aware that heapsort is that much slower.What constants?
May 12 2004
Kevin Bealer wrote: <snip>O(N), O(N^2), O(N*lnN) and so on, describe how the number of operations required is affected by the size of the input set. This is called "big O notation". We say a sort takes O(N^2) when the time required is proportional to the square of the input set size (N). This is called "order of N squared".You seldom get direct proportionality in practice. In fact, it means that there exists some K for which F(N) < K*N^2, for all N>N0 for some N0. And if you replace < with >, you get big Omega notation. And if you consider both together, you get big Theta notation. I guess that people commonly use big O when they really mean big Theta, probably because O is on most keyboards/character sets, and considering that anything Theta(N^2) is O(N^2) anyway.The real speed of the algorithm is something like K * N^2. If one algorithm takes twice as long as another for any value of N, it is said to have K=2 while the other has K=1. K is the constant.Yes. Though it would depend on such things as the relative costs of comparison and assignment/swapping. <snip>Bubble sort is the classic "bad" sorting algorithm. It still has its defenders, for special cases. Like insertion sort, it probably survives because it is easy to code for small hand written linked lists in languages like C.I guess for linked lists, insertion sort is as easy or perhaps easier to code than bubble sort. Just thinking about it, on a linked list, insertion sort has the advantage that an insertion is O(1). OTOH, on an array it has the advantage that a search for the right place is O(log N). I suppose the constant for the whole alg is smaller for the linked list.... I guess selection sort (or shaker sort, which I hadn't heard of before but is a perfectly intuitive improvement I had thought up) is equally easy and thus defendable. It even has its strengths, such as sorting tournament crosstables.... <snip>I always liked "shell sorts" and "natural merge sorts" but no one uses those any more. Probably because they're really slow... oh well to each his own I guess.Strange - JA's source shows shell sort being marginally quicker than heap sort. Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
May 13 2004
Stewart Gordon wrote:Yes. Though it would depend on such things as the relative costs of comparison and assignment/swapping.<snip> One of the great things about Ada generics is that you could easily replace the comparison operator and do tests on how many comparisons were actually performed. Of course you could box all the variables you use in the sort algorithm but that's just not the same. I wish D templates could accept operators/functions as parameters. At the very least it's a good way to profile that particular aspect of an algorithm. -- -Anderson: http://badmama.com.au/~anderson/
May 13 2004
"J Anderson" <REMOVEanderson badmama.com.au> wrote in message news:c7vnbp$2bvs$1 digitaldaemon.com...I wish D templates could accept operators/functions as parameters.It can accept functions as parameters - both as function pointers, and as aliases.
May 13 2004
Walter wrote:"J Anderson" <REMOVEanderson badmama.com.au> wrote in message news:c7vnbp$2bvs$1 digitaldaemon.com...Cool, now for operators.... -- -Anderson: http://badmama.com.au/~anderson/I wish D templates could accept operators/functions as parameters.It can accept functions as parameters - both as function pointers, and as aliases.
May 14 2004
"J Anderson" <REMOVEanderson badmama.com.au> wrote in message news:c81tqo$2h9g$1 digitaldaemon.com...Walter wrote:You can use the equivalent name for the operator functions."J Anderson" <REMOVEanderson badmama.com.au> wrote in message news:c7vnbp$2bvs$1 digitaldaemon.com...Cool, now for operators....I wish D templates could accept operators/functions as parameters.It can accept functions as parameters - both as function pointers, and as aliases.
May 14 2004
Stewart Gordon wrote:If your talking about the website I provided, did you read that shell sort was slightly broken? -- -Anderson: http://badmama.com.au/~anderson/I always liked "shell sorts" and "natural merge sorts" but noone uses those any more. Probably because they're really slow... oh well to each his own I guess.Strange - JA's source shows shell sort being marginally quicker than heap sort. Stewart.
May 13 2004
J Anderson wrote: <snip>If your talkingMy talking?about the website I provided, did you read that shell sort was slightly broken?No. I still don't. Where's that little bit of info buried then? Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
May 13 2004
Stewart Gordon wrote:J Anderson wrote: <snip>Whoops, your right. Its the swap sort that is broken. -- -Anderson: http://badmama.com.au/~anderson/If your talkingMy talking?about the website I provided, did you read that shell sort was slightly broken?No. I still don't. Where's that little bit of info buried then? Stewart.
May 13 2004
Stewart Gordon wrote:Yes, but in the worst case, heapsort wins O(n log n) to O(n^2).But some versions of heapsort need a considerable memory overhead to run. In general, quicksort is the better choice for average sorting situations. People with special needs generally also know how to code for those needs :) Sean
May 12 2004
Sean Kelly wrote:Stewart Gordon wrote:Yes, but in the worst case, heapsort wins O(n log n) to O(n^2).But some versions of heapsort need a considerable memory overhead to run. In general, quicksort is the better choice for average sorting situations. People with special needs generally also know how to code for those needs :) SeanRight! Even sort algorithms like short-circuit bubble sort can be useful if they are used in the right application. Short-circuit bubble sort is most useful when you know that most of the array is sorted, because then you get O(n). With standard quicksort you'd get O(n^2) for a sorted array. -- -Anderson: http://badmama.com.au/~anderson/
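The variant Anderson describes, sketched in D: the swapped flag is what buys the O(n) best case on already-sorted input:

void shortCircuitBubbleSort(int[] a)
{
    bool swapped = true;
    for (int n = a.length; swapped; n--)  // each pass settles the largest
    {                                     // remaining element at position n-1
        swapped = false;
        for (int i = 0; i + 1 < n; i++)
        {
            if (a[i] > a[i + 1])
            {
                int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
                swapped = true;
            }
        }
    }
}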
May 12 2004
J Anderson wrote: <snip>Right! Even sort algorithms like short-circuit bubble sort can be useful if they are used in right application. Short-circuit bubble sort is most useful when you know that most-of-the the array is sorted because then you get o(n). With standard quicksort you'd get O(n^2) for a sorted array.Even better for nearly sorted data is bidirectional short-circuit bubble sort. Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
May 12 2004
Even better for nearly sorted data is bidirectional short-circuit bubble sort.Hmm, I couldn't find anything on this. Do you have some more info? Charlie On Wed, 12 May 2004 17:09:30 +0100, Stewart Gordon <smjg_1998 yahoo.com> wrote:J Anderson wrote: <snip>Right! Even sort algorithms like short-circuit bubble sort can be useful if they are used in the right application. Short-circuit bubble sort is most useful when you know that most of the array is sorted, because then you get O(n). With standard quicksort you'd get O(n^2) for a sorted array.Even better for nearly sorted data is bidirectional short-circuit bubble sort. Stewart.-- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
May 12 2004
C wrote:In a bubble sort small elements are *bubbled* up to the top of the array. In a bidirectional sort elements sink down to the bottom as well. Of course short-circuit bubble sort still wins in some circumstances. Here's an interesting page on sorting (includes bidirectional): http://www.cs.ubc.ca/spider/harrison/Java/sorting-demo.html BTW: In uni I had to do a project with 18 completely different sort algorithms. So I really got to know them well. -- -Anderson: http://badmama.com.au/~anderson/Even better for nearly sorted data is bidirectional short-circuit bubble sort.Hmm, I couldn't find anything on this do you have some more info ? Charlie On Wed, 12 May 2004 17:09:30 +0100, Stewart Gordon <smjg_1998 yahoo.com> wrote:J Anderson wrote: <snip>Right! Even sort algorithms like short-circuit bubble sort can be useful if they are used in right application. Short-circuit bubble sort is most useful when you know that most-of-the the array is sorted because then you get o(n). With standard quicksort you'd get O(n^2) for a sorted array.Even better for nearly sorted data is bidirectional short-circuit bubble sort. Stewart.
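A sketch of the bidirectional (cocktail) variant being described: the same short-circuit flag, but alternating passes, so small elements sink toward the front as quickly as large ones bubble to the back:

void cocktailSort(int[] a)
{
    int lo = 0;
    int hi = a.length - 1;
    bool swapped = true;
    while (swapped)
    {
        swapped = false;
        for (int i = lo; i < hi; i++)     // forward pass: big values go right
            if (a[i] > a[i + 1])
            {
                int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
                swapped = true;
            }
        hi--;
        for (int i = hi; i > lo; i--)     // backward pass: small values go left
            if (a[i - 1] > a[i])
            {
                int t = a[i - 1]; a[i - 1] = a[i]; a[i] = t;
                swapped = true;
            }
        lo++;
    }
}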
May 12 2004
C wrote:<snip> Several years ago I wrote a simple database application. When adding a record, it would add it to the end of the file. When deleting one it would fill the gap with the record from the end. When updating it would just update in place. The sort implementation was a bidirectional bubble sort. Hence the records added since the last sort would bubble up to their correct positions, and the records moved to fill gaps would bubble down back towards the end. Updated records (where the change rendered the record out of order) could bubble in either direction. The algorithm was also quite suited to the program's rather simplistic dynamic record loading. At a time it would keep 50 consecutive records in memory, moving the 'window' as necessary. Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.Even better for nearly sorted data is bidirectional short-circuit bubble sort.Hmm, I couldn't find anything on this do you have some more info ?
May 13 2004
Here's something interesting for everyone. I took the heapsort.d example, redid it to be a bit more D-like (used a true array rather than an array class, used opCmp instead of a Compare method, etc.) Anyways, I decided to test my version against Walter's QuickSort. Here are the results (the QuickSort and HeapSort files are identical except one calls HeapSort(a) and the other calls a.sort): DMD QuickSort - 93 msecs DMD HeapSort - 62 msecs GDC QuickSort - 76 msecs GDC HeapSort - 62 msecs The times are more or less constant across runs. Only the GDC HeapSort varies, and only by an msec or two. This brings to light two things: First, that the HeapSort is potentially more viable than the QuickSort. And secondly, that the GDC backend is a good bit more powerful than the DMD backend. Also, I'm pretty sure that GDC's libphobos has debugging enabled, so it might be able to go faster! Comments? Thoughts? Owen P.S. I've attached the code for this test, in case I've made a silly mistake that is favoring the HeapSort. In article <c7sncs$k3e$1 digitaldaemon.com>, Walter says...I'd have to do testing to make sure, but I'm pretty sure it's due to quicksort being algorithmically superior. One of the reasons I built sort into arrays is because people implement inefficient sorts over and over, and I wanted to make it real easy to use the best algorithm. (A job I had once was to optimize a large program written by others; in poking around in it I found 3 different hacky implementations of bubblesort. Replaced them all with a call to C's qsort(), and I was an instant hero for the speedup <g>.)
May 12 2004
resistor AT nospam DOT mac DOT com wrote:Here's something interesting for everyone. I took the heapsort.d example, redid it to be a bit more D- like (used a true array rather than an array class, used opCmp instead of a Compare method, etc.) Anyways, I decided to test my version against Walter's QuickSort. Here are the results (the QuickSort and HeapSort files are identical except one calls HeapSort(a) and the other calls a.sort): DMD QuickSort - 93 msecs DMD HeapSort - 62 msecs GDC QuickSort - 76 msecs GDC HeapSort - 62 msecs The times are more or less constant across runs. Only the GDC HeapSort varies, and it only be an msec or two. This brings to light two things: First, that the HeapSort is potentially more viable than the QuickSort. And secondly, that the GDC backend is a good bit more powerful than the DMD backend. Also, I'm pretty sure that GDC's libphobos has debugging enabled, so it might be able to go faster! Comment? Thoughts? Owen P.S. I've attached the code for this test, in case I've made a silly mistake that is favoring the HeapSort. In article <c7sncs$k3e$1 digitaldaemon.com>, Walter says...Interesting. It would also be interesting to see how something like radix sort compares. -- -Anderson: http://badmama.com.au/~anderson/I'd have to do testing to make sure, but I'm pretty sure it's due to quicksort being algorithmically superior. One of the reasons I built sort into arrays is because people implement inefficient sorts over and over, and I wanted to make it real easy to use the best algorithm. (A job I had once was to optimize a large program written by others; in poking around in it I found 3 different hacky implementations of bubblesort. Replaced them all with a call to C's qsort(), and I was an instant hero for the speedup <g>.)
May 12 2004
Oops. Forgot the attachment. Here it is. In article <c7sncs$k3e$1 digitaldaemon.com>, Walter says...I'd have to do testing to make sure, but I'm pretty sure it's due to quicksort being algorithmically superior. One of the reasons I built sort into arrays is because people implement inefficient sorts over and over, and I wanted to make it real easy to use the best algorithm. (A job I had once was to optimize a large program written by others; in poking around in it I found 3 different hacky implementations of bubblesort. Replaced them all with a call to C's qsort(), and I was an instant hero for the speedup <g>.)[uuencoded attachment: test.d]
May 12 2004
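The uuencoded attachment above is too garbled in the archive to decode reliably, so the following is a hedged sketch only, not the original test.d. It follows the shape Owen describes — a plain Object[] whose ordering comes from opCmp, a textbook array heapsort, and a timing harness — but uses present-day Phobos names (std.datetime.stopwatch, std.random.uniform) rather than the 2004-era std.date:

import std.stdio;
import std.random : uniform;
import std.datetime.stopwatch : StopWatch, AutoStart;

class Vector2D
{
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }
    final int magnitude() { return x * x + y * y; }
    override int opCmp(Object o)        // lets <, > and sorting work
    {
        return magnitude() - (cast(Vector2D) o).magnitude(); // assumes o is a Vector2D
    }
}

void heapify(Object[] a, size_t i, size_t heapSize)
{
    size_t l = 2 * i + 1, r = 2 * i + 2, largest = i;
    if (l < heapSize && a[l] > a[largest]) largest = l;
    if (r < heapSize && a[r] > a[largest]) largest = r;
    if (largest != i)
    {
        auto t = a[i]; a[i] = a[largest]; a[largest] = t;
        heapify(a, largest, heapSize);
    }
}

void heapSort(Object[] a)
{
    foreach_reverse (i; 0 .. a.length / 2)  // build the max-heap
        heapify(a, i, a.length);
    foreach_reverse (i; 1 .. a.length)      // pop the max repeatedly
    {
        auto t = a[0]; a[0] = a[i]; a[i] = t;
        heapify(a, 0, i);
    }
}

void main()
{
    auto a = new Object[100_000];
    foreach (ref o; a)
        o = new Vector2D(uniform(0, 1000), uniform(0, 1000));
    auto sw = StopWatch(AutoStart.yes);
    heapSort(a);
    writefln("time elapsed: %s msecs", sw.peek.total!"msecs");
}

Swapping the heapSort(a) call for the built-in sort (a.sort in 2004 D, std.algorithm.sort(a) today) gives the QuickSort side of Owen's comparison.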
On Tue, 11 May 2004 14:47:58 -0700, Walter wrote:

> You're welcome. You should also note that arrays have a built-in sort property, which uses quicksort. It'll blow away heapsort <g>. Doing that cuts the sort time on my machine from 453 ms to 63 ms. Source attached.

Why are you not using introspective sort?
http://www.nist.gov/dads/HTML/introspectiveSort.html
May 12 2004
"Knud Sørensen" <knud NetRunner.all-technology.com> wrote in message news:pan.2004.05.12.22.37.18.149835 NetRunner.all-technology.com...On Tue, 11 May 2004 14:47:58 -0700, Walter wrote:Actually, it does just that (switch to heapsort as necessary).You're welcome. You should also note that arrays have a built-in sort property, which uses quicksort. It'll blow away heapsort <g>. Doing that cuts the sort time on my machine from 453 ms to 63 ms. Source attached.Why are your not using introspective sort ?? http://www.nist.gov/dads/HTML/introspectiveSort.html
May 12 2004
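Walter's "it does just that" is the introspective-sort idea from the NIST link: drive the sort with quicksort, but bound the recursion depth at roughly 2*log2(n), and finish any slice that exceeds the bound with heapsort, capping the worst case at O(n log n). A hedged sketch follows — not Phobos's actual source, and all names are mine:

import std.stdio;

// Heapsort fallback for slices whose quicksort recursion degenerated.
void heapSortSlice(int[] a)
{
    void sift(size_t i, size_t n)
    {
        while (true)
        {
            size_t l = 2 * i + 1, big = i;
            if (l < n && a[l] > a[big]) big = l;
            if (l + 1 < n && a[l + 1] > a[big]) big = l + 1;
            if (big == i) return;
            auto t = a[i]; a[i] = a[big]; a[big] = t;
            i = big;
        }
    }
    foreach_reverse (i; 0 .. a.length / 2) sift(i, a.length);
    foreach_reverse (i; 1 .. a.length)
    {
        auto t = a[0]; a[0] = a[i]; a[i] = t;
        sift(0, i);
    }
}

void introSort(int[] a, int depthLimit)
{
    while (a.length > 1)
    {
        if (depthLimit-- <= 0) { heapSortSlice(a); return; }
        // Lomuto partition around the last element.
        auto pivot = a[$ - 1];
        size_t p = 0;
        foreach (i; 0 .. a.length - 1)
            if (a[i] < pivot)
            {
                auto t = a[i]; a[i] = a[p]; a[p] = t;
                p++;
            }
        auto t = a[p]; a[p] = a[$ - 1]; a[$ - 1] = t;
        introSort(a[0 .. p], depthLimit);
        a = a[p + 1 .. $];      // iterate, rather than recurse, on the rest
    }
}

void main()
{
    import std.math : log2;
    auto a = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0];
    introSort(a, 2 * cast(int) log2(cast(double) a.length));
    writeln(a);
}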
Walter wrote:"christopher diggins" <cdiggins users.sourceforge.net> wrote in message news:c7r63p$1bv1$1 digitaldaemon.com...I think I've read in the docs that the D compiler autodetects if a function should be virtual or not. Why is there such a speedgain by explicitely marking methods as final, when the compiler can do this implicitely? Regards, JörgThanks for the feedback, Walter. I had only used -O but not -inline and -release which sped up the execution to 703 msec and the memory usage dropped to 2MB on my system. I have now updated the web page as a result.Thanks! Could you also please hyperlink the "Digital Mars Compiler" to www.digitalmars.com/d/ ? You'll also get faster results if you use 'final' on all member functions that are not overrided (essentially all the functions you don't mark as 'virtual' in the C++ version).
May 12 2004
"Jörg Rüppel" <joerg sharky-x.de> wrote in message news:c7u7ru$4dt$1 digitaldaemon.com...Walter wrote:usage"christopher diggins" <cdiggins users.sourceforge.net> wrote in message news:c7r63p$1bv1$1 digitaldaemon.com...Thanks for the feedback, Walter. I had only used -O but not -inline and -release which sped up the execution to 703 msec and the memoryresult.dropped to 2MB on my system. I have now updated the web page as a'final'Thanks! Could you also please hyperlink the "Digital Mars Compiler" to www.digitalmars.com/d/ ? You'll also get faster results if you useBecause the compiler is not sophisticated enough to do it automatically yet.on all member functions that are not overrided (essentially all the functions you don't mark as 'virtual' in the C++ version).I think I've read in the docs that the D compiler autodetects if a function should be virtual or not. Why is there such a speedgain by explicitely marking methods as final, when the compiler can do this implicitely?
May 12 2004
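The cost in question is concrete: without whole-program analysis, the compiler must assume that any non-final method could be overridden in some other module, so every call goes through the vtable and cannot be inlined. A minimal sketch of the distinction (a hypothetical class, not the benchmark's code):

import std.stdio;

class Vector2D
{
    int x, y;

    // Non-final: some other module could subclass Vector2D and
    // override this, so every call is an indirect vtable dispatch.
    int magnitudeVirtual() { return x * x + y * y; }

    // final: no override is possible, so with -O -inline -release the
    // compiler may call it directly or inline the body at the call site.
    final int magnitudeFinal() { return x * x + y * y; }
}

void main()
{
    auto v = new Vector2D;
    v.x = 3; v.y = 4;
    writeln(v.magnitudeVirtual(), " ", v.magnitudeFinal()); // 25 25
}

This is why adding final to the non-overridden methods produced the speedup reported earlier in the thread.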
>> I think I've read in the docs that the D compiler autodetects whether a function should be virtual or not. Why is there such a speed gain from explicitly marking methods as final, when the compiler can do this implicitly?
>
> Because the compiler is not sophisticated enough to do it automatically yet.

Thanks, that's all I wanted to hear ;)

Regards,
Jörg
May 13 2004
In article <c7r63p$1bv1$1 digitaldaemon.com>, christopher diggins says...

> Thanks for the feedback, Walter. I had only used -O but not -inline and -release, which sped up the execution to 703 msec and the memory usage dropped to 2MB on my system. I have now updated the web page as a result. There is still 4 bytes per object that I can't account for. I count 8 bytes for the vector data, 4 bytes for the pointer to the object, 4 bytes for the vtable in the object, which only brings me to 16 bytes per object; what are the other 4 bytes for? Is there a way to allocate 100,000 objects at once rather than one by one?

If I have a container of objects that implement interface Printable(), how does Heron determine which are type Circle and which are type Square, without the use of a vtable? If the language cannot determine which type is which, then you should remove the word "virtual" from the C++ version and put "final" in the Java (and D?) versions: if the container can only have one type of object, then C++ does not need the virtual keyword, and including it kind of "rigs" the demo, right?

Kevin
May 11 2004
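On the "allocate 100,000 objects at once" question, one common answer is to reserve a single block and construct the instances in place. The sketch below is a hedged illustration in present-day D (core.lifetime.emplace; bulkAllocate is my name, and it assumes the instance size is a multiple of the required alignment). 2004-era D instead offered per-class allocators, a new(uint size) overload, for the same purpose:

import core.memory : GC;
import core.lifetime : emplace;

class Vector2D
{
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }
}

// Construct n instances inside one GC allocation instead of n of them.
Vector2D[] bulkAllocate(size_t n)
{
    enum size = __traits(classInstanceSize, Vector2D);
    auto block = GC.malloc(size * n)[0 .. size * n];    // one allocation
    auto objs = new Vector2D[n];
    foreach (i; 0 .. n)
        objs[i] = emplace!Vector2D(block[i * size .. (i + 1) * size], 0, 0);
    return objs;
}

This removes the per-object allocation overhead, though since the GC sees a single block, the objects live and die together.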
"Kevin Bealer" <Kevin_member pathlink.com> wrote in message news:c7rboo$1kqs$1 digitaldaemon.com...In article <c7r63p$1bv1$1 digitaldaemon.com>, christopher diggins says...bytesThanks for the feedback, Walter. I had only used -O but not -inline and -release which sped up the execution to 703 msec and the memory usage dropped to 2MB on my system. I have now updated the web page as a result. There is still 4 bytes per object that I can't account for. I count 8thefor the vector data, 4 bytes for the pointer to the object, 4 bytes forisvtable in the object, which only brings me to 16 bytes per object, whatdoesthe other 4 bytes for? Is there a way to allocate 100,000 objects at once rather then one by one?If I have a container of objects that implement interface Printable(), howHeron determine which are type Circle and which are type Square, withoutthe useof a vtable?Interface references contain an extra pointer to a dispatch table. So the dynamic dispatching is done when needed. The compiler ignores the interface references when it can deduce the concrete types.If the language cannot determine which type is which, then you shouldremove theword "virtual" from the C++ version, and put "final" in the Java (and D?)I do agree with you about the final keyword and I have done that to the D version, and I am going to do it in the Java version.versions: if the container can only have one type of object, then C++does notneed the virtual keyword, and including it kind of "rigs" the demo, right?You are correct, but Vector2D is used to demonstrate a class that is run-time polymorphic but happens to be used in a non-polymorphic manner. I could easily add to the demo a modification that forces the issue of polymorphism, but I thought it would complicate things too much. Do you recommend that I change the demo to make this more explicit? -- Christopher Diggins http://www.cdiggins.com http://www.heron-language.com
May 11 2004
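To make the described mechanism concrete, here is a hedged sketch — illustrative only, not Heron's (or D's) actual internals — of dispatch through a "fat" reference: the reference pairs the object pointer with a pointer to a per-type table of function pointers, so the indirect call is paid only where the concrete type cannot be deduced:

import std.stdio;

struct PrintableTable
{
    void function(void* self) print;    // one slot per interface method
}

struct PrintableRef
{
    void* self;                         // the concrete object
    immutable(PrintableTable)* vtbl;    // that type's dispatch table
}

struct Circle
{
    int radius;
    static void printImpl(void* self)
    {
        writefln("Circle(radius=%s)", (cast(Circle*) self).radius);
    }
}

immutable PrintableTable circleTable = PrintableTable(&Circle.printImpl);

void printIt(PrintableRef r)
{
    r.vtbl.print(r.self);               // dynamic dispatch through the table
}

void main()
{
    auto c = Circle(3);
    printIt(PrintableRef(&c, &circleTable));
}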
In article <c7rhdv$1t4h$1 digitaldaemon.com>, christopher diggins says...

> You are correct, but Vector2D is used to demonstrate a class that is run-time polymorphic but happens to be used in a non-polymorphic manner. I could easily add to the demo a modification that forces the issue of polymorphism, but I thought it would complicate things too much. Do you recommend that I change the demo to make this more explicit?

Heron's approach of non-inheritance binding is interesting; I take it there is a global dispatch table with pointers to all methods appearing in non-deducible cases?

I think the demo you have is good as far as it goes. But I would be more interested to know what happens in both cases for both languages: i.e. C++ run-time and compile-time, and Heron run-time and compile-time. If Heron can outperform in *both* cases, that would be much more impressive. But in performance-oriented code in a language like C++, virtualness is usually used only when compile-time binding is difficult to achieve.

Kevin
May 11 2004
"Kevin Bealer" <Kevin_member pathlink.com> wrote in message news:c7rkjh$2216$1 digitaldaemon.com...In article <c7rhdv$1t4h$1 digitaldaemon.com>, christopher diggins says... ..IYou are correct, but Vector2D is used to demonstrate a class that is run-time polymorphic but happens to be used in a non-polymorphic manner.there iscould easily add to the demo a modification that forces the issue of polymorphism, but I thought it would complicate things too much. Do you recommend that I change the demo to make this more explicit?Heron's approach of non-inheritance binding is interesting; I take ita global dispatch table with pointers to all methods appearing innon-deduceablecases?Close, there are lots of small tables created.I think the demo you have is good as far as it goes. But I would be more interested to know what happens in both cases for both languages: i.e. C++ run-time and compile-time and Heron run-time and compile-time. If Heroncanoutperform in *both* cases, that would be much more impressive.Good point.But in (performance oriented code) in a language like C++, virtual-ness is usually used only when compile time binding is difficult to achieve.In retrospect, I am starting to be of the opinion that the example may fact rigged as you mentioned. There is no way that I can justify a design that uses ICompare as a class with virtual functions. -- Christopher Diggins http://www.cdiggins.com http://www.heron-language.com
May 12 2004
"christopher diggins" <cdiggins users.sourceforge.net> wrote in message news:c7t5qi$19oe$1 digitaldaemon.com...In retrospect, I am starting to be of the opinion that the example mayfactrigged as you mentioned. There is no way that I can justify a design that uses ICompare as a class with virtual functions.I'd also be careful about the GetMagnitude() function. The time spent in the two multiplies may swamp other effects. I've seen many benchmark programs that purport to measure X, when they actually are measuring Y that the benchmark writer thought was innocuous. Compiler vendors have been known to exploit these naive benchmarks by optimizing the heck out of Y (which may be much easier than optimizing X), and letting the benchmark reporter misinterpret it into saying that X was optimized. I've even seen crafty vendors create benchmarks specifically to mislead people into thinking X was being optimized when it was actually Y. (This is why I tend to not bother creating benchmarks for any but my own use, I like to use benchmarks written by people not affiliated with DM to avoid even the appearance of impropriety.) In your benchmark, I'd check the multiplies, I'd also check to see if maybe one compiler does a better job of function inlining or register allocation before concluding that the speed difference is due to virtual dispatch. Running a disassembler over the generated code and being familiar with performance assembler coding is essential. The results can be quite surprising.
May 12 2004
"Walter" <newshound digitalmars.com> wrote in message news:c7u37p$2m0t$1 digitaldaemon.com..."christopher diggins" <cdiggins users.sourceforge.net> wrote in message news:c7t5qi$19oe$1 digitaldaemon.com...thatIn retrospect, I am starting to be of the opinion that the example mayfactrigged as you mentioned. There is no way that I can justify a designtheuses ICompare as a class with virtual functions.I'd also be careful about the GetMagnitude() function. The time spent intwo multiplies may swamp other effects. I've seen many benchmark programs that purport to measure X, when they actually are measuring Y that the benchmark writer thought was innocuous. Compiler vendors have been knowntoexploit these naive benchmarks by optimizing the heck out of Y (which maybemuch easier than optimizing X), and letting the benchmark reporter misinterpret it into saying that X was optimized. I've even seen crafty vendors create benchmarks specifically to mislead people into thinking X was being optimized when it was actually Y. (Thisiswhy I tend to not bother creating benchmarks for any but my own use, Iliketo use benchmarks written by people not affiliated with DM to avoid eventheappearance of impropriety.) In your benchmark, I'd check the multiplies, I'd also check to see ifmaybeone compiler does a better job of function inlining or register allocation before concluding that the speed difference is due to virtual dispatch. Running a disassembler over the generated code and being familiar with performance assembler coding is essential. The results can be quite surprising.You make some excellent points. I have written a brand new benchmark, which only has increments, decrements, == and function calls. Hopefully this is more accurate. Unfortunately I am too slow and clumsy with assembler to be able to do any disassembly, hopefully someone else will come along and do it. -- Christopher Diggins http://www.cdiggins.com http://www.heron-language.com
May 13 2004
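A benchmark of that shape might look like the hedged sketch below (the Counter class and iteration count are mine): nothing but increments, decrements, an equality test, and virtual calls, so arithmetic cannot swamp the dispatch cost being measured:

import std.stdio;
import std.datetime.stopwatch : StopWatch, AutoStart;

class Counter
{
    int n;
    void up()   { ++n; }    // virtual: dispatched through the vtable
    void down() { --n; }
}

void main()
{
    auto c = new Counter;
    auto sw = StopWatch(AutoStart.yes);
    for (int i = 0; i != 100_000_000; ++i)
    {
        c.up();
        c.down();
    }
    writefln("n == 0? %s, elapsed %s msecs",
             c.n == 0, sw.peek.total!"msecs");
}

Printing c.n at the end keeps the loop observable; an aggressive optimizer could otherwise devirtualize or delete it entirely, which is exactly the "measuring Y instead of X" trap described above.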
christopher diggins wrote:

> I have written an object oriented Heapsort in D at http://www.heron-language.com/benchmarks/heapsort-d.html to compare the performance of object oriented code in several languages. The results were somewhat disappointing when compared with C++, Delphi, Java, and Heron. I was hoping that there might be some suggestions from members of this group about how I might improve the code I wrote while keeping the overall algorithm and design the same. Please note that the program is a port of the program at http://www.heron-language.com/benchmarks/heapsort-heron.html and is discussed further at http://www.heron-language.com/benchmarks/index.html

This is completely tangential, but Microsoft's C++ compiler completes the heapsort in 235 ms. (Athlon XP 1600+ with 512MB of RAM, using /O2)

-- andy
May 11 2004
Why have you not used DMC++ for the Heron and C++ versions? That way the comparison would be truer.

"christopher diggins" <cdiggins users.sourceforge.net> wrote in message news:c7qvo6$1256$1 digitaldaemon.com...

> I have written an object oriented Heapsort in D at http://www.heron-language.com/benchmarks/heapsort-d.html to compare the performance of object oriented code in several languages. The results were somewhat disappointing when compared with C++, Delphi, Java, and Heron. I was hoping that there might be some suggestions from members of this group about how I might improve the code I wrote while keeping the overall algorithm and design the same. Please note that the program is a port of the program at http://www.heron-language.com/benchmarks/heapsort-heron.html and is discussed further at http://www.heron-language.com/benchmarks/index.html
>
> TIA
>
> --
> Christopher Diggins
> http://www.cdiggins.com
> http://www.heron-language.com
May 11 2004
"Matthew" <matthew.hat stlsoft.dot.org> wrote in message news:c7rgk8$1s1l$1 digitaldaemon.com...Why have you not used DMC++ for the Heron and C++? That way the comparisonwouldbe truerIf all that would happen is make the heron and C++ code run slower (or faster) I don't really see the point. I am simply using the best compilers I have at my disposal. I also don't think it would make any difference to the whole point of the comparison which is that Heron implements more efficient object oriented code the other languages which support object oriented techniques. -- Christopher Diggins http://www.cdiggins.com http://www.heron-language.com
May 11 2004
"christopher diggins" <cdiggins users.sourceforge.net> wrote in message news:c7rl11$22nr$1 digitaldaemon.com..."Matthew" <matthew.hat stlsoft.dot.org> wrote in message news:c7rgk8$1s1l$1 digitaldaemon.com...comparisonWhy have you not used DMC++ for the Heron and C++? That way thewouldIbe truerIf all that would happen is make the heron and C++ code run slower (or faster) I don't really see the point. I am simply using the best compilershave at my disposal. I also don't think it would make any difference tothewhole point of the comparison which is that Heron implements moreefficientobject oriented code the other languages which support object oriented techniques.That depends on if point is really to compare inherent efficiency of the language, not the relative implementation vagaries of different compilers. DMD and DMC++ share a common optimizer and code generator.
May 11 2004
"Walter" <newshound digitalmars.com> wrote in message news:c7rn04$25r5$1 digitaldaemon.com..."christopher diggins" <cdiggins users.sourceforge.net> wrote in message news:c7rl11$22nr$1 digitaldaemon.com...compilers"Matthew" <matthew.hat stlsoft.dot.org> wrote in message news:c7rgk8$1s1l$1 digitaldaemon.com...comparisonWhy have you not used DMC++ for the Heron and C++? That way thewouldbe truerIf all that would happen is make the heron and C++ code run slower (or faster) I don't really see the point. I am simply using the bestII am now convinced and the new example ( http://www.heron-language.com/benchmarks/index.html ) uses only the Digital mars compilers. -- Christopher Diggins http://www.cdiggins.com http://www.heron-language.comhave at my disposal. I also don't think it would make any difference tothewhole point of the comparison which is that Heron implements moreefficientobject oriented code the other languages which support object oriented techniques.That depends on if point is really to compare inherent efficiency of the language, not the relative implementation vagaries of different compilers. DMD and DMC++ share a common optimizer and code generator.
May 13 2004
Impressive speeds on Heron! I'd like to see some more benchmarks :)

C

On Tue, 11 May 2004 12:41:34 -0400, christopher diggins <cdiggins users.sourceforge.net> wrote:

> I have written an object oriented Heapsort in D at http://www.heron-language.com/benchmarks/heapsort-d.html to compare the performance of object oriented code in several languages. The results were somewhat disappointing when compared with C++, Delphi, Java, and Heron. I was hoping that there might be some suggestions from members of this group about how I might improve the code I wrote while keeping the overall algorithm and design the same. Please note that the program is a port of the program at http://www.heron-language.com/benchmarks/heapsort-heron.html and is discussed further at http://www.heron-language.com/benchmarks/index.html
>
> TIA
May 11 2004