digitalmars.D - fun project - improving calcHash
- Walter Bright (10/10) Jun 23 2013 https://github.com/D-Programming-Language/dmd/blob/master/src/root/strin...
- Timon Gehr (5/14) Jun 23 2013 It seems to be easy to make it faster without changing the algorithm
- Juan Manuel Cabo (14/26) Jun 23 2013 I performed a quick test, and I don't think that the original function
- Juan Manuel Cabo (5/40) Jun 23 2013 Oh, it might be improved by loading 128bits at a time instead of 32bits....
- Martin Nowak (2/7) Jun 24 2013 This is limited by the memory bandwidth so your testing the wrong thing.
- Martin Nowak (38/42) Jun 23 2013 You'll find a benchmark at the end of the post.
- Timothee Cour (5/17) Jun 23 2013 implementing this "proposal: lazy compilation model for compiling binari...
- Michael (2/2) Jun 23 2013 https://code.google.com/p/xxhash/
- qznc (8/21) Jun 24 2013 My first idea was to look at Python, since it requires a lot of
- Anders Halager (13/38) Jun 24 2013 Python is one of the slower interpreted languages. It would be
- Walter Bright (3/8) Jun 24 2013 I used that method back in the 1980's, it was well known then, but perha...
- Anders Halager (5/17) Jun 24 2013 I can't imagine all the clever (even if outdated) tricks that
- qznc (9/14) Jun 24 2013 To make dmd 1% faster, calcHash must be reduced to 69% run time.
- bearophile (5/7) Jun 24 2013 I think the main problem to solve is to reduce the amount of
- deadalnix (4/9) Jun 24 2013 Yes, my profiling show that were I spend the most time is
- monarch_dodra (3/15) Jun 24 2013 That, and it means dmd has better chances of finishing its
https://github.com/D-Programming-Language/dmd/blob/master/src/root/stringtable.c#L21 Profiling shows the calcHash function is a significant contributor to compilation time (3.25% of total time). So making it faster is a win. Even making dmd 1% faster would be a nice win - all those little drops add up. There are many, many string hash functions findable through google. Anyone want to spend the effort to make a faster one? Remember, ya gotta prove it's faster! A nice timing test would be the time expending compiling Phobos. I would suggest that the 64 bit build of dmd be used for timing tests. Also, be careful, many of those hash functions on the intarnets have a license that makes it unusable for dmd.
Jun 23 2013
On 06/23/2013 11:22 PM, Walter Bright wrote:https://github.com/D-Programming-Language/dmd/blob/master/src/root/stringtable.c#L21 Profiling shows the calcHash function is a significant contributor to compilation time (3.25% of total time). So making it faster is a win. Even making dmd 1% faster would be a nice win - all those little drops add up. There are many, many string hash functions findable through google. Anyone want to spend the effort to make a faster one? Remember, ya gotta prove it's faster! ...It seems to be easy to make it faster without changing the algorithm significantly. There should be only one switch executed, not one per 4 characters. If the length is short enough from the start, there is no point in doing any arithmetic.
Jun 23 2013
On 06/23/2013 06:22 PM, Walter Bright wrote:https://github.com/D-Programming-Language/dmd/blob/master/src/root/stringtable.c#L21 Profiling shows the calcHash function is a significant contributor to compilation time (3.25% of total time). So making it faster is a win. Even making dmd 1% faster would be a nice win - all those little drops add up. There are many, many string hash functions findable through google. Anyone want to spend the effort to make a faster one? Remember, ya gotta prove it's faster! A nice timing test would be the time expending compiling Phobos. I would suggest that the 64 bit build of dmd be used for timing tests. Also, be careful, many of those hash functions on the intarnets have a license that makes it unusable for dmd.I performed a quick test, and I don't think that the original function can be improved for speed (though it might be improved for less collisions): https://gist.github.com/jmcabo/5847036 I used words and lines from the complete works of Shakespeare. I tested separating the loop from the switch, as was suggested in Timon Gehr post. It didn't improve the speed when compiled with "g++ -O3", though it improved it a bit without -O3. I also tested removing the multiplication by 37. It didn't improve the speed. With "g++ -O3" they are all the same. So, unless I'm measuring it wrong, the algorithm is as fast as can be (as fast as just adding chars). --jm
Jun 23 2013
On 06/23/2013 09:20 PM, Juan Manuel Cabo wrote:On 06/23/2013 06:22 PM, Walter Bright wrote:Oh, it might be improved by loading 128bits at a time instead of 32bits... but that would benefit strings of more than 16 bytes. Google's CitiHash seems tuned for large strings. --jmhttps://github.com/D-Programming-Language/dmd/blob/master/src/root/stringtable.c#L21 Profiling shows the calcHash function is a significant contributor to compilation time (3.25% of total time). So making it faster is a win. Even making dmd 1% faster would be a nice win - all those little drops add up. There are many, many string hash functions findable through google. Anyone want to spend the effort to make a faster one? Remember, ya gotta prove it's faster! A nice timing test would be the time expending compiling Phobos. I would suggest that the 64 bit build of dmd be used for timing tests. Also, be careful, many of those hash functions on the intarnets have a license that makes it unusable for dmd.I performed a quick test, and I don't think that the original function can be improved for speed (though it might be improved for less collisions): https://gist.github.com/jmcabo/5847036 I used words and lines from the complete works of Shakespeare. I tested separating the loop from the switch, as was suggested in Timon Gehr post. It didn't improve the speed when compiled with "g++ -O3", though it improved it a bit without -O3. I also tested removing the multiplication by 37. It didn't improve the speed. With "g++ -O3" they are all the same. So, unless I'm measuring it wrong, the algorithm is as fast as can be (as fast as just adding chars). --jm
Jun 23 2013
On 06/24/2013 03:43 AM, Juan Manuel Cabo wrote:This is limited by the memory bandwidth so your testing the wrong thing.I used words and lines from the complete works of Shakespeare. I tested separating the loop from the switch, as was suggested in Timon Gehr post. It didn't improve the speed when compiled with "g++ -O3", though it improved it a bit without -O3.
Jun 24 2013
On 06/23/2013 11:22 PM, Walter Bright wrote:Profiling shows the calcHash function is a significant contributor to compilation time (3.25% of total time). So making it faster is a win. Even making dmd 1% faster would be a nice win - all those little drops add up.You'll find a benchmark at the end of the post. http://forum.dlang.org/thread/op.v2yky9pdsqugbd dawg-freebsd.lan Obviously the switch in a for loop is a branch prediction killer. This should work much better. for (size_t i = 0; i < len / 4; ++i) { hash *= 37; hash += *(const uint32_t *)str; str += 4; } switch (len & 3) { case 0: break; case 1: hash *= 37; hash += *(const uint8_t *)str; break; case 2: hash *= 37; hash += *(const uint16_t *)str; break; case 3: hash *= 37; hash += (*(const uint16_t *)str << 8) + ((const uint8_t *)str)[2]; break; default: assert(0); } return hash; Finding a faster hash algorithm isn't simple because prime multiplication is so simple. At least we're not diving the hash result by 37 any longer. I think using the Hsieh hash is a good choice because it has quite an OK distribution so it leads to fewer collisions. It's also pretty fast for short strings and we already use it in druntime.
Jun 23 2013
On Sun, Jun 23, 2013 at 2:22 PM, Walter Bright <newshound2 digitalmars.com>wrote:https://github.com/D-**Programming-Language/dmd/blob/** https://github.com/D-Programming-Language/dmd/blob/master/src/root/stringtable.c#L21> Profiling shows the calcHash function is a significant contributor to compilation time (3.25% of total time). So making it faster is a win. Even making dmd 1% faster would be a nice win - all those little drops add up. There are many, many string hash functions findable through google. Anyone want to spend the effort to make a faster one? Remember, ya gotta prove it's faster! A nice timing test would be the time expending compiling Phobos. I would suggest that the 64 bit build of dmd be used for timing tests. Also, be careful, many of those hash functions on the intarnets have a license that makes it unusable for dmd.implementing this "proposal: lazy compilation model for compiling binaries" http://forum.dlang.org/post/mailman.1357.1371876331.13711.digitalmars-d puremagic.com would have much larger impact on compile time performance.
Jun 23 2013
On Sunday, 23 June 2013 at 21:22:40 UTC, Walter Bright wrote:https://github.com/D-Programming-Language/dmd/blob/master/src/root/stringtable.c#L21 Profiling shows the calcHash function is a significant contributor to compilation time (3.25% of total time). So making it faster is a win. Even making dmd 1% faster would be a nice win - all those little drops add up. There are many, many string hash functions findable through google. Anyone want to spend the effort to make a faster one? Remember, ya gotta prove it's faster! A nice timing test would be the time expending compiling Phobos. I would suggest that the 64 bit build of dmd be used for timing tests. Also, be careful, many of those hash functions on the intarnets have a license that makes it unusable for dmd.My first idea was to look at Python, since it requires a lot of hash calculation dynamically and probably is one of the most tuned implementations. Interesting how naive it is: http://hg.python.org/cpython/file/7aab60b70f90/Objects/object.c#l759 Just a simple for loop over chars. No switch. Wikipedia: "Python Software Foundation License (PSFL) is a BSD-style permissive free software license"
Jun 24 2013
On Monday, 24 June 2013 at 11:11:42 UTC, qznc wrote:On Sunday, 23 June 2013 at 21:22:40 UTC, Walter Bright wrote:Python is one of the slower interpreted languages. It would be more interesting to look at luajit which actually does something clever. If the string is at least 4 chars long it only hashes the first 4 bytes, the last 4, the 4 starting at floor(len/2)-2 and the 4 starting at floor(len/4)-1. Any of these may overlap of course but that isn't a problem. The code is MIT licensed and can be found here: http://repo.or.cz/w/luajit-2.0.git/blob/053041a9f47e3d341f98682ea1e4907a578e4920:/src/lj_str.c#l104 As others have mentioned it will be hard to improve the speed enough to get dmd 1% faster - but if anything can it's dirty tricks.https://github.com/D-Programming-Language/dmd/blob/master/src/root/stringtable.c#L21 Profiling shows the calcHash function is a significant contributor to compilation time (3.25% of total time). So making it faster is a win. Even making dmd 1% faster would be a nice win - all those little drops add up. There are many, many string hash functions findable through google. Anyone want to spend the effort to make a faster one? Remember, ya gotta prove it's faster! A nice timing test would be the time expending compiling Phobos. I would suggest that the 64 bit build of dmd be used for timing tests. Also, be careful, many of those hash functions on the intarnets have a license that makes it unusable for dmd.My first idea was to look at Python, since it requires a lot of hash calculation dynamically and probably is one of the most tuned implementations. Interesting how naive it is: http://hg.python.org/cpython/file/7aab60b70f90/Objects/object.c#l759 Just a simple for loop over chars. No switch. Wikipedia: "Python Software Foundation License (PSFL) is a BSD-style permissive free software license"
Jun 24 2013
On 6/24/2013 1:08 PM, Anders Halager wrote:Python is one of the slower interpreted languages. It would be more interesting to look at luajit which actually does something clever. If the string is at least 4 chars long it only hashes the first 4 bytes, the last 4, the 4 starting at floor(len/2)-2 and the 4 starting at floor(len/4)-1. Any of these may overlap of course but that isn't a problem.I used that method back in the 1980's, it was well known then, but perhaps has drifted into obscurity. In fact, I still use it for hashing identifiers in DMC++.
Jun 24 2013
On Monday, 24 June 2013 at 20:19:31 UTC, Walter Bright wrote:On 6/24/2013 1:08 PM, Anders Halager wrote:I can't imagine all the clever (even if outdated) tricks that have disappeared with retired old-timers :) I haven't set up anything for testing but if someone wants to try I've made a quick patch here: http://dpaste.com/hold/1268958/Python is one of the slower interpreted languages. It would be more interesting to look at luajit which actually does something clever. If the string is at least 4 chars long it only hashes the first 4 bytes, the last 4, the 4 starting at floor(len/2)-2 and the 4 starting at floor(len/4)-1. Any of these may overlap of course but that isn't a problem.I used that method back in the 1980's, it was well known then, but perhaps has drifted into obscurity. In fact, I still use it for hashing identifiers in DMC++.
Jun 24 2013
On 6/24/13 5:56 PM, Anders Halager wrote:On Monday, 24 June 2013 at 20:19:31 UTC, Walter Bright wrote:This is significantly faster than anything submitted thus far. Compiled alongside Juan Manuel Cabo's submission, the results are as follows: Times hashing words: Unchanged : 1386 ms One switch: 1338 ms Only add : 1354 ms Anders Haliger : 933 ms Times hashing entire lines: Unchanged : 335 ms One switch: 332 ms Only add : 331 ms Anders Haliger : 125 ms Wonder how much faster can it get? -- Andrew Edwards -------------------- http://www.akeron.co auto getAddress() { string location = " ", period = "."; return ("info" ~ location ~ "afidem" ~ period ~ "org"); }On 6/24/2013 1:08 PM, Anders Halager wrote:I can't imagine all the clever (even if outdated) tricks that have disappeared with retired old-timers :) I haven't set up anything for testing but if someone wants to try I've made a quick patch here: http://dpaste.com/hold/1268958/Python is one of the slower interpreted languages. It would be more interesting to look at luajit which actually does something clever. If the string is at least 4 chars long it only hashes the first 4 bytes, the last 4, the 4 starting at floor(len/2)-2 and the 4 starting at floor(len/4)-1. Any of these may overlap of course but that isn't a problem.I used that method back in the 1980's, it was well known then, but perhaps has drifted into obscurity. In fact, I still use it for hashing identifiers in DMC++.
Jun 24 2013
In the case of high memory usage the input string is unlikely to be in cache, so may be it's better to optimize cache misses instead of computation speed.
Jun 26 2013
On Sunday, 23 June 2013 at 21:22:40 UTC, Walter Bright wrote:https://github.com/D-Programming-Language/dmd/blob/master/src/root/stringtable.c#L21 Profiling shows the calcHash function is a significant contributor to compilation time (3.25% of total time). So making it faster is a win. Even making dmd 1% faster would be a nice win - all those little drops add up.To make dmd 1% faster, calcHash must be reduced to 69% run time. Does not look possible. Also, why is it implemented twice -- in src/root/stringtable.c and in src/root/root.c? Improving AAs by changing the data structure seems to be more worthwhile. They use a linked list internally, which is not exactly cache-friendly. From your gprof run the potential is mostly 5.19% _aaGetRvalue + 1.30% _aaGet.
Jun 24 2013
qznc:To make dmd 1% faster, calcHash must be reduced to 69% run time. Does not look possible.I think the main problem to solve is to reduce the amount of memory used during the compilation. Bye, bearophile
Jun 24 2013
On Monday, 24 June 2013 at 14:36:03 UTC, bearophile wrote:qznc:Yes, my profiling show that were I spend the most time is swapping. This have the effect of making every single part of DMD slower, and have potential for a 10x speedup (easy !).To make dmd 1% faster, calcHash must be reduced to 69% run time. Does not look possible.I think the main problem to solve is to reduce the amount of memory used during the compilation.
Jun 24 2013
On Monday, 24 June 2013 at 15:29:55 UTC, deadalnix wrote:On Monday, 24 June 2013 at 14:36:03 UTC, bearophile wrote:That, and it means dmd has better chances of finishing its compile before crashing (*cough*algorithm.d -unittest*cough*)qznc:Yes, my profiling show that were I spend the most time is swapping. This have the effect of making every single part of DMD slower, and have potential for a 10x speedup (easy !).To make dmd 1% faster, calcHash must be reduced to 69% run time. Does not look possible.I think the main problem to solve is to reduce the amount of memory used during the compilation.
Jun 24 2013