
digitalmars.D - AA Performance in Benchmarks

reply "Shammah Chancellor" <s s.com> writes:
So, I was tooling around with one of the benchmarks I saw listed 
on twitter:

https://github.com/kostya/benchmarks/tree/master/brainfuck

This D benchmark spends most of its time on AA lookups. 
Discussions about the implementation aside, I am curious as to 
why the D AA is so slow even after calling AA.rehash.

I took a look at the Nim tables implementation, and it looks like 
they use power-of-two table sizes and use the property of primitive 
roots modulo n to get the next slot in the case of a collision:

```
proc nextTry(h, maxHash: THash): THash {.inline.} =
   result = ((5 * h) + 1) and maxHash
```
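For concreteness, here is a minimal D sketch (not the druntime AA, just 
an illustration of the same recurrence) showing that with a power-of-two 
table size, h -> (5*h + 1) & mask is a full-period sequence, so a probe 
starting anywhere visits every slot before repeating:

```
import std.stdio;

// Same probe step as the Nim snippet, for a power-of-two table.
size_t nextTry(size_t h, size_t mask) @safe pure nothrow @nogc
{
    return (5 * h + 1) & mask;
}

void main()
{
    enum size_t tableSize = 16;   // must be a power of two
    enum size_t mask = tableSize - 1;

    bool[tableSize] seen;
    size_t slot = 7;              // arbitrary starting hash
    foreach (i; 0 .. tableSize)
    {
        seen[slot] = true;
        slot = nextTry(slot, mask);
    }

    // Every slot was visited exactly once before the sequence wrapped.
    foreach (v; seen) assert(v);
    writeln("probe sequence covers all ", tableSize, " slots");
}
```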

So, my questions:

1) What are the general purpose performance implications of 
something like this?
2) I saw a thread awhile ago where someone had posted an updated 
implementation of the AA with benchmarks and showed that it was 
substantially faster in all cases.  I can't find the thread again 
-- what happened with this work?

-Shammah
Apr 23 2015
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 4/23/15 10:40 AM, Shammah Chancellor wrote:

 2) I saw a thread awhile ago where someone had posted an updated
 implementation of the AA with benchmarks and showed that it was
 substantially faster in all cases.  I can't find the thread again --
 what happened with this work?
There is a PR being discussed right now. Please take a look and chime 
in: https://github.com/D-Programming-Language/druntime/pull/1229

It would be interesting to know how the new AA performs in this test.

-Steve
Apr 23 2015
next sibling parent reply "Laeeth Isharc" <nospamlaeeth nospam.laeeth.com> writes:
On Thursday, 23 April 2015 at 15:29:37 UTC, Steven Schveighoffer 
wrote:
 On 4/23/15 10:40 AM, Shammah Chancellor wrote:

 2) I saw a thread awhile ago where someone had posted an 
 updated
 implementation of the AA with benchmarks and showed that it was
 substantially faster in all cases.  I can't find the thread 
 again --
 what happened with this work?
There is a PR being discussed right now. Please take a look and chime 
in: https://github.com/D-Programming-Language/druntime/pull/1229

It would be interesting to know how the new AA performs in this test.

-Steve
At a slight tangent: is there a way to reserve capacity for an 
associative array? The obvious way does not seem to work.

I noticed also in a talk at nativecode (possibly by Herb Sutter) that 
C++ gives you the ability to tune almost any parameter of the 
associative array implementation in the standard library. (I'm not a 
C++ guy.) Is this entirely spurious, or would it be helpful for D?

Laeeth.
Apr 25 2015
parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Laeeth Isharc"  wrote in message 
news:hnihyhgtmdzhszkpnftl forum.dlang.org...

 At a slight tangent: is there a way to reserve capacity for an associative 
 array?  The obvious way does not seem to work.
No.
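A sketch of a workaround that is sometimes used: pre-insert dummy keys to 
force the bucket array to grow, then remove them again. This leans on the 
assumption that the built-in AA does not shrink its buckets on removal, 
which is an implementation detail rather than a documented guarantee:

```
import std.conv : to;
import std.stdio : writeln;

void main()
{
    string[string] table;

    // Grow the bucket array up front by inserting placeholder keys,
    // then remove them.  Assumes removal does not shrink the table.
    enum size_t n = 10_000;
    foreach (i; 0 .. n)
        table["__reserve_" ~ i.to!string] = "";
    foreach (i; 0 .. n)
        table.remove("__reserve_" ~ i.to!string);

    writeln("pre-grown table ready; subsequent inserts should not rehash");
}
```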
 I noticed also in a talk at nativecode (possibly by Herb Sutter) that C++ 
 gives you the ability to tune almost any parameter of the associative 
 array implementation in the standard library.  (I'm not a C++ guy).  Is 
 this entirely spurious, or would it be helpful for D?
Yes, if we had an AA in phobos.
Apr 26 2015
Martin Nowak <code+news.digitalmars dawg.eu> writes:
On 04/26/2015 12:58 PM, Daniel Murphy wrote:
 
 Yes, if we had an AA in phobos.
Yes, I think phobos should get a few more optimized containers:

  SparseSet
  DenseSet
  SparseHash
  DenseHash

They should be configurable w.r.t. load factor, equality comparison, 
and allocation policy (when std.allocator is available).

http://sparsehash.googlecode.com/svn/trunk/doc/index.html

As the built-in druntime AA will likely always remain a general-purpose 
hash table, we can't do all the possible optimizations.
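For illustration, a minimal sketch of what a dense (open-addressing) set 
with a tunable load factor might look like in D. The name DenseHashSet 
and its interface are hypothetical, not an existing Phobos or sparsehash 
API; it reuses the (5*h + 1) & mask probe step from earlier in the thread:

```
/// Hypothetical insert-only open-addressing set with a configurable
/// maximum load factor (maxLoadFactor should stay below 1.0).
struct DenseHashSet(K, double maxLoadFactor = 0.75)
{
    private K[] slots;      // power-of-two sized slot array
    private bool[] used;    // occupancy flags (no tombstones in this sketch)
    private size_t count;

    void insert(K key)
    {
        if (slots.length == 0 || (count + 1) > slots.length * maxLoadFactor)
            grow();
        immutable mask = slots.length - 1;
        size_t i = hashOf(key) & mask;
        while (used[i])
        {
            if (slots[i] == key) return;   // already present
            i = (5 * i + 1) & mask;        // same probe recurrence as above
        }
        slots[i] = key;
        used[i] = true;
        ++count;
    }

    bool contains(K key) const
    {
        if (slots.length == 0) return false;
        immutable mask = slots.length - 1;
        size_t i = hashOf(key) & mask;
        while (used[i])
        {
            if (slots[i] == key) return true;
            i = (5 * i + 1) & mask;
        }
        return false;
    }

    private void grow()
    {
        auto oldSlots = slots;
        auto oldUsed = used;
        immutable newLen = slots.length ? slots.length * 2 : 8;
        slots = new K[newLen];
        used = new bool[newLen];
        count = 0;
        foreach (i, ok; oldUsed)
            if (ok) insert(oldSlots[i]);
    }
}

unittest
{
    DenseHashSet!(int, 0.5) s;   // tighter load factor: more space, fewer collisions
    foreach (i; 0 .. 1000) s.insert(i * 7);
    assert(s.contains(7) && !s.contains(8));
}
```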
Apr 26 2015
Martin Nowak <code+news.digitalmars dawg.eu> writes:
On 04/23/2015 05:29 PM, Steven Schveighoffer wrote:
 
 https://github.com/D-Programming-Language/druntime/pull/1229
 
 It would be interesting to know how the new AA performs in this test.
Went from 11s down to 9s, ~20% improvement.
Apr 25 2015
prev sibling parent reply "Chris" <wendlec tcd.ie> writes:
On Thursday, 23 April 2015 at 14:40:58 UTC, Shammah Chancellor 
wrote:
 So, I was tooling around with one of the benchmarks I saw 
 listed on twitter:

 https://github.com/kostya/benchmarks/tree/master/brainfuck

 This D benchmark spends most of its time on AA lookups.
 Discussions about the implementation aside, I am curious as to 
 why the D AA is so slow even after calling AA.rehash.

 I took a look at the Nim tables implementation, and it looks 
 that they use power-of-two vectors and use the property of 
 primitive roots modulo n to get the next value in the case of a 
 collision:

 ```
 proc nextTry(h, maxHash: THash): THash {.inline.} =
   result = ((5 * h) + 1) and maxHash
 ```

 So, my questions:

 1) What are the general purpose performance implications of 
 something like this?
 2) I saw a thread awhile ago where someone had posted an 
 updated implementation of the AA with benchmarks and showed 
 that it was substantially faster in all cases.  I can't find 
 the thread again -- what happened with this work?

 -Shammah
Interesting. I noticed while doing some benchmarking that looking up data that is stored in an AA is slower than generating the data on the fly. I was really surprised.
Apr 24 2015
parent reply "Chris" <wendlec tcd.ie> writes:
On Friday, 24 April 2015 at 09:18:48 UTC, Chris wrote:
 On Thursday, 23 April 2015 at 14:40:58 UTC, Shammah Chancellor 
 wrote:
 So, I was tooling around with one of the benchmarks I saw 
 listed on twitter:

 https://github.com/kostya/benchmarks/tree/master/brainfuck

 This D benchmark spends most of its time on AA lookups.
 Discussions about the implementation aside, I am curious as to 
 why the D AA is so slow even after calling AA.rehash.

 I took a look at the Nim tables implementation, and it looks 
 that they use power-of-two vectors and use the property of 
 primitive roots modulo n to get the next value in the case of 
 a collision:

 ```
 proc nextTry(h, maxHash: THash): THash {.inline.} =
  result = ((5 * h) + 1) and maxHash
 ```

 So, my questions:

 1) What are the general purpose performance implications of 
 something like this?
 2) I saw a thread awhile ago where someone had posted an 
 updated implementation of the AA with benchmarks and showed 
 that it was substantially faster in all cases.  I can't find 
 the thread again -- what happened with this work?

 -Shammah
Interesting. I noticed while doing some benchmarking that looking up data that is stored in an AA is slower than generating the data on the fly. I was really surprised.
AA lookup is (more often than not) slightly slower than (or at least 
as fast as) generating the data on demand:

import std.datetime : benchmark, Duration;
import std.string : format;
import std.conv : to;
import std.stdio : writeln;

enum {
  string[string] words = [
    "Data1":"Blah",
    "Data2":"Blub",
  ],
  string[] letters = ["B", "l", "a", "h"]
}

void main() {
  words.rehash();
  auto result = benchmark!(lookup, make)(100_000);
  writeln(to!Duration(result[0]));
  writeln(to!Duration(result[1]));
}

string lookup() {
  auto blah = words["Data1"];
  return blah;
}

string make() {
  string res;
  foreach (c; letters) res ~= c;
  return res;
}

dmd v2.067.0 -release -O -boundscheck=off -inline
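One caveat, as a guess at what may skew this comparison: `words` is 
declared with `enum`, so it is a manifest constant and the AA literal is 
liable to be rebuilt at each use; `words.rehash()` then only touches a 
temporary. A sketch of the same benchmark with a runtime-initialized, 
module-level AA, which should measure the actual lookup path:

```
import std.datetime : benchmark, Duration;
import std.conv : to;
import std.stdio : writeln;

// Module-level AA, built once in a module constructor instead of via
// `enum`, so lookups hit a persistent table rather than a rebuilt literal.
string[string] words;
immutable string[] letters = ["B", "l", "a", "h"];

static this()
{
    words = ["Data1": "Blah", "Data2": "Blub"];
    words.rehash();
}

string lookup() { return words["Data1"]; }

string make()
{
    string res;
    foreach (c; letters) res ~= c;
    return res;
}

void main()
{
    auto result = benchmark!(lookup, make)(100_000);
    writeln("lookup: ", to!Duration(result[0]));
    writeln("make:   ", to!Duration(result[1]));
}
```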
Apr 24 2015
parent "Chris" <wendlec tcd.ie> writes:
On Friday, 24 April 2015 at 11:30:14 UTC, Chris wrote:
 On Friday, 24 April 2015 at 09:18:48 UTC, Chris wrote:
 On Thursday, 23 April 2015 at 14:40:58 UTC, Shammah Chancellor 
 wrote:
 So, I was tooling around with one of the benchmarks I saw 
 listed on twitter:

 https://github.com/kostya/benchmarks/tree/master/brainfuck

 This D benchmark spends most of its time on AA lookups.
 Discussions about the implementation aside, I am curious as 
 to why the D AA is so slow even after calling AA.rehash.

 I took a look at the Nim tables implementation, and it looks 
 that they use power-of-two vectors and use the property of 
 primitive roots modulo n to get the next value in the case of 
 a collision:

 ```
 proc nextTry(h, maxHash: THash): THash {.inline.} =
 result = ((5 * h) + 1) and maxHash
 ```

 So, my questions:

 1) What are the general purpose performance implications of 
 something like this?
 2) I saw a thread awhile ago where someone had posted an 
 updated implementation of the AA with benchmarks and showed 
 that it was substantially faster in all cases.  I can't find 
 the thread again -- what happened with this work?

 -Shammah
Interesting. I noticed while doing some benchmarking that looking up data that is stored in an AA is slower than generating the data on the fly. I was really surprised.
AA lookup is (more often than not) slightly slower than (or at least 
as fast as) generating the data on demand:

import std.datetime : benchmark, Duration;
import std.string : format;
import std.conv : to;
import std.stdio : writeln;

enum {
  string[string] words = [
    "Data1":"Blah",
    "Data2":"Blub",
  ],
  string[] letters = ["B", "l", "a", "h"]
}

void main() {
  words.rehash();
  auto result = benchmark!(lookup, make)(100_000);
  writeln(to!Duration(result[0]));
  writeln(to!Duration(result[1]));
}

string lookup() {
  auto blah = words["Data1"];
  return blah;
}

string make() {
  string res;
  foreach (c; letters) res ~= c;
  return res;
}

dmd v2.067.0 -release -O -boundscheck=off -inline
Correction: I meant to say creating the data is slower here, of course. 
Ah, well, it's Friday!
Apr 24 2015