www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - string hash significant speedup

reply Johnson Jones <JJ Dynomite.com> writes:
when using T[string], hashing is used. Computing the hash is 
slow(relatively speaking).

Does D cache the hashes? Strings are immutable so there is 
absolutely no reason why the hash ever need to be computed more 
than once.

Essentially the hash should be attached to strings like their 
length and address.

Does D compute the hash of a string every time it is looked up? 
If so, any way to optimize that it?
Aug 10 2017
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/10/17 3:36 PM, Johnson Jones wrote:
 when using T[string], hashing is used. Computing the hash is 
 slow(relatively speaking).
 
 Does D cache the hashes? Strings are immutable so there is absolutely no 
 reason why the hash ever need to be computed more than once.
It computes them on insertion, and caches the result in the structure of the hash table. -Steve
Aug 10 2017
next sibling parent reply Johnson Jones <JJ Dynomite.com> writes:
On Thursday, 10 August 2017 at 20:07:35 UTC, Steven Schveighoffer 
wrote:
 On 8/10/17 3:36 PM, Johnson Jones wrote:
 when using T[string], hashing is used. Computing the hash is 
 slow(relatively speaking).
 
 Does D cache the hashes? Strings are immutable so there is 
 absolutely no reason why the hash ever need to be computed 
 more than once.
It computes them on insertion, and caches the result in the structure of the hash table. -S
Thanks. What is the cache size?
Aug 10 2017
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/10/17 5:10 PM, Johnson Jones wrote:
 On Thursday, 10 August 2017 at 20:07:35 UTC, Steven Schveighoffer wrote:
 On 8/10/17 3:36 PM, Johnson Jones wrote:
 when using T[string], hashing is used. Computing the hash is 
 slow(relatively speaking).

 Does D cache the hashes? Strings are immutable so there is absolutely 
 no reason why the hash ever need to be computed more than once.
It computes them on insertion, and caches the result in the structure of the hash table.
Thanks. What is the cache size?
size_t.sizeof https://github.com/dlang/druntime/blob/master/src/rt/aaA.d#L173 -Steve
Aug 14 2017
prev sibling parent reply HyperParrow <dlas nowhere.se> writes:
On Thursday, 10 August 2017 at 20:07:35 UTC, Steven Schveighoffer 
wrote:
 On 8/10/17 3:36 PM, Johnson Jones wrote:
 when using T[string], hashing is used. Computing the hash is 
 slow(relatively speaking).
 
 Does D cache the hashes? Strings are immutable so there is 
 absolutely no reason why the hash ever need to be computed 
 more than once.
It computes them on insertion, and caches the result in the structure of the hash table. -Steve
But i think he speaks about the strings that are tested for inclusion (i.e opIn RHS), not those who are inserted, for which obviously the hash is known.
Aug 10 2017
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/10/17 6:30 PM, HyperParrow wrote:
 On Thursday, 10 August 2017 at 20:07:35 UTC, Steven Schveighoffer wrote:
 On 8/10/17 3:36 PM, Johnson Jones wrote:
 when using T[string], hashing is used. Computing the hash is 
 slow(relatively speaking).

 Does D cache the hashes? Strings are immutable so there is absolutely 
 no reason why the hash ever need to be computed more than once.
It computes them on insertion, and caches the result in the structure of the hash table.
But i think he speaks about the strings that are tested for inclusion (i.e opIn RHS), not those who are inserted, for which obviously the hash is known.
If this is the case, then no, there is no cache. It would have to be a cache of the hash per pointer/length combo. I don't know that it's worth the effort, certainly not for built-in AA. -Steve
Aug 14 2017