digitalmars.D - Strings hash value memoized
- bearophile (5/5) Jul 12 2008 I presume AAs are used often enough in D. If D 2.x strings are immutable...
- Jarrett Billingsley (4/21) Jul 12 2008 Or, you store the precomputed hash in the structure (i.e. AA) that uses
- bearophile (4/6) Jul 12 2008 Having the hash value in the string has some advantages, if have two str...
- Koroskin Denis (5/24) Jul 12 2008 Why distiguish between strings and other data type? Just wrap any
- JAnderson (11/18) Jul 12 2008 Sounds like a good case for a standard lib function. Potentially the
I presume AAs are used often enough in D. If D 2.x strings are immutable, then their hash function doesn't change, so it can be useful to store the hash value inside the data structure of the string (Python works like this), by default it can be initialized to a value like -1, so it's computed only the first time it's actually needed, avoiding wasting time to compute the hash value if it's not used. Having the hash value inside strings and other immutable data structures allows quite faster AAs (and sets too, even if they are inside an external library). Currently strings aren't special structures, they are just immutable dynamical arrays of chars, so the has value (32 or 64 bit according to the word size of the CPU, I presume) can't be stored in the dynamic array record, so it may be stored in the -4/-8 bytes of the data block, but this can't work for string slices. A possible solution is to create a special string type that's just like the dynamic array struct, plus the hash value, so this struct becomes bigger (12 instead of 8 bytes on 32 bit CPUs), slowing down some operations a little. Bye, bearophile
Jul 12 2008
"bearophile" <bearophileHUGS lycos.com> wrote in message news:g5a8h5$gmk$1 digitalmars.com...I presume AAs are used often enough in D. If D 2.x strings are immutable, then their hash function doesn't change, so it can be useful to store the hash value inside the data structure of the string (Python works like this), by default it can be initialized to a value like -1, so it's computed only the first time it's actually needed, avoiding wasting time to compute the hash value if it's not used. Having the hash value inside strings and other immutable data structures allows quite faster AAs (and sets too, even if they are inside an external library). Currently strings aren't special structures, they are just immutable dynamical arrays of chars, so the has value (32 or 64 bit according to the word size of the CPU, I presume) can't be stored in the dynamic array record, so it may be stored in the -4/-8 bytes of the data block, but this can't work for string slices. A possible solution is to create a special string type that's just like the dynamic array struct, plus the hash value, so this struct becomes bigger (12 instead of 8 bytes on 32 bit CPUs), slowing down some operations a little.Or, you store the precomputed hash in the structure (i.e. AA) that uses them. Which is exactly what the default AA implementation does, actually.
Jul 12 2008
Jarrett Billingsley:Or, you store the precomputed hash in the structure (i.e. AA) that uses them. Which is exactly what the default AA implementation does, actually.Having the hash value in the string has some advantages, if have two strings and both of them have their hash value is already computed, you have a way to test quickly if they are different. You can compute intersection and union between two string sets quickly. If the same string is in more AAs/sets you compute and store its hash value only once for its life. Bye, bearophile
Jul 12 2008
On Sat, 12 Jul 2008 16:36:53 +0400, bearophile <bearophileHUGS lycos.com> wrote:I presume AAs are used often enough in D. If D 2.x strings are immutable, then their hash function doesn't change, so it can be useful to store the hash value inside the data structure of the string (Python works like this), by default it can be initialized to a value like -1, so it's computed only the first time it's actually needed, avoiding wasting time to compute the hash value if it's not used. Having the hash value inside strings and other immutable data structures allows quite faster AAs (and sets too, even if they are inside an external library). Currently strings aren't special structures, they are just immutable dynamical arrays of chars, so the has value (32 or 64 bit according to the word size of the CPU, I presume) can't be stored in the dynamic array record, so it may be stored in the -4/-8 bytes of the data block, but this can't work for string slices. A possible solution is to create a special string type that's just like the dynamic array struct, plus the hash value, so this struct becomes bigger (12 instead of 8 bytes on 32 bit CPUs), slowing down some operations a little. Bye, bearophileWhy distiguish between strings and other data type? Just wrap any invariant data type into a special structure that lazily computes and stores hash, and use it in a AA instead.
Jul 12 2008
bearophile wrote:I presume AAs are used often enough in D. If D 2.x strings are immutable, then their hash function doesn't change, so it can be useful to store the hash value inside the data structure of the string (Python works like this), by default it can be initialized to a value like -1, so it's computed only the first time it's actually needed, avoiding wasting time to compute the hash value if it's not used. Having the hash value inside strings and other immutable data structures allows quite faster AAs (and sets too, even if they are inside an external library). Currently strings aren't special structures, they are just immutable dynamical arrays of chars, so the has value (32 or 64 bit according to the word size of the CPU, I presume) can't be stored in the dynamic array record, so it may be stored in the -4/-8 bytes of the data block, but this can't work for string slices. A possible solution is to create a special string type that's just like the dynamic array struct, plus the hash value, so this struct becomes bigger (12 instead of 8 bytes on 32 bit CPUs), slowing down some operations a little. Bye, bearophileSounds like a good case for a standard lib function. Potentially the hashing part could be a generic class for any type with a particular spealization for strings. It could also be lazily computed (ie compute it the first time its needed then cache it). Maybe something like: alias hashed(string) hashed_string; alias hashed(int[]) hashed_array; I guess one downside is that compile time string hashes may have to be figured out at run-time with that approach. -Joel
Jul 12 2008