www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - is there any reason to use SuperFastHash in druntime? (except

reply ketmar <ketmar ketmar.no-ip.org> writes:
SuperFastHash has known distribution problems[1]. there is fasthash[2]=20
function, which is comparable to murmur, but requires one multiplication=20
per 8 bytes (and it yields 64-bit hash values) and using ulongs. i think=20
that there is some sense in trading some speed for better distribution.=20
what do you think?

p.s. i made a stupid tester which using "/usr/share/dict/words" to test=20
hashing functions. it hashes each word as all-lower, all-upper and all-
lower with first char as upper. the results are:

fastHash64 ... perfect for 115857 words
fastHash32 ... perfect for 115857 words
sfHash ... 6 collisions in total

that's not that bad, but fasthash is definitely better. ;-)

and i believe that people that care about performance will roll their own=20
hash functions anyway, so having not lightning fast, but good hash in=20
druntime is better than having fast, but worse function.


[1] google://superfasthash+problems
[2] http://code.google.com/p/fast-hash/=
May 27 2015
next sibling parent reply "Per =?UTF-8?B?Tm9yZGzDtnci?= <per.nordlow gmail.com> writes:
On Thursday, 28 May 2015 at 03:09:07 UTC, ketmar wrote:
 SuperFastHash has known distribution problems[1]. there is 
 fasthash[2]
Is there an easy to tweak the default hashing of key type, say `K`, when using a builtin D AA, say `V[K]`?
May 28 2015
parent ketmar <ketmar ketmar.no-ip.org> writes:
On Thu, 28 May 2015 12:02:44 +0000, Per Nordl=C3=B6w wrote:

 On Thursday, 28 May 2015 at 03:09:07 UTC, ketmar wrote:
 SuperFastHash has known distribution problems[1]. there is fasthash[2]
=20 Is there an easy to tweak the default hashing of key type, say `K`, when using a builtin D AA, say `V[K]`?
nope.=
May 28 2015
prev sibling next sibling parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Thursday, 28 May 2015 at 03:09:07 UTC, ketmar wrote:
 SuperFastHash has known distribution problems[1].
BTW, something I noticed while browsing Rust docs today:
 The hashes are all keyed by the thread-local random number 
 generator on creation by default. This means that the ordering 
 of the keys is randomized, but makes the tables more resistant 
 to denial-of-service attacks (Hash DoS).
http://doc.rust-lang.org/std/collections/struct.HashMap.html As I understand, D is certainly vulnerable to Hash DoS.
May 28 2015
next sibling parent "Kagamin" <spam here.lot> writes:
On Thursday, 28 May 2015 at 12:40:07 UTC, Vladimir Panteleev 
wrote:
 As I understand, D is certainly vulnerable to Hash DoS.
https://issues.dlang.org/show_bug.cgi?id=14414
May 28 2015
prev sibling parent ketmar <ketmar ketmar.no-ip.org> writes:
On Thu, 28 May 2015 12:39:52 +0000, Vladimir Panteleev wrote:

 On Thursday, 28 May 2015 at 03:09:07 UTC, ketmar wrote:
 SuperFastHash has known distribution problems[1].
=20 BTW, something I noticed while browsing Rust docs today: =20
 The hashes are all keyed by the thread-local random number generator on
 creation by default. This means that the ordering of the keys is
 randomized, but makes the tables more resistant to denial-of-service
 attacks (Hash DoS).
=20 http://doc.rust-lang.org/std/collections/struct.HashMap.html =20 As I understand, D is certainly vulnerable to Hash DoS.
yes. and the interesting thing is that `toHash()` accepts seed, but it's=20 not used in other code. looks to me like low-hanging fruit.=
May 28 2015
prev sibling parent reply "Martin Nowak" <code dawg.eu> writes:
On Thursday, 28 May 2015 at 03:09:07 UTC, ketmar wrote:
 i think
 that there is some sense in trading some speed for better 
 distribution.
 what do you think?
We discussed most part of this already. http://forum.dlang.org/post/mff4id$hj8$1 digitalmars.com And we're already using MurmurHash3 for the new CTFEable hash function. https://github.com/D-Programming-Language/druntime/blob/master/src/core/internal/hash.d Though I don't get any feedback for the library AA. http://forum.dlang.org/post/mjsma6$196h$1 digitalmars.com
 and i believe that people that care about performance will roll 
 their own
 hash functions anyway
Anyone will use the built-in hash, performance is of utmost importance.
 [2] http://code.google.com/p/fast-hash/
It's called fast hash, but how fast is it?
May 29 2015
next sibling parent reply =?utf-8?Q?Daniel_Koz=C3=A1k?= via Digitalmars-d writes:
A little bit faster than murnurhash3 in my scenerio. But I prefer farmhash.=
 http://code.google.com/p/farmhash/

----- P=C5=AFvodn=C3=AD zpr=C3=A1va -----
Od:"Martin Nowak via Digitalmars-d" <digitalmars-d puremagic.com>
Odesl=C3=A1no:=E2=80=8E29. =E2=80=8E5. =E2=80=8E2015 11:41
Komu:"digitalmars-d puremagic.com" <digitalmars-d puremagic.com>
P=C5=99edm=C4=9Bt:Re: is there any reason to use SuperFastHash in druntime?=
 (exceptspeed)

On Thursday, 28 May 2015 at 03:09:07 UTC, ketmar wrote:
 i think
 that there is some sense in trading some speed for better=20
 distribution.
 what do you think?
We discussed most part of this already. http://forum.dlang.org/post/mff4id$hj8$1 digitalmars.com And we're already using MurmurHash3 for the new CTFEable hash=20 function. https://github.com/D-Programming-Language/druntime/blob/master/src/core/int= ernal/hash.d Though I don't get any feedback for the library AA. http://forum.dlang.org/post/mjsma6$196h$1 digitalmars.com
 and i believe that people that care about performance will roll=20
 their own
 hash functions anyway
Anyone will use the built-in hash, performance is of utmost=20 importance.
 [2] http://code.google.com/p/fast-hash/
It's called fast hash, but how fast is it?
May 29 2015
parent ketmar <ketmar ketmar.no-ip.org> writes:
On Fri, 29 May 2015 11:52:58 +0200, Daniel Koz=C3=A1k via Digitalmars-d wro=
te:

 A little bit faster than murnurhash3 in my scenerio. But I prefer
 farmhash. http://code.google.com/p/farmhash/
thank you, i didn't hit that one before.=
May 29 2015
prev sibling parent ketmar <ketmar ketmar.no-ip.org> writes:
On Fri, 29 May 2015 09:36:22 +0000, Martin Nowak wrote:

 On Thursday, 28 May 2015 at 03:09:07 UTC, ketmar wrote:
 i think that there is some sense in trading some speed for better
 distribution.
 what do you think?
We discussed most part of this already. http://forum.dlang.org/post/mff4id$hj8$1 digitalmars.com
i was more interested in reasons behind choosing SFH, yet worded my=20 question poorly. sorry.
 Though I don't get any feedback for the library AA.
 http://forum.dlang.org/post/mjsma6$196h$1 digitalmars.com
me has nothing valuable to say. ;-) sadly, i can see things done wrong=20 only after they were done, not in the design stage.
 and i believe that people that care about performance will roll their
 own hash functions anyway
Anyone will use the built-in hash, performance is of utmost importance.
i believe that it should be "acceptable", not "fastest possible". i like=20 to trade some speed for better output, for example. what is the reason of=20 having superfast, but not good hash function? code will spend alot of=20 time resolving collisions then. ;-)
 [2] http://code.google.com/p/fast-hash/
It's called fast hash, but how fast is it?
i didn't do any serious measurement, but for me it's slightly faster than=20 Murmur3 (which is seen even in code, as it uses one multiplication where=20 Murmur3 is using two), and the quality of output is comparable.=
May 29 2015