www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - MurmurHash3 behaviour

reply Cauterite <cauterite gmail.com> writes:
Regarding the MurmurHash3 implementation in core.internal.hash, 
it is my understanding that:
     // assuming a and b are uints
     bytesHash([a, b], 0) == bytesHash([b], bytesHash([a], 0))
Is this correct?
I'm just not quite certain of this property when I try to read 
the code myself, and I don't know much about hash algorithms.
Aug 19 2016
next sibling parent reply Seb <seb wilzba.ch> writes:
On Friday, 19 August 2016 at 20:28:13 UTC, Cauterite wrote:
 Regarding the MurmurHash3 implementation in core.internal.hash, 
 it is my understanding that:
     // assuming a and b are uints
     bytesHash([a, b], 0) == bytesHash([b], bytesHash([a], 0))
 Is this correct?
 I'm just not quite certain of this property when I try to read 
 the code myself, and I don't know much about hash algorithms.
Phobos nightly contains MurmurHash3 in std.digest (I don't know why there are two implementations though), but I think this one will be easier to use ;-) http://dlang.org/phobos-prerelease/std_digest_murmurhash.html The API for bytesHash is size_t bytesHash(const(void)* buf, size_t len, size_t seed) so shouldn't be? auto arr = [a,b]; bytesHash(&arr, arr.length, 42);
 I'm just not quite certain of this property when I try to read
AFAIK the output of a hash isn't generally equal to the its upcoming seed as pre & postpreprocessing may apply. In the case of Murmurhash3 its post-processing (h1 ^= len): void main() { import std.stdio; import core.internal.hash; uint a = 4, b = 23; auto arr = [a, b]; auto b1 = bytesHash(&arr, arr.length, 42); auto l = arr[0..1], r = arr[1..$]; auto b2a = bytesHash(&l, 1, 42); auto b2b= bytesHash(&r, 1, b2a); assert(b1 == b2b); } However with std.digest you can "save" the current state of a hash function and continue to give new items to the hasher: void main() { import std.digest.murmurhash; // required Phobos nightly uint a = 4, b = 23; auto arr = [a, b]; auto l = arr[0..1], r = arr[1..$]; MurmurHash3!32 hasher; hasher.put(cast(ubyte[]) l); hasher.put(cast(ubyte[]) r); auto v1 = hasher.finish(); hasher = MurmurHash3!32(); hasher.put(cast(ubyte[]) arr); auto v2 = hasher.finish(); assert(v1 == v2); }
Aug 19 2016
parent reply Cauterite <cauterite gmail.com> writes:
On Friday, 19 August 2016 at 21:03:22 UTC, Seb wrote:
 http://dlang.org/phobos-prerelease/std_digest_murmurhash.html
Ah great, I just finished writing my own murmurhash digest module ( https://github.com/Cauterite/phobos/blob/murmur/std/digest/murmur.d ), and now I discover someone's already done it. Glad I wasted the last 3 hours of my life ;_;
Aug 19 2016
parent Seb <seb wilzba.ch> writes:
On Friday, 19 August 2016 at 21:29:43 UTC, Cauterite wrote:
 On Friday, 19 August 2016 at 21:03:22 UTC, Seb wrote:
 http://dlang.org/phobos-prerelease/std_digest_murmurhash.html
Ah great, I just finished writing my own murmurhash digest module ( https://github.com/Cauterite/phobos/blob/murmur/std/digest/murmur.d ), and now I discover someone's already done it. Glad I wasted the last 3 hours of my life ;_;
Well, we all learn for life? For example you could start to keep your fork up to date with upstream. It was merged two months ago ;-)
Aug 20 2016
prev sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Friday, 19 August 2016 at 20:28:13 UTC, Cauterite wrote:
 Regarding the MurmurHash3 implementation in core.internal.hash, 
 it is my understanding that:
     // assuming a and b are uints
     bytesHash([a, b], 0) == bytesHash([b], bytesHash([a], 0))
 Is this correct?
 I'm just not quite certain of this property when I try to read 
 the code myself, and I don't know much about hash algorithms.
DRuntime has Murmurhash2, not 3.
Aug 20 2016
parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 20 August 2016 at 09:15:00 UTC, Ilya Yaroshenko 
wrote:
 On Friday, 19 August 2016 at 20:28:13 UTC, Cauterite wrote:
 Regarding the MurmurHash3 implementation in 
 core.internal.hash, it is my understanding that:
     // assuming a and b are uints
     bytesHash([a, b], 0) == bytesHash([b], bytesHash([a], 0))
 Is this correct?
 I'm just not quite certain of this property when I try to read 
 the code myself, and I don't know much about hash algorithms.
DRuntime has Murmurhash2, not 3.
Oh, maybe I am wrong. Anyway 128bit Murmurhash3 hash from std.digest would much faster on 64 bit CPUs.
Aug 20 2016