digitalmars.D.announce - MurmurHash3
- Guillaume Chatelet (7/7) Dec 10 2015 Here is an implementation of MurmurHash [1] for D.
- Brad Anderson (4/11) Dec 10 2015 Seems like it'd be good to have it ready and in place as the
- Ilya (8/15) Dec 10 2015 Great!
- Ilya (11/18) Dec 10 2015 http://dpaste.dzfl.pl/1b94ed0aa96e#line-222 - seed is uint, can
- Guillaume Chatelet (9/18) Dec 11 2015 Done
- Ilya (10/29) Dec 11 2015 Current version is suitable for arrays but not ranges or types.
- Guillaume Chatelet (6/14) Dec 12 2015 I created https://github.com/gchatelet/murmurhash3_d and updated
- Marc =?UTF-8?B?U2Now7x0eg==?= (14/31) Dec 13 2015 AFAICS this doesn't conform to the digest interface. For example,
- Guillaume Chatelet (8/19) Dec 13 2015 The structs themselves do not but the alias at the beginning of
- Guillaume Chatelet (3/13) Dec 13 2015 Fixed in last commit. Thx for the heads up Marc.
- Guillaume Chatelet (7/7) Dec 19 2015 The last version of the code is available here and is feature
- Marc =?UTF-8?B?U2Now7x0eg==?= (11/18) Dec 20 2015 I was the one who introduced it, and I chose bits instead of
Here is an implementation of MurmurHash [1] for D. http://dpaste.dzfl.pl/1b94ed0aa96e I'll do a proper pull request later on for addition to std.digest if the community feels like it's a valuable addition. Guillaume -- 1 - https://en.wikipedia.org/wiki/MurmurHash
Dec 10 2015
On Thursday, 10 December 2015 at 22:25:21 UTC, Guillaume Chatelet wrote:Here is an implementation of MurmurHash [1] for D. http://dpaste.dzfl.pl/1b94ed0aa96e I'll do a proper pull request later on for addition to std.digest if the community feels like it's a valuable addition. Guillaume -- 1 - https://en.wikipedia.org/wiki/MurmurHashSeems like it'd be good to have it ready and in place as the upcoming containers work starts materializing.
Dec 10 2015
On Thursday, 10 December 2015 at 22:25:21 UTC, Guillaume Chatelet wrote:Here is an implementation of MurmurHash [1] for D. http://dpaste.dzfl.pl/1b94ed0aa96e I'll do a proper pull request later on for addition to std.digest if the community feels like it's a valuable addition. Guillaume -- 1 - https://en.wikipedia.org/wiki/MurmurHashGreat! Could you please add an optimized interface to compute hashes for `uint` , `ulong` and `ulong[2]`? It would very good both for upcoming containers and multidimensional slices https://github.com/D-Programming-Language/phobos/pull/3397 .
Dec 10 2015
On Thursday, 10 December 2015 at 22:25:21 UTC, Guillaume Chatelet wrote:Here is an implementation of MurmurHash [1] for D. http://dpaste.dzfl.pl/1b94ed0aa96e I'll do a proper pull request later on for addition to std.digest if the community feels like it's a valuable addition. Guillaume -- 1 - https://en.wikipedia.org/wiki/MurmurHashhttp://dpaste.dzfl.pl/1b94ed0aa96e#line-222 - seed is uint, can it be ulong? Mutmur hash has three stages: 1. Computation of hash for blocks (32bit or 128bit) 2. Compitation of hash for tail (remainder) 3. Finalization. I will be very happy, if step 1 will be represented as an output range. Then it can be used directly like reduce aggregator for ranges and multidimensional slices.
Dec 10 2015
On Friday, 11 December 2015 at 01:51:09 UTC, Ilya wrote:http://dpaste.dzfl.pl/1b94ed0aa96e#line-222 - seed is uint, can it be ulong?DoneMutmur hash has three stages: 1. Computation of hash for blocks (32bit or 128bit) 2. Compitation of hash for tail (remainder) 3. Finalization. I will be very happy, if step 1 will be represented as an output range. Then it can be used directly like reduce aggregator for ranges and multidimensional slices.Done Not thoroughly tested but updated for range and taking an ulong seed for MurmurHash3_x64_128: http://dpaste.dzfl.pl/1b94ed0aa96e Not sure I got what you meant about the optimized version. For the return value ? I haven't done any benchmarking yet.
Dec 11 2015
On Friday, 11 December 2015 at 22:43:00 UTC, Guillaume Chatelet wrote:On Friday, 11 December 2015 at 01:51:09 UTC, Ilya wrote:Current version is suitable for arrays but not ranges or types. Few examples: 1. Compute hash of ulong. 2. Compute hash of all elements in matrix column (element are in different arrays). I have created output range API draft http://dpaste.dzfl.pl/a24050042758 Ilyahttp://dpaste.dzfl.pl/1b94ed0aa96e#line-222 - seed is uint, can it be ulong?DoneMutmur hash has three stages: 1. Computation of hash for blocks (32bit or 128bit) 2. Compitation of hash for tail (remainder) 3. Finalization. I will be very happy, if step 1 will be represented as an output range. Then it can be used directly like reduce aggregator for ranges and multidimensional slices.Done Not thoroughly tested but updated for range and taking an ulong seed for MurmurHash3_x64_128: http://dpaste.dzfl.pl/1b94ed0aa96e Not sure I got what you meant about the optimized version. For the return value ? I haven't done any benchmarking yet.
Dec 11 2015
On Saturday, 12 December 2015 at 02:59:21 UTC, Ilya wrote:Current version is suitable for arrays but not ranges or types. Few examples: 1. Compute hash of ulong. 2. Compute hash of all elements in matrix column (element are in different arrays). I have created output range API draft http://dpaste.dzfl.pl/a24050042758 IlyaI created https://github.com/gchatelet/murmurhash3_d and updated the code a bit. It conforms to the digest template interface, allows pushing ulong[2] and accept ranges. PR welcome :)
Dec 12 2015
On Saturday, 12 December 2015 at 20:12:49 UTC, Guillaume Chatelet wrote:On Saturday, 12 December 2015 at 02:59:21 UTC, Ilya wrote:AFAICS this doesn't conform to the digest interface. For example, there should be a `finish` method that returns the hash as a static array (see the ExampleDigest [1]). More importantly, I believe your `put()` implementation only works if it is fed the entire data at once. I haven't tested it, but I believe that the following two calls will have a different result, while they should result in the same hash: hash.put([1,2,3,4,5,6]); vs hash.put([1,2,3]); hash.put([4,5,6]);Current version is suitable for arrays but not ranges or types. Few examples: 1. Compute hash of ulong. 2. Compute hash of all elements in matrix column (element are in different arrays). I have created output range API draft http://dpaste.dzfl.pl/a24050042758 IlyaI created https://github.com/gchatelet/murmurhash3_d and updated the code a bit. It conforms to the digest template interface, allows pushing ulong[2] and accept ranges. PR welcome :)
Dec 13 2015
On Sunday, 13 December 2015 at 12:44:06 UTC, Marc Schütz wrote:AFAICS this doesn't conform to the digest interface. For example, there should be a `finish` method that returns the hash as a static array (see the ExampleDigest [1]).The structs themselves do not but the alias at the beginning of the file make sure they do. alias MurmurHash3_x86_32 = Digester!SMurmurHash3_x86_32; alias MurmurHash3_x86_128 = Digester!SMurmurHash3_x86_128; alias MurmurHash3_x64_128 = Digester!SMurmurHash3_x64_128;More importantly, I believe your `put()` implementation only works if it is fed the entire data at once. I haven't tested it, but I believe that the following two calls will have a different result, while they should result in the same hash: hash.put([1,2,3,4,5,6]); vs hash.put([1,2,3]); hash.put([4,5,6]);I suspect this as well although I haven't tested. I'll add more tests and add the missing logic if needed.
Dec 13 2015
On Sunday, 13 December 2015 at 16:24:35 UTC, Guillaume Chatelet wrote:On Sunday, 13 December 2015 at 12:44:06 UTC, Marc Schütz wrote:Fixed in last commit. Thx for the heads up Marc.[...]The structs themselves do not but the alias at the beginning of the file make sure they do. alias MurmurHash3_x86_32 = Digester!SMurmurHash3_x86_32; alias MurmurHash3_x86_128 = Digester!SMurmurHash3_x86_128; alias MurmurHash3_x64_128 = Digester!SMurmurHash3_x64_128;[...]I suspect this as well although I haven't tested. I'll add more tests and add the missing logic if needed.
Dec 13 2015
The last version of the code is available here and is feature complete AFAICT https://github.com/gchatelet/murmurhash3_d/blob/master/murmurhash3.d Last concern, I declared blockSize in bytes where std.digest.digest says it should be in bits. Why does it need to be bits ? It looks like HMAC (which needs it) is explicitly making sure it's always a multiple of 8 bits.
Dec 19 2015
On Saturday, 19 December 2015 at 22:15:14 UTC, Guillaume Chatelet wrote:The last version of the code is available here and is feature complete AFAICT https://github.com/gchatelet/murmurhash3_d/blob/master/murmurhash3.d Last concern, I declared blockSize in bytes where std.digest.digest says it should be in bits. Why does it need to be bits ? It looks like HMAC (which needs it) is explicitly making sure it's always a multiple of 8 bits.I was the one who introduced it, and I chose bits instead of bytes because I didn't want to exclude the possibility that there are hashing algos that have a block size not divisible by 8. The algorithms are usually described in terms of bits, not bytes, so it's not unconceivable that such hash functions exist, though I don't know any. (Of course, HMAC wouldn't work with those, but that doesn't mean that other algorithms couldn't.) I'd suggest you change `blockSize` to the number of bits, and introduce an enum `blockSizeInBytes` for internal use.
Dec 20 2015