www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Surprised by hashes of arrays of arrays

reply Jens Mueller <jens.k.mueller gmx.de> writes:
Dear list,

I stumbled over odd behavior which took quite some time of debugging.
Sharing my results may help find a solution or just make others aware
and reduce their debugging time.

To illustrate consider the code:

auto array1 = [ [1, 2], [3, 4] ];
auto array2 = array1.dup;
array2[0] = array2[0].dup;

Does either hash(&array1) == hash(&array2) hold or hash(&array1) !=
hash(&array2) where hash is defined as
auto hash = &(typeid(array1).getHash);
?

So I may be totally off here (please tell me), but it turns out that
that both arrays have different hashes due to the way hashing is
implemented. This, e.g., implies that equal arrays may be hashed to
different values. Hence when you are using them as keys in an
associative array results may be surprising.
Note that I can make the problem more difficult to spot by using a
struct that uses pointers or an array internally which is hidden by
encapsulation.

I find the behavior non-obvious but maybe there is reason. The current
implementation considers only the direct contents of the struct's memory
(the memory starting at array.ptr to array.length * size(array[0]) in
case of dynamic arrays) and does *not* follow indirections (e.g. via
pointers).

This implies that a hash can be computed without considering all memory
occupied by a value. In my current mental model of hashing I assumed
that all bytes should be read by default.

As you may conclude from this post I find this behavior odd. I expect a
hash implementation to follow indirections by calling the hashing
functions recursively. It looks to me as a case where the default is
badly designed/implemented.

Jens
Feb 27 2014
next sibling parent "thedeemon" <dlang thedeemon.com> writes:
On Thursday, 27 February 2014 at 10:12:46 UTC, Jens Mueller wrote:
 As you may conclude from this post I find this behavior odd. I  
 expect a
 hash implementation to follow indirections by calling the  
 hashing
 functions recursively.
How shall it work with cycles in object graphs?
Feb 27 2014
prev sibling parent reply Shammah Chancellor <anonymous coward.com> writes:
On 2014-02-27 10:11:53 +0000, Jens Mueller said:

 Dear list,
 
 I stumbled over odd behavior which took quite some time of debugging.
 Sharing my results may help find a solution or just make others aware
 and reduce their debugging time.
 
 To illustrate consider the code:
 
 auto array1 = [ [1, 2], [3, 4] ];
 auto array2 = array1.dup;
 array2[0] = array2[0].dup;
 
 Does either hash(&array1) == hash(&array2) hold or hash(&array1) !=
 hash(&array2) where hash is defined as
 auto hash = &(typeid(array1).getHash);
 ?
 
 So I may be totally off here (please tell me), but it turns out that
 that both arrays have different hashes due to the way hashing is
 implemented. This, e.g., implies that equal arrays may be hashed to
 different values. Hence when you are using them as keys in an
 associative array results may be surprising.
 Note that I can make the problem more difficult to spot by using a
 struct that uses pointers or an array internally which is hidden by
 encapsulation.
 
 I find the behavior non-obvious but maybe there is reason. The current
 implementation considers only the direct contents of the struct's memory
 (the memory starting at array.ptr to array.length * size(array[0]) in
 case of dynamic arrays) and does *not* follow indirections (e.g. via
 pointers).
 
 This implies that a hash can be computed without considering all memory
 occupied by a value. In my current mental model of hashing I assumed
 that all bytes should be read by default.
 
 As you may conclude from this post I find this behavior odd. I expect a
 hash implementation to follow indirections by calling the hashing
 functions recursively. It looks to me as a case where the default is
 badly designed/implemented.
 
 Jens
You should get the same answer if in both cases they were static arrays, but I believe array1.dup returns a slice to a heap allocated array instead of the statically-sized array you originally had. -S.
Feb 27 2014
parent Jens Mueller <jens.k.mueller gmx.de> writes:
Shammah Chancellor wrote:
 On 2014-02-27 10:11:53 +0000, Jens Mueller said:
 
Dear list,

I stumbled over odd behavior which took quite some time of debugging.
Sharing my results may help find a solution or just make others aware
and reduce their debugging time.

To illustrate consider the code:

auto array1 = [ [1, 2], [3, 4] ];
auto array2 = array1.dup;
array2[0] = array2[0].dup;

Does either hash(&array1) == hash(&array2) hold or hash(&array1) !=
hash(&array2) where hash is defined as
auto hash = &(typeid(array1).getHash);
?

So I may be totally off here (please tell me), but it turns out that
that both arrays have different hashes due to the way hashing is
implemented. This, e.g., implies that equal arrays may be hashed to
different values. Hence when you are using them as keys in an
associative array results may be surprising.
Note that I can make the problem more difficult to spot by using a
struct that uses pointers or an array internally which is hidden by
encapsulation.

I find the behavior non-obvious but maybe there is reason. The current
implementation considers only the direct contents of the struct's memory
(the memory starting at array.ptr to array.length * size(array[0]) in
case of dynamic arrays) and does *not* follow indirections (e.g. via
pointers).

This implies that a hash can be computed without considering all memory
occupied by a value. In my current mental model of hashing I assumed
that all bytes should be read by default.

As you may conclude from this post I find this behavior odd. I expect a
hash implementation to follow indirections by calling the hashing
functions recursively. It looks to me as a case where the default is
badly designed/implemented.

Jens
You should get the same answer if in both cases they were static arrays, but I believe array1.dup returns a slice to a heap allocated array instead of the statically-sized array you originally had.
The original array is a dynamic array. But interesting that it is obvious that the hashes are not the same. This was really surprising to me, probably still is. Jens
Feb 27 2014