digitalmars.D - Surprised by hashes of arrays of arrays
- Jens Mueller (33/33) Feb 27 2014 Dear list,
- thedeemon (2/7) Feb 27 2014 How shall it work with cycles in object graphs?
- Shammah Chancellor (5/47) Feb 27 2014 You should get the same answer if in both cases they were static
- Jens Mueller (5/53) Feb 27 2014 The original array is a dynamic array. But interesting that it is
Dear list, I stumbled over odd behavior which took quite some time of debugging. Sharing my results may help find a solution or just make others aware and reduce their debugging time. To illustrate consider the code: auto array1 = [ [1, 2], [3, 4] ]; auto array2 = array1.dup; array2[0] = array2[0].dup; Does either hash(&array1) == hash(&array2) hold or hash(&array1) != hash(&array2) where hash is defined as auto hash = &(typeid(array1).getHash); ? So I may be totally off here (please tell me), but it turns out that that both arrays have different hashes due to the way hashing is implemented. This, e.g., implies that equal arrays may be hashed to different values. Hence when you are using them as keys in an associative array results may be surprising. Note that I can make the problem more difficult to spot by using a struct that uses pointers or an array internally which is hidden by encapsulation. I find the behavior non-obvious but maybe there is reason. The current implementation considers only the direct contents of the struct's memory (the memory starting at array.ptr to array.length * size(array[0]) in case of dynamic arrays) and does *not* follow indirections (e.g. via pointers). This implies that a hash can be computed without considering all memory occupied by a value. In my current mental model of hashing I assumed that all bytes should be read by default. As you may conclude from this post I find this behavior odd. I expect a hash implementation to follow indirections by calling the hashing functions recursively. It looks to me as a case where the default is badly designed/implemented. Jens
Feb 27 2014
On Thursday, 27 February 2014 at 10:12:46 UTC, Jens Mueller wrote:As you may conclude from this post I find this behavior odd. I expect a hash implementation to follow indirections by calling the hashing functions recursively.How shall it work with cycles in object graphs?
Feb 27 2014
On 2014-02-27 10:11:53 +0000, Jens Mueller said:Dear list, I stumbled over odd behavior which took quite some time of debugging. Sharing my results may help find a solution or just make others aware and reduce their debugging time. To illustrate consider the code: auto array1 = [ [1, 2], [3, 4] ]; auto array2 = array1.dup; array2[0] = array2[0].dup; Does either hash(&array1) == hash(&array2) hold or hash(&array1) != hash(&array2) where hash is defined as auto hash = &(typeid(array1).getHash); ? So I may be totally off here (please tell me), but it turns out that that both arrays have different hashes due to the way hashing is implemented. This, e.g., implies that equal arrays may be hashed to different values. Hence when you are using them as keys in an associative array results may be surprising. Note that I can make the problem more difficult to spot by using a struct that uses pointers or an array internally which is hidden by encapsulation. I find the behavior non-obvious but maybe there is reason. The current implementation considers only the direct contents of the struct's memory (the memory starting at array.ptr to array.length * size(array[0]) in case of dynamic arrays) and does *not* follow indirections (e.g. via pointers). This implies that a hash can be computed without considering all memory occupied by a value. In my current mental model of hashing I assumed that all bytes should be read by default. As you may conclude from this post I find this behavior odd. I expect a hash implementation to follow indirections by calling the hashing functions recursively. It looks to me as a case where the default is badly designed/implemented. JensYou should get the same answer if in both cases they were static arrays, but I believe array1.dup returns a slice to a heap allocated array instead of the statically-sized array you originally had. -S.
Feb 27 2014
Shammah Chancellor wrote:On 2014-02-27 10:11:53 +0000, Jens Mueller said:The original array is a dynamic array. But interesting that it is obvious that the hashes are not the same. This was really surprising to me, probably still is. JensDear list, I stumbled over odd behavior which took quite some time of debugging. Sharing my results may help find a solution or just make others aware and reduce their debugging time. To illustrate consider the code: auto array1 = [ [1, 2], [3, 4] ]; auto array2 = array1.dup; array2[0] = array2[0].dup; Does either hash(&array1) == hash(&array2) hold or hash(&array1) != hash(&array2) where hash is defined as auto hash = &(typeid(array1).getHash); ? So I may be totally off here (please tell me), but it turns out that that both arrays have different hashes due to the way hashing is implemented. This, e.g., implies that equal arrays may be hashed to different values. Hence when you are using them as keys in an associative array results may be surprising. Note that I can make the problem more difficult to spot by using a struct that uses pointers or an array internally which is hidden by encapsulation. I find the behavior non-obvious but maybe there is reason. The current implementation considers only the direct contents of the struct's memory (the memory starting at array.ptr to array.length * size(array[0]) in case of dynamic arrays) and does *not* follow indirections (e.g. via pointers). This implies that a hash can be computed without considering all memory occupied by a value. In my current mental model of hashing I assumed that all bytes should be read by default. As you may conclude from this post I find this behavior odd. I expect a hash implementation to follow indirections by calling the hashing functions recursively. It looks to me as a case where the default is badly designed/implemented. JensYou should get the same answer if in both cases they were static arrays, but I believe array1.dup returns a slice to a heap allocated array instead of the statically-sized array you originally had.
Feb 27 2014