digitalmars.D - object.d and hash_t confusion?
- kris (66/66) Jun 21 2006 In object.d, there's an alias declaration for hash_t like so:
- James Pelcis (25/76) Jun 26 2006 Yes. In the internal\object.d file, it is hash_t. This is now Bugzilla...
- kris (6/10) Jun 26 2006 Well, the hope was that such an easy-to-make 'mistake' would be caught
- James Pelcis (9/22) Jun 26 2006 Alas, no. It's similar to (for example) using ubyte instead of GLubyte....
- xs0 (5/7) Jun 27 2006 Don't you think a hash of 64 (or even 32) bits should always be enough?
- Lionello Lunesu (5/13) Jun 27 2006 In fact, I think that a hash of 32-bit should indeed be enough for
In object.d, there's an alias declaration for hash_t like so:
------------
alias size_t hash_t;
-----------
This indicates that the hash_t type will be 32bit on a 32bit system, and 
64bit on a 64bit system; yes? Is this so that a pointer can be directly 
returned as a hash value?
Then, also in object.d, we have the decl for class Object:
-----------
class Object
{
     void print();
     char[] toString();
     uint toHash();
     int opCmp(Object o);
     int opEquals(Object o);
}
-----------
Notice that the toHash() method returns a uint? Is that supposed to be 
hash_t instead?
For the moment, let's suppose it is meant to be hash_t. The rest of this 
post is based upon that notion, so if I'm wrong here, no harm done :)
Using hash_t as the return type would mean the toHash() method returns a 
different type depending upon which platform it's compiled upon. This 
may have some ramifications, so let's explore what they might be:
1) because an alias is used, type-safety does not come into play. Thus, 
when someone overrides Object.toHash like so:
------------
override uint toHash() {...}
------------
a 32bit compiler will be unlikely to complain (remember, hash_t is an 
alias).
When this code is compiled in 64bit land, luckily, the compiler will 
probably complain about the uint/ulong mismatch. However, because the 
keyword "override" is not mandatory, most programmers will do this 
instead (in a class):
-----------
uint toHash() {....}
-----------
the result will perhaps be a good compile but a bogus override? Or will 
the compiler flag this as not being covariant? Either way, shouldn't 
this be handled in a more suitable manner?
I suppose one way to ensure consistency is to use a typedef instead of 
an alias ... but will that cause errors when the result is used in an 
arithmetic expression? In this situation, is typedef too type-safe and 
alias not sufficient?
2) It's generally not a great idea to change the signature/types of 
overridable methods when moving platforms. You have to ensure there's 
absolute consistency in the types used, otherwise the vaguely brittle 
nature of the override mechanism can be tripped.
So the question here is "why does toHash() need to change across 
platforms?". Isn't 32bits sufficient?
If the answer to that indicates a 64bit value being more applicable 
(even for avoiding type-conversion warnings), then it would seem to 
indicate a new integral-type is required? One that has type-safety (a la 
typedef) but can be used in arithmetic expression without warnings or 
errors? This new type would be equivalent to size_t vis-a-vis byte size.
I know D is supposed to have fixed-size basic integer types across 
platforms, and for good reason. Yet here's a situation where it *seems* 
that the most fundamental class in the runtime is perhaps flouting 
that? Perhaps there are a few other corners where similar concerns may 
crop up?
I will note a vague distaste for the gazillion C++ style meta-types 
anyway; D does the right thing in making almost all of them entirely 
redundant. But, if there is indeed a problem with toHash(), then I 
suspect we need a more robust solution. What say you?
 Jun 21 2006
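A minimal sketch of point 1 in the post above, written in present-day D rather than the 2006 compiler the thread discusses. The name my_hash_t is hypothetical and stands in for the hash_t alias in object.d. It only illustrates that an alias of size_t is indistinguishable from uint on a 32-bit target and from ulong on a 64-bit one, which is why a 32-bit compiler has nothing to complain about.

```d
// Hypothetical stand-in for the hash_t alias discussed in the thread.
alias my_hash_t = size_t;

// An alias introduces no new type, so these checks hold at compile time.
static assert(is(my_hash_t == size_t));
static if (size_t.sizeof == 4)
    static assert(is(my_hash_t == uint));   // 32-bit target
else
    static assert(is(my_hash_t == ulong));  // 64-bit target

// True only where a uint-returning override would still match the alias.
bool sameAsUint()
{
    return is(my_hash_t == uint);
}
```

On a 64-bit build sameAsUint() is false, which is the case where a uint toHash() stops matching the base signature.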
kris wrote:
> Notice that the toHash() method returns a uint? Is that supposed to be
> hash_t instead?
Yes. In the internal\object.d file, it is hash_t. This is now Bugzilla 225.
> 1) because an alias is used, type-safety does not come into play. Thus,
> when someone overrides Object.toHash like so:
> ------------
> override uint toHash() {...}
> ------------
> a 32bit compiler will be unlikely to complain (remember, hash_t is an
> alias).
The compiler would be right, too. It is the same type (for 32 bits).
> When this code is compiled in 64bit land, luckily, the compiler will
> probably complain about the uint/ulong mismatch. However, because the
> keyword "override" is not mandatory, most programmers will do this
> instead (in a class):
> -----------
> uint toHash() {....}
> -----------
> the result will perhaps be a good compile but a bogus override? Or will
> the compiler flag this as not being covariant? Either way, shouldn't
> this be handled in a more suitable manner?
This is a programmer error, not a language error. Fortunately, it would
be marked as not being covariant.
> I suppose one way to ensure consistency is to use a typedef instead of
> an alias ... but will that cause errors when the result is used in an
> arithmetic expression? In this situation, is typedef too type-safe and
> alias not sufficient?
If a typedef were used, hash_t could still be used in expressions, but
the result would need to be cast back to hash_t.
> 2) It's generally not a great idea to change the signature/types of
> overridable methods when moving platforms. You have to ensure there's
> absolute consistency in the types used, otherwise the vaguely brittle
> nature of the override mechanism can be tripped.
> So the question here is "why does toHash() need to change across
> platforms?". Isn't 32bits sufficient?
toHash definitely needs to change across platforms.
Here's the current implementation:
Ignoring the fact that the function won't currently work on 64-bit
either (since it is marked as having a bug, although for a different
reason), the result needs to be big enough to return a pointer. 32 bits
won't always do that.
> If the answer to that indicates a 64bit value being more applicable
> (even for avoiding type-conversion warnings), then it would seem to
> indicate a new integral-type is required? One that has type-safety (a la
> typedef) but can be used in arithmetic expression without warnings or
> errors? This new type would be equivalent to size_t vis-a-vis byte size.
On some platforms and at some time, even 64 bits won't be enough to
handle toHash.
> I know D is supposed to have fixed-size basic integer types across
> platforms, and for good reason. Yet here's a situation where it *seems*
> that the most fundamental class in the runtime is perhaps flouting
> that? Perhaps there are a few other corners where similar concerns may
> crop up?
> I will note a vague distaste for the gazillion C++ style meta-types
> anyway; D does the right thing in making almost all of them entirely
> redundant. But, if there is indeed a problem with toHash(), then I
> suspect we need a more robust solution. What say you?
Since the only non-bug problem I noticed here was a programmer error
(using uint instead of hash_t), why should it be changed?
If a change does need to be made though, the alias could be changed into
a typedef. That would check for the problem regardless of the platform.
 Jun 26 2006
James Pelcis wrote:
> Since the only non-bug problem I noticed here was a programmer error
> (using uint instead of hash_t), why should it be changed?
Well, the hope was that such an easy-to-make 'mistake' would be caught
by the compiler :)
> If a change does need to be made though, the alias could be changed into
> a typedef. That would check for the problem regardless of the platform.
Yep, but probably requires casting. Walter has noted on a number of
occasions that a cast is not exactly intended for general purposes. I
just wonder if this should be considered a special case or not.
 Jun 26 2006
kris wrote:
> James Pelcis wrote:
>> Since the only non-bug problem I noticed here was a programmer error
>> (using uint instead of hash_t), why should it be changed?
> Well, the hope was that such an easy-to-make 'mistake' would be caught
> by the compiler :)
Alas, no. It's similar to (for example) using ubyte instead of GLubyte.
Both are legal. In fact, we don't normally even want the compiler to
complain about it.
>> If a change does need to be made though, the alias could be changed into
>> a typedef. That would check for the problem regardless of the platform.
> Yep, but probably requires casting. Walter has noted on a number of
> occasions that a cast is not exactly intended for general purposes. I
> just wonder if this should be considered a special case or not.
Casting wouldn't be necessary when using a typedef'ed version of hash_t,
but it would still be needed whenever it's assigned to a variable.
Personally, I don't think it's necessary and it definitely isn't
desirable to need to use casting for the Object class. I vote to leave
it as is (with the bug fixed).
 Jun 26 2006
> On some platforms and at some time, even 64-bits won't be enough to
> handle toHash.
Don't you think a hash of 64 (or even 32) bits should always be enough?
If your hashing function is bad, no amount of bits will help, and if
it's good, 32 bits is enough for most everything, and 64 is definitely
enough for anything at all..
xs0
 Jun 27 2006
xs0 wrote:
>> On some platforms and at some time, even 64-bits won't be enough to
>> handle toHash.
> Don't you think a hash of 64 (or even 32) bits should always be enough?
> If your hashing function is bad, no amount of bits will help, and if
> it's good, 32 bits is enough for most everything, and 64 is definitely
> enough for anything at all..
In fact, I think that a hash of 32-bit should indeed be enough for
anything. Even a 64-bit pointer should be hashable in 32-bits, by using
some logical operations (hi ^ lo?).
L.
 Jun 27 2006
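The (hi ^ lo?) suggestion above could look roughly like this. It is only a sketch: foldTo32 is a hypothetical helper name, not something from object.d or Phobos, and XOR-folding is just one of several ways to reduce a 64-bit value to 32 bits.

```d
// Hypothetical helper: fold a 64-bit value into 32 bits by XOR-ing
// its upper half into its lower half (the "hi ^ lo" idea).
uint foldTo32(ulong value)
{
    uint hi = cast(uint)(value >> 32);          // upper 32 bits
    uint lo = cast(uint)(value & 0xFFFF_FFFF);  // lower 32 bits
    return hi ^ lo;
}
```

A 64-bit address would be converted to ulong first and then passed to the helper. Values that already fit in 32 bits come back unchanged.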
 James Pelcis <jpelcis gmail.com>