www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Ordering comparisons

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
While reviewing work on array comparisons, Vladimir found an odd issue:

https://issues.dlang.org/show_bug.cgi?id=17244

Investigation reveals that during array comparison for inequality, 
structs are compared by means of memcmp - even if they don't define 
opCmp, and regardless of the types of their fields. This has obvious 
unpleasant consequences:

* Floating point values don't compare well with memcmp

* A struct S { int x; } compares differently on little endian and big 
endian system (!)

* A struct containing a field that defines opCmp ignores it

* ... and many other nefarious issues

The question is what to do to minimize breakage yet "break the bad 
code". The most backward-compatible solution is to define opCmp 
automatically to do a field-by-field lexicographical comparison. The 
most radical solution is disable ordering comparisons unless explicitly 
implemented by the user.


Andrei
Mar 06 2017
next sibling parent reply Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Monday, March 06, 2017 20:27:56 Andrei Alexandrescu via Digitalmars-d 
wrote:
 While reviewing work on array comparisons, Vladimir found an odd issue:

 https://issues.dlang.org/show_bug.cgi?id=17244

 Investigation reveals that during array comparison for inequality,
 structs are compared by means of memcmp - even if they don't define
 opCmp, and regardless of the types of their fields. This has obvious
 unpleasant consequences:
 The question is what to do to minimize breakage yet "break the bad
 code". The most backward-compatible solution is to define opCmp
 automatically to do a field-by-field lexicographical comparison. The
 most radical solution is disable ordering comparisons unless explicitly
 implemented by the user.
If we can, we should probably make structs that have opCmp use opCmp like they should, and those that don't continue to use memcmp for the moment but get a deprecation message. Then after whatever amount of time we think is appropriate, we disable the comparisons. So, we don't break anything immediately, but ultimately end up with the correct situation of comparisons not working unless opCmp being defined. That being said, depending on how all of that is implemented, I don't know how easy that approach is. - Jonathan M Davis
Mar 06 2017
parent Dominikus Dittes Scherkl <Dominikus.Scherkl continental-corporation.com> writes:
On Tuesday, 7 March 2017 at 02:51:37 UTC, Jonathan M Davis wrote:

 If we can, we should probably make structs that have opCmp use 
 opCmp like they should, and those that don't continue to use 
 memcmp for the moment but get a deprecation message. Then after 
 whatever amount of time we think is appropriate, we disable the 
 comparisons. So, we don't break anything immediately, but 
 ultimately end up with the correct situation of comparisons not 
 working unless opCmp being defined.
+1 This is the only sensible approach.
Mar 07 2017
prev sibling next sibling parent reply "Nick Sabalausky (Abscissa)" <SeeWebsiteToContactMe semitwist.com> writes:
On 03/06/2017 08:27 PM, Andrei Alexandrescu wrote:
 * A struct S { int x; } compares differently on little endian and big
 endian system (!)
This one is very surprising. How is that so, if both structs being compared are of the same endian-ness?
Mar 06 2017
next sibling parent ketmar <ketmar ketmar.no-ip.org> writes:
Nick Sabalausky (Abscissa) wrote:

 On 03/06/2017 08:27 PM, Andrei Alexandrescu wrote:
 * A struct S { int x; } compares differently on little endian and big
 endian system (!)
This one is very surprising. How is that so, if both structs being compared are of the same endian-ness?
if we're talking about equality, yes. but opCmp is "less/greater" comparison. so, 0x290a will be: 0x0a 0x29 on little-endian, and: 0x29 0x0a on big-endian. let's take 0x1930, and compare it with 0x290a, using memcmp: for big-endian, first byte: 0x29:0x19: hit, 0x290a is greater. for little-endian, first byte: 0x0a:0x30: hit, 0x290a is *lesser* than 0x1930. oops.
Mar 06 2017
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2017-03-07 04:44, Nick Sabalausky (Abscissa) wrote:

 This one is very surprising. How is that so, if both structs being
 compared are of the same endian-ness?
The structs for a given run will be of the same endian-ness. But if you run the same code on two different systems, one with little endian and one with big endian, you will get different results when sorting an array, for example. If I understand this correctly. -- /Jacob Carlborg
Mar 06 2017
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/06/2017 10:44 PM, Nick Sabalausky (Abscissa) wrote:
 On 03/06/2017 08:27 PM, Andrei Alexandrescu wrote:
 * A struct S { int x; } compares differently on little endian and big
 endian system (!)
This one is very surprising. How is that so, if both structs being compared are of the same endian-ness?
On a big endian system, comparing integers with memcmp works correctly because higher-rank bytes are compared before lower-rank bytes. On a little endian system, lower-rank bytes are compared first, which would make e.g. 256 less than 1. Porting code that relies on comparing arrays of structs for ordering is likely to be a nightmare. We really need to fix this. -- Andrei
Mar 07 2017
prev sibling next sibling parent deadalnix <deadalnix gmail.com> writes:
On Tuesday, 7 March 2017 at 01:27:56 UTC, Andrei Alexandrescu 
wrote:
 The question is what to do to minimize breakage yet "break the 
 bad code". The most backward-compatible solution is to define 
 opCmp automatically to do a field-by-field lexicographical 
 comparison. The most radical solution is disable ordering 
 comparisons unless explicitly implemented by the user.
There should be no assumption that structs are comparable, so the later.
Mar 07 2017
prev sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Mon, Mar 06, 2017 at 08:27:56PM -0500, Andrei Alexandrescu via Digitalmars-d
wrote:
 While reviewing work on array comparisons, Vladimir found an odd issue:
 
 https://issues.dlang.org/show_bug.cgi?id=17244
 
 Investigation reveals that during array comparison for inequality,
 structs are compared by means of memcmp - even if they don't define
 opCmp, and regardless of the types of their fields. This has obvious
 unpleasant consequences:
Couple of points: (1) I may be remembering wrong, but I thought structs had always been intended to be compared field-wise? I remember when working on AA's that the compiler would emit a default implementation of opEquals that did member-wise comparisons. I had always assumed that something similar was done with inequalities... or was that just unfounded extrapolation? (2) I didn't realize that arrays allowed inequalities by default, though in retrospect it does make sense (since we do define string inequalities, and string are arrays). But I would have expected that structs would *not* allow inequalities by default, because it's not clear that a default ordering relation makes sense for every struct. Consider for example, a struct representation of a complex number: struct Complex { float re, im; } It would be wrong to assume a default ordering relation on this struct because the complex numbers do not have a linear ordering. [...]
 The question is what to do to minimize breakage yet "break the bad
 code".  The most backward-compatible solution is to define opCmp
 automatically to do a field-by-field lexicographical comparison. The
 most radical solution is disable ordering comparisons unless
 explicitly implemented by the user.
[...] I wouldn't say disabling ordering comparisons is "radical". In fact, I think it makes the most sense by assuming the least about the user's intended semantics for the struct. Lexicographical ordering by default makes sense for arrays (e.g., strings), but I don't think it makes sense for general structs. Without the user stating what his intents are, it seems unfounded to presume lexicographic ordering by default. If the user has a struct of orthogonal information, e.g., name, phone number, address, there is no reason why changing the order of fields (during code refactoring, for example) should result in a completely different ordering just because the language assumes lexicographical ordering by default. It's better to make it an error to involve structs in inequalities if the user didn't explicitly define opCmp, than to silently accept such comparisons, which are likely to be buggy, and then have unusual behaviour later when the user reorders fields and a seemingly unrelated part of the code suddenly begins producing different output. I'd say allowing inequalities on structs by default is surprising behaviour, whereas asking the user to explicitly specify an ordering is more expected. T -- If the comments and the code disagree, it's likely that *both* are wrong. -- Christopher
Mar 07 2017
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/07/2017 12:54 PM, H. S. Teoh via Digitalmars-d wrote:
 (1) I may be remembering wrong, but I thought structs had always been
 intended to be compared field-wise?  I remember when working on AA's
 that the compiler would emit a default implementation of opEquals that
 did member-wise comparisons. I had always assumed that something
 similar was done with inequalities... or was that just unfounded
 extrapolation?
We currently do memcmp. Equality by memberwise comparison is almost always meaningful; ordering by lexicographic comparison of members is not. Andrei
Mar 07 2017
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 7 March 2017 at 19:40:53 UTC, Andrei Alexandrescu 
wrote:
 On 03/07/2017 12:54 PM, H. S. Teoh via Digitalmars-d wrote:
 (1) I may be remembering wrong, but I thought structs had 
 always been
 intended to be compared field-wise?  I remember when working 
 on AA's
 that the compiler would emit a default implementation of 
 opEquals that
 did member-wise comparisons. I had always assumed that 
 something
 similar was done with inequalities... or was that just 
 unfounded
 extrapolation?
We currently do memcmp. Equality by memberwise comparison is almost always meaningful; ordering by lexicographic comparison of members is not. Andrei
We should deprecate that array comparison behavior immediately. It should not break much code, otherwise the issue would have popped up before.
Mar 07 2017
parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Tue, Mar 07, 2017 at 07:53:37PM +0000, Stefan Koch via Digitalmars-d wrote:
 On Tuesday, 7 March 2017 at 19:40:53 UTC, Andrei Alexandrescu wrote:
 On 03/07/2017 12:54 PM, H. S. Teoh via Digitalmars-d wrote:
 (1) I may be remembering wrong, but I thought structs had always
 been intended to be compared field-wise?  I remember when working
 on AA's that the compiler would emit a default implementation of
 opEquals that did member-wise comparisons. I had always assumed
 that something similar was done with inequalities... or was that
 just unfounded extrapolation?
We currently do memcmp. Equality by memberwise comparison is almost always meaningful; ordering by lexicographic comparison of members is not. Andrei
We should deprecate that array comparison behavior immediately. It should not break much code, otherwise the issue would have popped up before.
Unfortunately, because strings are arrays, deprecating array comparisons *will* break a *lot* of code. I'm all for deprecating comparisons between arrays of structs, though. The best way to do this is to deprecate implicit struct comparisons (the user has to define opCmp if he wants to compare structs). T -- What doesn't kill me makes me stranger.
Mar 07 2017
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/07/2017 03:49 PM, H. S. Teoh via Digitalmars-d wrote:
 Unfortunately, because strings are arrays, deprecating array comparisons
 *will* break a *lot* of code.
Misunderstanding, string comparisons are legit because character comparisons are legit. This is about deprecating array comparisons that don't have a corresponding element comparison. -- Andrei
Mar 07 2017