
digitalmars.dip.ideas - Deprecate implicit conversion between signed and unsigned integers

Paul Backus <snarwin gmail.com> writes:
D inherited these implicit conversions from C and C++, where they 
are widely regarded as a source of bugs.

* In his 2018 paper, ["Subscripts and sizes should be 
signed"][1], Bjarne Stroustrup gives several examples of bugs 
caused by the use of unsigned sizes and indices in C++. About 
half of them are caused by wrapping subtraction; the other half 
are caused by implicit conversion from signed to unsigned.

* The C++20 standard includes [safe integer comparison 
functions][2] specifically to avoid these implicit conversions 
when comparing integers of different signedness.

* Both [GCC][3] and [Clang][4] provide a `-Wsign-conversion` flag 
to warn about these implicit conversions.

In D, they cause the additional problem of breaking value-range 
propagation (VRP):

     enum byte a = -1;
     enum uint b = a; // Ok, but...
     enum byte c = b; // Error - original value range has been lost

D would be a simpler, easier-to-use language if these implicit 
conversions were removed. The first step to doing that is to 
deprecate them.

While this is a breaking change, migration of old code would be 
very simple: simply insert an explicit `cast` to silence the 
error and restore the original behavior. In many cases, migration 
could be performed automatically with a tool that uses the DMD 
frontend as a library.
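
As a sketch, the migration for a flagged conversion might look like this (hypothetical example, not from the proposal itself):

```d
uint u = uint.max;
int a = u;           // today: accepted; under this proposal: deprecated
int b = cast(int) u; // migrated: explicit cast, same behavior as before
```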

I believe this change would be received positively by existing D 
programmers, since D's willingness to discard C and C++'s 
mistakes is one of the things that draws programmers to D in the 
first place.

[1]: 
https://open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1428r0.pdf
[2]: https://en.cppreference.com/w/cpp/utility/intcmp
[3]: 
https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wsign-conversion
[4]: 
https://clang.llvm.org/docs/DiagnosticsReference.html#wsign-conversion
May 12
monkyyy <crazymonkyyy gmail.com> writes:
On Sunday, 12 May 2024 at 13:32:36 UTC, Paul Backus wrote:
 I believe this change would be received positively by existing 
 D programmers, since D's willingness to discard C and C++'s 
 mistakes is one of the things that draws programmers to D in 
 the first place.
D often takes the worst of both worlds: bad defaults and verbose handling.

`size_t(-1) > 0` being true is a problem caused by indexes being unsigned - optimizing for a theoretical computer with more than ~9223 petabytes of RAM (2^63 bytes) is the wrong tradeoff.

No; types would need better defaults before even considering adding verbosity. I'd rather see breaking changes.
May 12
Nick Treleaven <nick geany.org> writes:
On Sunday, 12 May 2024 at 13:32:36 UTC, Paul Backus wrote:
 In D, they cause the additional problem of breaking value-range 
 propagation (VRP):

     enum byte a = -1;
     enum uint b = a; // Ok, but...
     enum byte c = b; // Error - original value range has been lost

 D would be a simpler, easier-to-use language if these implicit 
 conversions were removed. The first step to doing that is to 
 deprecate them.
Signed to unsigned should be deprecated (except where VRP can tell the source was not negative).

Unsigned to signed can preserve the value range when the signed type is bigger than the unsigned type, e.g.:

     extern ubyte x;
     short y = x; // OK, short.max >= ubyte.max
     byte z = x;  // Deprecate, byte.max < ubyte.max

These deprecations should be for the next edition of D.
 While this is a breaking change, migration of old code would be 
 very simple: simply insert an explicit `cast` to silence the 
 error and restore the original behavior.
`cast` can be bug-prone if the original type gets changed. It would be better to have druntime template functions `signed` and `unsigned` to do the casts with IFTI to avoid changing the size of the type.
 In many cases, migration could be performed automatically with 
 a tool that uses the DMD frontend as a library.
Can you give some examples?
 I believe this change would be received positively by existing 
 D programmers, since D's willingness to discard C and C++'s 
 mistakes is one of the things that draws programmers to D in 
 the first place.
Yes, implicit conversion to a type with an incompatible value range is too bug-prone; D should prevent that. It is particularly galling that decent C compilers have had warnings for this for such a long time.

What about comparisons between incompatible signed and unsigned, deprecate too?
May 12
Nick Treleaven <nick geany.org> writes:
On Sunday, 12 May 2024 at 20:20:10 UTC, Nick Treleaven wrote:
 `cast` can be bug-prone if the original type gets changed. It 
 would be better to have druntime template functions `signed` 
 and `unsigned` to do the casts with IFTI to avoid changing the 
 size of the type.
I forgot, those already exist, at least in Phobos: https://dlang.org/phobos/std_conv.html#.unsigned
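A sketch of how they avoid the size-change hazard (`unsigned` yields the same-sized unsigned type, unlike a hand-written cast that might pick the wrong width):

```d
import std.conv : unsigned;

void main()
{
    int i = -1;
    auto u = unsigned(i); // uint, not ulong: the size is preserved
    static assert(is(typeof(u) == uint));
    assert(u == uint.max);
}
```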
May 12
Dom DiSc <dominikus scherkl.de> writes:
On Sunday, 12 May 2024 at 20:20:10 UTC, Nick Treleaven wrote:
 What about comparisons between incompatible signed and 
 unsigned, deprecate too?
We have a working solution that always returns the correct result (see https://issues.dlang.org/show_bug.cgi?id=259). I never understood why anyone would rely on a wrong comparison result, so this should not be considered a breaking change.
May 13
Dom DiSc <dominikus scherkl.de> writes:
On Tuesday, 14 May 2024 at 06:59:16 UTC, Dom DiSc wrote:
 We have a working solution that always returns the correct 
 result (see https://issues.dlang.org/show_bug.cgi?id=259).
And by the way: this solution doesn't involve integer promotion at all, and it also works for comparing long with ulong.

For beginners this is by far the worst bug in D, and it's been there for 18 (in words: eighteen) years - it feels like it's been lingering there longer than D itself. This is so distracting.

It's a handful of lines of code and costs nothing (except if you indeed compare differently signed types, and even then it's still very cheap):

```d
import std.traits : CommonType, Unqual, isIntegral, isSigned, isUnsigned;

int opCmp(T, U)(const(T) a, const(U) b) pure @safe @nogc nothrow
if (isIntegral!T && isIntegral!U && !is(Unqual!T == Unqual!U))
{
    static if (isSigned!T && isUnsigned!U && T.sizeof <= U.sizeof)
        return (a < 0) ? -1 : cmpSame(cast(U) a, b);
    else static if (isUnsigned!T && isSigned!U && T.sizeof >= U.sizeof)
        return (b < 0) ? 1 : cmpSame(a, cast(T) b);
    else // use the common type as ever; it can represent both values:
        return cmpSame(cast(CommonType!(T, U)) a, cast(CommonType!(T, U)) b);
}

// same-type helper (recursing into opCmp would fail its own
// constraint, which requires two distinct types)
private int cmpSame(V)(const(V) a, const(V) b) pure @safe @nogc nothrow
{
    return (a < b) ? -1 : (a > b);
}
```
May 14
Paul Backus <snarwin gmail.com> writes:
On Tuesday, 14 May 2024 at 06:59:16 UTC, Dom DiSc wrote:
 On Sunday, 12 May 2024 at 20:20:10 UTC, Nick Treleaven wrote:
 What about comparisons between incompatible signed and 
 unsigned, deprecate too?
 We have a working solution that always returns the correct result (see https://issues.dlang.org/show_bug.cgi?id=259). I never understood why anyone would rely on a wrong comparison result, so this should not be considered a breaking change.
As I said in my reply to Nick, this proposal makes no distinction between conversions done in the context of a comparison and conversions done in any other context. I would rather not introduce a special case for comparisons, since special cases generally make the language more complex and harder to use. However, if you think this is a good idea, I encourage you to submit it as a separate proposal.
May 14
Paul Backus <snarwin gmail.com> writes:
On Sunday, 12 May 2024 at 20:20:10 UTC, Nick Treleaven wrote:
 Signed to unsigned should be deprecated (except where VRP can 
 tell the source was not negative).

 Unsigned to signed can preserve the value range when the signed 
 type is bigger than the unsigned type, e.g.:

     extern ubyte x;
     short y = x; // OK, short.max >= ubyte.max
     byte z = x;  // Deprecate, byte.max < ubyte.max
Agreed.
 `cast` can be bug-prone if the original type gets changed. It 
 would be better to have druntime template functions `signed` 
 and `unsigned` to do the casts with IFTI to avoid changing the 
 size of the type.
I assume by "changing the size of the type" you are referring specifically to *narrowing* conversions, not widening ones. If so, then yes, it's probably a good idea to use a helper template to avoid that.
 In many cases, migration could be performed automatically with 
 a tool that uses the DMD frontend as a library.
Can you give some examples?
It's easier to give examples of the cases where it *won't* work: templates, because there's no reliable way to apply the migration only to specific instantiations; and string mixins, because there's no reliable way to find the source code corresponding to a mixed-in expression (if it even exists--it could be generated by CTFE).
 What about comparisons between incompatible signed and 
 unsigned, deprecate too?
All binary operators, including comparison operators, use the same implicit conversions, so yes, comparisons would be covered by this proposal.
May 14
Dukc <ajieskola gmail.com> writes:
Paul Backus wrote on 12.5.2024 at 16.32:
 D would be a simpler, easier-to-use language if these implicit 
 conversions were removed. The first step to doing that is to deprecate 
 them.
Ditching all backwards-compatibility issues, it would be a good idea. But, this would cause *tremendous* amounts of breakage.

Before, I would have said it simply isn't worth it. But since we're going to have editions, maybe. I'm still somewhat sceptical though. Nothing will break without a warning and people can stay at older editions if they want, but it's going to add a lot of work for someone migrating 100_000 lines to a new edition. That amount of code will likely have hundreds or even thousands of deprecations to fix.

I tend to think that if we will write an official automatic tool to add the needed casts, it's probably worth it. Otherwise not.
May 13
Nick Treleaven <nick geany.org> writes:
On Monday, 13 May 2024 at 12:48:04 UTC, Dukc wrote:
 Paul Backus wrote on 12.5.2024 at 16.32:
 Ditching all backwards-compatibility issues, it would be a good 
 idea. But, this would cause *tremendous* amounts of breakage.

 Before, I would have said it simply isn't worth it. But since 
 we're going to have editions, maybe. I'm still somewhat 
 sceptical though. Nothing will break without a warning and 
 people can stay at older editions if they want, but it's going 
 to add a lot of work for someone migrating 100_000 lines to a 
 new edition. That amount of code will likely have hundreds or 
 even thousands of deprecations to fix.
I think even with editions we need to avoid making it hard to port code to a newer edition. So instead of a deprecation, we could make it a `-w` warning instead.
 I tend to think that if we will write an official automatic 
 tool to add the needed casts, it's probably worth it. Otherwise 
 not.
May 13
Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Monday, May 13, 2024 8:04:34 AM MDT Nick Treleaven via dip.ideas wrote:
 On Monday, 13 May 2024 at 12:48:04 UTC, Dukc wrote:
 Paul Backus wrote on 12.5.2024 at 16.32:
 Ditching all backwards-compatibility issues, it would be a good
 idea. But, this would cause *tremendous* amounts of breakage.

 Before, I would have said it simply isn't worth it. But since
 we're going to have editions, maybe. I'm still somewhat
 sceptical though. Nothing will break without a warning and
 people can stay at older editions if they want, but it's going
 to add a lot of work for someone migrating 100_000 lines to a
 new edition. That amount of code will likely have hundreds or
 even thousands of deprecations to fix.
 I think even with editions we need to avoid making it hard to port code to a newer edition. So instead of a deprecation, we could make it a `-w` warning instead.
Deprecations are the language's tool for making changes where code will later become illegal, and normally, the only result is that a message is printed. No code is broken until the language is actually changed to remove the deprecated feature.

In contrast, with how warnings are typically used in D, adding a warning is as good as adding an error, since it's extremely common to compile with -w, which makes all warnings errors, whereas arguably, -wi would be the better choice (but -w has been around longer and is shorter).

Warnings are also an utterly terrible idea in general and really should never have been added to the compiler. Even if you treat them the way most compilers do and have them actually be warnings and not errors, you inevitably end up in one of two situations:

1. You ignore many of them, because many of them are actually fine (since they typically warn of something that's potentially a problem and not something that's definitively a problem), and the ultimate result is that you get a wall of warnings, burying any useful messages where they'll never be seen, meaning that even the ones that should be fixed don't get fixed.

2. In order to avoid having a bunch of messages being printed and to avoid burying warnings that really should be fixed, you "fix" all warnings. In many cases, this requires changing code that is actually perfectly fine, but whether the code was fine or not, the fact that you're always making sure to remove any warnings that pop up makes it so that they might as well have been errors instead of warnings.

The end result is that warnings are utterly useless. Either they should have been errors, or they're better left to a linting tool. So, we really should not be adding to that problem by introducing more warnings.
And the fact that D's type introspection often checks whether a particular piece of code compiles in order to construct the checks for template constraints and static ifs and the like means that having flags which change whether a particular piece of code compiles is particularly bad for D - and adding more warnings can actually change what code compiles (or can even change which overload of a template is used). So, we really shouldn't be adding more warnings.

Deprecations don't have any of those problems unless you choose to compile with -de, which turns them into errors and which arguably shouldn't be a thing for the same reasons that it's problematic that -w turns warnings into errors: it actually affects conditional compilation and can do so in ways that are not easy to detect.

- Jonathan M Davis
May 14
Steven Schveighoffer <schveiguy gmail.com> writes:
I think yes, we should ban signed/unsigned conversions, but I 
also think implicit conversions when VRP has validated all the 
values are representable is fine (e.g. `ubyte` should implicitly 
convert to `short` or `int`). This should cut down on the false 
positives.

-Steve
May 13
An Pham <home home.com> writes:
On Sunday, 12 May 2024 at 13:32:36 UTC, Paul Backus wrote:
 D inherited these implicit conversions from C and C++, where 
 they are widely regarded as a source of bugs.
Just focusing on signed vs. unsigned is not good enough. Sometimes you need to specify a range of values. There is the module std.checkedint, which:

1. Should be extended into a runtime/system module that does not need an 'import'
2. Should gain range template parameters - Checked!(int, X, Y, ...), e.g. Checked!(int, -5, 200, ...), which only holds values from -5 to 200 inclusive
3. The language should be extended to allow implicit conversion when passing parameters - for void foo(Checked!(int, X, Y, ...) z), the call foo(10) is ok but foo(1000) should fail
May 14
Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, May 12, 2024 7:32:36 AM MDT Paul Backus via dip.ideas wrote:
 D would be a simpler, easier-to-use language if these implicit
 conversions were removed. The first step to doing that is to
 deprecate them.
In my experience, this hasn't been a big enough issue for me to care, and it's seemed like more of an academic concern than an actual problem, but I probably just don't typically write the kind of code that runs into problems because of it. So, I don't mind the status quo, but I'm also fine with getting rid of such implicit conversions. The main question IMHO is how annoying it'll be in practice.

The primary case I can think of where there would likely be problems would be code that returns -1 for an index with size_t (e.g. some of the Phobos functions do that when the item being searched for isn't found). It's something that works perfectly fine in general, but it means comparing a signed type and an unsigned type. It also sometimes means explicitly assigning -1 to an unsigned type. Those can be replaced with using the type's max instead, so it's not the end of the world by any means, but it will require code changes, and the result is arguably uglier. As Steven pointed out though, VRP should still allow the conversion where appropriate, which should reduce how much code would need to be changed.

A related problem is that the compiler allows implicit conversions between character types and integer types. Personally, I care about that one far more and would love to see it changed, but I'm not against the idea of getting rid of implicit conversions between signed and unsigned integer types.

- Jonathan M Davis
May 14