digitalmars.D - flagging unsigned subtraction assigned to bigger signed number?
- Steven Schveighoffer (57/57) May 19 I just spent about 15 minutes working on a problem where an
- Steven Schveighoffer (4/17) May 19 Typo there, should have been `DrawText(v.ptr, 100, ...)`
- Jonathan M Davis (34/42) May 19 Personally, I think that we should abolish all warnings in general. So, ...
- Walter Bright (4/4) May 20 The original concept of D was to have zero warnings. Code would be accep...
- Guillaume Piolat (4/6) May 20 Alt view: provide .ilength as a signed equivalent that would be
- Hipreme (9/17) May 20 Such a great idea! Even though it is strange having 2 properties
- Steven Schveighoffer (4/11) May 20 In this case, that wouldn't help. Because everything signed is
- Guillaume Piolat (6/18) May 20 The only cure that doesn't break everything is getting rid of
- Guillaume Piolat (4/7) May 20 Well that also doesn't help that foreach prefer unsigned by
- Walter Bright (2/3) May 20 For an index, foreach always defaults to size_t.
- Richard (Rikki) Andrew Cattermole (22/22) May 20 Counter proposal: throw data flow analysis at it, rather than change the...
- Walter Bright (43/43) May 20 This is a common issue. Unfortunately, nobody has come up with a solutio...
- Richard (Rikki) Andrew Cattermole (24/29) May 20 ``$ gcc --analyzer file.c``
- Walter Bright (12/12) May 21 Interesting example! Yes, the DFA done by this dials it up a notch, and ...
- Steven Schveighoffer (23/36) May 21 Well, that doesn't mean we shouldn't try. AFAIK, the `if(cond);`
- Walter Bright (30/30) May 21 Some good arguments. But it comes down to a value judgement against comp...
- Araq (6/8) May 21 Not really. You can just follow the instructions here to emulate
- Derek Fawcus (9/16) May 21 It has been a long time since I last used it, but I vaguely
- Derek Fawcus (6/13) May 21 Yup: https://home.nvg.org/~skars/programming/C_Rationale.pdf
- Walter Bright (3/6) May 21 At that time, 90% of C programming was done on DOS, not Unix. And there ...
- Kagamin (5/9) May 21 Doubtful. Previously this prejudice reached the point where
- Kagamin (1/11) Jun 01
- Walter Bright (1/1) Jun 01 I bet that implementation of malloc is using unsigned types.
- Walter Bright (3/3) May 21 In my not-so-humble opinion, every cast is a bug, as it breaks the type ...
- monkyyy (2/3) May 21 How do you make fast inverse sqrt, sumtypes, allocators?
- Walter Bright (2/6) May 21 You call it @system code!
- Steven Schveighoffer (9/11) May 21 The problem isn't the cast, the problem is that is the tool D
- Lance Bachmeier (4/10) May 21 None of this has prevented you from supporting safe by default.
- kdevel (4/17) May 21 Looks okay. Or is the `reverse` part of the cleverness? I mean the
- Walter Bright (2/7) May 21 It's "clever" because it isn't clear why it is structured this way.
- Steven Schveighoffer (4/13) May 21 It's so clever, in fact, that it doesn't work. I'll let you
- Walter Bright (5/6) May 23 LOL, you're right. While the types are correct, the wrap around still ma...
I just spent about 15 minutes working on a problem where an unsigned subtraction converted to a float caused a totally unexpected value. I had to printf debug this, because it was happening only at a certain point. The code was something like this: ```d foreach(i, v; messages) // messages is a string array { float vpos = 600 - 30 * (messages.length - i); DrawText(messages.ptr, 100, vpos.to!int, 20, Colors.WHITE); } ``` This is raylib code, and what ends up happening is that once messages.length exceeds 20, you get into a situation where `600 - 30 * (messages.length - i)` wraps to ulong.max - 29. And obviously, that doesn't convert to an int. So the diabolical thing about this, is that I focused on the `messages.length - i` because that is unsigned math, but there is zero chance this could wrap. So I was puzzled how this could be causing overflow. But the unsigned-ness percolates throughout the whole expression! I was brainstorming how we could flag this as error prone without flooding all projects with warnings that won't (usually) be a problem. I thought of this possible filter: 1. Unsigned subtraction happens 2. The result is used as a signed number 3. The signed number's range encloses the range of all values of the unsigned value. So that last point might seem puzzling, why do we not care about converting signed/unsigned integers of the same width? We don't care because actually it works out exactly as expected: e.g. ```d uint x = 100; int v = x - 200; assert(v == -100); // OK! ``` But if you assign to a type which actually can represent all the uint (or at least the range of uint), then you get bombs: ```d int x = 100; double v = x - 200; assert(v == -100); // BOOM long v2 = x - 200; assert(v2 == -100); // BOOM ``` So why not just flag all unsigned/signed conversions? The problem with this idea (and technically it is a sound idea) is that it will flag so many things that are likely fine. Flagging all conversions just is going to be extremely noisy, and require casts everywhere. Subtraction is where the problems are. In cases where the compiler knows that unsigned subtraction won't wrap, it can also skip the warning. Thoughts? -Steve
May 19
On Tuesday, 20 May 2025 at 03:11:47 UTC, Steven Schveighoffer wrote:I just spent about 15 minutes working on a problem where an unsigned subtraction converted to a float caused a totally unexpected value. I had to printf debug this, because it was happening only at a certain point. The code was something like this: ```d foreach(i, v; messages) // messages is a string array { float vpos = 600 - 30 * (messages.length - i); DrawText(messages.ptr, 100, vpos.to!int, 20, Colors.WHITE); } ```Typo there, should have been `DrawText(v.ptr, 100, ...)` -Steve
May 19
On Monday, May 19, 2025 9:11:47 PM Mountain Daylight Time Steven Schveighoffer via Digitalmars-d wrote:So why not just flag all unsigned/signed conversions? The problem with this idea (and technically it is a sound idea) is that it will flag so many things that are likely fine. Flagging all conversions just is going to be extremely noisy, and require casts everywhere. Subtraction is where the problems are. In cases where the compiler knows that unsigned subtraction won't wrap, it can also skip the warning. Thoughts?Personally, I think that we should abolish all warnings in general. So, I'm very much against adding more. If you have warnings, ultimately, you either get stuck with a list of warnings that you have to wade through to see if there's anything that actually needs to be fixed, or you're forced to "fix" them even when they're not broken. I very much do not want the compiler spitting out stuff that I have to ignore or "fix" just to shut it up. And the fact that we currently have -w makes it all that much worse, because it means that warnings can change the results for type introspection and thus actually change what a program does rather than just warn the programmer of a potential problem. But even with -w being removed (or changed to behave like -wi), I still think that it's a mistake for the compiler to have warnings. Sometimes, a warning will show an actual problem, but in general, it just adds more stuff that programmers have to "fix" to shut the compiler up. If a problem is truly bad enough that it merits telling the programmer about it, then we should be looking at a reasonable set of conditions which are clearly an error and should be treated that way. So, if we can come up with a compiler error that unequivocally indicates bad logic that actually does need to be fixed, then I'm all for adding it, but I think that anything along the lines of a warning should be left to a linting tool. And really, this particular issue comes from the general issues of implicit conversions between signed and unsigned. Such implicit conversions are sometimes useful, but they also easily cause bugs which can be hard to find. I do wonder if we'd be better off if we just treated converting between signed and unsigned as the narrowing conversion that it is, or whether it would just create a need for too many casts. VRP would help, but it's not very sophisticated, so I don't know if it would help enough to matter. I suspect that with my code at least, making it an error would be absolutely fine, but it's likely to be very dependent on the kind of code that's written. Either way, from what I recall, Walter is very much against treating converting signedness as a narrowing conversion (even thouhgh it is), so I doubt that he'd agree to it. - Jonathan M Davis
May 19
The original concept of D was to have zero warnings. Code would be accepted or rejected, no wishy-washy semantics. Well, after a few years, warnings crept in with the best of intentions. We wound up exactly in the soup I tried to avoid.
May 20
On Tuesday, 20 May 2025 at 03:11:47 UTC, Steven Schveighoffer wrote:Thoughts? -SteveAlt view: provide .ilength as a signed equivalent that would be here ptrdiff_t
May 20
On Tuesday, 20 May 2025 at 11:12:03 UTC, Guillaume Piolat wrote:On Tuesday, 20 May 2025 at 03:11:47 UTC, Steven Schveighoffer wrote:Such a great idea! Even though it is strange having 2 properties meaning the same, only the type being different, I would actually really like having a feature like that: it doesn't break anything and solve actual problems. I was even saying like yesterday the same problems I find about not having integer division vs floating division operator. As I'm writing code more iteratively, that seems to become quite common in my day.Thoughts? -SteveAlt view: provide .ilength as a signed equivalent that would be here ptrdiff_t
May 20
On Tuesday, 20 May 2025 at 11:12:03 UTC, Guillaume Piolat wrote:On Tuesday, 20 May 2025 at 03:11:47 UTC, Steven Schveighoffer wrote:In this case, that wouldn't help. Because everything signed is promoted to unsigned. -SteveThoughts?Alt view: provide .ilength as a signed equivalent that would be here ptrdiff_t
May 20
On Tuesday, 20 May 2025 at 15:20:10 UTC, Steven Schveighoffer wrote:On Tuesday, 20 May 2025 at 11:12:03 UTC, Guillaume Piolat wrote:The only cure that doesn't break everything is getting rid of unsigned at every API level, the mindset "this is always >= 0 so should be unsigned" is what cause all those unsigned bugs. But it's too late for .lengthOn Tuesday, 20 May 2025 at 03:11:47 UTC, Steven Schveighoffer wrote:In this case, that wouldn't help. Because everything signed is promoted to unsigned. -SteveThoughts?Alt view: provide .ilength as a signed equivalent that would be here ptrdiff_t
May 20
On Tuesday, 20 May 2025 at 15:20:10 UTC, Steven Schveighoffer wrote:In this case, that wouldn't help. Because everything signed is promoted to unsigned. -SteveWell that also doesn't help that foreach prefer unsigned by default...
May 20
On 5/20/2025 9:36 AM, Guillaume Piolat wrote:Well that also doesn't help that foreach prefer unsigned by default...For an index, foreach always defaults to size_t.
May 20
Counter proposal: throw data flow analysis at it, rather than change the language. It would be able to catch things like: ```d int[] array = [1, 2]; size_t offset; foreach(i; 0 .. array.length) { offset = i + 1; } array[offset]; // Error out of bounds! ``` But not: ```d void func(int[] array, size_t offset) { array[offset]; } ``` To catch the second would require false positives to be acceptable, and the default pain tolerance for the D community is lower. Yes this is more complex to implement than a language change, but the support can be improved over time, it isn't fixed, and we can have the slower DFA that will error out in the latter example.
May 20
This is a common issue. Unfortunately, nobody has come up with a solution to this in the last 45 years. Since every combination of signed and unsigned has well-defined behavior, prohibiting one of those behaviors is going to break a lot of code. Changing the conversion rules will break a lot of existing behavior. There's no way around it. There is hope, however. Try the following rules: 1. use `size_t` for all indices and pointer offsets 2. use `ptrdiff_t` for all deltas of the form `size_t - size_t` Given: ```d float vpos = 600 - 30 * (messages.length - i); ``` the types are: ``` float = int - int * (size_t - size_t); ``` The rvalue integral promotion rules turn everything into size_t, which is unsigned. The difficulty arises from `(messages.length - i)` which is computing a delta between two `size_t`s being another `size_t`. What you'd like is the result of the expression being a `ptrdiff_t`. ```d ptrdiff_t delta = messages.length - i; float ypos = 600 - delta; ``` That should give you coherent and robust results. No forced cast is needed. (Forced casts should be avoided, as they can hide bugs.) You could also do this: ```d foreach_reverse(ptrdiff_t i, v; messages) // messages is a string array { float vpos = 600 - 30 * i; DrawText(messages.ptr, 100, vpos.to!int, 20, Colors.WHITE); } ``` but one could argue it's a bit too clever. P.S. since size_t is just an alias for uint or ulong, a special status for size_t is impractical. So you cannot just say that `size_t - size_t` should be typed as `ptrdiff_t`. P.P.S. Some languages, like Java, decided the solution is to not have an unsigned type. This worked until programmers resorted to desperate measures to fake an unsigned type. P.P.P.S. Forced casting results in hidden risk of losing significant bits. Not a good plan for robust code.
May 20
On 21/05/2025 4:44 PM, Walter Bright wrote:This is a common issue. Unfortunately, nobody has come up with a solution to this in the last 45 years. Since every combination of signed and unsigned has well-defined behavior, prohibiting one of those behaviors is going to break a lot of code. Changing the conversion rules will break a lot of existing behavior. There's no way around it.``$ gcc --analyzer file.c`` ```c++ #include <stdio.h> void test(unsigned int len, int* ptr) { for(int i = 0; i < len; i++) { int j = i - 1; printf("%d\n", ptr[j]); } } int main() { int val = 0; test(1, &val); return 0; } ``` Some highlights of output (its a giant dump of awesomeness): ``` <source>:6:15: warning: stack-based buffer under-read [CWE-127] [-Wanalyzer-out-of-bounds] out-of-bounds read from byte -4 till byte -1 but 'val' starts at byte 0 ``` Clang-analyzer doesn't appear to have a solution to this (yet), but gcc's does appear to catch the obvious scenario here.
May 20
Interesting example! Yes, the DFA done by this dials it up a notch, and it will catch some errors. Some points: 1. it shouldn't issue a warning - it should issue an error. If the programmer wanted this code to execute anyway, he could engage point 2 to defeat the DFA and do an out-of-bounds read. But I have no influence over C, the C community can do what they want 2. it's the old halting problem again. No matter how good the DFA is, it cannot solve the problem in general. It's the same limitation that statically detecting null pointer dereferences has 3. D's approach with array bounds checked arrays does solve the problem in the general case (at some cost to runtime performance). The more advanced DFA could help in removing unnecessary bounds checks.
May 21
On Wednesday, 21 May 2025 at 04:44:35 UTC, Walter Bright wrote:This is a common issue. Unfortunately, nobody has come up with a solution to this in the last 45 years.Well, that doesn't mean we shouldn't try. AFAIK, the `if(cond);` issue existed for decades before we fixed it as well.Since every combination of signed and unsigned has well-defined behavior, prohibiting one of those behaviors is going to break a lot of code.How much code? Is it worth it? Won't it find bugs in existing code that are dormant?Changing the conversion rules will break a lot of existing behavior. There's no way around it.I don't want to change any existing conversion rules. I want to flag likely error prone behavior. Like `if(cond);`.There is hope, however. Try the following rules:This completely misses the point. There are solutions, obviously. An inexperienced programmer is going to run headlong into this wall and spend hours/days dealing with the fallout. What I want is for the compiler to tell me when I likely got it wrong. The errors of `if(a = b)` and `if(cond);` have been a godsend for me. Every time I hit that error, I thank the D gods that I was just saved an hour of head scratching. And I have 30 years experience developing software!P.P.S. Some languages, like Java, decided the solution is to not have an unsigned type. This worked until programmers resorted to desperate measures to fake an unsigned type.This is not as dramatic as you make it sound. I think it was the correct decision. But obviously, we can't go that route now.P.P.P.S. Forced casting results in hidden risk of losing significant bits. Not a good plan for robust code.Use `to` instead of `cast` if you are worried about losing bits. Which should be just about never. C developers get through life just fine, and lose bits left and right. -Steve
May 21
Some good arguments. But it comes down to a value judgement against competing interests. The `if (cond);` has no useful purpose, and so making it illegal is a no-brainer. I'm surprised that the C/C++ Standard still hasn't at least deprecated it. An unsigned type is of less utility in a language like Java that is not a systems programming language (trying to make a memory allocator with no unsigned types is going to wind up pretty ugly). But, people still fake unsigned types and use them. I, too, have found that the prohibition of `if (a = b)` has been of benefit, with pretty much zero downside. Such a construction, even if intended, will always look suspicious, so professional code should eschew it. While you are correct that `(messages.length - i)` will never wrap, as you wrote any subtraction in unsigned math has the potential for wrapping, here it is the `600 -` that's doing it. I've been programming for a very long time, and reflexively double check any subtractions (particularly in for-loops that count down to zero), and any < <= > >= comparisons. They're all potential signed/unsigned bugs. There are a lot more signed/unsigned interactions than just subtraction. Addressing one means what about the others? Do we really want to flag a conversion from unsigned to floating point? Back in 1989 when the first C standard was being developed, half the existing compilers used "sign preserving" integral conversions, while the other half used "value preserving". Couldn't have both, one set of compilers was going to lose. But never on the table was getting rid of mixed signed/unsigned expressions. They were too valuable. (Value preserving was selected.) Another factor is I've used languages that required explicit casting to use mixed signed/unsigned. I understood the reasoning behind it. But it was just unpleasant and the code looked ugly. Just not worth it. P.S. this is one of those perennial topics that regularly comes up.
May 21
On Wednesday, 21 May 2025 at 18:19:19 UTC, Walter Bright wrote:(trying to make a memory allocator with no unsigned types is going to wind up pretty ugly).Not really. You can just follow the instructions here to emulate unsigned in a principled way: https://www.nayuki.io/page/unsigned-int-considered-harmful-for-java It's not much harder than your rules that effectively ban the minus operator for unsigned arithmetic.
May 21
On Wednesday, 21 May 2025 at 18:19:19 UTC, Walter Bright wrote:Back in 1989 when the first C standard was being developed, half the existing compilers used "sign preserving" integral conversions, while the other half used "value preserving". Couldn't have both, one set of compilers was going to lose. But never on the table was getting rid of mixed signed/unsigned expressions. They were too valuable. (Value preserving was selected.)It has been a long time since I last used it, but I vaguely recall that sign preserving was what K&R specified, and what the Bell Labs / AT&T compilers did. Hence this was actually a change by the ANSI committee. (Maybe the then existing value preserving compilers did so for a good reason) DF
May 21
On Wednesday, 21 May 2025 at 19:19:37 UTC, Derek Fawcus wrote:It has been a long time since I last used it, but I vaguely recall that sign preserving was what K&R specified, and what the Bell Labs / AT&T compilers did. Hence this was actually a change by the ANSI committee. (Maybe the then existing value preserving compilers did so for a good reason)Yup: https://home.nvg.org/~skars/programming/C_Rationale.pdf (page 40 & 41) "After much discussion, the Committee decided in favor of value preserving rules, despite the fact that the UNIX C compilers had evolved in the direction of unsigned preserving."
May 21
On 5/21/2025 12:24 PM, Derek Fawcus wrote:"After much discussion, the Committee decided in favor of value preserving rules, despite the fact that the UNIX C compilers had evolved in the direction of unsigned preserving."At that time, 90% of C programming was done on DOS, not Unix. And there were a lot of value preserving compilers.
May 21
On Wednesday, 21 May 2025 at 18:19:19 UTC, Walter Bright wrote:An unsigned type is of less utility in a language like Java that is not a systems programming language (trying to make a memory allocator with no unsigned types is going to wind up pretty ugly).Doubtful. Previously this prejudice reached the point where cryptography was declared impossible in java due to [absence of unsigned integers](https://en.wikipedia.org/wiki/Criticism_of_Java#Unsigned_integer_types).
May 21
On Wednesday, 21 May 2025 at 18:19:19 UTC, Walter Bright wrote:Just found: https://man7.org/linux/man-pages/man3/malloc.3.htmlAn unsigned type is of less utility in a language like Java that is not a systems programming language (trying to make a memory allocator with no unsigned types is going to wind up pretty ugly).Attempting to allocate more than PTRDIFF_MAX bytes is considered an error, as an object that large could cause later pointer subtraction to overflow. malloc() and related functions rejected sizes greater than PTRDIFF_MAX starting in glibc 2.30.
Jun 01
I bet that implementation of malloc is using unsigned types.
Jun 01
In my not-so-humble opinion, every cast is a bug, as it breaks the type system. I posted an example that showed how this particular problem can be dealt with without casting.
May 21
On Wednesday, 21 May 2025 at 20:53:13 UTC, Walter Bright wrote:every cast is a bug,How do you make fast inverse sqrt, sumtypes, allocators?
May 21
On 5/21/2025 3:00 PM, monkyyy wrote:On Wednesday, 21 May 2025 at 20:53:13 UTC, Walter Bright wrote:You call it system code!every cast is a bug,How do you make fast inverse sqrt, sumtypes, allocators?
May 21
On Wednesday, 21 May 2025 at 20:53:13 UTC, Walter Bright wrote:In my not-so-humble opinion, every cast is a bug, as it breaks the type system.The problem isn't the cast, the problem is that is the tool D gives you to use for this kind of thing. I somewhat agree -- I do not like pulling out cast for things like this. But the alternative is horrible (type constructor + bitwise and). I think std.conv probably should have a `truncate` function that does the right thing. -Steve
May 21
On Wednesday, 21 May 2025 at 04:44:35 UTC, Walter Bright wrote:This is a common issue. Unfortunately, nobody has come up with a solution to this in the last 45 years. Since every combination of signed and unsigned has well-defined behavior, prohibiting one of those behaviors is going to break a lot of code. Changing the conversion rules will break a lot of existing behavior. There's no way around it.None of this has prevented you from supporting safe by default. Heck, I thought the whole purpose behind your work on the borrow checker was to break code.
May 21
On Wednesday, 21 May 2025 at 04:44:35 UTC, Walter Bright wrote:1. use `size_t` for all indices and pointer offsets 2. use `ptrdiff_t` for all deltas of the form `size_t - size_t`Thanks for that advice![...] You could also do this: ```d foreach_reverse(ptrdiff_t i, v; messages) // messages is a string array { float vpos = 600 - 30 * i; DrawText(messages.ptr, 100, vpos.to!int, 20, Colors.WHITE); } ``` but one could argue it's a bit too clever.Looks okay. Or is the `reverse` part of the cleverness? I mean the ordinary (forward) foreach also seem to work.
May 21
On 5/21/2025 12:43 PM, kdevel wrote:On Wednesday, 21 May 2025 at 04:44:35 UTC, Walter Bright wrote:It's "clever" because it isn't clear why it is structured this way.but one could argue it's a bit too clever.Looks okay. Or is the `reverse` part of the cleverness? I mean the ordinary (forward) foreach also seem to work.
May 21
On Wednesday, 21 May 2025 at 23:07:20 UTC, Walter Bright wrote:On 5/21/2025 12:43 PM, kdevel wrote:It's so clever, in fact, that it doesn't work. I'll let you discover why. -SteveOn Wednesday, 21 May 2025 at 04:44:35 UTC, Walter Bright wrote:It's "clever" because it isn't clear why it is structured this way.but one could argue it's a bit too clever.Looks okay. Or is the `reverse` part of the cleverness? I mean the ordinary (forward) foreach also seem to work.
May 21
On 5/21/2025 6:27 PM, Steven Schveighoffer wrote:It's so clever, in fact, that it doesn't work. I'll let you discover why.LOL, you're right. While the types are correct, the wrap around still makes it wrong. BTW, if you typed everything but the float as `int`, you still have the same problem. It's not the unsigned conversion that is wrong.
May 23