
digitalmars.D - [OT] The Usual Arithmetic Confusions

reply Paul Backus <snarwin gmail.com> writes:
https://shafik.github.io/c++/2021/12/30/usual_arithmetic_confusions.html

 There are a lot of aspects of C++ that are not well understood 
 and lead to all sorts of confusion. The *usual arithmetic 
 conversions* and the *integral promotions* are two such 
 aspects. [...] This is one of the areas in C++ that comes 
 directly from C, so pretty much all of these examples applies 
 to C as well as C++.
Unfortunately, this is also one of the areas of D that comes directly from C, so D programmers have to watch out for these as well.

It's been argued in the past, on these forums, that these conversions are "just something you have to learn" if you want to do system-level programming. But if C++ programmers are still getting this stuff wrong, after all these years, perhaps the programmers aren't the problem. Is it possible that these implicit conversions are just too inherently error-prone for programmers to reliably use correctly?
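(For readers who have not hit this before, here is a minimal D sketch of the kind of surprise these rules produce; the variable names are mine, and it assumes D applies the C-style usual arithmetic conversions to mixed signed/unsigned comparisons, which is exactly the behaviour being discussed.)

```d
void main()
{
    int  a = -1;
    uint b = 1;

    // The usual arithmetic conversions pick uint as the common type,
    // so `a` is reinterpreted as 4294967295 before the comparison.
    assert(a < b); // fails at run time, even though -1 < 1 mathematically
}
```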
Jan 27 2022
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Jan 28, 2022 at 02:15:51AM +0000, Paul Backus via Digitalmars-d wrote:
 https://shafik.github.io/c++/2021/12/30/usual_arithmetic_confusions.html
[...]
 Unfortunately, this is also one of the areas of D that comes directly
 from C, so D programmers have to watch out for these as well.
 
 It's been argued in the past, on these forums, that these conversions
 are "just something you have to learn" if you want to do system-level
 programming. But if C++ programmers are still getting this stuff
 wrong, after all these years, perhaps the programmers aren't the
 problem. Is it possible that these implicit conversions are just too
 inherently error-prone for programmers to reliably use correctly?
I agree. The question, though, is how to convince Walter. :-P

T

--
An old friend is better than two new ones.
Jan 27 2022
parent Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Friday, 28 January 2022 at 02:33:18 UTC, H. S. Teoh wrote:
 On Fri, Jan 28, 2022 at 02:15:51AM +0000, Paul Backus via 
 Digitalmars-d wrote:
 https://shafik.github.io/c++/2021/12/30/usual_arithmetic_confusions.html
[...]
 Unfortunately, this is also one of the areas of D that comes 
 directly from C, so D programmers have to watch out for these 
 as well.
 
 It's been argued in the past, on these forums, that these 
 conversions are "just something you have to learn" if you want 
 to do system-level programming. But if C++ programmers are 
 still getting this stuff wrong, after all these years, perhaps 
 the programmers aren't the problem. Is it possible that these 
 implicit conversions are just too inherently error-prone for 
 programmers to reliably use correctly?
I agree. The question, though, is how to convince Walter. :-P T
Now that we have ImportC, the argument for easier porting of C code to D is less valid. D no longer needs to closely follow the usual arithmetic "confusions" of C just so that some C code can keep the same meaning when copied to D. I would rather get compiler errors when porting C to D than preserve the landmines from the original code.
Jan 28 2022
prev sibling next sibling parent reply forkit <forkit gmail.com> writes:
On Friday, 28 January 2022 at 02:15:51 UTC, Paul Backus wrote:
 Unfortunately, this is also one of the areas of D that comes 
 directly from C, so D programmers have to watch out for these 
 as well.

 It's been argued in the past, on these forums, that these 
 conversions are "just something you have to learn" if you want 
 to do system-level programming. But if C++ programmers are 
 still getting this stuff wrong, after all these years, perhaps 
 the programmers aren't the problem. Is it possible that these 
 implicit conversions are just too inherently error-prone for 
 programmers to reliably use correctly?
I would really like a compile-time option that outputs warnings on implicit conversions (just warnings, not errors). The code should still compile nonetheless. I would use that feature a lot!
Jan 27 2022
parent reply forkit <forkit gmail.com> writes:
On Friday, 28 January 2022 at 03:44:25 UTC, forkit wrote:

-profile=implicitConversions
Jan 27 2022
parent Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Friday, 28 January 2022 at 03:46:00 UTC, forkit wrote:
 [..]
 -profile=implicitConversions
Nit: I think it should be `-vimplicit-conversions`, as the other diagnostic switches start with `-v` (`-vtls`, `-vtemplates`, `-vgc`, etc.). You can see the docs here: https://dlang.org/dmd-linux.html
Jan 28 2022
prev sibling next sibling parent Paulo Pinto <pjmlp progtools.org> writes:
On Friday, 28 January 2022 at 02:15:51 UTC, Paul Backus wrote:
 https://shafik.github.io/c++/2021/12/30/usual_arithmetic_confusions.html

 There are a lot of aspects of C++ that are not well understood 
 and lead to all sorts of confusion. The *usual arithmetic 
 conversions* and the *integral promotions* are two such 
 aspects. [...] This is one of the areas in C++ that comes 
 directly from C, so pretty much all of these examples applies 
 to C as well as C++.
Unfortunately, this is also one of the areas of D that comes directly from C, so D programmers have to watch out for these as well. It's been argued in the past, on these forums, that these conversions are "just something you have to learn" if you want to do system-level programming. But if C++ programmers are still getting this stuff wrong, after all these years, perhaps the programmers aren't the problem. Is it possible that these implicit conversions are just too inherently error-prone for programmers to reliably use correctly?
Not surprisingly, the systems programming languages outside the C family tend to go with explicit conversions.
Jan 28 2022
prev sibling next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Friday, 28 January 2022 at 02:15:51 UTC, Paul Backus wrote:
 https://shafik.github.io/c++/2021/12/30/usual_arithmetic_confusions.html
For what it is worth, the first example is fixed in D.

The multiplication one is particularly interesting. The other ones are really banal wrap-around behavior, which you really can't do without if you want to make anything fast, IMO.

I'm surprised there aren't more examples of signed -> unsigned conversion, because that one is a real mind fuck.
Jan 28 2022
next sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Friday, 28 January 2022 at 18:39:54 UTC, deadalnix wrote:
 The other ones are really banal wrap-around behavior, which you really can't do without if you want to make anything fast, IMO.
Modern programming languages tend to have separate operators or intrinsics for wrapped and non-wrapped (trap on overflow) arithmetic operations. In the vast majority of cases an arithmetic overflow is a bug in the code, and being able to catch such bugs is as useful as having bounds checking for arrays.

I think that it's only a matter of time until processors start adding the missing instructions to make this fast. That's a typical chicken-and-egg problem. If the Rust language becomes really popular, then the hardware will adapt.
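(A minimal sketch of that separation using D's own `core.checkedint` intrinsics, which return the wrapped result and set a sticky overflow flag; whether anything then traps is left to the caller here.)

```d
import core.checkedint : adds, muls;

void main()
{
    bool overflow;

    // Wrapped result plus a sticky flag, instead of silently wrapping around.
    int sum = adds(int.max, 1, overflow);
    assert(sum == int.min); // the wrapped value
    assert(overflow);       // ...and the fact that it wrapped

    overflow = false;
    int prod = muls(100, 200, overflow);
    assert(prod == 20_000 && !overflow); // in-range results leave the flag alone
}
```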
Jan 28 2022
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Saturday, 29 January 2022 at 01:06:45 UTC, Siarhei Siamashka 
wrote:
 Modern programming languages tend to have separate operators or intrinsics for wrapped and non-wrapped (trap on overflow) arithmetic operations.
Well, in C++ "unsigned int" is defined to have modular arithmetics. The problem in C++ isn't that it is modular, but that people use "unsigned int" as a value range type, rather than when they want modular arithmetics. This is just a design influenced by classic machine language instructions rather than principled thoughts about math. This problem would probably have been much less if the type had been named "modular int" and not "unsigned int"! Syntax matters. :-D In Ada I believe this is done explicitly, which clearly is much better. As for practical problems: In C++ you can still enable signed overflow checks. In D you cannot even do that, because in D all integer operations are defined to be modular. Interestingly, the main complaint about modular arithmetics in C++ is not about correctness however, but about poor optimizations. As a result, thoughtfully designed C++ libraries avoid using unsigned integers in interfaces. Why are modular arithmetics performing worse than regular arithmetics? It is often much more difficult (in some cases impossible) to optimize computations that are mapped onto a circle than computations that are mapped onto a straight line! This is something that D should fix!!
 I think that it's only a matter of time until processors start 
 adding the missing instructions to make this fast.
No. It isn't crazy slow because of the check. It is crazy slow because it prevents optimizations in complex expressions. In theory you could compute the overflow as a separate expression and do speculative computations, then switch to a slow path on overflow, but that would be more of a high-level approach than a system-level approach.

In low-level programming the programmer wants the code to map to machine-language instructions without blowing up the code size in ways that are hard to predict. You want some transparency in how the code you write maps to the hardware instructions.

To make overflow checks really fast you need a much more advanced type system with constraints, so that the compiler can know what values an integer picked up from the heap can have.
Jan 29 2022
next sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Saturday, 29 January 2022 at 10:39:10 UTC, Ola Fosheim Grøstad 
wrote:
 As for practical problems: In C++ you can still enable signed 
 overflow checks. In D you cannot even do that, because in D all 
 integer operations are defined to be modular.
Yes, this is a problem, and C++ is doing a somewhat better job here.

Imagine an alternative reality where processors supported efficient modular indexing of arrays and had special instructions for it (specialized DSP processors might have this even now in some form), so that `a[i % a.length]` had no performance disadvantage compared to `a[i]`. My guess is that the D language design in this alternative reality would define array indexing as a modular operation and consider the "memory safety" goal perfectly achieved (no out-of-bounds accesses, yay!), ignoring the fact that landing on a wrong array index is a source of all kinds of hard-to-debug problems, leading to wrong computations or even security vulnerabilities.
 Interestingly, the main complaint about modular arithmetics in 
 C++ is not about correctness however, but about poor 
 optimizations.
I'm not sure if I can fully agree with that. Correctness is a major concern in C++.
 As a result, thoughtfully designed C++ libraries avoid using 
 unsigned integers in interfaces.
My understanding is that the primary source of unsigned types in applications is (or at least used to be) the `size_t` type, which exists because a memory buffer may technically span more than half of the address space, at least on a 32-bit system. API functions that process memory buffers have to deal with `size_t`, and this is a headache. But now, with 64-bit processors, this is less of an issue, and buffer sizes can become signed with no practical loss of functionality.
 Why are modular arithmetics performing worse than regular 
 arithmetics? It is often much more difficult (in some cases 
 impossible) to optimize computations that are mapped onto a 
 circle than computations that are mapped onto a straight line!
Yes, things are a bit tricky here. Modular arithmetic has some really nice math properties: https://math.stackexchange.com/questions/27336/associativity-commutativity-and-distributivity-of-modulo-arithmetic

When doing modular arithmetic, an expression `a + b - c` can always be safely transformed into `a - c + b`. But if we are dealing with potential integer overflow traps, then reordering the operations in an expression can't be done safely anymore (will `a + b` overflow and cause an exception? Or will `a - c` underflow and cause one?).
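(A small D sketch of that reordering point, using `core.checkedint` to stand in for a trapping implementation; the concrete values are mine, chosen so that the two orderings differ.)

```d
import core.checkedint : adds, subs;

void main()
{
    int a = int.max, b = 1, c = 2;

    bool ov1, ov2; // sticky overflow flags

    // (a + b) - c : the intermediate a + b already overflows.
    subs(adds(a, b, ov1), c, ov1);

    // (a - c) + b : every intermediate value stays in range.
    adds(subs(a, c, ov2), b, ov2);

    // Same mathematical result, but only one ordering would have trapped.
    assert(ov1 && !ov2);
}
```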
 This is something that D should fix!!
Do you have a good suggestion?
 I think that it's only a matter of time until processors start 
 adding the missing instructions to make this fast.
No. It isn't crazy slow because of the check. It is crazy slow because it prevents optimizations in complex expressions. In theory you could compute the overflow as a separate expression and do speculative computations, then switch to a slow path on overflow, but that would be more of a high level approach than a system level approach. In low level programming the programmer wants the code to map to machine language instructions without blowing up the code size in ways that are hard to predict. You want some transparency in how the code you write maps to the hardware instructions.
We lost this transparency a long time ago. Compilers are allowed to optimize out big parts of expressions. Integer divisions by a constant are replaced by multiplications and shifts, etc. Functions are inlined, loops are unrolled and/or vectorized.
 To make overflow checks really fast you need a much more 
 advanced type system with constraints, so that the compiler can 
 know what values an integer picked up from the heap can have.
Well, the reality is that this is not just a theoretical discussion anymore. Trapping of arithmetic overflows already exists in current programming languages, and programming languages will keep evolving to handle it even better in the future.
Jan 29 2022
next sibling parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Saturday, 29 January 2022 at 12:23:28 UTC, Siarhei Siamashka 
wrote:
 performance disadvantages compared to `a[i]`. My guess is that 
 the D language design in this alternative reality would define 
 arrays indexing as a modular operation and consider the "memory 
 safety" goal perfectly achieved (no out of bounds accesses, 
 yay!).
Yes, the focus on memory safety is too narrow and can be counterproductive.
 I'm not sure if I can fully agree with that. Correctness is a 
 major concern in C++.
It is a concern, but the driving force for switching from unsigned int to signed int appears to be more about enabling optimization. At least that is my impression. Another issue is that having multiple integer types can lead to multiple instances of templates, which is kinda pointless.
 My understanding is that the primary source of unsigned types 
 in applications is (or at least used to be) the `size_t` type. 
 Which exists, because a memory buffer may technically span over 
 more than half of the address space, at least on a 32-bit 
 system.
Yes, *sizeof* returns *size_t* which is unsigned. I think that has more to do with history than practical programming. But there is no reason for modern containers to return unsigned (or rather modular ints). Well, other than being consistent with STL, but not sure why that would be important.
 This is something that D should fix!!
Do you have a good suggestion?
How many programs rely on signed wrap-around? Probably none. You could just make a breaking change and provide a compiler flag for getting the old behaviour, and another compiler flag for trapping on overflow.
 We lost this transparency a long time ago. Compilers are 
 allowed to optimize out big parts of expressions. Integer 
 divisions by a constant are replaced by multiplications and 
 shifts, etc. Functions are inlined, loops are unrolled and/or 
 vectorized.
But you can control most of those by hints in the source-code or compilation flags. There is a big difference between optimizations that lead to faster code (or at least code with consistent performance) and code gen that leads to uneven performance. Sometimes hardware is bad at consistent performance too, like computations with float values near zero (denormal numbers), and that is very unpopular among realtime/audio-programmers. You don't want the compiler to add more such issues.
 Well, the reality is that this is not just a theoretical 
 discussion anymore. Trapping of arithmetic overflows already 
 exists in the existing programming languages. And programming 
 languages will keep evolving to handle it even better in the 
 future.
Yes, but trapping overflow has always existed in languages geared towards higher-level programming. C and its descendants are the outliers. It is true, though, that processor speed and branch prediction have made it more attractive also for those that aim at lower-level programming.

The best solution for a modern language is probably to:

1. Improve the type system so that the compiler can more often prove that overflow can never happen for an expression. This can also lead to better optimizations.

2. Make signed overflow checks the default, but provide an inline annotation to disable them.

I think in general that optimizing all code paths for performance is kinda pointless. Usually critical performance is limited to a smaller set of functions.
Jan 30 2022
prev sibling parent Nick Treleaven <nick geany.org> writes:
On Saturday, 29 January 2022 at 12:23:28 UTC, Siarhei Siamashka 
wrote:
 So that `a[i % a.length]` has no performance disadvantages 
 compared to `a[i]`. My guess is that the D language design in 
 this alternative reality would define arrays indexing as a 
 modular operation and consider the "memory safety" goal 
 perfectly achieved (no out of bounds accesses, yay!). Ignoring 
 the fact that landing on a wrong array index is a source of all 
 kind of hard to debug problems, leading to wrong computations 
 or even security vulnerabilities.
I doubt that. There are many things in the design of D and Phobos that prevent bugs unrelated to memory safety, sometimes even at the expense of efficiency. E.g. initializing integers by default is not necessary for `@safe`, even though in certain unoptimizable cases the initialization is unnecessary and slower. Override the default when necessary. As with bounds checks by default, this is the best design.
Jan 30 2022
prev sibling parent reply Elronnd <elronnd elronnd.net> writes:
On Saturday, 29 January 2022 at 10:39:10 UTC, Ola Fosheim Grøstad 
wrote:
 As for practical problems: In C++ you can still enable signed 
 overflow checks. In D you cannot even do that, because in D all 
 integer operations are defined to be modular.

 Interestingly, the main complaint about modular arithmetics in 
 C++ is not about correctness however, but about poor 
 optimizations.
I don't buy this, even for a moment, and challenge anyone to demonstrate a significant (say, >5%) change in overall performance for a non-trivial C or C++ program from using -fwrapv.
Jan 30 2022
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 31 January 2022 at 05:06:00 UTC, Elronnd wrote:
 I don't buy this, even for a moment, and challenge anyone to 
 demonstrate a significant (say, >5%) change in overall 
 performance for a non-trivial c or c++ program from using 
 -fwrapv.
In high-level template code you get conditionals that are always false; if you don't remove those, then the code size will increase. A good optimizer encourages you to write more high-level code.
Jan 30 2022
parent reply Elronnd <elronnd elronnd.net> writes:
On Monday, 31 January 2022 at 06:36:37 UTC, Ola Fosheim Grøstad 
wrote:
 In high level template code you get conditionals that always 
 are false, if you dont remove those then the code size will 
 increase. A good optimizer encourage you to write more high 
 level code.
I have no doubt it comes up _at all_. What I am saying is that I do not believe it has an _appreciable_ effect on any real software.
Jan 30 2022
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 31 January 2022 at 07:33:00 UTC, Elronnd wrote:
 I have no doubt it comes up _at all_.  What I am asking is that 
 I do not believe it has an _appreciable_ effect on any real 
 software.
Not if you work around it, but ask yourself: is it a good idea to design your language in such a way that the compiler is unable to remove this?

```
if (x < x + 1) { … }
```

Probably not.
Jan 31 2022
next sibling parent reply Elronnd <elronnd elronnd.net> writes:
On Monday, 31 January 2022 at 08:38:28 UTC, Ola Fosheim Grøstad 
wrote:
 Not if you work around it
Maybe I was insufficiently clear. I'm not talking about the case where you work around it, but the case where you leave the 'dead' code in.
 but ask yourself: is it a good idea to design your language
 in such a way that the compiler is unable to remove this
If you use modular arithmetic, then yes, you should not permit the compiler to remove that condition.
Jan 31 2022
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 31 January 2022 at 09:48:42 UTC, Elronnd wrote:
 Maybe I was insufficiently clear.  I'm not talking about the 
 case where you work around it, but the case where you leave the 
 'dead' code in..
It can certainly be a problem in an inner loop. This is like with caches: at some point you hit the threshold where the loop is pushed out of the loop buffer in the CPU pipeline, and then it matters quite a bit.

Anyway, the argument you made is not suitable if you create a language that you want to be competitive for system-level programming. When programmers get hit, they get hit, and then it is an issue. You can remove many individual optimizations with only a small effect on the average program, but each optimization you remove makes you less competitive.

Currently most C/C++ code bases are not written in a high-level fashion in performance-critical functions, but we are moving towards more high-level programming in performance code now that compilers are getting "smarter" and hardware is getting more diverse. The more diverse the hardware, the more valuable high-quality optimization is. Or rather, the cost of tuning code is increasing…
 but ask yourself: is it a good idea to design your language
 in such a way that the compiler is unable to remove this
If you use modular arithmetic, then yes, you should not permit the compiler to remove that condition.
In D you always use modular arithmetic, and you also don't have constraints on integers. Thus you get extra bloat. It matters when it matters, and then people ask themselves: why not use language X where this is not an issue?
Jan 31 2022
parent reply Elronnd <elronnd elronnd.net> writes:
On Monday, 31 January 2022 at 10:06:40 UTC, Ola Fosheim Grøstad 
wrote:
 It matters when it matters
Please show me a case where it matters. I already asked for this: show me a case where a large-scale C or C++ application performs appreciably better because signed overflow is UB. It is easy to test this: simply tell gcc or clang -fwrapv and they will stop treating overflow as UB.

I will add: does it also not matter when the compiler makes an assumption that does not accord with your intent, causing a bug? I consider this more important than performance.
Jan 31 2022
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 31 January 2022 at 11:00:26 UTC, Elronnd wrote:
 Please show me a case where it matters.  I already asked for 
 this: show me a case where a large-scale c or c++ application 
 performs appreciably better because signed overflow is UB.  It 
 is easy to test this: simply tell gcc or clang -fwrapv and they 
 will stop treating overflow as UB.
I've already answered this. You can say this about most individual optimizations. That does not mean that they don't have an impact when you didn't get them.
 I will add: does it also not matter when the compiler makes an 
 assumption that does not accord with your intent, causing a 
 bug?  I consider this more important than performance.
What does this mean in this context?
Jan 31 2022
parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 31 January 2022 at 16:09:33 UTC, Ola Fosheim Grøstad 
wrote:
 On Monday, 31 January 2022 at 11:00:26 UTC, Elronnd wrote:
 Please show me a case where it matters.  I already asked for 
 this: show me a case where a large-scale c or c++ application 
 performs appreciably better because signed overflow is UB.  It 
 is easy to test this: simply tell gcc or clang -fwrapv and 
 they will stop treating overflow as UB.
I've already answered this. You can say this about most individual optimizations. That does not mean that they don't have an impact when you didn't get them.
That sentence came out wrong. What I meant is that missing an optimization is impactful where it matters. In this case I pointed out that this is most impactful in inner loops, but that current C/C++ codebases tend not to use high-level programming in performance-sensitive functions.

Meaning: the whole argument you are presenting is pointless. You will obviously not find conditionals that are always true/false in hand-tuned inner loops. The crux is this: people don't want to hand-tune inner loops if they can avoid it!
Jan 31 2022
prev sibling next sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Monday, 31 January 2022 at 08:38:28 UTC, Ola Fosheim Grøstad 
wrote:
 On Monday, 31 January 2022 at 07:33:00 UTC, Elronnd wrote:
 I have no doubt it comes up _at all_.  What I am asking is 
 that I do not believe it has an _appreciable_ effect on any 
 real software.
Not if you work around it, but ask yourself: is it a good idea to design your language in such a way that the compiler is unable to remove this: ``` if (x < x + 1) { … } ``` Probably not.
One such language would be Go [0]; it doesn't seem to impact Docker, Kubernetes, gVisor, USB Armory, the Android GPU debugger, containerd, or TinyGo, to name some of the proper systems programming where Go is used despite not being designed as such.

[0] - "A compiler may not optimize code under the assumption that overflow does not occur. For instance, it may not assume that x < x + 1 is always true." - Go Language Specification
Jan 31 2022
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 31 January 2022 at 10:23:59 UTC, Paulo Pinto wrote:
 One such language would be Go[0], it doesn't seem to impact 
 Docker, Kubernetes, gVisor, USB Armory, Android GPU debugger, 
 containerd, TinyGo, as some of the proper systems programming 
 where Go is used despite not being designed as such.
You can still remove it; you just need to assert the condition before you get any side effects (I/O), but you can delay that test so it occurs outside the loop. There is a difference between a language spec and the consequences for what compilers can do.
Jan 31 2022
parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Monday, 31 January 2022 at 16:05:23 UTC, Ola Fosheim Grøstad 
wrote:
 On Monday, 31 January 2022 at 10:23:59 UTC, Paulo Pinto wrote:
 One such language would be Go[0], it doesn't seem to impact 
 Docker, Kubernetes, gVisor, USB Armory, Android GPU debugger, 
 containerd, TinyGo, as some of the proper systems programming 
 where Go is used despite not being designed as such.
You can still remove it, you just need to assert the condition before you get any side-effects (I/O), but you can delay that test so it occurs outside the loop. There is a difference between a language spec and consequences for what compilers can do.
Compilers can do whatever they feel like, except that one that doesn't follow "A compiler may not optimize code under the assumption that overflow does not occur" is no longer compliant with the language specification, no matter what.

Similarly, a Scheme compiler that doesn't implement proper tail calls isn't a proper Scheme, as the standard has specific details on how tail recursion is required to behave.

Naturally, the cowboy land of what ISO says and the holes it leaves for UB and implementation-defined behaviour in C and C++ compilers is another matter.
Jan 31 2022
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 31 January 2022 at 17:22:42 UTC, Paulo Pinto wrote:
 Compilers can do whatever they feel like, except when one 
 doesn't follow "A compiler may not optimize code under the 
 assumption that overflow does not occur" is no longer compliant 
 with the language specification, no matter what.
If you get the same output/response from the same input, then you haven't deviated from the specification. Thus, if you have overflow checks on integer arithmetic, then this:

```
for(int i=1; i<99999; i++){
   int x = next_monotonically_increasing_int_with_no_sideffect();
   if (x < x+i){…}
}
```

has the same effect as this:

```
int x;
for(int i=1; i<99998; i++){
   x = next_monotonically_increasing_int_with_no_sideffect();
}
assert(x <= maximum_integer_value - 99998);
```
 Similarly a Scheme compiler that doesn't do tail call recursion 
 isn't a Scheme proper, as the standard has specific details how 
 tail recursion is required to exist.
A well-written language specification should only specify the requirements for observable behaviour (including memory requirements and interfacing requirements). If it is observable in Scheme, then it makes sense; otherwise it makes no sense.
Jan 31 2022
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 31 January 2022 at 17:52:17 UTC, Ola Fosheim Grøstad 
wrote:
 ```
 int x;
 for(int i=1; i<99998; i++){
    x = next_monotonically_increasing_int_with_no_sideffect();
 }
 assert(x <= maximum_integer_value - 99998);

 ```
Typo: there should have been a "…" in the loop, assuming no side effects.
Jan 31 2022
parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 31 January 2022 at 18:12:32 UTC, Ola Fosheim Grøstad 
wrote:
 On Monday, 31 January 2022 at 17:52:17 UTC, Ola Fosheim Grøstad 
 wrote:
 ```
 int x;
 for(int i=1; i<99998; i++){
    x = next_monotonically_increasing_int_with_no_sideffect();
 }
 assert(x <= maximum_integer_value - 99998);

 ```
Typo, should have been a "…" in the loop, assuming no sideffects.
Another typo: the loop termination should remain "i<99999", to avoid further confusion… It is equivalent to:

```
int x;
for(int i=1; i<99999; i++){
   x = next_monotonically_increasing_int_with_no_sideffect();
   …
}
assert(x <= maximum_integer_value - 99998);
```

I hope I got it right now… Hm.

Of course, a more drastic example would be code that tests the negated conditional (always false); if you can then deduce the last value of x by computation, then you can remove the loop entirely and only keep the assert statement. I don't see how that would break the Go spec.
Jan 31 2022
prev sibling parent reply Dukc <ajieskola gmail.com> writes:
On Monday, 31 January 2022 at 08:38:28 UTC, Ola Fosheim Grøstad 
wrote:
 On Monday, 31 January 2022 at 07:33:00 UTC, Elronnd wrote:
 I have no doubt it comes up _at all_.  What I am asking is 
 that I do not believe it has an _appreciable_ effect on any 
 real software.
Not if you work around it, but ask yourself: is it a good idea to design your language in such a way that the compiler is unable to remove this: ``` if (x < x + 1) { … } ``` Probably not.
It is a good idea. You can manually optimise that `if` out if performance is important. Manual optimisation is a must to get performant code anyway, so it's not really a big deal. In the opposite case we would have undefined behaviour in `@safe` code.

We have array bounds checks for the exact same reason. They do penalise performance a bit, but they prevent undefined behaviour and can be manually optimised out when performance is more important than memory protection.
Feb 02 2022
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 2 February 2022 at 21:42:43 UTC, Dukc wrote:
 In the opposite case we would have undefined behaviour at 
 `@safe` code.
People in the D community have the wrong understanding of what "undefined behaviour" means in a standard specification… this is getting tiresome, but to state the obvious: it does not mean that the compiler cannot provide guarantees.

The fact that C++ chose performance over other options does not make this a necessity. It is a choice, not a consequence.
Feb 02 2022
parent reply Dukc <ajieskola gmail.com> writes:
On Thursday, 3 February 2022 at 01:26:04 UTC, Ola Fosheim Grøstad 
wrote:
 On Wednesday, 2 February 2022 at 21:42:43 UTC, Dukc wrote:
 In the opposite case we would have undefined behaviour at 
 `@safe` code.
People in the D community has the wrong understanding of what "undefined behaviour" means in a standard specification… this is getting tiresome, but to state the obvious: it does not mean that the compiler cannot provide guarantees.
What's your point? Even when a compiler provides more guarantees than the language spec, you should still avoid undefined behaviour if you can. Otherwise you're deliberately making your program non-spec compliant and therefore likely to malfunction with other compilers.
Feb 03 2022
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Thursday, 3 February 2022 at 16:41:45 UTC, Dukc wrote:
 What's your point? Even when a compiler provides more 
 guarantees than the language spec, you should still avoid 
 undefined behaviour if you can. Otherwise you're deliberately 
 making your program non-spec compliant and therefore likely to 
 malfunction with other compilers.
If you don't get my point, what can I do about it?

Modular arithmetic doesn't help at all; it makes it worse. It is better to have a conditional correctly removed than to wrongly have it inverted; the latter is disastrous for correctness.

So no, undefined behaviour is not worse than defined behaviour when the defined behaviour is the kind of behaviour nobody wants!
Feb 03 2022
parent reply Dukc <ajieskola gmail.com> writes:
On Thursday, 3 February 2022 at 17:05:01 UTC, Ola Fosheim Grøstad 
wrote:
 Modular arithmetics doesn't help at all, it makes it worse. It 
 is better to have a conditional correctly removed than wrongly 
 get it inverted, the latter is disastrous for correctness.

 So no, undefined behaviour is not worse than defined behaviour 
 when the defined behaviour is the kind of behaviour nobody 
 wants!
Oh, now I understand what you're saying. I don't agree though. With overflow, at least you can clearly reason about what's happening. If compiled code starts to mysteriously disappear when you have overflows, there is potential for some very incomprehensible bugs.

It probably would not be that bad in the `x < x + 1` example, but in the real world you might have careless multiplying of integers, for instance. Let's say I do this:

```d
fun(aLongArray[x]);
x *= 0x10000;
```

If the array is long enough, with the semantics you're advocating the compiler might reason:

1. `x` can't overflow, so it must be 0x7FFF at most before the multiplication.
2. I know `aLongArray` is longer than that, so I can elide the bounds check.

Overflows are much less of an issue than stuff like that.
Feb 03 2022
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Thursday, 3 February 2022 at 17:33:35 UTC, Dukc wrote:
 If the array is long enough, with semantics you're advocating 
 the compiler might reason:

 1. `x` can't overflow, so it must be 0x7FFF at most before the 
 multicipation.
 2. I know `aLongArr` is longer than that, so I can elide the 
 bounds check.

 Overflows are much less an issue than stuff like that.
I advocated trapping overflows except where you explicitly disable it. I would also advocate for having both modular operators and operators that clamp.
Feb 03 2022
parent reply Dukc <ajieskola gmail.com> writes:
On Thursday, 3 February 2022 at 18:07:43 UTC, Ola Fosheim Grøstad 
wrote:
 I advocated trapping overflows except where you explicitly 
 disable it.
I don't know if you meant to do it the C++ way (unsigned wraps around normally, signed may do anything on overflow), or some other way. Regardless, it is probably a bad idea. We could allow undefined behaviour only in `@system` code, and realistically, where would you want integers that behave that way? You're supposed to be a bit desperate before you disable safety features for performance; at that point you have probably already hand-optimised away the code that the compiler could remove for you.
Feb 03 2022
next sibling parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Thursday, 3 February 2022 at 20:56:04 UTC, Dukc wrote:
 We could allow undefined behaviour only in `@system` code, and 
 realistically,
How exactly is this relevant for `@safe`?
Feb 03 2022
parent reply Dukc <ajieskola gmail.com> writes:
On Thursday, 3 February 2022 at 21:01:30 UTC, Ola Fosheim Grøstad 
wrote:
 On Thursday, 3 February 2022 at 20:56:04 UTC, Dukc wrote:
 We could allow undefined behaviour only in `@system` code, and 
 realistically,
How exactly is this relevant for `@safe`?
We cannot allow undefined behaviour in `@safe` code. That means that any integer that would have undefined semantics on overflow could not be used in `@safe` code.

Well, asserting no overflow would be fine. With the `-release` switch, it'd behave like the C++ signed int, but not otherwise. In fact this is already doable:

```D
import core.checkedint;

uint a, b; // the operands being multiplied
bool check;
auto x = mulu(a, b, check);
assert(!check);
```

Not sure if the compiler will take advantage of overflow being undefined behaviour here in release mode, though.
Feb 03 2022
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Thursday, 3 February 2022 at 21:23:10 UTC, Dukc wrote:
 We cannot allow undefined behaviour in `@safe` code.
Why not? Make it implementation-defined, with the requirement that memory safety is upheld by the compiled code. No need to overthink this.
 That means that any integer that would have undefined semantics 
 for overflows could not be used at `@safe`.
It can be left to the compiler by the language standard, but still impose generic memory safety requirements on the compiler. Anyway, I tested overflow with -O3, and it did not remove the "bounds check". So there is no reason to believe that the optimization passes cannot be tuned in such a way that the compiler cannot upheld memory safety.
Feb 03 2022
next sibling parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Thursday, 3 February 2022 at 21:36:19 UTC, Ola Fosheim Grøstad 
wrote:
 "bounds check". So there is no reason to believe that the 
 optimization passes cannot be tuned in such a way that the 
 compiler cannot upheld memory safety.
Typo: what I tried to say is that it is up to the compiler vendor to make sure that the optimization passes are tuned such that they uphold memory safety.
Feb 03 2022
prev sibling parent reply Dukc <ajieskola gmail.com> writes:
On Thursday, 3 February 2022 at 21:36:19 UTC, Ola Fosheim Grøstad 
wrote:
 On Thursday, 3 February 2022 at 21:23:10 UTC, Dukc wrote:
 We cannot allow undefined behaviour in `@safe` code.
Why not, make it implementation defined, with the requirement that memory safety is upheld by compiled code.
That is a different solution. Implementation-defined != undefined.

With the implementation-defined solution, there is the issue that potentially any change may break memory safety. Some other function's memory safety may depend on the correct behaviour of an `@safe` function that has an overflowing integer. So you'd have to start defining arbitrary rules on what the compiler can and cannot do on overflow. Just saying "preserve memory safety" does not work, because what is necessary for memory safety depends on the situation.

Even without that issue, I would not be supportive. D is old and used enough that any change to the overflow semantics of D integers is too disruptive to be worth it.
Feb 03 2022
next sibling parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Thursday, 3 February 2022 at 22:12:10 UTC, Dukc wrote:
 With the implementation-defined solution, there is the issue 
 that potentially any change may break memory safety. Some other 
 functions memory safety may be depending on correct behaviour 
 of an `@safe` function that has an overflowing integer.
You mean in `@trusted` code, but then you need to be more specific. If it actually was an overflow, the same argument can be made about a wrap-around. Maybe the `@trusted` code did not expect a negative value…

If there is an overflow in computing x, then it makes sense that the value of x is an arbitrary bit pattern constrained to the bit width. You can constrain it further if that turns out to be needed. Of course, this will only be relevant in `@safe` code sections where you disable trapping of overflow.
Feb 03 2022
prev sibling parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Thursday, 3 February 2022 at 22:12:10 UTC, Dukc wrote:
 On Thursday, 3 February 2022 at 21:36:19 UTC, Ola Fosheim 
 Grøstad wrote:
 On Thursday, 3 February 2022 at 21:23:10 UTC, Dukc wrote:
 We cannot allow undefined behaviour in `@safe` code.
Why not, make it implementation defined, with the requirement that memory safety is upheld by compiled code.
That is a different solution. Implementation defined != undefined.
"implementation defined" means that the vendor must document the semantics. "undefined behaviour" means that the vendor isn't required to document the behaviour, but that does not mean that they are discouraged from doing so. This was introduced in the C language spec to account for situations where hardware has undefined behaviour. Competition between C++ compilers made them exploit this for the most hardcore optimization options.
Feb 03 2022
prev sibling parent Paul Backus <snarwin gmail.com> writes:
On Thursday, 3 February 2022 at 20:56:04 UTC, Dukc wrote:
 On Thursday, 3 February 2022 at 18:07:43 UTC, Ola Fosheim 
 Grøstad wrote:
 I advocated trapping overflows except where you explicitly 
 disable it.
I don't know if you meant to do it the C++ way (unsigned overflows normally, signed may do anything on overflow), or some other way.
I assume "trapping overflows" means something like GCC's `-ftrapv`, where integer overflow causes the program to crash.
Feb 03 2022
prev sibling next sibling parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Friday, 28 January 2022 at 18:39:54 UTC, deadalnix wrote:
 On Friday, 28 January 2022 at 02:15:51 UTC, Paul Backus wrote:
 https://shafik.github.io/c++/2021/12/30/usual_arithmetic_confusions.html
For what it is worth, the first exemple is fixed in D.
For C++ this is fixed with warnings, e.g. -Wconversion. People can choose their own level of strictness in C++. Of course, if you use templated libraries then you might get warnings you don't want. So having a good syntax for silencing warnings inline is an important language-design issue, because then you are encouraged to turn on maximum strictness and just selectively disable it where that is convenient.

Warnings with inline silencing are actually one of the main advantages of using TypeScript over JavaScript, and of Python with PyCharm as well. Unfortunately the syntax for warning silencing is often just an afterthought... It should be part of language design, in my opinion.
 The multiplication one is particularly interesting. The other 
 ones really banal wrap around behavior, which your really can't 
 do without if youw ant to make anything fast, IMO.

 I'm surprised there aren't more exemple of signed -> unsigned 
 conversion because that one is a real mind fuck.
I actually can't remember having these issues in my own code; it isn't really all that difficult to get these right. But you have to pay special attention to these things when writing templates; fortunately you can restrict templated parameters based on signedness. The only case that I can remember where this turned into a headache was with the signed modulo operator, but it was resolved by writing my own overloaded "modulo(x,y)" function.

Of course, if you don't enable warnings you've basically said you are OK with the semantics of a lazy 1970s approach designed to make it possible to write really terse code. People get what they ask for by choosing terseness.
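(For what it's worth, a minimal sketch of the kind of helper described above; the name and implementation are mine, not the actual code. It wraps the result into [0, m), unlike D's built-in `%`, which keeps the sign of the dividend.)

```d
// A wrap-into-range modulo: the result is always in [0, m) for m > 0.
T modulo(T)(T x, T m)
{
    T r = x % m;
    return r < 0 ? r + m : r;
}

unittest
{
    assert(-3 % 5 == -3);       // built-in remainder keeps the sign of x
    assert(modulo(-3, 5) == 2); // wrapped into [0, 5)
    assert(modulo(7, 5) == 2);  // unchanged behaviour for non-negative x
}
```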
Jan 29 2022
prev sibling parent reply Ivan Kazmenko <gassa mail.ru> writes:
On Friday, 28 January 2022 at 18:39:54 UTC, deadalnix wrote:
 On Friday, 28 January 2022 at 02:15:51 UTC, Paul Backus wrote:
 https://shafik.github.io/c++/2021/12/30/usual_arithmetic_confusions.html
For what it is worth, the first exemple is fixed in D. The multiplication one is particularly interesting. The other ones really banal wrap around behavior, which your really can't do without if youw ant to make anything fast, IMO. I'm surprised there aren't more exemple of signed -> unsigned conversion because that one is a real mind fuck.
I wholeheartedly agree with the latter notion. The dreaded signed-to-unsigned conversion has definitely bitten me more than once in D. And not only conversion: writing something innocent like `auto var = arr.length`, because `auto` types are generally better geared towards later code changes, and then doing arithmetic with `var`... one subtraction, and boom! You get the overflow all of a sudden.

At least in 64-bit programs, I don't really see the benefit of sizes being unsigned anymore. Even C++ has `cont.ssize()` now for a signed size.

Ivan Kazmenko.
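(A minimal sketch of the `arr.length` trap being described, assuming a 64-bit target where `size_t` is `ulong`:)

```d
void main()
{
    int[] arr = [1, 2, 3];

    auto var = arr.length; // var is size_t (unsigned), not int
    auto diff = var - 4;   // compiles fine: wraps around instead of going negative

    assert(diff == ulong.max); // "boom" - a huge number, not -1
}
```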
Jan 30 2022
parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 30 January 2022 at 17:22:18 UTC, Ivan Kazmenko wrote:
 At least in 64-bit programs, I don't really see the benefit of 
 the sizes being unsigned anymore.  Even C++ has `cont.ssize()` 
 now for signed size.
In C++20 they have provided a function std::ssize(container) that will return a signed integer type. That way you can write templates that assume regular arithmetic and that also work with old containers.

Of course, C++ is a language where you need to learn idioms to program well… which makes it a challenging language to pick up. Smaller languages would be better off with breaking changes, I think. (I don't think they have changed the existing containers?)
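(D has no direct Phobos counterpart as far as I know, but a hypothetical analogue is a one-liner; the name and signature below are mine.)

```d
// Hypothetical D analogue of C++20's std::ssize: the length as a signed value,
// so that subsequent arithmetic follows ordinary signed rules.
ptrdiff_t ssize(T)(const T[] arr)
{
    return cast(ptrdiff_t) arr.length;
}

unittest
{
    int[] arr = [1, 2, 3];
    assert(arr.ssize - 4 == -1);          // signed: behaves as expected
    assert(arr.length - 4 == size_t.max); // unsigned: wraps to a huge value
}
```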
Jan 31 2022
prev sibling next sibling parent reply Dukc <ajieskola gmail.com> writes:
On Friday, 28 January 2022 at 02:15:51 UTC, Paul Backus wrote:
 It's been argued in the past, on these forums, that these 
 conversions are "just something you have to learn" if you want 
 to do system-level programming. But if C++ programmers are 
 still getting this stuff wrong, after all these years, perhaps 
 the programmers aren't the problem. Is it possible that these 
 implicit conversions are just too inherently error-prone for 
 programmers to reliably use correctly?
As many downsides as warnings have in general, perhaps this is where we should go for them. Those conversions are probably too common to outright deprecate. Still, old code would keep compiling, but the language would clearly endorse explicit conversions for new code.

We probably should not even warn on integer promotion. Code that explicitly cast in every place where that happens would be incredibly ugly. But we could warn on unsigned/signed conversions. Implicit conversions to larger integers with the same signedness are not an antipattern imo; those can remain.
Feb 02 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/2/2022 2:14 PM, Dukc wrote:
 We probably should not even warn on integer promotion. Code that would 
 explicitly cast on every place where that is done would be incredibly ugly.
It also *causes* bugs. When code gets refactored and the types change, those forced casts may not be doing what is desired, and can do things like unexpectedly truncating integer values.

One of the (largely hidden, because it works so well) advances D has over C is Value Range Propagation, where automatic conversions of integers to smaller integers are only done if no bits are lost.
Feb 02 2022
next sibling parent reply Adam Ruppe <destructionator gmail.com> writes:
On Wednesday, 2 February 2022 at 23:27:05 UTC, Walter Bright 
wrote:
 One of the (largely hidden because it works so well) advances D 
 has over C is Value Range Propagation, where automatic 
 conversions of integers to smaller integers is only done if no 
 bits are lost.
D's behavior is worse than C's in actual use. This is a source of constant annoyance when doing anything with the byte and short types. The value range propagation only works inside single expressions and is too conservative to help much in real code.
Feb 02 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/2/2022 3:37 PM, Adam Ruppe wrote:
 D's behavior is worse than C's in actual use.
How?
 The value range propagation only works inside single expressions and is too 
 conservative to help much in real code.
I find it works well. For example,

    int i;
    byte b = i & 0xFF;

passes without complaint with VRP. As does:

    ubyte a, b, c;
    a = b | c;
Feb 02 2022
parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Thursday, 3 February 2022 at 01:05:15 UTC, Walter Bright wrote:
 On 2/2/2022 3:37 PM, Adam Ruppe wrote:
 The value range propagation only works inside single 
 expressions and is too conservative to help much in real code.
 I find it works well. For example,

     int i;
     byte b = i & 0xFF;

 passes without complaint with VRP.
No, it doesn't pass: `Error: cannot implicitly convert expression i & 255 of type int to byte`.
 As does:

     ubyte a, b, c;
     a = b | c;
But `a = b + c` is rejected by the compiler. Maybe I'm expecting modular wrap-around arithmetic here? Or maybe I know the possible range of the `b` and `c` variables and I'm sure that no overflows are possible? But the compiler requires an explicit cast. Why is it getting in the way?

Also, if the type is changed to `uint` in the same example, then the compiler is suddenly okay with that and doesn't demand casting to `ulong`. This is inconsistent. You will probably say that it's because of integer promotion and the 32-bit size is a special snowflake. But if the intention is to catch bugs at the compilation stage, then adding two ubytes together and adding two uints together isn't very different (both of these operations can potentially overflow). What's the reason to be anal about ubytes?

Other modern programming languages can catch arithmetic overflows at runtime, and they allow opting out of these checks in performance-critical parts of the code.
Feb 02 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/2/2022 6:25 PM, Siarhei Siamashka wrote:
 On Thursday, 3 February 2022 at 01:05:15 UTC, Walter Bright wrote:
 On 2/2/2022 3:37 PM, Adam Ruppe wrote:
 The value range propagation only works inside single expressions and is too 
 conservative to help much in real code.
I find it works well. For example,

    int i;
    byte b = i & 0xFF;

passes without complaint with VRP.
No, it's doesn't pass: `Error: cannot implicitly convert expression i & 255 of type int to byte`.
My mistake. b should have been declared as ubyte.
 
 As does:

     ubyte a, b, c;
     a = b | c;
But `a = b + c` is rejected by the compiler.
That's because `b + c` may create a value that does not fit in a ubyte.
 Maybe I'm expecting modular wrap-around arithmetic here? Or maybe I know the possible range of `b` and `c` variables and I'm sure that no overflows are possible? But the compiler requires an explicit cast. Why is it getting in the way?
Because C bugs where there are hidden truncations to bytes are a problem.
 Also if the type is changed to `uint` in the same example, then the compiler is suddenly okay with that and doesn't demand casting to `ulong`. This is inconsistent.

It follows the C integral promotion rules. This is for consistent arithmetic behavior with C.

 You will probably say that it's because of integer promotion and 32-bit size is a special snowflake. But if the intention is to catch bugs at the compilation stage, then adding two ubytes together and adding two uints together isn't very different (both of these operations can potentially overflow). What's the reason to be anal about ubytes?
We do the best we can. There really is no solution that doesn't have its own issues.
 The other modern programming languages can catch arithmetic overflows at 
 runtime. And allow to opt out of these checks in performance critical parts of 
 the code.
They just have other problems. VRP makes many implicit conversions to bytes safely possible.
Feb 02 2022
next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 2/3/22 12:50 AM, Walter Bright wrote:
 On 2/2/2022 6:25 PM, Siarhei Siamashka wrote:
 On Thursday, 3 February 2022 at 01:05:15 UTC, Walter Bright wrote:
 On 2/2/2022 3:37 PM, Adam Ruppe wrote:
 The value range propagation only works inside single expressions and 
 is too conservative to help much in real code.
I find it works well. For example,

    int i;
    byte b = i & 0xFF;

passes without complaint with VRP.
No, it's doesn't pass: `Error: cannot implicitly convert expression i & 255 of type int to byte`.
My mistake. b should have been declared as ubyte.
Which is interesting, because this is allowed:

```d
int i;
ubyte _tmp = i & 0xFF;
byte b = _tmp;
```

-Steve
Feb 03 2022
prev sibling next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Thursday, 3 February 2022 at 05:50:24 UTC, Walter Bright wrote:
 On 2/2/2022 6:25 PM, Siarhei Siamashka wrote:
 On Thursday, 3 February 2022 at 01:05:15 UTC, Walter Bright 
 wrote:
 As does:

     ubyte a, b, c;
     a = b | c;
But `a = b + c` is rejected by the compiler.
That's because `b + c` may create a value that does not fit in a ubyte.
And yet:

    int a, b, c;
    a = b + c;

`b + c` may create a value that does not fit in an int, but instead of rejecting the code, the compiler accepts it and allows the result to wrap around.

The inconsistency is the problem here. Having integer types behave differently depending on their width makes the language harder to learn, and forces generic code to add special cases for narrow integers, like this one in `std.math.abs`:

    static if (is(immutable Num == immutable short) || is(immutable Num == immutable byte))
        return x >= 0 ? x : cast(Num) -int(x);
    else
        return x >= 0 ? x : -x;

(Source: https://github.com/dlang/phobos/blob/v2.098.1/std/math/algebraic.d#L56-L59)
Feb 03 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/3/2022 8:25 AM, Paul Backus wrote:
 And yet:
 
      int a, b, c;
      a = b + c;
 
 `b + c` may create a value that does not fit in an int, but instead of rejecting the code, the compiler accepts it and allows the result to wrap around.
Yup.
 The inconsistency is the problem here. Having integer types behave differently 
 depending on their width makes the language harder to learn,
It's not really that hard - it's about two or three sentences, as long as one understands 2s-complement arithmetic. If one doesn't understand 2s-complement, and assumes it works like 3rd-grade arithmetic, I agree it can be baffling. There's really no fix for that other than making the effort to understand 2s-complement.

Some noble attempts:

Java: disallowed all unsigned types. Wound up having to add that back in as a hack.

Python: numbers can grow arbitrarily large without loss of precision. Makes your code slow, though.

Javascript: everything is a double-precision floating-point value! Makes for all kinds of other problems. If there's anything people understand less (a lot less) than 2s-complement, it's floating point.
 and forces generic 
 code to add special cases for narrow integers, like this one in `std.math.abs`:
 
      static if (is(immutable Num == immutable short) || is(immutable Num == 
 immutable byte))
          return x >= 0 ? x : cast(Num) -int(x);
      else
          return x >= 0 ? x : -x;
 
 (Source: 
 https://github.com/dlang/phobos/blob/v2.098.1/std/math/algebraic.d#L56-L59)
That's because adding abs(short) and abs(byte) was a giant mistake. There's good reason these functions never appeared in C.

Trying to hide the reality of how computer integer arithmetic works, and how integral promotions work, is a prescription for endless frustration and inevitable failure.

If anybody has questions about how 2s-complement arithmetic works, and how integral promotions work, I'll be happy to answer them.
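(For reference, a minimal D example of the integral promotion rules under discussion; this is my own illustration, not code from the thread.)

```d
void main()
{
    byte a = 100, b = 100;

    // Integral promotion: both operands are promoted to int before the add,
    // so the expression's type is int and 200 is representable - no 8-bit wrap.
    static assert(is(typeof(a + b) == int));
    assert(a + b == 200);

    // The narrowing assignment is the step that needs an explicit cast:
    // byte c = a + b;          // error: cannot implicitly convert int to byte
    byte c = cast(byte)(a + b); // wraps to -56, and now the truncation is explicit
}
```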
Feb 03 2022
next sibling parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Friday, 4 February 2022 at 04:28:37 UTC, Walter Bright wrote:
 On 2/3/2022 8:25 AM, Paul Backus wrote:
 The inconsistency is the problem here. Having integer types 
 behave differently depending on their width makes the language 
 harder to learn,
It's not really that hard - it's about two or three sentences. As long as one understands 2s-complement arithmetic. If one doesn't understand 2s-complement, and assumes it works like 3rd grade arithmetic, I agree it can be baffling.
I don't think this is limited to learning. I don't think programmers with decades of experience with C/C++ have a problem understanding 2s-complement, but it is still creating annoyances and friction. Maybe it is time to acknowledge that most of the D user base use the language for high level programming. I would do the following:

1. make 64 bit signed integers with overflow checks the "default" type across the board
2. provide a library type for range-constrained integers that use intrinsic "assume" directives to provide the compiler with information about constraints. This type would choose a storage type that the constrained integer fits in (a rough sketch follows below).
3. add some clean syntax for disabling runtime checks where higher speed is required.

D could become competitive with that and ARC + local GC. D should try to improve on higher level programming as well as the ability to transition from high level to system level in a metamorphosis-like evolution process.
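Here is a rough sketch of the kind of range-constrained type I mean in point 2 (names are hypothetical, and the "assume" directives and the check-disabling syntax are left out):

```d
// Hypothetical range-constrained integer: the allowed range is part of the
// type, and the storage type is the smallest built-in type that can hold it.
struct Ranged(long min, long max)
{
    static if (min >= byte.min && max <= byte.max)
        alias Store = byte;
    else static if (min >= short.min && max <= short.max)
        alias Store = short;
    else static if (min >= int.min && max <= int.max)
        alias Store = int;
    else
        alias Store = long;

    Store value;

    this(long v)
    {
        // runtime check; a real design would let you disable this where speed matters
        assert(v >= min && v <= max, "value out of range");
        value = cast(Store)v;
    }
}

unittest
{
    auto percent = Ranged!(0, 100)(42);
    static assert(is(typeof(percent.value) == byte)); // 0..100 fits in a byte
}
```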
Feb 04 2022
parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Friday, 4 February 2022 at 09:19:38 UTC, Ola Fosheim Grøstad 
wrote:
 On Friday, 4 February 2022 at 04:28:37 UTC, Walter Bright wrote:
 On 2/3/2022 8:25 AM, Paul Backus wrote:
 The inconsistency is the problem here. Having integer types 
 behave differently depending on their width makes the 
 language harder to learn,
...
I don't think this is limited to learning. I don't think programmers with decades of experience with C/C++ has a problem understanding 2s-complement, but it is still creating annoyances and friction. ...
Gosling's experience at Sun kind of proved otherwise:

"In programming language design, one of the standard problems is that the language grows so complex that nobody can understand it. One of the little experiments I tried was asking people about the rules for unsigned arithmetic in C. It turns out nobody understands how unsigned arithmetic in C works. There are a few obvious things that people understand, but many people don't understand it."

https://www.artima.com/articles/james-gosling-on-java-may-2001

Then again, maybe Sun lacked enough people with decades of C and C++ experience, and someone with the track record of Gosling across the computing industry doesn't have any clue about what he was talking about.
Feb 04 2022
next sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 4 February 2022 at 09:45:31 UTC, Paulo Pinto wrote:
 Then again, maybe Sun lacked enough people with decades of C 
 and C++ experience, and someone with the track record of 
 Gosling across the computing industry does have any clue about 
 what he was talking about.
I learned about 1s- and 2s-complement at high school in the context of digital circuits, but I guess that is unusual. Regardless, it might take a decade of system level programming to get good intuition of C semantics, on that we can agree. Maybe we also can agree that most D programmers have no use for that. And well, why should they? The details are primarily useful for very low level trickery and error-prone bit manipulation. With a good standard library this should not be needed often. Also, with the availability of SIMD I find bithacks to be of very low utility. Prior to SIMD I sometimes used unsigned bit hacks to emulate SIMD (for image processing), but that is arcane at this point in time. I only do such things on the rare occasion where I want to create a high precision phasor (oscillator) or treat floats as bit-vectors. Most programmers don't need this knowledge, they just need a good library. Anyways, it is a poor strategy to require C-like proficiency as that actually makes it easier for D programmers to transition to C++! D needs to evolve towards simplicity, that is the main advantage it can obtain over C++ and Rust.
Feb 04 2022
prev sibling parent reply Adam D Ruppe <destructionator gmail.com> writes:
On Friday, 4 February 2022 at 09:45:31 UTC, Paulo Pinto wrote:
 One of the little experiments I tried was asking people about 
 the rules for unsigned arithmetic in C.
2's complement is just one part of C's rules. The 2's complement part is relatively easy, but knowing what is promoted to what in C is a bit more involved and easy to get wrong. (and what C does doesn't make a whole lot of sense.)
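One classic example of the kind of thing that trips people up (D inherits the same promotion here, so the example is in D):

```d
// ~ operates on the promoted int, not on the 8-bit value, so complementing
// an "all ones" ubyte does not give zero.
void main()
{
    ubyte a = 0xFF;
    assert(~a == -256);       // ~a is an int: ~0x000000FF
    assert((~a & 0xFF) == 0); // masking brings it back down to 8 bits
}
```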
Feb 04 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/4/2022 5:18 AM, Adam D Ruppe wrote:
 (and what C does doesn't make a whole lot of sense.)
C was developed on a PDP-11 and the integral promotions rules come about because of the way the -11 instructions work. It's the same for the float=>double promotion rules.
Feb 04 2022
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Feb 04, 2022 at 12:24:37PM -0800, Walter Bright via Digitalmars-d wrote:
 On 2/4/2022 5:18 AM, Adam D Ruppe wrote:
 (and what C does doesn't make a whole lot of sense.)
C was developed on a PDP-11 and the integral promotions rules come about because of the way the -11 instructions work. It's the same for the float=>double promotion rules.
PDP-11 instructions no longer resemble how modern machines work, though. What made sense back then may not necessarily make sense anymore today. T -- To err is human; to forgive is not our policy. -- Samuel Adler
Feb 04 2022
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/4/2022 12:31 PM, H. S. Teoh wrote:
 PDP-11 instructions no longer resemble how modern machines work, though.
 What made sense back then may not necessarily make sense anymore today.
The PDP-11, no, but the modern machines *definitely* hew to how C works, see my other post in this thread.
Feb 04 2022
prev sibling parent deadalnix <deadalnix gmail.com> writes:
On Friday, 4 February 2022 at 20:31:58 UTC, H. S. Teoh wrote:
 On Fri, Feb 04, 2022 at 12:24:37PM -0800, Walter Bright via 
 Digitalmars-d wrote:
 On 2/4/2022 5:18 AM, Adam D Ruppe wrote:
 (and what C does doesn't make a whole lot of sense.)
C was developed on a PDP-11 and the integral promotions rules come about because of the way the -11 instructions work. It's the same for the float=>double promotion rules.
PDP-11 instructions no longer resemble how modern machines work, though. What made sense back then may not necessarily make sense anymore today. T
Instruction sets really haven't changed much.
Feb 16 2022
prev sibling next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Friday, 4 February 2022 at 04:28:37 UTC, Walter Bright wrote:
 On 2/3/2022 8:25 AM, Paul Backus wrote:
 The inconsistency is the problem here. Having integer types 
 behave differently depending on their width makes the language 
 harder to learn,
It's not really that hard - it's about two or three sentences.
Two or three sentences here, two or three sentences there--it's not much on its own, I agree, but all these little things add up. And the fact is, C and C++ programmers *do* find these rules difficult to learn and remember in practice. That's why articles like the one that started this discussion are written in the first place.
 There's really no fix for that other than making the effort to 
 understand 2s-complement.
[...]
 Trying to hide the reality of how computer integer arithmetic 
 works, and how integral promotions work, is a prescription for 
 endless frustration and inevitable failure.
2s-complement is "the reality of how computer integer arithmetic works," but there is nothing fundamental or necessary about C's integer promotion rules, and plenty of system-level languages get by without them.
Feb 04 2022
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Feb 04, 2022 at 02:01:43PM +0000, Paul Backus via Digitalmars-d wrote:
 On Friday, 4 February 2022 at 04:28:37 UTC, Walter Bright wrote:
[...]
 There's really no fix for that other than making the effort to
 understand 2s-complement.
[...]
 Trying to hide the reality of how computer integer arithmetic works,
 and how integral promotions work, is a prescription for endless
 frustration and inevitable failure.
2s-complement is "the reality of how computer integer arithmetic works," but there is nothing fundamental or necessary about C's integer promotion rules, and plenty of system-level languages get by without them.
+1. T -- There are 10 kinds of people in the world: those who can count in binary, and those who can't.
Feb 04 2022
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/4/2022 6:01 AM, Paul Backus wrote:
 Trying to hide the reality of how computer integer arithmetic works, and how 
 integral promotions work, is a prescription for endless frustration and 
 inevitable failure.
2s-complement is "the reality of how computer integer arithmetic works," but there is nothing fundamental or necessary about C's integer promotion rules, and plenty of system-level languages get by without them.
The integral promotion rules came about because of how the PDP-11 instruction set worked, as C was developed on an -11. But this has carried over into modern CPUs. Consider:

void tests(short* a, short* b, short* c) { *c = *a * *b; }
        0F B7 07                movzx   EAX,word ptr [RDI]
66      0F AF 06                imul    AX,[RSI]
66      89 02                   mov     [RDX],AX
        C3                      ret

void testi(int* a, int* b, int* c) { *c = *a * *b; }
        8B 07                   mov     EAX,[RDI]
        0F AF 06                imul    EAX,[RSI]
        89 02                   mov     [RDX],EAX
        C3                      ret

You're paying a 3 size byte penalty for using short arithmetic rather than int arithmetic. It's slower, too.

Generally speaking, int should be used for most calculations, short and byte for storage.

(Modern CPUs have long been deliberately optimized and tuned for C semantics.)
Feb 04 2022
next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Friday, 4 February 2022 at 21:13:10 UTC, Walter Bright wrote:
 The integral promotion rules came about because of how the 
 PDP-11 instruction set worked, as C was developed on an -11. 
 But this has carried over into modern CPUs. Consider:
[...]
 You're paying a 3 size byte penalty for using short arithmetic 
 rather than int arithmetic. It's slower, too.

 Generally speaking, int should be used for most calculations, 
 short and byte for storage.
Sure. That's a reason why I, the programmer, might want to use int instead of short or byte in my code. But if, for whatever reason, I've chosen to use short or byte in spite of the performance penalties, I would rather not have the language second-guess me on that choice.
Feb 04 2022
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 2/4/22 4:54 PM, Paul Backus wrote:
 On Friday, 4 February 2022 at 21:13:10 UTC, Walter Bright wrote:
 The integral promotion rules came about because of how the PDP-11 
 instruction set worked, as C was developed on an -11. But this has 
 carried over into modern CPUs. Consider:
[...]
 You're paying a 3 size byte penalty for using short arithmetic rather 
 than int arithmetic. It's slower, too.

 Generally speaking, int should be used for most calculations, short 
 and byte for storage.
Sure. That's a reason why I, the programmer, might want to use int instead of short or byte in my code. But if, for whatever reason, I've chosen to use short or byte in spite of the performance penalties, I would rather not have the language second-guess me on that choice.
Yeah, the user doesn't care how the compiler does the instructions. They care about the outcome. If they want to assign it back to a byte, they probably don't care about losing the extra precision. Otherwise, they would assign to an int. I don't think anyone is arguing that the result of the operation should be truncated to a byte, even if assigned to an int. -Steve
Feb 04 2022
parent reply Paul Backus <snarwin gmail.com> writes:
On Friday, 4 February 2022 at 22:11:10 UTC, Steven Schveighoffer 
wrote:
 I don't think anyone is arguing that the result of the 
 operation should be truncated to a byte, even if assigned to an 
 int.
And yet, that's exactly what happens if you use `int` and `long`:

    int a = int.max;
    long b = a + 1;
    writeln(b > 0); // false

I think there are reasonable arguments to be made on both sides, but having both behaviors in the same language is a bit of a mess, don't you think?
Feb 04 2022
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 2/4/22 5:22 PM, Paul Backus wrote:
 On Friday, 4 February 2022 at 22:11:10 UTC, Steven Schveighoffer wrote:
 I don't think anyone is arguing that the result of the operation 
 should be truncated to a byte, even if assigned to an int.
And yet, that's exactly what happens if you use `int` and `long`:     int a = int.max;     long b = a + 1;     writeln(b > 0); // false I think there are reasonable arguments to be made on both sides, but having both behaviors in the same language is a bit of a mess, don't you think?
Yes, I would prefer for the compiler to determine if the overflow is needed, and generate appropriate instructions based on that. The reason this doesn't come up often for int -> long is because a) you aren't usually converting an int-only operation to a long, and b) overflow of an int is rare. -Steve
Feb 04 2022
prev sibling next sibling parent reply Elronnd <elronnd elronnd.net> writes:
On Friday, 4 February 2022 at 21:13:10 UTC, Walter Bright wrote:
 It's slower, too.
Not anymore. And div can be faster on smaller integers.
 You're paying a 3 size byte penalty for using short arithmetic 
 rather than int arithmetic.
1. You are very careful to demonstrate short arithmetic, not byte arithmetic, which is the same size as int arithmetic on x86. 2. Cycle-counting (or byte-counting) is not a sensible approach to language design. It is relevant to language implementation, maybe; and whole-program performance may be relevant to language design; but these sorts of changes are marginal and should not get in the way of correct semantics. 3. Your code example actually does exactly what you suggest--using short arithmetic for storage. It just happens that in this case using short calculations rather than int calculations yields the same result and smaller code. 4. (continued from 3) in a larger, more interesting expression, regardless of language semantics, the compiler will generally be free to use ints for intermediates.
Feb 04 2022
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/4/2022 2:15 PM, Elronnd wrote:
 On Friday, 4 February 2022 at 21:13:10 UTC, Walter Bright wrote:
 It's slower, too.
Not anymore.  And div can be faster on smaller integers.
The code size penalty is still there.
 You're paying a 3 size byte penalty for using short arithmetic rather than int 
 arithmetic.
1. You are very careful to demonstrate short arithmetic, not byte arithmetic, which is the same size as int arithmetic on x86.
The penalty for byte arithmetic is the shortage of registers. Even so, if you're talking about a general solution, not treating bytes differently from shorts, then I only need mention the shorts. Also, implying I have nefarious motives here is not called for.
 2. Cycle-counting (or byte-counting) is not a sensible approach to language 
 design.  It is relevant to language implementation, maybe; and whole-program 
 performance may be relevant to language design; but these sorts of changes are 
 marginal and should not get in the way of correct semantics.
That's fine unless you're using a systems programming language, where the customers expect performance. Remember the recent deal with the x87 where dmd would keep the extra precision around, to avoid the double rounding problem? I propagated this to dmc, and it cost me a design win. The customer benchmarked it on 'float' arithmetic, and pronounced dmc 10% slower. The double rounding issue did not interest him.
 3. Your code example actually does exactly what you suggest--using short 
 arithmetic for storage.
The load instructions still use the extra operand size override bytes.
 It just happens that in this case using short 
 calculations rather than int calculations yields the same result and smaller
code.
It's not "just happens". Every short load will incur an extra byte. I compiled it with gcc -O, too, just so nobody will accuse me of sabotaging the result with dmd.
 4. (continued from 3) in a larger, more interesting expression, regardless of 
 language semantics, the compiler will generally be free to use ints for 
 intermediates.
If it does, then you'll have other truncation problems depending on how the optimization of the expression plays out. Unless you went the x87 route and slowed everything down by truncating every subexpression to short. Seriously, I've been around the block with this for 40 years. There are no magic solutions. The obvious solutions all simply have other problems. The integral promotion rules really are the most practical solution. It's best to simply spend a few moments learning them, and you'll be fine.
Feb 04 2022
next sibling parent reply Elronnd <elronnd elronnd.net> writes:
On Friday, 4 February 2022 at 23:43:28 UTC, Walter Bright wrote:
 The penalty for byte arithmetic is the shortage of registers.
On 64-bit, there are as many byte registers as word registers. (More, technically, but the high-half registers should be avoided at all costs.)
 Implying I have nefarious motives here is not called for.
Yes. My bad.
 these sorts of changes are marginal and should not get in the 
 way of correct semantics.
That's fine unless you're using a systems programming language, where the customers expect performance.
If a customer wants int ops to be generated, they can use ints. There is nothing preventing them from doing this, as has been pointed out else-thread.
 3. Your code example actually does exactly what you 
 suggest--using short arithmetic for storage.
The load instructions still use the extra operand size override bytes.
I do not follow. Your post said:
 Generally speaking, int should be used for most calculations, 
 short and byte for storage.
How am I to store shorts without an operand-size override prefix?
 It just happens that in this case using short calculations 
 rather than int calculations yields the same result and 
 smaller code.
It's not "just happens". Every short load will incur an extra byte. I compiled it with gcc -O, too, just so nobody will accuse me of sabotaging the result with dmd.
In this case I was referring to the multiply. It was possible to load the second register, perform a 32-bit multiply, and then store the truncated result. In a different context, this might have been worthwhile.
 4. (continued from 3) in a larger, more interesting 
 expression, regardless of language semantics, the compiler 
 will generally be free to use ints for intermediates.
If it does, then you'll have other truncation problems depending on how the optimization of the expression plays out. Unless you went the x87 route and slowed everything down by truncating every subexpression to short.
Example: ubyte x,y,z,w; w = x + y + z. (((x + y) mod 2^32 mod 2^8) + z) mod 2^32 mod 2^8 is the same as (((x + y) mod 2^32) + z) mod 2^32 mod 2^8. The mod 2^32 are implicit in the use of 32-bit registers; the mod 2^8 are explicit truncation. The former form, with two explicit truncations, can be rewritten as the latter form, getting rid of the intermediate truncation, giving the exact same result as with promotion.
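For anyone who wants to convince themselves, a brute-force check of that equivalence:

```d
// Truncating to ubyte after every addition gives the same final result as
// adding at int width and truncating once at the end.
void main()
{
    foreach (x; 0 .. 256)
    foreach (y; 0 .. 256)
    foreach (z; 0 .. 256)
    {
        ubyte stepwise = cast(ubyte)(cast(ubyte)(x + y) + z);
        ubyte once     = cast(ubyte)(x + y + z);
        assert(stepwise == once);
    }
}
```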
Feb 04 2022
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/4/2022 3:55 PM, Elronnd wrote:
 On Friday, 4 February 2022 at 23:43:28 UTC, Walter Bright wrote:
 The penalty for byte arithmetic is the shortage of registers.
On 64-bit, there are as many byte registers as word registers. (More, technically, but the high-half registers should be avoided at all costs.)
Access to those byte registers requires an additional REX byte.
 That's fine unless you're using a systems programming language, where the 
 customers expect performance.
If a customer wants int ops to be generated, they can use ints. There is nothing preventing them from doing this, as has been pointed out else-thread.
Actually, they'd need to insert casts to int for subexpressions. This is not going to be appealing.
 3. Your code example actually does exactly what you suggest--using short 
 arithmetic for storage.
The load instructions still use the extra operand size override bytes.
I do not follow.  Your post said:
 Generally speaking, int should be used for most calculations, short and byte 
 for storage.
How am I to store shorts without an operand-size override prefix?
Consider more complex expressions than load and store.
 4. (continued from 3) in a larger, more interesting expression, regardless of 
 language semantics, the compiler will generally be free to use ints for 
 intermediates.
If it does, then you'll have other truncation problems depending on how the optimization of the expression plays out. Unless you went the x87 route and slowed everything down by truncating every subexpression to short.
Example: ubyte x,y,z,w; w = x + y + z. (((x + y) mod 2^32 mod 2^8) + z) mod 2^32 mod 2^8 is the same as (((x + y) mod 2^32) + z) mod 2^32 mod 2^8.  The mod 2^32 are implicit in the use of 32-bit registers; the mod 2^8 are explicit truncation.  The former form, with two explicit truncations, can be rewritten as the latter form, getting rid of the intermediate truncation, giving the exact same result as with promotion.
Consider: byte a, b; int d = a + b; You're going to get surprising results with your proposal.
Feb 04 2022
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 05.02.22 00:43, Walter Bright wrote:
 
 That's fine unless you're using a systems programming language, where 
 the customers expect performance.
 
 Remember the recent deal with the x87 where dmd would keep the extra
 precision around, to avoid the double rounding problem?
That does not avoid problems, it's just confusing to users and it will introduce new bugs. It's not even a cure for double-rounding issues! It may even have the opposite effect!
 I propagated 
 this to dmc, and it cost me a design win. The customer benchmarked it on 
 'float' arithmetic, and pronounced dmc 10% slower. The double rounding 
 issue did not interest him.
Sure, it stands to reason that people who are not careful with their floating-point implementations actually do not care. And the weird extra precision is extremely annoying for those that are careful. Less performance, less reproducibility, randomly introducing double-rounding issues in code that would be correct if it did not insist on keeping around more precision in hard-to-predict, implementation defined cases that are not even properly specced out, in exchange for sometimes randomly hiding issues in badly written code. No, thanks. This is terrible! I get that the entire x87 design is pretty bad and so there are trade-offs, but as it has now been deprecated, I hope this kind of second-guessing will become a thing of the past entirely. In the meantime, I will avoid using DMD for anything that requires floating-point arithmetic.
Feb 05 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/5/2022 6:54 AM, Timon Gehr wrote:
 I get that the entire x87 design is pretty bad and so there are trade-offs,
but 
 as it has now been deprecated, I hope this kind of second-guessing will become
a 
 thing of the past entirely. In the meantime, I will avoid using DMD for
anything 
 that requires floating-point arithmetic.
I'm not sure how you concluded that. DMD now rounds float calculations to float with the x87, despite the cost in speed. If the CPU has SIMD float instructions on it, that is used instead of the x87, just like what every other compiler does.
Feb 05 2022
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 05.02.22 23:01, Walter Bright wrote:
 On 2/5/2022 6:54 AM, Timon Gehr wrote:
 I get that the entire x87 design is pretty bad and so there are 
 trade-offs, but as it has now been deprecated, I hope this kind of 
 second-guessing will become a thing of the past entirely. In the 
 meantime, I will avoid using DMD for anything that requires 
 floating-point arithmetic.
I'm not sure how you concluded that.
Maybe my information is outdated. (This has come up many times in the past, and you have traditionally argued in favor of not respecting the specified precision.)
 DMD now rounds float calculations 
 to float with the x87, despite the cost in speed.
 ...
That's great news, but the opposite is still in the spec: https://dlang.org/spec/float.html In any case, AFAIK CTFE still relies on this leeway (in all compilers, as it's a frontend feature).
 If the CPU has SIMD float instructions on it, that is used instead of 
 the x87, just like what every other compiler does.
My current understanding is that this can change at any point in time without it being considered a breaking change, and that DMD is more likely to do this than LDC.
Feb 05 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/5/2022 2:52 PM, Timon Gehr wrote:
 On 05.02.22 23:01, Walter Bright wrote:
 On 2/5/2022 6:54 AM, Timon Gehr wrote:
 I get that the entire x87 design is pretty bad and so there are trade-offs, 
 but as it has now been deprecated, I hope this kind of second-guessing will 
 become a thing of the past entirely. In the meantime, I will avoid using DMD 
 for anything that requires floating-point arithmetic.
I'm not sure how you concluded that.
Maybe my information is outdated. (This has come up many times in the past, and you have traditionally argued in favor of not respecting the specified precision.)
 DMD now rounds float calculations to float with the x87, despite the cost in 
 speed.
 ...
That's great news, but the opposite is still in the spec: https://dlang.org/spec/float.html
That'll be fixed.
 In any case, AFAIK CTFE still relies on this leeway (in all compilers, as it's
a 
 frontend feature).
I don't think it does, but I'll have to check.
 If the CPU has SIMD float instructions on it, that is used instead of the x87, 
 just like what every other compiler does.
My current understanding is that this can change at any point in time without it being considered a breaking change, and that DMD is more likely to do this than LDC.
Highly unlikely. (Neither the C nor the C++ standards require this behavior, either, AFAIK, so you shouldn't use any other compilers, either.)
Feb 05 2022
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
https://issues.dlang.org/show_bug.cgi?id=22740
Feb 05 2022
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 06.02.22 00:16, Walter Bright wrote:
 https://issues.dlang.org/show_bug.cgi?id=22740
Thanks! One place where this has now actually bit me is this calculation (DMD on linux):

```d
void main(){
    import std.stdio;
    assert(42*6==252);
    // constant folding, uses extended precision, overall less accurate result due to double rounding:
    assert(cast(int)(4.2*60)==251);
    // no constant folding, uses double precision, overall more accurate result
    double x=4.2;
    assert(cast(int)(x*60)==252);
}
```

4.2 and 60 were named constants and the program would have worked fine with a result of either 251 or 252; I did not rely on the result being a specific one of those. However, because the result was sometimes 251 and at other times 252, this resulted in a hard to track down bug caused by the inconsistency. I even got one result on Windows and the other one on linux when compiling _exactly the same expression_. This was with LDC though, not sure if the platform dependency is reproducible with DMD. Note that this was relatively recently, but I had seen this coming for a long time before it actually happened to me, which is why I had consistently argued so vehemently against this kind of precision "enhancement".
Feb 05 2022
parent Walter Bright <newshound2 digitalmars.com> writes:
Unfortunately, I ran into a stumbling block. The current 80 bit "emulation"
done 
for Microsoft compatibility doesn't support conversion of 80 bits to float or 
double.

I've been considering for a while writing my own 80 bit emulator to resolve these
problems once and for all, but it's a bit of a project. It's not that hard, it 
just takes some careful attention to detail.

A search online showed no Boost compatible emulators, which kinda surprised me. 
After all these years, you'd think there would be one.
Feb 05 2022
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 06.02.22 00:04, Walter Bright wrote:
 ...
 My current understanding is that this can change at any point in time 
 without it being considered a breaking change, and that DMD is more 
 likely to do this than LDC.
Highly unlikely.
Great!
 (Neither the C nor the C++ standards require this 
 behavior, either, AFAIK, so you shouldn't use any other compilers, either.)
In practice, the story is a bit more complicated than this. Besides the C and C++ standards, there is also IEEE 754 and common practice, in particular 32/64 bit IEEE 754. Compilers implement multiple standards, at least with a suitable set of flags, and they explicitly document the guarantees one can expect with each set of flags.
Feb 05 2022
prev sibling parent Max Samukha <maxsamukha gmail.com> writes:
On Friday, 4 February 2022 at 22:15:37 UTC, Elronnd wrote:
 On Friday, 4 February 2022 at 21:13:10 UTC, Walter Bright wrote:
 It's slower, too.
Not anymore. And div can be faster on smaller integers.
 You're paying a 3 size byte penalty for using short arithmetic 
 rather than int arithmetic.
1. You are very careful to demonstrate short arithmetic, not byte arithmetic, which is the same size as int arithmetic on x86.
Interestingly, for bytes, the code is even smaller (ldc):

0000000000000000 <_D4main5testbFPhQcQeZv>:
   0: 8a 02           mov    (%rdx),%al
   2: 41 f6 20        mulb   (%r8)
   5: 88 01           mov    %al,(%rcx)
   7: c3              ret

0000000000000000 <_D4main5testiFPiQcQeZv>:
   0: 8b 02           mov    (%rdx),%eax
   2: 41 0f af 00     imul   (%r8),%eax
   6: 89 01           mov    %eax,(%rcx)
   8: c3              ret

Also, there is no difference in size for ARM64:

testb:
    ldrb w0, [x0]
    ldrb w1, [x1]
    mul w0, w0, w1
    strb w0, [x2]
    ret

tests:
    ldrh w0, [x0]
    ldrh w1, [x1]
    mul w0, w0, w1
    strh w0, [x2]
    ret

testi:
    ldr w0, [x0]
    ldr w1, [x1]
    mul w0, w0, w1
    str w0, [x2]
    ret
Feb 05 2022
prev sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Friday, 4 February 2022 at 21:13:10 UTC, Walter Bright wrote:
 The integral promotion rules came about because of how the 
 PDP-11 instruction set worked, as C was developed on an -11. 
 But this has carried over into modern CPUs. Consider:

 ```
 void tests(short* a, short* b, short* c) { *c = *a * *b; }
         0F B7 07                movzx   EAX,word ptr [RDI]
 66      0F AF 06                imul    AX,[RSI]
 66      89 02                   mov     [RDX],AX
         C3                      ret

 void testi(int* a, int* b, int* c) { *c = *a * *b; }
         8B 07                   mov     EAX,[RDI]
         0F AF 06                imul    EAX,[RSI]
         89 02                   mov     [RDX],EAX
         C3                      ret
 ```
 You're paying a 3 size byte penalty for using short arithmetic 
 rather than int arithmetic. It's slower, too.
Larger code size is surely more stressful for the instruction cache, but the slowdown caused by this is most likely barely measurable on modern processors.
 Generally speaking, int should be used for most calculations, 
 short and byte for storage.

 (Modern CPUs have long been deliberately optimized and tuned 
 for C semantics.)
I generally agree, but this is only valid for the regular scalar code. Autovectorizable code taking advantage of SIMD instructions looks a bit different. Consider:

void tests(short* a, short* b, short* c, int n) { while (n--) *c++ = *a++ * *b++; }

<...>
  50: f3 0f 6f 04 07    movdqu (%rdi,%rax,1),%xmm0
  55: f3 0f 6f 0c 06    movdqu (%rsi,%rax,1),%xmm1
  5a: 66 0f d5 c1       pmullw %xmm1,%xmm0
  5e: 0f 11 04 02       movups %xmm0,(%rdx,%rax,1)
  62: 48 83 c0 10       add    $0x10,%rax
  66: 4c 39 c0          cmp    %r8,%rax
  69: 75 e5             jne    50 <tests+0x50>
<...>

7 instructions, which are doing 8 multiplications per inner loop iteration.

void testi(int* a, int* b, int* c, int n) { while (n--) *c++ = *a++ * *b++; }

<...>
 188: f3 0f 6f 04 07    movdqu (%rdi,%rax,1),%xmm0
 18d: f3 0f 6f 0c 06    movdqu (%rsi,%rax,1),%xmm1
 192: 66 0f 38 40 c1    pmulld %xmm1,%xmm0
 197: 0f 11 04 02       movups %xmm0,(%rdx,%rax,1)
 19b: 48 83 c0 10       add    $0x10,%rax
 19f: 4c 39 c0          cmp    %r8,%rax
 1a2: 75 e4             jne    188 <testi+0x48>
<...>

7 instructions, which are doing 4 multiplications per inner loop iteration.

The code size increases really a lot, because there are large prologue and epilogue parts before and after the inner loop. But the performance improves really a lot when processing large arrays. And the 16-bit version is roughly twice as fast as the 32-bit version (because each 128-bit XMM register represents either 8 shorts or 4 ints).

If we want D language to be SIMD friendly, then discouraging the use of `short` and `byte` types for local variables isn't the best idea.
Feb 04 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/4/2022 2:18 PM, Siarhei Siamashka wrote:
 If we want D language to be SIMD friendly, then discouraging the use of
`short` 
 and `byte` types for local variables isn't the best idea.
SIMD is its own world, and why D has vector types as a core language feature. I never had much faith in autovectorization.
Feb 04 2022
parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Friday, 4 February 2022 at 23:45:31 UTC, Walter Bright wrote:
 On 2/4/2022 2:18 PM, Siarhei Siamashka wrote:
 If we want D language to be SIMD friendly, then discouraging 
 the use of `short` and `byte` types for local variables isn't 
 the best idea.
SIMD is its own world, and why D has vector types as a core language feature. I never had much faith in autovectorization.
I don't have much faith in autovectorization quality either, but this feature is provided for free by GCC and LLVM backends. And right now excessively paranoid errors about byte/short variables coerce the users into one of these two unattractive alternatives:

 * litter the code with ugly casts
 * change types of temporary variables to ints and waste some vectorization opportunities

When the signal/noise ratio is bad, then it's natural that the users start ignoring error messages. Beginners are effectively trained to apply casts without thinking just to shut up the annoying compiler and it leads to situations like this: https://forum.dlang.org/thread/uqeobimtzhuyhvjpvkvz forum.dlang.org

I see VRP as just a band-aid, which helps very little, but causes a lot of inconveniences. My suggestion:

  1. Implement `wrapping_add`, `wrapping_sub`, `wrapping_mul` intrinsics similar to Rust, this is easy and costs nothing.
  2. Implement an experimental `-ftrapv` option in one of the D compilers (most likely GDC or LDC) to catch both signed and unsigned overflows at runtime. Or maybe add function attributes to enable/disable this functionality with a more fine grained control. Yes, I know that this violates the current D language spec, which requires two's complement wraparound for everything, but it doesn't matter for a fancy experimental option.
  3. Run some tests with `-ftrapv` and check how many arithmetic overflows are actually triggered in Phobos. Replace the affected arithmetic operators with intrinsics if the wrapping behavior is actually intended.
  4. In the long run consider updating the language spec.

Benefits: even if `-ftrapv` turns out to have a high overhead, this would still become a useful tool for testing arithmetic overflow safety in applications. Having something is better than having nothing.
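For point 1, a rough sketch of what those names could look like as plain library code (the wrapping_* names are Rust's, nothing like this is in Phobos today; the overflow-detecting side already exists in core.checkedint):

```d
// Hypothetical library spellings. Wrapping is what D's 2's complement
// arithmetic already does, so wrapping_add mostly documents intent; the
// checked variant leans on the existing core.checkedint helper.
import core.checkedint : adds;

T wrapping_add(T)(T a, T b)
{
    return cast(T)(a + b);        // wraparound is explicitly wanted here
}

int checked_add(int a, int b)
{
    bool overflow;
    int r = adds(a, b, overflow); // sets the flag instead of silently wrapping
    assert(!overflow, "signed overflow in checked_add");
    return r;
}

unittest
{
    assert(wrapping_add(int.max, 1) == int.min); // documented, intentional wrap
    assert(checked_add(2, 3) == 5);
}
```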
Feb 04 2022
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/4/2022 6:35 PM, Siarhei Siamashka wrote:
 On Friday, 4 February 2022 at 23:45:31 UTC, Walter Bright wrote:
 On 2/4/2022 2:18 PM, Siarhei Siamashka wrote:
 If we want D language to be SIMD friendly, then discouraging the use of 
 `short` and `byte` types for local variables isn't the best idea.
SIMD is its own world, and why D has vector types as a core language feature. I never had much faith in autovectorization.
I don't have much faith in autovectorization quality either, but this feature is provided by free by GCC and LLVM backends. And right now excessively paranoid errors about byte/short variables coerce the users into one of these two unattractive alternatives:  * litter the code with ugly casts  * change types of temporary variables to ints and waste some vectorization opportunities
Generally one should use the vector types rather than relying on autovectorization. One of the problems with autovectorization is never knowing that some minor change you made prevented vectorizing.
 When the signal/noise ratio is bad, then it's natural that the users start 
 ignoring error messages. Beginners are effectively trained to apply casts 
 without thinking just to shut up the annoying compiler and it leads to 
 situations like this: 
 https://forum.dlang.org/thread/uqeobimtzhuyhvjpvkvz forum.dlang.org
That has nothing to do with integers.
 Is see VRP as just a band-aid, which helps very little, but causes a lot of 
 inconveniences.
Certainly allowing implicit conversions of ints to shorts is *convenient*. But you cannot have that *and* safe integer math. As I mentioned repeatedly, there is no solution that is fast, convenient, and doesn't hide mistakes.
 My suggestion:
 
   1. Implement `wrapping_add`, `wrapping_sub`, `wrapping_mul` intrinsics
similar 
 to Rust, this is easy and costs nothing.
   2. Implement an experimental `-ftrapv` option in one of the D compilers
(most 
 likely GDC or LDC) to catch both signed and unsigned overflows at runtime. Or 
 maybe add function attributes to enable/disable this functionality with a more 
 fine grained control. Yes, I know that this violates the current D language 
 spec, which requires two's complement wraparound for everything, but it
doesn't 
 matter for a fancy experimental option.
   3. Run some tests with `-ftrapv` and check how many arithmetic overflows
are 
 actually triggered in Phobos. Replace the affected arithmetic operators with 
 intrinsics if the wrapping behavior is actually intended.
   4. In the long run consider updating the language spec.
 
 Benefits: even if `-ftrapv` turns out to have a high overhead, this would
still 
 become a useful tool for testing arithmetic overflows safety in applications. 
 Having something is better than having nothing.
Feb 05 2022
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/4/2022 6:35 PM, Siarhei Siamashka wrote:
 My suggestion:
 
   1. Implement `wrapping_add`, `wrapping_sub`, `wrapping_mul` intrinsics
similar 
 to Rust, this is easy and costs nothing.
   2. Implement an experimental `-ftrapv` option in one of the D compilers
(most 
 likely GDC or LDC) to catch both signed and unsigned overflows at runtime. Or 
 maybe add function attributes to enable/disable this functionality with a more 
 fine grained control. Yes, I know that this violates the current D language 
 spec, which requires two's complement wraparound for everything, but it
doesn't 
 matter for a fancy experimental option.
   3. Run some tests with `-ftrapv` and check how many arithmetic overflows
are 
 actually triggered in Phobos. Replace the affected arithmetic operators with 
 intrinsics if the wrapping behavior is actually intended.
   4. In the long run consider updating the language spec.
 
 Benefits: even if `-ftrapv` turns out to have a high overhead, this would
still 
 become a useful tool for testing arithmetic overflows safety in applications. 
 Having something is better than having nothing.
I recommend creating a DIP for it.
Feb 05 2022
parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Saturday, 5 February 2022 at 08:59:22 UTC, Walter Bright wrote:
 On 2/4/2022 6:35 PM, Siarhei Siamashka wrote:
 My suggestion:
 
   1. Implement `wrapping_add`, `wrapping_sub`, `wrapping_mul` 
 intrinsics similar to Rust, this is easy and costs nothing.
   2. Implement an experimental `-ftrapv` option in one of the 
 D compilers (most likely GDC or LDC) to catch both signed and 
 unsigned overflows at runtime. Or maybe add function 
 attributes to enable/disable this functionality with a more 
 fine grained control. Yes, I know that this violates the 
 current D language spec, which requires two's complement 
 wraparound for everything, but it doesn't matter for a fancy 
 experimental option.
   3. Run some tests with `-ftrapv` and check how many 
 arithmetic overflows are actually triggered in Phobos. Replace 
 the affected arithmetic operators with intrinsics if the 
 wrapping behavior is actually intended.
   4. In the long run consider updating the language spec.
 
 Benefits: even if `-ftrapv` turns out to have a high overhead, 
 this would still become a useful tool for testing arithmetic 
 overflows safety in applications. Having something is better 
 than having nothing.
I recommend creating a DIP for it.
Thanks for not outright rejecting it. This really means a lot! I'll look into the DIP submission process. Accidentally or not, it turns out that GDC already supports the `-ftrapv` option, which works with C/C++ semantics (traps for signed overflows, wraparound for unsigned overflows, types smaller than `int` are flying under the radar due to integral promotion). Now I need to experiment with it a little bit to check how it interacts with Phobos and the other D code in practice. Patching up GCC sources to test if unsigned overflows can also be trapped is going to be interesting too. But in general, this looks like a very promising feature. It can provide some protection against arithmetic overflow bugs for 32-bit and 64-bit calculations. And the practical implications of troubleshooting such arithmetic overflows in large and complicated software were one of my primary concerns about D language.
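For anyone who wants to repeat the experiment, a tiny test case of the behavior described above (built without optimization; the failure mode under `-ftrapv` is a runtime abort):

```d
// overflow.d -- compile with and without `gdc -ftrapv` and compare.
void main()
{
    int i = int.max;
    i += 1;                     // signed overflow: aborts under -ftrapv, wraps otherwise

    short s = short.max;
    s = cast(short)(s + 1);     // promoted to int first, so -ftrapv never sees an overflow here
    assert(s == short.min);     // the cast wraps it back down
}
```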
Feb 05 2022
prev sibling parent reply Mark <smarksc gmail.com> writes:
On Friday, 4 February 2022 at 04:28:37 UTC, Walter Bright wrote:
 There's really no fix for that other than making the effort to 
 understand 2s-complement. Some noble attempts:

 Java: disallowed all unsigned types. Wound up having to add 
 that back in as a hack.
How many people actually use (and need) unsigned integers? If 99% of users don't need them, that's a good case for relegating them to a library type. This wasn't possible in Java because it doesn't support operator overloading, without which dealing with such types would have been quite annoying.
Feb 04 2022
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Feb 04, 2022 at 08:50:35PM +0000, Mark via Digitalmars-d wrote:
 On Friday, 4 February 2022 at 04:28:37 UTC, Walter Bright wrote:
 There's really no fix for that other than making the effort to
 understand 2s-complement. Some noble attempts:
 
 Java: disallowed all unsigned types. Wound up having to add that
 back in as a hack.
How many people actually use (and need) unsigned integers?
I do. They are very useful in APIs where I expect only positive values. Marking the parameter type as uint makes it clear exactly what's expected, instead of using circumlocutions like taking int with an in-contract that x>=0. Also, when you're dealing with bitmasks, you WANT unsigned types. Using signed types for that will cause values to get munged by unwanted sign extensions, and in general just cause grief and needless complexity where an unsigned type would be completely straightforward. Also, for a systems programming language unsigned types are necessary, because they are a closer reflection of the reality at the hardware level.
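To make the sign-extension point concrete:

```d
// With a signed type, >> drags copies of the sign bit in and silently smears
// a bitmask; an unsigned type (or D's >>>) shifts in zeroes.
void main()
{
    uint u = 0xF000_0000;
    int  s = cast(int)u;

    assert(u >> 28 == 0xF);  // unsigned: zeroes come in from the left
    assert(s >> 28 == -1);   // signed: the top nibble becomes 0xFFFFFFFF
    assert(s >>> 28 == 0xF); // >>> forces the logical shift even on a signed int
}
```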
 If 99% of users don't need them, that's a good case for relegating
 them to a library type.  This wasn't possible in Java because it
 doesn't support operator overloading, without which dealing with such
 types would have been quite annoying.
Needing a library type for manipulating bitmasks would make D an utter joke of a systems programming language. T -- First Rule of History: History doesn't repeat itself -- historians merely repeat each other.
Feb 04 2022
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Amusingly, how signed division is done when a signed divide instruction is not 
available is to save the signs of the operands, negate them to unsigned, do the 
unsigned divide, then negate the result according to the original signs.

Unsigned operations are the core of how CPUs work, the signed computations are 
another layer on top of that.
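In code, the trick looks roughly like this (an illustrative sketch, not actual compiler output):

```d
// Build a signed divide out of an unsigned divide: strip the signs, divide
// the magnitudes, then negate the quotient if the operand signs differed.
int signedDiv(int a, int b)
{
    bool negate = (a < 0) != (b < 0);
    uint ua = cast(uint)a;
    uint ub = cast(uint)b;
    if (a < 0) ua = ~ua + 1;      // 2s-complement negation gives the magnitude
    if (b < 0) ub = ~ub + 1;
    uint uq = ua / ub;            // the unsigned divide
    return negate ? cast(int)(~uq + 1) : cast(int)uq;
}

unittest
{
    assert(signedDiv(7, -2) == -3);
    assert(signedDiv(-7, -2) == 3);
}
```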
Feb 04 2022
prev sibling parent reply Mark <smarksc gmail.com> writes:
On Friday, 4 February 2022 at 21:27:44 UTC, H. S. Teoh wrote:
 On Fri, Feb 04, 2022 at 08:50:35PM +0000, Mark via 
 Digitalmars-d wrote:
 On Friday, 4 February 2022 at 04:28:37 UTC, Walter Bright 
 wrote:
 There's really no fix for that other than making the effort 
 to understand 2s-complement. Some noble attempts:
 
 Java: disallowed all unsigned types. Wound up having to add 
 that back in as a hack.
How many people actually use (and need) unsigned integers?
I do. They are very useful in APIs where I expect only positive values. Marking the parameter type as uint makes it clear exactly what's expected, instead of using circumlocutions like taking int with an in-contract that x>=0. Also, when you're dealing with bitmasks, you WANT unsigned types. Using signed types for that will cause values to get munged by unwanted sign extensions, and in general just cause grief and needless complexity where an unsigned type would be completely straightforward. Also, for a systems programming language unsigned types are necessary, because they are a closer reflection of the reality at the hardware level.
 If 99% of users don't need them, that's a good case for 
 relegating them to a library type.  This wasn't possible in 
 Java because it doesn't support operator overloading, without 
 which dealing with such types would have been quite annoying.
Needing a library type for manipulating bitmasks would make D an utter joke of a systems programming language. T
I should have phrased my question as "how many people outside systems programming...", as this is what I had in mind (I mostly write high-level code, though I don't know if I'm the typical D user). But since D is proudly general-purpose I admit that this question is moot. Regarding positive values, AFAIK unsigned ints aren't suitable for this because you still want to do ordinary arithmetic on positive integers, not modular arithmetic. Runtime checks are unavoidable because even mundane operations such as `--x` can potentially escape the domain of positive integers. Also, I don't think being a library type is a mark of shame. Depending on the language, they can be just as useful and almost as convenient as built-in types. C++'s std::byte was mentioned on this thread - it's a library type.
Feb 05 2022
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Saturday, 5 February 2022 at 10:11:48 UTC, Mark wrote:
 Also, I don't think being a library type is a mark of shame. 
 Depending on the language, they can be just as useful and 
 almost as convenient as built-in types. C++'s std::byte was 
 mentioned on this thread - it's a library type.
Is it a library type though? I am not sure there is a clear distinction between language and library in C++. So you can have "library features" that are implemented using intrinsics, which might make them language features if they cannot be done within the language in a portable fashion. It is kinda hard to tell sometimes, maybe the huge spec makes it more clear, but it is at least not obvious to me as a programmer.
Feb 05 2022
prev sibling next sibling parent reply Adam D Ruppe <destructionator gmail.com> writes:
On Thursday, 3 February 2022 at 05:50:24 UTC, Walter Bright wrote:
 VRP makes many implicit conversions to bytes safely possible.
It also *causes* bugs. When code gets refactored, and the types change, those forced casts may not be doing what is desired, and can do things like unexpectedly truncating integer values.
Feb 03 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/3/2022 8:35 AM, Adam D Ruppe wrote:
 On Thursday, 3 February 2022 at 05:50:24 UTC, Walter Bright wrote:
 VRP makes many implicit conversions to bytes safely possible.
It also *causes* bugs. When code gets refactored, and the types change, those forced casts may not be doing what is desired, and can do things like unexpectedly truncating integer values.
No, then the VRP will emit an error.
Feb 03 2022
parent reply Adam D Ruppe <destructionator gmail.com> writes:
On Friday, 4 February 2022 at 04:29:21 UTC, Walter Bright wrote:
 No, then the VRP will emit an error.
No, because you casted it away. Consider the old code being:

---
struct Thing {
    short a;
}

// somewhere very different
Thing calculate(int a, int b) {
    return Thing(a + b);
}
---

The current rules would require that you put an explicit cast in that constructor call. Then, later, Thing gets refactored into `int`. It will still compile, with the explicit cast still there, now chopping off bits.

The problem with anything requiring explicit casts is once they're written, they rarely get unwritten. I tell new users that `cast` is a code smell - sometimes you need it, but it is usually an indication that you're doing something wrong. But then you do:

    short a;
    short b = a + 1;

And suddenly the language requires one. Yes, I know, there's a carry bit that might get truncated. But when you're using all `short`, there's probably an understanding that this is how it works. It's not really that hard - it's about two or three sentences. As long as one understands 2s-complement arithmetic.

On the other hand, there might be loss if there's an integer in there in some kinds of generic code. I think a reasonable compromise would be to allow implicit conversions down to the biggest type of the input. The VRP can apply here on any literals present. Meaning:

    short a;
    short b = a + 1;

It checks the input:

    a = type short
    1 = VRP'd down to byte (or bool even)

Biggest type there? short. So it allows implicit conversion down to short. Then VRP can run to further make it smaller:

    byte c = (a&0x7e) + 1; // ok

the VRP can see it still fits there, so it goes even smaller. But since the biggest original input fits in a `short`, it allows the output to go to `short`, even if there's a carry bit it might lose.

On the other hand:

    ushort b = a + 65535 + 3;

Nope, the compiler can constant fold that literal and VRP will size it to `int` given its value, so explicit cast required there to ensure none of the *actual* input is lost.

    short a;
    short b;
    short c = a * b;

I'd allow that. The input is a and b, they're both short, so let the output truncate back to short implicitly too. Just like with int, there's some understanding that yes, there is a high word produced by the multiply, but it might not fit and I don't need the compiler nagging me like I'm some kind of ignoramus.

This compromise I think would balance the legitimate safety concerns with accidental loss or refactoring changing things (if you refactor to ints, now the input type grows and the compiler can issue an error again) with the annoying casts almost everywhere. And by removing most of the casts, it makes the ones that remain stand out more as the potential problems they are.
Feb 04 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/4/2022 6:12 AM, Adam D Ruppe wrote:
 I'd allow that. The input is a and b, they're both short, so let the output 
 truncate back to short implicitly too. Just like with int, there's some 
 understanding that yes, there is a high word produced by the multiply, but it 
 might not fit and I don't need the compiler nagging me like I'm some kind of 
 ignoramus.
As I observed before, there is no solution. Just different problems. It's best to stick with a scheme that has well-understood issues and works best with the common CPU architectures.
Feb 04 2022
next sibling parent reply Adam Ruppe <destructionator gmail.com> writes:
On Friday, 4 February 2022 at 23:25:14 UTC, Walter Bright wrote:
 On 2/4/2022 6:12 AM, Adam D Ruppe wrote:
 I'd allow that. The input is a and b, they're both short, so 
 let the output truncate back to short implicitly too. Just 
 like with int, there's some understanding that yes, there is a 
 high word produced by the multiply, but it might not fit and I 
 don't need the compiler nagging me like I'm some kind of 
 ignoramus.
As I observed before, there is no solution. Just different problems. It's best to stick with a scheme that has well-understood issues and works best with the common CPU architectures.
I don't think you understand my proposal, which is closer to C's existing rules than D is now.
Feb 04 2022
parent reply Adam Ruppe <destructionator gmail.com> writes:
On Friday, 4 February 2022 at 23:36:11 UTC, Adam Ruppe wrote:
 I don't think you understand my proposal, which is closer to 
 C's existing rules than D is now.
To reiterate:

C's rule: int promote, DO allow narrowing implicit conversion.

D's rule: int promote, do NOT allow narrowing implicit conversion unless VRP passes.

My proposed rule: int promote, do NOT allow narrowing implicit conversion unless VRP passes OR the requested conversion is the same as the largest input type (with literals excluded unless their value is obviously out of range).

So there's no change to the actual calculation. Just loosening D's currently strict implicit conversion rule back to something closer to C's permissive standard. There'd be zero changes to codegen. No modification of intermediate values. It just allows implicit conversions back to the input *just like C does*.
Feb 04 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/4/2022 3:51 PM, Adam Ruppe wrote:
 On Friday, 4 February 2022 at 23:36:11 UTC, Adam Ruppe wrote:
 I don't think you understand my proposal, which is closer to C's existing 
 rules than D is now.
To reiterate: C's rule: int promote, DO allow narrowing implicit conversion. D's rule: int promote, do NOT allow narrowing implicit conversion unless VRP passes. My proposed rule: int promote, do NOT allow narrowing implicit conversion unless VRP passes OR the requested conversion is the same as the largest input type (with literals excluded unless their value is obviously out of range).
We considered that and chose not to go that route, on the grounds that we were trying to minimize invisible truncation. P.S. as a pragmatic programmer, I find very little use for shorts other than saving some space in a data structure. Using shorts as temporaries is a code smell.
Feb 04 2022
next sibling parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Saturday, 5 February 2022 at 02:43:27 UTC, Walter Bright wrote:
 P.S. as a pragmatic programmer, I find very little use for 
 shorts other than saving some space in a data structure. Using 
 shorts as temporaries is a code smell.
As a pragmatic programmer with hand-coded assembly optimizations experience and also familiar with SIMD compiler intrinsics, using shorts as temporaries in C code actually works great for prototyping/testing the behavior of a single 16-bit lane. As a bonus, autovectorizers in compilers may pick up something too. But tons of forced type casts is the actual code smell.
Feb 04 2022
prev sibling next sibling parent reply Adam D Ruppe <destructionator gmail.com> writes:
On Saturday, 5 February 2022 at 02:43:27 UTC, Walter Bright wrote:
 We considered that and chose not to go that route, on the 
 grounds that we were trying to minimize invisible truncation.
I know how D works. I know why it works that way. Hell, I implemented part of the VRP code in dmd myself and have explained it to who knows how many new users over the last 15 years. What I'm telling you is *it doesn't actually work*. These forced explicit casts rarely prevent real bugs and in exchange, they make the language significantly harder to use and create their own problems down the line. Loosening the rules would reduce the burden of the many, many, many false positives forcing harmful casts while keeping the spirit of the rule. It isn't just *invisible* truncation you want to minimize - it is *buggy* invisible truncation. You want the compiler (and the casts) to call out potentially buggy areas so when it cries wolf, you actually look for a wolf that's probably there.
Feb 04 2022
next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Saturday, 5 February 2022 at 03:48:15 UTC, Adam D Ruppe wrote:
 You want the compiler (and the casts) to call out potentially 
 buggy areas so when it cries wolf, you actually look for a wolf 
 that's probably there.
Well written code would use a narrowing cast with checks for debugging, but the type itself is less interesting, so it would be better with overloading on return type. But it could be the default if overflow checks were implemented:

    byte x = narrow(expression);

If it was the default, you could disable it instead:

    byte x = uncheck(expression);
Feb 04 2022
parent jmh530 <john.michael.hall gmail.com> writes:
On Saturday, 5 February 2022 at 07:59:21 UTC, Ola Fosheim Grøstad 
wrote:
 [snip]

 Well written code would use a narrowing cast with checks for 
 debugging, but the type itself is less interesting, so it would 
 be better with overloading on return type. But it could be the 
 default if overflow checks were implemented.

     byte x = narrow(expression);


 If it was the default, you could disable it instead:

    byte x = uncheck(expression);
In the meantime, the equivalent of the `narrow` function could get added to `std.conv`.
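A rough sketch of what that could look like, as a thin wrapper over the checked conversion `std.conv.to` already performs (the `narrow` name is hypothetical, and since D can't overload on the return type the target still has to be spelled out):

```d
import std.conv : to;

/// Hypothetical narrowing helper: forwards to std.conv.to, which throws
/// ConvOverflowException when the value does not fit the target type.
T narrow(T, S)(S value)
{
    return value.to!T;
}

unittest
{
    int i = 100;
    byte b = i.narrow!byte;       // ok, fits
    // i = -129; i.narrow!byte;   // would throw ConvOverflowException
}
```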
Feb 06 2022
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/4/2022 7:48 PM, Adam D Ruppe wrote:
 On Saturday, 5 February 2022 at 02:43:27 UTC, Walter Bright wrote:
 We considered that and chose not to go that route, on the grounds that we were 
 trying to minimize invisible truncation.
 I know how D works. I know why it works that way. Hell, I implemented part of the VRP code in dmd myself and have explained it to who knows how many new users over the last 15 years.

 What I'm telling you is *it doesn't actually work*. These forced explicit casts rarely prevent real bugs and in exchange, they make the language significantly harder to use and create their own problems down the line.

 Loosening the rules would reduce the burden of the many, many, many false positives forcing harmful casts while keeping the spirit of the rule. It isn't just *invisible* truncation you want to minimize - it is *buggy* invisible truncation. You want the compiler (and the casts) to call out potentially buggy areas so when it cries wolf, you actually look for a wolf that's probably there.
I use D all day every day, all the time, and I don't seem to be having these problems. I did a:

    grep -w cast *.d

across src/dmd/*.d, and found hardly any casts to short/ushort that would fall under the forced cast category you mentioned. Granted, maybe your style of coding is different.

Doing the same grep across phobos/std/*.d, rather little of which I have written, I found zero instances of forced casts to short/ushort.

As for "rarely", these kinds of bugs are indeed rare, but can be invisible yet significant. It's just the sort of thing we want to catch.
Feb 05 2022
prev sibling parent Dukc <ajieskola gmail.com> writes:
On Saturday, 5 February 2022 at 02:43:27 UTC, Walter Bright wrote:
 On 2/4/2022 3:51 PM, Adam Ruppe wrote:
 
 To reiterate:
 
 C's rule: int promote, DO allow narrowing implicit conversion.
 
 D's rule: int promote, do NOT allow narrowing implicit 
 conversion unless VRP passes.
 
 My proposed rule: int promote, do NOT allow narrowing implicit 
 conversion unless VRP passes OR the requested conversion is 
 the same as the largest input type (with literals excluded 
 unless their value is obviously out of range).
We considered that and chose not to go that route, on the grounds that we were trying to minimize invisible truncation.
I do like Adam's proposal as well. If you're adding two shorts together and assigning them back to a short, there isn't really any surprising truncation happening; it's more like just any integer overflow:

```d
int a = 0x6000_0000;
int b = a+a; // overflow

short c = 0x6000;
short d = c+c; // overflow with Adam's proposal, disallowed now.
```

I can't see why that overflow would be any more surprising with `short` than with an `int`.

One thing that also speaks for the proposal is 16-bit programming. Yes, I know that D is not designed for under 32 bits so 16 bits should be a secondary concern, but remember that D can already do that to some extent: https://forum.dlang.org/post/kctkzmrdhocsfummllhq forum.dlang.org .
 P.S. as a pragmatic programmer, I find very little use for 
 shorts other than saving some space in a data structure. Using 
 shorts as temporaries is a code smell.
Feb 20 2022
prev sibling parent reply Elronnd <elronnd elronnd.net> writes:
On Friday, 4 February 2022 at 23:25:14 UTC, Walter Bright wrote:
 As I observed before, there is no solution. Just different 
 problems. It's best to stick with a scheme that has 
 well-understood issues and works best with the common CPU 
 architectures.
I think the anecdote regarding Gosling demonstrates that these issues are not well understood.
Feb 04 2022
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/4/2022 3:41 PM, Elronnd wrote:
 I think the anecdote regarding Gosling demonstrates that these issues are not 
 well understood.
None of the other proposals are better understood.
Feb 04 2022
prev sibling parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Thursday, 3 February 2022 at 05:50:24 UTC, Walter Bright wrote:
 On 2/2/2022 6:25 PM, Siarhei Siamashka wrote:
 On Thursday, 3 February 2022 at 01:05:15 UTC, Walter Bright 
 wrote:
 I find it works well. For example,

     int i;
     byte b = i & 0xFF;

 passes without complaint with VRP.
No, it doesn't pass: `Error: cannot implicitly convert expression i & 255 of type int to byte`.
My mistake. b should have been declared as ubyte.
Regarding your original example with the `byte` type. Maybe the use of the following code can be encouraged as a good idiomatic overflow-safe way to do it in D language?

    int i;
    byte b = i.to!byte;

    i = -129;
    b = i.to!byte; // std.conv.ConvOverflowException

This is 2 characters shorter and IMHO nicer looking than `byte b = cast(byte)i;`. An overflow check is done at runtime to catch bugs, but good optimizing compilers are actually smart enough to eliminate it when the range of possible values of `i` is known at compile time. For example:

    void foobar(byte[] a)
    {
        foreach (i ; 0 .. a.length)
            a[i] = (i % 37).to!byte;
    }

Gets compiled into:

    $ gdc-12.0.1 -O3 -fno-weak-templates -c test.d && objdump -d test.o

    0000000000000000 <_D4test6foobarFAgZv>:
       0:   48 85 ff                test   %rdi,%rdi
       3:   74 37                   je     3c <_D4test6foobarFAgZv+0x3c>
       5:   49 b8 8b 7c d6 0d a6    movabs $0xdd67c8a60dd67c8b,%r8
       c:   c8 67 dd
       f:   31 c9                   xor    %ecx,%ecx
      11:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
      18:   48 89 c8                mov    %rcx,%rax
      1b:   49 f7 e0                mul    %r8
      1e:   48 c1 ea 05             shr    $0x5,%rdx
      22:   48 8d 04 d2             lea    (%rdx,%rdx,8),%rax
      26:   48 8d 14 82             lea    (%rdx,%rax,4),%rdx
      2a:   48 89 c8                mov    %rcx,%rax
      2d:   48 29 d0                sub    %rdx,%rax
      30:   88 04 0e                mov    %al,(%rsi,%rcx,1)
      33:   48 83 c1 01             add    $0x1,%rcx
      37:   48 39 cf                cmp    %rcx,%rdi
      3a:   75 dc                   jne    18 <_D4test6foobarFAgZv+0x18>
      3c:   c3                      retq

Slow division is replaced by multiplication and shifts, conditional branches are only done to compare `i` with the array length. The `.to!byte` part doesn't have any overhead at all and bytes are just directly written to the destination array via the `mov %al,(%rsi,%rcx,1)` instruction.
Feb 03 2022
prev sibling parent reply Guillaume Piolat <first.last gmail.com> writes:
On Wednesday, 2 February 2022 at 23:27:05 UTC, Walter Bright 
wrote:
 It also *causes* bugs. When code gets refactored, and the types 
 change, those forced casts may not be doing what is desired, 
 and can do things like unexpectedly truncating integer values.

 One of the (largely hidden because it works so well) advances D 
 has over C is Value Range Propagation, where automatic 
 conversions of integers to smaller integers is only done if no 
 bits are lost.
+1

The cure would probably be worse than the "problem". We should be careful what we wish for. D does exactly what was so successful in C, integer promotion and proper casts. It causes zero surprise to a native programmer.

There is a difference: C does the conversion to shorter integers implicitly, D does not, and if you translate you have to cast(). Yet, it isn't clear that the D code with the cast is less brittle than the C code in that case, as when the type changes you get a warning in neither C nor D.

Example from qoi.d (translation of qoi.h):

----
byte vr = cast(byte)(px.rgba.r - px_prev.rgba.r);
byte vg = cast(byte)(px.rgba.g - px_prev.rgba.g);
byte vb = cast(byte)(px.rgba.b - px_prev.rgba.b);
----

When px.rgba.r changes its type, the D code will have no more warning than the C code, arguably less. Thus the top methods for detecting integer problems are in that case:

1. No casts, with VRP
2. ex aequo: D with cast, or C with implicit cast

It would be nice to "get over" the D integers; it's not like there is a magic design that makes all problems go away. Also, as of today people still translate from C to D all the time, and that is the state we are in, where compatibility with C semantics helps immensely.
Feb 04 2022
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 4 February 2022 at 15:00:18 UTC, Guillaume Piolat 
wrote:
 C does the conversion to shorter integer implicitely, D does 
 not and if you translate you have to cast().
Let us not forget that D does not compete with C (nobody can). D competes with C++, Nim, and so on. Even C++ keeps introducing new basic types to get better type safety with every new edition of the language. For instance, in C++ `std::byte` is not an arithmetic type. This is an improvement, also for system-level native programming.
Feb 04 2022
prev sibling next sibling parent reply bachmeier <no spam.net> writes:
On Friday, 28 January 2022 at 02:15:51 UTC, Paul Backus wrote:
 Unfortunately, this is also one of the areas of D that comes 
 directly from C, so D programmers have to watch out for these 
 as well.
Here's a lovely one I wrote about yesterday in Learn:

```
import std.conv, std.range, std.stdio;

void main() {
    writeln(iota(5, 0, -1));
    writeln(iota(5, -1, -1));
    writeln(iota(5.to!uint, -1, -1));
    writeln(iota(5.to!uint, 0, -1));
    writeln(-1.to!uint);
    auto z = -1;
    writeln(z.to!uint);
}
```

Which delivers the following output:

```
[5, 4, 3, 2, 1]
[5, 4, 3, 2, 1, 0]
[]
[5, 4, 3, 2, 1]
4294967295
std.conv.ConvOverflowException /usr/include/dmd/phobos/std/conv.d(567): Conversion negative overflow
----------------
??:? pure @safe bool std.exception.enforce!(bool).enforce(bool, lazy object.Throwable) [0x555fe1c5c946]
??:? pure @safe uint std.conv.toImpl!(uint, int).toImpl(int) [0x555fe1c6f1ff]
??:? pure @safe uint std.conv.to!(uint).to!(int).to(int) [0x555fe1c6f1d0]
??:? _Dmain [0x555fe1c5594c]
```

All I wanted was a function that iterates through the elements of an array starting at the end. The only time you have a problem is if you want to include the first element of the array.

A simple solution is to add a `-scottmeyers` switch that retains full compatibility with C, but sets the default as a language that is productive.
Feb 17 2022
next sibling parent reply Elronnd <elronnd elronnd.net> writes:
On Thursday, 17 February 2022 at 17:53:58 UTC, bachmeier wrote:
 	writeln(-1.to!uint);
This is -(1.to!uint). Not (-1).to!uint. Not really a WTF, imo; precedence is just something you have to know, and it works more or less the same way in most programming languages.
Feb 17 2022
parent bachmeier <no spam.net> writes:
On Thursday, 17 February 2022 at 18:14:46 UTC, Elronnd wrote:
 On Thursday, 17 February 2022 at 17:53:58 UTC, bachmeier wrote:
 	writeln(-1.to!uint);
This is -(1.to!uint). Not (-1).to!uint. Not really a WTF, imo; precedence is just something you have to know, and it works more or less the same way in most programming languages.
Setting aside the fact that this is maybe 1% of the point of my post, it's not about precedence. Write it like this if you want `writeln(-(1.to!uint))` and you'll get the same output. The language is more than happy to let someone take the negative of a uint without so much as a warning, even though the ratio of mistakes to legitimate uses is a gazillion to one.

But that was not the main point of my post. It's the behavior of iota silently converting good code to bad code when the second argument changes from 0 to -1. It's hard to trust the code you've written - something we already know from decades of writing C.
Feb 17 2022
prev sibling next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 2/17/22 12:53 PM, bachmeier wrote:

 A simple solution is to add a `-scottmeyers` switch that retains full 
 compatibility with C, but sets the default as a language that is 
 productive.
Let's try it in C:

```c
for(unsigned int i = 5; i > -1; --i)
    printf("i is %d\n", i);
```

output is nothing.

-Steve
Feb 17 2022
prev sibling next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Thursday, 17 February 2022 at 17:53:58 UTC, bachmeier wrote:
 On Friday, 28 January 2022 at 02:15:51 UTC, Paul Backus wrote:
 Unfortunately, this is also one of the areas of D that comes 
 directly from C, so D programmers have to watch out for these 
 as well.
 Here's a lovely one I wrote about yesterday in Learn:

 ```
 import std.conv, std.range, std.stdio;

 void main() {
     writeln(iota(5, 0, -1));
     writeln(iota(5, -1, -1));
     writeln(iota(5.to!uint, -1, -1));
     writeln(iota(5.to!uint, 0, -1));
     writeln(-1.to!uint);
     auto z = -1;
     writeln(z.to!uint);
 }
 ```
Yeah, implicit signed-to-unsigned conversion is really nasty. Even if we keep the rest of the C-style promotion rules, getting rid of that one would still be a big improvement.
Feb 17 2022
next sibling parent reply forkit <forkit gmail.com> writes:
On Thursday, 17 February 2022 at 20:11:07 UTC, Paul Backus wrote:
 Yeah, implicit signed-to-unsigned conversion is really nasty. 
 Even if we keep the rest of the C-style promotion rules, 
 getting rid of that one would still be a big improvement.
Implicit memory manipulation (e.g. type casting) can result in unintentional memory safety bugs, which in turn can result in adverse events. As such, it is not consistent with the concept of memory safety.

It would be great if D had a feature whereby I could annotate a function in such a way that it disallowed implicit type conversions on its input arguments.

Regardless, I think this is another reason why D is unlikely to ever get widespread adoption. That is, the ship has sailed with regards to the principles of memory safety in programming languages, and the decisions Rust has made with regards to inherent safety have resulted in the kind of language features programmers *will have to* work with in the future.
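Something close to this can already be approximated today with a template constraint, although it is more verbose than a dedicated attribute would be (a minimal sketch):

```d
import std.stdio;

// Accepts only an argument whose type is exactly int; a uint (or short, etc.)
// argument is rejected at compile time instead of being implicitly converted.
void foo(T)(T i) if (is(T == int))
{
    writeln(i);
}

void main()
{
    foo(42);             // ok
    uint x = 4294967295;
    // foo(x);           // compile error: the constraint rejects uint, no silent -1
}
```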
Feb 17 2022
next sibling parent forkit <forkit gmail.com> writes:
On Thursday, 17 February 2022 at 20:55:38 UTC, forkit wrote:

e.g.

This code would not compile:

(dic = disable implicit conversions)

// --

module test;
@safe:

import std;

void main()
{
     uint x = 4294967295;
     foo(x);
}


@dic void foo (int i)
{
     writeln(i); // will print -1 if this was allowed to compile.
}

// ---
Feb 17 2022
prev sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Thursday, 17 February 2022 at 20:55:38 UTC, forkit wrote:
 On Thursday, 17 February 2022 at 20:11:07 UTC, Paul Backus 
 wrote:
 Yeah, implicit signed-to-unsigned conversion is really nasty. 
 Even if we keep the rest of the C-style promotion rules, 
 getting rid of that one would still be a big improvement.
Implicit memory manipulation (e.g. type casting for example) can result in unintentional memory safety bugs, which in turn can result in adverse events. As such, it is not consistent with the concept of memory safety.
Memory safety is about avoiding undefined behavior, not avoiding bugs in general. Implicitly casting an int to a uint can certainly cause bugs in a program, but it cannot introduce undefined behavior unless you are already doing something unsafe with the result (like indexing into an array without bounds checking).
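A small example of the distinction (just a sketch):

```d
void main() @safe
{
    int i = -1;
    uint u = i;        // implicit conversion; u == uint.max - surprising, but defined
    int[] a = [1, 2, 3];
    auto x = a[u];     // a bug, but not undefined behavior: the bounds check
                       // throws a range error at runtime in @safe code
}
```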
Feb 17 2022
parent reply forkit <forkit gmail.com> writes:
On Thursday, 17 February 2022 at 21:35:00 UTC, Paul Backus wrote:
 Memory safety is about avoiding undefined behavior, not 
 avoiding bugs in general. Implicitly casting an int to a uint 
 can certainly cause bugs in a program, but it cannot introduce 
 undefined behavior unless you are already doing something 
 unsafe with the result (like indexing into an array without 
 bounds checking).
Well, strong type safety is a component of memory safety.

Now, a 'bug' is where the programmer takes the average of two unsigned integers and it results in an overflow. Here, correctness is the programmer's responsibility.

On the other hand, implicit conversion of uint to int is inherently unsafe, since the compiler cannot determine whether the coercion 'avoids undefined behaviour'. On that basis, it should just not do it - and instead, make the programmer take responsibility.

Thus once again, the programmer is in charge - which is as it should be.
Feb 17 2022
parent reply Paul Backus <snarwin gmail.com> writes:
On Friday, 18 February 2022 at 04:33:39 UTC, forkit wrote:
 On the otherhand, implicit conversion of uint to int is 
 inherently unsafe, since the compiler cannot determine whether 
 the coercion 'avoids undefined behaviour'.
The behavior of converting a uint to an int is well-defined in D: the uint's bit pattern is re-interpreted as a signed int using 32-bit two's complement notation. This conversion is valid for every possible pattern of 32 bits, and therefore for every possible uint. There is absolutely no possibility of undefined behavior.

"Undefined behavior" is a technical term with a precise meaning. [1] It does not simply mean "undesirable behavior" or "error-prone behavior" or even "behavior that violates the rules of conventional mathematics."

[1] https://en.wikipedia.org/wiki/Undefined_behavior
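A quick sketch of that reinterpretation:

```d
void main()
{
    uint u = 4294967295;        // all 32 bits set
    int i = u;                  // implicit conversion; result is fully defined
    assert(i == -1);            // same bit pattern, read as two's complement
    assert(cast(uint) i == u);  // and it round-trips exactly
}
```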
Feb 17 2022
parent reply forkit <forkit gmail.com> writes:
On Friday, 18 February 2022 at 05:47:13 UTC, Paul Backus wrote:
 The behavior of converting a uint to an int is well-defined in 
 D: the uint's bit pattern is re-interpreted as a signed int 
 using 32-bit two's complement notation. This conversion is 
 valid for every possible pattern of 32 bits, and therefore for 
 every possible uint. There is absolutely no possibility of 
 undefined behavior.

 "Undefined behavior" is a technical term with a precise 
 meaning. [1] It does not simply mean "undesirable behavior" or 
 "error-prone behavior" or even "behavior that violates the 
 rules of conventional mathematics."

 [1] https://en.wikipedia.org/wiki/Undefined_behavior
The 'convertibility' of a type may well be defined by the language, but the conversion itself may not be defined by the programmer. I don't think it is unreasonable to extend the concept of 'undefined behaviour' to include behaviour not defined by the programmer.

But in any case... semantics aside...

In a language that does implicit conversion on primitive types, I would prefer that the programmer have the tools to undefine those implicit conversions. That is all there is to my argument.
Feb 18 2022
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 18 February 2022 at 10:13:07 UTC, forkit wrote:
 In a language that does implicit conversion on primitive types, 
 I would prefer that the programmer have the tools to undefine 
 those implicit conversions.
The easy solution is to have LDC and GDC implement a command line switch that restricts unsigned to signed conversions without a cast?
Feb 19 2022
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Saturday, 19 February 2022 at 09:49:20 UTC, Ola Fosheim 
Grøstad wrote:
 On Friday, 18 February 2022 at 10:13:07 UTC, forkit wrote:
 In a language that does implicit conversion on primitive 
 types, I would prefer that the programmer have the tools to 
 undefine those implicit conversions.
The easy solution is to have LDC and GDC implement a command line switch that restricts unsigned to signed conversions without a cast?
Or the opposite…
Feb 19 2022
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/17/2022 12:11 PM, Paul Backus wrote:
 Yeah, implicit signed-to-unsigned conversion is really nasty. Even if we keep 
 the rest of the C-style promotion rules, getting rid of that one would still
be 
 a big improvement.
There is a simple solution you can use: never use unsigned integers. I'm not being facetious. Many languages do not have unsigned integer types.
Feb 17 2022
next sibling parent Paul Backus <snarwin gmail.com> writes:
On Thursday, 17 February 2022 at 22:41:59 UTC, Walter Bright 
wrote:
 On 2/17/2022 12:11 PM, Paul Backus wrote:
 Yeah, implicit signed-to-unsigned conversion is really nasty. 
 Even if we keep the rest of the C-style promotion rules, 
 getting rid of that one would still be a big improvement.
There is a simple solution you can use: never use unsigned integers. I'm not being facetious. Many languages do not have unsigned integer types.
The fact that language built-ins like an array's .length property are unsigned makes this somewhat difficult to do in D.
Feb 17 2022
prev sibling next sibling parent reply bachmeier <no spam.net> writes:
On Thursday, 17 February 2022 at 22:41:59 UTC, Walter Bright 
wrote:
 On 2/17/2022 12:11 PM, Paul Backus wrote:
 Yeah, implicit signed-to-unsigned conversion is really nasty. 
 Even if we keep the rest of the C-style promotion rules, 
 getting rid of that one would still be a big improvement.
There is a simple solution you can use: never use unsigned integers. I'm not being facetious. Many languages do not have unsigned integer types.
But that would mean you have to give up D's arrays. All I did was feed the length of an array to std.range.iota.
Feb 17 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/17/2022 2:51 PM, bachmeier wrote:
 But that would mean you have to give up D's arrays. All I did was feed the 
 length of an array to std.range.iota.
Cast it to ptrdiff_t and you'll be fine.
Feb 17 2022
parent Paul Backus <snarwin gmail.com> writes:
On Thursday, 17 February 2022 at 23:23:40 UTC, Walter Bright 
wrote:
 On 2/17/2022 2:51 PM, bachmeier wrote:
 But that would mean you have to give up D's arrays. All I did 
 was feed the length of an array to std.range.iota.
Cast it to ptrdiff_t and you'll be fine.
If only there were some way we could remind people to use an explicit cast in situations like these...
Feb 17 2022
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Thursday, 17 February 2022 at 22:41:59 UTC, Walter Bright 
wrote:
 On 2/17/2022 12:11 PM, Paul Backus wrote:
 Yeah, implicit signed-to-unsigned conversion is really nasty. 
 Even if we keep the rest of the C-style promotion rules, 
 getting rid of that one would still be a big improvement.
There is a simple solution you can use: never use unsigned integers. I'm not being facetious. Many languages do not have unsigned integer types.
Then you don't get to know the length of a slice, this is going to be really limiting really quick.
Feb 17 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/17/2022 4:07 PM, deadalnix wrote:
 Then you don't get to know the length of a slice, this is going to be really 
 limiting really quick.
cast(ptrdiff_t)length
Feb 17 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/17/2022 8:31 PM, Walter Bright wrote:
 On 2/17/2022 4:07 PM, deadalnix wrote:
 Then you don't get to know the length of a slice, this is going to be really 
 limiting really quick.
cast(ptrdiff_t)length
ptrdiff_t len = array.length; also works.
Feb 17 2022
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 18.02.22 05:32, Walter Bright wrote:
 On 2/17/2022 8:31 PM, Walter Bright wrote:
 On 2/17/2022 4:07 PM, deadalnix wrote:
 Then you don't get to know the length of a slice, this is going to be 
 really limiting really quick.
cast(ptrdiff_t)length
    ptrdiff_t len = array.length; also works.
Except perhaps for somewhat long arrays in a 32-bit program.
Feb 17 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/17/2022 9:25 PM, Timon Gehr wrote:
 Except perhaps for somewhat long arrays in a 32-bit program.
Can't have everything. If you've got an array length longer than int.max, you're going to have trouble distinguishing a subtraction from a wraparound addition in any case. Dealing with that means one is simply going to have to pay attention to how integer 2-s complement arithmetic works on a computer. Just wait till you get into floating point!
Feb 18 2022
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 18.02.22 09:05, Walter Bright wrote:
 On 2/17/2022 9:25 PM, Timon Gehr wrote:
 Except perhaps for somewhat long arrays in a 32-bit program.
Can't have everything. ...
Well, I guess you *could* just use `long` instead of `ptrdiff_t`. (It seemed to me the entire point of this exercise was to do things in a way that's less error-prone.)
 If you've got an array length longer than int.max,
Seems I likely won't have that (compiled with -m32):

```d
void main(){
    import core.stdc.stdlib;
    import std.stdio;
    writeln(malloc(size_t(int.max)+1)); // null (int.max works)
    auto a=new ubyte[](int.max); // out of memory error
}
```
 you're going to have 
 trouble distinguishing a subtraction from a wraparound addition in any 
 case.
Why? A wraparound addition is an addition where the result's sign differs from that of both operands. Seems simple enough. Of course, I can just sign-extend both operands so the total width precludes a wraparound addition. (E.g., just use `long`.)
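A rough sketch of that check in D (signed overflow wraps deterministically in D, so this is well-defined):

```d
// true if a + b wrapped around: the result's sign differs from the sign of
// both operands (which can only happen when a and b have the same sign)
bool addWrapped(int a, int b)
{
    int r = a + b;                  // two's complement wraparound in D
    return ((a ^ r) & (b ^ r)) < 0;
}

unittest
{
    assert( addWrapped(int.max, 1));
    assert(!addWrapped(int.max, -1));
    assert( addWrapped(int.min, -1));
}
```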
 Dealing with that means one is simply going to have to pay 
 attention to how integer 2-s complement arithmetic works on a computer.
 ...
Most of that is not too helpful as it's not exposed by the language. (At least in D, signed arithmetic actually has 2-s complement semantics, but the hardware has some features to make dealing with 2-s complement convenient that are not really exposed by the programming language.)

In any case, I can get it right. The scenario I had in mind is competent programmers having to spend time debugging a weird issue and then ultimately fix some library dependency that silently acquires funky behavior once the data gets a bit bigger than what's in the unit tests, because the library authors blindly followed a `ptrdiff_t` recommendation they once saw in the forums.

Unlikely to happen to me personally, as I currently see little reason to write 32-bit programs, even less 32-bit programs dealing with large arrays, but it seemed to me that "works" merited some minor qualification, as you kind of went out of your way to explicitly use the sometimes overly narrow `int` on 32-bit machines. ;) Especially given that QA might mostly happen on 64-bit builds, that's probably quite risky in some cases.
Feb 18 2022
parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Friday, 18 February 2022 at 10:13:42 UTC, Timon Gehr wrote:
 On 18.02.22 09:05, Walter Bright wrote:
 If you've got an array length longer than int.max,
 Seems I likely won't have that (compiled with -m32):

 ```d
 void main(){
     import core.stdc.stdlib;
     import std.stdio;
     writeln(malloc(size_t(int.max)+1)); // null (int.max works)
     auto a=new ubyte[](int.max); // out of memory error
 }
 ```
This is either an OS configuration issue (you are exceeding a per-process limit for one of the resources) or maybe there are indeed some bugs in the D library's handling of large arrays.

Dealing with utilizing as much memory as possible on 32-bit systems is already ancient history. Server admins used to tweak various things, such as PAE or the 3.5G/0.5G user/kernel address space split, etc. But now none of this really matters anymore, because all memory hungry servers moved to 64-bit hardware a long time ago. And before that, there used to be such things as EMS and himem on ancient 16-bit MS-DOS systems to let applications use as much memory as possible. But modern D compilers can't even generate 16-bit code and nobody cares today. I guess some or even many people in this forum were born after this stuff became obsolete.

Personally I'm not going to miss anything if sizes of arrays change to use a signed type and int.max (or ssize_t.max) becomes the official array size limit on 32-bit systems.
 Most of that is not too helpful as it's not exposed by the 
 language. (At least in D, signed arithmetic actually has 2-s 
 complement semantics, but the hardware has some features to 
 make dealing with 2-s complement convenient that are not really 
 exposed by the programming language.)
The hardware only provides the flags register, which can be checked for overflows after arithmetic operations. This functionality is provided by https://dlang.org/phobos/core_checkedint.html in D language, but I wouldn't call it convenient. The flags checks in assembly are not convenient either.
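For reference, a minimal sketch of what core.checkedint usage looks like:

```d
import core.checkedint : adds, muls;
import std.stdio;

void main()
{
    bool overflow;                             // sticky flag: only ever set, never cleared
    int a = adds(int.max, 1, overflow);        // checked signed addition
    writeln(a, ", overflow: ", overflow);      // wrapped result, overflow: true

    overflow = false;
    int b = muls(100_000, 100_000, overflow);  // checked signed multiplication
    writeln(b, ", overflow: ", overflow);      // overflow: true
}
```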
 In any case, I can get it right, the scenario I had in mind is 
 competent programmers having to spend time debugging a weird 
 issue and then ultimately fix some library dependency that 
 silently acquires funky behavior once the data gets a bit 
 bigger than what's in the unit tests because the library 
 authors blindly followed a `ptrdiff_t` recommendation they once 
 saw in the forums.
I still think that support for trapping arithmetic overflows at runtime is a reasonable solution. It can catch a lot of bugs which are very hard to debug using other methods.

For example, there are not too many signed arithmetic overflows in phobos. Passing the phobos unit tests with signed overflow trapping enabled (the "-ftrapv" option in GDC) only requires some minor patches in a few places: https://github.com/ssvb/gcc/commits/gdc-ftrapv-phobos-20220209

Most of the affected places in phobos are already marked with cautionary comments ("beware of negating int.min", "there was an overflow bug, here's a link to bugzilla", etc.). A history of blood and suffering is recorded there. Some people may think that (signed) arithmetic overflows being defined to wrap around is a useful feature of the D language and that some software may rely on it for doing something useful. But I don't see any real evidence of that. Most of the silent arithmetic overflows look like just yet undetected and undesirable bugs.

The next step is probably to see how many changes are needed in the compiler frontend code to make it "-ftrapv" compatible too. But again, the problem is not technical at all. The problem is that too many people are convinced that silent wraparounds are good and nothing needs to be changed or improved. Or that rudimentary VRP checks at compile time are sufficient.
Feb 20 2022
prev sibling parent reply Nick Treleaven <nick geany.org> writes:
On Friday, 18 February 2022 at 04:32:56 UTC, Walter Bright wrote:
     ptrdiff_t len = array.length;
The problem is remembering to do that, particularly in cases where the unsigned value is an inferred function result, or for an index involving $.

We need an error, not an implicit conversion. I expect you to say that will force users to cast, which can introduce bugs if the source type changes. The solution to that is to encourage using e.g. std.conv.signed:
https://dlang.org/library/std/conv/signed.html
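A small sketch of how that reads in the reverse-iteration case:

```d
import std.conv : signed;
import std.stdio;

void main()
{
    auto a = [10, 20, 30];
    // a.length is the unsigned size_t; signed() yields the signed type of the
    // same width, so the i >= 0 condition terminates as expected.
    for (auto i = a.length.signed - 1; i >= 0; --i)
        writeln(a[i]);   // 30, 20, 10
}
```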
Feb 18 2022
parent Nick Treleaven <nick geany.org> writes:
On Friday, 18 February 2022 at 09:27:08 UTC, Nick Treleaven wrote:
 We need an error, not an implicit conversion. I expect you to 
 say that will force users to cast, which can introduce bugs if 
 the source type changes.
In fact, a cast that changes both the integer size and the signedness at the same time could be made an error as well.
 The solution to that is to encourage using e.g. std.conv.signed:
 https://dlang.org/library/std/conv/signed.html
Of course, these would be easier to use if they were in object.d.
Feb 18 2022
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 17.02.22 18:53, bachmeier wrote:
 On Friday, 28 January 2022 at 02:15:51 UTC, Paul Backus wrote:
 Unfortunately, this is also one of the areas of D that comes directly 
 from C, so D programmers have to watch out for these as well.
 Here's a lovely one I wrote about yesterday in Learn:

 ```
 import std.conv, std.range, std.stdio;

 void main() {
     writeln(iota(5, 0, -1));
     writeln(iota(5, -1, -1));
     writeln(iota(5.to!uint, -1, -1));
     writeln(iota(5.to!uint, 0, -1));
     writeln(-1.to!uint);
     auto z = -1;
     writeln(z.to!uint);
 }
 ```

 Which delivers the following output:

 ```
 [5, 4, 3, 2, 1]
 [5, 4, 3, 2, 1, 0]
 []
 [5, 4, 3, 2, 1]
 4294967295
 std.conv.ConvOverflowException /usr/include/dmd/phobos/std/conv.d(567): Conversion negative overflow
 ----------------
 ??:? pure @safe bool std.exception.enforce!(bool).enforce(bool, lazy object.Throwable) [0x555fe1c5c946]
 ??:? pure @safe uint std.conv.toImpl!(uint, int).toImpl(int) [0x555fe1c6f1ff]
 ??:? pure @safe uint std.conv.to!(uint).to!(int).to(int) [0x555fe1c6f1d0]
 ??:? _Dmain [0x555fe1c5594c]
 ```

 All I wanted was a function that iterates through the elements of an array starting at the end. The only time you have a problem is if you want to include the first element of the array.

 A simple solution is to add a `-scottmeyers` switch that retains full compatibility with C, but sets the default as a language that is productive.
Not defending C rules at all or how Phobos is handling them, but if you want a range with all array indices in reverse, there is a very simple way to state just that:

```d
import std.conv, std.range, std.stdio;

void main()
{
    auto v = [1, 2, 3, 4, 5];
    writeln(iota(v.length).retro);
}
```

In general, I avoid using negation/subtraction if anything unsigned is involved. There is usually another way to write it; in this case it is even much simpler, more descriptive, and it will work correctly for long arrays. (`int`/`uint` are not large enough to address modern amounts of RAM. However, I guess instead of avoiding negation/subtraction, it can also make sense to do index computations with `long` instead of `size_t`.)
Feb 17 2022
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
You can select the behavior you want with:

https://dlang.org/phobos/std_experimental_checkedint.html
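For example, a minimal sketch of the opt-in checked type from that module:

```d
import std.experimental.checkedint;
import std.stdio;

void main()
{
    auto x = checked(int.max);   // Checked!(int, Abort) by default
    writeln(x.get);              // 2147483647
    // ++x;                      // with the default Abort hook this would halt
                                 // the program on overflow instead of wrapping
    // Other hooks in the module (Throw, Saturate, Warn, ...) select different
    // overflow behavior.
}
```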
Feb 18 2022