digitalmars.D - Integer overflow and underflow semantics
- =?ISO-8859-1?Q?Alex_R=F8nne_Petersen?= (8/8) May 04 2012 Hi,
- Era Scarecrow (19/26) May 04 2012 Any systems that implement a carry flag likely works exactly
- =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (5/29) May 05 2012 Right, but the question was whether the language guarantees what I
- Era Scarecrow (14/17) May 05 2012 I can't see why it wouldn't, unless the compiler adds in checks
- =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (12/27) May 05 2012 It all depends. GCC (and thus GDC) can target very exotic architectures
- Manu (17/36) May 05 2012 With regard to code-gen on such colourful architectures, would stating a
- =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (15/51) May 05 2012 I don't know of a single architecture (out of the ones on
- Mehrdad (7/9) May 06 2012 Yes, but that misses the actual problem!
- Era Scarecrow (41/46) May 06 2012 I know; as long as the number is within bounds it won't have any
- bearophile (14/20) May 05 2012 In a good system language I'd like to see something better than
- =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (6/25) May 05 2012 Good point. The wrapping rule described in my OP should be the default
- =?ISO-8859-1?Q?Alex_R=F8nne_Petersen?= (5/11) May 06 2012 Can anyone give a definitive answer to this or at least confirm that it
- Paul D. Anderson (4/19) May 18 2012 I don't have the reference at the moment but the C99(?) standard
- Paul D. Anderson (7/29) May 18 2012 Sorry, it's C++:
- renoX (6/31) May 21 2012 It's both C and C++ for unsigned integer.
- Don Clugston (4/10) May 07 2012 I think the reason that C makes no guarantees, was because of
- =?ISO-8859-1?Q?Alex_R=F8nne_Petersen?= (5/18) May 07 2012 That's what I'm thinking. I can't even name a one's complement machine
- renoX (12/14) May 09 2012 Maybe but that's not the full reason, as Mehrdad says the
- =?ISO-8859-1?Q?Alex_R=F8nne_Petersen?= (7/13) May 14 2012 Bumping this as we still need to make a decision about this. As recently...
- akaz (20/25) May 18 2012 Just jumping into the bandwagon with several info:
- =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (9/34) May 18 2012 I think this counts as sloppy programming if anything.
- akaz (3/6) May 18 2012 In the release version, yes. In the release version, array bound
- =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (9/15) May 18 2012 It should be controllable with a command line option just as
- Andrew Wiley (8/31) May 18 2012 to
Hi, I don't think the language really makes it clear whether overflows and underflows are well-defined. Do we guarantee that for any integral type T, T.max + 1 == T.min and T.min - 1 == T.max? This is relevant in particular for GDC and LDC since they target a lot of weird architectures. -- - Alex
May 04 2012
On Saturday, 5 May 2012 at 04:57:48 UTC, Alex Rønne Petersen wrote:Hi, I don't think the language really makes it clear whether overflows and underflows are well-defined. Do we guarantee that for any integral type T, T.max + 1 == T.min and T.min - 1 == T.max? This is relevant in particular for GDC and LDC since they target a lot of weird architectures.Any systems that implement a carry flag likely works exactly like that. Carry flag is set or cleared after a math operation, allowing you to extend the size of your integer any level you want, like in oh the 6502? Might look something like this: Been a while but you should get the idea clc ;clear carry loop_here: mov ax, [bx] adc [dx], ax inc bx inc dx dec cx cmp cx, 0 jne loop_here ;if carry after this point then... jc overflow_warning
May 04 2012
On 05-05-2012 07:32, Era Scarecrow wrote:On Saturday, 5 May 2012 at 04:57:48 UTC, Alex Rønne Petersen wrote:Right, but the question was whether the language guarantees what I described. C and C++ don't, and IMO, we should strive to fix that. -- - AlexHi, I don't think the language really makes it clear whether overflows and underflows are well-defined. Do we guarantee that for any integral type T, T.max + 1 == T.min and T.min - 1 == T.max? This is relevant in particular for GDC and LDC since they target a lot of weird architectures.Any systems that implement a carry flag likely works exactly like that. Carry flag is set or cleared after a math operation, allowing you to extend the size of your integer any level you want, like in oh the 6502? Might look something like this: Been a while but you should get the idea clc ;clear carry loop_here: mov ax, [bx] adc [dx], ax inc bx inc dx dec cx cmp cx, 0 jne loop_here ;if carry after this point then... jc overflow_warning
May 05 2012
On Saturday, 5 May 2012 at 07:10:28 UTC, Alex Rønne Petersen wrote:Right, but the question was whether the language guarantees what I described. C and C++ don't, and IMO, we should strive to fix that.I can't see why it wouldn't, unless the compiler adds in checks and changes it's behavior or the assembler does it's own quirky magic. The bit patterns of how they end up are pretty fixed, it's just how we interpret them. I know before when i needed to know if it overflowed and simulated the carry flag, i ended up using a larger type of int. (This was for a custom multiple precision unsigned ints) Of course now that I think about it, if anything ends up using MMX for it's registers, the rules do change a little. MMX don't overflow if I remember right, they just cap/truncate to the Max/Min. Since it's intended more for multimedia... If the compiler uses those, I can't tell you what would happen.
May 05 2012
On 05-05-2012 10:23, Era Scarecrow wrote:On Saturday, 5 May 2012 at 07:10:28 UTC, Alex Rønne Petersen wrote:It all depends. GCC (and thus GDC) can target very exotic architectures where any assumptions may not, for whatever reason, hold true. This is a language design issue more than it's a "how does architecture X or compiler Y work" issue. An interesting problem with undefined behavior for integer overflow and underflow in C/C++ is that optimizers are basically free to do anything with regards to them, and often do whatever is more convenient for them.Right, but the question was whether the language guarantees what I described. C and C++ don't, and IMO, we should strive to fix that.I can't see why it wouldn't, unless the compiler adds in checks and changes it's behavior or the assembler does it's own quirky magic. The bit patterns of how they end up are pretty fixed, it's just how we interpret them.I know before when i needed to know if it overflowed and simulated the carry flag, i ended up using a larger type of int. (This was for a custom multiple precision unsigned ints) Of course now that I think about it, if anything ends up using MMX for it's registers, the rules do change a little. MMX don't overflow if I remember right, they just cap/truncate to the Max/Min. Since it's intended more for multimedia... If the compiler uses those, I can't tell you what would happen.Right, but it's not so much about what *would* happen as much as it is about what *should* happen. ;) -- - Alex
May 05 2012
On 5 May 2012 11:42, Alex R=C3=B8nne Petersen <xtzgzorex gmail.com> wrote:On 05-05-2012 10:23, Era Scarecrow wrote:erOn Saturday, 5 May 2012 at 07:10:28 UTC, Alex R=C3=B8nne Petersen wrote: Right, but the question was whether the language guarantees what IIt all depends. GCC (and thus GDC) can target very exotic architectures where any assumptions may not, for whatever reason, hold true. This is a language design issue more than it's a "how does architecture X or compil=described. C and C++ don't, and IMO, we should strive to fix that.I can't see why it wouldn't, unless the compiler adds in checks and changes it's behavior or the assembler does it's own quirky magic. The bit patterns of how they end up are pretty fixed, it's just how we interpret them.Y work" issue. An interesting problem with undefined behavior for integer overflow and underflow in C/C++ is that optimizers are basically free to do anything with regards to them, and often do whatever is more convenient for them.With regard to code-gen on such colourful architectures, would stating a defined behaviour for overflow/underflow affect the common case where an over/underflow did not occur? Short of an architecture that causes hardware exception on over/underflow, I suspect that it would interfere with the common case (additional code generated around every add/sub/etc to handle the overflow behaviour), and on such an architecture, every single numerical integer operation would become inefficient. I believe this is why C doesn't define the behaviour, because C is still effectively a macro language, and shouldn't produce 'unexpected' inefficient code. ('unexpected' from the perspective of the architecture you're targeting) I would personally rather see it remain undefined like C, but with convention approved by common/conventional architectures where cross platform porting is likely to occur.
May 05 2012
On 05-05-2012 12:25, Manu wrote:On 5 May 2012 11:42, Alex Rønne Petersen <xtzgzorex gmail.com <mailto:xtzgzorex gmail.com>> wrote: On 05-05-2012 10 <tel:05-05-2012%2010>:23, Era Scarecrow wrote: On Saturday, 5 May 2012 at 07:10:28 UTC, Alex Rønne Petersen wrote: Right, but the question was whether the language guarantees what I described. C and C++ don't, and IMO, we should strive to fix that. I can't see why it wouldn't, unless the compiler adds in checks and changes it's behavior or the assembler does it's own quirky magic. The bit patterns of how they end up are pretty fixed, it's just how we interpret them. It all depends. GCC (and thus GDC) can target very exotic architectures where any assumptions may not, for whatever reason, hold true. This is a language design issue more than it's a "how does architecture X or compiler Y work" issue. An interesting problem with undefined behavior for integer overflow and underflow in C/C++ is that optimizers are basically free to do anything with regards to them, and often do whatever is more convenient for them. With regard to code-gen on such colourful architectures, would stating a defined behaviour for overflow/underflow affect the common case where an over/underflow did not occur? Short of an architecture that causes hardware exception on over/underflow, I suspect that it would interfere with the common case (additional code generated around every add/sub/etc to handle the overflow behaviour), and on such an architecture, every single numerical integer operation would become inefficient.I don't know of a single architecture (out of the ones on dlang.org/version.html and many others) that doesn't signal overflow/underflow somehow (or obey the rules I described in the OP).I believe this is why C doesn't define the behaviour, because C is still effectively a macro language, and shouldn't produce 'unexpected' inefficient code. ('unexpected' from the perspective of the architecture you're targeting)Right, but C was designed many, many years ago. :)I would personally rather see it remain undefined like C, but with convention approved by common/conventional architectures where cross platform porting is likely to occur.Do a grep for "\.min" and "\.max" in Phobos and you'll notice a lot of places actually depend on the current x86 behavior (wrapping on overflow and underflow). I don't think we can afford to make the same mistake C and C++ did. While I did say that this is a language design issue, it's also a matter of pragmatism - does any real world architecture that D could possibly target actually *not* obey the aforementioned rules? I don't know of any that doesn't. -- - Alex
May 05 2012
On Saturday, 5 May 2012 at 05:32:14 UTC, Era Scarecrow wrote:Any systems that implement a carry flag likely works exactly like that.Yes, but that misses the actual problem! The problem isn't the _system_, it's the _compiler_. If the language says overflow or underflow are undefined behavior, then what the system does _doesn't matter one bit_, because the compiler can optimize away the code entirely: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
May 06 2012
On Sunday, 6 May 2012 at 16:39:02 UTC, Mehrdad wrote:Yes, but that misses the actual problem! The problem isn't the _system_, it's the _compiler_. If the language says overflow or underflow are undefined behavior, then what the system does _doesn't matter one bit_, because the compiler can optimize away the code entirely:I know; as long as the number is within bounds it won't have any issues. hopefully a simple suggestion would be to add a compiler flag to check all results against overflow/underflow, of add/subtract/multiply; and should it happen it throws an exception; but asserts can likely do the same job if you have a critical section you needed checked. Under this case if you needed specific behavior you could use the asm{} blocks, but as a whole I don't know. :( I know I brought something similar to this before walter regarding the carry bit and overflow and if you could have finer control over it; but that ended up being rejected under the same idea that if you really needed that control you can use the asm{} block (I think, it's been a year or two I think). It would be tedious to add checks after every instruction, so the compiler would need to do it if you wanted those checks. Either a flag would need to be set after a certain point in the module that says 'act this way with math operations involving over/underflow', or it is forced inside certain blocks as a new try/catch, except... that might be trysys { a = b + c; } catch (Carry c) { //fix or whatever //since we know it's a carry flag we don't need c //so maybe catchsys(carry) instead?? } Which might take an exception/like approach. But that seems doubtful, since very likely the cases where it's needed is very very very rare and refers back to asm or manual checks. a = b + c; if (a < b+c) { //overflow? //fails if MMX } or if (a - b != c) { //overflow? } Mmmm... Unfortunately I only have a few ideas and nothing concrete.
May 06 2012
Alex Rønne Petersen:I don't think the language really makes it clear whether overflows and underflows are well-defined. Do we guarantee that for any integral type T, T.max + 1 == T.min and T.min - 1 == T.max? This is relevant in particular for GDC and LDC since they target a lot of weird architectures.In a good system language I'd like to see something better than programmer the choice of 3 or 4 different semantics in integral operations: 1) A shared standard semantics that overflows, as in Java; 2) A semantics that overflows, that adapts to the fastest available on the CPU, as in C; 3) Shared standard overflows with unsigned values and run-time errors when a signed value overflows (or goes out of its range). 4) Run-time errors when every signed or unsigned value overflows (or goes out of its range), as in Ada. Bye, bearophile
May 05 2012
On 05-05-2012 13:06, bearophile wrote:Alex Rønne Petersen:Good point. The wrapping rule described in my OP should be the default IMO. Then perhaps the other modes could be activated with an attribute of sorts? -- - AlexI don't think the language really makes it clear whether overflows and underflows are well-defined. Do we guarantee that for any integral type T, T.max + 1 == T.min and T.min - 1 == T.max? This is relevant in particular for GDC and LDC since they target a lot of weird architectures.In a good system language I'd like to see something better than what's choice of 3 or 4 different semantics in integral operations: 1) A shared standard semantics that overflows, as in Java; 2) A semantics that overflows, that adapts to the fastest available on the CPU, as in C; 3) Shared standard overflows with unsigned values and run-time errors when a signed value overflows (or goes out of its range). 4) Run-time errors when every signed or unsigned value overflows (or goes out of its range), as in Ada. Bye, bearophile
May 05 2012
On 05-05-2012 06:57, Alex Rnne Petersen wrote:Hi, I don't think the language really makes it clear whether overflows and underflows are well-defined. Do we guarantee that for any integral type T, T.max + 1 == T.min and T.min - 1 == T.max? This is relevant in particular for GDC and LDC since they target a lot of weird architectures.Can anyone give a definitive answer to this or at least confirm that it is an open issue? -- - Alex
May 06 2012
On Sunday, 6 May 2012 at 16:27:34 UTC, Alex Rønne Petersen wrote:On 05-05-2012 06:57, Alex Rønne Petersen wrote:I don't have the reference at the moment but the C99(?) standard requires wraparound behavior by UNSIGNED integer values. I don't know if there is an equivalent requirement for signed values.Hi, I don't think the language really makes it clear whether overflows and underflows are well-defined. Do we guarantee that for any integral type T, T.max + 1 == T.min and T.min - 1 == T.max? This is relevant in particular for GDC and LDC since they target a lot of weird architectures.Can anyone give a definitive answer to this or at least confirm that it is an open issue?
May 18 2012
On Friday, 18 May 2012 at 19:59:01 UTC, Paul D. Anderson wrote:On Sunday, 6 May 2012 at 16:27:34 UTC, Alex Rønne Petersen wrote:Sorry, it's C++: 3.9.1/4 Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2n where n is the number of bits in the value representation of that particular size of integer. PaulOn 05-05-2012 06:57, Alex Rønne Petersen wrote:I don't have the reference at the moment but the C99(?) standard requires wraparound behavior by UNSIGNED integer values. I don't know if there is an equivalent requirement for signed values.Hi, I don't think the language really makes it clear whether overflows and underflows are well-defined. Do we guarantee that for any integral type T, T.max + 1 == T.min and T.min - 1 == T.max? This is relevant in particular for GDC and LDC since they target a lot of weird architectures.Can anyone give a definitive answer to this or at least confirm that it is an open issue?
May 18 2012
On Friday, 18 May 2012 at 20:01:15 UTC, Paul D. Anderson wrote:On Friday, 18 May 2012 at 19:59:01 UTC, Paul D. Anderson wrote:It's both C and C++ for unsigned integer. Signed is undefined, unsigned is 'wraparound'. Both sucks for the default behaviour IMHO : premature optimisation.. renoXOn Sunday, 6 May 2012 at 16:27:34 UTC, Alex Rønne Petersen wrote:Sorry, it's C++:On 05-05-2012 06:57, Alex Rønne Petersen wrote:I don't have the reference at the moment but the C99(?) standard requires wraparound behavior by UNSIGNED integer values. I don't know if there is an equivalent requirement for signed values.Hi, I don't think the language really makes it clear whether overflows and underflows are well-defined. Do we guarantee that for any integral type T, T.max + 1 == T.min and T.min - 1 == T.max? This is relevant in particular for GDC and LDC since they target a lot of weird architectures.Can anyone give a definitive answer to this or at least confirm that it is an open issue?
May 21 2012
On 05/05/12 06:57, Alex Rnne Petersen wrote:Hi, I don't think the language really makes it clear whether overflows and underflows are well-defined. Do we guarantee that for any integral type T, T.max + 1 == T.min and T.min - 1 == T.max? This is relevant in particular for GDC and LDC since they target a lot of weird architectures.I think the reason that C makes no guarantees, was because of ones-complement machines (which had become very rare even by 1970). Surely we can assume 2-s complement behaviour?
May 07 2012
On 07-05-2012 10:05, Don Clugston wrote:On 05/05/12 06:57, Alex Rnne Petersen wrote:That's what I'm thinking. I can't even name a one's complement machine still in use today. -- - AlexHi, I don't think the language really makes it clear whether overflows and underflows are well-defined. Do we guarantee that for any integral type T, T.max + 1 == T.min and T.min - 1 == T.max? This is relevant in particular for GDC and LDC since they target a lot of weird architectures.I think the reason that C makes no guarantees, was because of ones-complement machines (which had become very rare even by 1970). Surely we can assume 2-s complement behaviour?
May 07 2012
On Monday, 7 May 2012 at 08:06:01 UTC, Don Clugston wrote:I think the reason that C makes no guarantees, was because of ones-complement machinesMaybe but that's not the full reason, as Mehrdad says the compiler can use the fact that overflow are undefined to optimise the code generated. In my opinion, there are only two valid options for the integer operation's overflow by default: either the compiler should make sure that the overflow is noticed (Ada: exception, adding 'Not-a-Number' values would work too) or C's signed undefined behaviour. Modulo operation (except where really useful) are bad: slower than the 'undefined' semantic, yet it still produce obviously incorrect behaviour such as negative absolute value..
May 09 2012
On 05-05-2012 06:57, Alex Rnne Petersen wrote:Hi, I don't think the language really makes it clear whether overflows and underflows are well-defined. Do we guarantee that for any integral type T, T.max + 1 == T.min and T.min - 1 == T.max? This is relevant in particular for GDC and LDC since they target a lot of weird architectures.Bumping this as we still need to make a decision about this. As recently as yesterday, someone on the GCC mailing list posted a complaint about an optimization pass that assumed undefined semantics for overflow. We need to have a stance about this, since GDC is going into mainline GCC soon. -- - Alex
May 14 2012
Bumping this as we still need to make a decision about this. As recently as yesterday, someone on the GCC mailing list posted a complaint about an optimization pass that assumed undefined semantics for overflow. We need to have a stance about this, since GDC is going into mainline GCC soon.Just jumping into the bandwagon with several info: http://en.wikipedia.org/wiki/Therac Therac25 was a medicale machine that injured several people because: "When input parameters are unverified or inconsistent, the treatment monitor task periodically runs a procedure that increments a counter This counter is used as a flag by the housekeeping task, indicating whether gun firing should be enabled or not However, as the counter is only 8 bits, it will overflow every 256 ticks, and the “flag” will temporarily indicate a zero condition! If the “set” command is given at that instant, inconsistencies are not checked, and unshielded high- energy radiation may result" The case is known in the real-time operating systems programming. Does D throw an exception when an integral type (signed or unsigned) underflows or overflows? I am for defining this as the implicit behavior. Using a counter in the cyclical mode should be rather be explicitly invoked.
May 18 2012
On 18-05-2012 15:26, akaz wrote:I think this counts as sloppy programming if anything. I agree that throwing an exception could be a good feature to have, but it should *not* be the default. I want my systems code to run at full speed when I know what I'm doing. -- Alex Rønne Petersen alex lycus.org http://lycus.orgBumping this as we still need to make a decision about this. As recently as yesterday, someone on the GCC mailing list posted a complaint about an optimization pass that assumed undefined semantics for overflow. We need to have a stance about this, since GDC is going into mainline GCC soon.Just jumping into the bandwagon with several info: http://en.wikipedia.org/wiki/Therac Therac25 was a medicale machine that injured several people because: "When input parameters are unverified or inconsistent, the treatment monitor task periodically runs a procedure that increments a counter This counter is used as a flag by the housekeeping task, indicating whether gun firing should be enabled or not However, as the counter is only 8 bits, it will overflow every 256 ticks, and the “flag” will temporarily indicate a zero condition! If the “set” command is given at that instant, inconsistencies are not checked, and unshielded high- energy radiation may result" The case is known in the real-time operating systems programming. Does D throw an exception when an integral type (signed or unsigned) underflows or overflows? I am for defining this as the implicit behavior. Using a counter in the cyclical mode should be rather be explicitly invoked.
May 18 2012
I agree that throwing an exception could be a good feature to have, but it should *not* be the default. I want my systems code to run at full speed when I know what I'm doing.In the release version, yes. In the release version, array bound checking is disabled, too. But in the debug version?
May 18 2012
On 18-05-2012 19:28, akaz wrote:It should be controllable with a command line option just as -noboundscheck. I'll revise what I said: It can be on in debug for all I care, as long as I can disable it independently of what mode I'm compiling in. -- Alex Rønne Petersen alex lycus.org http://lycus.orgI agree that throwing an exception could be a good feature to have, but it should *not* be the default. I want my systems code to run at full speed when I know what I'm doing.In the release version, yes. In the release version, array bound checking is disabled, too. But in the debug version?
May 18 2012
On Fri, May 18, 2012 at 6:26 AM, akaz <nemo utopia.com> wrote:nBumping this as we still need to make a decision about this. As recently as yesterday, someone on the GCC mailing list posted a complaint about a=tooptimization pass that assumed undefined semantics for overflow. We need=ed.have a stance about this, since GDC is going into mainline GCC soon.Just jumping into the bandwagon with several info: http://en.wikipedia.org/wiki/**Therac<http://en.wikipedia.org/wiki/Therac= Therac25 was a medicale machine that injured several people because: "When input parameters are unverified or inconsistent, the treatment monitor task periodically runs a procedure that increments a counter This counter is used as a flag by the housekeeping task, indicating whether gun firing should be enabled or not However, as the counter is only 8 bits, it will overflow every 256 ticks, and the =93flag=94 will temporarily indicate a zero condition! If the =93set=94 command is given at that instant, inconsistencies are not checked, and unshielded high- energy radiation may result" The case is known in the real-time operating systems programming. Does D throw an exception when an integral type (signed or unsigned) underflows or overflows? I am for defining this as the implicit behavior. Using a counter in the cyclical mode should be rather be explicitly invok=Massive industrial systems run on code written in systems languages that dismissed this behavior as unacceptably slow years ago. That one programmer was incrementing a counter when he should have been storing a nonzero value instead isn't really relevant to this discussion.
May 18 2012