www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - primitive value overflow

reply luka8088 <luka8088 owave.net> writes:
Hello everyone.

Today I ran into a interesting issue. I wrote

   auto offset = text1.length - text2.length;

and in case text2 was longer then text1 I got something around 4294967291.

So I opened an issue:
http://d.puremagic.com/issues/show_bug.cgi?id=10093

I know that there is a perfectly valid reason for this behavior, and 
that this behavior is not undefined, but it is unexpected, especially 
because unsigned is never mentioned in the code. One solution that comes 
to mind is changing length to signed, but that makes no sense because 
length is never negative.

After some thinking a though came that maybe such value overflow should 
be treated the same way as array overflow and checked by druntime with 
optional disabling in production code (like array bound checks)?

I think it would be very helpful to get an error for such mistake (that 
could very easily happen by accident), and on the other hand it can be 
disabled (like all other checks).
May 16 2013
next sibling parent reply "Andrej Mitrovic" <andrej.mitrovich gmail.com> writes:
On Thursday, 16 May 2013 at 20:24:31 UTC, luka8088 wrote:
 Hello everyone.

 Today I ran into a interesting issue. I wrote

   auto offset = text1.length - text2.length;
Yeah, I don't like these bugs either. In the meantime you can swap auto with 'sizediff_t' or 'ptrdiff_t', and then you can check if it's non-negative.
May 16 2013
next sibling parent reply "Mr. Anonymous" <mailnew4ster gmail.com> writes:
On Thursday, 16 May 2013 at 20:29:13 UTC, Andrej Mitrovic wrote:
 On Thursday, 16 May 2013 at 20:24:31 UTC, luka8088 wrote:
 Hello everyone.

 Today I ran into a interesting issue. I wrote

  auto offset = text1.length - text2.length;
Yeah, I don't like these bugs either. In the meantime you can swap auto with 'sizediff_t' or 'ptrdiff_t', and then you can check if it's non-negative.
It's exactly the same as checking if(text1.length > text2.length). But the idea of checking for integer overflows in debug builds is really nice. P.S. I remember Microsoft had some serious bug because of an integer overflow, that allowed a remote machine to create a denial of service.
May 16 2013
parent reply luka8088 <luka8088 owave.net> writes:
On 16.5.2013. 22:35, Mr. Anonymous wrote:
 On Thursday, 16 May 2013 at 20:29:13 UTC, Andrej Mitrovic wrote:
 On Thursday, 16 May 2013 at 20:24:31 UTC, luka8088 wrote:
 Hello everyone.

 Today I ran into a interesting issue. I wrote

 auto offset = text1.length - text2.length;
Yeah, I don't like these bugs either. In the meantime you can swap auto with 'sizediff_t' or 'ptrdiff_t', and then you can check if it's non-negative.
It's exactly the same as checking if(text1.length > text2.length). But the idea of checking for integer overflows in debug builds is really nice. P.S. I remember Microsoft had some serious bug because of an integer overflow, that allowed a remote machine to create a denial of service.
I agree that it is exactly the same as checking if (text1.length > text2.length). And I don't think that this is an issues if you are aware of the fact that you are working with unsigned values. But in the code that I wrote there was no mentioning of unsigned so the possibility of that kind of issue never came to mind until I actually printed the values. And that is what I wanted to emphasize.
May 16 2013
next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, May 16, 2013 22:42:23 luka8088 wrote:
 I agree that it is exactly the same as checking if (text1.length >
 text2.length). And I don't think that this is an issues if you are aware
 of the fact that you are working with unsigned values. But in the code
 that I wrote there was no mentioning of unsigned so the possibility of
 that kind of issue never came to mind until I actually printed the
 values. And that is what I wanted to emphasize.
Well, I'm not sure what can be done about that. length is size_t, which is the unsigned integral value which matches the architecture (uint for 32-bit and ulong for 64-bit). AFAIK, the documentation is clear on this (though I haven't read it recently). If it's not, then it should be made clearer, but using size_t for length is pervasive in D as it is in C++, and if you know the standard types, you know what size_t is. As for overflow checking, it's come up quite a few times, and Walter is completely against it. The hardware doesn't support it, and it would definitely be slow if it were added. The standard way to handle that if you want it is to create user-defined integral type which does the checks (though that obviously won't help you when you can't control the types that you're dealing with). But if you want to add checks in your code, you can always create a wrapper function for doing the subtraction. And if you wanted the checks to only be there in non-release mode, then you could even put the checks in a version(assert) block. So, you can add checks if you want them, but there's pretty much no way that how unsigned or overlflow are handled in the language is going to change. - Jonathan M Davis
May 16 2013
parent reply "Mr. Anonymous" <mailnew4ster gmail.com> writes:
On Thursday, 16 May 2013 at 21:04:38 UTC, Jonathan M Davis wrote:
 On Thursday, May 16, 2013 22:42:23 luka8088 wrote:
 I agree that it is exactly the same as checking if 
 (text1.length >
 text2.length). And I don't think that this is an issues if you 
 are aware
 of the fact that you are working with unsigned values. But in 
 the code
 that I wrote there was no mentioning of unsigned so the 
 possibility of
 that kind of issue never came to mind until I actually printed 
 the
 values. And that is what I wanted to emphasize.
Well, I'm not sure what can be done about that. length is size_t, which is the unsigned integral value which matches the architecture (uint for 32-bit and ulong for 64-bit). AFAIK, the documentation is clear on this (though I haven't read it recently). If it's not, then it should be made clearer, but using size_t for length is pervasive in D as it is in C++, and if you know the standard types, you know what size_t is. As for overflow checking, it's come up quite a few times, and Walter is completely against it. The hardware doesn't support it, and it would definitely be slow if it were added. The standard way to handle that if you want it is to create user-defined integral type which does the checks (though that obviously won't help you when you can't control the types that you're dealing with). But if you want to add checks in your code, you can always create a wrapper function for doing the subtraction. And if you wanted the checks to only be there in non-release mode, then you could even put the checks in a version(assert) block. So, you can add checks if you want them, but there's pretty much no way that how unsigned or overlflow are handled in the language is going to change. - Jonathan M Davis
I agree with Walter if we're talking about production code, but I think it could be very helpful for debug builds. P.S.
 The hardware doesn't support it
That's not completely true. e.g. x86, while it doesn't throw an exception on an overflow, it does set a flag, which could be relatively cheaply checked.
May 16 2013
parent 1100110 <0b1100110 gmail.com> writes:
On 05/16/2013 04:17 PM, Mr. Anonymous wrote:
 On Thursday, 16 May 2013 at 21:04:38 UTC, Jonathan M Davis wrote:
 On Thursday, May 16, 2013 22:42:23 luka8088 wrote:
 I agree that it is exactly the same as checking if (text1.length >
 text2.length). And I don't think that this is an issues if you are aw=
are
 of the fact that you are working with unsigned values. But in the cod=
e
 that I wrote there was no mentioning of unsigned so the possibility o=
f
 that kind of issue never came to mind until I actually printed the
 values. And that is what I wanted to emphasize.
Well, I'm not sure what can be done about that. length is size_t, which is the unsigned integral value which matches the architecture (uint for 32-bit and ulong for 64-bit). AFAIK, the documentation is clear on this (though I=
 haven't
 read it recently). If it's not, then it should be made clearer, but us=
ing
 size_t for length is pervasive in D as it is in C++, and if you know t=
he
 standard types, you know what size_t is.

 As for overflow checking, it's come up quite a few times, and Walter i=
s
 completely against it. The hardware doesn't support it, and it would
 definitely
 be slow if it were added. The standard way to handle that if you want
 it is to
 create user-defined integral type which does the checks (though that
 obviously
 won't help you when you can't control the types that you're dealing
 with). But
 if you want to add checks in your code, you can always create a wrappe=
r
 function for doing the subtraction. And if you wanted the checks to
 only be
 there in non-release mode, then you could even put the checks in a
 version(assert) block. So, you can add checks if you want them, but
 there's
 pretty much no way that how unsigned or overflow are handled in the
 language
 is going to change.

 - Jonathan M Davis
=20 I agree with Walter if we're talking about production code, but I think=
 it could be very helpful for debug builds.
=20
 P.S.
 The hardware doesn't support it
That's not completely true. e.g. x86, while it doesn't throw an exception on an overflow, it does set a flag, which could be relatively cheaply checked.
Perhaps a compiler flag would be nice, can't really argue performance if it's explicitly chosen. Like -profile or -cov.
May 16 2013
prev sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 5/16/13, Jonathan M Davis <jmdavisProg gmx.com> wrote:
 The hardware doesn't support it, and it would definitely
 be slow if it were added.
We could add it in -debug mode perhaps. Ideally this sort of thing would be its own switch, but yes that goes towards switch proliferation, I know the story.
May 16 2013
prev sibling parent reply luka8088 <luka8088 owave.net> writes:
On 16.5.2013. 22:29, Andrej Mitrovic wrote:
 On Thursday, 16 May 2013 at 20:24:31 UTC, luka8088 wrote:
 Hello everyone.

 Today I ran into a interesting issue. I wrote

 auto offset = text1.length - text2.length;
Yeah, I don't like these bugs either. In the meantime you can swap auto with 'sizediff_t' or 'ptrdiff_t', and then you can check if it's non-negative.
Yes, thanks for the advice, I did something similar. =)
May 16 2013
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Thu, 16 May 2013 22:39:16 +0200
schrieb luka8088 <luka8088 owave.net>:

 On 16.5.2013. 22:29, Andrej Mitrovic wrote:
 On Thursday, 16 May 2013 at 20:24:31 UTC, luka8088 wrote:
 Hello everyone.

 Today I ran into a interesting issue. I wrote

 auto offset = text1.length - text2.length;
Yeah, I don't like these bugs either. In the meantime you can swap auto with 'sizediff_t' or 'ptrdiff_t', and then you can check if it's non-negative.
Yes, thanks for the advice, I did something similar. =)
Now that doesn't work if you deal with some text2 that is over 2 GiB longer than text1. My approach is to see the close relation between any offset from beginning or length to the machine memory model. So any byte or char array in memory naturally has an unsigned length typed by the architecture's word size (e.g. 32 or 64 bit). With that in mind I _only_ ever subtract two values if I know the difference will be positive. That is the case for file_size - valid_offset for example. I don't know the context for your line of code, but if text1 and text2 are passed in as parameters to a function, a contract should verify that text1 is longer (or equal) than text2. Now feel free to tell me I'm wrong, but with the two lengths being natural numbers or "countable", I claim that a negative value for your offset variable would not have been usable anyway. It is a result that makes no sense. So on the next line you probably check "if (offset >= 0)" which is the same as putting "if (text1.length >= text2.length)" one line earlier to avoid running into the situation where you can end up with an over- or underflow because the result range of size_t - size_t fits neither size_t nor sizediff_t. Say text1 is 0 bytes long and text2 is 3_000_000_000 bytes long. Then -3_000_000_000 would be the result that cannot be stored in any 32-bit type. And thus it is important to think about possible input to your integer calculations and place if-else-branches there (or in-contracts), especially when the language accepts overflows silently. But I'd really like to see the context of your code if it is not a secret. :) -- Marco
May 16 2013
next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Thu, 16 May 2013 23:23:20 +0100, Marco Leise <Marco.Leise gmx.de> wrote:

 Am Thu, 16 May 2013 22:39:16 +0200
 schrieb luka8088 <luka8088 owave.net>:

 On 16.5.2013. 22:29, Andrej Mitrovic wrote:
 On Thursday, 16 May 2013 at 20:24:31 UTC, luka8088 wrote:
 Hello everyone.

 Today I ran into a interesting issue. I wrote

 auto offset = text1.length - text2.length;
Yeah, I don't like these bugs either. In the meantime you can swap
auto
 with 'sizediff_t' or 'ptrdiff_t', and then you can check if it's
 non-negative.
Yes, thanks for the advice, I did something similar. =)
Now that doesn't work if you deal with some text2 that is over 2 GiB longer than text1.
This is not a problem I worry about in the general case. I can honestly say I have never had to write any code which had to handle strings that long.. in fact the longest string length would probably have fit in ushort TBH. This is why, for better or worse, I cast all C strlen() calls from size_t to int as a matter of course. Sure, length cannot be negative, but for all practical purposes it will not be larger than signed int either. I have been doing this for ages now and have had exactly 0 bugs as a result, vs the few I have had using unsigned types and subtraction. Of course, were I coding something that was intended to specifically handle super sized strings, it would be a different story. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
May 17 2013
prev sibling parent reply luka8088 <luka8088 owave.net> writes:
On 17.5.2013. 0:23, Marco Leise wrote:
 Am Thu, 16 May 2013 22:39:16 +0200
 schrieb luka8088<luka8088 owave.net>:

 On 16.5.2013. 22:29, Andrej Mitrovic wrote:
 On Thursday, 16 May 2013 at 20:24:31 UTC, luka8088 wrote:
 Hello everyone.

 Today I ran into a interesting issue. I wrote

 auto offset = text1.length - text2.length;
Yeah, I don't like these bugs either. In the meantime you can swap auto with 'sizediff_t' or 'ptrdiff_t', and then you can check if it's non-negative.
Yes, thanks for the advice, I did something similar. =)
Now that doesn't work if you deal with some text2 that is over 2 GiB longer than text1. My approach is to see the close relation between any offset from beginning or length to the machine memory model. So any byte or char array in memory naturally has an unsigned length typed by the architecture's word size (e.g. 32 or 64 bit). With that in mind I _only_ ever subtract two values if I know the difference will be positive. That is the case for file_size - valid_offset for example. I don't know the context for your line of code, but if text1 and text2 are passed in as parameters to a function, a contract should verify that text1 is longer (or equal) than text2. Now feel free to tell me I'm wrong, but with the two lengths being natural numbers or "countable", I claim that a negative value for your offset variable would not have been usable anyway. It is a result that makes no sense. So on the next line you probably check "if (offset>= 0)" which is the same as putting "if (text1.length>= text2.length)" one line earlier to avoid running into the situation where you can end up with an over- or underflow because the result range of size_t - size_t fits neither size_t nor sizediff_t. Say text1 is 0 bytes long and text2 is 3_000_000_000 bytes long. Then -3_000_000_000 would be the result that cannot be stored in any 32-bit type. And thus it is important to think about possible input to your integer calculations and place if-else-branches there (or in-contracts), especially when the language accepts overflows silently. But I'd really like to see the context of your code if it is not a secret. :)
I understand perfectly the issue that you are pointing out. But that is not the real issue here. I know how computer arithmetic works, the understanding is not the issue here. The real issue is that at the time of writing unsigned was never mentioned, and it never came to my mind that it could be the issue (until I printed that actual value). So in reality I made a mistake, as I sometimes make a typo mistake. And I fix them (as I did this one) by debugging. What I want to point out is that this kind of mistakes can be pointed out to the programer in debug mode (and save programer time) by adding a runtime check. The only real benefit here would be shorter debug time, and the only real tradeoff would be slower executing in debug mode. Nothing else.
May 23 2013
parent Marco Leise <Marco.Leise gmx.de> writes:
Am Thu, 23 May 2013 19:55:36 +0200
schrieb luka8088 <luka8088 owave.net>:

 I understand perfectly the issue that you are pointing out. But that is 
 not the real issue here. I know how computer arithmetic works, the 
 understanding is not the issue here. The real issue is that at the time 
 of writing unsigned was never mentioned, and it never came to my mind 
 that it could be the issue (until I printed that actual value). So in 
 reality I made a mistake, as I sometimes make a typo mistake. And I fix 
 them (as I did this one) by debugging.
 
 What I want to point out is that this kind of mistakes can be pointed 
 out to the programer in debug mode (and save programer time) by adding a 
 runtime check. The only real benefit here would be shorter debug time, 
 and the only real tradeoff would be slower executing in debug mode. 
 Nothing else.
Ok, I remember that from Delphi http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/devcommon/compdirsoverflowchecking_xml.html It is a good idea and probably better then some Checked!int construct. Overflows are only wanted in a few places, so it makes sense to have this check enabled globally and to disable it for some code locally. -- Marco
May 23 2013
prev sibling parent reply Ary Borenszweig <ary esperanto.org.ar> writes:
On 5/16/13 5:24 PM, luka8088 wrote:
 Hello everyone.

 Today I ran into a interesting issue. I wrote

    auto offset = text1.length - text2.length;

 and in case text2 was longer then text1 I got something around 4294967291.

 So I opened an issue:
 http://d.puremagic.com/issues/show_bug.cgi?id=10093

 I know that there is a perfectly valid reason for this behavior, and
 that this behavior is not undefined, but it is unexpected, especially
 because unsigned is never mentioned in the code. One solution that comes
 to mind is changing length to signed, but that makes no sense because
 length is never negative.
It makes sense if you don't want to have these subtle bugs in your programming language -.-
May 17 2013
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Fri, 17 May 2013 14:43:08 -0300
schrieb Ary Borenszweig <ary esperanto.org.ar>:

 On 5/16/13 5:24 PM, luka8088 wrote:
 Hello everyone.

 Today I ran into a interesting issue. I wrote

    auto offset = text1.length - text2.length;

 and in case text2 was longer then text1 I got something around 4294967291.

 So I opened an issue:
 http://d.puremagic.com/issues/show_bug.cgi?id=10093

 I know that there is a perfectly valid reason for this behavior, and
 that this behavior is not undefined, but it is unexpected, especially
 because unsigned is never mentioned in the code. One solution that comes
 to mind is changing length to signed, but that makes no sense because
 length is never negative.
It makes sense if you don't want to have these subtle bugs in your programming language -.-
Just saying: the subtraction of two integers creates a resulting range that doesn't fit in ANY integer of same size. If you want to be safe disallow subtraction -.- -- Marco
May 18 2013
parent reply "Minas Mina" <minas_mina1990 hotmail.co.uk> writes:
I agree that checks for overflow should exist in debug builds 
(and not exist in release builds).
May 18 2013
parent reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Saturday, 18 May 2013 at 20:29:57 UTC, Minas Mina wrote:
 I agree that checks for overflow should exist in debug builds 
 (and not exist in release builds).
What about code that relies on overflow? It's well-defined behaviour, so it should be expected that people rely on it (I certainly do sometimes)
May 23 2013
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Peter Alexander:

 What about code that relies on overflow? It's well-defined 
 behaviour, so it should be expected that people rely on it (I 
 certainly do sometimes)
Do you rely on signed or unsigned overflow? My opinions on this topic have changed few times. A modern system language should offer the programmer both integral types for the rare situations where the overflow or wrap around are expected or acceptable, and other "default" integral types to be used in all the other situations, where overflow or wrap-around are unexpected and not desired. The implementation then should offer ways to optionally perform run-time tests on the second group of integrals. A very good system language should also offer various means to statically verify the bounds of a certain percentage of values and expression results, to reduce the amount of run-time tests needed (here things like "Liquid Types" help). D currently doesn't have such safe built-in types, and it doesn't offer means to create such efficient types in library code. I think such means should be provided: http://d.puremagic.com/issues/show_bug.cgi?id=9850 Bye, bearophile
May 23 2013
parent luka8088 <luka8088 owave.net> writes:
On 24.5.2013. 1:58, bearophile wrote:
 Peter Alexander:

 What about code that relies on overflow? It's well-defined behaviour,
 so it should be expected that people rely on it (I certainly do
 sometimes)
Do you rely on signed or unsigned overflow? My opinions on this topic have changed few times. A modern system language should offer the programmer both integral types for the rare situations where the overflow or wrap around are expected or acceptable, and other "default" integral types to be used in all the other situations, where overflow or wrap-around are unexpected and not desired. The implementation then should offer ways to optionally perform run-time tests on the second group of integrals. A very good system language should also offer various means to statically verify the bounds of a certain percentage of values and expression results, to reduce the amount of run-time tests needed (here things like "Liquid Types" help). D currently doesn't have such safe built-in types, and it doesn't offer means to create such efficient types in library code. I think such means should be provided: http://d.puremagic.com/issues/show_bug.cgi?id=9850 Bye, bearophile
I agree completely! Would it maybe be a good consensus to allow operator overloading and invariants for primitive types? Or is there a reason why that is a bad idea?
May 29 2013
prev sibling parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Fri, 24 May 2013 01:35:42 +0200
schrieb "Peter Alexander" <peter.alexander.au gmail.com>:

 What about code that relies on overflow? It's well-defined 
 behaviour, so it should be expected that people rely on it (I 
 certainly do sometimes)
See my post about Delphi's approach. It could be disabled in D using pragma(), UDAs or even scopes. -- Marco
May 23 2013
parent reply Timothee Cour <thelastmammoth gmail.com> writes:
A)
Requiring a construct such as Checked!int is too complicated for the user
in general as it requires a lot of code change from the user. It may be
useful in certain cases but overflow bugs will crop up in unexpected places.

B)
To help finding such bugs, introduce a special version identifier for
checking arithmetic overflow (eg -version=check_arithmetic_overflow) (no
new compiler flag syntax, just a built in version identifier).

C)
It shouldn't be implied by -debug flag as it may slow down debug code
significantly, making debug code too slow to be useful. Conflating the
"debug" statements and arithmetic overflow is a bad idea; debugging
shouldn't be an all or nothing option.

D)
More generally, there's a trade-off between keeping the number of compiler
flags low for simplicity and having and having more flags for more
expressiveness. I feel dmd is trying too hard to keep the number low.
There's a reason why gcc, clang etc have so many flags, which gives user
finer control. There should be a good middle ground: the defaults should be
simple (optimize vs debug), but when searching for bugs, optimizing or
dealing with deprecated constructs, we want flexibility.

F)
Finally, some code may indeed depend on arithmetic overflow, and shouldn't
be affected by -version=check_arithmetic_overflow. Maybe we need a way to
"unset" a version identifier; here's a possible syntax:

----
module test;
version = check_arithmetic_overflow; //existing syntax
void fun(){}

version -= check_arithmetic_overflow; //new syntax
void fun2(){} //now check_arithmetic_overflow is unset
----

This would be very useful in many ways, eg make some code run fast while
the rest is in debug mode (otherwise the debug code would be too slow).



On Thu, May 23, 2013 at 9:13 PM, Marco Leise <Marco.Leise gmx.de> wrote:

 Am Fri, 24 May 2013 01:35:42 +0200
 schrieb "Peter Alexander" <peter.alexander.au gmail.com>:

 What about code that relies on overflow? It's well-defined
 behaviour, so it should be expected that people rely on it (I
 certainly do sometimes)
See my post about Delphi's approach. It could be disabled in D using pragma(), UDAs or even scopes. -- Marco
May 25 2013
parent Marco Leise <Marco.Leise gmx.de> writes:
Good that we talked about it.

-- 
Marco
May 26 2013