
digitalmars.D - GCC Undefined Behavior Sanitizer

reply "bearophile" <bearophileHUGS lycos.com> writes:
Just found with Reddit. C seems one step ahead of D with this:

http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/

Bye,
bearophile
Oct 16 2014
next sibling parent reply "Paulo Pinto" <pjmlp progtools.org> writes:
On Thursday, 16 October 2014 at 21:00:18 UTC, bearophile wrote:
 Just found with Reddit. C seems one step ahead of D with this:

 http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/

 Bye,
 bearophile
The sad thing about these tools is that they are all about fixing the holes introduced by C into the wild. So in the end, when using C and C++, we need to have compiler + static analyzer + sanitizers, in a real-life example of "Worse is Better", instead of fixing the languages.

At least, C++ is on the path of having fewer undefined behaviors, as the working group clearly saw the benefits don't outweigh the costs and is now in the process of cleaning up the standard in that regard.

As an outsider, I think D would be better by having only defined behaviors.

-- Paulo
Oct 17 2014
next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Fri, 17 Oct 2014 08:38:11 +0000
schrieb "Paulo  Pinto" <pjmlp progtools.org>:

 On Thursday, 16 October 2014 at 21:00:18 UTC, bearophile wrote:
 Just found with Reddit. C seems one step ahead of D with this:

 http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/

 Bye,
 bearophile
 The sad thing about these tools is that they are all about fixing the holes
 introduced by C into the wild. So in the end, when using C and C++, we need
 to have compiler + static analyzer + sanitizers, in a real-life example of
 "Worse is Better", instead of fixing the languages.
 At least, C++ is on the path of having fewer undefined behaviors, as the
 working group clearly saw the benefits don't outweigh the costs and is now
 in the process of cleaning up the standard in that regard.
 As an outsider, I think D would be better by having only defined behaviors.
 -- Paulo
I have a feeling that back then the C designers weren't quite sure how the language would work out on current and future architectures, so they gave implementations some freedom here and there. Now that C/C++ is the primary language for any architecture, the tables have turned and the hardware designers build chips that behave "as expected" in some cases that C/C++ left undefined. That in turn allows C/C++ to become more restrictive. Or maybe I don't know what I'm talking about.

What behavior is undefined in D? I'm not kidding, I don't really know of any list of undefined behaviors. The only thing I remember is that casting away immutable and modifying the contents is undefined behavior. Similar to C/C++, I think this is to allow current and future compilers to perform as yet unknown optimizations on immutable data structures. Once such optimizations become well known in 10 to 20 years or so, D will define that behavior, too. Just like C/C++.

-- Marco
Oct 17 2014
prev sibling parent reply "Ola Fosheim Grøstad" writes:
On Friday, 17 October 2014 at 08:38:12 UTC, Paulo  Pinto wrote:
 As an outsider, I think D would be better by having only 
 defined behaviors.
Actually, this is the first thing I would change about D, to make it less dependent on x86. I think a system level language should enable max optimization on basic types and rather inject integrity tests for debugging/testing or support debug-exceptions where available.

The second thing I would change is to make whole program analysis mandatory so that you can deduce and constrain value ranges. I don't believe the argument about separate compilation and commercial needs (and even then augmented object code is a distinct possibility). Even FFI is not a great argument, you should be able to specify what can happen in a foreign function.

It is just plain wrong to let integers wrap by default in an accessible result. That is not integer behaviour. The correct thing to do is to inject overflow checks in debug mode and let overflow in results (that are accessed) be undefined. Otherwise you end up giving the compiler a difficult job:

uint y=x+1;
if (x < y){…}

Should be optimized to:

{…}

In D (and C++) you would get:

if (x < ((x+1)&0xffffffff)){…}

As a result you are encouraged to use signed int everywhere in C++, since unsigned ints use modulo-arithmetic. Unsigned ints in C++ are only meant for bit-field stuff. And the C++ designers admit that the C++ library is ill-specified because it uses unsigned ints for integers that cannot be negative, while that is now considered a bad practice…

In D it is even worse since you are forced to use a fixed size modulo even for int, so you cannot do 32 bit arithmetic in a 64 bit register without getting extra modulo operations.

So, "undefined behaviour" is not so bad, as long as you qualify it. You could for instance say that overflow on ints leads to an unknown value, but no other side effects. That was probably the original intent for C, but compiler writers have taken it a step further…

D has locked itself to Pentium-style x86 behaviour. Unfortunately it is very difficult to have everything be well-defined in a low level programming language. It isn't even obvious that a byte should be 8 bits, although the investments in creating UTF-8 resources on the Internet have probably locked us to it for the next 100 years… :)
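For illustration, a minimal sketch of the "check only in debug builds" idea in today's D (checkedAdd is a hypothetical helper, not an existing language or Phobos feature):

uint checkedAdd(uint a, uint b)
{
    uint r = a + b;                            // wraps modulo 2^32 in current D
    debug assert(r >= a, "integer overflow");  // the check only exists in debug builds
    return r;
}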
Oct 17 2014
next sibling parent reply "eles" <eles eles.com> writes:
On Friday, 17 October 2014 at 09:46:49 UTC, Ola Fosheim Grøstad 
wrote:
 On Friday, 17 October 2014 at 08:38:12 UTC, Paulo  Pinto wrote:
 The second thing I would change is to make whole program 
 analysis mandatory so that you can deduce and constrain value 
 ranges.
Nice idea, but how to persuade libraries to play that game?
Oct 17 2014
parent reply "Ola Fosheim Grøstad" writes:
On Friday, 17 October 2014 at 10:30:14 UTC, eles wrote:
 On Friday, 17 October 2014 at 09:46:49 UTC, Ola Fosheim Grøstad 
 wrote:
 The second thing I would change is to make whole program 
 analysis mandatory so that you can deduce and constrain value 
 ranges.
Nice idea, but how to persuade libraries to play that game?
1. Provide a meta-language for writing propositions that describes what libraries do if they are foreign (pre/post conditions). Could be used for "asserts" too.

2. Provide a C compiler that compiles to the same internal representation as the new language, so you can run the same analysis on C code.

3. Remove int so that you have to specify the range and make typedefs local to the library.

4. Provide the ability to specify additional constraints on library functions you use in your project, or even probabilistic information.

Essentially it is a cultural thing, so the standard library has to be very well written.

Point 4 above could let you specify properties on the input to a sort function at the call site and let the compiler use that information for optimization. E.g. if one million values are evenly distributed over a range of 0..100000 then a quick sort could break it down without using pivots. If the range is 0..1000 then it could switch to an array of counters. If the input is 99% sorted then it could switch to some insertion-sort based scheme.

If you allow both absolute and probabilistic meta-information then the probabilistic information can be captured on a corpus of representative test data. You could run the algorithm within the "measured probable range" and switch to a slower algorithm when you detect values outside it.

Lots of opportunities for improving "state-of-the-art".
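As a rough illustration of point 4 above, a hedged D sketch of a sort that switches to an array of counters when the caller promises a small value range (sortWithHint and its cutoff are invented for this example, not an existing API):

import std.algorithm : sort;

void sortWithHint(int[] data, int lo, int hi)   // caller promises lo <= value <= hi
{
    if (hi - lo < 1024)
    {
        // small known range: count occurrences instead of comparing
        auto counts = new size_t[hi - lo + 1];
        foreach (v; data) ++counts[v - lo];
        size_t i = 0;
        foreach (offset, c; counts)
            foreach (_; 0 .. c)
                data[i++] = cast(int)(offset + lo);
    }
    else
        sort(data);   // no usable range information: ordinary comparison sort
}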
Oct 17 2014
parent reply "eles" <eles215 gzk.dot> writes:
On Friday, 17 October 2014 at 10:50:54 UTC, Ola Fosheim Grøstad 
wrote:
 On Friday, 17 October 2014 at 10:30:14 UTC, eles wrote:
 On Friday, 17 October 2014 at 09:46:49 UTC, Ola Fosheim 
 Grøstad wrote:
 Nice idea, but how to persuade libraries to play that game?
1. Provide a meta-language for writing propositions that describes what libraries do if they are foreign (pre/post conditions). Could be used for "asserts" too.
That's complicated, to provide another language for describing the behavior. And how? Embedded in the binary library?

Maybe a set of annotations that are exposed through the .di files. But, then, we are back to headers...

Another idea would be to simply make the in and out contracts of a function exposed in the corresponding .di file, or at least a part of them (we could use "public" for those).

Anyway, as far as I can imagine it, it would be like embedding Polyspace inside the compiler and stub functions inside libraries.
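For illustration, the in/out contracts mentioned above are already plain D, so carrying them over into a .di file would just look like ordinary code (daysInMonth is only an example):

int daysInMonth(int month)
in  { assert(month >= 1 && month <= 12); }              // precondition, visible to callers
out (result) { assert(result >= 28 && result <= 31); }  // postcondition, visible to callers
body   // 'do' in later D versions
{
    immutable int[12] days = [31,28,31,30,31,30,31,31,30,31,30,31];
    return days[month - 1];
}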
 2. Provide a C compiler that compiles to the same internal 
 representation as the new language, so you can run the same 
 analysis on C code.
For source code. But for closed-source libraries?
 3. Remove int so that you have to specify the range and make 
 typedefs local to the library
Pascal arrays?
 Lots of opportunities for improving "state-of-the-art".
True. But a lot of problems too. And there is not much agreement on what is the state of the art...
Oct 19 2014
parent reply "Ola Fosheim Grøstad" writes:
On Sunday, 19 October 2014 at 09:04:59 UTC, eles wrote:
 That's complicated, to provide another language for describing
 the behavior.
I think D needs to unify UDAs, type traits, template constraints and other deductive facts and rules into a deductive database, in order to make it more pleasant and powerful. And also provide the means to query that database from CTFE code. A commercial compiler could also speed up compilation of large programs with complex compile-time logic (storing facts in a persistent high-performance database).

There are several languages to learn from in terms of specifying "reach" in a graph/tree structure, e.g. XQuery.

You can view "@nogc func(){}" as a fact:

nogc('func)

or perhaps:

nogc( ('modulename,'func) )

Then you could list the functions that are @nogc in a module using:

nogc( ('modulename,X) )

Same for type traits. If you build it into the type system then you can easily define new type constraints in complex ways. (You could start with something simple, like specifying if values reachable through a parameter escape the lifetime of the function call.)
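As a small, hedged illustration: some of these "facts" are already queryable with today's compile-time introspection, assuming a reasonably recent compiler:

@nogc int twice(int x) { return 2 * x; }

void main()
{
    import std.stdio : writeln;
    // the attribute strings of a symbol are one kind of queryable "fact"
    writeln([__traits(getFunctionAttributes, twice)]);   // includes "@nogc"
}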
 And how? Embedded in the binary library?
The same way you would do it with C/C++ today. Some binary format allow extra meta-info, so it is possible… in the long term.
 Another idea would be to simply make the in and out contracts 
 of a function exposed in the corresponding .di file, or at 
 least a part of them (we could use "public" for those).
That's an option. Always good to start with something simple, but with an eye for a more generic/powerful/unified solution in the future.
 Anyway, as far as I can imagine it, it would be like embedding 
 Polyspace inside the compiler and stub functions inside 
 libraries.
Yes, or have a semantic analyser check, provide and collect facts for a deductive database, i.e.:

1. collect properties that are cheap to derive from source, build database
2. CTFE: query property X
3. if database query for X succeeds, return result
4. collect properties that are more expensive, guided by (2), inject into database
5. return result
 For source code. But for closed-source libraries?
You need annotations. Or now that you are getting stuff like PNaCl, maybe you can have closed source libraries in an IR format that can be analysed.
 3. Remove int so that you have to specify the range and make 
 typedefs local to the library
Pascal arrays?
subrange variables:

var
  age  : 0 ... 150;
  year : 1970 ... 9999;
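A rough D analogue of the Pascal subrange, sketched as a library type (Bounded is a hypothetical name, not a Phobos type):

struct Bounded(int lo, int hi)
{
    int value = lo;
    invariant() { assert(value >= lo && value <= hi); }  // checked range
    alias value this;
}

Bounded!(0, 150)     age;
Bounded!(1970, 9999) year;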
 Lots of opportunities for improving "state-of-the-art".
True. But a lot of problems too. And there is not much agreement on what is the state of the art...
Right, and it gets worse the less specific the use scenario is.

What should be created is a modular generic specification for a system programming language, based on what is, what should be, hardware trends and theory. Then you can see the dependencies among the various concepts. C, C++, D and Rust can form a starting point.

I think D2 is too far along on its own trajectory to be modified, so I view D1 and D2 primarily as experiments, which is important too. But D3 (or some other language) should build on what the existing languages enable and unify concepts so you have something more coherent than C++ and D. Evolution can only take you so far; then you hit the walls set up by existing features/implementation.
Oct 19 2014
parent "eles" <eles215 gzk.dot> writes:
On Sunday, 19 October 2014 at 10:45:31 UTC, Ola Fosheim Grøstad 
wrote:
 On Sunday, 19 October 2014 at 09:04:59 UTC, eles wrote:
I mostly agree with all that you are saying, still I am aware that much effort and coordination will be needed. OTOH, this would give D (and/aka the future of computing) a non-negligible edge (being able to optimize across libraries).
 Some binary format allow extra meta-info, so it is possible… in 
 the long term.
Debug builds could be re-used for that, with some minor modifications, I think.
 Another idea would be to simply make the in and out contracts 
 of a function exposed in the corresponding .di file, or at 
 least a part of them (we could use "public" for those).
That's an option. Always good to start with something simple, but with an eye for a more generic/powerful/unified solution in the future.
I think it would not turn out that bad. For the time being, putting the contracts in the .di files would cost next to nothing (but disk space). And, progressively, the compiler could be made to integrate those, when the .di files with contracts are available, in order to optimize the builds.

It would be plain D code, so very easy to interpret. Basically, the optimizer would have at hand the set of asserts that limit the behaviour of that function.

Anybody else who would like to comment on this?
 But D3
People here traditionally don't like that word, but it has been unleashed several times on the forum. Maybe it is not that stringent a need, but I think that a somewhat disruptive "clean, clarify and fix glitches and bad legacy" release of D(2) is more and more needed, and quite accepted as a good thing by the community (which is ready to take the effort to bring code up to date).
Oct 19 2014
prev sibling next sibling parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Fri, 17 Oct 2014 09:46:48 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 It is just plain wrong to let integers wrap by default in an
 accessible result. That is not integer behaviour.
do you know any widespread hardware which doesn't work this way? yet i know a very widespread language which doesn't care. by a strange coincidence programs in this language tend to have endless problems with overflows.
 The correct
 thing to do is to inject overflow checks in debug mode and let
 overflow in results (that are accessed) be undefined.
the correct thing is not to turn perfectly defined operations into undefined ones.
 Otherwise
 you end up giving the compiler a difficult job:

 uint y=x+1;
 if (x < y){…}

 Should be optimized to:

 {…}
no, it shouldn't. at least not until there is something like 'if_carry_set'.
 In D (and C++) you would get:

 if (x < ((x+1)&0xffffffff)){…}
perfect. nice and straightforward way to do overflow checks.
 In D it is even worse since you are forced to use a fixed size
 modulo even for int, so you cannot do 32 bit arithmetic in a 64
 bit register without getting extra modulo operations.
why should i, as programmer, care? what i *really* care about is portable code. having size of base types not strictly defined is not helping at all.
 So, "undefined behaviour" is not so bad
yes, it's not bad, it's terrible. having "undefined behavior" in language is like saying "hey, we don't know what to do with this, and we don't want to think about it. so we'll turn our problem into your problem. have a nice day, sucker!"
 You could for instance say that overflow on ints leads to an
 unknown value, but no other side effects. That was probably the
 original intent for C, but compiler writers have taken it a step
 further…
how does this differ from the current interpretation?
 D has locked itself to Pentium-style x86 behaviour.
oops. 2's complement integer arithmetic is "pentium-style x86" now... i bet x86_64 does everything in ternary, right? oh, and how about pre-pentium era?
 Unfortunately
 it is very difficult to have everything be well-defined in a low
 level programming language. It isn't even obvious that a byte
 should be 8 bits
it is very easy. take current hardware, evaluate it's popularity, do what most popular hardware does. that's it. i, for myself, don't need a language for "future hardware", i need to work with what i have now. if we'll have some drastic changes in the future... well, we always can emulate old HW to work with old code, and rewrite that old code for new HW.
Oct 17 2014
next sibling parent reply "Ola Fosheim Grøstad" writes:
On Friday, 17 October 2014 at 13:44:24 UTC, ketmar via 
Digitalmars-d wrote:
 On Fri, 17 Oct 2014 09:46:48 +0000
 via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 It is just plain wrong to let integers wrap by default in an 
 accessible result. That is not integer behaviour.
 do you know any widespread hardware which doesn't work this way?
Yes, the carry flag is set if you add with carry. It means you SHOULD add to another hi-word with carry. :P

You can also add with clamp with SSE, so you clamp to max/min. Too bad languages don't support it. I've always thought it would be nice to have clamp operators, so you can say x(+)y and have the result clamped to the max/min values. Useful for stuff like DSP on integers.
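For illustration, a clamping add can be written as a plain helper today (satAdd is a hypothetical name; a real implementation would presumably map to the saturating SSE instructions):

int satAdd(int a, int b)
{
    long r = cast(long)a + b;         // do the add in a wider register
    if (r > int.max) return int.max;  // clamp instead of wrapping
    if (r < int.min) return int.min;
    return cast(int)r;
}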
 if (x < ((x+1)&0xffffffff)){…}
perfect. nice and straightforward way to do overflow checks.
Uh, so you want slow? If you want this you should also check the overflow flag so that you can catch overflows and throw an exception. But then you have a high level language. All high level languages should do this.
 In D it is even worse since you are forced to use a fixed size 
 modulo even for int, so you cannot do 32 bit arithmetic in a 
 64 bit register without getting extra modulo operations.
why should i, as programmer, care? what i *really* care about is portable code. having size of base types not strictly defined is not helping at all.
So you want to have lots of masking on your shiny new 64-bit register only CPU, because D is stuck on promoting to 32-bits by spec? That's not portable, that is "portable".
 So, "undefined behaviour" is not so bad
yes, it's not bad, it's terrible. having "undefined behavior" in language is like saying "hey, we don't know what to do with this, and
Nah, it is saying: if your code is wrong then you will get wrong results unless you turn on runtime checks. What D is saying is: nothing is wrong even if you get something you never wanted to express, because we specify all operations to be boundless (circular) so that nothing can be wrong by definition (but your code will still crash and burn). That also means that you cannot turn on runtime checks, since it is by definition valid. No way for the compiler to figure out if it is intentional or not.
 D has locked itself to Pentium-style x86 behaviour.
oops. 2's complement integer arithmetic is "pentium-style x86" now... i bet x86_64 does everything in ternary, right? oh, and how about pre-pentium era?
The overhead for doing 64bit calculations is marginal. Locking yourself to 32bit is a bad idea.
 it is very easy. take current hardware, evaluate it's 
 popularity, do
 what most popular hardware does. that's it. i, for myself, 
 don't need
 a language for "future hardware", i need to work with what i 
 have now.
My first computer had no division or multiply and 8 bit registers and was insanely popular. It was inconceivable that I would afford anything more advanced in the next decade. In the next 5 years I had two 16 bit computers, one with 16x RAM and GPU… and at a much lower price…
 if we'll have some drastic changes in the future... well, we 
 always can
 emulate old HW to work with old code, and rewrite that old code 
 for new
 HW.
The most work on a codebase is done after it ships.

Interesting things may happen on the hardware side in the next few years:

- You'll find info on the net where Intel has planned buffered transactional memory for around 2017.
- AMD is interested in CPU/GPU integration/convergence
- Intel has a many core "co-processor"
- SIMD registers are getting wider and wider… 512 bits is a lot!

etc...
Oct 17 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Fri, 17 Oct 2014 14:38:29 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 It is just plain wrong to let integers wrap by default in an
 accessible result. That is not integer behaviour.
do you know any widespread hardware which doesn't work this way?
Yes, the carry flag is set if you add with carry. It means you SHOULD add to another hi-word with carry. :P
i was writing about 'if_carry_set'. yes, i really-really-really want either "propagating carry" if 'if_carry_flag_set', or a way to tell the compiler "do overflow check on this expression and throw exception on overflow".
 You can also add with clamp with SSE, so you clamp to max/min.
 Too bad languages don't support it. I've always thought it would be
 nice to have clamp operators, so you can say x(+)y and have the
 result clamped to the max/min values. Useful for stuff like DSP
 on integers.
it's good. but this not justifies the decision to make 2's complement overflow undefined.
 if (x < ((x+1)&0xffffffff)){…}
perfect. nice and straightforward way to do overflow checks.
Uh, so you want slow? If you want this you should also check the overflow flag so that you can catch overflows and throw an exception.
i want a way to check integer overflows. i don't even want to think about dirty C code to do that ('cause, eh, our compilers are very smart and they, eh, know that there must be no overflows on ints, and they, eh, just remove some checks 'cause those checks are no-ops when there are no overflows, and now we, eh, have to cheat the compiler to... screw it, i'm going home!)
 So you want to have lots of masking on your shiny new 64-bit
 register only CPU, because D is stuck on promoting to 32-bits by
 spec?
yes. what's wrong with using long/ulong when you need 64 bits? i don't care about the work the CPU must perform to execute my code; the CPU was created to help me, not vice versa. yet i really care about 'int' being the same size on different architectures (size_t, you sux! i want you to go away!).
 That's not portable, that is "portable".
it's portable. and "portable" is when i should making life of some silicon crap easier instead of silicon crap making my life easier.
 Nah, it is saying: if your code is wrong then you will get wrong
 results unless you turn on runtime checks.
...and have a nice day, sucker!
 What D is saying is: nothing is wrong even if you get something
 you never wanted to express, because we specify all operations to
 be boundless (circular) so that nothing can be wrong by
 definition (but your code will still crash and burn).
perfect!
 That also means that you cannot turn on runtime checks, since it
 is by definition valid. No way for the compiler to figure out if
 it is intentional or not.
if you want such checks, you have a choice. you can either do such checks manually or use something like CheckedInt. this way, when i see a CheckedInt variable, i know the programmer's intentions from the start. and if a programmer is using a simple 'int', i know that the compiler will not "optimize away" some checking code.
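for the record, a minimal sketch of that explicit-check style using druntime's core.checkedint (assuming a druntime recent enough to ship it; addOrDie is just an example name):

uint addOrDie(uint x, uint y)
{
    import core.checkedint : addu;
    bool overflow = false;
    uint sum = addu(x, y, overflow);   // wrapping add that also reports overflow
    if (overflow)
        assert(0, "unsigned add wrapped");
    return sum;
}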
 The overhead for doing 64bit calculations is marginal. Locking
 yourself to 32bit is a bad idea.
did you noticed long/ulong types in D specs? and reserved 'cent' type for that matter?
 My first computer had no division or multiply and 8 bit registers
 and was insanely popular. It was inconceivable that I would
 afford anything more advanced in the next decade. In the next 5
 years I had two 16 bit computers, one with 16x RAM and GPU… and
 at a much lower price…
that's why you don't use assembler to write your code now, do you? i was trying to use C for Z80, and that wasn't a huge success that days. why do you want to make my life harder by targeting D2 to some imaginary "future hardware" instead of targeting the current one? by the days when "future hardware" will become "current hardware" we will have D5 or so. nobody using K&R C now, right?
 The most work on a codebase is done after it ships.
porting to another arch, for example. where... ah, FSCK, int is 29 bits there! shit! or 16 bits... "portable by rewriting", yeah.
 Interesting things may happen on the hardware side in the next
 few years:

 - You'll find info on the net where Intel has planned buffered
 transactional memory for around 2017.
(looking at 'date' output) ok, it's 2014 now. and i see no such HW around. let's talk about this when we'll have widespreaded HW with this feature.
 - AMD is interested in CPU/GPU integration/convergence
and me not. but i'm glad for AMD.
 - Intel has a many core "co-processor"
and?..
 - SIMD registers are getting wider and wider… 512 bits is a lot!
and i must spend time to make some silicon crap happy, again? teach compilers to transparently rewrite my code, i don't want to be a slave to CPUs.
Oct 17 2014
parent reply "Ola Fosheim Grøstad" writes:
On Friday, 17 October 2014 at 15:17:12 UTC, ketmar via 
Digitalmars-d wrote:
 it's good. but this not justifies the decision to make 2's 
 complement
 overflow undefined.
If you want a circular type, then call it something to that effect. Not uint or int. Call it bits or wrapint.
 yes. what's wrong with using long/ulong when you need 64 bits?
What is wrong and arbitrary is promoting to 32-bits by default.
 did you noticed long/ulong types in D specs? and reserved 
 'cent' type
 for that matter?
If you want fixed width, make it part of the name: i8, i16, i24, i32, i64… Seriously, if you are going to stick to fixed register sizes you have to support 24 bit and other common register sizes too. Otherwise you'll get 24bit wrapping 32bit ints.
 i was trying to use C for Z80, and that wasn't a huge success 
 that days.
How did you manage to compile with it? ;-) The first good programming tool I had was an assembler written in Basic… I had to load the assembler from tape… slooow. And if my program hung I had to reset and reload it. Patience… Then again, that makes you a very careful programmer ;)
 have D5 or so. nobody using K&R C now, right?
ANSI-C is pretty much the same, plenty of codebases are converted over from K&R. With roots in the 70s… :-P
 around. let's talk about this when we'll have widespreaded HW 
 with this feature.
That goes real fast, because it is probably cheaper to have it built into all CPUs of the same generation and just disable it on the ones that have to be sold cheap because they are slow/market demand.
 and i must spend time to make some silicon crap happy, again?
If you want a high level language, no. If you want a system level language, yes!!!!!!!!!!
Oct 17 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Fri, 17 Oct 2014 15:58:02 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On Friday, 17 October 2014 at 15:17:12 UTC, ketmar via
 Digitalmars-d wrote:
 it's good. but this not justifies the decision to make 2's
 complement
 overflow undefined.
If you want a circular type, then call it something to that effect. Not uint or int. Call it bits or wrapint.
it would be nice. but i'm still against "integral wrapping is undefined". define it! either specify the result, or force program to crash.
 yes. what's wrong with using long/ulong when you need 64 bits?
What is wrong and arbitrary is promoting to 32-bits by default.
64-bit ARMs aren't so widespread yet. oh, wait, are we talking about x86-compatible CPUs only? but why?
 did you noticed long/ulong types in D specs? and reserved
 'cent' type
 for that matter?
If you want fixed width, make it part of the name: i8, i16, i24, i32, i64…
i have nothing against this either. but i have alot against "integral with arbitrary size" type.
 Seriously, if you are going to stick to fixed register sizes you
 have to support 24 bit and other common register sizes too.
 Otherwise you'll get 24bit wrapping 32bit ints.
nope. if int is 32 bit, and its behavior is defined as 2's complement 32-bit value, it doesn't matter what register size HW has. it's compiler task to make this int behave right.
 i was trying to use C for Z80, and that wasn't a huge success
 that days.
How did you manage to compile with it? ;-)
it was... painful. even with disk drive. having only 64KB of memory (actually, only 48K free for use) doesn't help much too.
 Then again, that makes you a very careful programmer ;)
that almost turned me to serial killer.
 around. let's talk about this when we'll have widespreaded HW
 with this feature.
That goes real fast, because it is probably cheaper to have it built into all CPUs of the same generation and just disable it on the ones that have to be sold cheap because they are slow/market demand.
i don't buying that "we'll made that pretty soon" PR. first they making it widespreaded, then i'll start caring, not vice versa.
 and i must spend time to make some silicon crap happy, again?
If you want a high level language, no.

If you want a system level language, yes!!!!!!!!!!
this is a misconception. "low level language" is not one that pleases CPU down to bits and registers, it's about *conceptions*.

for example, good high-level language doesn't need pointers, yet low-level one needs 'em. good high-level language makes a lot of checks automatically (range checking, overflow checking and so on), good low-level language allows programmer to control what will be checked and how. good high-level language can transparently use bigints on overflow, good low-level language has clearly defined semantics of integer overflow and defined sizes for integral types. and so on.

"going low-level" is not about pleasing CPU (it's not assembler), it's about writing "low-level code" -- one with pointers, manual checks and such.
Oct 17 2014
parent reply "Ola Fosheim Grøstad" writes:
On Friday, 17 October 2014 at 16:26:08 UTC, ketmar via 
Digitalmars-d wrote:
 i have nothing against this either. but i have alot against 
 "integral with arbitrary size" type.
Actually it makes a lot of sense to be able to reuse 16-bit library code on a 24-bit ALU. Like for loading a sound at 16-bit then process it at 24-bit.
 nope. if int is 32 bit, and its behavior is defined as 2's 
 complement
 32-bit value, it doesn't matter what register size HW has. it's
 compiler task to make this int behave right.
And that will result in slow code.
 Then again, that makes you a very careful programmer ;)
that almost turned me to serial killer.
Yeah, IIRC I "cracked" it and put it on a diskette after a while…
 i don't buying that "we'll made that pretty soon" PR. first 
 they making
 it widespreaded, then i'll start caring, not vice versa.
C++ has a workgroup on transactional memory with expertise… So, how long can you wait with planning for the future before being hit by the train? You need to be ahead of the big mover if you want to gain positions in multi-threading (which is the most important area that is up for grabs in system level programming these days).
 this is a misconception. "low level language" is not one that 
 pleases
 CPU down to bits and registers, it's about *conceptions*.
 for example, good high-level language doesn't need pointers, yet
 low-level one needs 'em.
Bad example. Low level languages need pointers because the hardware uses 'em. If you have a non-standard memory model you need to deal with different aspects of pointers too (like segments or bank switching).

If you cannot efficiently compute existing libraries on 24-bit, 48-bit or 64-bit ALUs then the programming language is tied to a specific CPU. That is not good and it will have problems being viewed as a general system level programming language. A system level language should not force you to be overly abstract in a manner that affects performance or restricts flexibility.
Oct 17 2014
parent ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Fri, 17 Oct 2014 16:49:10 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 i have nothing against this either. but i have alot against
 "integral with arbitrary size" type.
Actually it makes a lot of sense to be able to reuse 16-bit library code on a 24-bit ALU. Like for loading a sound at 16-bit then process it at 24-bit.
and this is perfectly doable with fixed-size ints. just use 16-bit library when you can and write new code when you can't.
 nope. if int is 32 bit, and its behavior is defined as 2's
 complement
 32-bit value, it doesn't matter what register size HW has. it's
 compiler task to make this int behave right.
And that will result in slow code.
i prefer slow code over incorrect code. if i find that some code is a real bottleneck (using a profiler, of course; almost no one can make the right guess here ;-), i'll write an arch-dependent assembler part to replace the slow code. but i prefer first to get it working after recompiling, and then start to optimize. what i don't want is to think each time "what if int has a different size here?" that's why i'm using types from stdint.h in my C code instead of just "int", "long" and so on.
 i don't buying that "we'll made that pretty soon" PR. first
 they making
 it widespreaded, then i'll start caring, not vice versa.
C++ has a workgroup on transactional memory with expertise… So, how long can you wait with planning for the future before being hit by the train?
indefinitely long if language guarantees that my existing code will continue to work as expected.
 You need to be ahead of the big mover if you want to gain
 positions in multi-threading (which is the most important area
 that is up for grabs in system level programming these days).
i don't care about positions. what i care about is a language with defined behavior. besides, threads sux. ;-) anyway, it's not hard to add a "transaction {}" block (from the language design POV, not from the implementor's POV). ;-)
 for example, good high-level language doesn't need pointers, yet
 low-level one needs 'em.
Bad example. Low level languages need pointers because the hardware uses 'em.
i really can't imagine hardware without pointers.
 If you have a non-standard memory model you
 need to deal with different aspects of pointers too (like segments
 or bank switching).
this must be accessible, but hidden from me unless i explicitly ask about gory details.
 If you cannot efficiently compute existing libraries on 24-bit,
 48-bit or 64-bit ALUs then the programming language is tied to a
 specific CPU.
are you saying that 32-bit operations on 64-bit CPUs sux? then those CPUs sux. throw 'em away. besides, having guaranteed and well-defined integer sizes and overflow values is what makes using such libs on different architectures possible. what *really* ties code to a CPU is "int size depends on host CPU", "overflow result depends on host CPU" and other such things.
 That is not good and it will have problems being
 viewed as a general system level programming language.
nope. this is the problem of HW designers and compiler writers, not language problem. i still can't understand why i must write my code to please C compiler. weren't compilers invented to please *me*? i'm not going to serve the servants.
 A system level language should not force you to be overly
 abstract in a manner that affects performance or restricts
 flexibility.
system level language must provide the ability to go to "CPU level" if the programmer wants that, but it *must* abstract away unnecessary details by default. it's way easier to have not superefficient, but working code first and continually refine it than trying to write it the hard way from the start.
Oct 17 2014
prev sibling parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 17 October 2014 at 13:44:24 UTC, ketmar via 
Digitalmars-d wrote:
 On Fri, 17 Oct 2014 09:46:48 +0000
 via Digitalmars-d <digitalmars-d puremagic.com> wrote:
 In D (and C++) you would get:
 
 if (x < ((x+1)&0xffffffff)){…}
perfect. nice and straightforward way to do overflow checks.
Besides, the code uses x + 1, so the code is already in undefined state. It's just as wrong as the "horrible code with UB" we were trying to avoid in the first place. So much for convincing me that it's a good idea...
Oct 18 2014
parent reply "Ola Fosheim Grøstad" writes:
On Saturday, 18 October 2014 at 08:22:25 UTC, monarch_dodra wrote:
 On Friday, 17 October 2014 at 13:44:24 UTC, ketmar via 
 Digitalmars-d wrote:
 On Fri, 17 Oct 2014 09:46:48 +0000
 via Digitalmars-d <digitalmars-d puremagic.com> wrote:
 In D (and C++) you would get:
 
 if (x < ((x+1)&0xffffffff)){…}
perfect. nice and straightforward way to do overflow checks.
It wasn't an overflow check as ketmar suggested… It was a check that should stay true, always for this instantiation. So the wrong code is bypassed on overflow, possibly missing a termination. The code would have been correct with an optimization that set it to true or with a higher resolution register.
 Besides, the code uses x + 1, so the code is already in 
 undefined state. It's just as wrong as the "horrible code with 
 UB" we were trying to avoid in the first place.

 So much for convincing me that it's a good idea...
Not sure if you are saying that modulo-arithmetic as a default is a bad or good idea?

In D (and C++ for uint) it is modulo-arithmetic so it is defined as a circular type with a discontinuity which makes reasoning about integers harder.
Oct 18 2014
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Saturday, 18 October 2014 at 23:10:15 UTC, Ola Fosheim Grøstad 
wrote:
 On Saturday, 18 October 2014 at 08:22:25 UTC, monarch_dodra 
 wrote:
 Besides, the code uses x + 1, so the code is already in 
 undefined state. It's just as wrong as the "horrible code with 
 UB" we wère trying to avoid in the first place.

 So much for convincing me that it's a good idea...
Not sure if you are saying that modulo-arithmetic as a default is a bad or good idea?
Op usually suggested that all overflows should be undefined behavior, and that you could "pre-emptively" check for overflow with the above code. The code provided itself overflowed, so it was also undefined.

What I'm pointing out is that working with undefined behavior overflow is exceptionally difficult, see later.
 In D (and C++ for uint) it is modulo-arithmetic so it is
 defined as a circular type with a discontinuity which makes
 reasoning about integers harder.
What's interesting is that overflow is only defined for unsigned integers. Signed integer overflow is *undefined*, and GCC *will* optimize away any conditions that rely on it.

One thing I am certain of is that making overflow *undefined* is *much* worse than simply having modulo arithmetic. In particular, implementing trivial overflow checks is much easier for the average developer. And worst case scenario, you can still have library-defined checked integers.
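For example, here is the kind of trivial check that defined (modulo) overflow makes reliable, and that a C compiler may legally delete for signed operands (a small D sketch; addWouldWrap is just an example name):

bool addWouldWrap(uint a, uint b)
{
    uint sum = a + b;   // well-defined wrap-around in D
    return sum < a;     // true exactly when the addition wrapped
}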
Oct 19 2014
next sibling parent reply Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 19 Oct 2014 09:40, "monarch_dodra via Digitalmars-d" <
digitalmars-d puremagic.com> wrote:
 On Saturday, 18 October 2014 at 23:10:15 UTC, Ola Fosheim Grøstad wrote:
 On Saturday, 18 October 2014 at 08:22:25 UTC, monarch_dodra wrote:
 Besides, the code uses x + 1, so the code is already in undefined
 state. It's just as wrong as the "horrible code with UB" we were trying
 to avoid in the first place.
 So much for convincing me that it's a good idea...
 Not sure if you are saying that modulo-arithmetic as a default is a bad
 or good idea?
 Op usually suggested that all overflows should be undefined behavior, and
 that you could "pre-emptively" check for overflow with the above code. The
 code provided itself overflowed, so it was also undefined.
 What I'm pointing out is that working with undefined behavior overflow is
 exceptionally difficult, see later.
 In D (and C++ for uint) it is modulo-arithmetic so it is defined as a
 circular type with a discontinuity which makes reasoning about integers harder.
 What's interesting is that overflow is only defined for unsigned integers.
 Signed integer overflow is *undefined*, and GCC *will* optimize away any
 conditions that rely on it.

Good thing that overflow is strictly defined in D then. You can rely on
overflowing to occur rather than be optimised away.

Iain.
Oct 19 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/19/2014 1:56 AM, Iain Buclaw via Digitalmars-d wrote:
 Good thing that overflow is strictly defined in D then. You can rely on
 overflowing to occur rather than be optimised away.
Yeah, but one has to be careful when using a backend designed for C that it doesn't use the C semantics on that anyway. (I know the dmd backend does do the D semantics.)
Oct 19 2014
parent "Ola Fosheim Grøstad" writes:
On Monday, 20 October 2014 at 06:17:40 UTC, Walter Bright wrote:
 On 10/19/2014 1:56 AM, Iain Buclaw via Digitalmars-d wrote:
 Good thing that overflow is strictly defined in D then. You 
 can rely on
 overflowing to occur rather than be optimised away.
Yeah, but one has to be careful when using a backend designed for C that it doesn't use the C semantics on that anyway.
8-I And here I was hoping that Iain was being ironic!

If you want to support wrapping you could do it like this:

int x = wrapcalc( y + DELTA );

And clamping:

int x = clampcalc( y + DELTA );

And overflow:

int x = y + DELTA;
if (x.status != 0) {
    x.status.carry…
    x.status.overflow…
}

or

if ( overflowed( x = a+b+c+d ) ) {
    if ( overflowed( x = cast(somebigint)a+b+c+d ) ) {
        throw …
    }
}

or

int x = throw_on_overflow(a+b+c+d)
Oct 20 2014
prev sibling parent reply "Ola Fosheim Grøstad" writes:
On Sunday, 19 October 2014 at 08:37:54 UTC, monarch_dodra wrote:
 On Saturday, 18 October 2014 at 23:10:15 UTC, Ola Fosheim 
 Grøstad wrote:
 In D (and C++ for uint) it is modulo-arithmetic so it is
 defined as a circular type with a discontinuity which makes
 reasoning about integers harder.
 What's interesting is that overflow is only defined for unsigned integers. Signed integer overflow is *undefined*, and GCC *will* optimize away any conditions that rely on it.
I don't agree with how C/C++ defines arithmetic. I think integers should exhibit monotonic behaviour over addition and multiplication, and that the compiler should assume, prove or assert that assigned values are within bounds according to the programmer's confidence and specification. It is the programmer's responsibility to make sure that results stay within the type boundaries, or to configure the compiler so that violations will be detected.

If you provide value ranges for integers then the compiler could flag all values that are out of bounds and force the programmer to explicitly cast them back to the restricted type. If you default to 64 bit addition then getting overflows within simple expressions is not the most common problem.
 One thing I am certain of is that making overflow *undefined*
 is *much* worse than simply having modulo arithmetic. In
 particular, implementing trivial overflow checks is much easier
 for the average developer. And worst case scenario, you can
 still have library-defined checked integers.
One big problem with that view is that "a < a+1" is not an overflow check, it is the result of aliasing. It should be optimized to true. That is the only sustainable interpretation from a correctness point of view. Even if "a < a+1" is meant to be an overflow check it completely fails if a is a short since it is promoted to int. So this is completely stuck in 32-bit land. In C++ you should default to int and avoid uint unless you do bit manipulation according to the C++ designers. There are three reasons: speed, portability to new hardware and correctness.
Oct 19 2014
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Sunday, 19 October 2014 at 09:56:44 UTC, Ola Fosheim Grøstad 
wrote:
 In C++ you should default to int and avoid uint unless you do 
 bit manipulation according to the C++ designers.

 There are three reasons: speed, portability to new hardware and 
 correctness.
Speed: How so?

Portability: One issue to keep in mind is that C works on *tons* of hardware. C allows hardware to follow either two's complement, or one's complement. This means that, at best, signed overflow can be implementation defined, but not defined by spec. Unfortunately, it appears C decided to outright go the undefined way.

Correctness: IMO, I'm not even sure. Yeah, use int for numbers, but stick to size_t for indexing. I've seen too many bugs on x64 software when data becomes larger than 4G...
Oct 19 2014
parent "Ola Fosheim Grøstad" writes:
On Sunday, 19 October 2014 at 10:22:37 UTC, monarch_dodra wrote:
 Speed: How so?
All kinds of situations where you can prove that "expression1 > expression2" holds, but have no bounds on the variables.
 Portability: One issue to keep in mind is that C works on 
 *tons* of hardware. C allows hardware to follow either two's 
 complement, or one's complement. This means that, at best, 
 signed overflow can be implementation defined, but not defined 
 by spec. Unfortunately, it appears C decided to outright go the 
 undefined way.
I think you might be able to make it defined like this:

1. overflow is illegal and should not limit reasoning about monotonicity
2. after overflow, accessing a derived result can lead to a value where the overflow either led to a higher bit representation which was propagated, or led to a value which was truncated

This is slightly different from "undefined". :-)
 Correctness: IMO, I'm not even sure. Yeah, use int for numbers, 
 but stick to size_t for indexing. I've seen too many bugs on 
 x64 software when data becomes larger than 4G...
Sure, getting C types right and correct is tedious. The type system does not help you a whole lot. And D and C++ do not make it a lot better. Maybe the implicit conversions are a bad thing. In machine language there is often no difference between signed and unsigned instructions, which can be handy, but the typedness of multiplication, "u64 mul(u32 a, u32 b)", is actually better than in C languages. Multiplication over int is dangerous!

Before compilers got good at optimization I viewed C as an annoying assembler. I assumed wrapping behaviour and wanted an easy way to reinterpret_cast between ints and uints (in C it gets rather ugly). These days I take the view that programmers should be explicit about "bit-crushing" operations. Maybe even for multiplication.

If you are forced to explicitly truncate() when the compiler fails to rule out overflow then the problem areas also become more visible in the source code:

uint r = a*b/N

might overflow badly even if r is large enough to hold the result, while

uint r = truncate(a*b/N)

makes you aware that you are on thin ice.
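A hedged sketch of that hypothetical truncate(): do the arithmetic in a wider type and make the narrowing a single, visible step.

uint truncate(ulong v)
{
    return cast(uint)v;   // the one visible "bit-crushing" point
}

// usage: the intermediate product is computed in 64 bits
// uint r = truncate(cast(ulong)a * b / N);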
Oct 19 2014
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/17/2014 2:46 AM, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com>" wrote:
 It isn't even obvious that a byte should be 8 bits,
Oh come on! http://dlang.org/type.html
Oct 18 2014
parent "Ola Fosheim Grøstad" writes:
On Saturday, 18 October 2014 at 23:45:37 UTC, Walter Bright wrote:
 On 10/17/2014 2:46 AM, "Ola Fosheim Grøstad" 
 <ola.fosheim.grostad+dlang gmail.com>" wrote:
 It isn't even obvious that a byte should be 8 bits,
Oh come on!
Hey, that was a historically motivated reflection on the smallest addressable unit. Not obvious that it should be 8 bit. :9
Oct 18 2014
prev sibling next sibling parent reply Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 16 October 2014 22:00, bearophile via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 Just found with Reddit. C seems one step ahead of D with this:

 http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/
*cough* GDC *cough* :o)
Oct 17 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/17/14, 2:53 AM, Iain Buclaw via Digitalmars-d wrote:
 On 16 October 2014 22:00, bearophile via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 Just found with Reddit. C seems one step ahead of D with this:

 http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/
*cough* GDC *cough* :o)
Do you mean ubsan will work with gdc? -- Andrei
Oct 17 2014
next sibling parent ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Fri, 17 Oct 2014 08:08:34 -0700
Andrei Alexandrescu via Digitalmars-d <digitalmars-d puremagic.com>
wrote:

 Do you mean ubsan will work with gdc? -- Andrei
as far as i can understand, ubsan is a GCC feature. not "GCC C compiler", but "GNU Compiler Collection". it works on IR representations, so GDC should be able to use ubsan almost automatically, without significant effort from Iain. at least it looks like this.
Oct 17 2014
prev sibling parent Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 17 October 2014 16:08, Andrei Alexandrescu via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 10/17/14, 2:53 AM, Iain Buclaw via Digitalmars-d wrote:
 On 16 October 2014 22:00, bearophile via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 Just found with Reddit. C seems one step ahead of D with this:


 http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/
*cough* GDC *cough* :o)
Do you mean ubsan will work with gdc? -- Andrei
It doesn't out of the box, but adding in front-end support is a small codegen addition for each plugin you wish to support. The rest is taken care of by GCC. Iain.
Oct 17 2014
prev sibling next sibling parent reply "eles" <eles eles.com> writes:
On Thursday, 16 October 2014 at 21:00:18 UTC, bearophile wrote:
 Just found with Reddit. C seems one step ahead of D with this:

 http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/

 Bye,
 bearophile
"Not every software bug has as serious consequences as seen in the Ariane 5 rocket crash." "if ubsan detects any problem, it outputs a “runtime error:” message, and in most cases continues executing the program." The latter won't really solve the former...
Oct 17 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/17/14, 3:51 AM, eles wrote:
 On Thursday, 16 October 2014 at 21:00:18 UTC, bearophile wrote:
 Just found with Reddit. C seems one step ahead of D with this:

 http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/


 Bye,
 bearophile
"Not every software bug has as serious consequences as seen in the Ariane 5 rocket crash." "if ubsan detects any problem, it outputs a “runtime error:” message, and in most cases continues executing the program." The latter won't really solve the former...
Still a step forward. -- Andrei
Oct 17 2014
parent "eles" <eles eles.com> writes:
On Friday, 17 October 2014 at 15:10:33 UTC, Andrei Alexandrescu 
wrote:
 On 10/17/14, 3:51 AM, eles wrote:
 On Thursday, 16 October 2014 at 21:00:18 UTC, bearophile wrote:
 Just found with Reddit. C seems one step ahead of D with this:
 Still a step forward. -- Andrei
While I agree, IIRC, Ariane was never tested in that particular flight configuration that caused the bug (which was not a Heisenbug, as it was easy to reproduce, but, you know, *afterwards*). Now, imagine Ariane in space encountering a runtime error. Go back to Earth, anyone?... I specifically referred to the crash itself.
Oct 17 2014
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 10/16/2014 2:00 PM, bearophile wrote:
 Just found with Reddit. C seems one step ahead of D with this:

 http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/
On the other hand, D is one step ahead of C with many of those (they are part of the language, not an add-on tool). Anyhow, for the remainder, https://issues.dlang.org/show_bug.cgi?id=13636
Oct 18 2014