digitalmars.D - Treating the abusive unsigned syndrome
- Andrei Alexandrescu (66/66) Nov 25 2008 D pursues compatibility with C and C++ in the following manner: if a
- Denis Koroskin (8/74) Nov 25 2008 I think it's fine. That's the way the LLVM stores the integral values
- Andrei Alexandrescu (7/104) Nov 25 2008 Yah, but at least you actively asked for an unsigned. Compare and
- bearophile (17/26) Nov 25 2008 I didn't know of such "support" for C++ syntax too, isn't such "support"...
- bearophile (4/5) Nov 25 2008 Oh, yes :-) and writing "lenght" instead of "lenght" is a common mistake...
- Steven Schveighoffer (2/6) Nov 25 2008 lol!!!
- bearophile (40/41) Nov 25 2008 I know, I know... :-) But when people do errors so often, the error is e...
- Nick Sabalausky (8/58) Nov 25 2008 If we ever get extension methods, then maybe something along these lines...
- KennyTM~ (3/74) Nov 25 2008 Already works:
- Nick Sabalausky (5/79) Nov 26 2008 Oh, right. For some stupid reason I was forgetting that the param would
- bearophile (4/7) Nov 26 2008 From the len() code I have posted you can see there are other places whe...
- Andrei Alexandrescu (6/15) Nov 26 2008 I'm rather weary of a short and suggestive name that embodies a linear
- bearophile (7/11) Nov 26 2008 I remember parts of that discussion, and I like your general rule, and I...
- Andrei Alexandrescu (3/19) Nov 26 2008 If it's used often it shouldn't have linear complexity :o).
- Christopher Wright (12/30) Nov 26 2008 My personal rules of optimization:
- Kagamin (11/12) Nov 26 2008 hmm...
- Andrei Alexandrescu (6/19) Nov 25 2008 It's worthwhile keeping length an unsigned type if we can convincingly
- Andrei Alexandrescu (13/13) Nov 25 2008 I remembered a couple more details. The names bits8, bits16, bits32, and...
- Steven Schveighoffer (14/26) Nov 25 2008 One other thing to contemplate:
- Andrei Alexandrescu (10/43) Nov 25 2008 Good point. There's no (or not much) arithmetic mixing bits32 and some
- Sergey Gromov (16/31) Nov 25 2008 I'll add more. :)
- Andrei Alexandrescu (7/43) Nov 25 2008 Having semantics depend so heavily and confusingly on a compiler switch
- Sergey Gromov (5/18) Nov 25 2008 One of us should be missing something. There was no 'different
- Andrei Alexandrescu (3/22) Nov 25 2008 Sorry, I misunderstood.
- Russell Lewis (20/103) Nov 25 2008 I'm of the opinion that we should make mixed-sign operations a
- Andrei Alexandrescu (11/33) Nov 25 2008 (You may want to check your system's date, unless of course you traveled...
- bearophile (5/10) Nov 25 2008 That can be solved making array.length signed.
- Nick Sabalausky (26/34) Nov 25 2008 I disagree. If you start using that as a solution, then you may as well
- Kagamin (2/7) Nov 26 2008 Well... cutting out range can be no problem, after all a thought was flo...
- Daniel de Kok (8/16) Nov 25 2008 Is that conceptually clean/clear? (If so, I'd like to request an array o...
- Ary Borenszweig (4/12) Nov 26 2008 I agree. I proposed this some time ago. For example, even though C# has
- Sean Kelly (6/17) Nov 25 2008 Perhaps not, but the fact that constants are signed integers has been
- Andrei Alexandrescu (9/26) Nov 25 2008 Well with constants we can do many tricks; I mentioned an extreme
- Michel Fortin (24/34) Nov 26 2008 Then the problem is that integer literals are of a specific type. Just
- Andrei Alexandrescu (16/49) Nov 26 2008 Well that at best takes care of _some_ operations involving constants,
- Michel Fortin (34/55) Nov 26 2008 How does it not solve the problem. array.length is of type uint, 1 is
- Don (10/41) Nov 26 2008 Actually, there's no solution.
- Andrei Alexandrescu (47/95) Nov 26 2008 There is. We need to find the block of marble it's in and then chip the
- Michel Fortin (19/33) Nov 26 2008 That's because you're relying on a specific behaviour for overflows and
- Denis Koroskin (53/58) Nov 26 2008 Sure, it shouldn't compile. But explicit casting to either type won't
- Andrei Alexandrescu (29/99) Nov 26 2008 But "silently" and "putting a cast" don't go together. It's the cast
- Denis Koroskin (16/106) Nov 26 2008 Right, it is better. Problem is, you don't want to put checks like
- Don (30/82) Nov 27 2008 Here I think we have a fundamental disagreement: what is an 'unsigned
- Andrei Alexandrescu (10/54) Nov 27 2008 In fact we are in agreement. C tries to make it usable as both, and
- Don (12/72) Nov 27 2008 Well, it does make unsigned numbers (case (B)) quite obscure and
- Andrei Alexandrescu (21/97) Nov 27 2008 I think we're heading towards an impasse. We wouldn't want to make
- KennyTM~ (4/111) Nov 27 2008 So you mean long * int (e.g. 1234567890123L * 2) will return an int
- KennyTM~ (3/118) Nov 27 2008 Em, or do you mean the tightest type that can represent all possible
- Andrei Alexandrescu (11/127) Nov 27 2008 The tightest type possible depends on the operation. In that doctrine,
- Andrei Alexandrescu (24/156) Nov 27 2008 I just remembered a problem with simplemindedly going with the tightest
- Michel Fortin (31/35) Nov 28 2008 I think that'd be a must. Otherwise how would you define your own
- Don (25/130) Nov 28 2008 The problem with that, is that you're then forcing the 'unsigned is a
- Andrei Alexandrescu (15/45) Nov 28 2008 Sounds good. One important consideration is that modulo arithmetic is
- Don (12/66) Nov 28 2008 It's close, but how can code such as:
- Andrei Alexandrescu (10/80) Nov 28 2008 Code may be riddled with subtraction of lengths, but seems to be working...
- Don (6/89) Nov 28 2008 Yes. I think much existing code would fail with sizes over 2GB, though.
- Fawzi Mohamed (12/42) Dec 01 2008 I found a couple of instances where to compare addresses simply a-b was
- Derek Parnell (7/13) Nov 28 2008 It could be transformed by the compiler into more something like ...
- Frits van Bommel (7/18) Nov 28 2008 Then it'd have different behavior from
- Derek Parnell (17/24) Nov 28 2008 I see the problem a little differently. To me, "x.length - y.length" is
- Sean Kelly (5/11) Nov 28 2008 This is why I never understood ptrdiff_t in C. Having to choose between...
- Sean Kelly (10/16) Nov 25 2008 I'll address your actual suggestion separately, but personally, I always...
- Don (26/46) Nov 26 2008 I think that most of these problems are caused by C enforcing a foolish
- Andrei Alexandrescu (17/65) Nov 26 2008 Yah, polysemy will take care of the constants. It's also rather easy to
- Sean Kelly (10/14) Nov 26 2008 What /is/ the appropriate type here? For example:
- Andrei Alexandrescu (25/42) Nov 26 2008 There are several schools of thought (for the lack of a better phrase):
- Lars Kyllingstad (18/48) Nov 26 2008 How about 1.5, the Somewhat Practical but Still Purist Mathematician? He...
- Lars Kyllingstad (5/57) Nov 26 2008 Another point: nint would also be implicitly castable to uint and so on,...
- Kagamin (2/7) Nov 27 2008 I thought, mathematics doesn't distinguish between, say, natural 5, inte...
- Andrei Alexandrescu (3/16) Nov 27 2008 Right, but the notion of set closedness for an operation comes from math...
- Sergey Gromov (6/30) Nov 26 2008 I'm totally with Don here. In math, natural numbers are a subset if
- Andrei Alexandrescu (7/37) Nov 26 2008 That's also a possibility - consider unsigned types just "bags of bits"
- Sergey Gromov (6/44) Nov 26 2008 I guess so. Actually, simply disallowing signed<=>unsigned cast and
- bearophile (13/19) Nov 26 2008 I don't know what the solution is, but I am very happy to see that in th...
- Kagamin (4/10) Nov 27 2008 I don't think that large integers know or respect computers-specific int...
- Andrei Alexandrescu (9/25) Nov 27 2008 Problem is there is an odd jump whenever the sign bit gets into play. An...
- Walter Bright (3/6) Nov 26 2008 SafeD is about memory safety, i.e. no corrupted memory. Dealing with
- Sean Kelly (6/13) Nov 26 2008 This inspired me to think about where I use uint and I realized that I
- Andrei Alexandrescu (11/27) Nov 26 2008 For the record, I use unsigned types wherever there's a non-negative
- Sean Kelly (8/32) Nov 26 2008 To be fair, I generally use unsigned numbers for values that are
- Denis Koroskin (89/120) Nov 26 2008 If they can be more than 2Gb, why can't they be more than 4GB? It is
- Sean Kelly (15/62) Nov 27 2008 Bigger than 4GB on a 32-bit system? Files perhaps, but I'm talking
- Kagamin (5/7) Nov 26 2008 1) I see no danger here.
- Andrei Alexandrescu (3/12) Nov 26 2008 I didn't want runtime checks inserted, just to tighten compilation rules...
- bearophile (4/5) Nov 26 2008 The compiler may use both :-)
- Kagamin (2/3) Nov 26 2008 Why do you want to turn D into Python? You already has one. Just write i...
- bearophile (19/22) Nov 26 2008 The mistake I have shown of using "&&" instead of "&" or vice-versa, and...
- Kagamin (2/3) Nov 26 2008 copying G++ is not always a good idea :) As I remember this alternative ...
- Kagamin (2/5) Nov 26 2008 that thread is about an extra compiler warning (which is always good), n...
- bearophile (7/9) Nov 26 2008 You seem unaware of the current stance of Walter towards warnings. And p...
- Nick Sabalausky (3/12) Nov 26 2008 Python has other issues.
- Michel Fortin (19/20) Nov 26 2008 Just a note here, because it seems to me you're confusing two issues
- Andrei Alexandrescu (10/29) Nov 26 2008 It's also a problem of signedness, considering that int can hold the
- Nick Sabalausky (10/31) Nov 26 2008 I'd love to see D get the ability to turn on/off runtime range checking,...
- Tomas Lindquist Olsen (15/15) Nov 26 2008 I'm not really sure what I think about all this. I try to always insert
- Christopher Wright (3/23) Nov 26 2008 On the other hand, the CPU can report on integer overflow, so you could
- Simen Kjaeraas (13/13) Nov 26 2008 The more I read about this, the more I am convinced that removing the
- Derek Parnell (56/66) Nov 27 2008 Interesting ... but I don't think that this should be the principle
- Andrei Alexandrescu (13/75) Nov 27 2008 These two principle are not necessarily at odds with each other. The
- Derek Parnell (12/48) Nov 27 2008 I think we are saying the same thing. If the C code compiles AND if it h...
- Andrei Alexandrescu (15/60) Nov 27 2008 Well here are two objective at odds with each other. One is the
- bearophile (10/13) Nov 28 2008 Some of the purposes of a good arithmetic are:
- Kagamin (3/9) Nov 28 2008 Yes, giving somethink up always feels like giving something up. But can ...
D pursues compatibility with C and C++ in the following manner: if a code snippet compiles in both C and D or C++ and D, then it should have the same semantics.

A classic problem with C and C++ integer arithmetic is that any operation involving at least one unsigned integral automatically receives an unsigned type, regardless of how silly that actually is, semantically. About the only advantage of this rule is that it's simple. IMHO it only has disadvantages from then on.

The following operations suffer from the "abusive unsigned syndrome" (u is an unsigned integral, i is a signed integral):

(1) u + i, i + u
(2) u - i, i - u
(3) u - u
(4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C requires that these all return unsigned, ouch)
(5) u < i, i < u, u <= i etc. (all ordering comparisons)
(6) -u

Logic operations &, |, and ^ also yield unsigned, but such cases are less abusive because at least the operation wasn't arithmetic in the first place. Comparing for equality is also quite a conundrum - should minus two billion compare equal to 2_294_967_296? I'll ignore these for now and focus on (1) - (6).

So far we haven't found a solid solution to this problem that at the same time allows "good" code to pass through, weeds out "bad" code, and is compatible with C and C++. The closest I got was to have the compiler define the following internal types:

__intuint
__longulong

I've called them "dual-signed integers" in the past, but let's try the shorter "undecided sign". Each of these is a subtype of both the signed and the unsigned integral in its name, e.g. __intuint is a subtype of both int and uint. (Originally I thought of defining __byteubyte and __shortushort as well but dropped them in the interest of simplicity.) The sign-ambiguous operations (1) - (6) yield __intuint if no operand size was larger than 32 bits, and __longulong otherwise.

Undecided sign types define their own operations. Let x and y be values of undecided sign. Then x + y, x - y, and -x also return a sign-ambiguous integral (the size is that of the largest operand). However, the other operators do not work on sign-ambiguous integrals, e.g. x / y would not compile because you must decide what sign x and y should have prior to invoking the operation. (Rationale: multiplication/division work differently depending on the signedness of their operands.)

User code cannot define a symbol of sign-ambiguous type, e.g.

auto a = u + i;

would not compile. However, given that __intuint is a subtype of both int and uint, it can be freely converted to either whenever there's no ambiguity:

int a = u + i;  // fine
uint b = u + i; // fine

The advantage of this scheme is that it weeds out many (most? all?) surprises and oddities caused by the abusive unsigned rule of C and C++. The disadvantage is that it is more complex and may surprise the novice in its own way by refusing to compile code that looks legit.

At the moment, we're in limbo regarding the decision to go forward with this. Walter, like many good long-time C programmers, knows the abusive unsigned rule so well he's not hurt by it and consequently has little incentive to see it as a problem. I have had to teach C and C++ to young students coming from Java introductory courses and have a more up-to-date perspective on the dangers. My strong belief is that we need to address this mess somehow, which type inference will only make more painful (in the hands of the beginner, auto can be a quite dangerous tool for wrong-belief propagation). I also know seasoned programmers who had no idea that -u compiles and that it also oddly returns an unsigned type.

Your opinions, comments, and suggestions for improvements would as always be welcome.

Andrei
Nov 25 2008
On Tue, 25 Nov 2008 18:59:01 +0300, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

> [snip]
>
> Your opinions, comments, and suggestions for improvements would as always be welcome.

I think it's fine. That's the way LLVM stores the integral values internally, IIRC.

But what is the type of -u? If it is undecided, then the following should compile:

uint u = 100;
uint s = -u; // undecided implicitly convertible to unsigned
Nov 25 2008
Denis Koroskin wrote:

> [snip]
>
> But what is the type of -u? If it is undecided, then the following should compile:
>
> uint u = 100;
> uint s = -u; // undecided implicitly convertible to unsigned

Yah, but at least you actively asked for an unsigned. Compare and contrast with surprises such as:

uint a = 5;
writeln(-a); // this won't print -5

Such code would be disallowed in the undecided-sign regime.

Andrei
Nov 25 2008
Few general comments.

Andrei Alexandrescu:
> D pursues compatibility with C and C++ in the following manner: if a code snippet compiles in both C and D or C++ and D, then it should have the same semantics.

I didn't know of such "support" for C++ syntax too, isn't such "support" for C syntax only? D has very little to share with C++.

This rule is good because you can take a piece of C code and convert it to D with less work and fewer surprises. I have already translated large pieces of C code to D, so I appreciate this. But in several respects C syntax and semantics are too error-prone or "wrong", so sometimes the rule can also become a significant disadvantage for a language like D that tries to be much less error-prone than C. One solution is to "disable" some of the more error-prone syntax allowed in C, turning it into a compilation error. For example I have seen newbies write bugs caused by leaving & where a && was necessary. In such a case just adopting "and" and making "&&" a syntax error solves the problem and doesn't lead to bugs when you convert C code to D (you just use a search&replace, replacing && with and in the code). In other situations it may be less easy to find such solutions (that is, to invent an alternative syntax/semantics and make the C one a syntax error); in such cases I think it's better to discuss each of those situations independently. In some situations we can even break the standard way D pursues compatibility, for the sake of avoiding bugs and making the semantics better.

> The disadvantage is that it is more complex

It's not really more complex, it just makes visible some hidden complexity that is already present and inherent in the signed/unsigned nature of the numbers. It also follows the Python Zen rule: "In the face of ambiguity, refuse the temptation to guess."

> and may surprise the novice in its own way by refusing to compile code that looks legit.

A compile error is better than a potential runtime bug.

> Walter, as many good long-time C programmers, knows the abusive unsigned rule so well he's not hurt by it and consequently has little incentive to see it as a problem.

I'm not a newbie at programming, but in the last year I have put two bugs related to this into my code, so I suggest finding ways to avoid this silly situation. I think the first bug was something like:

if (arr.lenght > x) ...

where x was a signed int with value -5 (this specific bug can also be solved by making array length a signed value. What's the point of making it unsigned in the first place? I have seen that in D it's safer to use signed values everywhere you don't strictly need an unsigned value, and length doesn't need to be unsigned).

Beside the unsigned/signed problems discussed here, it may be useful to list some other situations where the C syntax/semantics may lead to bugs. For example, does D fix the C semantics of the % (modulo) operation? Another example: in both Pascal and Python 3 there are two different operators for division, one for the FP one and one for the integer one (in Pascal they are / and div, in Python 3 they are / and //). So could it be positive for D too to define two different operators for that purpose?

Bye,
bearophile
Nov 25 2008
bearophile:if (arr.lenght > x) ...Oh, yes :-) and writing "lenght" instead of "lenght" is a common mistake of mine, usually the code editor allows me to avoid this error because the right one becomes colored. That's why in the past I have suggested something simpler and shorter like "len" (others have suggested "size" instead, it too is acceptable to me). Bye, bearophile
Nov 25 2008
"bearophile" wrote:

> > if (arr.lenght > x) ...
>
> Oh, yes :-) and writing "lenght" instead of "lenght" is a common mistake of mine

lol!!!
Nov 25 2008
Steven Schveighoffer:
> lol!!!

I know, I know... :-) But when people make errors so often, the error is elsewhere: in the original choice of that word to denote how many items an iterable has.

In my libs I have defined len() like this, which I use now and then (where running speed isn't essential):

long len(TyItems)(TyItems items) {
    static if (HasLength!(TyItems))
        return items.length;
    else {
        long len;
        // this generates: foreach (p1, p2, p3; items) len++;
        // with a variable number of p1, p2...
        mixin("foreach (" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~
              "; items) len++;");
        return len;
    }
} // End of len(items)

/// ditto
long len(TyItems, TyFun)(TyItems items, TyFun pred) {
    static assert(IsCallable!(TyFun), "len(): predicate must be a callable");
    long len;
    static if (IsAA!(TyItems)) {
        foreach (key, val; items)
            if (pred(key, val))
                len++;
    } else static if (is(typeof(TyItems.opApply))) {
        mixin("foreach (" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~
              "; items) if (pred(" ~
              SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ ")) len++;");
    } else {
        foreach (el; items)
            if (pred(el))
                len++;
    }
    return len;
} // End of len(items, pred)

alias len!(string) strLen;    /// ditto
alias len!(int[]) intLen;     /// ditto
alias len!(float[]) floatLen; /// ditto

Having a global callable like len() instead of an attribute is (sometimes) better, because you can use it for example like this (this is working syntax of my dlibs):

children.sort(&len!(string));

That sorts the array of strings "children" according to the given callable key, that is the len of the strings.

Bye,
bearophile
Nov 25 2008
"bearophile" <bearophileHUGS lycos.com> wrote in message news:gghc97$1mfo$1 digitalmars.com...

> [snip]
>
> Having a global callable like len() instead of an attribute is (sometimes) better, because you can use it for example like this (this is working syntax of my dlibs):
>
> children.sort(&len!(string));

If we ever get extension methods, then maybe something along these lines would be nice:

extension typeof(T.length) len(T t)
{
    return t.length;
}
Nov 25 2008
Nick Sabalausky wrote:

> If we ever get extension methods, then maybe something along these lines would be nice:
>
> [snip]

Already works:

uint len(A) (in A x) { return x.length; }
Nov 25 2008
"KennyTM~" <kennytm gmail.com> wrote in message news:ggipu6$26mr$1 digitalmars.com...

> Already works:
>
> uint len(A) (in A x) { return x.length; }

Oh, right. For some stupid reason I was forgetting that the param would always be an array and therefore be eligible for the existing array property syntax (and that .length always returns a uint).
Nov 26 2008
Nick Sabalausky:Oh, right. For some stupid reason I was forgetting that the param would always be an array and therefore be eligible for the existing array property syntax (and that .length always returns a uint).From the len() code I have posted you can see there are other places where you want to use len(), in particular to count the number of items that a lazy generator (opApply for now) yields. Bye, bearophile
Nov 26 2008
bearophile wrote:Nick Sabalausky:I'm rather wary of a short and suggestive name that embodies a linear operation. I recall there was a discussion about that a while ago in this newsgroup. I'd rather call it linearLength or something that suggests it's a best-effort function that may take O(n). AndreiOh, right. For some stupid reason I was forgetting that the param would always be an array and therefore be eligible for the existing array property syntax (and that .length always returns a uint).From the len() code I have posted you can see there are other places where you want to use len(), in particular to count the number of items that a lazy generator (opApply for now) yields. Bye, bearophile
Nov 26 2008
Andrei Alexandrescu:I'm rather wary of a short and suggestive name that embodies a linear operation. I recall there was a discussion about that a while ago in this newsgroup. I'd rather call it linearLength or something that suggests it's a best-effort function that may take O(n).I remember parts of that discussion, and I like your general rule, and I agree that generally it's better to give the programmer a hint of the complexity of a specific operation, for example a method of a user defined class, etc. But len() is supposed to be used very often, so it's better to keep it short, because if you don't have an IDE it's not nice to type linearLength() one time every 2 lines of code. Being used so often also implies that you remember how it works, so you are supposed to be able to remember it can be O(n) on lazy iterators. So in this specific case I think it's acceptable to break your general rule, for practical reasons. Bye, bearophile
Nov 26 2008
bearophile wrote:Andrei Alexandrescu:If it's used often it shouldn't have linear complexity :o). AndreiI'm rather wary of a short and suggestive name that embodies a linear operation. I recall there was a discussion about that a while ago in this newsgroup. I'd rather call it linearLength or something that suggests it's a best-effort function that may take O(n).I remember parts of that discussion, and I like your general rule, and I agree that generally it's better to give the programmer a hint of the complexity of a specific operation, for example a method of a user defined class, etc. But len() is supposed to be used very often, so it's better to keep it short, because if you don't have an IDE it's not nice to type linearLength() one time every 2 lines of code. Being used so often also implies that you remember how it works, so you are supposed to be able to remember it can be O(n) on lazy iterators. So in this specific case I think it's acceptable to break your general rule, for practical reasons. Bye, bearophile
Nov 26 2008
Andrei Alexandrescu wrote:bearophile wrote:My personal rules of optimization: - I don't know what's slow. - I don't know what's called often enough to be worth speeding up. - Most of the time, my data sets are small. If getting the length of an array were a linear operation, that wouldn't much affect any of my code. Most of my arrays are probably no larger than twenty elements, and I don't often need to get their lengths. If I need to change data structures for better performance, I'd like to be able to replace them (or switch to generators) without undue effort. Things like changing function names according to the algorithmic complexity of the implementation just hurt.Nick Sabalausky:I'm rather wary of a short and suggestive name that embodies a linear operation. I recall there was a discussion about that a while ago in this newsgroup. I'd rather call it linearLength or something that suggests it's a best-effort function that may take O(n).Oh, right. For some stupid reason I was forgetting that the param would always be an array and therefore be eligible for the existing array property syntax (and that .length always returns a uint).From the len() code I have posted you can see there are other places where you want to use len(), in particular to count the number of items that a lazy generator (opApply for now) yields. Bye, bearophileAndrei
Nov 26 2008
bearophile Wrote:From the len() code I have posted you can see there are other places where you want to use len(), in particular to count the number of items that a lazy generator (opApply for now) yields.hmm... import std.stdio, std.algorithm; void main() { bool pred(int x){ return x>2; } auto counter=(int count, int x){ return pred(x)?count+1:count; }; int[] a=[0,1,2,3,4]; auto lazylen=reduce!(counter)(0,a); writeln(lazylen); //2 }
Nov 26 2008
bearophile wrote:Walter, as many good long-time C programmers, knows the abusive unsigned rule so well he's not hurt by it and consequently has little incentive to see it as a problem.I'm not a newbie of programming, but in the last year I have put in my code two bugs related to this, so I suggest to find ways to avoid this silly situation. I think the first bug was something like: if (arr.lenght > x) ...where x was a signed int with value -5 (this specific bug can also be solved making array length a signed value. What's the point of making it unsigned in the first place? I have seen that in D it's safer to use signed values everywhere you don't strictly need an unsigned value. And that length doesn't need to be unsigned).It's worthwhile keeping length an unsigned type if we can convincingly sell unsigned types as models of natural numbers. With the current rules, we can't make a convincing argument. But if we do manage to improve the rules, then we'll all be better off. Andrei
Nov 25 2008
I remembered a couple more details. The names bits8, bits16, bits32, and bits64 were a possible choice for undecided-sign integrals. Walter and I liked that quite some. Walter also suggested that we make those actually full types accessible to programmers. We both were concerned that they'd add to the already large panoply of integral types in D. Dropping bits8 and bits16 would reduce bloating at the cost of consistency. So we're contemplating: (a) Add bits8, bits16, bits32, bits64 public types. (b) Add bits32, bits64 public types. (c) Add bits8, bits16, bits32, bits64 compiler-internal types. (d) Add bits32, bits64 compiler-internal types. Make your pick or add more choices! Andrei
Nov 25 2008
"Andrei Alexandrescu" wroteI remembered a couple more details. The names bits8, bits16, bits32, and bits64 were a possible choice for undecided-sign integrals. Walter and I liked that quite some. Walter also suggested that we make those actually full types accessible to programmers. We both were concerned that they'd add to the already large panoply of integral types in D. Dropping bits8 and bits16 would reduce bloating at the cost of consistency. So we're contemplating: (a) Add bits8, bits16, bit32, bits64 public types. (b) Add bit32, bits64 public types. (c) Add bits8, bits16, bit32, bits64 compiler-internal types. (d) Add bit32, bits64 compiler-internal types. Make your pick or add more choices!One other thing to contemplate: What happens if you add a bits32 to a bits64, long, or ulong value? This needs to be illegal since you don't know whether to sign-extend or not. Or you could reinterpret the expression to promote the original types to 64-bit first? This makes the version with 8 and 16 bit types less attractive. Another alternative is to select the bits type based on the entire expression. Of course, you'd have to disallow them as public types. And you'd want to do some special optimizations. You could represent it conceptually as calculating for all the bits types until the one that is decided is used, and then the compiler can optimize out the unused ones, which would at least keep it context-free. -Steve
Nov 25 2008
Steven Schveighoffer wrote:"Andrei Alexandrescu" wroteGood point. There's no (or not much) arithmetic mixing bits32 and some 64-bit integral because it's unclear whether extending the bits32 operand should extend the sign bit or not.I remembered a couple more details. The names bits8, bits16, bits32, and bits64 were a possible choice for undecided-sign integrals. Walter and I liked that quite some. Walter also suggested that we make those actually full types accessible to programmers. We both were concerned that they'd add to the already large panoply of integral types in D. Dropping bits8 and bits16 would reduce bloating at the cost of consistency. So we're contemplating: (a) Add bits8, bits16, bit32, bits64 public types. (b) Add bit32, bits64 public types. (c) Add bits8, bits16, bit32, bits64 compiler-internal types. (d) Add bit32, bits64 compiler-internal types. Make your pick or add more choices!One other thing to contemplate: What happens if you add a bits32 to a bits64, long, or ulong value? This needs to be illegal since you don't know whether to sign-extend or not. Or you could reinterpret the expression to promote the original types to 64-bit first?This makes the version with 8 and 16 bit types less attractive. Another alternative is to select the bits type based on the entire expression. Of course, you'd have to disallow them as public types. And you'd want to do some special optimizations. You could represent it conceptually as calculating for all the bits types until the one that is decided is used, and then the compiler can optimize out the unused ones, which would at least keep it context-free. -SteveThat's the intent of defining arithmetic on sign-ambiguous values. The type information propagates in a complex expression. 
I haven't heard of typechecking on entire expression patterns and I think it would be a rather unclean technique (it means either that there are values that you can't tell the type of, or that a given value has a context-dependent type). Andrei
Nov 25 2008
Tue, 25 Nov 2008 11:06:32 -0600, Andrei Alexandrescu wrote:I remembered a couple more details. The names bits8, bits16, bits32, and bits64 were a possible choice for undecided-sign integrals. Walter and I liked that quite some. Walter also suggested that we make those actually full types accessible to programmers. We both were concerned that they'd add to the already large panoply of integral types in D. Dropping bits8 and bits16 would reduce bloating at the cost of consistency. So we're contemplating: (a) Add bits8, bits16, bit32, bits64 public types. (b) Add bit32, bits64 public types. (c) Add bits8, bits16, bit32, bits64 compiler-internal types. (d) Add bit32, bits64 compiler-internal types. Make your pick or add more choices!I'll add more. :) The problem with signed/unsigned types is that neither int nor uint is a sub-type of one another. They're essentially incompatible. Therefore a possible solution is: 1. Disallow implicit signed <=> unsigned conversion. 2. For those willing to port large C/C++ codebases introduce a compiler compatibility switch which would add global operators mimicking the C behavior: uint opAdd(int, uint) uint opAdd(uint, int) ulong opAdd(long, ulong) etc. This way you can even implement compatibility levels: only C-style additions, or additions with multiplications, or complete compatibility including the original signed/unsigned comparison behavior.
Nov 25 2008
Sergey Gromov wrote:Tue, 25 Nov 2008 11:06:32 -0600, Andrei Alexandrescu wrote:I forgot to mention that that's implied in the bitsNN approach too.I remembered a couple more details. The names bits8, bits16, bits32, and bits64 were a possible choice for undecided-sign integrals. Walter and I liked that quite some. Walter also suggested that we make those actually full types accessible to programmers. We both were concerned that they'd add to the already large panoply of integral types in D. Dropping bits8 and bits16 would reduce bloating at the cost of consistency. So we're contemplating: (a) Add bits8, bits16, bit32, bits64 public types. (b) Add bit32, bits64 public types. (c) Add bits8, bits16, bit32, bits64 compiler-internal types. (d) Add bit32, bits64 compiler-internal types. Make your pick or add more choices!I'll add more. :) The problem with signed/unsigned types is that neither int nor uint is a sub-type of one another. They're essentially incompatible. Therefore a possible solution is: 1. Disallow implicit signed <=> unsigned conversion.2. For those willing to port large C/C++ codebases introduce a compiler compatibility switch which would add global operators mimicking the C behavior: uint opAdd(int, uint) uint opAdd(uint, int) ulong opAdd(long, ulong) etc.Having semantics depend so heavily and confusingly on a compiler switch is extremely dangerous. Note that actually quite a lot of code will compile, with different semantics, with or without the switch.This way you can even implement compatibility levels: only C-style additions, or additions with multiplications, or complete compatibility including the original signed/unsigned comparison behavior.I don't think we can pursue such a path. Andrei
Nov 25 2008
Tue, 25 Nov 2008 15:49:23 -0600, Andrei Alexandrescu wrote:Sergey Gromov wrote:One of us should be missing something. There was no 'different semantics' in my proposal. The code either compiles and behaves exactly like in C or does not compile at all. The amount of code which compiles or fails depends on a compiler switch, not semantics.2. For those willing to port large C/C++ codebases introduce a compiler compatibility switch which would add global operators mimicking the C behavior: uint opAdd(int, uint) uint opAdd(uint, int) ulong opAdd(long, ulong) etc.Having semantics depend so heavily and confusingly on a compiler switch is extremely dangerous. Note that actually quite a lot of code will compile, with different semantics, with or without the switch.
Nov 25 2008
Sergey Gromov wrote:Tue, 25 Nov 2008 15:49:23 -0600, Andrei Alexandrescu wrote:Sorry, I misunderstood. AndreiSergey Gromov wrote:One of us should be missing something. There was no 'different semantics' in my proposal. The code either compiles and behaves exactly like in C or does not compile at all. The amount of code which compiles or fails depends on a compiler switch, not semantics.2. For those willing to port large C/C++ codebases introduce a compiler compatibility switch which would add global operators mimicking the C behavior: uint opAdd(int, uint) uint opAdd(uint, int) ulong opAdd(long, ulong) etc.Having semantics depend so heavily and confusingly on a compiler switch is extremely dangerous. Note that actually quite a lot of code will compile, with different semantics, with or without the switch.
Nov 25 2008
I'm of the opinion that we should make mixed-sign operations a compile-time error. I know that it would be annoying in some situations, but IMHO it gives you clearer, more reliable code. IMHO, it's a mistake to have implicit casts that lose information. Want to hear a funny/sad, but somewhat related story? I was chasing down a segfault recently at work. I hunted and hunted, and finally found out that the pointer returned from malloc() was bad. I figured that I was overwriting the heap, right? So I added tracing and debugging everywhere...no luck. I finally, in desperation, included <stdlib.h> to the source file (there was a warning about malloc() not being prototyped)...and the segfaults vanished!!! The problem was that the xlc compiler, when it doesn't have the prototype for a function, assumes that it returns int...but int is 32 bits. Moreover, the compiler was happily implicitly casting that int to a pointer...which was 64 bits. The compiler was silently cropping the top 32 bits off my pointers. And it all was a "feature" to make programming "easier." Russ Andrei Alexandrescu wrote:D pursues compatibility with C and C++ in the following manner: if a code snippet compiles in both C and D or C++ and D, then it should have the same semantics. A classic problem with C and C++ integer arithmetic is that any operation involving at least an unsigned integral receives automatically an unsigned type, regardless of how silly that actually is, semantically. About the only advantage of this rule is that it's simple. IMHO it only has disadvantages from then on. The following operations suffer from the "abusive unsigned syndrome" (u is an unsigned integral, i is a signed integral): (1) u + i, i + u (2) u - i, i - u (3) u - u (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C requires that these all return unsigned, ouch) (5) u < i, i < u, u <= i etc. 
(all ordering comparisons) (6) -u Logic operations &, |, and ^ also yield unsigned, but such cases are less abusive because at least the operation wasn't arithmetic in the first place. Comparing for equality is also quite a conundrum - should minus two billion compare equal to 2_294_967_296? I'll ignore these for now and focus on (1) - (6). So far we haven't found a solid solution to this problem that at the same time allows "good" code pass through, weeds out "bad" code, and is compatible with C and C++. The closest I got was to have the compiler define the following internal types: __intuint __longulong I've called them "dual-signed integers" in the past, but let's try the shorter "undecided sign". Each of these is a subtype of both the signed and the unsigned integral in its name, e.g. __intuint is a subtype of both int and uint. (Originally I thought of defining __byteubyte and __shortushort as well but dropped them in the interest of simplicity.) The sign-ambiguous operations (1) - (6) yield __intuint if no operand size was larger than 32 bits, and __longulong otherwise. Undecided sign types define their own operations. Let x and y be values of undecided sign. Then x + y, x - y, and -x also return a sign-ambiguous integral (the size is that of the largest operand). However, the other operators do not work on sign-ambiguous integrals, e.g. x / y would not compile because you must decide what sign x and y should have prior to invoking the operation. (Rationale: multiplication/division work differently depending on the signedness of their operands). User code cannot define a symbol of sign-ambiguous type, e.g. auto a = u + i; would not compile. However, given that __intuint is a subtype of both int and uint, it can be freely converted to either whenever there's no ambiguity: int a = u + i; // fine uint b = u + i; // fine The advantage of this scheme is that it weeds out many (most? all?) surprises and oddities caused by the abusive unsigned rule of C and C++. 
The disadvantage is that it is more complex and may surprise the novice in its own way by refusing to compile code that looks legit. At the moment, we're in limbo regarding the decision to go forward with this. Walter, as many good long-time C programmers, knows the abusive unsigned rule so well he's not hurt by it and consequently has little incentive to see it as a problem. I have had to teach C and C++ to young students coming from Java introductory courses and have a more up-to-date perspective on the dangers. My strong belief is that we need to address this mess somehow, which type inference will only make more painful (in the hand of the beginner, auto can be a quite dangerous tool for wrong belief propagation). I also know seasoned programmers who had no idea that -u compiles and that it also oddly returns an unsigned type. Your opinions, comments, and suggestions for improvements would as always be welcome. Andrei
Nov 25 2008
(You may want to check your system's date, unless of course you traveled in time.) Russell Lewis wrote:I'm of the opinion that we should make mixed-sign operations a compile-time error. I know that it would be annoying in some situations, but IMHO it gives you clearer, more reliable code.The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions are generally not solving the core issue.IMHO, it's a mistake to have implicit casts that lose information.Hear, hear.Want to hear a funny/sad, but somewhat related story? I was chasing down a segfault recently at work. I hunted and hunted, and finally found out that the pointer returned from malloc() was bad. I figured that I was overwriting the heap, right? So I added tracing and debugging everywhere...no luck. I finally, in desperation, included <stdlib.h> to the source file (there was a warning about malloc() not being prototyped)...and the segfaults vanished!!! The problem was that the xlc compiler, when it doesn't have the prototype for a function, assumes that it returns int...but int is 32 bits. Moreover, the compiler was happily implicitly casting that int to a pointer...which was 64 bits. The compiler was silently cropping the top 32 bits off my pointers. And it all was a "feature" to make programming "easier."Good story for reminding ourselves of the advantages of type safety! Andrei
Nov 25 2008
Andrei Alexandrescu:The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions are generally not solving the core issue.That can be solved making array.length signed. Can you list few other annoying situations? Bye, bearophile
Nov 25 2008
"bearophile" <bearophileHUGS lycos.com> wrote in message news:gghsa1$2u0c$1 digitalmars.com...Andrei Alexandrescu:I disagree. If you start using that as a solution, then you may as well eliminate unsigned values entirely. I think the root problem with disallowing mixed-sign operations is that math just doesn't work that way. What I mean by that is, disallowing mixed-sign operations implies that we have these nice cleanly separated worlds of "signed math" and "unsigned math". But depending on the operator, the signs/ordering of the operands, and what the operands actually represent, math has a tendency to switch back and forth between the signed ("can be negative") and unsigned ("can't be negative") worlds. So if we have a type system that forces us to jump through hoops every time that world-switch happens, and we then decide that it's justifiable to say "well, let's fix it for array.length by tossing that over to the 'can be negative' world, even though it cuts our range of allowable values in half", then there's nothing stopping us from solving the rest of the cases by throwing them over the "can be negative" wall as well. All of a sudden, we have no unsigned. Just a thought: Maybe some sort of built-in "units" system could help here? Instead of just making array.length a "signed" or "unsigned" and leaving it as that, add a "units system" and tag array.length as being a length, with length tags carrying the connotation that negative is disallowed. Adding/subtracting a pure constant to a length would cause the constant to be automatically tagged as a "length delta" (which can be negative). And the units system would, of course, contain the rule that a length delta added/subtracted from a length results in a length. The units system could then translate all of that into "signed vs unsigned".The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. 
Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions are generally not solving the core issue.That can be solved making array.length signed. Can you list few other annoying situations?
Nov 25 2008
Nick Sabalausky Wrote:happens, and we then decide that it's justifiable to say "well, let's fix it for array.length by tossing that over to the 'can be negative' world, even though it cuts our range of allowable values in half", then there's nothing stopping us from solving the rest of the cases by throwing them over the "can be negative" wall as well. All of a sudden, we have no unsigned.Well... cutting the range can be no problem; after all, a thought was floating around that structs shouldn't be larger than a couple of KB. Note that an array of shorts with a signed length spans the entire 32-bit address space.
Nov 26 2008
On Tue, 25 Nov 2008 16:56:17 -0500, bearophile wrote:Andrei Alexandrescu:Is that conceptually clean/clear? (If so, I'd like to request an array of length -1.) I like Andrei's proposal because it keeps clarity in such cases: sizes are non-negative quantities. Once you start subtracting ints, it's possibly not a size anymore, in such cases you want the user to decide explicitly. -- DanielThe problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions are generally not solving the core issue.That can be solved making array.length signed.
Nov 25 2008
bearophile wrote:Andrei Alexandrescu:unsigned types, the length of a list, array, etc., is always int. In this way, they prevented the bugs and problems everyone mention here.The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions are generally not solving the core issue.That can be solved making array.length signed.
Nov 26 2008
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article(You may want to check your system's date, unless of course you traveled in time.) Russell Lewis wrote:Perhaps not, but the fact that constants are signed integers has been mentioned as a problem before. Would making these polysemous values help at all? That seems to be what your proposal is effectively trying to do anyway. SeanI'm of the opinion that we should make mixed-sign operations a compile-time error. I know that it would be annoying in some situations, but IMHO it gives you clearer, more reliable code.The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions are generally not solving the core issue.
Nov 25 2008
Sean Kelly wrote:== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articleWell with constants we can do many tricks; I mentioned an extreme example. Polysemy does indeed help but my latest design (described in the post starting this thread) gets away with simple subtyping. I like polysemy (the name is really cool :o)) but I don't want to be concept-heavy: if a classic technique works, I'd use that and save polysemy for a tougher task that cannot be comfortably tackled with existing means. Andrei(You may want to check your system's date, unless of course you traveled in time.) Russell Lewis wrote:Perhaps not, but the fact that constants are signed integers has been mentioned as a problem before. Would making these polysemous values help at all? That seems to be what your proposal is effectively trying to do anyway.I'm of the opinion that we should make mixed-sign operations a compile-time error. I know that it would be annoying in some situations, but IMHO it gives you clearer, more reliable code.The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions are generally not solving the core issue.
Nov 25 2008
On 2008-11-25 16:39:05 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:Russell Lewis wrote:Then the problem is that integer literals are of a specific type. Just make them polysemous and the problem is solved. I'm with Russell on this one. To me, a literal value (123, -8, 0) is not an int, not even a constant: it's just a number which doesn't imply any type at all until you place it into a variable (or a constant, or an enum, etc.). And if you're afraid the word polysemous will scare people, don't say the word and call it an "integer literal". Polysemy in this case is just a mechanism used by the compiler to make the value work as expected with all integral types. All you really need is a type implicitly castable to everything capable of holding the numerical value (much like your __intuint). I'd make "auto x = 1" create a signed integer variable for the sake of simplicity. And all this would also make "uint x = -1" illegal... but then you can easily use "uint x = uint.max" if you want to enable all the bits. It's easier than in C: you don't have to include the right header and remember the name of a constant. -- Michel Fortin michel.fortin michelf.com http://michelf.com/I'm of the opinion that we should make mixed-sign operations a compile-time error. I know that it would be annoying in some situations, but IMHO it gives you clearer, more reliable code.The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions are generally not solving the core issue.
Nov 26 2008
Michel Fortin wrote:On 2008-11-25 16:39:05 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:Well that at best takes care of _some_ operations involving constants, but for example does not quite take care of array.length - 1. I am now sorry I gave the silly example of array.length + 1. Many people latched on it and thought that solving that solves the whole problem. That's not quite the case. Also consider: auto delta = a1.length - a2.length; What should the type of delta be? Well, it depends. In my scheme that wouldn't even compile, which I think is a good thing; you must decide whether prior information makes it an unsigned or a signed integral.Russell Lewis wrote:Then the problem is that integer literals are of a specific type. Just make them polysemous and the problem is solved.I'm of the opinion that we should make mixed-sign operations a compile-time error. I know that it would be annoying in some situations, but IMHO it gives you clearer, more reliable code.The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions are generally not solving the core issue.I'm with Russel on this one. To me, a litteral value (123, -8, 0) is not an int, not even a constant: it's just a number which doesn't imply any type at all until you place it into a variable (or a constant, or an enum, etc.). And if you're afraid the word polysemous will scare people, don't say the word and call it a "integer litteral". Polysemy in this case is just a mechanism used by the compiler to make the value work as expected with all integral types. All you really need is a type implicitly castable to everything capable of holding the numerical value (much like your __intuint). 
I'd make "auto x = 1" create a signed integer variable for the sake of simplicity.That can be formalized by having polysemous types have a "lemma", a default type.And all this would also make "uint x = -1" illegal... but then you can easily use "uint x = uint.max" if you want to enable all the bits. It's easier as in C: you don't have to include the right header and remember the name of a constant.Fine. With constants there is some mileage that can be squeezed. But let's keep in mind that that doesn't solve the larger issue. Andrei
Nov 26 2008
On 2008-11-26 10:24:17 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:

Well that at best takes care of _some_ operations involving constants, but for example does not quite take care of array.length - 1.

How does it not solve the problem? array.length is of type uint, 1 is polysemous (byte, ubyte, short, ushort, int, uint, long, ulong). Only "uint - uint" is acceptable, and its result is "uint".

Also consider: auto delta = a1.length - a2.length; What should the type of delta be? Well, it depends. In my scheme that wouldn't even compile, which I think is a good thing; you must decide whether prior information makes it an unsigned or a signed integral.

In my scheme it would give you a uint. You'd have to cast to get a signed integer... I see how it's not ideal, but I can't imagine how it could be coherent otherwise. auto diff = cast(int)a1.length - cast(int)a2.length; By casting explicitly, you indicate in the code that if a1.length or a2.length contain numbers which are too big to be represented as int, you'll get garbage. In this case, it'd be pretty surprising to get that problem. In other cases it may not be so clear-cut. Perhaps we could add a "sign" property to uint and an "unsign" property to int that'd give you the signed or unsigned corresponding value and which could do range checking at runtime (enabled by a compiler flag). auto diff = a1.length.sign - a2.length.sign; And for the general problem of "uint - uint" giving a result below uint.min, as I said in my other post, that could be handled by a runtime check (enabled by a compiler flag) just like array bound checking. One last thing. I think that in general it's a much better habit to change the type to signed prior to doing the subtraction. It may be harmless in the case of a subtraction, but as you said when starting the thread, it isn't for others (multiply, divide, modulo). 
I think the scheme above promotes this good habit by making it easier to change the type at the operands rather than at the result.

That's indeed what I'm suggesting.

I'd make "auto x = 1" create a signed integer variable for the sake of simplicity.

That can be formalized by having polysemous types have a "lemma", a default type.

Well, by making implicit conversions between uint and int illegal, we're solving the larger issue. Just not in a seamless manner. -- Michel Fortin michel.fortin michelf.com http://michelf.com/

And all this would also make "uint x = -1" illegal... but then you can easily use "uint x = uint.max" if you want to enable all the bits. It's easier than in C: you don't have to include the right header and remember the name of a constant.

Fine. With constants there is some mileage that can be squeezed. But let's keep in mind that that doesn't solve the larger issue.
Nov 26 2008
Michel Fortin wrote:

On 2008-11-26 10:24:17 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:

Actually, there's no solution. Imagine a 32-bit system, where one object can be greater than 2GB in size (not possible in Windows AFAIK, but theoretically possible). Then if a1 is 3GB, delta cannot be stored in an int. If a2 is 3GB, it requires an int for storage, since the result is less than 0. ==> I think length has to be an int. It's less bad than uint.

Also consider: auto delta = a1.length - a2.length; What should the type of delta be? Well, it depends. In my scheme that wouldn't even compile, which I think is a good thing; you must decide whether prior information makes it an unsigned or a signed integral.

In my scheme it would give you a uint. You'd have to cast to get a signed integer... I see how it's not ideal, but I can't imagine how it could be coherent otherwise. auto diff = cast(int)a1.length - cast(int)a2.length;

Perhaps we could add a "sign" property to uint and an "unsign" property to int that'd give you the signed or unsigned corresponding value and which could do range checking at runtime (enabled by a compiler flag). auto diff = a1.length.sign - a2.length.sign; And for the general problem of "uint - uint" giving a result below uint.min, as I said in my other post, that could be handled by a runtime check (enabled by a compiler flag) just like array bound checking.

That's not bad.

We are of one mind. I think that constants are the root cause of the problem.

Fine. With constants there is some mileage that can be squeezed. But let's keep in mind that that doesn't solve the larger issue.

Well, by making implicit conversions between uint and int illegal, we're solving the larger issue. Just not in a seamless manner.
Nov 26 2008
Don wrote:Michel Fortin wrote:There is. We need to find the block of marble it's in and then chip the extra marble off it.On 2008-11-26 10:24:17 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:Actually, there's no solution.Also consider: auto delta = a1.length - a2.length; What should the type of delta be? Well, it depends. In my scheme that wouldn't even compile, which I think is a good thing; you must decide whether prior information makes it an unsigned or a signed integral.In my scheme it would give you a uint. You'd have to cast to get a signed integer... I see how it's not ideal, but I can't imagine how it could be coherent otherwise. auto diff = cast(int)a1.length - cast(int)a2.length;Imagine a 32 bit system, where one object can be greater than 2GB in size (not possible in Windows AFAIK, but theoretically possible).It is possible in Windows if you change some I-forgot-which parameter in boot.ini.Then if a1 is 3GB, delta cannot be stored in an int. If a2 is 3GB, it requires an int for storage, since result is less than 0. ==> I think length has to be an int. It's less bad than uint.I'm not sure how the conclusion follows from the premises, but consider this. If someone deals with large arrays, they do have the possibility of doing things like: if (a1.length >= a2.length) { size_t delta = a1.length - a2.length; ... use delta ... } else { size_t rDelta = a2.length - a1.length; ... use rDelta ... } I'm not saying it's better than sliced bread, but it is a solution. And it is correct on all systems. And cooperates with the typechecker by adding flow information to which typecheckers are usually oblivious. And types are out in the clear. And it's the programmer, not the compiler, who decides the signedness. In contrast, using ints for array lengths beyond 2GB is a nightmare. I'm not saying it's a frequent thing though, but since you woke up the sleeping dog, I'm just barking :o).Well let's look closer at this. 
Consider a system in which the current rules are in force, plus the overflow check for uint. auto i = arr.length - offset1 + offset2; Although the context makes it clear that offset1 < offset2 and therefore i is within range and won't overflow, the poor code generator has no choice but to insert checks throughout. Even though the entire expression is always correct, it will dynamically fail on the way to its correct form. Contrast with the proposed system in which the expression will not compile. It will indeed require the user to somewhat redundantly insert guides for operations, but during compilation, not through runtime failure.

Perhaps we could add a "sign" property to uint and an "unsign" property to int that'd give you the signed or unsigned corresponding value and which could do range checking at runtime (enabled by a compiler flag). auto diff = a1.length.sign - a2.length.sign; And for the general problem of "uint - uint" giving a result below uint.min, as I said in my other post, that could be handled by a runtime check (enabled by a compiler flag) just like array bound checking.

That's not bad.

Well I strongly disagree. (I assume you mean "literals", not "constants".) I see constants as just a small part of the signedness mess. Moreover, I consider that in fact creating symbolic names with "auto" compounds the problem, and this belief runs straight against yours that it's about literals. No, IMHO it's about espousing and then propagating wrong beliefs through auto! Maybe if you walked me through your reasoning on why literals bear significant importance I could get convinced. As far as my code is concerned, I tend to loosely go along the lines of the old adage "the only literals in a program should be 0, 1, and -1". True, the adage doesn't say how many of these three may reasonably occur, but at the end of the day I'm confused about this alleged importance of literals. Andrei

We are of one mind. I think that constants are the root cause of the problem.

Fine. 
With constants there is some mileage that can be squeezed. But let's keep in mind that that doesn't solve the larger issue.Well, by making implicit convertions between uint and int illegal, we're solving the larger issue. Just not in a seemless manner.
Nov 26 2008
On 2008-11-26 13:30:30 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:

Well let's look closer at this. Consider a system in which the current rules are in force, plus the overflow check for uint. auto i = arr.length - offset1 + offset2; Although the context makes it clear that offset1 < offset2 and therefore i is within range and won't overflow, the poor code generator has no choice but to insert checks throughout. Even though the entire expression is always correct, it will dynamically fail on the way to its correct form.

That's because you're relying on a specific behaviour for overflows and that changes with range checking. True: in some cases having the values wrap around is desirable. But in this specific case I'd say it'd be better to just add parentheses at the right place, or change the order of the arguments to avoid overflow. Avoiding overflows is a good practice in general. The only reason it doesn't bite here is because you're limited to additions and subtractions. If you dislike the compiler checking for overflows, just tell it not to check. That's why we need a compiler switch. Perhaps it'd be good to have a pragma to disable those checks for specific pieces of code too.

Contrast with the proposed system in which the expression will not compile. It will indeed require the user to somewhat redundantly insert guides for operations, but during compilation, not through runtime failure.

If you're just adding a special rule to prevent the result of subtractions of unsigned values to be put into auto variables, I'm not terribly against that. I'm just unconvinced of its usefulness. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Nov 26 2008
On Wed, 26 Nov 2008 18:24:17 +0300, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

Also consider: auto delta = a1.length - a2.length; What should the type of delta be? Well, it depends. In my scheme that wouldn't even compile, which I think is a good thing; you must decide whether prior information makes it an unsigned or a signed integral.

Sure, it shouldn't compile. But explicit casting to either type won't help. Let's say you expect that a1.length > a2.length and thus expect a strictly positive result. Putting an explicit cast will not detect (but suppress) an error and give you an erroneous result silently. Putting an assert(a1.length > a2.length) might help, but the check will be unavailable unless the code is compiled with asserts enabled. A better solution would be to write code as follows:

auto delta = unsigned(a1.length - a2.length); // returns an unsigned value, throws on overflow (i.e., "2 - 4")
auto delta = signed(a1.length - a2.length); // returns result as a signed value. Throws on overflow (i.e., "int.min - 1")
auto delta = a1.length - a2.length; // won't compile
// this one is also handy:
auto newLength = checked(a1.length - 1); // preserves type of a1.length, be it int or uint, throws on overflow

I have previously shown an implementation of unsigned/signed:

import std.stdio;

int signed(lazy int dg)
{
    auto result = dg();
    asm { jo overflow; }
    return result;
overflow:
    throw new Exception("Integer overflow occurred");
}

int main()
{
    int t = int.max;
    try
    {
        int s = signed(t + 1);
        writefln("Result is %d", s);
    }
    catch (Exception e)
    {
        writefln("Whoops! %s", e.toString());
    }
    return 0;
}

But Andrei has correctly pointed out that it has a problem - it may throw without a reason:

int i = int.max + 1; // sets an overflow flag
auto result = expectSigned(1); // raises an exception

The overflow flag may also be cleared in a complex expression:

auto result = expectUnsigned(1 + (uint.max + 1)); // first add will overflow and second one clears the flag -> no exception as a result

A possible solution is to make the compiler aware of this construct and disallow passing none (case 2) or more than one operation (case 1) to the method.
Nov 26 2008
Denis Koroskin wrote:

On Wed, 26 Nov 2008 18:24:17 +0300, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

But "silently" and "putting a cast" don't go together. It's the cast that makes the erroneous result non-silent. Besides, you don't need to cast. You can always use a function that does the requisite checks. std.conv will have some of those, should any change in the rules make it necessary. By this I'm essentially replying to Don's message in the bugs newsgroup: nobody puts a gun to your head to cast.

Also consider: auto delta = a1.length - a2.length; What should the type of delta be? Well, it depends. In my scheme that wouldn't even compile, which I think is a good thing; you must decide whether prior information makes it an unsigned or a signed integral.

Sure, it shouldn't compile. But explicit casting to either type won't help. Let's say you expect that a1.length > a2.length and thus expect a strictly positive result. Putting an explicit cast will not detect (but suppress) an error and give you an erroneous result silently.

Putting an assert(a1.length > a2.length) might help, but the check will be unavailable unless code is compiled with asserts enabled.

Put an enforce(a1.length > a2.length) then.

A better solution would be to write code as follows: auto delta = unsigned(a1.length - a2.length); // returns an unsigned value, throws on overflow (i.e., "2 - 4") auto delta = signed(a1.length - a2.length); // returns result as a signed value. Throws on overflow (i.e., "int.min - 1") auto delta = a1.length - a2.length; // won't compile

Amazingly this solution was discussed with these exact names! The signed and unsigned functions can be implemented as libraries, but unfortunately (or fortunately I guess) that means the bits32 and bits64 are available to all code. One fear of mine is the reaction of throwing of hands in the air "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gauging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign ready to be converted to their counterparts of decided sign.

// this one is also handy: auto newLength = checked(a1.length - 1); // preserves type of a1.length, be it int or uint, throws on overflow

This could be rather tricky. How can overflow be checked? By inspecting the status bits in the processor only; at the language/typesystem level there's little to do.

I have previously shown an implementation of unsigned/signed: import std.stdio; int signed(lazy int dg) { auto result = dg(); asm { jo overflow; } return result; overflow: throw new Exception("Integer overflow occurred"); } int main() { int t = int.max; try { int s = signed(t + 1); writefln("Result is %d", s); } catch(Exception e) { writefln("Whoops! %s", e.toString()); } return 0; }

Ah, there we go! Thanks for pasting this code.

But Andrei has correctly pointed out that it has a problem - it may throw without a reason: int i = int.max + 1; // sets an overflow flag auto result = expectSigned(1); // raises an exception Overflow flag may also be cleared in a complex expression: auto result = expectUnsigned(1 + (uint.max + 1)); // first add will overflow and second one clears the flag -> no exception as a result A possible solution is to make the compiler aware of this construct and disallow passing none (case 2) or more than one operation (case 1) to the method.

Can't you clear the overflow flag prior to invoking the operation? I'll also mention that making it a delegate reduces appeal quite a bit; expressions under the check tend to be simple which makes the relative overhead huge. Andrei
Nov 26 2008
On Wed, 26 Nov 2008 21:45:30 +0300, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

Denis Koroskin wrote:

Right, it is better. The problem is, you don't want to put checks like "a1.length > a2.length" into your code (I don't, at least). All you want is to be sure that "auto result = a1.length - a2.length" is positive. You *then* decide and solve the "a1.length - a2.length >= 0" equation that leads to the check. Moreover, why evaluate both a1.length and a2.length twice? And you should update all your checks every time you change your code.

On Wed, 26 Nov 2008 18:24:17 +0300, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

But "silently" and "putting a cast" don't go together. It's the cast that makes the erroneous result non-silent. Besides, you don't need to cast. You can always use a function that does the requisite checks. std.conv will have some of those, should any change in the rules make it necessary. By this I'm essentially replying to Don's message in the bugs newsgroup: nobody puts a gun to your head to cast.

Also consider: auto delta = a1.length - a2.length; What should the type of delta be? Well, it depends. In my scheme that wouldn't even compile, which I think is a good thing; you must decide whether prior information makes it an unsigned or a signed integral.

Sure, it shouldn't compile. But explicit casting to either type won't help. Let's say you expect that a1.length > a2.length and thus expect a strictly positive result. Putting an explicit cast will not detect (but suppress) an error and give you an erroneous result silently.

Putting an assert(a1.length > a2.length) might help, but the check will be unavailable unless code is compiled with asserts enabled.

Put an enforce(a1.length > a2.length) then.

It is an implementation detail. The expression can be calculated with higher bit precision and the result compared against the needed range.

A better solution would be to write code as follows: auto delta = unsigned(a1.length - a2.length); // returns an unsigned value, throws on overflow (i.e., "2 - 4") auto delta = signed(a1.length - a2.length); // returns result as a signed value. Throws on overflow (i.e., "int.min - 1") auto delta = a1.length - a2.length; // won't compile

Amazingly this solution was discussed with these exact names! The signed and unsigned functions can be implemented as libraries, but unfortunately (or fortunately I guess) that means the bits32 and bits64 are available to all code. One fear of mine is the reaction of throwing of hands in the air "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gauging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign ready to be converted to their counterparts of decided sign.

// this one is also handy: auto newLength = checked(a1.length - 1); // preserves type of a1.length, be it int or uint, throws on overflow

This could be rather tricky. How can overflow be checked? By inspecting the status bits in the processor only; at the language/typesystem level there's little to do.

No need for this, it adds one more instruction for no gain; the flag is automatically set/reset by any add/sub/mul operation. It can only save you from an "auto result = signed(1)" error; that's why I said it should be disallowed in the first place.

I have previously shown an implementation of unsigned/signed: import std.stdio; int signed(lazy int dg) { auto result = dg(); asm { jo overflow; } return result; overflow: throw new Exception("Integer overflow occurred"); } int main() { int t = int.max; try { int s = signed(t + 1); writefln("Result is %d", s); } catch(Exception e) { writefln("Whoops! %s", e.toString()); } return 0; }

Ah, there we go! Thanks for pasting this code.

But Andrei has correctly pointed out that it has a problem - it may throw without a reason: int i = int.max + 1; // sets an overflow flag auto result = expectSigned(1); // raises an exception Overflow flag may also be cleared in a complex expression: auto result = expectUnsigned(1 + (uint.max + 1)); // first add will overflow and second one clears the flag -> no exception as a result A possible solution is to make the compiler aware of this construct and disallow passing none (case 2) or more than one operation (case 1) to the method.

Can't you clear the overflow flag prior to invoking the operation?

I'll also mention that making it a delegate reduces appeal quite a bit; expressions under the check tend to be simple which makes the relative overhead huge.

Such simple instructions are usually inlined, aren't they?
Nov 26 2008
Andrei Alexandrescu wrote:

Denis Koroskin wrote:

I doubt that would be used in practice.

On Wed, 26 Nov 2008 18:24:17 +0300, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

But "silently" and "putting a cast" don't go together. It's the cast that makes the erroneous result non-silent. Besides, you don't need to cast. You can always use a function that does the requisite checks. std.conv will have some of those, should any change in the rules make it necessary.

Also consider: auto delta = a1.length - a2.length; What should the type of delta be? Well, it depends. In my scheme that wouldn't even compile, which I think is a good thing; you must decide whether prior information makes it an unsigned or a signed integral.

Sure, it shouldn't compile. But explicit casting to either type won't help. Let's say you expect that a1.length > a2.length and thus expect a strictly positive result. Putting an explicit cast will not detect (but suppress) an error and give you an erroneous result silently.

By this I'm essentially replying to Don's message in the bugs newsgroup: nobody puts a gun to your head to cast.

Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, i.e., a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice. If we were going to introduce a slew of new types, I'd want them to be for 'positive int'/'natural int', 'positive byte', etc. Natural int can always be implicitly converted to either int or uint, with perfect safety. 
No other conversions are possible without a cast. Non-negative literals and manifest constants are naturals. The rules are: 1. Anything involving unsigned is unsigned, (same as C). 2. Else if it contains an integer, it is an integer. 3. (Now we know all quantities are natural): If it contains a subtraction, it is an integer [Probably allow subtraction of compile-time quantities to remain natural, if the values stay in range; flag an error if an overflow occurs]. 4. Else it is a natural. The reason I think literals and manifest constants are so important is that they are a significant fraction of the natural numbers in a program. [Just before posting I've discovered that other people have posted some similar ideas].Putting an assert(a1.length > a2.length) might help, but the check will be unavailable unless code is compiled with asserts enabled.Put an enforce(a1.length > a2.length) then.A better solution would be to write code as follows: auto delta = unsigned(a1.length - a2.length); // returns an unsigned value, throws on overflow (i.e., "2 - 4") auto delta = signed(a1.length - a2.length); // returns result as a signed value. Throws on overflow (i.e., "int.min - 1") auto delta = a1.length - a2.length; // won't compileAmazingly this solution was discussed with these exact names! The signed and unsigned functions can be implemented as libraries, but unfortunately (or fortunately I guess) that means the bits32 and bits64 are available to all code. One fear of mine is the reaction of throwing of hands in the air "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gaging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign ready to be converted to their counterparts of decided sign.
Nov 27 2008
Don wrote:Andrei Alexandrescu wrote:In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.One fear of mine is the reaction of throwing of hands in the air "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gaging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign ready to be converted to their counterparts of decided sign.Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.If we were going to introduce a slew of new types, I'd want them to be for 'positive int'/'natural int', 'positive byte', etc. Natural int can always be implicitly converted to either int or uint, with perfect safety. No other conversions are possible without a cast. Non-negative literals and manifest constants are naturals. The rules are: 1. Anything involving unsigned is unsigned, (same as C). 2. Else if it contains an integer, it is an integer. 3. 
(Now we know all quantities are natural): If it contains a subtraction, it is an integer [Probably allow subtraction of compile-time quantities to remain natural, if the values stay in range; flag an error if an overflow occurs]. 4. Else it is a natural. The reason I think literals and manifest constants are so important is that they are a significant fraction of the natural numbers in a program. [Just before posting I've discovered that other people have posted some similar ideas].That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Nov 27 2008
Andrei Alexandrescu wrote:Don wrote:Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!". Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.Andrei Alexandrescu wrote:In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.One fear of mine is the reaction of throwing of hands in the air "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gaging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign ready to be converted to their counterparts of decided sign.Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. 
Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.If we were going to introduce a slew of new types, I'd want them to be for 'positive int'/'natural int', 'positive byte', etc. Natural int can always be implicitly converted to either int or uint, with perfect safety. No other conversions are possible without a cast. Non-negative literals and manifest constants are naturals. The rules are: 1. Anything involving unsigned is unsigned, (same as C). 2. Else if it contains an integer, it is an integer. 3. (Now we know all quantities are natural): If it contains a subtraction, it is an integer [Probably allow subtraction of compile-time quantities to remain natural, if the values stay in range; flag an error if an overflow occurs]. 4. Else it is a natural. The reason I think literals and manifest constants are so important is that they are a significant fraction of the natural numbers in a program. [Just before posting I've discovered that other people have posted some similar ideas].That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Nov 27 2008
Don wrote:

Andrei Alexandrescu wrote:

I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations. I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages. One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned and unsigned -> signed is implicit). Let's see where that takes us.

(a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird number.

(b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and let operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values.

(c) Unlike C, arithmetic and logical operations always return the tightest type possible, not a 32/64 bit value. For example, byte / int yields byte and so on.

What do you think? Andrei

Don wrote:

Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!". Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.

Andrei Alexandrescu wrote:

In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. 
This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that. One fear of mine is the reaction of throwing of hands in the air "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gauging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign ready to be converted to their counterparts of decided sign. Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice. If we were going to introduce a slew of new types, I'd want them to be for 'positive int'/'natural int', 'positive byte', etc. Natural int can always be implicitly converted to either int or uint, with perfect safety. No other conversions are possible without a cast. Non-negative literals and manifest constants are naturals. The rules are: 1. Anything involving unsigned is unsigned, (same as C). 2. Else if it contains an integer, it is an integer. 3. 
The reason I think literals and manifest constants are so important is that they are a significant fraction of the natural numbers in a program. [Just before posting I've discovered that other people have posted some similar ideas].That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Nov 27 2008
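Point (b) above is where the surprises concentrate: because u1 - u2 stays unsigned, two small operands can yield an enormous value. A quick Python sketch of 32-bit wrapping subtraction; the `usub` helper and `UINT_MAX` constant are illustrative names, not part of any proposal:

```python
UINT_MAX = 0xFFFF_FFFF  # 2**32 - 1

def usub(a: int, b: int) -> int:
    """Simulate C/D `uint - uint`: the result wraps modulo 2**32."""
    return (a - b) & UINT_MAX

# Two small, "usual" values...
print(usub(4, 6))            # 4294967294 -- a large, weird number
# ...whereas a signed result would "assume the least":
print(usub(4, 6) - 2**32)    # -2
```

The second line shows what Andrei means by dropping the rule: reinterpreting the same bit pattern as signed recovers the small, expected value.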
Andrei Alexandrescu wrote:Don wrote:So you mean long * int (e.g. 1234567890123L * 2) will return an int instead of a long?! The opposite sounds more natural to me.Andrei Alexandrescu wrote:I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations. I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages. One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned and unsigned -> signed is implicit) Let's see where that takes us. (a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird numnber. (b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and leave operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values. (c) Unlike C, arithmetic and logical operations always return the tightest type possible, not a 32/64 bit value. For example, byte / int yields byte and so on.Don wrote:Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!". Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.Andrei Alexandrescu wrote:In fact we are in agreement. 
C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.One fear of mine is the reaction of throwing of hands in the air "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gaging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign ready to be converted to their counterparts of decided sign.Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.If we were going to introduce a slew of new types, I'd want them to be for 'positive int'/'natural int', 'positive byte', etc. Natural int can always be implicitly converted to either int or uint, with perfect safety. No other conversions are possible without a cast. Non-negative literals and manifest constants are naturals. The rules are: 1. Anything involving unsigned is unsigned, (same as C). 2. Else if it contains an integer, it is an integer. 3. 
(Now we know all quantities are natural): If it contains a subtraction, it is an integer [Probably allow subtraction of compile-time quantities to remain natural, if the values stay in range; flag an error if an overflow occurs]. 4. Else it is a natural. The reason I think literals and manifest constants are so important is that they are a significant fraction of the natural numbers in a program. [Just before posting I've discovered that other people have posted some similar ideas].That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. AndreiWhat do you think? Andrei
Nov 27 2008
KennyTM~ wrote:Andrei Alexandrescu wrote:Em, or do you mean the tightest type that can represent all possible results? (so long*int == cent?)Don wrote:So you mean long * int (e.g. 1234567890123L * 2) will return an int instead of a long?! The opposite sounds more natural to me.Andrei Alexandrescu wrote:I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations. I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages. One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned and unsigned -> signed is implicit) Let's see where that takes us. (a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird numnber. (b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and leave operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values. (c) Unlike C, arithmetic and logical operations always return the tightest type possible, not a 32/64 bit value. For example, byte / int yields byte and so on.Don wrote:Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!". Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. 
I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.Andrei Alexandrescu wrote:In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.One fear of mine is the reaction of throwing of hands in the air "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gaging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign ready to be converted to their counterparts of decided sign.Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.If we were going to introduce a slew of new types, I'd want them to be for 'positive int'/'natural int', 'positive byte', etc. Natural int can always be implicitly converted to either int or uint, with perfect safety. No other conversions are possible without a cast. Non-negative literals and manifest constants are naturals. The rules are: 1. Anything involving unsigned is unsigned, (same as C). 2. 
Else if it contains an integer, it is an integer. 3. (Now we know all quantities are natural): If it contains a subtraction, it is an integer [Probably allow subtraction of compile-time quantities to remain natural, if the values stay in range; flag an error if an overflow occurs]. 4. Else it is a natural. The reason I think literals and manifest constants are so important is that they are a significant fraction of the natural numbers in a program. [Just before posting I've discovered that other people have posted some similar ideas].That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. AndreiWhat do you think? Andrei
Nov 27 2008
KennyTM~ wrote:KennyTM~ wrote:The tightest type possible depends on the operation. In that doctrine, long * int yields a long (given the demise of cent). Walter thinks such rules are too complicated, but I'm a big fan of operation-dependent typing. I see no good reason for requiring int * long have the same type as int / long. They are different operations with different semantics and corner cases and whatnot, so the resulting static type may as well be different. By the way, under the tightest type doctrine, uint & ubyte is typed as ubyte. Interesting that one, huh :o). Andrei Andrei Alexandrescu wrote: Em, or do you mean the tightest type that can represent all possible results? (so long*int == cent?) Don wrote: So you mean long * int (e.g. 1234567890123L * 2) will return an int instead of a long?! The opposite sounds more natural to me. Andrei Alexandrescu wrote: I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations. I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages. One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned and unsigned -> signed is implicit). Let's see where that takes us. (a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird number. (b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and leave operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values. (c) Unlike C, arithmetic and logical operations always return the tightest type possible, not a 32/64 bit value. 
For example, byte / int yields byte and so on.Don wrote:Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!". Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.Andrei Alexandrescu wrote:In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.One fear of mine is the reaction of throwing of hands in the air "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gaging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign ready to be converted to their counterparts of decided sign.Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. 
Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.If we were going to introduce a slew of new types, I'd want them to be for 'positive int'/'natural int', 'positive byte', etc. Natural int can always be implicitly converted to either int or uint, with perfect safety. No other conversions are possible without a cast. Non-negative literals and manifest constants are naturals. The rules are: 1. Anything involving unsigned is unsigned, (same as C). 2. Else if it contains an integer, it is an integer. 3. (Now we know all quantities are natural): If it contains a subtraction, it is an integer [Probably allow subtraction of compile-time quantities to remain natural, if the values stay in range; flag an error if an overflow occurs]. 4. Else it is a natural. The reason I think literals and manifest constants are so important is that they are a significant fraction of the natural numbers in a program. [Just before posting I've discovered that other people have posted some similar ideas].That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Nov 27 2008
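The `uint & ubyte` observation is easy to check mechanically: `&` can only clear bits, so the result never exceeds the narrower operand and always fits in a ubyte. A small Python spot-check; the helper name is made up for illustration:

```python
import random

def fits_in_ubyte(x: int) -> bool:
    """Would x fit in an 8-bit unsigned byte?"""
    return 0 <= x <= 0xFF

random.seed(0)
for _ in range(1000):
    a = random.randrange(2**32)   # arbitrary uint value
    b = random.randrange(2**8)    # arbitrary ubyte value
    # a & b <= b always, since & can only turn bits off
    assert fits_in_ubyte(a & b)
print("uint & ubyte always fits in a ubyte")
```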
Andrei Alexandrescu wrote:KennyTM~ wrote:I just remembered a problem with simplemindedly going with the tightest type. Consider: uint a = ...; ubyte b = ...; auto c = a & b; c <<= 16; ... The programmer may reasonably expect that the bitwise operation yields an unsigned integer because it involved one. However, the zealous compiler cleverly notices the operation really never yields something larger than a ubyte, and therefore returns that "tightest" type, thus making c a ubyte. Subsequent uses of c will be surprising to the programmer who thought c has 32 bits. It looks like polysemy is the only solution here: return a polysemous value with principal type uint and possible type ubyte. That way, c will be typed as uint. But at the same time, continuing the example: ubyte d = a & b; will go through without a cast. That's pretty cool! One question I had is: say polysemy will be at work for integral arithmetic. Should we provide means in the language for user-defined polysemous functions? Or is it ok to leave it as compiler magic that saves redundant casts? AndreiKennyTM~ wrote:The tightest type possible depends on the operation. In that doctrine, long * int yields a long (given the demise of cent). Walters things such rules are too complicated, but I'm a big fan of operation-dependent typing. I see no good reason for requiring int * long have the same type as int / long. They are different operations with different semantics and corner cases and whatnot, so the resulting static type may as well be different. By the way, under the tightest type doctrine, uint & ubyte is typed as ubyte. Interesting that one, huh :o). AndreiAndrei Alexandrescu wrote:Em, or do you mean the tightest type that can represent all possible results? (so long*int == cent?)Don wrote:So you mean long * int (e.g. 1234567890123L * 2) will return an int instead of a long?! The opposite sounds more natural to me.Andrei Alexandrescu wrote:I think we're heading towards an impasse. 
We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations. I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages. One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned and unsigned -> signed is implicit) Let's see where that takes us. (a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird numnber. (b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and leave operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values. (c) Unlike C, arithmetic and logical operations always return the tightest type possible, not a 32/64 bit value. For example, byte / int yields byte and so on.Don wrote:Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!". Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.Andrei Alexandrescu wrote:In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. 
I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.One fear of mine is the reaction of throwing of hands in the air "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gaging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign ready to be converted to their counterparts of decided sign.Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.If we were going to introduce a slew of new types, I'd want them to be for 'positive int'/'natural int', 'positive byte', etc. Natural int can always be implicitly converted to either int or uint, with perfect safety. No other conversions are possible without a cast. Non-negative literals and manifest constants are naturals. The rules are: 1. Anything involving unsigned is unsigned, (same as C). 2. Else if it contains an integer, it is an integer. 3. (Now we know all quantities are natural): If it contains a subtraction, it is an integer [Probably allow subtraction of compile-time quantities to remain natural, if the values stay in range; flag an error if an overflow occurs]. 4. Else it is a natural. 
The reason I think literals and manifest constants are so important is that they are a significant fraction of the natural numbers in a program. [Just before posting I've discovered that other people have posted some similar ideas].That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Nov 27 2008
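The surprise Andrei describes, where `c` silently becomes a ubyte and `c <<= 16` loses everything, can be simulated by masking each result to the width of its storage type. A Python sketch; the `shl` helper and mask table are illustrative only:

```python
MASKS = {"ubyte": 0xFF, "uint": 0xFFFF_FFFF}

def shl(value: int, bits: int, typ: str) -> int:
    """Simulate `value << bits` stored back into a variable of type `typ`."""
    return (value << bits) & MASKS[typ]

a, b = 0x1234_5678, 0x42
c = a & b                    # 0x40 -- fits in a ubyte either way
print(hex(shl(c, 16, "uint")))   # 0x400000 -- what the programmer expected
print(hex(shl(c, 16, "ubyte")))  # 0x0 -- the "tightest type" surprise
```

With polysemy, `auto c = a & b` would take the principal type uint (first line), while `ubyte d = a & b` would still compile without a cast.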
On 2008-11-27 22:34:50 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said: One question I had is: say polysemy will be at work for integral arithmetic. Should we provide means in the language for user-defined polysemous functions? Or is it ok to leave it as compiler magic that saves redundant casts? I think that'd be a must. Otherwise how would you define your own arithmetical types so they work like the built-in ones? struct ArbitraryPrecisionInt { ... } ArbitraryPrecisionInt a = ...; uint b = ...; auto c = a & b; c <<= 16; ... Shouldn't c be of type ArbitraryPrecisionInt? And shouldn't the following work too? uint d = a & b; That said, how can a function return a polysemous value at all? Should the function return a special kind of struct with a sample of every supported type? That'd be utterly inefficient. Should it return a custom-made struct with the ability to implicitly cast itself to other types? That would make the polysemous value propagatable through auto, and probably less efficient too. The only way I can see this work correctly is with function overloading on return type, with a way to specify the default function (for when the return type is not specified, such as with auto). In the case above, you'd need something like this: struct ArbitraryPrecisionInt { default ArbitraryPrecisionInt opAnd(uint i); uint opAnd(uint i); } -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Nov 28 2008
Andrei Alexandrescu wrote:KennyTM~ wrote:The problem with that, is that you're then forcing the 'unsigned is a natural' interpretation when it may be erroneous. uint.max - 10 is a uint. It's an interesting case, because int = u1 - u2 is definitely incorrect when u1 > int.max. uint = u1 - u2 may be incorrect when u1 < u2, _if you think of unsigned as a positive number_. But, if you think of it as a natural modulo 2^32, uint = u1-u2 is always correct, since that's what's happening mathematically. I'm strongly of the opinion that you shouldn't be able to generate an unsigned accidentally -- you should need to either declare a type as uint, or use the 'u' suffix on a literal. Right now, properties like 'length' being uint means you get too many surprising uints, especially when using 'auto'. I take your point about not wanting to give up the full 32 bits of address space. The problem is, that if you have an object x which isKennyTM~ wrote:Andrei Alexandrescu wrote:Don wrote:Andrei Alexandrescu wrote:I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations. I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages. One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned and unsigned -> signed is implicit) Let's see where that takes us. (a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird numnber. (b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and leave operations such as u1 - u2 return a signed number. 
That assumes the least and works with small, usual values.Don wrote:Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!". Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.Andrei Alexandrescu wrote:In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.One fear of mine is the reaction of throwing of hands in the air "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gaging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign ready to be converted to their counterparts of decided sign.Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. 
Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice. If we were going to introduce a slew of new types, I'd want them to be for 'positive int'/'natural int', 'positive byte', etc. Natural int can always be implicitly converted to either int or uint, with perfect safety. No other conversions are possible without a cast. Non-negative literals and manifest constants are naturals. The rules are: 1. Anything involving unsigned is unsigned, (same as C). 2. Else if it contains an integer, it is an integer. 3. (Now we know all quantities are natural): If it contains a subtraction, it is an integer [Probably allow subtraction of compile-time quantities to remain natural, if the values stay in range; flag an error if an overflow occurs]. 4. Else it is a natural. The reason I think literals and manifest constants are so important is that they are a significant fraction of the natural numbers in a program. [Just before posting I've discovered that other people have posted some similar ideas]. That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei >2GB, and a small object y, then x.length - y.length will erroneously be negative. If we want code (especially in libraries) to cope with such large objects, we need to ensure that any time there's a subtraction involving a length, the first is larger than the second. I think that would preclude the combination: length is uint byte[].length can exceed 2GB, and code is correct when it does uint - uint is an int (or even, can implicitly convert to int) As far as I can tell, at least one of these has to go.
Nov 28 2008
(I lost track of quotes, so I yanked them all beyond Don's message.) Don wrote:The problem with that, is that you're then forcing the 'unsigned is a natural' interpretation when it may be erroneous. uint.max - 10 is a uint. It's an interesting case, because int = u1 - u2 is definitely incorrect when u1 > int.max. uint = u1 - u2 may be incorrect when u1 < u2, _if you think of unsigned as a positive number_. But, if you think of it as a natural modulo 2^32, uint = u1-u2 is always correct, since that's what's happening mathematically.Sounds good. One important consideration is that modulo arithmetic is considerably easier to understand when two's complement and signs are not involved.I'm strongly of the opinion that you shouldn't be able to generate an unsigned accidentally -- you should need to either declare a type as uint, or use the 'u' suffix on a literal. Right now, properties like 'length' being uint means you get too many surprising uints, especially when using 'auto'.I am not surprised by length being unsigned. I'm also not surprised by hexadecimal constants being unsigned. (They are unsigned in C. Walter made them signed or not, depending on their value.)I take your point about not wanting to give up the full 32 bits of address space. The problem is, that if you have an object x which is >2GB, and a small object y, then x.length - y.length will erroneously be negative. If we want code (especially in libraries) to cope with such large objects, we need to ensure that any time there's a subtraction involving a length, the first is larger than the second. 
I think that would preclude the combination: length is uint byte[].length can exceed 2GB, and code is correct when it does uint - uint is an int (or even, can implicitly convert to int) As far as I can tell, at least one of these has to go. Well, none has to go in the latest design: (a) One unsigned makes everything unsigned (b) unsigned -> signed is allowed (c) signed -> unsigned is disallowed Of course the latest design has imperfections, but precludes none of the three things you mention. Andrei
Nov 28 2008
Andrei Alexandrescu wrote:(I lost track of quotes, so I yanked them all beyond Don's message.) Don wrote:It's close, but how can code such as: if (x.length - y.length < 100) ... be correct in the presence of length > 2GB? since (a) x.length = uint.max, y.length = 1 (b) x.length = 4, y.length = 2 both produce the same binary result (0xFFFF_FFFE = -2) Any subtraction of two lengths has a possible range of -int.max .. uint.max which is quite problematic (and the root cause of the problems, I guess). And unfortunately I think code is riddled with subtraction of lengths.The problem with that, is that you're then forcing the 'unsigned is a natural' interpretation when it may be erroneous. uint.max - 10 is a uint. It's an interesting case, because int = u1 - u2 is definitely incorrect when u1 > int.max. uint = u1 - u2 may be incorrect when u1 < u2, _if you think of unsigned as a positive number_. But, if you think of it as a natural modulo 2^32, uint = u1-u2 is always correct, since that's what's happening mathematically.Sounds good. One important consideration is that modulo arithmetic is considerably easier to understand when two's complement and signs are not involved.I'm strongly of the opinion that you shouldn't be able to generate an unsigned accidentally -- you should need to either declare a type as uint, or use the 'u' suffix on a literal. Right now, properties like 'length' being uint means you get too many surprising uints, especially when using 'auto'.I am not surprised by length being unsigned. I'm also not surprised by hexadecimal constants being unsigned. (They are unsigned in C. Walter made them signed or not, depending on their value.)I take your point about not wanting to give up the full 32 bits of address space. The problem is, that if you have an object x which is >2GB, and a small object y, then x.length - y.length will erroneously be negative. 
If we want code (especially in libraries) to cope with such large objects, we need to ensure that any time there's a subtraction involving a length, the first is larger than the second. I think that would preclude the combination: length is uint byte[].length can exceed 2GB, and code is correct when it does uint - uint is an int (or even, can implicitly convert to int) As far as I can tell, at least one of these has to go.Well none has to go in the latest design: (a) One unsigned makes everything unsigned (b) unsigned -> signed is allowed (c) signed -> unsigned is disallowed Of course the latest design has imperfections, but precludes neither of the three things you mention.
Nov 28 2008
Don wrote:Andrei Alexandrescu wrote:(You mean x.length = 2, y.length = 4 in the second case.)(I lost track of quotes, so I yanked them all beyond Don's message.) Don wrote:It's close, but how can code such as: if (x.length - y.length < 100) ... be correct in the presence of length > 2GB? since (a) x.length = uint.max, y.length = 1 (b) x.length = 4, y.length = 2 both produce the same binary result (0xFFFF_FFFE = -2)The problem with that, is that you're then forcing the 'unsigned is a natural' interpretation when it may be erroneous. uint.max - 10 is a uint. It's an interesting case, because int = u1 - u2 is definitely incorrect when u1 > int.max. uint = u1 - u2 may be incorrect when u1 < u2, _if you think of unsigned as a positive number_. But, if you think of it as a natural modulo 2^32, uint = u1-u2 is always correct, since that's what's happening mathematically.Sounds good. One important consideration is that modulo arithmetic is considerably easier to understand when two's complement and signs are not involved.I'm strongly of the opinion that you shouldn't be able to generate an unsigned accidentally -- you should need to either declare a type as uint, or use the 'u' suffix on a literal. Right now, properties like 'length' being uint means you get too many surprising uints, especially when using 'auto'.I am not surprised by length being unsigned. I'm also not surprised by hexadecimal constants being unsigned. (They are unsigned in C. Walter made them signed or not, depending on their value.)I take your point about not wanting to give up the full 32 bits of address space. The problem is, that if you have an object x which is >2GB, and a small object y, then x.length - y.length will erroneously be negative. If we want code (especially in libraries) to cope with such large objects, we need to ensure that any time there's a subtraction involving a length, the first is larger than the second. 
I think that would preclude the combination: length is uint byte[].length can exceed 2GB, and code is correct when it does uint - uint is an int (or even, can implicitly convert to int) As far as I can tell, at least one of these has to go.Well none has to go in the latest design: (a) One unsigned makes everything unsigned (b) unsigned -> signed is allowed (c) signed -> unsigned is disallowed Of course the latest design has imperfections, but precludes neither of the three things you mention.Any subtraction of two lengths has a possible range of -int.max .. uint.max which is quite problematic (and the root cause of the problems, I guess). And unfortunately I think code is riddled with subtraction of lengths.Code may be riddled with subtraction of lengths, but seems to be working with today's rule that the result of that subtraction is unsigned. So definitely we're not introducing new problems. I agree the solution has problems. Following this thread that in turn follows my sleepless nights poring over the subject, I'm glad to reach a design that is better than what we currently have. I think that disallowing the signed -> unsigned conversions will be a net improvement. Andrei
Nov 28 2008
Andrei Alexandrescu wrote:Don wrote:Yes.Andrei Alexandrescu wrote:(You mean x.length = 2, y.length = 4 in the second case.)(I lost track of quotes, so I yanked them all beyond Don's message.) Don wrote:It's close, but how can code such as: if (x.length - y.length < 100) ... be correct in the presence of length > 2GB? since (a) x.length = uint.max, y.length = 1 (b) x.length = 4, y.length = 2 both produce the same binary result (0xFFFF_FFFE = -2)The problem with that, is that you're then forcing the 'unsigned is a natural' interpretation when it may be erroneous. uint.max - 10 is a uint. It's an interesting case, because int = u1 - u2 is definitely incorrect when u1 > int.max. uint = u1 - u2 may be incorrect when u1 < u2, _if you think of unsigned as a positive number_. But, if you think of it as a natural modulo 2^32, uint = u1-u2 is always correct, since that's what's happening mathematically.Sounds good. One important consideration is that modulo arithmetic is considerably easier to understand when two's complement and signs are not involved.I'm strongly of the opinion that you shouldn't be able to generate an unsigned accidentally -- you should need to either declare a type as uint, or use the 'u' suffix on a literal. Right now, properties like 'length' being uint means you get too many surprising uints, especially when using 'auto'.I am not surprised by length being unsigned. I'm also not surprised by hexadecimal constants being unsigned. (They are unsigned in C. Walter made them signed or not, depending on their value.)I take your point about not wanting to give up the full 32 bits of address space. The problem is, that if you have an object x which is >2GB, and a small object y, then x.length - y.length will erroneously be negative. If we want code (especially in libraries) to cope with such large objects, we need to ensure that any time there's a subtraction involving a length, the first is larger than the second. 
I think that would preclude the combination: length is uint byte[].length can exceed 2GB, and code is correct when it does uint - uint is an int (or even, can implicitly convert to int) As far as I can tell, at least one of these has to go.Well none has to go in the latest design: (a) One unsigned makes everything unsigned (b) unsigned -> signed is allowed (c) signed -> unsigned is disallowed Of course the latest design has imperfections, but precludes neither of the three things you mention.Yes. I think much existing code would fail with sizes over 2GB, though. But it's not any worse.Any subtraction of two lengths has a possible range of -int.max .. uint.max which is quite problematic (and the root cause of the problems, I guess). And unfortunately I think code is riddled with subtraction of lengths.Code may be riddled with subtraction of lengths, but seems to be working with today's rule that the result of that subtraction is unsigned. So definitely we're not introducing new problems.I agree the solution has problems. Following this thread that in turn follows my sleepless nights poring over the subject, I'm glad to reach a design that is better than what we currently have. I think that disallowing the signed -> unsigned conversions will be a net improvement.I agree. And dealing with compile-time constants will improve things even more.
Nov 28 2008
On 2008-11-28 17:44:39 +0100, Don <nospam nospam.com> said:Andrei Alexandrescu wrote:[...]Don wrote:Andrei Alexandrescu wrote:(I lost track of quotes, so I yanked them all beyond Don's message.) Don wrote:The problem with that, is that you're then forcing the 'unsigned is a natural' interpretation when it may be erroneous. uint.max - 10 is a uint. It's an interesting case, because int = u1 - u2 is definitely incorrect when u1 > int.max. uint = u1 - u2 may be incorrect when u1 < u2, _if you think of unsigned as a positive number_. But, if you think of it as a natural modulo 2^32, uint = u1-u2 is always correct, since that's what's happening mathematically.I found a couple of instances where addresses were compared by simply computing a-b, instead of something like ((a<b)?-1:((a==b)?0:1)), so yes this is a pitfall that happens. Note that normally the subtraction of lengths is ok (because normally one is interested in the result and a>b); it is when it is used as a quick way to introduce ordering (i.e. as comparison) that it becomes problematic. By the way the solution for going beyond 2GB is clearly to use size_t, as I think is done (at least in tango). FawziYes. I think much existing code would fail with sizes over 2GB, though. But it's not any worse.Code may be riddled with subtraction of lengths, but seems to be working with today's rule that the result of that subtraction is unsigned. So definitely we're not introducing new problems.Any subtraction of two lengths has a possible range of -int.max .. uint.max which is quite problematic (and the root cause of the problems, I guess). And unfortunately I think code is riddled with subtraction of lengths.
Dec 01 2008
On Fri, 28 Nov 2008 17:09:25 +0100, Don wrote:It's close, but how can code such as: if (x.length - y.length < 100) ... be correct in the presence of length > 2GB?It could be transformed by the compiler into something more like ... if ((x.length <= y.length) || ((x.length - y.length) < 100)) ... -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
Nov 28 2008
Derek Parnell wrote:On Fri, 28 Nov 2008 17:09:25 +0100, Don wrote:Then it'd have different behavior from ---- auto diff = x.length - y.length; if (diff < 100) ... ---- This seems like a *bad* thing...It's close, but how can code such as: if (x.length - y.length < 100) ... be correct in the presence of length > 2GB?It could be transformed by the compiler into more something like ... if ((x.length <= y.length) || ((x.length - y.length) < 100)) ...
Nov 28 2008
On Sat, 29 Nov 2008 01:17:27 +0100, Frits van Bommel wrote:Then it'd have different behavior from
----
auto diff = x.length - y.length;
if (diff < 100) ...
----
This seems like a *bad* thing...I see the problem a little differently. To me, "x.length - y.length" is ambiguous and thus meaningless. The ambiguity is this: are you after the difference between two values, or are you after the value required to add to x.length to get to y.length? These are not necessarily the same thing. The difference is always positive, as in "the difference between the length of X and the length of Y is 4". The answer tells us the difference between two lengths but not, of course, which is the smaller. So it all depends on what you are trying to find out. And note that the difference is not a length because it is not associated with any specific array. So having looked at it like this, I'm now inclined to consider that the 'diff' being declared here should be a signed type and, if possible, have more bits than '.length'. -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
Nov 28 2008
Don wrote:length is uint byte[].length can exceed 2GB, and code is correct when it does uint - uint is an int (or even, can implicitly convert to int) As far as I can tell, at least one of these has to go.This is why I never understood ptrdiff_t in C. Having to choose between a signed value and narrower range vs. unsigned and sufficient range just stinks. Sean
Nov 28 2008
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articleAt the moment, we're in limbo regarding the decision to go forward with this. Walter, as many good long-time C programmers, knows the abusive unsigned rule so well he's not hurt by it and consequently has little incentive to see it as a problem. I have had to teach C and C++ to young students coming from Java introductory courses and have a more up-to-date perspective on the dangers.I'll address your actual suggestion separately, but personally, I always build C/C++ code at the max warning level, and treat warnings as errors. This typically catches all signed-unsigned interactions and requires me to add a cast for the build to succeed. The advantage of this is that if I see a cast in my code then I know that the statement is deliberate rather than accidental. I would wholeheartedly support such an approach in D as well, though I can see how this may not be terribly appealing to some experienced C/C++ programmers. Sean
Nov 25 2008
Andrei Alexandrescu wrote:D pursues compatibility with C and C++ in the following manner: if a code snippet compiles in both C and D or C++ and D, then it should have the same semantics. A classic problem with C and C++ integer arithmetic is that any operation involving at least an unsigned integral receives automatically an unsigned type, regardless of how silly that actually is, semantically. About the only advantage of this rule is that it's simple. IMHO it only has disadvantages from then on. The following operations suffer from the "abusive unsigned syndrome" (u is an unsigned integral, i is a signed integral):
(1) u + i, i + u
(2) u - i, i - u
(3) u - u
(4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C requires that these all return unsigned, ouch)
(5) u < i, i < u, u <= i etc. (all ordering comparisons)
(6) -u
I think that most of these problems are caused by C enforcing a foolish consistency between literals and variables. The idea that literals like '0' and '1' are of type int is absurd, and has caused a torrent of problems. '0' is just '0'. uint a = 1; does NOT contain an 'implicit conversion from int to uint', any more than there are implicit conversions from naturals to integers in mathematics. So I really like the polysemous types idea. For example, when is it reasonable to use -u? It's useful with literals like uint a = -1u; which is equivalent to uint a = 0xFFFF_FFFF. Anywhere else, it's probably a bug. My suspicion is that if you allowed all signed-unsigned operations when at least one was a literal, and made everything else illegal, you'd fix most of the problems. In particular, there'd be a big reduction in people abusing 'uint' as a primitive range-limited int. Although it would be nice to have a type which was range-limited, 'uint' doesn't do it. Instead, it guarantees the number is between 0 and int.max*2+1 inclusive. Allowing mixed operations encourages programmers to focus on the benefit of 'the lower bound is zero!' 
while forgetting that there is an enormous downside ('I'm saying that this could be larger than int.max!') Interestingly, none of these problems exist in assembly language programming, where every arithmetic instruction affects the overflow flag (for signed operations) as well as the carry flag (for unsigned).
Nov 26 2008
Don wrote:Andrei Alexandrescu wrote:Yah, polysemy will take care of the constants. It's also rather easy to implement for them.D pursues compatibility with C and C++ in the following manner: if a code snippet compiles in both C and D or C++ and D, then it should have the same semantics. A classic problem with C and C++ integer arithmetic is that any operation involving at least an unsigned integral receives automatically an unsigned type, regardless of how silly that actually is, semantically. About the only advantage of this rule is that it's simple. IMHO it only has disadvantages from then on. The following operations suffer from the "abusive unsigned syndrome" (u is an unsigned integral, i is a signed integral): (1) u + i, i + u (2) u - i, i - u (3) u - u (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C requires that these all return unsigned, ouch) (5) u < i, i < u, u <= i etc. (all ordering comparisons) (6) -uI think that most of these problems are caused by C enforcing a foolish consitency between literals and variables. The idea that literals like '0' and '1' are of type int is absurd, and has caused a torrent of problems. '0' is just '0'. uint a = 1; does NOT contain an 'implicit conversion from int to uint', any more than there are implicit conversions from naturals to integers in mathematics. So I really like the polysemous types idea.For example, when is it reasonable to use -u? It's useful with literals like uint a = -1u; which is equivalent to uint a = 0xFFFF_FFFF. Anywhere else, it's probably a bug.Maybe not even for constants as all uses of -u can be easily converted in ~u + 1. I'd gladly agree to disallow -u entirely.My suspicion is, that if you allowed all signed-unsigned operations when at least one was a literal, and made everything else illegal, you'd fix most of the problems. 
In particular, there'd be a big reduction in people abusing 'uint' as a primitive range-limited int.Well, part of my attempt is to transform that abuse into legit use. In other words, I do want to allow people to consider uint a reasonable model of natural numbers. It can't be perfect, but I believe we can make it reasonable. Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.Although it would be nice to have a type which was range-limited, 'uint' doesn't do it. Instead, it guarantees the number is between 0 and int.max*2+1 inclusive. Allowing mixed operations encourages programmers to focus on the benefit of 'the lower bound is zero!' while forgetting that there is an enormous downside ('I'm saying that this could be larger than int.max!')I'm not sure I understand this part. To me, the larger problem is underflow, e.g. when subtracting two small uints results in a large uint.Interestingly, none of these problems exist in assembly language programming, where every arithmetic instruction affects the overflow flag (for signed operations) as well as the carry flag (for unsigned).They do exist. You need to use imul/idiv vs. mul/div depending on what signedness your operands have. Andrei
Nov 26 2008
Andrei Alexandrescu wrote:Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.What /is/ the appropriate type here? For example:
uint a = uint.max;
uint b = 0;
uint c = uint.max - 1;
int x = a - b;  // wrong, should be uint
uint y = c - a; // wrong, should be int
I don't see any way to reliably produce a "safe" result at the language level. Sean
Nov 26 2008
Sean Kelly wrote:Andrei Alexandrescu wrote:There are several schools of thought (for the lack of a better phrase):
1. The Purist Mathematician: We want unsigned to approximate natural numbers, natural numbers aren't closed for subtraction, therefore u1 - u2 should be disallowed.
2. The Practical Mathematician: we want unsigned to approximate natural numbers and natural numbers aren't closed for subtraction but closed for a subset satisfying u1 >= u2. We can rely on the programmer to check the condition before, and fall back on modulo difference when the condition isn't satisfied. They'll understand.
3. The C Veteran: Everything should be allowed. And when unsigned is within a mile, the type is unsigned. I'll take care of the rest.
4. The Assembly Programmer: Use whatever type you want. The assembly language operation for subtraction is the same.
5. The Dynamic Language Fan: Allow whatever and check it dynamically.
6. The Static Typing Nut: Use some scheme to magically weed out 73.56% of mistakes and disallow only 14.95% of valid uses.
Your example is in fact perfect. It shows how the result of a subtraction ultimately has its fate decided by case-by-case use, not picked properly by a rule. The example perfectly underlines the advantage of my scheme: the decision of how to type u1 - u2 is left to the only entity able to make that call: the user of the operation. Of course there remains the question: should all that be implicit, or should the user employ more syntax to specify what they want? I don't know. AndreiNotice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.What /is/ the appropriate type here? For example:
uint a = uint.max;
uint b = 0;
uint c = uint.max - 1;
int x = a - b;  // wrong, should be uint
uint y = c - a; // wrong, should be int
I don't see any way to reliably produce a "safe" result at the language level.
Nov 26 2008
Andrei Alexandrescu wrote:Sean Kelly wrote:How about 1.5, the Somewhat Practical but Still Purist Mathematician? He (that would be me) would like integral types called nint and nlong (the "n" standing for "natural"), which can hold numbers in the range (0, int.max) and (0, long.max), respectively. Such types would have to be stored as int/long, but the sign bit should be ignored/zero in all calculations. Hence any nint/nlong would be implicitly castable to int/long. Is this a possibility? As you say, natural numbers aren't closed under subtraction, so subtractions involving nint/nlong would have to yield an int/long result. In fact, if n1 and n2 are nints, one would be certain that n1-n2 never goes out of the range of an int. Thing is, whenever I use one of the unsigned types, it is because I need to make sure I'm working with nonnegative numbers, not because I need to work outside the ranges of the signed integral types. Other people obviously have other needs, though, so I'm not saying "let's toss uint and ulong out the window". -LarsAndrei Alexandrescu wrote:There are several schools of thought (for the lack of a better phrase): 1. The Purist Mathematician: We want unsigned to approximate natural numbers, natural numbers aren't closed for subtraction, therefore u1 - u2 should be disallowed. 2. The Practical Mathematician: we want unsigned to approximate natural numbers and natural numbers aren't closed for subtraction but closed for a subset satisfying u1 >= u2. We can rely on the programmer to check the condition before, and fall back on modulo difference when the condition isn't satisfied. They'll understand.Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.What /is/ the appropriate type here? 
For example:
uint a = uint.max;
uint b = 0;
uint c = uint.max - 1;
int x = a - b;  // wrong, should be uint
uint y = c - a; // wrong, should be int
I don't see any way to reliably produce a "safe" result at the language level.
Nov 26 2008
Lars Kyllingstad wrote:Andrei Alexandrescu wrote:Another point: nint would also be implicitly castable to uint and so on, so making these types the standard choice of unsigned integers in Phobos shouldn't cause too much breakage. -LarsSean Kelly wrote:How about 1.5, the Somewhat Practical but Still Purist Mathematician? He (that would be me) would like integral types called nint and nlong (the "n" standing for "natural"), which can hold numbers in the range (0, int.max) and (0, long.max), respectively. Such types would have to be stored as int/long, but the sign bit should be ignored/zero in all calculations. Hence any nint/nlong would be implicitly castable to int/long. Is this a possibility? As you say, natural numbers aren't closed under subtraction, so subtractions involving nint/nlong would have to yield an int/long result. In fact, if n1 and n2 are nints, one would be certain that n1-n2 never goes out of the range of an int. Thing is, whenever I use one of the unsigned types, it is because I need to make sure I'm working with nonnegative numbers, not because I need to work outside the ranges of the signed integral types. Other people obviously have other needs, though, so I'm not saying "let's toss uint and ulong out the window". -LarsAndrei Alexandrescu wrote:There are several schools of thought (for the lack of a better phrase): 1. The Purist Mathematician: We want unsigned to approximate natural numbers, natural numbers aren't closed for subtraction, therefore u1 - u2 should be disallowed. 2. The Practical Mathematician: we want unsigned to approximate natural numbers and natural numbers aren't closed for subtraction but closed for a subset satisfying u1 >= u2. We can rely on the programmer to check the condition before, and fall back on modulo difference when the condition isn't satisfied. They'll understand.Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. 
There is for example no progress in typing u1 - u2 appropriately.What /is/ the appropriate type here? For example: uint a = uint.max; uint b = 0; uint c = uint.max - 1; int x = a - b; // wrong, should be uint uint y = c - a; // wrong, should be int I don't see any way to reliably produce a "safe" result at the language level.
Nov 26 2008
Andrei Alexandrescu Wrote:There are several schools of thought (for the lack of a better phrase): 1. The Purist Mathematician: We want unsigned to approximate natural numbers, natural numbers aren't closed for subtraction, therefore u1 - u2 should be disallowed.I thought mathematics doesn't distinguish between, say, natural 5, integral 5 and real 5. N, Z and R are sets, not types of numbers. There is even the notion of equivalence classes to deem numbers with different representations the same (not just equal).
Nov 27 2008
Kagamin wrote:Andrei Alexandrescu Wrote:Right, but the notion of set closedness for an operation comes from math. AndreiThere are several schools of thought (for the lack of a better phrase): 1. The Purist Mathematician: We want unsigned to approximate natural numbers, natural numbers aren't closed for subtraction, therefore u1 - u2 should be disallowed.I thought, mathematics doesn't distinguish between, say, natural 5, integral 5 and real 5. N, Z and R are sets, not types of numbers. There is even notion of equivalence class to deem numbers with different representation as the same (not just equal).
Nov 27 2008
Wed, 26 Nov 2008 09:12:12 -0600, Andrei Alexandrescu wrote:Don wrote:I'm totally with Don here. In math, natural numbers are a subset of integers. But uint is not a subset of int. If it were, most of the problems would vanish. So it's probably feasible to ban uint from SafeD, implement natural numbers by some other means, and leave uint for low-level wizardry.My suspicion is, that if you allowed all signed-unsigned operations when at least one was a literal, and made everything else illegal, you'd fix most of the problems. In particular, there'd be a big reduction in people abusing 'uint' as a primitive range-limited int.Well, part of my attempt is to transform that abuse into legit use. In other words, I do want to allow people to consider uint a reasonable model of natural numbers. It can't be perfect, but I believe we can make it reasonable. Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.Although it would be nice to have a type which was range-limited, 'uint' doesn't do it. Instead, it guarantees the number is between 0 and int.max*2+1 inclusive. Allowing mixed operations encourages programmers to focus on the benefit of 'the lower bound is zero!' while forgetting that there is an enormous downside ('I'm saying that this could be larger than int.max!')I'm not sure I understand this part. To me, the larger problem is underflow, e.g. when subtracting two small uints results in a large uint.
Nov 26 2008
Sergey Gromov wrote:Wed, 26 Nov 2008 09:12:12 -0600, Andrei Alexandrescu wrote:Don wrote:I'm totally with Don here. In math, natural numbers are a subset of integers. But uint is not a subset of int. If it were, most of the problems would vanish. So it's probably feasible to ban uint from SafeD, implement natural numbers by some other means, and leave uint for low-level wizardry.My suspicion is, that if you allowed all signed-unsigned operations when at least one was a literal, and made everything else illegal, you'd fix most of the problems. In particular, there'd be a big reduction in people abusing 'uint' as a primitive range-limited int.Well, part of my attempt is to transform that abuse into legit use. In other words, I do want to allow people to consider uint a reasonable model of natural numbers. It can't be perfect, but I believe we can make it reasonable. Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.Although it would be nice to have a type which was range-limited, 'uint' doesn't do it. Instead, it guarantees the number is between 0 and int.max*2+1 inclusive. Allowing mixed operations encourages programmers to focus on the benefit of 'the lower bound is zero!' while forgetting that there is an enormous downside ('I'm saying that this could be larger than int.max!')I'm not sure I understand this part. To me, the larger problem is underflow, e.g. when subtracting two small uints results in a large uint.That's also a possibility - consider unsigned types just "bags of bits" and disallow most arithmetic for them. They could actually be eliminated entirely from the core language because they can be implemented as a library. I'm not sure what that would feel like. I guess length would return an int in that case? Andrei
Nov 26 2008
Wed, 26 Nov 2008 15:57:55 -0600, Andrei Alexandrescu wrote:Sergey Gromov wrote:I guess so. Actually, simply disallowing signed<=>unsigned cast and making length signed would force most people to abandon unsigned types. And moving the unsigned types documentation into a separate chapter would warn newcomers about their special status. Not a lot of changes on the compiler side, mostly throwing stuff away.Wed, 26 Nov 2008 09:12:12 -0600, Andrei Alexandrescu wrote:That's also a possibility - consider unsigned types just "bags of bits" and disallow most arithmetic for them. They could actually be eliminated entirely from the core language because they can be implemented as a library. I'm not sure what that would feel like. I guess length would return an int in that case?Don wrote:I'm totally with Don here. In math, natural numbers are a subset of integers. But uint is not a subset of int. If it were, most of the problems would vanish. So it's probably feasible to ban uint from SafeD, implement natural numbers by some other means, and leave uint for low-level wizardry.My suspicion is, that if you allowed all signed-unsigned operations when at least one was a literal, and made everything else illegal, you'd fix most of the problems. In particular, there'd be a big reduction in people abusing 'uint' as a primitive range-limited int.Well, part of my attempt is to transform that abuse into legit use. In other words, I do want to allow people to consider uint a reasonable model of natural numbers. It can't be perfect, but I believe we can make it reasonable. Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.Although it would be nice to have a type which was range-limited, 'uint' doesn't do it. Instead, it guarantees the number is between 0 and int.max*2+1 inclusive. Allowing mixed operations encourages programmers to focus on the benefit of 'the lower bound is zero!' 
while forgetting that there is an enormous downside ('I'm saying that this could be larger than int.max!')I'm not sure I understand this part. To me, the larger problem is underflow, e.g. when subtracting two small uints results in a large uint.
Nov 26 2008
Andrei Alexandrescu:That's also a possibility - consider unsigned types just "bags of bits" and disallow most arithmetic for them. They could actually be eliminated entirely from the core language because they can be implemented as a library. I'm not sure what that would feel like. I guess length would return an int in that case?I don't know what the solution is, but I am very happy to see that in this newsgroup there are people willing to reconsider such basic things, to try to improve the language. Most ideas turn out to be wrong, but if you aren't bold enough to consider them, there will be no improvements :-) In my programs I use unsigned integers and unsigned longs as: - bitfields, a single size_t, for example to represent a small set of items. - bitarrays, in an array of size_t, to represent a larger set, to have arrays of bit flags, etc. - to pack small variables into a uint, size_t, etc., for example using the first 5 bits to represent a, the following 2 bits to represent b, etc. In such situations I have never packed such variables into a signed int. - when I need very large integer values, but this has to be done with care, because they can't be converted back to ints. - I'd also like to use unsigned ints to denote that for example a function takes a nonnegative argument. I used to do this in Delphi, but I have seen it's too unsafe in D, so now in D I prefer to use ints and then inside the function test for a negative argument and throw an exception (generally I don't use an assert for this except in the most speed-critical situations). - I use unsigned bytes in some situations, now and then. I don't use signed bytes anymore; I used to use them for 8-bit digital audio, but not anymore. Now 16-bit signed audio is the norm (a short) or even 24-bit (I created a slow 24-bit type some time ago). - There are probably a few other situations; for example I think I've used a ushort once, but not many of them. Bye, bearophile
Nov 26 2008
bearophile Wrote:In my programs I use unsigned integers and unsigned longs as: - bitfields, a single size_t, for example to represent a small set of items. - bitarrays, in an array of size_t, to represent a larger set, to have arrays of bit flags, etc. - to pack small variables into a uint, size_t, etc., for example using the first 5 bits to represent a, the following 2 bits to represent b, etc. In such situations I have never packed such variables into a signed int.I think signed ints can hold bits as gracefully as unsigned ones.- when I need very large integer values, but this has to be done with care, because they can't be converted back to ints.I don't think that large integers know or respect computer-specific integer limits. They just get larger and larger.- There are probably a few other situations; for example I think I've used a ushort once, but not many of them.Legacy technologies tend to use unsigned types intensively, and people got used to unsigned chars (for comparisons and character maps).
Nov 27 2008
Kagamin wrote:bearophile Wrote:The problem is there is an odd jump whenever the sign bit gets into play. An expert programmer can easily deal with that, but it's rather tricky.In my programs I use unsigned integers and unsigned longs as: - bitfields, a single size_t, for example to represent a small set of items. - bitarrays, in an array of size_t, to represent a larger set, to have arrays of bit flags, etc. - to pack small variables into a uint, size_t, etc., for example using the first 5 bits to represent a, the following 2 bits to represent b, etc. In such situations I have never packed such variables into a signed int.I think signed ints can hold bits as gracefully as unsigned ones.Often large integers hold counts or sizes of objects fitting in computer memory. There is a sense of completeness of a systems-level language in being able to use a native type to express any offset in memory. That's why it would be somewhat of a bummer if we defined size_t as int on 32-bit systems: I, at least, would feel like giving something up. Andrei- when I need very large integer values, but this has to be done with care, because they can't be converted back to ints.I don't think that large integers know or respect computer-specific integer limits. They just get larger and larger.
Nov 27 2008
Sergey Gromov wrote:So it's probably feasible to ban uint from SafeD, implement natural numbers by some other means, and leave uint for low-level wizardry.SafeD is about memory safety, i.e. no corrupted memory. Dealing with integer overflows falls outside its agenda.
Nov 26 2008
Don wrote:Although it would be nice to have a type which was range-limited, 'uint' doesn't do it. Instead, it guarantees the number is between 0 and int.max*2+1 inclusive. Allowing mixed operations encourages programmers to focus the benefit of 'the lower bound is zero!' while forgetting that there is an enormous downside ('I'm saying that this could be larger than int.max!')This inspired me to think about where I use uint and I realized that I don't. I use size_t for size/length representations (largely because sizes can theoretically be >2GB on a 32-bit system), and ubyte for bit-level stuff, but that's it. Sean
Nov 26 2008
Sean Kelly wrote:Don wrote:For the record, I use unsigned types wherever there's a non-negative number involved (e.g. a count). So I'd be helped by better unsigned operations. I wonder how often these super-large arrays do occur on 32-bit systems. I do have programs that try to allocate as large a contiguous matrix as possible, but never sat down and tested whether a >2GB chunk was allocated on the Linux cluster I work on. I'm quite annoyed by this >2GB issue because it's a very practical and very rare issue in a weird contrast with a very principled issue (modeling natural numbers). AndreiAlthough it would be nice to have a type which was range-limited, 'uint' doesn't do it. Instead, it guarantees the number is between 0 and int.max*2+1 inclusive. Allowing mixed operations encourages programmers to focus the benefit of 'the lower bound is zero!' while forgetting that there is an enormous downside ('I'm saying that this could be larger than int.max!')This inspired me to think about where I use uint and I realized that I don't. I use size_t for size/length representations (largely because sizes can theoretically be >2GB on a 32-bit system), and ubyte for bit-level stuff, but that's it. Sean
Nov 26 2008
Andrei Alexandrescu wrote:Sean Kelly wrote:To be fair, I generally use unsigned numbers for values that are logically always positive. These just tend to be sizes and counts in my code.Don wrote:For the record, I use unsigned types wherever there's a non-negative number involved (e.g. a count). So I'd be helped by better unsigned operations.Although it would be nice to have a type which was range-limited, 'uint' doesn't do it. Instead, it guarantees the number is between 0 and int.max*2+1 inclusive. Allowing mixed operations encourages programmers to focus the benefit of 'the lower bound is zero!' while forgetting that there is an enormous downside ('I'm saying that this could be larger than int.max!')This inspired me to think about where I use uint and I realized that I don't. I use size_t for size/length representations (largely because sizes can theoretically be >2GB on a 32-bit system), and ubyte for bit-level stuff, but that's it.I wonder how often these super-large arrays do occur on 32-bit systems. I do have programs that try to allocate as large a contiguous matrix as possible, but never sat down and tested whether a >2GB chunk was allocated on the Linux cluster I work on. I'm quite annoyed by this >2GB issue because it's a very practical and very rare issue in a weird contrast with a very principled issue (modeling natural numbers).Yeah, I have no idea how common they are, though my guess would be that they are rather uncommon. As a library programmer, I simply must assume that they are in use, which is why I use size_t as a matter of course. Sean
Nov 26 2008
On 27.11.08 at 03:46, Sean Kelly wrote:Andrei Alexandrescu wrote:If they can be more than 2GB, why can't they be more than 4GB? It is dangerous to assume that they won't, and that's why uint is dangerous. You exchange safety for one additional bit of information, and this is wrong. Soon enough we won't use uints the same way we don't use ushorts (I should have asked first whether anyone uses ushort these days, but there is so little gain in using ushort as opposed to short or int that I consider it impractical). The 64-bit era will give us 64-bit pointers and 64-bit counters. Do you think you will prefer ulong over long for an additional bit? You really shouldn't. My proposal Short summary: - Disallow bitwise operations on both signed types and unsigned types, allow arithmetic operations - Discourage usage of unsigned types. Introduce bits8, bits16, bits32 and bits64 as a replacement - Disallow arithmetic operations on bits* types, allow bitwise operations on them - Disallow mixed-type operations (compare, add, sub, mul and div) - Disallow implicit casts between all types - Use int and long (or ranged types) for length and indices with runtime checks (a.length-- is always dangerous no matter what CT checks you make). - Add type constructors for int/uint/etc.: "auto x = int(int.max + 1);" throws at run-time The two most common uses of uints are: 0) Bitfields or masks, packed values and hexadecimal constants (bitfields later on) 1) Numbers that can't be negative (counters, sizes/lengths etc.) Bitfields Bitfields are handy, and using an unsigned type over a signed one is surely preferable. The most common operations on bitfields are bitwise AND, OR, (R/L)SHIFT and XOR. You shouldn't subtract from or add to them; it is an error in most cases. 
This is what the new bits8, bits16, bits32 and bits64 types should be used for: bits32 argbColor; int alphaShift = 24; // any type here, actually // shift bits32 alphaMask = (0xFF << alphaShift); // 0xFF is of type bits8 auto value2 = value1 & mask; // all 3 are of type bits* // you can only shift bits, result is in bits, too, i.e. the following is incorrect: int i = -42; int x = (i << 8); // An error // 1) can't shift value of type int // 2) can't assign value of type bits32 to variable of type int // ubyte is still handy sometimes (color should belong to [0..255] range) auto red = (argbColor & alphaMask) >> alphaShift; // result is in bits32, use explicit cast to convert it to target data type: ubyte red = cast(ubyte)((argbColor & alphaMask) >> alphaShift); // Alternatively: ubyte alpha = ubyte((argbColor & alphaMask) >> alphaShift); The type constructor throws an error if the source value (which is of type bits32 in this example) can't be stored in a ubyte. This might be a replacement for signed/unsigned methods. int i = 0xFFFFFFFF; // an error, can't convert value of type bits32 to variable of type int int i = int.max + 1; // ok int i = int(int.max + 1); // an exception is raised at runtime int i = 0xABCD - 0xDCBA; // not allowed. Add explicit casts auto u = cast(uint)0xABCD - cast(uint)0xDCBA; // result type is uint, no checks for overflow auto i = cast(int)0xABCD - cast(int)0xDCBA; // result type is int, no checks for overflow auto e = cast(uint)0xABCD - cast(int)0xDCBA; // an error, can't subtract int from uint // type ctors in action: auto i = int(cast(int)0xABCD - cast(int)0xDCBA); // result type is int, an exception on overflow auto u = uint(cast(uint)0xABCD - cast(uint)0xDCBA); // same here for uint Non-negative values Just use int/long. Or some ranged type ([0..short.max], [0..int.max], [0..long.max]) could be used as well. A library type, perhaps. Let's call it nshort/nint/nlong. 
It should have the same set of operations as short/int/long but makes additional checks. Throws on under- and overflow. int x = 42; nint nx = x; // ok nx = -x; // throws nx = int.max; // ok ++nx; // throws nx = 0; --nx; // throws nx = 0; nint ny = 42; nx = ny; // no checking is done int y = ny; // no checking is done, either short s = ny; // error, cast needed short s = cast(short)ny; // never throws short s = short(ny); // might throwSean Kelly wrote:To be fair, I generally use unsigned numbers for values that are logically always positive. These just tend to be sizes and counts in my code.Don wrote:For the record, I use unsigned types wherever there's a non-negative number involved (e.g. a count). So I'd be helped by better unsigned operations.Although it would be nice to have a type which was range-limited, 'uint' doesn't do it. Instead, it guarantees the number is between 0 and int.max*2+1 inclusive. Allowing mixed operations encourages programmers to focus the benefit of 'the lower bound is zero!' while forgetting that there is an enormous downside ('I'm saying that this could be larger than int.max!')This inspired me to think about where I use uint and I realized that I don't. I use size_t for size/length representations (largely because sizes can theoretically be >2GB on a 32-bit system), and ubyte for bit-level stuff, but that's it.I wonder how often these super-large arrays do occur on 32-bit systems. I do have programs that try to allocate as large a contiguous matrix as possible, but never sat down and tested whether a >2GB chunk was allocated on the Linux cluster I work on. I'm quite annoyed by thisYeah, I have no idea how common they are, though my guess would be that they are rather uncommon. As a library programmer, I simply must assume that they are in use, which is why I use size_t as a matter of course. Sean2GB issue because it's a very practical and very rare issue in a weirdcontrast with a very principled issue (modeling natural numbers).
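The hypothetical nshort/nint/nlong proposed above could be prototyped as a library type. A minimal sketch in C follows (all names are invented here, and assert stands in for the exceptions the proposal throws):

```c
#include <assert.h>
#include <limits.h>

/* A non-negative int with checked transitions, roughly the proposed 'nint'. */
typedef struct { int v; } nint;

nint nint_from(int x) {
    assert(x >= 0);             /* "throws" on negative input */
    return (nint){ x };
}

nint nint_inc(nint n) {
    assert(n.v < INT_MAX);      /* would overflow past int.max */
    return (nint){ n.v + 1 };
}

nint nint_dec(nint n) {
    assert(n.v > 0);            /* would underflow below zero */
    return (nint){ n.v - 1 };
}
```

The point of the sketch is that every operation that could leave the valid range is intercepted, which is exactly the behavior plain uint arithmetic lacks.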
Nov 26 2008
Denis Koroskin wrote:27.11.08 в 03:46 Sean Kelly в своём письме писал(а):Bigger than 4GB on a 32-bit system? Files perhaps, but I'm talking about memory ranges here.Andrei Alexandrescu wrote:If they can be more than 2Gb, why can't they be more than 4GB? It is dangerous to assume that they won't, that's why uint is dangerous. You exchange one additional bit of information for safety, this is wrong.Sean Kelly wrote:To be fair, I generally use unsigned numbers for values that are logically always positive. These just tend to be sizes and counts in my code.Don wrote:For the record, I use unsigned types wherever there's a non-negative number involved (e.g. a count). So I'd be helped by better unsigned operations.Although it would be nice to have a type which was range-limited, 'uint' doesn't do it. Instead, it guarantees the number is between 0 and int.max*2+1 inclusive. Allowing mixed operations encourages programmers to focus the benefit of 'the lower bound is zero!' while forgetting that there is an enormous downside ('I'm saying that this could be larger than int.max!')This inspired me to think about where I use uint and I realized that I don't. I use size_t for size/length representations (largely because sizes can theoretically be >2GB on a 32-bit system), and ubyte for bit-level stuff, but that's it.I wonder how often these super-large arrays do occur on 32-bit systems. I do have programs that try to allocate as large a contiguous matrix as possible, but never sat down and tested whether a >2GB chunk was allocated on the Linux cluster I work on. I'm quite annoyed by this >2GB issue because it's a very practical and very rare issue in a weird contrast with a very principled issue (modeling natural numbers).Yeah, I have no idea how common they are, though my guess would be that they are rather uncommon. 
As a library programmer, I simply must assume that they are in use, which is why I use size_t as a matter of course.Soon enough we won't use uints the same way we don't use ushorts (I should have asked if anyone uses ushort these day first, but there is so little gain to use ushort as opposed to short or int that I consider it impractical). 64bit era will give us 64bit pointers and 64 bit counters. Do you think you will prefer ulong over long for an additional bit? You really shoudn't.long vs. ulong for sizes is less of an issue, because we're a long way away from running against the limitations of a 63-bit size value. The point of size_t to me, however, is that it scales automatically, so if I write array operations using size_t then I can be sure they will work on both a 32 and 64-bit system. I do like Don's point about unsigned really meaning "unsigned" however, rather than "positive." I clearly use unsigned numbers for both, even if I flag the "positive" uses via type alias such as size_t. In C/C++ I rely on compiler warnings to trap the sort of mistakes we're talking about here, but I'd love a more logically sound solution if one could be found. Sean
Nov 27 2008
Andrei Alexandrescu Wrote:I also know seasoned programmers who had no idea that -u compiles and that it also oddly returns an unsigned type.1) I see no danger here. 2) I doubt this proposal solves the danger, whatever it is. 3) -u is funny and looks like wrong design to me.
Nov 26 2008
Kagamin wrote:Andrei Alexandrescu Wrote:I didn't want runtime checks inserted, just to tighten compilation rules. AndreiI also know seasoned programmers who had no idea that -u compiles and that it also oddly returns an unsigned type.1) I see no danger here. 2) I doubt this proposal solves the danger, whatever it is. 3) -u is funny and looks like wrong design to me.
Nov 26 2008
Andrei Alexandrescu:I didn't want runtime checks inserted, just to tighten compilation rules.The compiler may use both :-) Bye, bearophile
Nov 26 2008
bearophile Wrote:One solution is to "disable" some of the more error-prone syntax allowed in C, turning it into a compilation error. For example I have seen newbies write bugs caused by leaving & where a && was necessary. In such a case just adopting "and" and making "&&" a syntax error solves the problem and doesn't lead to bugs when you convert C code to D (you just use a search&replace, replacing && with and in the code).Why do you want to turn D into Python? You already have one. Just write in Python, migrate others to it and be done with the C family.
Nov 26 2008
bearophile:Kagamin:One solution is to "disable" some of the more error-prone syntax allowed in C, turning it into a compilation error. For example I have seen newbies write bugs caused by leaving & where a && was necessary. In such a case just adopting "and" and making "&&" a syntax error solves the problem and doesn't lead to bugs when you convert C code to D (you just use a search&replace, replacing && with and in the code).<<Why do you want to turn D into Python? You already have one. Just write in Python, migrate others to it and be done with the C family.<The mistake I have shown of using "&&" instead of "&" or vice-versa, and "|" instead of "||" and vice-versa, comes from code I have seen written by new programmers at the University. But not only newbies write such bugs; see for example this post: http://gcc.gnu.org/ml/gcc-patches/2004-10/msg00990.html It says:People sometimes code "a && MASK" when they intended "a & MASK". gcc itself does not seem to have examples of this, here are some in the linux-2.4.20 kernel:<I want to copy the syntax that leads to fewer bugs and more readability, and often Python gives good examples, because it's often well designed. Note that this change doesn't reduce the performance of D code. Also note that G++ already allows you to write programs with and, or, not, xor, etc. The following code compiles and runs correctly, so instead of Python you may also say I want to copy G++: #include <stdio.h> #include <stdlib.h> int main(int argc, char** argv) { int b1 = argc >= 2 ? atoi(argv[1]) : 0; int b2 = argc >= 3 ? atoi(argv[2]) : 0; printf("%d\n", b1 and b2); return 0; } That can be disabled with "-fno-operator-names" while "-foperator-names" is enabled by default. So maybe the G++ designers agree with me, instead of you. Bye, bearophile
Nov 26 2008
bearophile Wrote:Also note that G++ already allows you to write programs with and, or, not, xor, etc. The following code compiles and run correctly, so instead of Python you may also say I want to copy G++:copying G++ is not always a good idea :) As I remember this alternative syntax is supported for compatibility with keyboards which don't have kinda exotic ~^&| characters. And I don't think that there is a method to make && a syntax error as you proposed.
Nov 26 2008
bearophile Wrote:http://gcc.gnu.org/ml/gcc-patches/2004-10/msg00990.html It says:that thread is about an extra compiler warning (which is always good), not about breaking C syntax.People sometimes code "a && MASK" when they intended "a & MASK". gcc itself does not seem to have examples of this, here are some in the linux-2.4.20 kernel:<
Nov 26 2008
Kagamin:that thread is about an extra compiler warning (which is always good), not about breaking C syntax.<You seem unaware of the current stance of Walter towards warnings. And please don't forget that D's purposes are different from C's (D is designed to be safer, especially when this has little or no cost), that D comes after a long experience of coding in C, and that D runs on machines thousands of times faster than the original ones the C language was designed for (today having fast kernels in your program is more and more important; less of the code uses most of the running time). And that thread was more generally an example that shows why that specific C syntax is error-prone, and it also explains why some languages, Python among them but not only Python, have refused this specific C syntax. Note that there are several other C syntaxes/semantics that are error-prone, and thanks to Walter, D already fixes some of them; I hope to see more improvements in the future.And I don't think that there is a method to make && a syntax error as you proposed.<Keeping two syntaxes to do the same thing is a bad form of complexity. Generally it's better to have only one obvious way to do something :-) Bye, bearophile
Nov 26 2008
"Kagamin" <spam here.lot> wrote in message news:ggjcfg$fqq$1 digitalmars.com...bearophile Wrote:Python has other issues.One solution is to "disable" some of the more error-prone syntax allowed in C, turning it into a compilation error. For example I have seen newbies write bugs caused by leaving & where a && was necessary. In such case just adopting "and" and making "&&" a syntax error solves the problem and doesn't lead to bugs when you convert C code to D (you just use a search&replace, replacing && with and on the code).Why do you want to turn D into Python? You already has one. Just write in python, migrate others to it and be done with C family.
Nov 26 2008
On 2008-11-25 10:59:01 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:(3) u - uJust a note here, because it seems to me you're confusing two issues with that "u - u" thing. The problem with "u - u" isn't one of unsigned vs. signed integers at all. It's a problem of possibly going out of range, a problem that can happen with any type but is more likely with unsigned integers since they're often near zero. If you want to attack that problem, I think it should be done in a coherent manner with other out-of-range issues. Going below uint.min for an uint or below int.min for an int should be handled the same way. Personally, I'd just add a compiler switch for runtime range checking (just as for array bound checking). Treating the result u - u as __intuint is dangerous: uint.max - 1U gives you a value which int cannot hold, but you'd allow it to convert implicitly and without warning to int? I don't like it. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Nov 26 2008
Michel Fortin wrote:On 2008-11-25 10:59:01 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:It's also a problem of signedness, considering that int can hold the difference of two small unsigned integrals. So if the result is unsigned there may be overflow (I abusively call it "underflow"), but if the result is an int that overflow may be avoided, or a different overflow may occur.(3) u - uJust a note here, because it seems to me you're confusing two issues with that "u - u" thing. The problem with "u - u" isn't one of unsigned vs. signed integers at all. It's a problem of possibly going out of range, a problem that can happen with any type but is more likely with unsigned integers since they're often near zero.If you want to attack that problem, I think it should be done in a coherent manner with other out-of-range issues. Going below uint.min for an uint or below int.min for an int should be handled the same way. Personally, I'd just add a compiler switch for runtime range checking (just as for array bound checking). Treating the result u - u as __intuint is dangerous: uint.max - 1U gives you a value which int cannot hold, but you'd allow it to convert implicitly and without warning to int? I don't like it.I understand. It's what I have so far, so I'm looking forward to better ideas. Resorting to runtime checks is always a possibility but I'd like to focus on the static checking aspect for now. Andrei
Nov 26 2008
"Michel Fortin" <michel.fortin michelf.com> wrote in message news:ggjpn4$1v0m$1 digitalmars.com...On 2008-11-25 10:59:01 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:I'd love to see D get the ability to turn on/off runtime range checking, but doing nothing more than a program-wide (or module-wide if compiling one-at-a-time) compiler switch is way too large-grained and blunt. I would prefer something finer-grained, such as: checked(expr) unchecked(expr) checked { code } unchecked { code }(3) u - uJust a note here, because it seems to me you're confusing two issues with that "u - u" thing. The problem with "u - u" isn't one of unsigned vs. signed integers at all. It's a problem of possibly going out of range, a problem that can happen with any type but is more likely with unsigned integers since they're often near zero. If you want to attack that problem, I think it should be done in a coherent manner with other out-of-range issues. Going below uint.min for a uint or below int.min for an int should be handled the same way. Personally, I'd just add a compiler switch for runtime range checking (just as for array bound checking).Treating the result u - u as __intuint is dangerous: uint.max - 1U gives you a value which int cannot hold, but you'd allow it to convert implicitly and without warning to int? I don't like it. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Nov 26 2008
I'm not really sure what I think about all this. I try to always insert assertions before operations like this, which makes me think the nicest solution would be if the compiler errors out if it detects a problematic expression that is unchecked... uint diff(uint begin, uint end) { return end - begin; // error } uint diff(uint begin, uint end) { assert(begin <= end); return end - begin; // ok because of the assert } I'm not going to get into how this would be implemented in the compiler, but it sure would be sweet :)
Nov 26 2008
Tomas Lindquist Olsen wrote:I'm not really sure what I think about all this. I try to always insert assertions before operations like this, which makes me think the nicest solution would be if the compiler errors out if it detects a problematic expression that is unchecked... uint diff(uint begin, uint end) { return end - begin; // error } uint diff(uint begin, uint end) { assert(begin <= end); return end - begin; // ok because of the assert } I'm not going to get into how this would be implemented in the compiler, but it sure would be sweet :)On the other hand, the CPU can report on integer overflow, so you could turn that into an exception if the expression doesn't include a cast.
Nov 26 2008
The more I read about this, the more I am convinced that removing the following: - implicit int <-> uint conversion - uint - uint (not 100% sure about this) - mixed int / uint arithmetic as well as changing array.length to int, would remove most problems. If you desperately need a > 2^31 element array, having to roll your own is not the main problem. The fact that the type of uint - uint could be int or uint depending on what the programmer wants tells me that the programmer should be tasked with informing the compiler what he really wants - i.e. cast. -- Simen
Nov 26 2008
On Tue, 25 Nov 2008 09:59:01 -0600, Andrei Alexandrescu wrote:D pursues compatibility with C and C++ in the following manner: if a code snippet compiles in both C and D or C++ and D, then it should have the same semantics.Interesting ... but I don't think that this should be the principle employed. If code is 'naughty' in C/C++ then D should not also produce the same results. I would propose that a better principle to be used would be that the compiler will not allow loss or distortion of information without the coder/reader being made aware of it.(1) u + i, i + u (2) u - i, i - u (3) u - u (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C requires that these all return unsigned, ouch) (5) u < i, i < u, u <= i etc. (all ordering comparisons) (6) -uNote that "(3) u - u" and "(6) -u" seem to be really a use of (4), namely "(-1 * u)". I am assuming that there is no difference between 'unsigned' and 'positive', insomuch as I am not treating 'unsigned' as 'sign unknown/irrelevant'. It seems to me that the issue then is not so much one of sign but of size. It needs an extra bit to hold the sign information; thus a 32-bit unsigned value needs a minimum of 33 bits to convert it to a signed equivalent. In the types (1) - (4) above, I would have the compiler compute a signed type for these. Then if the target of the result is a signed type AND larger than the 'unsigned' portion used, then the compiler would not have to complain. In every other case the compiler should complain because of the potential for information loss. 
In those cases where the target type is not explicitly coded, such as when using 'auto' or as a temporary value in an expression, the compiler should assume a signed type that is 'one step' larger than the 'unsigned' element in the expression. e.g. auto x = int * uint; ==> 'x' is long. If this causes code to be incompatible with C/C++, then it implies that the C/C++ code was poor (i.e. potential information loss) in the first place and deserves to be fixed up. The scenario (5) above should also include equality comparisons, and should cause the compiler to issue a message AND generate code like ... if (u < i) ====> if ( i < 0 ? false : u < cast(typeof(u))i) if (u <= i) ====> if ( i < 0 ? false : u <= cast(typeof(u))i) if (u == i) ====> if ( i < 0 ? false : u == cast(typeof(u))i) if (u >= i) ====> if ( i < 0 ? true : u >= cast(typeof(u))i) if (u > i) ====> if ( i < 0 ? true : u > cast(typeof(u))i) The coder should be able to avoid the message and the suboptimal generated code by adding a cast ... if (u < cast(typeof(u))i) I am also assuming that the syntax 'cast(unsigned-type)signed-value' is telling the compiler to assume that the bits in the signed-value already represent a valid unsigned-value and that therefore the compiler should not generate code to 'transform' the signed-value bits to form an unsigned-value. To summarize, (1) Perpetuating poor quality C/C++ code should not be encouraged. (2) The compiler should help the coder be aware of potential information loss. (3) The coder should have mechanisms to override the compiler's concerns. -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
Nov 27 2008
Derek Parnell wrote:On Tue, 25 Nov 2008 09:59:01 -0600, Andrei Alexandrescu wrote:These two principle are not necessarily at odds with each other. The idea of being compatible with C and C++ is simple: if I paste a C function from somewhere into a D module, the function should either not compile, or compile and run with the same result. I think that's quite reasonable. So if the C code is behaving naughtily, D doesn't need to also behave naughty. It should just not compile.D pursues compatibility with C and C++ in the following manner: if a code snippet compiles in both C and D or C++ and D, then it should have the same semantics.Interesting ... but I don't think that this should be the principle employed. If code is 'naughty' in C/C++ then D should not also produce the same results. I would propose that a better principle to be used would be that the compiler will not allow loss or distortion of information without the coder/reader being made aware of it.Correct.(1) u + i, i + u (2) u - i, i - u (3) u - u (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C requires that these all return unsigned, ouch) (5) u < i, i < u, u <= i etc. (all ordering comparisons) (6) -uNote that "(3) u - u" and "(6) -u" seem to be really a use of (4), namely "(-1 * u)".I am assming that there is no difference between 'unsigned' and 'positive', in so much as I am not treating 'unsigned' as 'sign unknown/irrelevant'. It seems to me that the issue then is not so much one of sign but of size. It needs an extra bit to hold the sign information thus a 32-bit unsigned value needs a minimum of 33 bits to convert it to a signed equivalent. In the types (1) - (4) above, I would have the compiler compute a signed type for these. Then if the target of the result is a signed type AND larger than the 'unsigned' portion used, then the complier would not have to complain. In every other case the complier should complain because of the potential for information loss. 
To avoid the complaint, the coder would need to either change the result type, change the input types, or add a 'message' to the compiler that in effect says "I know what I'm doing, ok?" - I suggest a cast would suffice. In those cases where the target type is not explicitly coded, such as using 'auto' or as a temporary value in an expression, the compiler should assume a signed type that is 'one step' larger than the 'unsigned' element in the expression, e.g. auto x = int * uint; ==> 'x' is long.
I don't think this will fly with Walter.
If this causes code to be incompatible with C/C++, then it implies that the C/C++ code was poor (i.e. potential information loss) in the first place and deserves to be fixed up.
I don't quite think so. As long as the values are within range, the multiplication is legit and efficient.
The scenario (5) above should also include equality comparisons, and should cause the compiler to issue a message AND generate code like ...
  if (u <  i) ====> if ( i < 0 ? false : u <  cast(typeof(u))i)
  if (u <= i) ====> if ( i < 0 ? false : u <= cast(typeof(u))i)
  if (u == i) ====> if ( i < 0 ? false : u == cast(typeof(u))i)
  if (u >= i) ====> if ( i < 0 ? true  : u >= cast(typeof(u))i)
  if (u >  i) ====> if ( i < 0 ? true  : u >  cast(typeof(u))i)
The coder should be able to avoid the message and the suboptimal generated code by adding a cast ... if (u < cast(typeof(u))i)
Yah, comparisons need to be looked at too. Andrei
Nov 27 2008
On Thu, 27 Nov 2008 16:23:12 -0600, Andrei Alexandrescu wrote:
Derek Parnell wrote:
I think we are saying the same thing. If the C code compiles AND if it has the potential to lose information then the D compiler should not compile it *if* the coder has not given explicit permission to the compiler to do so.
On Tue, 25 Nov 2008 09:59:01 -0600, Andrei Alexandrescu wrote:
These two principles are not necessarily at odds with each other. The idea of being compatible with C and C++ is simple: if I paste a C function from somewhere into a D module, the function should either not compile, or compile and run with the same result. I think that's quite reasonable. So if the C code is behaving naughtily, D doesn't need to also behave naughtily. It should just not compile.
D pursues compatibility with C and C++ in the following manner: if a code snippet compiles in both C and D or C++ and D, then it should have the same semantics.
Interesting ... but I don't think that this should be the principle employed. If code is 'naughty' in C/C++ then D should not also produce the same results. I would propose that a better principle to be used would be that the compiler will not allow loss or distortion of information without the coder/reader being made aware of it.
And that there is our single point of failure.
In those cases where the target type is not explicitly coded, such as using 'auto' or as a temporary value in an expression, the compiler should assume a signed type that is 'one step' larger than the 'unsigned' element in the expression, e.g. auto x = int * uint; ==> 'x' is long.
I don't think this will fly with Walter.
Of course. *If* the compiler can determine that the result will not lose information when being used, then it is fine. However, that is not always going to be the case. -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
If this causes code to be incompatible with C/C++, then it implies that the C/C++ code was poor (i.e. potential information loss) in the first place and deserves to be fixed up.
I don't quite think so. As long as the values are within range, the multiplication is legit and efficient.
Nov 27 2008
Derek Parnell wrote:
On Thu, 27 Nov 2008 16:23:12 -0600, Andrei Alexandrescu wrote:
Oh, sorry. Yes, absolutely!
Derek Parnell wrote:
I think we are saying the same thing. If the C code compiles AND if it has the potential to lose information then the D compiler should not compile it *if* the coder has not given explicit permission to the compiler to do so.
On Tue, 25 Nov 2008 09:59:01 -0600, Andrei Alexandrescu wrote:
These two principles are not necessarily at odds with each other. The idea of being compatible with C and C++ is simple: if I paste a C function from somewhere into a D module, the function should either not compile, or compile and run with the same result. I think that's quite reasonable. So if the C code is behaving naughtily, D doesn't need to also behave naughtily. It should just not compile.
D pursues compatibility with C and C++ in the following manner: if a code snippet compiles in both C and D or C++ and D, then it should have the same semantics.
Interesting ... but I don't think that this should be the principle employed. If code is 'naughty' in C/C++ then D should not also produce the same results. I would propose that a better principle to be used would be that the compiler will not allow loss or distortion of information without the coder/reader being made aware of it.
Well, here are two objectives at odds with each other. One is the systems-y, level-y aspect: on 32-bit systems there is a 32-bit multiplication operation that the 32-bit D primitive ought to map onto naturally. I think there is some good reason to expect that. Then there's also the argument you're making - and with which I agree - that 32-bit multiplication really yields a 64-bit value, so the type of the result should be long. But if we really start down that path, infinite-precision integrals are the only solution, because when you multiply two longs, you'd need something even longer, and so on. Anyhow, the ultimate reality is: we won't be able to satisfy every objective we have. We'll need to strike a good compromise. Andrei
And that there is our single point of failure.
In those cases where the target type is not explicitly coded, such as using 'auto' or as a temporary value in an expression, the compiler should assume a signed type that is 'one step' larger than the 'unsigned' element in the expression, e.g. auto x = int * uint; ==> 'x' is long.
I don't think this will fly with Walter.
Of course. *If* the compiler can determine that the result will not lose information when being used, then it is fine. However, that is not always going to be the case.
If this causes code to be incompatible with C/C++, then it implies that the C/C++ code was poor (i.e. potential information loss) in the first place and deserves to be fixed up.
I don't quite think so. As long as the values are within range, the multiplication is legit and efficient.
Nov 27 2008
Some of the purposes of a good arithmetic are:
- To give the system programmer freedom, essentially to use all the speed and flexibility of the CPU instructions.
- To allow fast-running code, which means having concise ways to specify 32- or 64-bit operations.
- To allow programs that aren't bug-prone, both with compile-time safeties and, where those aren't enough, with run-time ones (array bounds, arithmetic overflow among non-long types, etc.).
- To allow more flexibility, coming from certain usages of multi-precision integers.
- Good CommonLisp implementations are supposed to allow both fast code (fixnums) and safe/multiprecision integers (and even untagged fixnums).
Andrei Alexandrescu:
But if we really start down that path, infinite-precision integrals are the only solution. Because when you multiply two longs, you'd need something even longer and so on.
Well, having built-in multi-precision integer values isn't bad. You then need ways to specify where you want the compiler to use fixed-length numbers, for more efficiency. Bye, bearophile
Nov 28 2008
Andrei Alexandrescu Wrote:
Often large integers hold counts or sizes of objects fitting in computer memory.
Yes, if that object is system-specific, like the size of an allocated heap chunk. Business objects don't seem to respect system constraints (they are nearly storage-agnostic). Files are a good example.
There is a sense of completeness of a systems-level language in being able to use a native type to express any offset in memory. That's why it would be somewhat of a bummer if we defined size_t as int on 32-bit systems: I, at least, would feel like giving something up.
Yes, giving something up always feels like giving something up. But can you rely on large numbers? I heard a story about a program crashing on an attempt to allocate a memory chunk larger than half the address space. It was intended to be valid since there was enough address space, but it turned out that one DLL happened to be relocated to the middle of the address space, so there was no contiguous memory chunk of the requested size.
Nov 28 2008