digitalmars.D - Numeric limits tracking
- Manu (36/36) May 12 2013 So, here's an issue that constantly drives me nuts, and an elegant solut...
- John Colvin (4/49) May 12 2013 I've been thinking about this for a while from an optimisation
- Manu (18/67) May 12 2013 It is, but there are limits to what it can do.
- Walter Bright (16/18) May 12 2013 It's a form of "data flow analysis". This is done by the optimizer, alth...
- bearophile (10/13) May 12 2013 A first step to improve the situation is to propagate the range
- Manu (20/40) May 12 2013 It is done by the optimiser, but I'm suggesting that it doesn't only hav...
- Walter Bright (10/26) May 12 2013 It's true you could do a sequential only, very conservative form of data...
- Manu (13/53) May 12 2013 Yes indeed, there are certainly many ways to interfere with it, but I
- David Nadlinger (7/9) May 13 2013 There is no such thing as a language feature invisible to the
- deadalnix (3/12) May 13 2013 is(typeof()) say that any permissive change can also break code
- Marco Leise (24/37) May 15 2013 Yes, I came across that one every so often myself. But as
- Simen Kjaeraas (9/26) May 16 2013 I believe the intention was to show an issue with the language, not to
So, here's an issue that constantly drives me nuts, and an elegant solution seems so do-able.

void func(int x)
{
  x &= 0xFFFF;
  short s = x; // Error! (but we know x is 0 .. 65535)

  if(x < 256)
  {
    byte b = x; // Error! (we also know x is 0 .. 255)
  }
}

It would be really nice if the compiler would carry around the known possible range of values, and refer to that information when performing down-casting assignments.
It would also be very useful information for the back end, it can choose more efficient types if it knows the range, or produce jump tables rather than branch sequences.

I think the compiler would need a min, max, and mask for any integers, and update them as it works.

There are many sources of this information. Comparisons effectively limit the range:

if(x > 0)
{
  // x = 0 .. int.max
}
else
{
  // x = int.min .. -1
}

Masks, obviously:

x &= 15;

Also contracts are a really great source of seeding this information on function entry.

Has this been discussed? It seems simple, and it's invisible to the language. It would just reduce some boilerplate (tedious casts), and also offer some great optimisation opportunities.
May 12 2013
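
Manu's mention of contracts as a source of range information suggests something like the following sketch. The function and its contract are invented for illustration, and the cast-free assignment it anticipates is hypothetical, not current D behaviour:

void draw(int colourIndex)
in { assert(colourIndex >= 0 && colourIndex <= 255); }
body
{
    // With the contract seeding the range 0 .. 255, a range-tracking compiler
    // could accept the assignment without a cast:
    //   ubyte idx = colourIndex;
    // Today the explicit cast is still required:
    ubyte idx = cast(ubyte)colourIndex;
}

(This uses the in/body contract syntax of the time; newer compilers spell body as do.)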
On Monday, 13 May 2013 at 00:00:54 UTC, Manu wrote:
> So, here's an issue that constantly drives me nuts, and an elegant solution seems so do-able.
>
> void func(int x)
> {
>   x &= 0xFFFF;
>   short s = x; // Error! (but we know x is 0 .. 65535)
>
>   if(x < 256)
>   {
>     byte b = x; // Error! (we also know x is 0 .. 255)
>   }
> }
>
> It would be really nice if the compiler would carry around the known possible range of values, and refer to that information when performing down-casting assignments.
> It would also be very useful information for the back end, it can choose more efficient types if it knows the range, or produce jump tables rather than branch sequences.
>
> I think the compiler would need a min, max, and mask for any integers, and update them as it works.
>
> There are many sources of this information. Comparisons effectively limit the range:
>
> if(x > 0)
> {
>   // x = 0 .. int.max
> }
> else
> {
>   // x = int.min .. -1
> }
>
> Masks, obviously:
>
> x &= 15;
>
> Also contracts are a really great source of seeding this information on function entry.
>
> Has this been discussed? It seems simple, and it's invisible to the language. It would just reduce some boilerplate (tedious casts), and also offer some great optimisation opportunities.

I've been thinking about this for a while from an optimisation point of view.

Is this not a common practice for an optimising compiler?
May 12 2013
It is, but there are limits to what it can do.

D's contracts potentially offer a lot of information that the optimiser wouldn't naturally have (any function receiving blind arguments can't make any assumptions without a contract present). But more from a usability/safety point of view, I really hate casting when the compiler should know the assignment is safe.

It's not only an inconvenience, it's also a safety thing. Consider: I have an int that I assign to a byte; I know the value fits in a byte, but I'm forced to type the manual cast anyway. Once that cast is written, if the valid range of my int happens to change in outer code, I will not receive an error informing me that the new range doesn't fit within a byte anymore; it'll just be truncated anyway, since a blind cast is already there in the code. It would be much better for the compiler to start complaining that it can no longer assure that the int fits into the byte. With the cast present, my bug stays hidden until I eventually crash.

On 13 May 2013 10:18, John Colvin <john.loughran.colvin gmail.com> wrote:
> On Monday, 13 May 2013 at 00:00:54 UTC, Manu wrote:
>> So, here's an issue that constantly drives me nuts, and an elegant solution seems so do-able.
>>
>> void func(int x)
>> {
>>   x &= 0xFFFF;
>>   short s = x; // Error! (but we know x is 0 .. 65535)
>>
>>   if(x < 256)
>>   {
>>     byte b = x; // Error! (we also know x is 0 .. 255)
>>   }
>> }
>>
>> It would be really nice if the compiler would carry around the known possible range of values, and refer to that information when performing down-casting assignments.
>> It would also be very useful information for the back end, it can choose more efficient types if it knows the range, or produce jump tables rather than branch sequences.
>>
>> I think the compiler would need a min, max, and mask for any integers, and update them as it works.
>>
>> There are many sources of this information. Comparisons effectively limit the range:
>>
>> if(x > 0)
>> {
>>   // x = 0 .. int.max
>> }
>> else
>> {
>>   // x = int.min .. -1
>> }
>>
>> Masks, obviously:
>>
>> x &= 15;
>>
>> Also contracts are a really great source of seeding this information on function entry.
>>
>> Has this been discussed? It seems simple, and it's invisible to the language. It would just reduce some boilerplate (tedious casts), and also offer some great optimisation opportunities.
>
> I've been thinking about this for a while from an optimisation point of view.
>
> Is this not a common practice for an optimising compiler?
May 12 2013
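
The maintenance hazard described above can be illustrated with a small hypothetical function (not from the thread); the cast compiles today and keeps compiling after the range changes:

void store(int x)
{
    x &= 0xFF;               // for now, x is 0 .. 255
    ubyte b = cast(ubyte)x;  // cast written only because the compiler can't see that range

    // If the mask is later widened to 0xFFFF, x becomes 0 .. 65535, yet the
    // cast above still compiles and silently keeps only the low 8 bits.
    // Under range tracking, the same assignment written without a cast would
    // turn back into a compile error, surfacing the change immediately.
}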
On 5/12/2013 5:00 PM, Manu wrote:
> So, here's an issue that constantly drives me nuts, and an elegant solution seems so do-able.

It's a form of "data flow analysis". This is done by the optimizer, although it doesn't track ranges. It isn't quite as simple as you suggest, there are issues like dealing with arbitrary control flow.

  x &= 0xFFFF;
  short s = x; // Error! (but we know x is 0 .. 65535)

Ok, but what about:

  x &= 0xFFFF;
  while (expr)
  {
    short s = x;
    x += 1;
  }

There's a forward reference going on. We cannot semantically evaluate s=x until we semantically evaluate the rest of the function in order to know that x is or is not reassigned.
May 12 2013
Walter Bright:
> There's a forward reference going on. We cannot semantically evaluate s=x until we semantically evaluate the rest of the function in order to know that x is or is not reassigned.

A first step to improve the situation is to propagate the range of const/immutable values inside a function:

void main() {
    uint x = 1000;
    immutable y = x % 256;
    ubyte z = y; // currently error.
}

Bye,
bearophile
May 12 2013
On 13 May 2013 11:03, Walter Bright <newshound2 digitalmars.com> wrote:
> On 5/12/2013 5:00 PM, Manu wrote:
>> So, here's an issue that constantly drives me nuts, and an elegant solution seems so do-able.
>
> It's a form of "data flow analysis". This is done by the optimizer, although it doesn't track ranges. It isn't quite as simple as you suggest, there are issues like dealing with arbitrary control flow.
>
>   x &= 0xFFFF;
>   short s = x; // Error! (but we know x is 0 .. 65535)
>
> Ok, but what about:
>
>   x &= 0xFFFF;
>   while (expr)
>   {
>     short s = x;
>     x += 1;
>   }
>
> There's a forward reference going on. We cannot semantically evaluate s=x until we semantically evaluate the rest of the function in order to know that x is or is not reassigned.

It is done by the optimiser, but I'm suggesting that it doesn't only have value in terms of optimisation. It could improve language usability if performed in the front end.

In that context, unless the analysis is very powerful(/mature) and can statically determine 'expr', then x would need to be relaxed to unlimited at that point, since it obviously can't know the limits within (or after) that loop. In this case, the compiler would complain that it can't determine x and you would need an explicit cast as usual.

I can see the effective forward reference issue here, which could be addressed in time, but even just reasonably simple limits tracking (ie, within sequential code) would make a massive difference. I think programmers would intuitively understand this situation and wouldn't expect magic from the compiler. The places where it matters the most are places where it's dead obvious what the numeric limits are, and it should be equally obvious to the compiler. It's the sort of thing that could be improved with time, but even a very simple start would be a big help in many cases.
May 12 2013
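
A rough sketch of the conservative widening described above, with the hypothetical tracked range noted in comments (the function and its delegate parameter are invented; this is not current D behaviour):

void consume(int x, bool delegate() expr)
{
    x &= 0xFFFF;                  // tracked range: 0 .. 65535
    while (expr())
    {
        // x is modified later in the loop body, so a simple sequential
        // analysis would widen it back to int.min .. int.max here, and an
        // explicit cast would still be required:
        ushort s = cast(ushort)x;
        x += 1;
    }
}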
On 5/12/2013 6:23 PM, Manu wrote:
> It is done by the optimiser, but I'm suggesting that it doesn't only have value in terms of optimisation. It could improve language usability if performed in the front end.

I understand.

> In that context, unless the analysis is very powerful(/mature) and can statically determine 'expr', then x would need to be relaxed to unlimited at that point, since it obviously can't know the limits within (or after) that loop. In this case, the compiler would complain that it can't determine x and you would need an explicit cast as usual.
>
> I can see the effective forward reference issue here, which could be addressed in time, but even just reasonably simple limits tracking (ie, within sequential code) would make a massive difference. I think programmers would intuitively understand this situation and wouldn't expect magic from the compiler. The places where it matters the most are places where it's dead obvious what the numeric limits are, and it should be equally obvious to the compiler. It's the sort of thing that could be improved with time, but even a very simple start would be a big help in many cases.

It's true you could do a sequential only, very conservative form of data flow analysis. But be aware there are other problems, such as:

  int *p = ...;
  ...
  x &= 0xFFFF;
  ++(*p);       // hmm, is x affected?
  short s = x;
May 12 2013
On 13 May 2013 13:46, Walter Bright <newshound2 digitalmars.com> wrote:
> On 5/12/2013 6:23 PM, Manu wrote:
>> It is done by the optimiser, but I'm suggesting that it doesn't only have value in terms of optimisation. It could improve language usability if performed in the front end.
>
> I understand.
>
>> In that context, unless the analysis is very powerful(/mature) and can statically determine 'expr', then x would need to be relaxed to unlimited at that point, since it obviously can't know the limits within (or after) that loop. In this case, the compiler would complain that it can't determine x and you would need an explicit cast as usual.
>>
>> I can see the effective forward reference issue here, which could be addressed in time, but even just reasonably simple limits tracking (ie, within sequential code) would make a massive difference. I think programmers would intuitively understand this situation and wouldn't expect magic from the compiler. The places where it matters the most are places where it's dead obvious what the numeric limits are, and it should be equally obvious to the compiler. It's the sort of thing that could be improved with time, but even a very simple start would be a big help in many cases.
>
> It's true you could do a sequential only, very conservative form of data flow analysis. But be aware there are other problems, such as:
>
>   int *p = ...;
>   ...
>   x &= 0xFFFF;
>   ++(*p);       // hmm, is x affected?
>   short s = x;

Yes indeed, there are certainly many ways to interfere with it, but I certainly hope that most lines of code you deal with on a daily basis aren't like this ;)

Obviously it would grow in maturity over time. In that example, just revert to unlimited assumptions when it encounters the pointer operation. Limits may then be refined again in following code. I can imagine things like proper escape analysis, as planned, could eventually help in cases like this.

Anyway, it's just something to think about. I think it would offer additional safety/correctness to the language; anywhere you type a blind cast, you are potentially hiding a future bug. I'd rather never cast when I don't absolutely need/intend to.
May 12 2013
On Monday, 13 May 2013 at 00:00:54 UTC, Manu wrote:
> Has this been discussed? It seems simple, and it's invisible to the language.

There is no such thing as a language feature invisible to the language – or at least there should not be. Even a feature such as this, which only makes the language more permissive, needs an exact specification, for all D compilers have to implement it in the same way.

David
May 13 2013
On Monday, 13 May 2013 at 12:37:34 UTC, David Nadlinger wrote:
> On Monday, 13 May 2013 at 00:00:54 UTC, Manu wrote:
>> Has this been discussed? It seems simple, and it's invisible to the language.
>
> There is no such thing as a language feature invisible to the language – or at least there should not be. Even a feature such as this, which only makes the language more permissive, needs an exact specification, for all D compilers have to implement it in the same way.
>
> David

is(typeof()) means that any permissive change can also break code in D. This is our little peculiarity ;)
May 13 2013
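
deadalnix's point can be made concrete with the usual "does this compile" idiom; the symbol name is invented and this is only a sketch:

// False with today's compilers, true under the proposed range tracking.
enum acceptsNarrowing = is(typeof({
    int x = 123_456;
    x &= 0xFFFF;
    ushort s = x;   // currently rejected without an explicit cast
}));

static if (acceptsNarrowing)
{
    // Code here is only compiled by a range-tracking compiler, so even a
    // purely permissive language change can alter which code gets built.
}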
On Mon, 13 May 2013 10:00:40 +1000, Manu <turkeyman gmail.com> wrote:
> So, here's an issue that constantly drives me nuts, and an elegant solution seems so do-able.
>
> void func(int x)
> {
>   x &= 0xFFFF;
>   short s = x; // Error! (but we know x is 0 .. 65535)
>
>   if(x < 256)
>   {
>     byte b = x; // Error! (we also know x is 0 .. 255)
>   }
> }

Yes, I came across that one every so often myself. But as David pointed out, such a feature would have to make it into the D specification and be implemented in all D compilers. In any case I wouldn't call a half-baked "I will solve the halting problem" flow analysis elegant. :p

Let's think different. You could write

  short s = x & 0xFFFF;

on your first line; as you may already know, DMD does this bit of range analysis (same for modulo).

Also, what was your intention? func() practically takes a short parameter. Maybe its signature should be

  void func(short x) {...}

If that's a problem at the call site, maybe that can be fixed?

bearophile's suggestion sounds like something that's pretty solid and naturally extends the likes of "short s = x & 0xFF;". So if your "int x" was immutable, it would really be trivial to have your "if" change the range of immutable x during that scope, and have the assignment fail only because x may be -1000. ;)

--
Marco
May 15 2013
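
As a rough illustration of the single-expression range analysis Marco refers to (the exact cases accepted can vary by compiler version, so treat this as a sketch rather than a specification):

void vrpExamples(int x)
{
    ubyte  a = x & 0xFF;     // accepted: the expression's range is 0 .. 255
    ushort b = x & 0xFFFF;   // accepted: 0 .. 65535 fits in a ushort

    x &= 0xFF;
    // ubyte c = x;          // error: the range is not carried across statements,
    ubyte c = cast(ubyte)x;  // so an explicit cast is still needed here
}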
On Thu, 16 May 2013 07:34:18 +0200, Marco Leise <Marco.Leise gmx.de> wrote:
> On Mon, 13 May 2013 10:00:40 +1000, Manu <turkeyman gmail.com> wrote:
[snip]
>> void func(int x)
>> {
>>   x &= 0xFFFF;
>>   short s = x; // Error! (but we know x is 0 .. 65535)
>>
>>   if(x < 256)
>>   {
>>     byte b = x; // Error! (we also know x is 0 .. 255)
>>   }
>> }
>
> Also, what was your intention? func() practically takes a short parameter. Maybe its signature should be
>
>   void func(short x) {...}
>
> If that's a problem at the call site, maybe that can be fixed?

I believe the intention was to show an issue with the language, not to make a function that assigns to a short and a byte only to throw them away. Perhaps he needed to do some intermediate calculations at higher precision, perhaps he needs it to be an int for interfacing with a C function, etc.

--
Simen
May 16 2013