## digitalmars.D - Numeric limits tracking

• Manu (36/36) May 12 2013 So, here's an issue that constantly drives me nuts, and an elegant solut...
• John Colvin (4/49) May 12 2013 I've been thinking about this for a while from an optimisation
• Manu (18/67) May 12 2013 It is, but there are limits to what it can do.
• Walter Bright (16/18) May 12 2013 It's a form of "data flow analysis". This is done by the optimizer, alth...
• bearophile (10/13) May 12 2013 A first step to improve the situation is to propagate the range
• Manu (20/40) May 12 2013 It is done by the optimiser, but I'm suggesting that it doesn't only hav...
• Walter Bright (10/26) May 12 2013 It's true you could do a sequential only, very conservative form of data...
• Manu (13/53) May 12 2013 Yes indeed, there are certainly many ways to interfere with it, but I
• David Nadlinger (7/9) May 13 2013 There is no such thing as a language feature invisible to the
• deadalnix (3/12) May 13 2013 is(typeof()) say that any permissive change can also break code
• Marco Leise (24/37) May 15 2013 Yes, I came across that one every so often myself. But as
• Simen Kjaeraas (9/26) May 16 2013 I believe the intention was to show an issue with the language, not to
Manu <turkeyman gmail.com> writes:
```So, here's an issue that constantly drives me nuts, and an elegant solution
seems so do-able.

void func(int x)
{
    x &= 0xFFFF;
    short s = x; // Error! (but we know x is 0 .. 65535)

    if(x < 256)
    {
        byte b = x; // Error! (we also know x is 0 .. 255)
    }
}

It would be really nice if the compiler would carry around the known
possible range of values, and refer to that information when performing
down-casting assignments.
It would also be very useful information for the back end: it could choose
more efficient types if it knows the range, or produce jump tables rather
than branch sequences.

I think the compiler would need a min, max, and mask for any integers, and
update them as it works.

There are many sources of this information.
Comparisons effectively limit the range:

if(x > 0)
{
    // x = 1 .. int.max
}
else
{
    // x = int.min .. 0
}

So do masks:

x &= 15; // x = 0 .. 15

Also contracts are a really great source of seeding this information on
function entry.

Has this been discussed? It seems simple, and it's invisible to the
language. It would just reduce some boilerplate (tedious casts), and also
offer some great optimisation opportunities.
```
May 12 2013
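The tracking Manu proposes can be sketched as a toy interval analysis. This is a hypothetical Python model, not DMD code, and it deliberately uses unsigned targets (ushort/ubyte) so the quoted ranges actually fit:

```python
# Toy model (hypothetical, not DMD's implementation) of the proposed
# min/max tracking through masks and comparison branches.

INT = (-2**31, 2**31 - 1)

def and_mask(rng, mask):
    """x &= mask with a non-negative mask clamps x into 0 .. mask."""
    return (0, mask)

def narrow_below(rng, limit):
    """Inside `if (x < limit)` the upper bound shrinks."""
    return (rng[0], min(rng[1], limit - 1))

def fits(rng, lo, hi):
    """An implicit narrowing assignment is safe iff the range fits."""
    return lo <= rng[0] and rng[1] <= hi

x = and_mask(INT, 0xFFFF)       # x &= 0xFFFF  ->  0 .. 65535
assert fits(x, 0, 65535)        # ushort s = x; would be safe
assert not fits(x, 0, 255)      # ubyte b = x; not yet provable

x = narrow_below(x, 256)        # inside if (x < 256)
assert fits(x, 0, 255)          # ubyte b = x; safe in this branch
```

The model keeps only a (min, max) pair per variable; a real pass would also track the bitmask Manu mentions, which catches patterns an interval cannot.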
"John Colvin" <john.loughran.colvin gmail.com> writes:
```On Monday, 13 May 2013 at 00:00:54 UTC, Manu wrote:
> So, here's an issue that constantly drives me nuts, and an elegant
> solution seems so do-able.
> [snip]
> Has this been discussed? It seems simple, and it's invisible to the
> language. It would just reduce some boilerplate (tedious casts), and
> also offer some great optimisation opportunities.

I've been thinking about this for a while from an optimisation point
of view. Is this not a common practice for an optimising compiler?
```
May 12 2013
Manu <turkeyman gmail.com> writes:
```It is, but there are limits to what it can do.
D's contracts potentially offer a lot of information that the optimiser
wouldn't naturally have (any function receiving blind arguments can't make
any assumptions without a contract present), but more from a
usability/safety point of view, I really hate casting when the compiler
should know the assignment is safe.

It's not only an inconvenience, it's also a safety thing.
Consider:
I have an int that I assign to a byte, I know the value fits in a byte, but
I'm forced to type the manual cast anyway.
Once that cast is written, if the valid range of my int happens to change
in outer code, I will not receive an error informing me that the new range
doesn't fit within a byte anymore, it'll just truncate it anyway, since a
blind cast is already there in the code.
It would be much better for the compiler to start complaining that it can
no longer assure that the int fits into the byte. With the cast present, my
bug has been hidden until I eventually crash.

On 13 May 2013 10:18, John Colvin <john.loughran.colvin gmail.com> wrote:

> On Monday, 13 May 2013 at 00:00:54 UTC, Manu wrote:
> [snip]
>
> I've been thinking about this for a while from an optimisation point
> of view. Is this not a common practice for an optimising compiler?
```
May 12 2013
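Manu's safety argument can be made concrete with a small illustration (hypothetical Python modelling what a blind truncating cast does): the cast keeps "working" silently after the source range grows.

```python
# Hypothetical model of what a blind cast(byte) does: two's-complement
# wrap into -128 .. 127, with no range check at all.
def blind_cast_to_byte(x):
    return ((x + 128) % 256) - 128

# Today the value provably fits, so the cast looks harmless:
assert blind_cast_to_byte(100) == 100
# Later the valid range in outer code grows past a byte: the cast
# still compiles and silently corrupts the value instead of erroring.
assert blind_cast_to_byte(300) == 44
```

With range tracking, the second assignment would turn back into a compile error the moment the source range stopped fitting, which is exactly the bug-hiding problem being described.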
Walter Bright <newshound2 digitalmars.com> writes:
```On 5/12/2013 5:00 PM, Manu wrote:
> So, here's an issue that constantly drives me nuts, and an elegant
> solution seems so do-able.

It's a form of "data flow analysis". This is done by the optimizer,
although it doesn't track ranges.

It isn't quite as simple as you suggest; there are issues like dealing
with arbitrary control flow:

> x &= 0xFFFF;
> short s = x; // Error! (but we know x is 0 .. 65535)

x &= 0xFFFF;
while (expr) {
    short s = x;
    x += 1;
}

There's a forward reference going on. We cannot semantically evaluate
s=x until we semantically evaluate the rest of the function in order to
know that x is or is not reassigned.
```
May 12 2013
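Walter's loop example corresponds to the usual conservative rule. As a sketch (hypothetical, not compiler code): when the loop body may reassign x, the analysis has to widen x back to the full type range at the loop head.

```python
# Sketch (not compiler code) of the conservative rule for loops: if
# the body may reassign x, x's tracked range must widen to the full
# type range, because the trip count ('expr') is not known statically.
INT = (-2**31, 2**31 - 1)

def range_at_loop_head(entry_range, body_may_write_x):
    return INT if body_may_write_x else entry_range

# x &= 0xFFFF; while (expr) { short s = x; x += 1; }
assert range_at_loop_head((0, 0xFFFF), True) == INT
# If the body provably never writes x, the entry range survives:
assert range_at_loop_head((0, 0xFFFF), False) == (0, 0xFFFF)
```

Widening to the whole type range is what makes a sequential-only pass sound without solving the forward-reference problem: `short s = x;` inside the loop then simply requires the explicit cast, as today.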
"bearophile" <bearophileHUGS lycos.com> writes:
```Walter Bright:

> There's a forward reference going on. We cannot semantically evaluate
> s=x until we semantically evaluate the rest of the function in order
> to know that x is or is not reassigned.

A first step to improve the situation is to propagate the range
of const/immutable values inside a function:

void main() {
    uint x = 1000;
    immutable y = x % 256;
    ubyte z = y; // currently error.
}

Bye,
bearophile
```
May 12 2013
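bearophile's narrower proposal sidesteps the reassignment problem: a range only has to be remembered for values that cannot change. A hypothetical sketch of the idea:

```python
# Hypothetical sketch: for an unsigned x, `x % d` lies in 0 .. d-1,
# and an immutable result keeps that range for the rest of the scope,
# since nothing can reassign it.
def modulo_range(divisor):
    return (0, divisor - 1)

y = modulo_range(256)    # immutable y = x % 256;
assert y == (0, 255)
assert y[1] <= 255       # so `ubyte z = y;` would be provably safe
```

Because immutables have no back-edges or aliasing writes to worry about, this subset needs none of the flow analysis Walter objects to.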
Manu <turkeyman gmail.com> writes:
```On 13 May 2013 11:03, Walter Bright <newshound2 digitalmars.com> wrote:

> It's a form of "data flow analysis". This is done by the optimizer,
> although it doesn't track ranges.
>
> It isn't quite as simple as you suggest, there are issues like
> dealing with arbitrary control flow.
>
> x &= 0xFFFF;
> while (expr) {
>     short s = x;
>     x += 1;
> }
>
> There's a forward reference going on. We cannot semantically evaluate
> s=x until we semantically evaluate the rest of the function in order
> to know that x is or is not reassigned.

It is done by the optimiser, but I'm suggesting that it doesn't only have
value in terms of optimisation. It could improve language usability if
performed in the front end.

In that context, unless the analysis is very powerful(/mature) and can
statically determine 'expr', then x would need to be relaxed to unlimited
at that point, since it obviously can't know the limits within (or after)
that loop.
In this case, the compiler would complain that it can't determine x and you
would need an explicit cast as usual.

I can see the effective forward reference issue here, which could be
addressed in time, but even just reasonably simple limits tracking (ie,
within sequential code) would make a massive difference. I think
programmers would intuitively understand this situation and wouldn't expect
magic from the compiler.
The places where it matters the most are places where it's dead obvious
what the numeric limits are, and it should be equally obvious to the
compiler.

It's the sort of thing that could be improved with time, but even a very
simple start would be a big help in many cases.
```
May 12 2013
Walter Bright <newshound2 digitalmars.com> writes:
```On 5/12/2013 6:23 PM, Manu wrote:
> It is done by the optimiser, but I'm suggesting that it doesn't only
> have value in terms of optimisation. It could improve language
> usability if performed in the front end.

I understand.

> In that context, unless the analysis is very powerful(/mature) and can
> statically determine 'expr', then x would need to be relaxed to
> unlimited at that point, since it obviously can't know the limits
> within (or after) that loop.
> In this case, the compiler would complain that it can't determine x
> and you would need an explicit cast as usual.
>
> I can see the effective forward reference issue here, which could be
> addressed in time, but even just reasonably simple limits tracking
> (ie, within sequential code) would make a massive difference. I think
> programmers would intuitively understand this situation and wouldn't
> expect magic from the compiler.
> The places where it matters the most are places where it's dead
> obvious what the numeric limits are, and it should be equally obvious
> to the compiler.
>
> It's the sort of thing that could be improved with time, but even a
> very simple start would be a big help in many cases.

It's true you could do a sequential only, very conservative form of data flow
analysis.

But be aware there are other problems, such as:

int *p = ...;
...
x &= 0xFFFF;
++(*p);      // hmm, is x affected?
short s = x;
```
May 12 2013
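The pointer hazard fits the same conservative scheme. In this sketch (hypothetical, not compiler code), an indirect store invalidates the tracked range of every variable the pointer may alias:

```python
# Sketch (hypothetical) of conservative invalidation: after an
# indirect store like ++(*p), every variable that p may alias loses
# its tracked range; provably unaliased variables keep theirs.
INT = (-2**31, 2**31 - 1)

def after_indirect_store(ranges, may_alias):
    return {v: (INT if v in may_alias else r) for v, r in ranges.items()}

ranges = {"x": (0, 0xFFFF), "y": (0, 9)}
updated = after_indirect_store(ranges, may_alias={"x"})
assert updated["x"] == INT       # `short s = x;` must now be rejected
assert updated["y"] == (0, 9)    # y was provably unaffected
```

Better escape/alias analysis shrinks the `may_alias` set over time, which is exactly the "grow in maturity" path Manu describes in his reply.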
Manu <turkeyman gmail.com> writes:
```On 13 May 2013 13:46, Walter Bright <newshound2 digitalmars.com> wrote:

> It's true you could do a sequential only, very conservative form of
> data flow analysis.
>
> But be aware there are other problems, such as:
>
> int *p = ...;
> ...
> x &= 0xFFFF;
> ++(*p);      // hmm, is x affected?
> short s = x;

Yes indeed, there are certainly many ways to interfere with it, but I
certainly hope that most lines of code you deal with on a daily basis
aren't like this ;)
Obviously it would grow in maturity over time. In that example, just revert
to unlimited assumptions when it encounters the pointer operation. Limits
may then be refined again in following code.
I can imagine things like proper escaping analysis as planned could
eventually help in cases like this.

Anyway, it's just something to think about. I think it would offer
additional safety/correctness to the language; anywhere you type a blind
cast, you are potentially hiding a future bug. I'd rather never cast when I
don't absolutely need/intend to.
```
May 12 2013
David Nadlinger writes:
```On Monday, 13 May 2013 at 00:00:54 UTC, Manu wrote:
> Has this been discussed? It seems simple, and it's invisible to
> the language.

There is no such thing as a language feature invisible to the
language – or at least there should not be. Even a feature such
as this, which only makes the language more permissive, needs an
exact specification, for all D compilers have to implement it in
the same way.

David
```
May 13 2013
deadalnix writes:
```On Monday, 13 May 2013 at 12:37:34 UTC, David Nadlinger wrote:
> There is no such thing as a language feature invisible to the
> language – or at least there should not be. Even a feature,
> such as this, which only makes the language more permissive
> needs exact specifications, for all D compilers have to
> implement it in the same way.
>
> David

is(typeof()) means that any permissive change can also break code
in D. This is our little specificity ;)
```
May 13 2013
Marco Leise <Marco.Leise gmx.de> writes:
```Am Mon, 13 May 2013 10:00:40 +1000
schrieb Manu <turkeyman gmail.com>:

> So, here's an issue that constantly drives me nuts, and an elegant
> solution seems so do-able.
>
> void func(int x)
> {
>     x &= 0xFFFF;
>     short s = x; // Error! (but we know x is 0 .. 65535)
>
>     if(x < 256)
>     {
>         byte b = x; // Error! (we also know x is 0 .. 255)
>     }
> }

Yes, I came across that one every so often myself. But as
David pointed out, such a feature would have to make it into
the D specification and be implemented in all D compilers.
In any case I wouldn't call a half-baked "I will solve the
halting problem" flow analysis elegant. :p

Let's think different. You could write

    short s = x & 0xFFFF;

on your first line; as you may already know, DMD does this bit
of range analysis (same for modulo).

Also, what was your intention? func() practically takes a short
parameter. Maybe its signature should be

    void func(short x) {...}

If that's a problem at the call site, maybe that can be fixed?

bearophile's suggestion sounds like something that's pretty
solid and naturally extends the likes of "short s = x & 0xFFF;".
So if your "int x" were immutable, it would really be trivial to
have your "if" change the range of immutable x during that
scope and have the assignment only fail because x may be
-1000. ;)

--
Marco
```
May 15 2013
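Assuming the expression-level range propagation Marco describes, the check could look like the sketch below (a hypothetical Python model, not DMD code). One caveat on his example: 0 .. 0xFFFF exceeds short.max, so a signed short target would still be rejected; the mask or the target type has to match.

```python
# Hypothetical model of an expression-level range check for `x & mask`
# against the target type's bounds.
def and_range(mask):
    return (0, mask)          # non-negative mask: result is 0 .. mask

def assignment_ok(rng, lo, hi):
    return lo <= rng[0] and rng[1] <= hi

SHORT = (-32768, 32767)
USHORT = (0, 65535)

assert not assignment_ok(and_range(0xFFFF), *SHORT)  # short s = x & 0xFFFF;
assert assignment_ok(and_range(0x7FFF), *SHORT)      # short s = x & 0x7FFF;
assert assignment_ok(and_range(0xFFFF), *USHORT)     # ushort s = x & 0xFFFF;
```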
"Simen Kjaeraas" <simen.kjaras gmail.com> writes:
```On Thu, 16 May 2013 07:34:18 +0200, Marco Leise <Marco.Leise gmx.de> wrote:

> Am Mon, 13 May 2013 10:00:40 +1000
> schrieb Manu <turkeyman gmail.com>:
>
>> void func(int x)
>> {
>>     x &= 0xFFFF;
>>     short s = x; // Error! (but we know x is 0 .. 65535)
>>
>>     if(x < 256)
>>     {
>>         byte b = x; // Error! (we also know x is 0 .. 255)
>>     }
>> }
>
> [snip]
> Also what was your intention? func() practically takes a short
> parameter. Maybe its signature should be
> void func(short x) {...}
> If that's a problem at the call site, maybe that can be fixed?

I believe the intention was to show an issue with the language, not to
make a function that assigns to a short and a byte only to throw them
away.

Perhaps he needed to do some intermediate calculations at higher
precision, perhaps he needs it to be an int for interfacing with a C
function, etc.

--
Simen
```
May 16 2013