digitalmars.D.learn - Why is size_t unsigned?

"JS" <js.mdnq gmail.com> writes:

Doing simple stuff like

for(int i = 0; i < s.length - 1; i++) fails catastrophically if s
is empty. To make it right, one has to reduce performance by writing
extra checks.
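
A minimal sketch of the failure described above, assuming an empty string; the helper name `loopBound` is made up for illustration. Because `s.length` is an unsigned `size_t`, `s.length - 1` wraps around to `size_t.max` instead of becoming -1, so the loop condition holds and the body would index past the end:

```d
import std.stdio;

// Hypothetical helper: the bound the loop condition compares against.
size_t loopBound(string s)
{
    return s.length - 1; // wraps to size_t.max when s is empty
}

void main()
{
    string s = "";
    int i = 0;

    writeln(loopBound(s) == size_t.max); // true
    // i is implicitly converted to unsigned for the comparison,
    // so the condition is true even though s has no elements.
    writeln(i < s.length - 1);           // true
}
```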

There seems to be no real good reason why size_t is unsigned... 
Surely one doesn't require too many strings larger than 2^63 bits 
on an x64 os...

I'm running into a lot of trouble because of the way D deals with
implicit casting between signed and unsigned.

Please don't tell me to use foreach... it's not a panacea.

Jul 21 2013

Ali Çehreli <acehreli yahoo.com> writes:

On 07/21/2013 08:47 PM, JS wrote:

 Doing simple stuff like

 for(int i = 0; i < s.length - 1; i++) fails catastrophically if s is
 empty. To make right one has to reduce performance by writing extra
 checks.

Checks are needed for program correctness: if not in source code, then in
compiler-generated code or in the microprocessor itself. The compiler and
the microprocessor do not add such checks, for performance reasons:
sometimes only the programmer knows that a check is unnecessary.

 There seems to be no real good reason why size_t is unsigned...

How about: every addressable memory location must be countable?

 Surely one doesn't require too many strings larger than 2^63 bits on an x64
 os...

Agreed.

 I running into a lot of trouble because of the way D deals with implicit
 casting of between signed and unsigned.

D is behaving the same way as C and C++ there.

 please don't tell me to use foreach... isn't not a panacea.

I would still prefer foreach: it is more convenient, and safer
because it needs less code.

Ali

Jul 21 2013

"JS" <js.mdnq gmail.com> writes:

On Monday, 22 July 2013 at 03:58:31 UTC, Ali Çehreli wrote:
 On 07/21/2013 08:47 PM, JS wrote:

 Doing simple stuff like

 for(int i = 0; i < s.length - 1; i++) fails catastrophically if s is
 empty. To make right one has to reduce performance by writing extra
 checks.

 Checks are needed for program correctness. If not in source 
 code, in compiler generated code, or the microprocessor itself. 
 The compiler and the microprocessor would not do such things 
 for performance reasons. It is because sometimes only the 
 programmer knows that the check is unnecessary.

 There seems to be no real good reason why size_t is unsigned...

 How about, every addressable memory locations must be countable?

For strings themselves, I would prefer an int to be returned. The
size of a string has nothing to do with its location in memory.

 Surely one doesn't require too many strings larger than 2^63 bits on an x64
 os...

 Agreed.

 I running into a lot of trouble because of the way D deals with implicit
 casting of between signed and unsigned.

 D is behaving the same way as C and C++ there.



 please don't tell me to use foreach... isn't not a panacea.

 I would still prefer foreach because it is more convenient and 
 safer because of needing less code.

 Ali

foreach doesn't allow you to modify the index to skip over 
elements.

Jul 21 2013

Ali Çehreli <acehreli yahoo.com> writes:

On 07/21/2013 09:36 PM, JS wrote:

 On Monday, 22 July 2013 at 03:58:31 UTC, Ali Çehreli wrote:

 There seems to be no real good reason why size_t is unsigned...

 How about, every addressable memory locations must be countable?

 for strings themselves, I would prefer an int to be returned. The size
 of a string has nothing to do with it's location in memory.

So, you agree with the answer to the question in the subject line but 
you want to change the topic to strings. Fair enough...

 D is behaving the same way as C and C++ there.





 please don't tell me to use foreach... isn't not a panacea.

 I would still prefer foreach because it is more convenient and safer
 because of needing less code.

 Ali

 foreach doesn't allow you to modify the index to skip over elements.

I did not claim otherwise. I said "more convenient", which is 
indisputable; and I said "safer", which your original code has become an 
example of.

Ali

Jul 21 2013

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 7/22/13, JS <js.mdnq gmail.com> wrote:
 foreach doesn't allow you to modify the index to skip over
 elements.

It does:

-----
import std.stdio;

void main()
{
    int[] x = [1, 2, 3, 4, 5];
    foreach (ref i; 0 .. 5)
    {
        writeln(x[i]);
        ++i;
    }
}
-----

Writes:
1
3
5

Jul 22 2013

"JS" <js.mdnq gmail.com> writes:

On Monday, 22 July 2013 at 12:51:31 UTC, Andrej Mitrovic wrote:
 On 7/22/13, JS <js.mdnq gmail.com> wrote:
 foreach doesn't allow you to modify the index to skip over
 elements.

 It does:

 -----
 import std.stdio;

 void main()
 {
     int[] x = [1, 2, 3, 4, 5];
     foreach (ref i; 0 .. 5)
     {
         writeln(x[i]);
         ++i;
     }
 }
 -----

 Writes:
 1
 3
 5

Cool... This should make life easier! Thanks.

Jul 22 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Monday, 22 July 2013 at 12:51:31 UTC, Andrej Mitrovic wrote:
 On 7/22/13, JS <js.mdnq gmail.com> wrote:
 foreach doesn't allow you to modify the index to skip over
 elements.

 It does:

 -----
 import std.stdio;

 void main()
 {
     int[] x = [1, 2, 3, 4, 5];
     foreach (ref i; 0 .. 5)
     {
         writeln(x[i]);
         ++i;
     }
 }
 -----

 Writes:
 1
 3
 5

99% sure that's unspecified behavior. I wouldn't rely on anything 
like that.

Jul 22 2013

Ali Çehreli <acehreli yahoo.com> writes:

On 07/22/2013 08:04 AM, monarch_dodra wrote:

 On Monday, 22 July 2013 at 12:51:31 UTC, Andrej Mitrovic wrote:
 On 7/22/13, JS <js.mdnq gmail.com> wrote:
 foreach doesn't allow you to modify the index to skip over
 elements.

 It does:

 -----
 import std.stdio;

 void main()
 {
     int[] x = [1, 2, 3, 4, 5];
     foreach (ref i; 0 .. 5)
     {
         writeln(x[i]);
         ++i;
     }
 }
 -----

 Writes:
 1
 3
 5

 99% sure that's unspecified behavior. I wouldn't rely on anything like
 that.

Two more solutions, one with foreach and one without any explicit looping:

import std.stdio;
import std.range;

void main()
{
     int[] x = [1, 2, 3, 4, 5];

     // iota(0, 5, 2) already steps by 2, so no manual ++i is needed
     foreach (i; iota(0, 5, 2))
     {
         writeln(x[i]);
     }

     writeln(x.indexed(iota(0, x.length, 2)));
}

Ali

Jul 22 2013

"Maxim Fomin" <maxim maxim-fomin.ru> writes:

On Monday, 22 July 2013 at 15:04:25 UTC, monarch_dodra wrote:
 On Monday, 22 July 2013 at 12:51:31 UTC, Andrej Mitrovic wrote:
 On 7/22/13, JS <js.mdnq gmail.com> wrote:
 foreach doesn't allow you to modify the index to skip over
 elements.

 It does:

 -----
 import std.stdio;

 void main()
 {
    int[] x = [1, 2, 3, 4, 5];
    foreach (ref i; 0 .. 5)
    {
        writeln(x[i]);
        ++i;
    }
 }
 -----

 Writes:
 1
 3
 5

 99% sure that's unspecified behavior. I wouldn't rely on 
 anything like that.

Of course it is specified behavior.

ForeachStatement:
     Foreach (ForeachTypeList ; Aggregate) NoScopeNonEmptyStatement

Foreach:
     foreach
     foreach_reverse

ForeachTypeList:
     ForeachType
     ForeachType , ForeachTypeList

ForeachType:
     ref(opt) BasicType Declarator
     ref(opt) Identifier

Aggregate:
     Expression

This is an example of unspecified behavior:

import std.stdio;

void main()
{
     int[] x = [1, 2, 3, 4, 5];
     foreach (ref i; 0 .. 5)
     {
         __limit1631--;
         writeln(x[i]);
     }
}

Jul 22 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Monday, 22 July 2013 at 15:39:11 UTC, Maxim Fomin wrote:
 On Monday, 22 July 2013 at 15:04:25 UTC, monarch_dodra wrote:
 On Monday, 22 July 2013 at 12:51:31 UTC, Andrej Mitrovic wrote:
 On 7/22/13, JS <js.mdnq gmail.com> wrote:
 foreach doesn't allow you to modify the index to skip over
 elements.

 It does:

 -----
 import std.stdio;

 void main()
 {
   int[] x = [1, 2, 3, 4, 5];
   foreach (ref i; 0 .. 5)
   {
       writeln(x[i]);
       ++i;
   }
 }
 -----

 Writes:
 1
 3
 5

 99% sure that's unspecified behavior. I wouldn't rely on 
 anything like that.

 Of course it is specified behavior.

 ForeachStatement:
     Foreach (ForeachTypeList ; Aggregate) 
 NoScopeNonEmptyStatement

 Foreach:
     foreach
     foreach_reverse

 ForeachTypeList:
     ForeachType
     ForeachType , ForeachTypeList

 ForeachType:
     refopt BasicType Declarator
     refopt Identifier

 Aggregate:
     Expression

So... you are saying that if the grammar allows it, then the 
behavior is specified?

All I see, is you iterating over references to the elements of an 
aggregate. The final behavior really depends on how said 
aggregate is implemented. If anything, if the behavior *was* 
defined, then I'd simply argue the behavior is wrong: I don't see 
why changing the values of the elements of the aggregate should 
change the amount of elements you iterate on at all. Also:

//----
     int[] x = [1, 2, 3, 4, 5];
     foreach (ref i; iota(0, 5))
     {
         writeln(x[i]);
         ++i;
     }
//----

This also compiles. I used a different aggregate, yet it
represents the same thing. Because it is implemented differently,
I get a completely different result. Unless I'm mistaken, when a
result depends on the implementation, and the implementation
doesn't state what the result is, then that's what unspecified
behavior is (unspecified, not undefined).

 This is an example of unspecified behavior:

 import std.stdio;

 void main()
 {
     int[] x = [1, 2, 3, 4, 5];
     foreach (ref i; 0 .. 5)
     {
         __limit1631--;
         writeln(x[i]);
     }
 }

What is "__limit1631" ? Doesn't compile for me.

Jul 22 2013

"Maxim Fomin" <maxim maxim-fomin.ru> writes:

On Monday, 22 July 2013 at 15:51:45 UTC, monarch_dodra wrote:
 So... you are saying that if the grammar allows it, then the 
 behavior is specified?

You may argue that although the grammar does allow it, the feature
is semantically not defined. However, here it is known what "ref
int i" means, or to be more precise, what you can do with objects
marked with the ref attribute.

 All I see, is you iterating over references to the elements of 
 an aggregate. The final behavior really depends on how said 
 aggregate is implemented. If anything, if the behavior *was* 
 defined, then I'd simply argue the behavior is wrong: I don't 
 see why changing the values of the elements of the aggregate 
 should change the amount of elements you iterate on at all. 
 Also:

 //----
     int[] x = [1, 2, 3, 4, 5];
     foreach (ref i; iota(0, 5))
     {
         writeln(x[i]);
         ++i;
     }
 //----

 This also compiles, but I used a different aggregate, yet 
 represents the same thing. Because it is implemented 
 differently, I get a completely different result. Unless I'm 
 mistaken, when a result depends on the implementation, and the 
 implementation doesn't state what the result is, then that's 
 what unspecified behavior is. (unspecified, not undefined).

This is different because with 0..5, ref int maps directly to the
variable being modified, but with iota() it maps to the value returned
by the .front property function, and since that doesn't return by ref,
refness is wiped out. Behavior is defined in both cases.

 This is an example of unspecified behavior:

 <...>

 What is "__limit1631" ? Doesn't compile for me.

This one may http://dpaste.dzfl.pl/3faf27ba

extern(C) int printf (const char*, ...);

void main()
{
     int[] x = [1, 2, 3, 4, 5];
     foreach (ref i; 0 .. 5)
     {
         __limit6--; // or 5  depending on dmd version
         printf("%d\n", x[i]);
     }
}

1
2
3

This is an example of unspecified (or rather, undefined) behavior due
to playing with __identifiers, and of how a dmd bug can make D code
look strange.

Jul 22 2013

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Monday, 22 July 2013 at 16:29:39 UTC, Maxim Fomin wrote:
 This also compiles, but I used a different aggregate, yet 
 represents the same thing. Because it is implemented 
 differently, I get a completely different result. Unless I'm 
 mistaken, when a result depends on the implementation, and the 
 implementation doesn't state what the result is, then that's 
 what unspecified behavior is. (unspecified, not undefined).

 This is different because in 0..5 ref int maps directly to 
 variable modified, but in iota() it maps to value returned by 
 .front property function and since it doesn't return by ref, 
 refness is wiped out. Behavior is defined in both cases.

defined: yes
entirely dependent on implementation details: also yes

It's not a pattern to be relied on in the slightest.

Jul 22 2013

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 7/22/13, monarch_dodra <monarchdodra gmail.com> wrote:
 99% sure that's unspecified behavior. I wouldn't rely on anything
 like that.

Actually it used to be a bug that writing to the index /without/ ref
would end up changing the iteration order, but this was fixed in
2.063. It's in the changelog:

http://dlang.org/changelog.html#foreachref

Jul 22 2013

"bearophile" <bearophileHUGS lycos.com> writes:

Andrej Mitrovic:

 Actually it used to be a bug that writing to the index 
 /without/ ref
 would end up changing the iteration order, but this was fixed in
 2.063. It's in the changelog:

 http://dlang.org/changelog.html#foreachref

The right design in my opinion is to have the iteration variable
immutable by default, and mutable/reference on request. This
prevents bugs and offers new optimization opportunities. But
unfortunately D doesn't have a "mutable" keyword, D variables are
generally mutable by default, and Walter seemed not interested in
my numerous explanations that the mutable foreach iteration
variable is bug-prone. So Hara has adopted a compromise: now if
you don't use "ref", the actual iteration variable on an interval
doesn't change. But it's mutable by default.

So the standard idiom to use foreach on an interval needs to be:

foreach (immutable i; 0 .. 10) { ... }

And the programmer has to remove that immutable only where the
iteration variable really must change :-)
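
A small sketch of that idiom, assuming a made-up helper `sumBelow`; the commented-out line marks where an accidental modification of the iteration variable would be rejected by the compiler:

```d
import std.stdio;

// Hypothetical helper: sums 0 .. n using an immutable iteration variable.
int sumBelow(int n)
{
    int sum = 0;
    foreach (immutable i; 0 .. n)
    {
        // ++i; // rejected at compile time: i is immutable
        sum += i;
    }
    return sum;
}

void main()
{
    writeln(sumBelow(10)); // 45
}
```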

Bye,
bearophile

Jul 22 2013

"Maxim Fomin" <maxim maxim-fomin.ru> writes:

On Monday, 22 July 2013 at 21:08:48 UTC, bearophile wrote:
 So the standard idiom to use foreach on interval needs to be:

 foreach (immutable i; 0 .. 10) { ... }


 Bye,
 bearophile

This comes with another issue embedded here

http://forum.dlang.org/thread/felqszcrbvtrepjtfpul forum.dlang.org

Jul 22 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Mon, Jul 22, 2013 at 05:47:34AM +0200, JS wrote:
 Doing simple stuff like
 
 for(int i = 0; i < s.length - 1; i++) fails catastrophically if s is
 empty. To make right one has to reduce performance by writing extra
 checks.

I'm not sure if it's your intention, but your code above has an off-by-1
error (unless you were planning on iterating over one less element than
there are).


 There seems to be no real good reason why size_t is unsigned...

[...]

The reason is that it must span the range of CPU-addressable memory
addresses. Note that due to the way virtual memory works, that may have
nothing to do with the actual size of your data (e.g. on Linux, it's
possible to allocate more memory than you actually have, as long as you
don't actually use it all -- the kernel simply maps the addresses in
your page tables into a single zeroed-out page, and marks it as
copy-on-write, so you can actually have an array bigger than available
memory as long as most of the elements are binary zeroes (though I don't
know if druntime currently actually supports such a thing)).


T

-- 
MASM = Mana Ada Sistem, Man!

Jul 21 2013

"JS" <js.mdnq gmail.com> writes:

On Monday, 22 July 2013 at 04:31:12 UTC, H. S. Teoh wrote:
 On Mon, Jul 22, 2013 at 05:47:34AM +0200, JS wrote:
 Doing simple stuff like
 
 for(int i = 0; i < s.length - 1; i++) fails catastrophically 
 if s is
 empty. To make right one has to reduce performance by writing 
 extra
 checks.

 I'm not sure if it's your intention, but your code above has an 
 off-by-1
 error (unless you were planning on iterating over one less 
 element than
 there are).

yeah, I know...
 There seems to be no real good reason why size_t is unsigned...

 [...]

 The reason is because it must span the range of CPU-addressable 
 memory
 addresses. Note that due to way virtual memory works, that may 
 have
 nothing to do with the actual size of your data (e.g. on Linux, 
 it's
 possible to allocate more memory than you actually have, as 
 long as you
 don't actually use it all -- the kernel simply maps the 
 addresses in
 your page tables into a single zeroed-out page, and marks it as
 copy-on-write, so you can actually have an array bigger than 
 available
 memory as long as most of the elements are binary zeroes 
 (though I don't
 know if druntime currently actually supports such a thing)).


 T

but a size has nothing to do with an address. Sure in x86 we may 
need to allocate 3GB of data and this would require size_t > 2^31 
==> it must be unsigned. But strings really don't need to have an 
unsigned length. If you really need a string of length > size_t/2 
then have the string type implement a different length property.

string s;

s.length <== a signed size_t
s.size <= an unsigned size_t

this way, for 99.99999999% of the cases where strings are 
actually < 1/2 size_t, one doesn't have to waste cycles doing 
extra comparing or typing extra code... or better, spending hours 
looking for some obscure bug because one compared an int to a 
uint and no warning was thrown.

Alternatively,

for(int i = 0; i < s.length - 1; i++) could at least check for
underflow on the cmp and break the loop.

Jul 21 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Mon, Jul 22, 2013 at 06:43:47AM +0200, JS wrote:
 On Monday, 22 July 2013 at 04:31:12 UTC, H. S. Teoh wrote:
On Mon, Jul 22, 2013 at 05:47:34AM +0200, JS wrote:


[...]
There seems to be no real good reason why size_t is unsigned...

[...]

The reason is because it must span the range of CPU-addressable
memory addresses. Note that due to way virtual memory works, that may
have nothing to do with the actual size of your data (e.g. on Linux,
it's possible to allocate more memory than you actually have, as long
as you don't actually use it all -- the kernel simply maps the
addresses in your page tables into a single zeroed-out page, and
marks it as copy-on-write, so you can actually have an array bigger
than available memory as long as most of the elements are binary
zeroes (though I don't know if druntime currently actually supports
such a thing)).


T

 
 but a size has nothing to do with an address.

Size is the absolute difference between two addresses.  So it must be
able to represent up to diff(0, maxAddress).

Besides, the whole thing about size being unsigned is because negative
size makes no sense.

Basically, you have to know that size_t is unsigned, and so you should
be aware of the pitfalls of underflow.


 Sure in x86 we may need to allocate 3GB of data and this would require
 size_t > 2^31 ==> it must be unsigned. But strings really don't need
 to have an unsigned length. If you really need a string of length >
 size_t/2 then have the string type implement a different length
 property.

It would add too much complication to have some types use unsigned size
and others use signed size.


[...]
 this way, for 99.99999999% of the cases where strings are actually <
 1/2 size_t, one doesn't have to waste cycles doing extra comparing
 or typing extra code... or better, spending hours looking for some
 obscure bug because one compared an int to a uint and no warning was
 thrown.

The real issue here is not whether size_t is signed or unsigned, but the
implicit conversion between them.  This, arguably, is a flaw in the
language design.  Bearophile has been clamoring for a long time about
not allowing implicit signed/unsigned conversion. If you search in
bugzilla you should find the issues he filed for this. :)

Once implicit conversion between signed/unsigned is removed, the root
problem disappears -- mistakes like (i < array.length-1) where i is an
int will cause a compile error (comparing signed with unsigned). In the
cases where you actually want wraparound behaviour, an explicit cast
will be required, which is self-documenting and makes the programmer
aware of the potential pitfalls.


 Alternatively,
 
 for(int i = 0; i < s.length - 1; i++) could at lease check for
 underflow on the cmp and break the loop.

If you're bent on subtracting array lengths, do this:

	assert(s.length <= int.max);
	int len = cast(int)s.length;
	for (int i=0; i < len-1; i++) {
		...
	}

The optimizer should be able to reduce len to whatever it does when you
write s.length inside the loop condition. The cast incurs no runtime
penalty, because the 2's complement representations of signed and
unsigned numbers are identical when the numbers concerned are
non-negative.

This way, you make the intent of the code clear, and force it to fail if
your assumptions didn't hold. Self-documenting code is always a good
thing.


T

-- 
Век живи - век учись. А дураком помрёшь.

Jul 21 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Monday, 22 July 2013 at 03:47:36 UTC, JS wrote:
 Doing simple stuff like

 for(int i = 0; i < s.length - 1; i++) fails catastrophically if 
 s is empty. To make right one has to reduce performance by 
 writing extra checks.

Not really, you could instead just write your loop correctly.
1. Don't loop on int, you are handling a size_t.
2. Avoid subtractions when handling unsigned.

for(size_t i = 0; i + 1 < s.length; i++)

Problem solved?
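
A quick sketch of how that loop behaves, including for an empty string; `countPairs` is a made-up helper that only counts iterations. Moving the 1 to the left side of the comparison means nothing is ever subtracted from an unsigned zero:

```d
import std.stdio;

// Hypothetical helper: counts iterations of the rewritten loop,
// i.e. the number of adjacent pairs (s[i], s[i + 1]).
size_t countPairs(string s)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < s.length; i++)
        n++;
    return n;
}

void main()
{
    writeln(countPairs(""));     // 0
    writeln(countPairs("a"));    // 0
    writeln(countPairs("abcd")); // 3
}
```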

Jul 22 2013

"JS" <js.mdnq gmail.com> writes:

On Monday, 22 July 2013 at 07:12:07 UTC, monarch_dodra wrote:
 On Monday, 22 July 2013 at 03:47:36 UTC, JS wrote:
 Doing simple stuff like

 for(int i = 0; i < s.length - 1; i++) fails catastrophically 
 if s is empty. To make right one has to reduce performance by 
 writing extra checks.

 Not really, you could instead just write your loop correctly.
 1. Don't loop on int, you are handling a size_t.
 2. Avoid substractions when handling unsigned.

 for(size_t i = 0; i + 1 < s.length; i++)

 Problem solved?

Oh sure... problem solved... rriiiighhhtt.....

how about s[i - 1..n]?

You going to go throw some ifs around the statement that uses 
that? Use a ternary if? So I'm forced to use a longer more 
verbose method, and also introduce bugs, because the most 
obvious, simplest, and logical solution, s[max(0, i-1)..n] won't 
work.

Jul 22 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Monday, 22 July 2013 at 09:34:35 UTC, JS wrote:
 On Monday, 22 July 2013 at 07:12:07 UTC, monarch_dodra wrote:
 On Monday, 22 July 2013 at 03:47:36 UTC, JS wrote:
 Doing simple stuff like

 for(int i = 0; i < s.length - 1; i++) fails catastrophically 
 if s is empty. To make right one has to reduce performance by 
 writing extra checks.

 Not really, you could instead just write your loop correctly.
 1. Don't loop on int, you are handling a size_t.
 2. Avoid substractions when handling unsigned.

 for(size_t i = 0; i + 1 < s.length; i++)

 Problem solved?

 Oh sure... problem solved... rriiiighhhtt.....

 how about s[i - 1..n]?

 You going to go throw some ifs around the statement that uses 
 that? Use a ternary if? So I'm forced to use a longer more 
 verbose method, and also introduce bugs, because the most 
 obvious, simplest, and logical solution, s[max(0, i-1)..n] 
 won't work.

What about "s[i - 1..n]"? I don't see how having your "i" be
signed would save your ass in any way, shape or form. What is your
point?

Jul 22 2013

David <d dav1d.de> writes:

 how about s[i - 1..n]?
 
 You going to go throw some ifs around the statement that uses that? Use
 a ternary if? So I'm forced to use a longer more verbose method, and
 also introduce bugs, because the most obvious, simplest, and logical
 solution, s[max(0, i-1)..n] won't work.

-1 as an index is illegal anyway, so this code will fail even with signed
indices.

Jul 22 2013

"Regan Heath" <regan netmail.co.nz> writes:

On Mon, 22 Jul 2013 04:47:34 +0100, JS <js.mdnq gmail.com> wrote:

 Doing simple stuff like

 for(int i = 0; i < s.length - 1; i++) fails catastrophically if s is  
 empty. To make right one has to reduce performance by writing extra  
 checks.

 There seems to be no real good reason why size_t is unsigned... Surely  
 one doesn't require too many strings larger than 2^63 bits on an x64  
 os...

 I running into a lot of trouble because of the way D deals with implicit  
 casting of between signed and unsigned.

 please don't tell me to use foreach... isn't not a panacea.

I have always found the whole "size is an unsigned int" thing annoying too.
In C/C++ I simply cast strlen() and co to 'int' because in all but very
specific and well known cases this is entirely sufficient, and it avoids
the underflow issue entirely.

If we were to design the perfect type for representing a size or length it
would hold the maximum value of an unsigned int, but would not underflow to
max unsigned int, instead it would truncate.

This type would have to be built on top of the existing primitives and  
would therefore be less performant, which is a shame and likely the reason  
it doesn't already exist.
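
As a rough illustration of the clamping idea, here is a sketch using a free function rather than a full type. The name `satSub` is made up for this example and is not a Phobos function:

```d
import std.stdio;

// Hypothetical helper: unsigned subtraction that clamps at zero
// instead of wrapping around to size_t.max.
size_t satSub(size_t a, size_t b)
{
    return a > b ? a - b : 0;
}

void main()
{
    string s = "hello";
    size_t i = 0;

    writeln(satSub(i, 1));                // 0
    writeln(s[satSub(i, 1) .. s.length]); // hello
    writeln(s[satSub(3, 1) .. s.length]); // llo
}
```

The extra comparison is the cost mentioned above: every subtraction pays for one branch or conditional move.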

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Jul 22 2013

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Monday, 22 July 2013 at 11:56:35 UTC, Regan Heath wrote:
 If we were to design the perfect type for representing a size 
 or length it would hold the maximum value of an unsigned int, 
 but would not undeflow to max unsigned int, instead it would 
 truncate.

 This type would have to be built on top of the existing 
 primitives and would therefore be less performant, which is a 
 shame and likely the reason it doesn't already exist.

 R

ARM directly supports saturating arithmetic, so this approach 
could be both practical and fast. However, x86 only supports it 
in SIMD, so it's not ideal.

Jul 22 2013

Marco Leise <Marco.Leise gmx.de> writes:

On Mon, 22 Jul 2013 05:47:34 +0200,
"JS" <js.mdnq gmail.com> wrote:

 Doing simple stuff like
 
 for(int i = 0; i < s.length - 1; i++) fails catastrophically if s 
 is empty. To make right one has to reduce performance by writing 
 extra checks.

And my opinion on the matter is that it is catastrophic style
to subtract 1 from a length that's possibly 0. Please write:

if (s.length) foreach (i; 0 .. s.length - 1)

 There seems to be no real good reason why size_t is unsigned... 
 Surely one doesn't require too many strings larger than 2^63 bits 
 on an x64 os...

So size_t should be signed on 64-bit systems and unsigned
on 32-bit systems? And please note that all length properties
are unsigned: strings, other arrays, range structures, tuples;
and bit arrays have a ulong length, because they can
theoretically hold more than 2^32 bits on 32-bit systems.

 I running into a lot of trouble because of the way D deals with 
 implicit casting of between signed and unsigned.

That's a good point. There are languages that disallow this
implicit conversion. I'd also like size_t to be a type of its
own, so you cannot mess up by assigning a size_t to a uint
while developing on 32-bit.

 please don't tell me to use foreach... isn't not a panacea.

Yes, please.

-- 
Marco

Jul 22 2013
