D - Slicing

Karim Sharif <Karim_member pathlink.com> writes:

Hi,

As posted before, I'm working on a front end to zlib using the stream framework
provided in Phobos. I successfully managed to get gnu's zlib library working
using Digital Mars's compiler on Win2k. One can assume lots of binary array
manipulation when dealing with compression and streams; however, I have
constantly gotten strange results depending on the approach I used for distributing
memory during compression. When using a static array and repeatedly calling the
C library functions, the less memory I used the more my byte count was off
(exactly one byte for each call).

After several hours of research, I determined this was due to the method of
copying I chose... slicing. My understanding of arrays (I come from a strong C
background) for D was that they are 0 based as in C and that slicing was done
based on the array operations as defined in the snippet I copied into the
posting.

I determined that when utilizing slicing operations, arrays are in fact both 0
and 1 based for the upper bound but always 0 based for the lower bound? So I
decided to write a quick test application (below) to test the theory and, well,
the results are below.

So, can anybody tell me whether this is intentional, just the way slicing is
really meant to be, and if so, what are the rules for the bounds of arrays?
One byte can throw off a cyclic redundancy check pretty quickly. This
is the first time I've actually used slicing in my work, so maybe I am just dumb
to the way it's supposed to work.

My two cents, just keep it 0 based all the way! 

Not trying to flame, just trying to get it right,

Karim Sharif

Snippet from the D language spec:

Array Operations

In general, (a[n..m] op e) is defined as: 
for (i = n; i < m; i++)
a[i] op e;

So, for the expression: 
a[] = b[] + 3;

the result is equivalent to: 
for (i = 0; i < a.length; i++)
a[i] = b[i] + 3; 
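
A minimal sketch of the rule quoted above, written in present-day D rather than the 2002 dialect used in this thread (the `std.stdio` import and the sample values are assumptions for illustration). It shows the whole-array form next to a sliced form, where the upper index is excluded just as the `i < m` loop condition implies.

```d
import std.stdio : writeln;

void main()
{
    int[] b = [1, 2, 3, 4];
    int[] a = new int[b.length];

    // Whole-array form: a[] and b[] cover indices 0 up to (not including) length.
    a[] = b[] + 3;
    writeln(a);          // [4, 5, 6, 7]

    // Sliced form: only indices 1 and 2 are written; index 3 is excluded.
    a[1 .. 3] = 0;
    writeln(a);          // [4, 0, 0, 7]
}
```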


Simple test application to test the theory (test.d):

import c.stdio;

void main(){

    // regular char array
    char [] a = "This is a test string";
    char [] b;

    b = a[0..a.length];
    puts(b);
    b = a[];
    puts(b);
    b = a[0..21];             // <- But then why would this work?
    puts(b);
    b = a[0..20];
    puts(b);
    puts(&a[20]);
    puts(&a[21]);             // <- This would be the over-bound error

}

Compiled, with the following output from cmd:

C:\dmd\src\dzlib>dmd test.d
link short,,,user32+kernel32/noi

C:\dmd\src\dzlib>test
This is a test string
This is a test string
This is a test string
This is a test string
g
Error: ArrayBoundsError short.d(18)

Nov 08 2002

Patrick Down <Patrick_member pathlink.com> writes:

In article <aqgk59$19l2$1 digitaldaemon.com>, Karim Sharif says...
// regular char array
char [] a = "This is a test string";
char [] b;

I think your problem with this example lies in the
fact that D will null terminate string constants.
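
A minimal sketch of this point, in present-day D (the `core.stdc.stdio` and `std.stdio` module names are assumptions relative to the `c.stdio` import in the original test). `puts` receives only a pointer and reads until it meets a NUL byte, so the slice's length never reaches it; a length-aware print shows what the slice actually holds.

```d
import core.stdc.stdio : puts;
import std.stdio : writeln;

void main()
{
    // A string literal is followed by a NUL byte in memory,
    // although that byte is not counted in .length.
    string a = "This is a test string";

    string b = a[0 .. 20];   // half-open: indices 0 through 19

    writeln(a.length);       // 21
    writeln(b.length);       // 20
    writeln(b);              // This is a test strin

    // puts gets only a pointer, so it reads past the end of the
    // slice until it reaches the literal's terminating NUL.
    puts(b.ptr);             // This is a test string
}
```

Under this reading, the four identical lines in the test output say nothing about the slice bounds; they only show where the NUL byte sits.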

Nov 08 2002

Karim Sharif <Karim_member pathlink.com> writes:

Thanks, nice try, however...

Although my example code used a char[], the class I am working on only uses
byte[] (except for the toString() method, but I'm not currently using that), and
the phenomenon continues. I can only see this happening in char * or char[]
casts, because the compiler is trying to make up for the fact that D strings and
C strings are not really the same (and my code doesn't use any constants in this
respect either). I would have been happy to post the entire file, but that seemed
a little excessive in terms of asking readers to trudge through it all; rather,
pass me an email and I'll send the code to you if you're really interested.
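
A sketch only, since the actual class is not shown in this thread: the helper `countChunkedBytes` and its parameters are hypothetical. It walks a byte buffer in fixed-size chunks using half-open slices, where each chunk's exclusive upper bound is the next chunk's start, so the chunk lengths add up to the buffer length with no byte dropped or repeated per call.

```d
import std.algorithm.comparison : min;
import std.stdio : writeln;

// Hypothetical helper: visits `data` in chunks of at most `chunkSize`
// bytes and returns the total number of bytes visited.
size_t countChunkedBytes(const(ubyte)[] data, size_t chunkSize)
{
    assert(chunkSize > 0, "chunkSize must be positive");

    size_t total = 0;
    for (size_t start = 0; start < data.length; start += chunkSize)
    {
        // The upper bound is exclusive, so the next chunk begins exactly at `end`.
        immutable end = min(start + chunkSize, data.length);
        const(ubyte)[] chunk = data[start .. end];
        total += chunk.length;
    }
    return total;
}

void main()
{
    auto data = new ubyte[](100);
    writeln(countChunkedBytes(data, 16));   // 100
}
```

Writing `data[start .. end - 1]` in such a loop, on the assumption that the upper bound is inclusive, would lose one byte per chunk.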

Thanks for the thought though,

Karim

Karim SharifClan.com


Nov 08 2002

Mac Reiter <Mac_member pathlink.com> writes:

After several hours of research, I determined this was due to the method of
copying I chose... slicing. My understanding of arrays (I come from a strong C
background) for D was that they are 0 based as in C and that slicing was done
based on the array operations as defined in the snippet I copied into the
posting.

First off, I didn't look terribly closely at your code (at work and in a hurry).
So if the following doesn't apply, I apologize.

Keep in mind that the slicing syntax uses a "half open" range.  The right hand
side is actually non-inclusive.  To get away from the pseudo-mathematician terms
and get concrete:

array[5..10]

DOES contain a[5], a[6], a[7], a[8], and a[9]

does NOT contain a[10]

I think that would explain the feeling that the slice is 1 based for the upper
bound.  It isn't.  It is 0 based, it just doesn't include the upper bound.
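
A minimal sketch of the half-open rule, in present-day D (the array contents are chosen so that each element equals its index; that choice is an assumption for illustration):

```d
import std.stdio : writeln;

void main()
{
    int[] a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11];

    int[] s = a[5 .. 10];
    writeln(s);                     // [5, 6, 7, 8, 9]
    writeln(s.length);              // 5, i.e. 10 - 5

    // The slice is indexed from 0 again: s[0] is a[5], and the last
    // element s[$ - 1] is a[9], not a[10].
    writeln(s[0], " ", s[$ - 1]);   // 5 9
}
```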

I personally hate this syntax.  I find it to be misleading purely for the sake
of convenience.  But I've had the week-long war over it and lost, so that's just
the way it is.  It pretty much means that I won't ever use slices, but that's
OK, I suppose.

The precise reason why I hate this syntax is because, when discussing ranges,
there is an old, and amazingly well established syntax for describing ranges.
We learned it in grade school when we were learning the number line:

[5..10)  means 5,6,7,8,9
[5..10]  means 5,6,7,8,9,10
(5..10]  means   6,7,8,9,10
(5..10)  means   6,7,8,9

I don't care whether or not D supports all 4 forms -- some of those would be
impossible to distinguish from function calls without taking symbol table
context into account.  But D ONLY supports the first version, semantically, so I
would prefer if it used the accepted standard way of expressing those semantics.
And since the lead symbol is still '[', it should still be easy to parse and lex
out.

Mac

Nov 08 2002

"Lloyd Dupont" <lloyd galador.net> writes:

 array[5..10]
 DOES contain a[5], a[6], a[7], a[8], and a[9]
 does NOT contain a[10]

Hi,

I'm a new user.
Anyway, I do agree with you: I found this syntax quite strange and
counterintuitive.
It took me twice as long as it should have to understand it.

I found this very different from
a[N], which includes a[0], a[1], ... a[N-1].


a[i..j] should, in my mind, either contain all elements from i to j
inclusive, or begin at i and include j elements.

I think more voices should complain about that.

Nov 08 2002

"Sean L. Palmer" <seanpalmer directvinternet.com> writes:

We did, and it got vetoed by Walter.  Check the older threads.

Sean

Nov 08 2002

"Mike Wynn" <mike.wynn l8night.co.uk> writes:

I actually think that it is right and expected it to work as it does,
because D has zero based arrays.
It is a matter of convenience:
int[10] a; // creates an array of ten items, 0..(10-1)
a[n..n+len] creates a slice of length 'len'
much the same as java.lang.String's substring method.
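
A minimal sketch of the `a[n..n+len]` idiom, wrapped in a helper for illustration. `sliceByLength` is a hypothetical name, not a Phobos function, and the code uses present-day D.

```d
import std.stdio : writeln;

// Hypothetical helper: a slice of `len` elements starting at `start`.
T[] sliceByLength(T)(T[] a, size_t start, size_t len)
{
    // Half-open bounds make the length simply (upper - lower).
    return a[start .. start + len];
}

void main()
{
    int[] a = [10, 20, 30, 40, 50, 60];

    int[] s = sliceByLength(a, 2, 3);
    writeln(s);          // [30, 40, 50]
    writeln(s.length);   // 3
}
```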

It's just another one of those "depends who you are" questions that separate
programmers from each other and programmers from mathematicians, along with
whether arrays should start from index 0 or 1 and whether post-increment
operators should be in the language.

Nov 09 2002

Mac Reiter <Mac_member pathlink.com> writes:

In article <aqjl6t$214u$1 digitaldaemon.com>, Mike Wynn says...
I actually think that it is right and expected it to work as it does,
because D has zero based arrays.
it is a matter of convenience.
int[10] a; // creates an array of ten items, 0..(10-1)
a[n..n+len] creates a slice of length 'len'
much the same as java.lang.String's substring method.

But if what interests you is creating a slice based on its length, why not:

a[n:len];
a[n#len]; // arguably better - kinda looks like a +, also has denotation of
// "number of things"

That saves the extra "n+" part.  It also focuses on the length of the slice,
rather than the endpoints, so it is a closer match to what all of the people who
like the current slicing syntax seem to use.

What bothers me is that ".." denotes a range, not a length, so you shouldn't
argue that "it works well for lengths".  If a length-based syntax (like one of
the above) is introduced, I would have absolutely *no* complaints, since it
would be doing what it is supposed to do.  But ranges have their own syntax
(".."), and are orthogonal to concerns of length or of 0 or 1 based arrays.

The other thing that bothers me is that the justification for this slicing
system *always* comes back as "it works well for lengths".  Which means that
novices are going to be taught that they can pull out a subarray of length n
with the syntax:

a[0..n]

and they're going to misunderstand what really happened.  Because the
explanation will focus on length, and how a substring of length n in a 0 based
array will not include [n] itself, they will start thinking in terms of lengths,
not ranges.  Then, when they need to extract a subarray from further into the
array, they're going to try:

a[5..n]

and not understand why it failed.  They didn't see the "full" version of the
first syntax:

a[0..0+n]

because *nobody* is going to write that.  So they weren't aware of having to add
the starting location to the length to get what they wanted.
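
The pitfall described above can be sketched as follows, in present-day D (the array contents and the values of `n` and `m` are assumptions for illustration). The second index is always an end position, never a count.

```d
import std.stdio : writeln;

void main()
{
    int[] a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11];
    size_t n = 3;

    // Starting at 0, the end position and the length happen to coincide.
    writeln(a[0 .. n]);        // [0, 1, 2]

    // Starting elsewhere, the start must be added to get the end position.
    writeln(a[5 .. 5 + n]);    // [5, 6, 7]

    // a[5 .. n] with n == 3 asks for a slice whose end precedes its
    // start; with bounds checking enabled it is rejected at run time.

    // When the "length" is larger than the start, the mistake is silent:
    size_t m = 8;
    writeln(a[5 .. m]);        // [5, 6, 7]
    writeln(a[5 .. m].length); // 3, not 8
}
```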

I vote for switching to length-based slices, with the a[start#len] syntax.
Anyone else in favor?
Mac

Nov 11 2002

"Sean L. Palmer" <seanpalmer directvinternet.com> writes:

I'm in favor of that, but I'd also like range slices to use the
a[first..last] syntax.

I hope to God they don't switch over from the "it works well for lengths"
argument to "there's too much legacy code written in D to change it now"
argument.  ;)  I think it's bad the way it is now, and needs changing.

Anyone from any background is going to look at a[1..3] and assume it's
talking about a range including entry 1, entry 3, and everything in between.
That's intuitively how people think about ranges;  that's how they work in
other languages.
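
For comparison, a sketch of what an inclusive a[first..last] reading would mean in terms of the existing half-open slices. `sliceInclusive` is a hypothetical helper written in present-day D, not a language or library feature.

```d
import std.stdio : writeln;

// Hypothetical helper: elements first through last, both included.
T[] sliceInclusive(T)(T[] a, size_t first, size_t last)
{
    return a[first .. last + 1];
}

void main()
{
    int[] a = [0, 1, 2, 3, 4];

    writeln(sliceInclusive(a, 1, 3));   // [1, 2, 3]
    writeln(a[1 .. 3]);                 // [1, 2]
}
```

One consequence of the inclusive form is that an empty slice has no natural spelling, since `first == last` already denotes one element.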

Sean

Nov 11 2002

Burton Radons <loth users.sourceforge.net> writes:

Sean L. Palmer wrote:

 I'm in favor of that, but I'd also like range slices to use the
 a[first..last] syntax.

 I hope to God they don't switch over from the "it works well for lengths"
 argument to "there's too much legacy code written in D to change it now"
 argument.  ;)  I think it's bad the way it is now, and needs changing.


How about a "this is a stupid thing to be thinking about for more than 
twenty seconds" argument?  I use both Python and D and _never even 
noticed_ that they use different end-point inclusiveness with slicing; 
it was as subconscious a part of programming as the difference in logic 
operators.  Even in comparable code it's the whole difference of a plus 
one or a minus one.

Nov 11 2002

Mac Reiter <reiter nomadics.com> writes:

Burton Radons wrote:
 How about a "this is a stupid thing to be thinking about for more than 
 twenty seconds" argument?  I use both Python and D and _never even 
 noticed_ that they use different end-point inclusiveness with slicing; 
 it was as subconscious a part of programming as the difference in logic 
 operators.  Even in comparable code it's the whole difference of a plus 
 one or a minus one.

And what percentage of the bugs in software are "off by one" errors? 
Most heap corruption bugs are only "the whole difference of a plus one 
or a minus one".

If you "never even noticed" that they use different end-point 
inclusiveness rules, I wonder how many bugs and incorrect behaviors you 
also "never even noticed".

Hmmm...  Should I even mention my concerns about the author of "D for 
Linux" not being aware of how D slicing works?  Oops, too late...  So 
how does slicing work in DLI?

Mac

Nov 11 2002

"Sean L. Palmer" <seanpalmer directvinternet.com> writes:

Do you have *ANY* idea just how many bugs fall in the "off by one" category?

If you think for one minute that this is an issue not worth thinking about
for more than twenty seconds, I for one can't respect your judgment.  And if
you have any influence on the design of D, I fear it will turn out a worse
language for it.

Oh well.

Sean

Nov 11 2002

Mark Evans <Mark_member pathlink.com> writes:

array[5..10]

DOES contain a[5], a[6], a[7], a[8], and a[9]

does NOT contain a[10]

I personally hate this syntax.

Yeah, it really stinks badly.  I have to open my window now...

Mark

Nov 08 2002
