
D - Slicing

Karim Sharif <Karim_member pathlink.com> writes:
Hi,

As posted before, I'm working on a front end to zlib using the stream framework
provided in Phobos. I successfully managed to get the zlib library working
using Digital Mars's compiler on Win2k. Compression and streams naturally involve
lots of binary array manipulation; however, I have constantly got strange results
depending on the approach I used for distributing memory during compression.
When using a static array and repeatedly calling the C library functions, the
less memory I used, the more my byte count was off (exactly one byte for each
call).

After several hours of research, I determined this was due to the method of
copying I chose… slicing. My understanding of arrays (I come from a strong C
background) for D was that they are 0 based as in C and that slicing was done
based on the array operations as defined in the snippet I copied into the
posting.

I determined that when utilizing slicing operations, arrays are in fact both 0
and 1 based for the upper bound but always 0 based for the lower bound? So I
decided to write a quick test application (below) to test the theory and, well,
the results are below.

So, can anybody tell me whether this is intentional, i.e. just the way slicing
is really meant to be, and if so, what the rules for the bounds of arrays are?
One byte can throw off a cyclic redundancy check pretty quickly. This is the
first time I've actually used slicing in my work, so maybe I am just missing
how it's supposed to work.

My two cents, just keep it 0 based all the way! 

Not trying to flame, just trying to get it right,

Karim Sharif

Snippet from the D language spec:

Array Operations

In general, (a[n..m] op e) is defined as:

    for (i = n; i < m; i++)
        a[i] op e;

So, for the expression:

    a[] = b[] + 3;

the result is equivalent to:

    for (i = 0; i < a.length; i++)
        a[i] = b[i] + 3;
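A minimal sketch of the spec's definition, written in present-day D (D2 syntax, which differs somewhat from the 2002 dialect used in this thread). It checks that the vector form `a[] = b[] + 3` and the explicit loop agree, and that assigning to `x[1 .. 3]` touches only indices 1 and 2, i.e. `i = n, ..., m-1`. The function names `addThreeVector` and `addThreeLoop` are hypothetical, chosen for illustration.

```d
import std.stdio : writeln;

// Hypothetical helper: the vector-operation form a[] = b[] + 3.
int[] addThreeVector(int[] b)
{
    auto a = new int[](b.length);
    a[] = b[] + 3;                          // array operation over the whole slice
    return a;
}

// Hypothetical helper: the same computation as the spec's explicit loop.
int[] addThreeLoop(int[] b)
{
    auto a = new int[](b.length);
    for (size_t i = 0; i < a.length; i++)   // i runs from 0 to length - 1
        a[i] = b[i] + 3;
    return a;
}

void main()
{
    int[] b = [1, 2, 3, 4, 5];
    assert(addThreeVector(b) == addThreeLoop(b));

    // a[n..m] op e visits i = n, ..., m-1: here only indices 1 and 2.
    int[] x = [1, 2, 3, 4, 5];
    x[1 .. 3] = 0;
    assert(x == [1, 0, 0, 4, 5]);
    writeln(x);
}
```

The loop condition `i < m` in the spec is exactly what makes the upper bound exclusive.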


Simple test application to test the theory (test.d):

import c.stdio;

void main(){

    // regular char array
    char [] a = "This is a test string";
    char [] b;

    b = a[0..a.length];
    puts(b);
    b = a[];
    puts(b);
    b = a[0..21];             // <- But then why would this work?
    puts(b);
    b = a[0..20];
    puts(b);
    puts(&a[20]);
    puts(&a[21]);             // <- This would be the over-bound error

}

Compiling and running it produced this console output:

C:\dmd\src\dzlib>dmd test.d
link short,,,user32+kernel32/noi

C:\dmd\src\dzlib>test
This is a test string
This is a test string
This is a test string
This is a test string
g
Error: ArrayBoundsError short.d(18)
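One way to see which bounds are legal, without relying on `puts`, is to check slice lengths directly. The sketch below uses present-day D (`string` instead of the thread's `char[]`, and `$` meaning the array length inside brackets). The point it shows: a slice's upper bound may equal `a.length` because it is exclusive, while an element index must be strictly less than `a.length`.

```d
import std.stdio : writeln;

void main()
{
    string a = "This is a test string";
    assert(a.length == 21);

    // The upper bound is exclusive, so it may equal a.length.
    assert(a[0 .. a.length] == a);
    assert(a[0 .. 21] == a);      // same slice, since 21 == a.length
    assert(a[0 .. $] == a);       // `$` is a.length inside the brackets

    // Lowering the upper bound by one really does drop the last element.
    string b = a[0 .. 20];
    assert(b.length == 20);
    assert(b == "This is a test strin");

    // Indexing is 0-based: the last valid index is length - 1.
    assert(a[20] == 'g');
    // a[21] (index == length) is out of bounds; with bounds checking
    // enabled it throws a core.exception.RangeError.

    writeln(b);
}
```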
Nov 08 2002
Patrick Down <Patrick_member pathlink.com> writes:
In article <aqgk59$19l2$1 digitaldaemon.com>, Karim Sharif says...
// regular char array
char [] a = "This is a test string";
char [] b;
I think your problem with this example lies in the fact that D will null-terminate string constants.
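Patrick's point can be sketched in present-day D using `core.stdc.stdio`. `puts` takes a C pointer and prints until it meets a NUL byte, so it never sees a slice's length; because D string literals are stored with a trailing NUL, printing `b.ptr` for the 20-character slice still runs on through the `'g'`. Passing the length explicitly with `printf("%.*s")` prints exactly the slice. The helper name `printSlice` is hypothetical.

```d
import core.stdc.stdio : printf, puts;

// Hypothetical helper: print exactly s.length characters, ignoring any NUL.
void printSlice(const(char)[] s)
{
    // "%.*s" takes the precision (an int) followed by the char pointer.
    printf("%.*s\n", cast(int) s.length, s.ptr);
}

void main()
{
    string a = "This is a test string";
    string b = a[0 .. 20];   // 20 characters; the 'g' is not part of b

    puts(b.ptr);      // ignores b.length, reads on to the literal's trailing NUL
    printSlice(b);    // stops after b.length characters
}
```

With `puts`, the slices `a[0..21]` and `a[0..20]` are indistinguishable, which is why the test output above shows the full string for both.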
Nov 08 2002
Karim Sharif <Karim_member pathlink.com> writes:
Thanks, nice try, however…

Although my example code used a char[], the class I am working on only uses
byte[] (except for the toString() method, but I'm not currently using that), and
the phenomenon continues. I can only see this happening in char* or char[]
casts, because the compiler is trying to make up for the fact that D strings and
C strings are not really the same (and my code doesn't use any constants in this
respect either). I would have been happy to post the entire file, but that seemed
a little excessive in terms of asking readers to trudge through it all; rather,
send me an email and I'll send the code to you if you're really interested.

Thanks for the thought though,

Karim

Karim SharifClan.com

Nov 08 2002
Mac Reiter <Mac_member pathlink.com> writes:
After several hours of research, I determined this was due to the method of
copying I chose… slicing. My understanding of arrays (I come from a strong C
background) for D was that they are 0 based as in C and that slicing was done
based on the array operations as defined in the snippet I copied into the
posting.
First off, I didn't look terribly closely at your code (I'm at work and in a hurry), so if the following doesn't apply, I apologize.

Keep in mind that the slicing syntax uses a "half-open" range: the right-hand bound is not included. To get away from the pseudo-mathematician terms and get concrete:

 array[5..10]
 DOES contain a[5], a[6], a[7], a[8], and a[9]
 does NOT contain a[10]

I think that would explain the feeling that the slice is 1-based for the upper bound. It isn't. It is 0-based; it just doesn't include the upper bound.

I personally hate this syntax. I find it to be misleading purely for the sake of convenience. But I've had the week-long war over it and lost, so that's just the way it is. It pretty much means that I won't ever use slices, but that's OK, I suppose.

The precise reason why I hate this syntax is that, when discussing ranges, there is an old and amazingly well-established notation for describing them. We learned it in grade school when we were learning the number line:

 [5..10)  means 5,6,7,8,9
 [5..10]  means 5,6,7,8,9,10
 (5..10]  means 6,7,8,9,10
 (5..10)  means 6,7,8,9

I don't care whether or not D supports all 4 forms; some of those would be impossible to distinguish from function calls without taking symbol table context into account. But D ONLY supports the first version, semantically, so I would prefer if it used the accepted standard way of expressing those semantics. And since the lead symbol is still '[', it should still be easy to parse and lex out.

Mac
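Since D spells only the half-open form, each of the four interval forms listed above can be written in D by shifting one or both bounds by one. A sketch in present-day D, using an array whose values equal their indices so the contents can be read off directly; the helper name `indices` is hypothetical.

```d
import std.stdio : writeln;

// Hypothetical helper: an array whose element i holds the value i.
int[] indices(int n)
{
    auto a = new int[](n);
    foreach (i, ref v; a)
        v = cast(int) i;
    return a;
}

void main()
{
    auto a = indices(12);

    assert(a[5 .. 10] == [5, 6, 7, 8, 9]);      // [5..10)  D's native form
    assert(a[5 .. 11] == [5, 6, 7, 8, 9, 10]);  // [5..10]  upper bound + 1
    assert(a[6 .. 11] == [6, 7, 8, 9, 10]);     // (5..10]  both bounds + 1
    assert(a[6 .. 10] == [6, 7, 8, 9]);         // (5..10)  lower bound + 1

    writeln(a[5 .. 10]);
}
```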
Nov 08 2002
"Lloyd Dupont" <lloyd galador.net> writes:
 array[5..10]
 DOES contain a[5], a[6], a[7], a[8], and a[9]
 does NOT contain a[10]
Hi, I'm a new user... Anyway, I agree with you. I found this syntax quite strange and counterintuitive; it took me twice as long as it should have to understand it. I found it very different from a[N], which includes a[0], a[1], ... a[N-1]. In my mind, a[i..j] should either contain all elements from i to j inclusive, or begin at i and include j elements... I think more voices should complain about that...
Nov 08 2002
"Sean L. Palmer" <seanpalmer directvinternet.com> writes:
We did, and it got vetoed by Walter.  Check the older threads.

Sean
Nov 08 2002
"Mike Wynn" <mike.wynn l8night.co.uk> writes:
I actually think that it is right and expected it to work as it does,
because D has zero based arrays.
it is a matter of convenience.
int[10] a; // creates an array of ten items, 0..(10-1)
a[n..n+len] creates a slice of length 'len'
much the same as java.lang.String's substring method.

It's just another one of those "depends who you are" questions that separate
programmers from each other, and programmers from mathematicians, along with
"should arrays start from index 0 or 1?" and "should post-increment operators
be in the language?"
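The convenience Mike describes can be checked directly. With half-open bounds, `a[n .. n + len]` always has exactly `len` elements, and splitting at any index `k` gives two pieces that share no element and rejoin to the original. A sketch in present-day D; the numbers are arbitrary.

```d
import std.stdio : writeln;

void main()
{
    int[] a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];

    // A slice starting at n with len elements: its length is just len.
    size_t n = 3, len = 4;
    auto s = a[n .. n + len];
    assert(s.length == len);
    assert(s == [3, 4, 5, 6]);

    // Splitting at k: a[k] belongs to the second piece only,
    // so the pieces share nothing and rejoin to the original.
    size_t k = 4;
    auto head = a[0 .. k];
    auto tail = a[k .. $];
    assert(head.length + tail.length == a.length);
    assert((head ~ tail) == a);

    // foreach over 0 .. len uses the same half-open convention.
    size_t count = 0;
    foreach (i; 0 .. len)
        count += 1;
    assert(count == len);

    writeln(head, " | ", tail);
}
```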
Nov 09 2002
Mac Reiter <Mac_member pathlink.com> writes:
In article <aqjl6t$214u$1 digitaldaemon.com>, Mike Wynn says...
I actually think that it is right and expected it to work as it does,
because D has zero based arrays.
it is a matter of convenience.
int[10] a; // creates an array of ten items, 0..(10-1)
a[n..n+len] creates a slice of length 'len'
much the same as java.lang.String's substring method.
But if what interests you is creating a slice based on its length, why not:

 a[n:len];
 a[n#len]; // arguably better - kinda looks like a +, also has denotation of
           // "number of things"

That saves the extra "n+" part. It also focuses on the length of the slice, rather than the endpoints, so it is a closer match to what all of the people who like the current slicing syntax seem to use.

What bothers me is that ".." denotes a range, not a length, so you shouldn't argue that "it works well for lengths". If a length-based syntax (like one of the above) is introduced, I would have absolutely *no* complaints, since it would be doing what it is supposed to do. But ranges have their own syntax (".."), and are orthogonal to concerns of length or of 0- or 1-based arrays.

The other thing that bothers me is that the justification for this slicing system *always* comes back as "it works well for lengths". Which means that novices are going to be taught that they can pull out a subarray of length n with the syntax:

 a[0..n]

and they're going to misunderstand what really happened. Because the explanation will focus on length, and how a substring of length n in a 0-based array will not include [n] itself, they will start thinking in terms of lengths, not ranges. Then, when they need to extract a subarray from further into the array, they're going to try:

 a[5..n]

and not understand why it failed. They didn't see the "full" version of the first syntax:

 a[0..0+n]

because *nobody* is going to write that. So they weren't aware of having to add the starting location to the length to get what they wanted.

I vote for switching to length-based slices, with the a[start#len] syntax. Anyone else in favor?

Mac
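D has no `a[start#len]` syntax, but the proposal can be approximated today with a small function. A sketch in present-day D; `sliceLen` is a hypothetical name, not part of Phobos. It also shows the pitfall described above: from index 0 the end index and the length coincide, but further in, `a[5 .. m]` has `m - 5` elements, not `m`.

```d
import std.stdio : writeln;

// Hypothetical helper approximating the proposed a[start#len]:
// the len elements beginning at index start (still bounds-checked).
T[] sliceLen(T)(T[] a, size_t start, size_t len)
{
    return a[start .. start + len];
}

void main()
{
    int[] a = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19];
    size_t n = 3;

    // From the start of the array, end index and length coincide.
    assert(a[0 .. n] == sliceLen(a, 0, n));

    // Further in they do not: a[5 .. m] has m - 5 elements,
    // while a length-based slice of n elements needs the end 5 + n.
    size_t m = 8;
    assert(a[5 .. m].length == m - 5);
    assert(sliceLen(a, 5, n) == a[5 .. 5 + n]);
    assert(sliceLen(a, 5, n).length == n);

    writeln(sliceLen(a, 5, n));
}
```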
Nov 11 2002
"Sean L. Palmer" <seanpalmer directvinternet.com> writes:
I'm in favor of that, but I'd also like range slices to use the
a[first..last] syntax.

I hope to God they don't switch over from the "it works well for lengths"
argument to "there's too much legacy code written in D to change it now"
argument.  ;)  I think it's bad the way it is now, and needs changing.

Anyone from any background is going to look at a[1..3] and assume it's
talking about a range including entry 1, entry 3, and everything in between.
That's intuitively how people think about ranges;  that's how they work in
other languages.

Sean
Nov 11 2002
Burton Radons <loth users.sourceforge.net> writes:
Sean L. Palmer wrote:

 I'm in favor of that, but I'd also like range slices to use the
 a[first..last] syntax.

 I hope to God they don't switch over from the "it works well for lengths"
 argument to "there's too much legacy code written in D to change it now"
 argument.  ;)  I think it's bad the way it is now, and needs changing.
How about a "this is a stupid thing to be thinking about for more than twenty seconds" argument? I use both Python and D and _never even noticed_ that they use different end-point inclusiveness with slicing; it was as subconscious a part of programming as the difference in logic operators. Even in comparable code it's the whole difference of a plus one or a minus one.
Nov 11 2002
Mac Reiter <reiter nomadics.com> writes:
Burton Radons wrote:
 How about a "this is a stupid thing to be thinking about for more than 
 twenty seconds" argument?  I use both Python and D and _never even 
 noticed_ that they use different end-point inclusiveness with slicing; 
 it was as subconscious a part of programming as the difference in logic 
 operators.  Even in comparable code it's the whole difference of a plus 
 one or a minus one.
And what percentage of the bugs in software are "off by one" errors? Most heap corruption bugs are only "the whole difference of a plus one or a minus one". If you "never even noticed" that they use different end-point inclusiveness rules, I wonder how many bugs and incorrect behaviors you also "never even noticed". Hmmm... Should I even mention my concerns about the author of "D for Linux" not being aware of how D slicing works? Oops, too late... So how does slicing work in DLI? Mac
Nov 11 2002
"Sean L. Palmer" <seanpalmer directvinternet.com> writes:
Do you have *ANY* idea just how many bugs fall in the "off by one" category?

If you think for one minute that this is an issue not worth thinking about
for more than twenty seconds, I for one can't respect your judgment.  And if
you have any influence on the design of D, I fear it will turn out a worse
language for it.

Oh well.

Sean
Nov 11 2002
Mark Evans <Mark_member pathlink.com> writes:
array[5..10]

DOES contain a[5], a[6], a[7], a[8], and a[9]

does NOT contain a[10]

I personally hate this syntax.
Yeah, it really stinks badly. I have to open my window now... Mark
Nov 08 2002