www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Array indexing and slicing

reply "Luís Marques" <luismarques+spam gmail.com> writes:
Hello,

In the array section of the D docs (http://www.digitalmars.com/d/arrays.html),
there is no mention of the $ operator. Only the "a = a[2 .. a.length]" idiom is
shown. I suppose that should be fixed.

My experience of array slicing comes from Python, so I found it a bit surprising
that a "array[2..]" notation didn't exist ("array[2:]" in Python). I later found
the $ operator in others' code, which mostly fixes that.

How about supporting, as in Python:

a[-1] == a[$];
a[0..] == a[0..$];

I personally dislike the $ operator, it seems a casual fix for a lack of better
notation and someting you'd expect to find in Perl. I still prefer it to the
more verbose "a[2 .. a.length]" notation, though. The verbose notation has the
disadvantage of taking more whitespace; while most people read "a[2..3]" just
fine it's a bit confusing to read "a[1..a.length]". Also, if you rename or use a
different array you have two instances to correct, instead of one.

The Python syntax seems to me a better generalization of the concepts. It
doesn't require an extra operator, the indexes and the slice notation have all
the expressiveness required.

With omissive indexes (a[..7]) and wrapping indexes (a[-2]) the $ could go away.
Is it too late for that?

Luís Marques
Apr 03 2006
next sibling parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Lu=EDs Marques wrote:
 My experience of array slicing comes from Python, so I found it a bit s=
urprising
 that a "array[2..]" notation didn't exist ("array[2:]" in Python). I la=
ter found
 the $ operator in others' code, which mostly fixes that.
The omissive indices are not yet supported, but they could be added as an abbreviation without much trouble. It is just one character ("0" or "$" vs. "") difference. I don't have a strong opinion on that point.
 How about supporting, as in Python:
=20
 a[-1] =3D=3D a[$];
That should be: a[-1] =3D=3D a[$-1]
 a[0..] =3D=3D a[0..$];
=20
 I personally dislike the $ operator, it seems a casual fix for a lack o=
f better
 notation and someting you'd expect to find in Perl.
I actually quite like the notation. At first, I didn't like the idea of "wasting" the precious ASCII character "$" for this profane purpose, but compared to all other proposed solutions, this is the cleanest. Also, to me "$" does not seem too counterintuitive. The character has the long tradition of signifying the end of a line in regular expressions, which somehow feels related to the issue at hand, even though I have never seriously worked with Perl.
 I still prefer it to the more verbose "a[2 .. a.length]" notation, thou=
gh. That may be fine for "a", but it scales up very badly if you replace a by a more complex expression. The advantage of "$" will become even more obvious once there are is better support for multidimensional arrays (yes, I'm still working on a refined proposal for those) and the integer property "length" is replaces by an integer array "shape": consider adressing the interior (everything but the boundaries) of a 2D array: somearray[1..somearray.shape[0]-1,1..somearray.shape[1]-1] vs. somearray[1..$-1,1..$-1] Or consider indexing via periodic boundary conditions on a 3D grid: mygrid[x%mygrid.shape[0],y%mygrid.shape[1],z%mygrid.shape[2]] vs. mygrid[x%$,y%$,z%$]
 The verbose notation has the
 disadvantage of taking more whitespace; while most people read "a[2..3]=
" just
 fine it's a bit confusing to read "a[1..a.length]". Also, if you rename=
or use a
 different array you have two instances to correct, instead of one.
=20
 The Python syntax seems to me a better generalization of the concepts. =
It
 doesn't require an extra operator, the indexes and the slice notation h=
ave all
 the expressiveness required.
=20
 With omissive indexes (a[..7]) and wrapping indexes (a[-2]) the $ could=
go away. These would be a bad replacement: * omissive indexes are a nice abbreviation but have far less power than the "$" symbol * negative indices would have to be checked for at run time giving a performace penalty that is unacceptable if D tries to aim for a position in the league of high-performance computing languages.
Apr 03 2006
parent reply "Luís Marques" <luismarques+spam gmail.com> writes:
In article <e0rrgk$1gko$1 digitaldaemon.com>, Norbert Nemec says...
That may be fine for "a", but it scales up very badly if you replace a
by a more complex expression.
Notice that I did say I still prefer $ to the verbose .length, not that I prefer using .length I just find it ugly.
	somearray[1..$-1,1..$-1]
or somearray[1..-2,1..-2]
* omissive indexes are a nice abbreviation but have far less power than
the "$" symbol
Which cases are you thinking about?
* negative indices would have to be checked for at run time giving a
performace penalty that is unacceptable if D tries to aim for a position
in the league of high-performance computing languages.
Kind of like array out of bounds exceptions, which is already implemented? Thanks for you feedback, Luís
Apr 03 2006
next sibling parent "Luís Marques" <luismarques+spam gmail.com> writes:
In article <e0rupf$1kbg$1 digitaldaemon.com>, Luís Marques says...
or somearray[1..-2,1..-2]
somearray[1..-1,1..-1], of course. You are right about having missed the right indexes, but otherwise my opinion remain. I guess it's a matter of style, mostly. Luís
Apr 03 2006
prev sibling parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Obviously we both agree that "$" is nicer than nothing (i.e. better than
having to use ".length")

I also agree with you that Python syntax looks nicer than D syntax. However

a) Python syntax is not an option for D, because it would require
runtime checks that result in an unnecessary performace penalty. This
cannot be compared to array bounds checks, which can be turned off after
debugging phase. The performance penalty would have to be payed by
everyone using arrays, even if they are not even interested in negative
indexing

b) The $ is even more powerful than Python-syntax in at least one very
interesting way:
	somearray[x%$,y%$,z%$]
is the most beautiful solution I ever saw for the common problem of
periodic boundaries!

B.t.w.: Be careful!

D syntax: somearray[1..$-1,1..$-1]
is equivalent to
Python syntax: somearray[1..-1,1..-1]

you mixed this up in your examples bringing in a -2 somewhere.


Finally you asked:
* omissive indexes are a nice abbreviation but have far less power than
the "$" symbol
Which cases are you thinking about?
I call "omissive indices" only "[.." instead of "[0.." and "..]" instead of "..$]". Obviously these cover only the trivial corner cases and the syntax cannot naturally be extended to anything beyond these. Greetings, Norbert
Apr 03 2006
next sibling parent reply "Luís Marques" <luismarques+spam gmail.com> writes:
In article <e0s6a7$1sbs$1 digitaldaemon.com>, Norbert Nemec says...
b) The $ is even more powerful than Python-syntax in at least one very
interesting way:
	somearray[x%$,y%$,z%$]
is the most beautiful solution I ever saw for the common problem of
periodic boundaries!
Yes, that one is very nice.
you mixed this up in your examples bringing in a -2 somewhere.
Thanks, I noticed it later.
I call "omissive indices" only "[.." instead of "[0.." and "..]" instead
of "..$]". Obviously these cover only the trivial corner cases and the
syntax cannot naturally be extended to anything beyond these.
How would you extend it with $? That's a good review. Things like this should be noted somewhere.. So, $ is better designed for multidimensional arrays $ is shorter than .length Still thinking, then, couldn't this work: somearray[x%length,y%length,z%length] length would evaluate differently for each dimension of the array. The only advantage would be $ being shorter (somearray[0..] would work for the simpler cases) Luís Marques
Apr 03 2006
next sibling parent Derek Parnell <derek psych.ward> writes:
On Mon, 3 Apr 2006 22:47:37 +0000 (UTC), Luís Marques wrote:

 $ is shorter than .length
One problem with 'length' is that is not a keyword in all contexts, and so an innocent coding mistake such as the one below can occur and is manifested at run time. And even then its not so easy to see if you're not focusing on 'length'. import std.stdio; void main() { int length; char[] foo; foo = "derek parnell".dup; length = 4; while( foo[length] != 'a' ) foo = foo[1..$]; writefln("%s", foo); } -- Derek (skype: derek.j.parnell) Melbourne, Australia "Down with mediocracy!" 4/04/2006 11:40:09 AM
Apr 03 2006
prev sibling parent Norbert Nemec <Norbert Nemec-online.de> writes:
Lu=EDs Marques wrote:
 In article <e0s6a7$1sbs$1 digitaldaemon.com>, Norbert Nemec says...
=20
b) The $ is even more powerful than Python-syntax in at least one very
interesting way:
	somearray[x%$,y%$,z%$]
is the most beautiful solution I ever saw for the common problem of
periodic boundaries!
=20 =20 Yes, that one is very nice. =20 =20
you mixed this up in your examples bringing in a -2 somewhere.
=20 =20 Thanks, I noticed it later. =20 =20
I call "omissive indices" only "[.." instead of "[0.." and "..]" instea=
d
of "..$]". Obviously these cover only the trivial corner cases and the
syntax cannot naturally be extended to anything beyond these.
=20 =20 How would you extend it with $?
Sorry, I was not very clear: With "extending" I meant doing something like "..$-3]". What I wanted to say is: an omissive end-index "..]" would be a nice shorthand for "..$]", but omissive indices are not a real alternative to the discussed alternatives that allow arithmetics.
 Still thinking, then, couldn't this work:
=20
 somearray[x%length,y%length,z%length]
=20
 length would evaluate differently for each dimension of the array.
 The only advantage would be $ being shorter (somearray[0..] would work =
for the
 simpler cases)
Putting Dereks answer in different words: The parser is built upon the assumption that every "word" is either a "keyword" or an "identifier" at all times during the compilation process Making "length" a keyword would, in principle, be a solution, because one could in principle pack in any special meaning that is needed. However: "length" could then not be a property any more and could not be used as variable name or anything else. Leaving "length" an identifier, one would have to find a way to give it the special meaning it needs. In any case, this would be problematic, because it might always override a variable "length" that is defined in the surrounding namespace. The "$" solution is very much like the "keyword" solution from the compiler perspective, but it does not collide with any existing code.
Apr 03 2006
prev sibling parent reply "Luís Marques" <luismarques+spam gmail.com> writes:
In article <e0s6a7$1sbs$1 digitaldaemon.com>, Norbert Nemec says...
	somearray[x%$,y%$,z%$]
Before asking some more questions, how could you please give me a working example for this? Thanks, Luís Marques
Apr 04 2006
parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Lu=EDs Marques wrote:
 In article <e0s6a7$1sbs$1 digitaldaemon.com>, Norbert Nemec says...
=20
	somearray[x%$,y%$,z%$]
=20 =20 Before asking some more questions, how could you please give me a worki=
ng
 example for this?
No, because multidimensional arrays do not exist yet, but are still in the planning phase. I've been spending some hours working on reworking the proposal lately, but I have nothing new finished yet. The 2-years-old version of the proposal can still be found at: http://homepages.uni-regensburg.de/~nen10015/documents/D-multidimarray.h= tml (It is quite outdated in the details, but should at least give an idea of where it may be going.)
Apr 04 2006
parent Don Clugston <dac nospam.com.au> writes:
Norbert Nemec wrote:
 Luís Marques wrote:
 In article <e0s6a7$1sbs$1 digitaldaemon.com>, Norbert Nemec says...

 	somearray[x%$,y%$,z%$]
Before asking some more questions, how could you please give me a working example for this?
No, because multidimensional arrays do not exist yet, but are still in the planning phase. I've been spending some hours working on reworking the proposal lately, but I have nothing new finished yet.
I'm looking forward to seeing it. Could really be a killer feature.
 The 2-years-old version of the proposal can still be found at:
 	http://homepages.uni-regensburg.de/~nen10015/documents/D-multidimarray.html
 
 (It is quite outdated in the details, but should at least give an idea
 of where it may be going.)
Apr 05 2006
prev sibling parent reply Hong <Hong_member pathlink.com> writes:
In article <e0rmga$1bi6$1 digitaldaemon.com>, Luís Marques says...

notation and someting you'd expect to find in Perl. I still prefer it to the
more verbose "a[2 .. a.length]" notation, though. The verbose notation has the
disadvantage of taking more whitespace; while most people read "a[2..3]" just
fine it's a bit confusing to read "a[1..a.length]". Also, if you rename or use a
different array you have two instances to correct, instead of one.
You can do a[1..length]. length always refer to the array length when it is used inside the indexing operator. So there is only 1 place to correct when you change the array name. length is more verbose but definitely more readable.
Apr 03 2006
next sibling parent "Luís Marques" <luismarques+spam gmail.com> writes:
In article <e0s0m0$1mp9$1 digitaldaemon.com>, Hong says...
In article <e0rmga$1bi6$1 digitaldaemon.com>, Luís Marques says...

notation and someting you'd expect to find in Perl. I still prefer it to the
more verbose "a[2 .. a.length]" notation, though. The verbose notation has the
disadvantage of taking more whitespace; while most people read "a[2..3]" just
fine it's a bit confusing to read "a[1..a.length]". Also, if you rename or use a
different array you have two instances to correct, instead of one.
You can do a[1..length]. length always refer to the array length when it is used inside the indexing operator. So there is only 1 place to correct when you change the array name. length is more verbose but definitely more readable.
Oh yes, I was forgetting that variant. That's nice, thanks. Perhaps one more reason against having $, though I guess several will disagree. Luís
Apr 03 2006
prev sibling parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
The "length" variant will become problematic once we have
multidimensional arrays, where the integer number "length" is replaced
by an integer array "shape". "$" can be given a very special meaning by
the compiler without problems for the parser, allowing

	somearray[1..$-1,1..$-1,1..$-1]

to mean the same thing as

	somearray[1..shape[0]-1,1..shape[1]-1,1..shape[2]-1]

to get the same effect with using "length" instead of "$", the word
"length" would probably have to become a keyword, which would disallow
its use as property. (Unless you really want to mess up the parser...)

Greetings,
Norbert



Hong wrote:
 In article <e0rmga$1bi6$1 digitaldaemon.com>, Lu=E7=96=84 Marques says.=
=2E.
=20
=20
notation and someting you'd expect to find in Perl. I still prefer it t=
o the
more verbose "a[2 .. a.length]" notation, though. The verbose notation =
has the
disadvantage of taking more whitespace; while most people read "a[2..3]=
" just
fine it's a bit confusing to read "a[1..a.length]". Also, if you rename=
or use a
different array you have two instances to correct, instead of one.
=20 =20 You can do a[1..length]. length always refer to the array length when i=
t is used
 inside the indexing operator. So there is only 1 place to correct when =
you
 change the array name. length is more verbose but definitely more reada=
ble.
=20
=20
Apr 03 2006
parent reply novice2 <novice2_member pathlink.com> writes:
please, can anybody explain me, wich "performance penalty" mentioned in this
thread? how it come?
Apr 04 2006
parent reply "Luís Marques" <luismarque+spam gmail.com> writes:
In article <e0tdh2$bsn$1 digitaldaemon.com>, novice2 says...
please, can anybody explain me, wich "performance penalty" mentioned in this
thread? how it come?
When array bounds checking is disabled, all the CPU has to do is something like: value = *(&array_contents + (index * element size)); The index is just added as an offset (times the element size, which can be really fast if it's a multiple of 2). When you add bounds checking or negative indexes, other operations must be performed. For bounds checking a conditional jump must be issued (conditional jumps can be expensive for performance -- on inner loops -- plus they add footprint). For negative indexes the corresponding positive offset must be calculated before accessing the array index. Under some circunstances the compiler could optimize that, when it would be possible to do the calculation at compilation or to assume the index was a unsigned integer. I suppose there's no general arithmetic algorithm for all cases (without conditional jumps). I guess that should be it, but others should know more about such matters. Luís Marques
Apr 04 2006
parent reply novice2 <novice2_member pathlink.com> writes:
Thanx, Luís Marques for reply, but...

When you add bounds checking or negative
indexes, other operations must be performed. For bounds checking a conditional
it is not depends on syntax or index sign. this check must be performed (if enabled), either positive or negative index. Even more: we all know, source code and compiler output is not the same. and source code with negative indexes or other index syntax will be compiled into the same cpu operations, if it means the same.
inner loops -- plus they add footprint). For negative indexes the corresponding
positive offset must be calculated before accessing the array index. Under some
this is not true. when we use other syntax for some operations the compiler output will be the same. even if we replace some several operations with one syntax construction compiler can generate more optimized code. i just want to say, IMHO, all changes in index/slicing syntax will change compiler only. not generated binary.
Apr 04 2006
next sibling parent reply "Luís Marques" <luismarques+spam gmail.com> writes:
In article <e0tvba$1aav$1 digitaldaemon.com>, novice2 says...
i just want to say, IMHO, all changes in index/slicing syntax will change
compiler only. not generated binary.
Actually I don't think that can be true. That would only be the case if the indexes were defined at compilation time. Assuming positive indexes only, the following code... int index = get_int_from_user(); value = array[index]; could be translated into: value = *(&array_contents + (index * element size)); If we allow negative indexes then that doesn't work. "index" is computed at runtime so the code generated must change to allow that. At least that's the way I see it. About bounds checking, I didn't mean to imply it depends on the syntax. Luís Marques
Apr 04 2006
parent reply novice2 <novice2_member pathlink.com> writes:
i just want to say, IMHO, all changes in index/slicing syntax will change
compiler only. not generated binary.
Actually I don't think that can be true. That would only be the case if the indexes were defined at compilation time. Assuming positive indexes only, the following code...
but programmer defines algo - sequence of some operations. we should compare equal code for currnet syntax and new syntax. equal code - with equal results. just imaginary example: task: get 3rd item from end in array; new synatx: b = a[-3]; old syntax: b = a[length-3]; only now we can compare, wich actions will be performed in runtime, and guess, will be it different or not.
Apr 05 2006
parent xs0 <xs0 xs0.com> writes:
novice2 wrote:
 i just want to say, IMHO, all changes in index/slicing syntax will change
 compiler only. not generated binary.
Actually I don't think that can be true. That would only be the case if the indexes were defined at compilation time. Assuming positive indexes only, the following code...
but programmer defines algo - sequence of some operations. we should compare equal code for currnet syntax and new syntax. equal code - with equal results. just imaginary example: task: get 3rd item from end in array; new synatx: b = a[-3]; old syntax: b = a[length-3]; only now we can compare, wich actions will be performed in runtime, and guess, will be it different or not.
"old" syntax is actually b = a[$-3]; But that is not a good example, because the index is a constant. A better example would be: int idx=someFunc(); b=a[idx]; It currently compiles to something like int idx=someFunc(); debug { if (idx<0 || idx>=a.length) throw ...; } b=a[idx]; If you use -release, the debug {} section is eliminated and there is no cost in speed. On the other hand, with the new version, the result would be something like int idx=someFunc(); if (idx<0) { idx+=a.length; } debug { if (idx<0 || idx>=a.length) throw ...; } b=a[idx]; The problem is that in the general case, the compiler cannot determine whether idx will be negative or not, and cannot optimize away the first if() statement, meaning every array access will be notably slower, even if you don't use negative indices at all. Is that inefficiency worth the change? I don't think it is... xs0
Apr 05 2006
prev sibling parent reply Georg Wrede <georg.wrede nospam.org> writes:
novice2 wrote:
 Thanx, Luís Marques for reply, but...
 
 
 When you add bounds checking or negative indexes, other operations
 must be performed. For bounds checking a conditional
it is not depends on syntax or index sign. this check must be performed (if enabled), either positive or negative index. Even more: we all know, source code and compiler output is not the same. and source code with negative indexes or other index syntax will be compiled into the same cpu operations, if it means the same.
 inner loops -- plus they add footprint). For negative indexes the
 corresponding positive offset must be calculated before accessing
 the array index. Under some
this is not true. when we use other syntax for some operations the compiler output will be the same. even if we replace some several operations with one syntax construction compiler can generate more optimized code. i just want to say, IMHO, all changes in index/slicing syntax will change compiler only. not generated binary.
Well, we have two separate cases. First, if the array size is known at compile time, then all of this can be take care of without runtime penalty. Second, if we are talking about dynamic arrays, then it does make a difference whether we allow indexing from the end of the array.
Apr 04 2006
parent reply novice2 <novice2_member pathlink.com> writes:
 i just want to say, IMHO, all changes in index/slicing syntax will
 change compiler only. not generated binary.
penalty. Second, if we are talking about dynamic arrays, then it does make a difference whether we allow indexing from the end of the array.
but "indexing from the end". if we express it in current syntax, then will no difference. we should compare equivalent code. at least codes with equivalent results of it execution.
Apr 05 2006
parent "Luís Marques" <luismarques+spam gmail.com> writes:
In article <e106tf$1l5v$1 digitaldaemon.com>, novice2 says...
but "indexing from the end". if we express it in current syntax, then will no
difference. we should compare equivalent code. at least codes with equivalent
results of it execution.
See xs0's reponse.
Apr 05 2006