digitalmars.D - Array indexing and slicing
- Luís Marques (22/22) Apr 03 2006 Hello,
- Norbert Nemec (43/64) Apr 03 2006 ter found
- Luís Marques (9/17) Apr 03 2006 Notice that I did say I still prefer $ to the verbose .length, not that ...
- Luís Marques (5/6) Apr 03 2006 somearray[1..-1,1..-1], of course. You are right about having missed the...
- Norbert Nemec (25/29) Apr 03 2006 Obviously we both agree that "$" is nicer than nothing (i.e. better than
- Luís Marques (14/23) Apr 03 2006 Thanks, I noticed it later.
- Derek Parnell (22/23) Apr 03 2006 One problem with 'length' is that is not a keyword in all contexts, and ...
- Norbert Nemec (20/51) Apr 03 2006 Sorry, I was not very clear: With "extending" I meant doing something
- Luís Marques (5/6) Apr 04 2006 Before asking some more questions, how could you please give me a workin...
- Norbert Nemec (11/18) Apr 04 2006 No, because multidimensional arrays do not exist yet, but are still in
- Don Clugston (2/20) Apr 05 2006
- Hong (4/9) Apr 03 2006 You can do a[1..length]. length always refer to the array length when it...
- Luís Marques (4/14) Apr 03 2006 Oh yes, I was forgetting that variant. That's nice, thanks.
- Norbert Nemec (21/37) Apr 03 2006 The "length" variant will become problematic once we have
- novice2 (2/2) Apr 04 2006 please, can anybody explain me, wich "performance penalty" mentioned in ...
- Luís Marques (15/17) Apr 04 2006 When array bounds checking is disabled, all the CPU has to do is somethi...
- novice2 (11/15) Apr 04 2006 it is not depends on syntax or index sign. this check must be performed ...
- Luís Marques (13/15) Apr 04 2006 Actually I don't think that can be true. That would only be the case if ...
- novice2 (9/14) Apr 05 2006 but programmer defines algo - sequence of some operations.
- xs0 (32/50) Apr 05 2006 "old" syntax is actually
- Georg Wrede (5/31) Apr 04 2006 Well, we have two separate cases. First, if the array size is known at
- novice2 (3/7) Apr 05 2006 but "indexing from the end". if we express it in current syntax, then wi...
- Luís Marques (2/5) Apr 05 2006 See xs0's reponse.
Hello, In the array section of the D docs (http://www.digitalmars.com/d/arrays.html), there is no mention of the $ operator. Only the "a = a[2 .. a.length]" idiom is shown. I suppose that should be fixed. My experience of array slicing comes from Python, so I found it a bit surprising that a "array[2..]" notation didn't exist ("array[2:]" in Python). I later found the $ operator in others' code, which mostly fixes that. How about supporting, as in Python: a[-1] == a[$]; a[0..] == a[0..$]; I personally dislike the $ operator, it seems a casual fix for a lack of better notation and someting you'd expect to find in Perl. I still prefer it to the more verbose "a[2 .. a.length]" notation, though. The verbose notation has the disadvantage of taking more whitespace; while most people read "a[2..3]" just fine it's a bit confusing to read "a[1..a.length]". Also, if you rename or use a different array you have two instances to correct, instead of one. The Python syntax seems to me a better generalization of the concepts. It doesn't require an extra operator, the indexes and the slice notation have all the expressiveness required. With omissive indexes (a[..7]) and wrapping indexes (a[-2]) the $ could go away. Is it too late for that? Luís Marques
Apr 03 2006
Lu=EDs Marques wrote:My experience of array slicing comes from Python, so I found it a bit s=urprisingthat a "array[2..]" notation didn't exist ("array[2:]" in Python). I la=ter foundthe $ operator in others' code, which mostly fixes that.The omissive indices are not yet supported, but they could be added as an abbreviation without much trouble. It is just one character ("0" or "$" vs. "") difference. I don't have a strong opinion on that point.How about supporting, as in Python: =20 a[-1] =3D=3D a[$];That should be: a[-1] =3D=3D a[$-1]a[0..] =3D=3D a[0..$]; =20 I personally dislike the $ operator, it seems a casual fix for a lack o=f betternotation and someting you'd expect to find in Perl.I actually quite like the notation. At first, I didn't like the idea of "wasting" the precious ASCII character "$" for this profane purpose, but compared to all other proposed solutions, this is the cleanest. Also, to me "$" does not seem too counterintuitive. The character has the long tradition of signifying the end of a line in regular expressions, which somehow feels related to the issue at hand, even though I have never seriously worked with Perl.I still prefer it to the more verbose "a[2 .. a.length]" notation, thou=gh. That may be fine for "a", but it scales up very badly if you replace a by a more complex expression. The advantage of "$" will become even more obvious once there are is better support for multidimensional arrays (yes, I'm still working on a refined proposal for those) and the integer property "length" is replaces by an integer array "shape": consider adressing the interior (everything but the boundaries) of a 2D array: somearray[1..somearray.shape[0]-1,1..somearray.shape[1]-1] vs. somearray[1..$-1,1..$-1] Or consider indexing via periodic boundary conditions on a 3D grid: mygrid[x%mygrid.shape[0],y%mygrid.shape[1],z%mygrid.shape[2]] vs. mygrid[x%$,y%$,z%$]The verbose notation has the disadvantage of taking more whitespace; while most people read "a[2..3]=" justfine it's a bit confusing to read "a[1..a.length]". Also, if you rename=or use adifferent array you have two instances to correct, instead of one. =20 The Python syntax seems to me a better generalization of the concepts. =Itdoesn't require an extra operator, the indexes and the slice notation h=ave allthe expressiveness required. =20 With omissive indexes (a[..7]) and wrapping indexes (a[-2]) the $ could=go away. These would be a bad replacement: * omissive indexes are a nice abbreviation but have far less power than the "$" symbol * negative indices would have to be checked for at run time giving a performace penalty that is unacceptable if D tries to aim for a position in the league of high-performance computing languages.
Apr 03 2006
In article <e0rrgk$1gko$1 digitaldaemon.com>, Norbert Nemec says...That may be fine for "a", but it scales up very badly if you replace a by a more complex expression.Notice that I did say I still prefer $ to the verbose .length, not that I prefer using .length I just find it ugly.somearray[1..$-1,1..$-1]or somearray[1..-2,1..-2]* omissive indexes are a nice abbreviation but have far less power than the "$" symbolWhich cases are you thinking about?* negative indices would have to be checked for at run time giving a performace penalty that is unacceptable if D tries to aim for a position in the league of high-performance computing languages.Kind of like array out of bounds exceptions, which is already implemented? Thanks for you feedback, Luís
Apr 03 2006
In article <e0rupf$1kbg$1 digitaldaemon.com>, Luís Marques says...or somearray[1..-2,1..-2]somearray[1..-1,1..-1], of course. You are right about having missed the right indexes, but otherwise my opinion remain. I guess it's a matter of style, mostly. Luís
Apr 03 2006
Obviously we both agree that "$" is nicer than nothing (i.e. better than having to use ".length") I also agree with you that Python syntax looks nicer than D syntax. However a) Python syntax is not an option for D, because it would require runtime checks that result in an unnecessary performace penalty. This cannot be compared to array bounds checks, which can be turned off after debugging phase. The performance penalty would have to be payed by everyone using arrays, even if they are not even interested in negative indexing b) The $ is even more powerful than Python-syntax in at least one very interesting way: somearray[x%$,y%$,z%$] is the most beautiful solution I ever saw for the common problem of periodic boundaries! B.t.w.: Be careful! D syntax: somearray[1..$-1,1..$-1] is equivalent to Python syntax: somearray[1..-1,1..-1] you mixed this up in your examples bringing in a -2 somewhere. Finally you asked:I call "omissive indices" only "[.." instead of "[0.." and "..]" instead of "..$]". Obviously these cover only the trivial corner cases and the syntax cannot naturally be extended to anything beyond these. Greetings, Norbert* omissive indexes are a nice abbreviation but have far less power than the "$" symbolWhich cases are you thinking about?
Apr 03 2006
In article <e0s6a7$1sbs$1 digitaldaemon.com>, Norbert Nemec says...b) The $ is even more powerful than Python-syntax in at least one very interesting way: somearray[x%$,y%$,z%$] is the most beautiful solution I ever saw for the common problem of periodic boundaries!Yes, that one is very nice.you mixed this up in your examples bringing in a -2 somewhere.Thanks, I noticed it later.I call "omissive indices" only "[.." instead of "[0.." and "..]" instead of "..$]". Obviously these cover only the trivial corner cases and the syntax cannot naturally be extended to anything beyond these.How would you extend it with $? That's a good review. Things like this should be noted somewhere.. So, $ is better designed for multidimensional arrays $ is shorter than .length Still thinking, then, couldn't this work: somearray[x%length,y%length,z%length] length would evaluate differently for each dimension of the array. The only advantage would be $ being shorter (somearray[0..] would work for the simpler cases) Luís Marques
Apr 03 2006
On Mon, 3 Apr 2006 22:47:37 +0000 (UTC), Luís Marques wrote:$ is shorter than .lengthOne problem with 'length' is that is not a keyword in all contexts, and so an innocent coding mistake such as the one below can occur and is manifested at run time. And even then its not so easy to see if you're not focusing on 'length'. import std.stdio; void main() { int length; char[] foo; foo = "derek parnell".dup; length = 4; while( foo[length] != 'a' ) foo = foo[1..$]; writefln("%s", foo); } -- Derek (skype: derek.j.parnell) Melbourne, Australia "Down with mediocracy!" 4/04/2006 11:40:09 AM
Apr 03 2006
Lu=EDs Marques wrote:In article <e0s6a7$1sbs$1 digitaldaemon.com>, Norbert Nemec says... =20db) The $ is even more powerful than Python-syntax in at least one very interesting way: somearray[x%$,y%$,z%$] is the most beautiful solution I ever saw for the common problem of periodic boundaries!=20 =20 Yes, that one is very nice. =20 =20you mixed this up in your examples bringing in a -2 somewhere.=20 =20 Thanks, I noticed it later. =20 =20I call "omissive indices" only "[.." instead of "[0.." and "..]" instea=Sorry, I was not very clear: With "extending" I meant doing something like "..$-3]". What I wanted to say is: an omissive end-index "..]" would be a nice shorthand for "..$]", but omissive indices are not a real alternative to the discussed alternatives that allow arithmetics.of "..$]". Obviously these cover only the trivial corner cases and the syntax cannot naturally be extended to anything beyond these.=20 =20 How would you extend it with $?Still thinking, then, couldn't this work: =20 somearray[x%length,y%length,z%length] =20 length would evaluate differently for each dimension of the array. The only advantage would be $ being shorter (somearray[0..] would work =for thesimpler cases)Putting Dereks answer in different words: The parser is built upon the assumption that every "word" is either a "keyword" or an "identifier" at all times during the compilation process Making "length" a keyword would, in principle, be a solution, because one could in principle pack in any special meaning that is needed. However: "length" could then not be a property any more and could not be used as variable name or anything else. Leaving "length" an identifier, one would have to find a way to give it the special meaning it needs. In any case, this would be problematic, because it might always override a variable "length" that is defined in the surrounding namespace. The "$" solution is very much like the "keyword" solution from the compiler perspective, but it does not collide with any existing code.
Apr 03 2006
In article <e0s6a7$1sbs$1 digitaldaemon.com>, Norbert Nemec says...somearray[x%$,y%$,z%$]Before asking some more questions, how could you please give me a working example for this? Thanks, Luís Marques
Apr 04 2006
Lu=EDs Marques wrote:In article <e0s6a7$1sbs$1 digitaldaemon.com>, Norbert Nemec says... =20ngsomearray[x%$,y%$,z%$]=20 =20 Before asking some more questions, how could you please give me a worki=example for this?No, because multidimensional arrays do not exist yet, but are still in the planning phase. I've been spending some hours working on reworking the proposal lately, but I have nothing new finished yet. The 2-years-old version of the proposal can still be found at: http://homepages.uni-regensburg.de/~nen10015/documents/D-multidimarray.h= tml (It is quite outdated in the details, but should at least give an idea of where it may be going.)
Apr 04 2006
Norbert Nemec wrote:Luís Marques wrote:I'm looking forward to seeing it. Could really be a killer feature.In article <e0s6a7$1sbs$1 digitaldaemon.com>, Norbert Nemec says...No, because multidimensional arrays do not exist yet, but are still in the planning phase. I've been spending some hours working on reworking the proposal lately, but I have nothing new finished yet.somearray[x%$,y%$,z%$]Before asking some more questions, how could you please give me a working example for this?The 2-years-old version of the proposal can still be found at: http://homepages.uni-regensburg.de/~nen10015/documents/D-multidimarray.html (It is quite outdated in the details, but should at least give an idea of where it may be going.)
Apr 05 2006
In article <e0rmga$1bi6$1 digitaldaemon.com>, Luís Marques says...notation and someting you'd expect to find in Perl. I still prefer it to the more verbose "a[2 .. a.length]" notation, though. The verbose notation has the disadvantage of taking more whitespace; while most people read "a[2..3]" just fine it's a bit confusing to read "a[1..a.length]". Also, if you rename or use a different array you have two instances to correct, instead of one.You can do a[1..length]. length always refer to the array length when it is used inside the indexing operator. So there is only 1 place to correct when you change the array name. length is more verbose but definitely more readable.
Apr 03 2006
In article <e0s0m0$1mp9$1 digitaldaemon.com>, Hong says...In article <e0rmga$1bi6$1 digitaldaemon.com>, Luís Marques says...Oh yes, I was forgetting that variant. That's nice, thanks. Perhaps one more reason against having $, though I guess several will disagree. Luísnotation and someting you'd expect to find in Perl. I still prefer it to the more verbose "a[2 .. a.length]" notation, though. The verbose notation has the disadvantage of taking more whitespace; while most people read "a[2..3]" just fine it's a bit confusing to read "a[1..a.length]". Also, if you rename or use a different array you have two instances to correct, instead of one.You can do a[1..length]. length always refer to the array length when it is used inside the indexing operator. So there is only 1 place to correct when you change the array name. length is more verbose but definitely more readable.
Apr 03 2006
The "length" variant will become problematic once we have multidimensional arrays, where the integer number "length" is replaced by an integer array "shape". "$" can be given a very special meaning by the compiler without problems for the parser, allowing somearray[1..$-1,1..$-1,1..$-1] to mean the same thing as somearray[1..shape[0]-1,1..shape[1]-1,1..shape[2]-1] to get the same effect with using "length" instead of "$", the word "length" would probably have to become a keyword, which would disallow its use as property. (Unless you really want to mess up the parser...) Greetings, Norbert Hong wrote:In article <e0rmga$1bi6$1 digitaldaemon.com>, Lu=E7=96=84 Marques says.==2E.=20 =20o thenotation and someting you'd expect to find in Perl. I still prefer it t=has themore verbose "a[2 .. a.length]" notation, though. The verbose notation =" justdisadvantage of taking more whitespace; while most people read "a[2..3]=or use afine it's a bit confusing to read "a[1..a.length]". Also, if you rename=t is useddifferent array you have two instances to correct, instead of one.=20 =20 You can do a[1..length]. length always refer to the array length when i=inside the indexing operator. So there is only 1 place to correct when =youchange the array name. length is more verbose but definitely more reada=ble.=20 =20
Apr 03 2006
please, can anybody explain me, wich "performance penalty" mentioned in this thread? how it come?
Apr 04 2006
In article <e0tdh2$bsn$1 digitaldaemon.com>, novice2 says...please, can anybody explain me, wich "performance penalty" mentioned in this thread? how it come?When array bounds checking is disabled, all the CPU has to do is something like: value = *(&array_contents + (index * element size)); The index is just added as an offset (times the element size, which can be really fast if it's a multiple of 2). When you add bounds checking or negative indexes, other operations must be performed. For bounds checking a conditional jump must be issued (conditional jumps can be expensive for performance -- on inner loops -- plus they add footprint). For negative indexes the corresponding positive offset must be calculated before accessing the array index. Under some circunstances the compiler could optimize that, when it would be possible to do the calculation at compilation or to assume the index was a unsigned integer. I suppose there's no general arithmetic algorithm for all cases (without conditional jumps). I guess that should be it, but others should know more about such matters. Luís Marques
Apr 04 2006
Thanx, Luís Marques for reply, but...When you add bounds checking or negative indexes, other operations must be performed. For bounds checking a conditionalit is not depends on syntax or index sign. this check must be performed (if enabled), either positive or negative index. Even more: we all know, source code and compiler output is not the same. and source code with negative indexes or other index syntax will be compiled into the same cpu operations, if it means the same.inner loops -- plus they add footprint). For negative indexes the corresponding positive offset must be calculated before accessing the array index. Under somethis is not true. when we use other syntax for some operations the compiler output will be the same. even if we replace some several operations with one syntax construction compiler can generate more optimized code. i just want to say, IMHO, all changes in index/slicing syntax will change compiler only. not generated binary.
Apr 04 2006
In article <e0tvba$1aav$1 digitaldaemon.com>, novice2 says...i just want to say, IMHO, all changes in index/slicing syntax will change compiler only. not generated binary.Actually I don't think that can be true. That would only be the case if the indexes were defined at compilation time. Assuming positive indexes only, the following code... int index = get_int_from_user(); value = array[index]; could be translated into: value = *(&array_contents + (index * element size)); If we allow negative indexes then that doesn't work. "index" is computed at runtime so the code generated must change to allow that. At least that's the way I see it. About bounds checking, I didn't mean to imply it depends on the syntax. Luís Marques
Apr 04 2006
but programmer defines algo - sequence of some operations. we should compare equal code for currnet syntax and new syntax. equal code - with equal results. just imaginary example: task: get 3rd item from end in array; new synatx: b = a[-3]; old syntax: b = a[length-3]; only now we can compare, wich actions will be performed in runtime, and guess, will be it different or not.i just want to say, IMHO, all changes in index/slicing syntax will change compiler only. not generated binary.Actually I don't think that can be true. That would only be the case if the indexes were defined at compilation time. Assuming positive indexes only, the following code...
Apr 05 2006
novice2 wrote:"old" syntax is actually b = a[$-3]; But that is not a good example, because the index is a constant. A better example would be: int idx=someFunc(); b=a[idx]; It currently compiles to something like int idx=someFunc(); debug { if (idx<0 || idx>=a.length) throw ...; } b=a[idx]; If you use -release, the debug {} section is eliminated and there is no cost in speed. On the other hand, with the new version, the result would be something like int idx=someFunc(); if (idx<0) { idx+=a.length; } debug { if (idx<0 || idx>=a.length) throw ...; } b=a[idx]; The problem is that in the general case, the compiler cannot determine whether idx will be negative or not, and cannot optimize away the first if() statement, meaning every array access will be notably slower, even if you don't use negative indices at all. Is that inefficiency worth the change? I don't think it is... xs0but programmer defines algo - sequence of some operations. we should compare equal code for currnet syntax and new syntax. equal code - with equal results. just imaginary example: task: get 3rd item from end in array; new synatx: b = a[-3]; old syntax: b = a[length-3]; only now we can compare, wich actions will be performed in runtime, and guess, will be it different or not.i just want to say, IMHO, all changes in index/slicing syntax will change compiler only. not generated binary.Actually I don't think that can be true. That would only be the case if the indexes were defined at compilation time. Assuming positive indexes only, the following code...
Apr 05 2006
novice2 wrote:Thanx, Luís Marques for reply, but...Well, we have two separate cases. First, if the array size is known at compile time, then all of this can be take care of without runtime penalty. Second, if we are talking about dynamic arrays, then it does make a difference whether we allow indexing from the end of the array.When you add bounds checking or negative indexes, other operations must be performed. For bounds checking a conditionalit is not depends on syntax or index sign. this check must be performed (if enabled), either positive or negative index. Even more: we all know, source code and compiler output is not the same. and source code with negative indexes or other index syntax will be compiled into the same cpu operations, if it means the same.inner loops -- plus they add footprint). For negative indexes the corresponding positive offset must be calculated before accessing the array index. Under somethis is not true. when we use other syntax for some operations the compiler output will be the same. even if we replace some several operations with one syntax construction compiler can generate more optimized code. i just want to say, IMHO, all changes in index/slicing syntax will change compiler only. not generated binary.
Apr 04 2006
but "indexing from the end". if we express it in current syntax, then will no difference. we should compare equivalent code. at least codes with equivalent results of it execution.i just want to say, IMHO, all changes in index/slicing syntax will change compiler only. not generated binary.penalty. Second, if we are talking about dynamic arrays, then it does make a difference whether we allow indexing from the end of the array.
Apr 05 2006
In article <e106tf$1l5v$1 digitaldaemon.com>, novice2 says...but "indexing from the end". if we express it in current syntax, then will no difference. we should compare equivalent code. at least codes with equivalent results of it execution.See xs0's reponse.
Apr 05 2006