www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - OpSlice for Ndim arrays?

reply Norbert Nemec <Norbert_member pathlink.com> writes:
Hi there,

after two years of silence, I just found a little time to return my thoughts to
D and the issues of NDim-arrays I started back then. (No, I never completely
forgot about D, I just had other things that I had to give priority.)

My design proposal for NDim arrays needs a complete overhaul before discussion
of details makes sense. However, there is one cucial issue that needs to be
solved and I have no idea how it could be done:

Currently we have OpIndex and OpIndexAssign fit for overloading multidimensional
indexing. OpSlice and OpSliceAssign, however, work only for one dimension. I've
been thinking how to generalize these, but it turns out to be extremely tricky.
First though is: Why not simply extend OpSlice to take N pairs of boundaries?
But then, what about mixed expressions like 'a[1,4..7]' ?

Working with Python, I use expressions like that all the time, and I think it is
crucial that Ndim-arrays in D allow this kind of slicing as well. For native
Ndim arrays, it should be straightforward to handle such expressions in the
compiler. However, how should they be overloaded?!?

Python goes the way of taking a tuple of objects as indices where '4..7' would
then be translated into a "slice object". The indexing-overloading routine then
goes through all the objects at run-time and decides whether each one is a
slicing or an indexing operation. All of this is rather simple with dynamic
typing and accepting the run-time overhead.

In D, this is not an option. Indexing/Slicing has to be type-safe and the
resulting type has to be known at compile time.

One rather clumsy solution that I came up so far is to split the resolve the
expression into two consecutive function calls:
a[1,4..7]
would become
a.OpIndexPartial(0,1).OpSlice(4,7)
where the first function is defined as
OpIndexPartial(int dim,int idx)
and returns an array one rank lower.

The assignment expression:
a[1,4..7] = b[0..3]
would turn into
a.OpIndexPartial(0,1).OpSliceAssign(b[0..3],4,7)

The ugly thing about this solution is, that some kind of marshal-object is
needed which is returned from OpIndexAssign.


Another possibility would be to change
OpSlice(size_t start, size_t end)
into
OpIndex(size_t[2] slc)
and interpret a complex expression like
a[1,3..5,2,6..3]
as a call to
OpIndex(size_t idx0, size_t[2] slc1, size_t idx2, size_t[2] slc3)
I don't know, however, whether D templates would allow to implement the long
list of routines in some comfortable way.

All the really clean solution that I could think of would demand for tuple-types
and list-handling at compile time, both things that have been discussed before
but are at best a topic for the very far future.

Any ideas?

Norbert Nemec
Mar 10 2006
next sibling parent Norbert Nemec <Norbert_member pathlink.com> writes:
Sorry the duplicate posting... :-(
Mar 10 2006
prev sibling parent reply S. Chancellor <dnewsgr mephit.kicks-ass.org> writes:
It would be nice if the .. operator just returned a variable of a 
sequence type template.  That would make it more useful than the 
current limitations of being in a [] operator.  Lots of other language 
features could take advantage of sequence types.  Then:

" But then, what about mixed expressions like 'a[1,4..7]' ?" Wouldn't 
have alot of baring, it would just be a vararg function, one element 
would be an int, while another would be a sequence.

-S.


On 2006-03-10 07:34:31 -0800, Norbert Nemec <Norbert_member pathlink.com> said:

 Hi there,
 
 after two years of silence, I just found a little time to return my thoughts to
 D and the issues of NDim-arrays I started back then. (No, I never completely
 forgot about D, I just had other things that I had to give priority.)
 
 My design proposal for NDim arrays needs a complete overhaul before discussion
 of details makes sense. However, there is one cucial issue that needs to be
 solved and I have no idea how it could be done:
 
 Currently we have OpIndex and OpIndexAssign fit for overloading 
 multidimensional
 indexing. OpSlice and OpSliceAssign, however, work only for one dimension. I've
 been thinking how to generalize these, but it turns out to be extremely tricky.
 First though is: Why not simply extend OpSlice to take N pairs of boundaries?
 But then, what about mixed expressions like 'a[1,4..7]' ?
 
 Working with Python, I use expressions like that all the time, and I 
 think it is
 crucial that Ndim-arrays in D allow this kind of slicing as well. For native
 Ndim arrays, it should be straightforward to handle such expressions in the
 compiler. However, how should they be overloaded?!?
 
 Python goes the way of taking a tuple of objects as indices where '4..7' would
 then be translated into a "slice object". The indexing-overloading routine then
 goes through all the objects at run-time and decides whether each one is a
 slicing or an indexing operation. All of this is rather simple with dynamic
 typing and accepting the run-time overhead.
 
 In D, this is not an option. Indexing/Slicing has to be type-safe and the
 resulting type has to be known at compile time.
 
 One rather clumsy solution that I came up so far is to split the resolve the
 expression into two consecutive function calls:
 a[1,4..7]
 would become
 a.OpIndexPartial(0,1).OpSlice(4,7)
 where the first function is defined as
 OpIndexPartial(int dim,int idx)
 and returns an array one rank lower.
 
 The assignment expression:
 a[1,4..7] = b[0..3]
 would turn into
 a.OpIndexPartial(0,1).OpSliceAssign(b[0..3],4,7)
 
 The ugly thing about this solution is, that some kind of marshal-object is
 needed which is returned from OpIndexAssign.
 
 
 Another possibility would be to change
 OpSlice(size_t start, size_t end)
 into
 OpIndex(size_t[2] slc)
 and interpret a complex expression like
 a[1,3..5,2,6..3]
 as a call to
 OpIndex(size_t idx0, size_t[2] slc1, size_t idx2, size_t[2] slc3)
 I don't know, however, whether D templates would allow to implement the long
 list of routines in some comfortable way.
 
 All the really clean solution that I could think of would demand for 
 tuple-types
 and list-handling at compile time, both things that have been discussed before
 but are at best a topic for the very far future.
 
 Any ideas?
 
 Norbert Nemec
Mar 10 2006
parent reply Oskar Linde <olREM OVEnada.kth.se> writes:
S. Chancellor wrote:

 It would be nice if the .. operator just returned a variable of a
 sequence type template.  That would make it more useful than the
 current limitations of being in a [] operator.  Lots of other language
 features could take advantage of sequence types.  Then:
 
 " But then, what about mixed expressions like 'a[1,4..7]' ?" Wouldn't
 have alot of baring, it would just be a vararg function, one element
 would be an int, while another would be a sequence.
The problem with vararg functions is that the argument types are resolved at runtime. This is what Norbert calls "not an option". First, the only way to make this fast is to hope that the indexing function gets inlined, its loops unrolled and the type information constantly folded in. Multidimensional arrays are often used where speed matters, so indexing should be as fast as possible. But this is not the main problem. The resulting type of an indexing expression depends on whether dimensions are indexed or sliced. A three dimensional array a, indexed like a[0..1,0,0..5] should result in a two dimensional array (sliced dimensions remain, indexed dimensions are removed). The return type has to be computed at compile time. I was going to post this at this moment, but reading Norberts post again gave me an idea I had to test...:
 On 2006-03-10 07:34:31 -0800, Norbert Nemec <Norbert_member pathlink.com>
 said: 
 All the really clean solution that I could think of would demand for
 tuple-types and list-handling at compile time, both things that have been
 discussed before but are at best a topic for the very far future.
It turns out this is not at all far away. Compile time varadic function arguments are available with the IFTI support in DMD 0.149. That means today! (Allright, not unlimited number of varadic function arguments, but a reasonable number of arguments at least.) Here is a demo: --- import std.stdio; struct Empty {} template TupleType(A = Empty, B = Empty, C = Empty, D = Empty, E = Empty, F = Empty, G = Empty /*,...*/) { static if (is(A == Empty)) alias Empty TupleType; else alias List!(A,.TupleType!(B,C,D,E,F,G/*,...*/)) TupleType; } template Tuple(A = Empty, B = Empty, C = Empty, D = Empty, E = Empty, F = Empty, G = Empty /*,...*/) { TupleType!(A,B,C,D,E,F,G /*,...*/) Tuple(A a = A.init, B b = B.init, C c = C.init, D d = D.init, E e = E.in\ it, F f = F.init, G g = G.init /*,...*/) { static if (is(A == Empty)) return Empty.init; else return List!(A,TupleType!(B,C,D,E,F,G/*,...*/) (a,.Tuple(b,c,d,e,f,g /*,...*/)); } } struct List(A,B) { A head; B tail; static List opCall(A a, B b) { List!(A,B) ret; ret.head = a; ret.tail = b; return ret; } } template func(A = Empty, B = Empty, C = Empty, D = Empty, E = Empty, F = Empty, G = Empty /*, ...*/) { void func(A a = A.init, B b = B.init, C c = C.init, D d = D.init, E e = E.init, F f = F.init, G g = G.ini\ t /*,...*/) { realFunc(Tuple(a,b,c,d,e,f,g /*,...*/)); } } template realFunc(T) { void realFunc(T args) { static if (is(T == Empty)) { writefln("Done!"); } else { writefln("Got argument: (%s) %s",typeid(typeof(args.head)),args.head); .realFunc(args.tail); } } } void main() { int a = 1; float b = 5.6; long c = 3; char[] d = "test"; func(a,b,c,d); } --- And guess what? This prints: Got argument: (int) 1 Got argument: (float) 5.6 Got argument: (long) 3 Got argument: (char[]) test Done! All types are deduced at compile time! Notice the neat implementation of realFunc(). All other templates could be put in a library. I bet the func()-wrapper could declared by a mixin! So all we need to support multidimensional splice/indexing is to make .. evaluate into: struct Sequence { size_t start,end; } and we are set. /Oskar
Mar 11 2006
next sibling parent reply Ivan Senji <ivan.senji_REMOVE_ _THIS__gmail.com> writes:
Oskar Linde wrote:

...

 
 void main() {
   int a = 1;
   float b = 5.6;
   long c = 3;
   char[] d = "test";
   func(a,b,c,d);
 }
 
 ---
 
 And guess what? This prints:
 
 Got argument: (int) 1
 Got argument: (float) 5.6
 Got argument: (long) 3
 Got argument: (char[]) test
 Done!
 
 All types are deduced at compile time!
 
 Notice the neat implementation of realFunc(). All other templates could be
 put in a library. I bet the func()-wrapper could declared by a mixin!
 
 So all we need to support multidimensional splice/indexing is to make ..
 evaluate into:
 
 struct Sequence {
         size_t start,end;
 }
 
 and we are set.
 
I kind of like the idea, but not sure if it is maybe to late for something like that, and not sure if it will cause any problems.
Mar 11 2006
parent Oskar Linde <olREM OVEnada.kth.se> writes:
Ivan Senji wrote:

 Oskar Linde wrote:
 
 ...
 
 
 void main() {
   int a = 1;
   float b = 5.6;
   long c = 3;
   char[] d = "test";
   func(a,b,c,d);
 }
 
 ---
 
 And guess what? This prints:
 
 Got argument: (int) 1
 Got argument: (float) 5.6
 Got argument: (long) 3
 Got argument: (char[]) test
 Done!
 
 All types are deduced at compile time!
 
 Notice the neat implementation of realFunc(). All other templates could
 be put in a library. I bet the func()-wrapper could declared by a mixin!
 
 So all we need to support multidimensional splice/indexing is to make ..
 evaluate into:
 
 struct Sequence {
         size_t start,end;
 }
 
 and we are set.
 
I kind of like the idea, but not sure if it is maybe to late for something like that, and not sure if it will cause any problems.
There is fortunately an obvious fallback: If the class has no opIndex(Sequence) overload, a depreciable opSplice(s.start,s.end) gets called. /Oskar
Mar 11 2006
prev sibling parent Norbert Nemec <Norbert Nemec-online.de> writes:
Great! This might really be the way to go. Thanks a lot for the idea.



Oskar Linde wrote:
 S. Chancellor wrote:
 
 
It would be nice if the .. operator just returned a variable of a
sequence type template.  That would make it more useful than the
current limitations of being in a [] operator.  Lots of other language
features could take advantage of sequence types.  Then:

" But then, what about mixed expressions like 'a[1,4..7]' ?" Wouldn't
have alot of baring, it would just be a vararg function, one element
would be an int, while another would be a sequence.
The problem with vararg functions is that the argument types are resolved at runtime. This is what Norbert calls "not an option". First, the only way to make this fast is to hope that the indexing function gets inlined, its loops unrolled and the type information constantly folded in. Multidimensional arrays are often used where speed matters, so indexing should be as fast as possible. But this is not the main problem. The resulting type of an indexing expression depends on whether dimensions are indexed or sliced. A three dimensional array a, indexed like a[0..1,0,0..5] should result in a two dimensional array (sliced dimensions remain, indexed dimensions are removed). The return type has to be computed at compile time. I was going to post this at this moment, but reading Norberts post again gave me an idea I had to test...:
On 2006-03-10 07:34:31 -0800, Norbert Nemec <Norbert_member pathlink.com>
said: 

All the really clean solution that I could think of would demand for
tuple-types and list-handling at compile time, both things that have been
discussed before but are at best a topic for the very far future.
It turns out this is not at all far away. Compile time varadic function arguments are available with the IFTI support in DMD 0.149. That means today! (Allright, not unlimited number of varadic function arguments, but a reasonable number of arguments at least.) Here is a demo: --- import std.stdio; struct Empty {} template TupleType(A = Empty, B = Empty, C = Empty, D = Empty, E = Empty, F = Empty, G = Empty /*,...*/) { static if (is(A == Empty)) alias Empty TupleType; else alias List!(A,.TupleType!(B,C,D,E,F,G/*,...*/)) TupleType; } template Tuple(A = Empty, B = Empty, C = Empty, D = Empty, E = Empty, F = Empty, G = Empty /*,...*/) { TupleType!(A,B,C,D,E,F,G /*,...*/) Tuple(A a = A.init, B b = B.init, C c = C.init, D d = D.init, E e = E.in\ it, F f = F.init, G g = G.init /*,...*/) { static if (is(A == Empty)) return Empty.init; else return List!(A,TupleType!(B,C,D,E,F,G/*,...*/) (a,.Tuple(b,c,d,e,f,g /*,...*/)); } } struct List(A,B) { A head; B tail; static List opCall(A a, B b) { List!(A,B) ret; ret.head = a; ret.tail = b; return ret; } } template func(A = Empty, B = Empty, C = Empty, D = Empty, E = Empty, F = Empty, G = Empty /*, ...*/) { void func(A a = A.init, B b = B.init, C c = C.init, D d = D.init, E e = E.init, F f = F.init, G g = G.ini\ t /*,...*/) { realFunc(Tuple(a,b,c,d,e,f,g /*,...*/)); } } template realFunc(T) { void realFunc(T args) { static if (is(T == Empty)) { writefln("Done!"); } else { writefln("Got argument: (%s) %s",typeid(typeof(args.head)),args.head); .realFunc(args.tail); } } } void main() { int a = 1; float b = 5.6; long c = 3; char[] d = "test"; func(a,b,c,d); } --- And guess what? This prints: Got argument: (int) 1 Got argument: (float) 5.6 Got argument: (long) 3 Got argument: (char[]) test Done! All types are deduced at compile time! Notice the neat implementation of realFunc(). All other templates could be put in a library. I bet the func()-wrapper could declared by a mixin! So all we need to support multidimensional splice/indexing is to make .. evaluate into: struct Sequence { size_t start,end; } and we are set. /Oskar
Mar 12 2006