digitalmars.D - Iterating over multiple collections in parallel
- Koroskin Denis (25/25) Jul 03 2008 If found myself solving the same problem again and again:
- Jarrett Billingsley (12/37) Jul 03 2008 It's something Walter mentioned before, but without any kind of public
- BCS (5/35) Jul 03 2008 Can a singe thread have more than one stack? If someone can figure out h...
- Jarrett Billingsley (5/9) Jul 03 2008 Of course, they're called stackthreads or Fibers. Downs has implemented...
- downs (23/36) Jul 04 2008 They're pretty stable now, but they depend on a Phobos patch to run acce...
- bearophile (4/11) Jul 04 2008 Nice :-)
- Fawzi Mohamed (43/43) Jul 05 2008 Iterating over many collection in parallel is very nice to have, what
- bearophile (7/17) Jul 05 2008 I may have lost you a bit here, anyway in real Python that code is:
- Steven Schveighoffer (47/72) Jul 03 2008 The issue here is how to formulate each iteration. foreach works becaus...
- Walter Bright (8/11) Jul 05 2008 It's a well-known problem, and not so easy to solve in an intuitive,
- Manfred_Nowak (16/20) Jul 05 2008 What's wrong with
- Yigal Chripun (7/39) Jul 05 2008 what about:
- Manfred_Nowak (8/9) Jul 05 2008 - Maybe some knowledge of some types of disagreeing and their relation
- Yigal Chripun (14/27) Jul 06 2008 What's so strong about the word "should" ?!
-
Koroskin Denis
(27/47)
Jul 05 2008
On Sun, 06 Jul 2008 01:45:08 +0400, Manfred_Nowak
... - Manfred_Nowak (11/13) Jul 05 2008 Still
If found myself solving the same problem again and again: how to implement simultaneous iteration over two (or more collections)? suppose, we have the arrays: int[] a = [1, 2, 3]; int[] b = [1, 4, 9]; and would like to multiply them per-component, like this: int[] c = new int[a.length]; for (int i = 0; i < a.length; ++j) { c[i] = a[i] * b[i]; } Since foreach is the prefered (and sometimes the only) way to iterate over collection, there should be a way to iterate over multiple collections simultaneously, like (just an example) this: int[] c = new int[a.length]; foreach (i : a; j : b; ref k : c) { k = a + b; } But this isn't supported (yet?) More complicated example would be an iterating over two (or more) collection with *no* random access iterators available, opApply only. I spend a lot of time on this and have found no solution how to do it with existing D feature set, but this is surely doable with a few inter-function gotos and an exception if collections sizes don't match. It's just a small layer written in asm, nothing more. How about an enhancement proposal? :)
Jul 03 2008
"Koroskin Denis" <2korden gmail.com> wrote in message news:op.udp4pcjxenyajd proton.creatstudio.intranet...If found myself solving the same problem again and again: how to implement simultaneous iteration over two (or more collections)? suppose, we have the arrays: int[] a = [1, 2, 3]; int[] b = [1, 4, 9]; and would like to multiply them per-component, like this: int[] c = new int[a.length]; for (int i = 0; i < a.length; ++j) { c[i] = a[i] * b[i]; } Since foreach is the prefered (and sometimes the only) way to iterate over collection, there should be a way to iterate over multiple collections simultaneously, like (just an example) this: int[] c = new int[a.length]; foreach (i : a; j : b; ref k : c) { k = a + b; } But this isn't supported (yet?) More complicated example would be an iterating over two (or more) collection with *no* random access iterators available, opApply only. I spend a lot of time on this and have found no solution how to do it with existing D feature set, but this is surely doable with a few inter-function gotos and an exception if collections sizes don't match. It's just a small layer written in asm, nothing more. How about an enhancement proposal? :)It's something Walter mentioned before, but without any kind of public "plan" of these ideas he has, there's no way to know if it's even on his plate. I think the syntax he used was something like this: foreach(a, b; c)(d, e; f) { blah } In any case, it's difficult in the general case to do parallel iteration with D's inside-out iterators. To do it generally, you'd have to use coroutines. But you can currently write a templated zip to iterate over arrays simultaneously, only because they're so easy to iterate over. I'm sure downs will post something horrid-looking that you can look over.
Jul 03 2008
Reply to Koroskin,If found myself solving the same problem again and again: how to implement simultaneous iteration over two (or more collections)? suppose, we have the arrays: int[] a = [1, 2, 3]; int[] b = [1, 4, 9]; and would like to multiply them per-component, like this: int[] c = new int[a.length]; for (int i = 0; i < a.length; ++j) { c[i] = a[i] * b[i]; } Since foreach is the prefered (and sometimes the only) way to iterate over collection, there should be a way to iterate over multiple collections simultaneously, like (just an example) this: int[] c = new int[a.length]; foreach (i : a; j : b; ref k : c) { k = a + b; } But this isn't supported (yet?) More complicated example would be an iterating over two (or more) collection with *no* random access iterators available, opApply only. I spend a lot of time on this and have found no solution how to do it with existing D feature set, but this is surely doable with a few inter-function gotos and an exception if collections sizes don't match. It's just a small layer written in asm, nothing more. How about an enhancement proposal? :)Can a singe thread have more than one stack? If someone can figure out how to do that, then this wouldn't be "to hard" to do in a lib. call dg1 on first stack, body jumps to stack 2 and calls dg 2, ..., last dg is called on the threads normal stack and calls the real body.
Jul 03 2008
"BCS" <ao pathlink.com> wrote in message news:55391cb32ecb08caab0811331728 news.digitalmars.com...Can a singe thread have more than one stack? If someone can figure out how to do that, then this wouldn't be "to hard" to do in a lib. call dg1 on first stack, body jumps to stack 2 and calls dg 2, ..., last dg is called on the threads normal stack and calls the real body.Of course, they're called stackthreads or Fibers. Downs has implemented them in his tools package (in an unstable state) for Phobos, and they've been in Tango for more than two years.
Jul 03 2008
Jarrett Billingsley wrote:"BCS" <ao pathlink.com> wrote in message news:55391cb32ecb08caab0811331728 news.digitalmars.com...They're pretty stable now, but they depend on a Phobos patch to run acceptably fast (500 cycles per context switch on my Pentium M). And I don't know if they run on DMD; there was some problem with the GDC assembler statements. If somebody finds a problem with it, tell me in a reply and I'll try to fix it. Anyway, here's parallel iteration over two foreachables: gentoo-pc ~/d $ cat test33.d && echo "----" && rebuild test33.d && ./test33 module test33; import std.stdio, tools.stackthreads; void main() { auto it1 = [1, 2, 3]; auto it2 = [1, 4, 9]; auto iter2 = generator((void delegate(int) yield) { foreach (entry; it2) yield(entry); }); foreach (entry1; it1) { auto entry2 = iter2(); writefln(entry1, " - ", entry2); } } ---- 1 - 1 2 - 4 3 - 9 gentoo-pc ~/d $Can a singe thread have more than one stack? If someone can figure out how to do that, then this wouldn't be "to hard" to do in a lib. call dg1 on first stack, body jumps to stack 2 and calls dg 2, ..., last dg is called on the threads normal stack and calls the real body.Of course, they're called stackthreads or Fibers. Downs has implemented them in his tools package (in an unstable state) for Phobos, and they've been in Tango for more than two years.
Jul 04 2008
downs:auto iter2 = generator((void delegate(int) yield) { foreach (entry; it2) yield(entry); }); foreach (entry1; it1) { auto entry2 = iter2(); writefln(entry1, " - ", entry2);Nice :-) Bye, bearophile
Jul 04 2008
Iterating over many collection in parallel is very nice to have, what would be nice to have with a cleaner syntax for it. Aldor had it, and since then I always missed it. It worked like this: Iterator/loops can be declared like this: for a in [1,2,3] for b in [1,0,7,3] for i in 1.. while (b<7) repeat { ... } which mean follow *all* iterators in parallel, and execute ..., when any iterator stop, stop everything very clear, nice and flexible. Other syntaxes look like a kludge with respect to it. Python for example has generators/iterators, and builds derived iterators one on the top of the other for a in [1,2,3] for i in 0.. repeat { ... } becomes for i,a in enumerate([1,2,3].iterator()): ... C++ iterators can be used in parallel, because all the difficulty of keeping the current position is moved to the iterator writer, but the syntax is just plain ugly and writing efficient iterators for complex structures can be painful. Anyway leaving away that syntax, in D without changes one can already have a reasonable syntax (even if not as nice). Using fibers, as in the nice example of downs it should also be possible to convert automatically an opApply in a generator, so technically it should be feasible, even if 500 cycles per conetext switch, might be too much for some applications (tight loops). So in D one could have a something workable foreach(a,b,i;collectLoop([1,2,3],[2,3,7],Range(1,infinity))){ ... } where collectLoop is a variadic template function that converts opApply to a generator and returns an object having a opApply that loops in parallel. Implementation of it is left as exercice to the reader ;-) Fawzi
Jul 05 2008
Fawzi Mohamed:for i,a in enumerate([1,2,3].iterator()):I may have lost you a bit here, anyway in real Python that code is: for idx, el in enumerate([1, 2, 3]): print idx, elUsing fibers, as in the nice example of downs it should also be possible to convert automatically an opApply in a generator, so technically it should be feasible, even if 500 cycles per conetext switch, might be too much for some applications (tight loops). So in D one could have a something workable foreach(a,b,i;collectLoop([1,2,3],[2,3,7],Range(1,infinity))){ ... } Implementation of it is left as exercice to the reader ;-)In my libs there are zip() and azip() for something like that, but they don't use fibers yet... Once the fibers are added, the code can be made smart at compile-time, so it can use the current fast code when possible, and fibers when it must. Bye, bearophile
Jul 05 2008
"Koroskin Denis" wroteIf found myself solving the same problem again and again: how to implement simultaneous iteration over two (or more collections)? suppose, we have the arrays: int[] a = [1, 2, 3]; int[] b = [1, 4, 9]; and would like to multiply them per-component, like this: int[] c = new int[a.length]; for (int i = 0; i < a.length; ++j) { c[i] = a[i] * b[i]; } Since foreach is the prefered (and sometimes the only) way to iterate over collection, there should be a way to iterate over multiple collections simultaneously, like (just an example) this: int[] c = new int[a.length]; foreach (i : a; j : b; ref k : c) { k = a + b; } But this isn't supported (yet?) More complicated example would be an iterating over two (or more) collection with *no* random access iterators available, opApply only. I spend a lot of time on this and have found no solution how to do it with existing D feature set, but this is surely doable with a few inter-function gotos and an exception if collections sizes don't match. It's just a small layer written in asm, nothing more. How about an enhancement proposal? :)The issue here is how to formulate each iteration. foreach works because it is well defined "iterate over each value in the container". But iterating over multiple containers, you could want different orders of iteration, iterating for each unique set of indexes, etc. Or you could want what you had originally stated, in which cases the lengths must be correct. We already have foreach_reverse, which is like foreach, but iterates in the opposite direction. I don't want foreach_X for everyones preferred iteration method of choice. I think the best solution here is a fruct (foreach struct) in a library: struct myfruct { static myfruct opCall(int[] a, int[] b, int[] c) { myfruct result; result.a = a; result.b = b; result.c = c; assert(a.length == b.length && a.length == c.length); } int[] a; int[] b; int[] c; int opApply(int delegate(ref int i, ref int j, ref int k) dg) { int result = 0; for(int i = 0; i < a.length; i++) { if((result = dg(a[i], b[i], c[i])) != 0) break; } return result; } } usage: foreach(i, j, ref k; myfruct(a, b, c)) { k = i * j; } Now, the issue here is, how can this be made generic? The answer is, we need iterators. Otherwise you cannot iterate over multiple containers that are not index-based (such as AA's or maps). With iterators, that have a well defined interface (like opApply), that allows one to increment individual indexes, and not have to know what indexes are made of how how they work. They just define opInc and opStar. Oh, and we would need reference return types too (so opStar can return an actual reference). -Steve
Jul 03 2008
Koroskin Denis wrote:I spend a lot of time on this and have found no solution how to do it with existing D feature set, but this is surely doable with a few inter-function gotos and an exception if collections sizes don't match.It's a well-known problem, and not so easy to solve in an intuitive, non-klunky way. Some of the issues are what do you do when the collections are of different sizes? Error? Iterate to the minimum? Iterate to the maximum and use default values for the shorter collections? The best I can offer at the moment is: foreach (i; 0 .. a.length) a[i] = b[i] + c[i];
Jul 05 2008
Koroskin Denis wrote:int[] c = new int[a.length]; foreach (i : a; j : b; ref k : c) { k = a + b; }What's wrong with auto c= a .* b; except, that it is neither suggested nor supported? Or auto c= a (*) b; which was suggested some long time ago, but has not got through? Is it horrible then to define: void opDotMul(T)(out T c, T x, T y){ assert( x.length == y.length); c.length= x.length; foreach( i,v; x){ c[i]= x[i] * y[i]; } } -manfred
Jul 05 2008
Manfred_Nowak wrote:Koroskin Denis wrote:what about: int[100] a, b, c; for (int i = 0; i < 100; ++i) { a[i] = myCrazyFunkyFunction(b[i], c[i]); } D does not and should not have an opMyCrazyFunkyFunction.int[] c = new int[a.length]; foreach (i : a; j : b; ref k : c) { k = a + b; }What's wrong with auto c= a .* b; except, that it is neither suggested nor supported? Or auto c= a (*) b; which was suggested some long time ago, but has not got through? Is it horrible then to define: void opDotMul(T)(out T c, T x, T y){ assert( x.length == y.length); c.length= x.length; foreach( i,v; x){ c[i]= x[i] * y[i]; } } -manfred
Jul 05 2008
Yigal Chripun wrote:D does not and should not have an opMyCrazyFunkyFunction.- Maybe some knowledge of some types of disagreeing and their relation can turn out to be useful: http://blog.createdebate.com/2008/04/07/writing-strong-arguments/ - Maybe a DigitalMars_funky_extension.opMyCrazyFunkyFunction is allowed to exist? At least it's in the language specification: http://www.digitalmars.com/d/1.0/pragma.html -manfred
Jul 05 2008
Manfred_Nowak wrote:Yigal Chripun wrote:What's so strong about the word "should" ?! If I wanted to make a "strong argument" as you say, I would've used the word "must". Also, I didn't talk about pragmas and compiler specific extensions but rather meant the core language. maybe you want to suggest a way to define infix functions? Last thing: I find it ridiculous you try to teach me the proper way to post because you didn't like my phrasing while other people use insults and other bad language on the NG and you're fine with that. That's just smells of hypocrisy. If you really want to educate someone here, start with those that curse, swear and troll. disappointed, YigalD does not and should not have an opMyCrazyFunkyFunction.- Maybe some knowledge of some types of disagreeing and their relation can turn out to be useful: http://blog.createdebate.com/2008/04/07/writing-strong-arguments/ - Maybe a DigitalMars_funky_extension.opMyCrazyFunkyFunction is allowed to exist? At least it's in the language specification: http://www.digitalmars.com/d/1.0/pragma.html -manfred
Jul 06 2008
On Sun, 06 Jul 2008 01:45:08 +0400, Manfred_Nowak <svv1999 hotmail.com> = = wrote:Koroskin Denis wrote:int[] c =3D new int[a.length]; foreach (i : a; j : b; ref k : c) { k =3D a + b; }What's wrong with auto c=3D a .* b; except, that it is neither suggested nor supported? Or auto c=3D a (*) b; which was suggested some long time ago, but has not got through? Is it horrible then to define: void opDotMul(T)(out T c, T x, T y){ assert( x.length =3D=3D y.length); c.length=3D x.length; foreach( i,v; x){ c[i]=3D x[i] * y[i]; =} } -manfredNothing wrong. It was just brought as an example. Replace int[] with a List!(int) that exposes opApply() but not iterators= = and try again :) The only way (apart from using stackthreads) you can do it now is = something like this: List!(int) a, b, c; int[] acopy, bcopy; foreach (i; a) { acopy ~=3D i; } foreach (j; b) { bcopy ~=3D j; } int i =3D 0; foreach (ref k; c) { k =3D acopy[i] + bcopy[i]; } compare with the following: List!(int) a, b, c; foreach (i; a) (j; b)(ref k; c) { c =3D a + b; } which one would you prefer?
Jul 05 2008
Koroskin Denis wrote:List!(int) a, b, c;[...]which one would you prefer?Still c= a .+ b; To do componentwise operations imposes some requirements for the involved types. These are at least: - an order on the elements, and - definition(s) for the operation(s) on the elements If an opApply cannot be interpreted as representing an ordering, than your List!-example will fail badly. -manfred
Jul 05 2008