digitalmars.D - C++ / Why Iterators Got It All Wrong
- =?iso-8859-1?Q?Robert_M._M=FCnch?= (6/6) Aug 29 2017 Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/...
- bitwise (3/7) Aug 29 2017 "superseded" is the wrong word, as ranges cannot do everything
- Steven Schveighoffer (17/23) Aug 29 2017 Interesting. It reminds me a bit of cursors for dcollections. In there,
- bitwise (8/17) Aug 29 2017 I was about to whine about exactly this, but thought it would be
- jmh530 (8/13) Aug 29 2017 I was thinking this exact same thing when I got to the Element
- Steven Schveighoffer (9/26) Aug 29 2017 A border points between elements. It's like the end element in a
- =?iso-8859-1?Q?Robert_M._M=FCnch?= (19/30) Aug 30 2017 I'm a big fan of Rebol (yes, it's far from mainstream and a dead-end
- drug (2/29) Aug 31 2017 Interesting. How is it comparable with iterators and ranges ideas?
- =?iso-8859-1?Q?Robert_M._M=FCnch?= (11/12) Sep 02 2017 Well, see: http://www.rebol.com/docs/core23/rebolcore-6.html#section-6
- Moritz Maxeiner (6/11) Sep 02 2017 Thanks for your post about Rebol, I didn't know it before.
- =?iso-8859-1?Q?Robert_M._M=FCnch?= (24/28) Sep 03 2017 As said, the official Rebol-2 version is a dead-end. Even our main
- Moritz Maxeiner (13/37) Sep 03 2017 I'll put in on my ever growing list of things to check out in
- =?iso-8859-1?Q?Robert_M._M=FCnch?= (24/48) Sep 04 2017 Yes, some ideas are the same. But the syntax is way more beautyful and u...
- Mark (12/20) Sep 04 2017 They are all pretty low-level abstractions. C++ has posited that
- Mark (24/28) Sep 01 2017 Iterators have many problems. Andrei's talk some years ago,
- =?iso-8859-1?Q?Robert_M._M=FCnch?= (12/28) Sep 02 2017 Iterators are not the silver bullet. But IIRC you can specify if you
- Mark (41/46) Sep 05 2017 There isn't really some finite (or small, anyway) choice here. I
- Ilya Yaroshenko (22/26) Sep 02 2017 Iterators has no proper alternative when one need to implement
- Moritz Maxeiner (5/14) Sep 02 2017 Out of curiosity: Could you elaborate on what the issues are with
- Ilya Yaroshenko (37/52) Sep 03 2017 There are three general kinds of dynamic dense tensors. Mir
- Moritz Maxeiner (16/63) Sep 03 2017 Won't you have to implement opIndex* operators either way in
- Ilya Yaroshenko (40/111) Sep 03 2017 front,
- Moritz Maxeiner (26/127) Sep 03 2017 Didn't know about the *N *Exactly ones, but yeah, it's a lot.
- Ilya (4/19) Sep 03 2017 Maybe I should call it cursors or generic pointers instead of
- jmh530 (11/13) Sep 04 2017 dcollections uses a cursor approach that seems different from
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (7/9) Sep 05 2017 Well, «C++ iterators» are table pointer abstractions, so you need
- jmh530 (5/11) Sep 06 2017 mir-ndslices are like a generalization of D's slices. Instead of
- Enamex (3/12) Sep 06 2017 Can you elaborate?
- jmh530 (48/62) Sep 06 2017 IMO, it's something that still needs to get explained better in
- Ilya Yaroshenko (11/20) Sep 07 2017 For example, lets takes `transposed` function. It does not
- jmh530 (13/23) Sep 07 2017 I think what's missing from the documentation is a clear
- jmh530 (9/20) Sep 07 2017 Hmm, maybe I can express this better as something like
- jmh530 (2/3) Sep 07 2017 Err, (3, 4) not (2, 3)
- jmh530 (12/16) Sep 07 2017 All kinds of screwed up. This is what I get for not testing
- Ilya Yaroshenko (13/30) Sep 10 2017 Another small difference is slicing:
- jmh530 (9/19) Sep 11 2017 Was not aware of this.
Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1 I haven't read everything, so not sure if it worth to take a look. -- Robert M. Münch http://www.saphirion.com smarter | better | faster
Aug 29 2017
On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1 I haven't read everything, so not sure if it worth to take a look."superseded" is the wrong word, as ranges cannot do everything iterators can.
Aug 29 2017
On 8/29/17 8:50 AM, Robert M. Münch wrote:Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1 I haven't read everything, so not sure if it worth to take a look.Interesting. It reminds me a bit of cursors for dcollections. In there, a cursor is a 0 or 1 element range. The one element range points at an element, the 0 element range points at a border. You can use cursors to compose ranges. https://github.com/schveiguy/dcollections Not sure how useful it is to have them be separate types. It can be nice I suppose. But one thing I love about iterators/cursors is that something like find doesn't assume what data you are interested in. In Phobos, find gives you a range where the first element is the one you searched for, and the last element is the end of the original range. But what if you wanted all the data *up to* the element instead? What if you just wanted to look at that specific element? So we need several functions that do this, and it's not always clear how to do it correctly, and it's difficult to compose one function from the other. With iterators, it's simple. -Steve
Aug 29 2017
On Tuesday, 29 August 2017 at 13:23:50 UTC, Steven Schveighoffer wrote:In Phobos, find gives you a range where the first element is the one you searched for, and the last element is the end of the original range. But what if you wanted all the data *up to* the element instead? What if you just wanted to look at that specific element? So we need several functions that do this, and it's not always clear how to do it correctly, and it's difficult to compose one function from the other. With iterators, it's simple. -SteveI was about to whine about exactly this, but thought it would be off topic ;) I see "trim_left" in the slides which I'm guessing refers to what D calls "find". IMO, "trim_left" is a much better name. "find" should not return elements you didn't ask for, it should return a range of length 1 or 0.
Aug 29 2017
On Tuesday, 29 August 2017 at 13:23:50 UTC, Steven Schveighoffer wrote:Interesting. It reminds me a bit of cursors for dcollections. In there, a cursor is a 0 or 1 element range. The one element range points at an element, the 0 element range points at a border. You can use cursors to compose ranges. https://github.com/schveiguy/dcollectionsI was thinking this exact same thing when I got to the Element part of it. The mach library also makes use of cursors. https://github.com/pineapplemachine/mach.d I'm not entirely sure how if I grok how the Border thing they are talking about works.
Aug 29 2017
On 8/29/17 9:34 AM, jmh530 wrote:On Tuesday, 29 August 2017 at 13:23:50 UTC, Steven Schveighoffer wrote:A border points between elements. It's like the end element in a standard STL container, it's not really pointing at an element, but one past the last element. It can be handy when specifying ranges. I.e. exclusive or inclusive ranges. My gut feeling is that the splitting of types between elements and borders is too much machinery, but I haven't seen it in practice to make a fair judgment. -SteveInteresting. It reminds me a bit of cursors for dcollections. In there, a cursor is a 0 or 1 element range. The one element range points at an element, the 0 element range points at a border. You can use cursors to compose ranges. https://github.com/schveiguy/dcollectionsI was thinking this exact same thing when I got to the Element part of it. The mach library also makes use of cursors. https://github.com/pineapplemachine/mach.d I'm not entirely sure how if I grok how the Border thing they are talking about works.
Aug 29 2017
On 2017-08-29 13:23:50 +0000, Steven Schveighoffer said:... In Phobos, find gives you a range where the first element is the one you searched for, and the last element is the end of the original range. But what if you wanted all the data *up to* the element instead? What if you just wanted to look at that specific element? So we need several functions that do this, and it's not always clear how to do it correctly, and it's difficult to compose one function from the other.I'm a big fan of Rebol (yes, it's far from mainstream and a dead-end these days but it has a lot of very nice ideas) and it uses the idea of a series datatype. Fruther information here: http://www.rebol.com/docs/core23/rebolcore-6.html There you do something like: 1. Return the first hit and the rest of the series== "abcdefg"my-series: "abcdefg"== "bcdefg" 2. Return only the hitfind my-series "b"3. Return everything before the first hitfirst find my-series "b"== "abc" If you ever got used to such a thinking, writing & using more non-functional approaches, really hurts. -- Robert M. Münch http://www.saphirion.com smarter | better | fastercopy/part my-series find my-series "d"
Aug 30 2017
31.08.2017 09:50, Robert M. Münch пишет:On 2017-08-29 13:23:50 +0000, Steven Schveighoffer said:Interesting. How is it comparable with iterators and ranges ideas?... In Phobos, find gives you a range where the first element is the one you searched for, and the last element is the end of the original range. But what if you wanted all the data *up to* the element instead? What if you just wanted to look at that specific element? So we need several functions that do this, and it's not always clear how to do it correctly, and it's difficult to compose one function from the other.I'm a big fan of Rebol (yes, it's far from mainstream and a dead-end these days but it has a lot of very nice ideas) and it uses the idea of a series datatype. Fruther information here: http://www.rebol.com/docs/core23/rebolcore-6.html There you do something like: 1. Return the first hit and the rest of the series== "abcdefg"my-series: "abcdefg"== "bcdefg" 2. Return only the hitfind my-series "b"3. Return everything before the first hitfirst find my-series "b"== "abc" If you ever got used to such a thinking, writing & using more non-functional approaches, really hurts.copy/part my-series find my-series "d"
Aug 31 2017
On 2017-08-31 07:13:55 +0000, drug said:Interesting. How is it comparable with iterators and ranges ideas?Well, see: http://www.rebol.com/docs/core23/rebolcore-6.html#section-6 Just download the interpreter and play around with it to get a feeling. Overall it's way more functional. Like getting the last element: copy back tail my-series What I don't like is that indexing starts at 1, because this makes working with +/- offsets from a specific positon cumbersome. -- Robert M. Münch http://www.saphirion.com smarter | better | faster
Sep 02 2017
On Saturday, 2 September 2017 at 20:17:40 UTC, Robert M. Münch wrote:On 2017-08-31 07:13:55 +0000, drug said:Thanks for your post about Rebol, I didn't know it before. After reading through the series chaper, though, AFAICT Rebol series *are* iterators (begin+end), just with really nice, functional (read: LISP) syntax?Interesting. How is it comparable with iterators and ranges ideas?Well, see: http://www.rebol.com/docs/core23/rebolcore-6.html#section-6
Sep 02 2017
On 2017-09-02 21:27:58 +0000, Moritz Maxeiner said:Thanks for your post about Rebol, I didn't know it before.As said, the official Rebol-2 version is a dead-end. Even our main product is still based on it :-) 15 years old technology, but still working and we know all problemes. So very few surprises. And, it's very productive. There exists a half-done official Rebol-3 version as open-source. It was picked up by some and continued. And than there is a thing called Red, which uses a lot of ideas but compiles. Worth a look too. It's really cool, because it compiles native apps for Android without any SDK for example. Overall, it's worth to spend some time with Rebol. I'm sure you won't your time back and can learn a lot. Things to look at: VIEW (GUI), PARSE (for parsing) and after this using PARSE to create DSL with Rebol. Very cool feature.After reading through the series chaper, though, AFAICT Rebol series *are* iterators (begin+end), just with really nice, functional (read: LISP) syntax?There is no difference between code & data in Rebol. And it has a very rich set of datatpye, IIRC about 35 native ones. And many of them are series, which can be operated in the same way. From my experience, traversing datastructures with a functional syntax and concept is really natural to work with. It's mostly what you would tell someone to do using english. -- Robert M. Münch http://www.saphirion.com smarter | better | faster
Sep 03 2017
On Sunday, 3 September 2017 at 08:37:36 UTC, Robert M. Münch wrote:On 2017-09-02 21:27:58 +0000, Moritz Maxeiner said:I'll put in on my ever growing list of things to check out in depth, thanks :pThanks for your post about Rebol, I didn't know it before.As said, the official Rebol-2 version is a dead-end. Even our main product is still based on it :-) 15 years old technology, but still working and we know all problemes. So very few surprises. And, it's very productive. There exists a half-done official Rebol-3 version as open-source. It was picked up by some and continued. And than there is a thing called Red, which uses a lot of ideas but compiles. Worth a look too. It's really cool, because it compiles native apps for Android without any SDK for example. Overall, it's worth to spend some time with Rebol. I'm sure you won't your time back and can learn a lot. Things to look at: VIEW (GUI), PARSE (for parsing) and after this using PARSE to create DSL with Rebol. Very cool feature.Sounds like LISP :)After reading through the series chaper, though, AFAICT Rebol series *are* iterators (begin+end), just with really nice, functional (read: LISP) syntax?There is no difference between code & data in Rebol. And it has a very rich set of datatpye, IIRC about 35 native ones. And many of them are series, which can be operated in the same way.From my experience, traversing datastructures with a functional syntax and concept is really natural to work with. It's mostly what you would tell someone to do using english.I agree, though I was talking about what the abstract data type of a "series" is, i.e. what operations is exposes. From my observation: A D input range exposes via empty/front/popFront. A classic iterator exposes via begin/end. A Rebol series seems to be a safer form of iterator, as it doesn't expose begin/end directly, but exposes restricted operations that are defined as manipulating begin/end.
Sep 03 2017
On 2017-09-03 12:46:05 +0000, Moritz Maxeiner said:I'll put in on my ever growing list of things to check out in depth, thanks :pYou're welcome. Don't wait to long ;-)Yes, some ideas are the same. But the syntax is way more beautyful and useable.There is no difference between code & data in Rebol. And it has a very rich set of datatpye, IIRC about 35 native ones. And many of them are series, which can be operated in the same way.Sounds like LISP :)I agree, though I was talking about what the abstract data type of a "series" is, i.e. what operations is exposes. From my observation: A D input range exposes via empty/front/popFront. A classic iterator exposes via begin/end. A Rebol series seems to be a safer form of iterator, as it doesn't expose begin/end directly, but exposes restricted operations that are defined as manipulating begin/end.The series has elements and a "current pointer" which can be manipulated. Begin & end are just that, begin and end of the series. You don't manipulate them directly. Of couse you can change a series by adding/removing, cutting etc. which then changes begin/end accordingly.== [a b c d e f g]a: [a b c d e f g]== ea/5== [f g]skip a 5== [a b c d e f g]a== eb: a/5== word!type? b== block!type? a== [f g]b: skip a 5== block!type? b== 6index? b== [a b c d e f g] You can even treat such a block as fixed-size record and operate on this. Very handy. -- Robert M. Münch http://www.saphirion.com smarter | better | fasterhead b
Sep 04 2017
On Sunday, 3 September 2017 at 12:46:05 UTC, Moritz Maxeiner wrote:I agree, though I was talking about what the abstract data type of a "series" is, i.e. what operations is exposes. From my observation: A D input range exposes via empty/front/popFront. A classic iterator exposes via begin/end. A Rebol series seems to be a safer form of iterator, as it doesn't expose begin/end directly, but exposes restricted operations that are defined as manipulating begin/end.They are all pretty low-level abstractions. C++ has posited that iterators should be the bridge connecting generic data structures and generic algorithms. But they are pretty awful at that. They make it incredibly easy to destroy one of your structure's invariants, or to use it in a way that ignores some of its invariants (leading to inferior performance and sometimes unnecessarily bulky code). Ironically iterators are frequently used in OO code even though they clearly violate the "Tell, don't ask" principle, as do D ranges (and presumably also Rebol series, though I only skimmed over the documentation).
Sep 04 2017
On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1 I haven't read everything, so not sure if it worth to take a look.Iterators have many problems. Andrei's talk some years ago, titled "Iterators Must Go", points out many of them in a clear fashion. I think most of the problems stem from the fact that they are inherently a leaky abstraction. Iterators basically treat all data structures as a singly/doubly linked list or as arrays (if they allow random access). There is no natural or intuitive way of describing, say, a tree or an arbitrary graph as a list/array. Ranges and cursors try to solve some of the issues but they still have this fundamental problem. When I think of a good iteration interface, the only thing that comes to mind is SQL queries (= relational algebra, more or less). You have a few basic operators (Select, From, etc.) for querying that are intuitive, simple and safe. Furthermore, they compose elegantly, allowing you to express fairly complicated queries using these few basic operators. It's a true abstraction - I don't know and I don't care if the operators are implemented underneath using iterators, ranges, cursors or whatever (of course, sometimes I do care, when performance is a priority...). It may be unrealisic to expect such a nice abstraction of iteration for other data structures, not to mention a universal one which will work for all interesting data structures. Still, if and when I can dispense with iteration altogether, I'm happy to do so.
Sep 01 2017
On 2017-09-01 20:34:32 +0000, Mark said:On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:Iterators are not the silver bullet. But IIRC you can specify if you want to iterate over a graph BF or DF. If you just need to "iterate" over the elements things work pretty good IMO. If you want to select a sub-set and then iterate things get much complicater anyway.Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1 I haven't read everything, so not sure if it worth to take a look.Iterators have many problems. Andrei's talk some years ago, titled "Iterators Must Go", points out many of them in a clear fashion. I think most of the problems stem from the fact that they are inherently a leaky abstraction. Iterators basically treat all data structures as a singly/doubly linked list or as arrays (if they allow random access). There is no natural or intuitive way of describing, say, a tree or an arbitrary graph as a list/array. Ranges and cursors try to solve some of the issues but they still have this fundamental problem.It may be unrealisic to expect such a nice abstraction of iteration for other data structures, not to mention a universal one which will work for all interesting data structures.That's true, hence I didn't ever expected this at all. Some pretty simple & basic building blocks and the rest is done on-demand fitting the use-case. -- Robert M. Münch http://www.saphirion.com smarter | better | faster
Sep 02 2017
On Saturday, 2 September 2017 at 20:22:44 UTC, Robert M. Münch wrote:Iterators are not the silver bullet. But IIRC you can specify if you want to iterate over a graph BF or DF. If you just need to "iterate" over the elements things work pretty good IMO. If you want to select a sub-set and then iterate things get much complicater anyway.There isn't really some finite (or small, anyway) choice here. I may want to iterate over a binary tree in preorder/inorder/postorder DF order. I may want to iterate over it in BF order. I may want to handle the left son before the right one, or vice versa. This is already 8 iterators. In C++ land I also need to provide const and non-const versions of each iterator, which brings it up to 16 iterators (I'm not sure what's the situation in D - can you have const ranges and still be able to call popFront on them?) From the implementor's side, this is a headache. However, it's not that bad in a language in D, since "static if" saves you a lot of code in cases like this. The bigger problem is on the user's side: Which iterator should she use? For most applications more than one type of iterator will be adequate, but the "right" choice will often be implementation-dependent. For instance, if the tree in question is a complete binary tree, then it is often implemented as an array that contains the tree's elements in BF order. Therefore, if some operation on the tree can be done by traversing it BF then the user should probably use a BF iterator, for performance reasons. But the user doesn't know how the tree is implemented, so she can't make that choice... If the implementor informs the user about the implementation in the documentation, then his data structure is a cost-free abstraction (assuming that the user always chooses wisely...) but also a leaky one. If he doesn't, then the abstraction doesn't leak but it is no longer cost-free. He can't have both. A simpler example is a class interface for a (two-dimensional) matrix. You would expect the interface to provide a row-major order iterator and a column-major order one. Again, from the user's POV, the right choice (performance wise) will be dependent on the implementation. Asymptotic complexity is sometimes considered an essential detail of a "true" abstraction but this doesn't help here since the difference in performance is only observed in practice. This is why I find the presentation in the OP - "Why iterators got it all wrong" - a bit iffy. Iterators did get a lot of things wrong, but the presentation focuses on relatively mundane issues - ugly syntax and somewhat annoying semantics.
Sep 05 2017
On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1 I haven't read everything, so not sure if it worth to take a look.Iterators has no proper alternative when one need to implement generic tensor library like Mir Algorithm [1]. User level API is ndslice [4] (D n-dimensional random access range) , can be combined with Phobos. In the same time library uses low level unbounded random access iterator [2] representation to implement 90% of topology logic. This approach allows to maintain few times smaller, simpler and less error-prone code comparing with std.range. The best example it `bitwise` functions implementation in Phobos and Mir Algorithm. Iterators are very good for type composition when one need its own D range that can not be constructed using existing mir.ndslice.topology / std.range API. [1] https://github.com/libmir/mir-algorithm [2] http://docs.algorithm.dlang.io/latest/mir_ndslice_topology.html [3] http://docs.algorithm.dlang.io/latest/mir_ndslice_iterator.html [4] http://docs.algorithm.dlang.io/latest/mir_ndslice_slice.html#Slice Best Regards, Ilya
Sep 02 2017
On Sunday, 3 September 2017 at 02:08:20 UTC, Ilya Yaroshenko wrote:On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:Out of curiosity: Could you elaborate on what the issues are with using a range based API internally (if it's performance, the "why")?Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1 I haven't read everything, so not sure if it worth to take a look.Iterators has no proper alternative when one need to implement generic tensor library like Mir Algorithm [1].
Sep 02 2017
On Sunday, 3 September 2017 at 02:43:51 UTC, Moritz Maxeiner wrote:On Sunday, 3 September 2017 at 02:08:20 UTC, Ilya Yaroshenko wrote:There are three general kinds of dynamic dense tensors. Mir implements all of them: 1. Contiguous tensors. Their data is located contiguously in memory. Single dense memory chunk. All strides between subs-tensors can be computed from lengths. 2. Canonical tensors. Only data for one dimension is dense, other dimensions has strides that can not be computed from lengths. BLAS matrixes are canonical tensors: they have two lengths and one stride. 3. Universal tensors. Each dimension has a stride. Numpy ndarrays are universal tensors. Finite random access range Issues: 1. Main API issue is that full featured random access range (RAR) API (with slicing, and all opIndex* operators like `[]*=` ) is much larger comparing with unbounded random access iterator (the same API as pointer). 2. Random access range holds its length. Yes, Phobos has infinity ranges (mir hasn't), but the practice is that almost any RAR constructed using Phobos has length and you can not strip when type is constructed. Anyway, infinity RAR is just a pointer/iterator that can move forward but can not move backward. Tensors hold their own lengths for each dimensions, so range's length is just a useless payload. 3. Ranges can not move backward (auto b = a[-4 .. $].). This means one can not implement dynamic `reversed`(flip) [1, 2] operation for dimensions that have strides (universal or canonical tensors). _Dynamic_ means that instead of construct a new type with reversed order for a dimension, we just negate the corresponding stride and move the cursor. [1] [2] https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.flip.html Best regards, IlyaOn Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:Out of curiosity: Could you elaborate on what the issues are with using a range based API internally (if it's performance, the "why")?Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1 I haven't read everything, so not sure if it worth to take a look.Iterators has no proper alternative when one need to implement generic tensor library like Mir Algorithm [1].
Sep 03 2017
On Sunday, 3 September 2017 at 09:24:03 UTC, Ilya Yaroshenko wrote:On Sunday, 3 September 2017 at 02:43:51 UTC, Moritz Maxeiner wrote:Won't you have to implement opIndex* operators either way in order to use the `a[]` syntax? The main difference I can see should be that you'd also have to implement the InputRange (front,popFront,empty), ForwardRange (save), and BidirectionalRange (back,popBack) members, which if you don't need them, is indeed understandably too much?On Sunday, 3 September 2017 at 02:08:20 UTC, Ilya Yaroshenko wrote:There are three general kinds of dynamic dense tensors. Mir implements all of them: 1. Contiguous tensors. Their data is located contiguously in memory. Single dense memory chunk. All strides between subs-tensors can be computed from lengths. 2. Canonical tensors. Only data for one dimension is dense, other dimensions has strides that can not be computed from lengths. BLAS matrixes are canonical tensors: they have two lengths and one stride. 3. Universal tensors. Each dimension has a stride. Numpy ndarrays are universal tensors. Finite random access range Issues: 1. Main API issue is that full featured random access range (RAR) API (with slicing, and all opIndex* operators like `[]*=` ) is much larger comparing with unbounded random access iterator (the same API as pointer).On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:Out of curiosity: Could you elaborate on what the issues are with using a range based API internally (if it's performance, the "why")?Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1 I haven't read everything, so not sure if it worth to take a look.Iterators has no proper alternative when one need to implement generic tensor library like Mir Algorithm [1].2. Random access range holds its length. Yes, Phobos has infinity ranges (mir hasn't), but the practice is that almost any RAR constructed using Phobos has length and you can not strip when type is constructed. Anyway, infinity RAR is just a pointer/iterator that can move forward but can not move backward. Tensors hold their own lengths for each dimensions, so range's length is just a useless payload.I'm not sure what you mean here. Is this still about accessing the elements of one tensor? If so, what do Phobos' ranges have to do with your tensor type's API?3. Ranges can not move backward (auto b = a[-4 .. $].). This means one can not implement dynamic `reversed`(flip) [1, 2] operation for dimensions that have strides (universal or canonical tensors). _Dynamic_ means that instead of construct a new type with reversed order for a dimension, we just negate the corresponding stride and move the cursor.Right, that is indeed a limitation of the range abstraction (though a type could expose both functionality to move backwards _and_ the range API) and if you need that, it makes sense not to use ranges. Thanks for the summary of issues :)
Sep 03 2017
On Sunday, 3 September 2017 at 12:35:16 UTC, Moritz Maxeiner wrote:On Sunday, 3 September 2017 at 09:24:03 UTC, Ilya Yaroshenko wrote:front, popFront, empty, save, back, popBack, opIndexOpAssign, 2 x opIndex(.opSlice) for rhs scalars and ranges 2 x opIndexAssign(.opSlice) for rhs scalars and ranges 2 x opIndexOpAssign(.opSlice) for rhs scalars and ranges Plus, because of range topology slicing may be slower good to add this ones: popFrontN popFrontExactly, popBackN popBackExactly, and maybe i forgot something ...On Sunday, 3 September 2017 at 02:43:51 UTC, Moritz Maxeiner wrote:Won't you have to implement opIndex* operators either way in order to use the `a[]` syntax? The main difference I can see should be that you'd also have to implement the InputRange (front,popFront,empty), ForwardRange (save), and BidirectionalRange (back,popBack) members, which if you don't need them, is indeed understandably too much?On Sunday, 3 September 2017 at 02:08:20 UTC, Ilya Yaroshenko wrote:There are three general kinds of dynamic dense tensors. Mir implements all of them: 1. Contiguous tensors. Their data is located contiguously in memory. Single dense memory chunk. All strides between subs-tensors can be computed from lengths. 2. Canonical tensors. Only data for one dimension is dense, other dimensions has strides that can not be computed from lengths. BLAS matrixes are canonical tensors: they have two lengths and one stride. 3. Universal tensors. Each dimension has a stride. Numpy ndarrays are universal tensors. Finite random access range Issues: 1. Main API issue is that full featured random access range (RAR) API (with slicing, and all opIndex* operators like `[]*=` ) is much larger comparing with unbounded random access iterator (the same API as pointer).On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:Out of curiosity: Could you elaborate on what the issues are with using a range based API internally (if it's performance, the "why")?Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1 I haven't read everything, so not sure if it worth to take a look.Iterators has no proper alternative when one need to implement generic tensor library like Mir Algorithm [1].ndslice is: 1. Slice structure with a huge amount of features from mir algorithm 2. multidimensional RAR: .front!0, front!1, .front!(N - 1) and all other generalized RAR primitves including multidimensional index [1, 2, 3, ...] and slicing [1 .. 4, 3 .. $]. 3. Common one dimensional RAR on top of outer most dimension. So one can use Phobos funcitons for Slice structure. For example, a matrix will be iterated row-by-row. But this paragraph was about another issue: struct BLASMatrixTypeBasedOnRanges { double[] _data; // arrays is the simplest RAR size_t rows, cols; size_t stride; } sizeof(YourBLASMatrixTypeBasedOnRanges) == size_t.sizeof * 5; in other hand: sizeof(Slice!(Canonical, [2], double*)) == size_t.sizeof * 4; Ranges requires for 25% more space for canonical matrixes then iterators.2. Random access range holds its length. Yes, Phobos has infinity ranges (mir hasn't), but the practice is that almost any RAR constructed using Phobos has length and you can not strip when type is constructed. Anyway, infinity RAR is just a pointer/iterator that can move forward but can not move backward. Tensors hold their own lengths for each dimensions, so range's length is just a useless payload.I'm not sure what you mean here. Is this still about accessing the elements of one tensor? If so, what do Phobos' ranges have to do with your tensor type's API?3. Ranges can not move backward (auto b = a[-4 .. $].). This means one can not implement dynamic `reversed`(flip) [1, 2] operation for dimensions that have strides (universal or canonical tensors). _Dynamic_ means that instead of construct a new type with reversed order for a dimension, we just negate the corresponding stride and move the cursor.Right, that is indeed a limitation of the range abstraction (though a type could expose both functionality to move backwards _and_ the range API) and if you need that, it makes sense not to use ranges. Thanks for the summary of issues :)
Sep 03 2017
On Sunday, 3 September 2017 at 14:19:19 UTC, Ilya Yaroshenko wrote:On Sunday, 3 September 2017 at 12:35:16 UTC, Moritz Maxeiner wrote:Didn't know about the *N *Exactly ones, but yeah, it's a lot. Though unless you aren't interesting in the `a[]` syntax, you can't avoid opIndex*.On Sunday, 3 September 2017 at 09:24:03 UTC, Ilya Yaroshenko wrote:front, popFront, empty, save, back, popBack, opIndexOpAssign, 2 x opIndex(.opSlice) for rhs scalars and ranges 2 x opIndexAssign(.opSlice) for rhs scalars and ranges 2 x opIndexOpAssign(.opSlice) for rhs scalars and ranges Plus, because of range topology slicing may be slower good to add this ones: popFrontN popFrontExactly, popBackN popBackExactly, and maybe i forgot something ...On Sunday, 3 September 2017 at 02:43:51 UTC, Moritz Maxeiner wrote:Won't you have to implement opIndex* operators either way in order to use the `a[]` syntax? The main difference I can see should be that you'd also have to implement the InputRange (front,popFront,empty), ForwardRange (save), and BidirectionalRange (back,popBack) members, which if you don't need them, is indeed understandably too much?On Sunday, 3 September 2017 at 02:08:20 UTC, Ilya Yaroshenko wrote:There are three general kinds of dynamic dense tensors. Mir implements all of them: 1. Contiguous tensors. Their data is located contiguously in memory. Single dense memory chunk. All strides between subs-tensors can be computed from lengths. 2. Canonical tensors. Only data for one dimension is dense, other dimensions has strides that can not be computed from lengths. BLAS matrixes are canonical tensors: they have two lengths and one stride. 3. Universal tensors. Each dimension has a stride. Numpy ndarrays are universal tensors. Finite random access range Issues: 1. Main API issue is that full featured random access range (RAR) API (with slicing, and all opIndex* operators like `[]*=` ) is much larger comparing with unbounded random access iterator (the same API as pointer).On Tuesday, 29 August 2017 at 12:50:08 UTC, Robert M. Münch wrote:Out of curiosity: Could you elaborate on what the issues are with using a range based API internally (if it's performance, the "why")?Maybe of interest: https://www.think-cell.com/en/career/talks/iterators/#1 I haven't read everything, so not sure if it worth to take a look.Iterators has no proper alternative when one need to implement generic tensor library like Mir Algorithm [1].Now I get what you mean, you were talking about not using the dynamic arrays built into the language (which indeed carry their length for safety purposes), not about exposing range API here; you're right, you don't need to use a dynamic array here, as you already have the length encoded another way (it would be wasteful), but strictly speaking D's builtin dynamic arrays are not ranges (as they are neither aggregate types nor do they have the range functions baked into the language). They only become ranges when you import the appropriate free functions from Phobos (or define some yourself).ndslice is: 1. Slice structure with a huge amount of features from mir algorithm 2. multidimensional RAR: .front!0, front!1, .front!(N - 1) and all other generalized RAR primitves including multidimensional index [1, 2, 3, ...] and slicing [1 .. 4, 3 .. $]. 3. Common one dimensional RAR on top of outer most dimension. So one can use Phobos funcitons for Slice structure. For example, a matrix will be iterated row-by-row. But this paragraph was about another issue: struct BLASMatrixTypeBasedOnRanges { double[] _data; // arrays is the simplest RAR size_t rows, cols; size_t stride; }2. Random access range holds its length. Yes, Phobos has infinity ranges (mir hasn't), but the practice is that almost any RAR constructed using Phobos has length and you can not strip when type is constructed. Anyway, infinity RAR is just a pointer/iterator that can move forward but can not move backward. Tensors hold their own lengths for each dimensions, so range's length is just a useless payload.I'm not sure what you mean here. Is this still about accessing the elements of one tensor? If so, what do Phobos' ranges have to do with your tensor type's API?sizeof(YourBLASMatrixTypeBasedOnRanges) == size_t.sizeof * 5; in other hand: sizeof(Slice!(Canonical, [2], double*)) == size_t.sizeof * 4; Ranges requires for 25% more space for canonical matrixes then iterators.We do have to differentiate between a container and the API with which to iterate over its elements. D's dynamic arrays (a pointer+length) require more space then using only a raw pointer, of course. If that's more than an iterator depends on the type of iterator you use. Most common iterator implementations I know consist of a begin and an end pointer, essentially requiring the same space as D's dynamic arrays. In the case you describe, though, we aren't talking about the iteration API, we are talking about what's used internally for storage.
Sep 03 2017
On Sunday, 3 September 2017 at 16:33:04 UTC, Moritz Maxeiner wrote:On Sunday, 3 September 2017 at 14:19:19 UTC, Ilya Yaroshenko wrote:Maybe I should call it cursors or generic pointers instead of iterators.Ranges requires for 25% more space for canonical matrixes then iterators.We do have to differentiate between a container and the API with which to iterate over its elements. D's dynamic arrays (a pointer+length) require more space then using only a raw pointer, of course. If that's more than an iterator depends on the type of iterator you use. Most common iterator implementations I know consist of a begin and an end pointer, essentially requiring the same space as D's dynamic arrays. In the case you describe, though, we aren't talking about the iteration API, we are talking about what's used internally for storage.
Sep 03 2017
On Monday, 4 September 2017 at 04:29:36 UTC, Ilya wrote:Maybe I should call it cursors or generic pointers instead of iterators.dcollections uses a cursor approach that seems different from yours https://github.com/schveiguy/dcollections/blob/master/concepts.txt Maybe generic pointers makes more sense? Unless I'm mistaken, mir iterators are more like pointers with specific operations defined for how they are incremented, assigned, etc. I had been a little confused by the name iterators because I don't see any functions in mir that use begin/end. But I don't know if that's more about C++'s implementation of iterators or more core to the concept.
Sep 04 2017
On Monday, 4 September 2017 at 04:29:36 UTC, Ilya wrote:Maybe I should call it cursors or generic pointers instead of iterators.Well, «C++ iterators» are table pointer abstractions, so you need a pair. An «iterator» would be a possibly heavy object that is used for traversing a possibly complex and heterogenous data-structure without exposing any notion of size or end a-priori (e.g. before it arrives at the end, if there is an end).
Sep 05 2017
On Wednesday, 6 September 2017 at 06:57:25 UTC, Ola Fosheim Grøstad wrote:Well, «C++ iterators» are table pointer abstractions, so you need a pair. An «iterator» would be a possibly heavy object that is used for traversing a possibly complex and heterogenous data-structure without exposing any notion of size or end a-priori (e.g. before it arrives at the end, if there is an end).mir-ndslices are like a generalization of D's slices. Instead of pointer and a length, they are an iterator and lengths/strides. So the size is exposed.
Sep 06 2017
On Sunday, 3 September 2017 at 09:24:03 UTC, Ilya Yaroshenko wrote:1. Contiguous tensors. Their data is located contiguously in memory. Single dense memory chunk. All strides between subs-tensors can be computed from lengths. 2. Canonical tensors. Only data for one dimension is dense, other dimensions has strides that can not be computed from lengths. BLAS matrixes are canonical tensors: they have two lengths and one stride. 3. Universal tensors. Each dimension has a stride. Numpy ndarrays are universal tensors.Can you elaborate?
Sep 06 2017
On Wednesday, 6 September 2017 at 20:24:05 UTC, Enamex wrote:On Sunday, 3 September 2017 at 09:24:03 UTC, Ilya Yaroshenko wrote:IMO, it's something that still needs to get explained better in the documentation. I haven't tried to because I'm not 100% on it. Below is as best as I have figured things out: All Slices in mir can have an iterator, lengths, and strides. The lengths are always N-dimensional and contain information on the shape of the Slice. So for instance, if the lengths are [3, 4], then N=2 and it is a 2-dimensional slice, with 3 rows and 4 columns. I left out packs...which are an added complication. Packs can be used to make slices of slices. For the above Slice, the default would simply be that the packs are [1], which means that there is no slice of slicing going on. If the packs were now [1, 1] (the sum of packs must equal N), then that is like saying you now have a slice of slices. In this case, slice[0] would be a row instead of a scalar. This is what allows you to iterate by row or by column. So when you're thinking about contiguous, canonical, and universal. The way that lengths and packs work is the same for all of them. The difference is in the strides. Contiguous slices don't have a strides vector. Canonical slices have a strides vector with a length of N-1. Universal slices have a strides vector of length N. So how are the strides used and why do they matter? I'm not sure I grok this part 100%, so I'll do my best. Strides tell you how much difference there is between the units of each array. So for instance, if my slice (call it a) has lengths [2, 3, 4] with strides [12, 4, 1], then a[0] is a [3, 4] matrix, which is where the 12 comes from. To move the pointer to the start of the next [3, 4] matrix (a[1]), requires moving 12 of whatever the type is. This would be a universal slice because it has N=3 lengths and strides. So if you call a._strides, then it would give you [12, 4, 1]. If a were canonical, calling _strides would give you [12, 4] because _strides for canonical slices have length N-1. Now if a were contiguous instead of universal and you call _strides on it, then it would give you [], because contiguous slices have no strides. The memory footprint of the slice appears different for these with a and a[0] of universal being larger than canonical and contiguous. This largely reflects the storage of the strides data. Similarly, a[0] has _strides [4, 1] for universal, [4] for canonical, and [] for contiguous. Mir is written in such a way that a[0] the same regardless of the SliceKind. For the most part, this means that it isn't really obvious that there is a difference between them. It matters in some underlying functions, but I haven't needed to do much other than sometimes convert a contiguous slice to universal (though it's not always clear to me why, I just do it).1. Contiguous tensors. Their data is located contiguously in memory. Single dense memory chunk. All strides between subs-tensors can be computed from lengths. 2. Canonical tensors. Only data for one dimension is dense, other dimensions has strides that can not be computed from lengths. BLAS matrixes are canonical tensors: they have two lengths and one stride. 3. Universal tensors. Each dimension has a stride. Numpy ndarrays are universal tensors.Can you elaborate?
Sep 06 2017
On Wednesday, 6 September 2017 at 21:44:01 UTC, jmh530 wrote:On Wednesday, 6 September 2017 at 20:24:05 UTC, Enamex wrote: Similarly, a[0] has _strides [4, 1] for universal, [4] for canonical, and [] for contiguous. Mir is written in such a way that a[0] the same regardless of the SliceKind. For the most part, this means that it isn't really obvious that there is a difference between them. It matters in some underlying functions, but I haven't needed to do much other than sometimes convert a contiguous slice to universal (though it's not always clear to me why, I just do it).For example, lets takes `transposed` function. It does not transpose the date. Instead, it swap dimensions. Assume you have a canonical matrix with _lengths = [3, 4]. So its strides are [4]. Now we want to swap dimensions, but to do it we need to swap both lengths and strides. So first we need to convert a slice to universal, so it will have both strides we want to swap: [4, 1]. Transposed slice will have _lengths = [4, 3] and _strides = [1, 4]. Best Regards, Ilya
Sep 07 2017
On Thursday, 7 September 2017 at 09:40:40 UTC, Ilya Yaroshenko wrote:For example, lets takes `transposed` function. It does not transpose the date. Instead, it swap dimensions. Assume you have a canonical matrix with _lengths = [3, 4]. So its strides are [4]. Now we want to swap dimensions, but to do it we need to swap both lengths and strides. So first we need to convert a slice to universal, so it will have both strides we want to swap: [4, 1]. Transposed slice will have _lengths = [4, 3] and _strides = [1, 4]. Best Regards, IlyaI think what's missing from the documentation is a clear explanation of how the strides determine how the iterator moves. Even something like below (assuming I have the math right) would be an improvement, though I'm sure there is a clearer way to express the concept. auto x = iota(3, 4).universal; assert(x.strides == [4, 1]); assert(x[2, 3] == 6); //(2 - 1) * 4 + (3 - 1) * 1 auto y = x.transposed; assert(y.strides == [1, 4]); assert(y[2, 3] == 9); //(2 - 1) * 1 + (3 - 1) * 4
Sep 07 2017
On Thursday, 7 September 2017 at 12:04:12 UTC, jmh530 wrote:I think what's missing from the documentation is a clear explanation of how the strides determine how the iterator moves. Even something like below (assuming I have the math right) would be an improvement, though I'm sure there is a clearer way to express the concept. auto x = iota(3, 4).universal; assert(x.strides == [4, 1]); assert(x[2, 3] == 6); //(2 - 1) * 4 + (3 - 1) * 1 auto y = x.transposed; assert(y.strides == [1, 4]); assert(y[2, 3] == 9); //(2 - 1) * 1 + (3 - 1) * 4Hmm, maybe I can express this better as something like auto data = iota(12); auto x = data.sliced(2, 3).universal; assert(x.strides == [4, 1]); assert(x[2, 3] == data[(2 - 1) * 4 + (3 - 1) * 1]); auto y = x.transposed; assert(y.strides == [1, 4]); assert(y[2, 3] == data[(2 - 1) * 1 + (3 - 1) * 4]);
Sep 07 2017
On Thursday, 7 September 2017 at 12:27:19 UTC, jmh530 wrote:auto x = data.sliced(2, 3).universal;Err, (3, 4) not (2, 3)
Sep 07 2017
On Thursday, 7 September 2017 at 12:28:00 UTC, jmh530 wrote:On Thursday, 7 September 2017 at 12:27:19 UTC, jmh530 wrote:All kinds of screwed up. This is what I get for not testing things before I post them. unittest { auto data = iota(12); auto x = data.sliced(3, 4).universal; assert(x.strides == [4, 1]); assert(x[1, 2] == data[1 * 4 + 2 * 1]); auto y = x.transposed; assert(y.strides == [1, 4]); assert(y[1, 2] == data[1 * 1 + 2 * 4]); }auto x = data.sliced(2, 3).universal;Err, (3, 4) not (2, 3)
Sep 07 2017
On Thursday, 7 September 2017 at 20:46:22 UTC, jmh530 wrote:On Thursday, 7 September 2017 at 12:28:00 UTC, jmh530 wrote:Another small difference is slicing: For example, for contiguous matrix m: 1. m[a .. b] is contiguous 2. m[i] is contiguous 3. m[a .. b, i] is universal (because there are no 1D canonical slices) 4. m[a .. b, c .. d] is canonical BTW, could you please update the docs or may be write a small article for Mir blog? I will be happy to answer your questions if any. Best Regards, IlyaOn Thursday, 7 September 2017 at 12:27:19 UTC, jmh530 wrote:All kinds of screwed up. This is what I get for not testing things before I post them. unittest { auto data = iota(12); auto x = data.sliced(3, 4).universal; assert(x.strides == [4, 1]); assert(x[1, 2] == data[1 * 4 + 2 * 1]); auto y = x.transposed; assert(y.strides == [1, 4]); assert(y[1, 2] == data[1 * 1 + 2 * 4]); }auto x = data.sliced(2, 3).universal;Err, (3, 4) not (2, 3)
Sep 10 2017
On Monday, 11 September 2017 at 05:41:37 UTC, Ilya Yaroshenko wrote:Another small difference is slicing: For example, for contiguous matrix m: 1. m[a .. b] is contiguous 2. m[i] is contiguous 3. m[a .. b, i] is universal (because there are no 1D canonical slices) 4. m[a .. b, c .. d] is canonicalWas not aware of this.BTW, could you please update the docs or may be write a small article for Mir blog?I'm happy to do some additional work on the docs, but I was waiting until I had a better understanding of things. I've also been working on some other things and there are only so many hours in the day. I want to add cholesky to mir-lapack/lubeck.I will be happy to answer your questions if any.I think I had asked a question on gitter, but I don't think I had heard back. Not sure how often you check that.
Sep 11 2017