digitalmars.D.learn - is std.algorithm.joiner lazy?
- Puming (24/24) Apr 07 2016 Hi:
- Edwin van Leeuwen (7/32) Apr 07 2016 Apparently it works processing the first two elements at
- Puming (5/13) Apr 07 2016 OK. Even if it consumes the first two elements, then why does it
- Edwin van Leeuwen (6/11) Apr 07 2016 After some testing it seems to get each element twice, calls
- Puming (6/18) Apr 07 2016 Thanks! I added more elements to xs and checked that you are
- Jonathan M Davis via Digitalmars-d-learn (20/40) Apr 07 2016 I would note that in general, it's not uncommon for an algorithm to acce...
- Puming (22/71) Apr 07 2016 But in the joiner docs, it says joiner is lazy. But accessing
- Jonathan M Davis via Digitalmars-d-learn (42/91) Apr 07 2016 Lazy means that it's not going to consume the entire range when you call...
- Puming (10/66) Apr 07 2016 So what you mean is to read the front in constructor, and read
- Jonathan M Davis via Digitalmars-d-learn (54/60) Apr 07 2016 In general, when you're dealing with a non-random access range, it's bes...
- Puming (3/4) Apr 07 2016 Thanks. I'll adopt this idiom. Hopefully it gets used often
- Mike Parker (33/37) Apr 08 2016 What would such a function look like? I don't think such a thing
- Puming (2/17) Apr 08 2016
- Puming (21/34) Apr 07 2016 Well, I used map because of when viewing the scenario in a data
- Puming (28/40) Apr 07 2016 There is another problem with cache, that is if I want another
- Edwin van Leeuwen (28/33) Apr 07 2016 That seems like a bug to me and you might want to submit it to
- Puming (4/10) Apr 07 2016 Thanks. I just looked at the joiner code, but didn't find the
Hi:

When I use map with joiner, I found that the function passed to map is called. The documentation says joiner is lazy, so why is the function called?

Say:

    import std.algorithm : joiner, map;
    import std.stdio : writeln;

    int[] mkarray(int a)
    {
        writeln("mkarray called!");
        return [a * 2]; // just for test
    }

    void main()
    {
        auto xs = [1, 2];
        auto r = xs.map!(x => mkarray(x)).joiner;
    }

Running this gives the output:

    mkarray called!
    mkarray called!

I suppose joiner does not consume? When I actually consume the result with writeln, I get more output:

    mkarray called!
    mkarray called!
    [2mkarray called!
    mkarray called!
    , 4]

I don't understand.
Apr 07 2016
On Thursday, 7 April 2016 at 07:07:40 UTC, Puming wrote:
> [...] In the documentation it says joiner is lazy, so why is the function called? [...]

Apparently it processes the first two elements at creation. All the other elements will be processed lazily. Even when a range is lazy, the algorithm often still has to "consume" one or two starting elements, just to set initial conditions. It does surprise me that joiner needs to process the first two; I would have to look at the implementation to see why.
Apr 07 2016
On Thursday, 7 April 2016 at 08:07:12 UTC, Edwin van Leeuwen wrote:
> On Thursday, 7 April 2016 at 07:07:40 UTC, Puming wrote:
>> [...]
> Apparently it works processing the first two elements at creation. All the other elements will be processed lazily. [...]

OK. Even if it consumes the first two elements, why does it have to consume them AGAIN when actually used? If the function mkarray has side effects, it could lead to problems.
Apr 07 2016
On Thursday, 7 April 2016 at 08:17:38 UTC, Puming wrote:
> OK. Even if it consumes the first two elements, then why does it have to consume them AGAIN when actually used? If the function mkarray has side effects, it could lead to problems.

After some testing, it seems to get each element twice: it calls front on the MapResult twice for each element. The first two mkarray calls are both for the first element, the second two for the second. You can solve this by caching the front call with:

    xs.map!(x=>mkarray(x)).cache.joiner;
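To make the effect of cache visible, here is a minimal sketch that counts the calls instead of printing them. The `calls` counter is an addition for illustration and is not part of the original example; the expectation, based on the cache documentation, is that mkarray runs once per element no matter how often joiner reads front.

```d
import std.algorithm : cache, joiner, map;
import std.array : array;
import std.stdio : writeln;

int calls; // illustration only: counts how often mkarray runs

int[] mkarray(int a)
{
    ++calls;
    return [a * 2]; // just for test
}

void main()
{
    auto xs = [1, 2];

    // cache stores the result of front, so repeated front accesses
    // by joiner reuse the stored value instead of calling mkarray again.
    auto r = xs.map!(x => mkarray(x)).cache.joiner;

    assert(r.array == [2, 4]);
    writeln("mkarray calls: ", calls);
}
```

Note that cache evaluates front eagerly, so the first mkarray call happens as soon as the pipeline is built.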
Apr 07 2016
On Thursday, 7 April 2016 at 08:27:23 UTC, Edwin van Leeuwen wrote:
> After some testing it seems to get each element twice, calls front on the MapResult twice, on each element. The first two mkarray are both for first element, the second two for the second. You can solve this by caching the front call with: xs.map!(x=>mkarray(x)).cache.joiner;

Thanks! I added more elements to xs and checked that you are right. So EVERY element is accessed twice with joiner. Better add that to the docs, and note the use of cache.
Apr 07 2016
On Thursday, April 07, 2016 08:47:15 Puming via Digitalmars-d-learn wrote:
> Thanks! I added more elements to xs and checked that you are right. So EVERY element is accessed twice with joiner. Better add that to the docs, and note the use of cache.

I would note that in general, it's not uncommon for an algorithm to access front multiple times. So, this really isn't a joiner-specific issue. If anything, it's map that should get a note in its docs, not joiner. You really should just expect front to be called multiple times. So, if that's a problem, use cache. But joiner is not doing anything abnormal.

And it's not even the case that it necessarily makes sense to make a rule of thumb that ranges should copy front instead of calling it multiple times, because if front returns by ref, calling front multiple times is likely to be cheaper, and while we don't properly support non-copyable types (like UniquePtr) with ranges right now, we really should, so if anything, it becomes the case that algorithms should favor calling front multiple times over copying its value.

So, there are pros and cons involved with copying front vs calling it multiple times, and I think that both approaches are pretty common at this point. So, given how frequently it makes sense for map to allocate (e.g. to!string(a)), map should probably have a note about cache, but overall, it's just something that you need to be aware of. Regardless, I don't think that it makes sense to put anything in joiner's docs about it.

- Jonathan M Davis
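The point about map can be shown in isolation, without joiner. This is a small sketch with a hypothetical `convert` helper and a `calls` counter added purely for illustration: map does no work when it is constructed, and it runs the function again on every access to front.

```d
import std.algorithm : map;
import std.conv : to;

int calls; // illustration only: counts how often convert runs

// Hypothetical helper standing in for any allocating conversion.
string convert(int a)
{
    ++calls;
    return a.to!string;
}

void main()
{
    auto r = [1, 2, 3].map!(a => convert(a));
    assert(calls == 0);   // constructing the map range does no work

    auto first = r.front; // runs convert
    auto again = r.front; // runs convert a second time
    assert(first == again);
    assert(calls == 2);   // one call per front access, not per element
}
```

The two strings compare equal, which is all the range API asks for, but each access to front pays for its own conversion.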
Apr 07 2016
On Thursday, 7 April 2016 at 18:15:07 UTC, Jonathan M Davis wrote:
> I would note that in general, it's not uncommon for an algorithm to access front multiple times. So, this really isn't a joiner-specific issue. If anything, it's map that should get a note in its docs, not joiner. You really should just expect front to be called multiple times. So, if that's a problem, use cache. But joiner is not doing anything abnormal.

But the joiner docs say joiner is lazy, and accessing front multiple times is not true laziness. I think it would be better to note that after the lazy part: "joiner is lazy, but it will access the front twice". If many other lazy functions behave like this, I suggest giving it a new name, like 'semi-lazy', to be more accurate.

Maybe it's my fault; I didn't know what cache does before Edwin told me. So there is a solution, it just is not easy for newbies to find because there is no direct link between these functions.

> And it's not even the case that it necessarily makes sense to make a rule of thumb that ranges should copy front instead of calling it multiple times, [...] so if anything, it becomes the case that algorithms should favor calling front multiple times over copying its value.

Indeed. I think copying is not good. But multiple access is a thing to note. When I want to use lazy things, it is usually because I'm reading files, so accessing twice is not acceptable.

> So, there are pros and cons involved with copying front vs calling it multiple times, [...] Regardless, I don't think that it makes sense to put anything in joiner's docs about it.

There is another problem: map, cache, and joiner don't work when composed multiple times. I've submitted a bug, https://issues.dlang.org/show_bug.cgi?id=15891, can you confirm?

Because of this, I now have to read a file multiple times (using only joiner), or eagerly retrieve the data into an array (which is too big), or fall back to an imperative way of manually accessing each file. They are all bad.
Apr 07 2016
On Friday, April 08, 2016 00:30:05 Puming via Digitalmars-d-learn wrote:
> But in the joiner docs, it says joiner is lazy. But accessing front multiple times is not true laziness. I think it better note that after the lazy part: "joiner is lazy, but it will access the front twice". [...]

Lazy means that it's not going to consume the entire range when you call the function. Rather, it's going to return a range that you can iterate over. It may or may not process the first element before returning, depending on how it works, and there's definitely nothing that says whether it's going to access front multiple times or not before calling popFront. And accessing front multiple times without calling popFront is _normal_ whether you're dealing with a lazy range or an eager one. All that lazy means is that you're getting a range from the function rather than it consuming the range before returning.

So, whatever you do with a range, in general, you have to assume that an algorithm might access front multiple times, and the implementation is free to change so that it accesses it more times or fewer times, because the range API says nothing about whether front is accessed multiple times or not. front needs to return equal values every time that it's called before popFront is called, but that doesn't mean that they have to be the same objects, and it doesn't mean that there's any restriction on how many times front is accessed before a call to popFront.

So, I see no reason for joiner to say anything in its docs about how many times it accesses front. It's pretty much irrelevant to how ranges are expected to work, and it could change. If it actually matters for what you're doing, then you need to figure out how to rework your code so that it doesn't matter whether front is accessed multiple times per call to popFront or not. That's just part of working with ranges, though I can certainly understand if you didn't realize that previously.

> There is another problem, map, cache, and joiner don't work when composed multiple times. I've submitted a bug, https://issues.dlang.org/show_bug.cgi?id=15891, can you confirm?

Well, given your example, I would strongly argue that you should write a range that calls read in its constructor and in popFront (so that calling front multiple times doesn't matter) rather than using map. While map can theoretically be used the way that you're trying to use it, it's really intended for converting an element rather than doing stuff like I/O in it. Also, if the range that you give map is random access (like an array would be), then opIndex could be used to access random elements, which _really_ wouldn't work with reading from a file. So, I think that map is just plain a bad choice for what you're trying to do.

It's not obvious to me why your example is failing to compile - the problem appears to be with cache specifically and has nothing to do with joiner - and I am inclined to agree that there's a bug there (be it in cache or in the compiler), but I really think that using map is a bad move for what you're trying to do anyway - especially when you consider what will happen if opIndex is used. I'd strongly encourage you to just write a range that does what you need instead.

- Jonathan M Davis
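As a rough illustration of what "lazy" means here, the following sketch builds a map over a large iota and then takes only three elements. The `twice` helper and the `calls` counter are assumptions made for the example. Creating the range does no work, and only the part that is actually iterated gets processed.

```d
import std.algorithm : map;
import std.array : array;
import std.range : iota, take;

int calls; // illustration only: counts how often twice runs

// Hypothetical helper standing in for any per-element conversion.
int twice(int a)
{
    ++calls;
    return a * 2;
}

void main()
{
    // Returns a range immediately; nothing has been converted yet.
    auto lazyRange = iota(1_000_000).map!(a => twice(a));
    assert(calls == 0);

    // Only the elements that are iterated over are processed.
    auto firstThree = lazyRange.take(3).array;
    assert(firstThree == [0, 2, 4]);
    assert(calls < 1_000_000);
}
```

How many times twice runs per element is deliberately not asserted, since the range API leaves that to the algorithm.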
Apr 07 2016
On Friday, 8 April 2016 at 01:14:11 UTC, Jonathan M Davis wrote:
> [...]
> Lazy means that it's not going to consume the entire range when you call the function. Rather, it's going to return a range that you can iterate over. [...] That's just part of working with ranges, though I can certainly understand if you didn't realize that previously.

That makes sense. Thanks for the clarification.

> Well, given your example, I would strongly argue that you should write a range that calls read in its constructor and in popFront (so that calling front multiple times doesn't matter) rather than using map. [...] So, I think that map is just plain a bad choice for what you're trying to do.

So what you mean is to read the front in the constructor, and read further parts in popFront()? That way multiple accesses to front won't hurt anything. I think it might work; I'll change my code.

So the guideline is: when accessing front is costly, don't use map, use a customized range struct instead. Right?

> It's not obvious to me why your example is failing to compile - the problem appears to be with cache specifically and has nothing to do with joiner - and I am inclined to agree that there's a bug there (be it in cache or in the compiler), [...] I'd strongly encourage you to just write a range that does what you need instead.

OK, I hope it'll get fixed. I'll try to look for it once I'm able to understand the code in Phobos.
Apr 07 2016
On Friday, April 08, 2016 02:01:07 Puming via Digitalmars-d-learn wrote:
> So what you mean is to read the front in constructor, and read further parts in the popFront()? that way multiple access to the front won't hurt anything. I think it might work, I'll change my code. So the guideline is: when accessing front is costly, don't use map, use a customized range struct instead. right?

In general, when you're dealing with a non-random access range, it's best for popFront to do the work of setting up front and then have front return the same object every time. If front is doing the work, then if it gets called multiple times, that work is being repeated every time it gets called.

map is a funny case, because it can be a random-access range (if the underlying range it's wrapping is a random-access range). So, fundamentally, it doesn't work in map to do the work in popFront. It pretty much has to be done in front. So, doing stuff like

    range.map!(a => to!string(a))()

is problematic in that a new allocation is going to occur every time that front is called - or when any element is accessed via opIndex. It works so long as the element is equal every time, and calling front multiple times does not affect the rest of the range, but it can be costly. In theory, cache should solve that case (and it would result in a range that wasn't random access, so opIndex wouldn't be called on it), but obviously, you're running into problems with it.

In any case, in general, when doing something like reading from a file with a range, it works best to do the work in popFront to avoid issues with multiple calls to front, and the constructor needs to do that work as well (be it by calling popFront or not), because front needs to be valid as soon as the range has been created, and it's not empty. So, you end up with something like

    struct MyRange
    {
    public:
        @property T front() { return _value; }
        @property bool empty() { ... }
        void popFront() { _value = readNextValueFromFile(); }

    private:
        this(Something s) { ... popFront(); }
        T _value;
    }

It also encapsulates things better than having a function whose only purpose is to be used in map, though there are obviously cases where writing a function just to use in map would make sense.

In general, I would only use map for cases where I'm converting something to something else and not for functions that do arbitrary work. A function for map that cannot be pure is a danger sign IMHO. Certainly, if you're going to follow how ranges are expected to work, whatever function you give map needs to return equal values every time front is called between calls to popFront, and multiple calls to front cannot affect the rest of the range. And what you did with map doesn't follow those guidelines, though it probably would if cache worked, and you always fed it into cache. Still, for something like this, I'd just create my own range and be done with it. You often need to anyway in order to manage extra state. And it tends to be more idiomatic, though I suppose that that's somewhat subjective.

- Jonathan M Davis
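The skeleton above leaves T, Something, empty, and readNextValueFromFile open. One way to fill it in for experimenting, offered only as a sketch: `FakeSource` and `RecordRange` are hypothetical names, and an in-memory array stands in for the file so that the number of reads can be checked.

```d
/// Hypothetical stand-in for an expensive data source such as a file.
struct FakeSource
{
    string[] records;
    size_t reads; // how many times the "expensive" read actually ran

    /// Fetches the next record; returns false once the source is exhausted.
    bool readNext(out string value)
    {
        if (records.length == 0)
            return false;
        ++reads;
        value = records[0];
        records = records[1 .. $];
        return true;
    }
}

struct RecordRange
{
    private FakeSource* _source;
    private string _value;
    private bool _empty;

    this(FakeSource* source)
    {
        _source = source;
        popFront(); // prime front so it is valid right after construction
    }

    @property bool empty() { return _empty; }
    @property string front() { return _value; } // no work, just returns

    void popFront()
    {
        // All the reading happens here, once per element.
        _empty = !_source.readNext(_value);
    }
}

void main()
{
    auto source = FakeSource(["a", "b"]);
    auto r = RecordRange(&source);

    assert(r.front == "a");
    assert(r.front == "a");    // repeated front access...
    assert(source.reads == 1); // ...does not trigger another read

    r.popFront();
    assert(r.front == "b");
    r.popFront();
    assert(r.empty);
    assert(source.reads == 2);
}
```

Because front only returns the stored value, an algorithm such as joiner can call it as often as it likes without repeating the read.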
Apr 07 2016
On Friday, 8 April 2016 at 02:49:01 UTC, Jonathan M Davis wrote:
> [...]

Thanks. I'll adopt this idiom. Hopefully it gets used often enough to warrant a Phobos function :-)
Apr 07 2016
On Friday, 8 April 2016 at 03:20:53 UTC, Puming wrote:
> On Friday, 8 April 2016 at 02:49:01 UTC, Jonathan M Davis wrote:
>> [...]
> Thanks. I'll adopt this idiom. Hopefully it gets used often enough to warrent a phobos function :-)

What would such a function look like? I don't think such a thing could exist.

This is more than just an idiom, IMO. It's a basic principle of ranges that, if not followed, is likely to produce a broken range and/or one whose front is more expensive than it needs to be. The trouble is that it isn't necessarily obvious and is easy to overlook when first implementing a custom range.

In Learning D, I used a custom FilteredRange to introduce the concept of ranges. It has a member function called skipNext which does the work of the filtering. It's called once in the constructor to 'prime' the range with the first value that matches the filter, then inside every call to popFront to find the next match. I closed that section with this paragraph:

"It might be tempting to take the filtering logic out of the skipNext method and add it to front, which is another way to guarantee that it's performed on every element. Then no work would need to be done in the constructor and popFront would simply become a wrapper for _source.popFront. The problem with that approach is that front can potentially be called multiple times without calling popFront in between, meaning the predicate will be tested on each call. That's unnecessary work. As a general rule, any work that needs to be done inside a range to prepare a front element should happen as a result of calling popFront, leaving front to simply focus on returning the current element."

A lazy range should be advanced in the constructor when it needs to be (usually when there is some criterion for an element to be returned from front) and always in popFront, but never in front.
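For readers without the book at hand, the shape described above might look roughly like the following. This is a reconstruction from the description only, not the code from Learning D: the int[] source and the alias predicate are assumptions made to keep the sketch short.

```d
import std.array : array;

// Sketch of a filtering range: the filtering work lives in skipNext,
// which runs in the constructor and in popFront, never in front.
struct FilteredRange(alias pred)
{
    private int[] _source;

    this(int[] source)
    {
        _source = source;
        skipNext(); // prime the range with the first matching element
    }

    // Drops elements from the source until one satisfies the predicate.
    private void skipNext()
    {
        while (_source.length != 0 && !pred(_source[0]))
            _source = _source[1 .. $];
    }

    @property bool empty() { return _source.length == 0; }
    @property int front() { return _source[0]; } // no predicate test here

    void popFront()
    {
        _source = _source[1 .. $];
        skipNext(); // find the next match
    }
}

void main()
{
    auto evens = FilteredRange!(x => x % 2 == 0)([1, 2, 3, 4, 5, 6]);
    assert(evens.front == 2);
    assert(evens.front == 2); // repeated access, predicate is not re-run
    assert(evens.array == [2, 4, 6]);
}
```

With this layout the predicate is tested once per source element, regardless of how many times front is read.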
Apr 08 2016
On Friday, 8 April 2016 at 08:44:36 UTC, Mike Parker wrote:
> On Friday, 8 April 2016 at 03:20:53 UTC, Puming wrote:
>> On Friday, 8 April 2016 at 02:49:01 UTC, Jonathan M Davis wrote:
>>> [...]
>> Thanks. I'll adopt this idiom. Hopefully it gets used often enough to warrent a phobos function :-)
> What would such a function look like? I don't think such a thing could exist. This is more than just an idiom, IMO. It's a basic principle of ranges that, if not followed, is likely to produce a broken range and/or one whose front is more expensive than it needs to be. The trouble is that it isn't necessarily obvious and is easy to overlook when first implementing a custom range.
> [...]

I thought it was just like map!readNext.cache
Apr 08 2016
On Friday, 8 April 2016 at 01:14:11 UTC, Jonathan M Davis wrote:
> [...]
> Well, given your example, I would strongly argue that you should write a range that calls read in its constructor and in popFront (so that calling front multiple times doesn't matter) rather than using map. [...] So, I think that map is just plain a bad choice for what you're trying to do.

Well, I used map because, when viewing the scenario as a data flow, map seems an intuitive choice.

What I have: a bunch of large files, each file containing sections of data, and each section composed of many lines of records. For each file, I have a list of indices.

What I want: given a list of files and indices for each file, I want to construct a lazy stream of records for other programs to use.

Here is the data flow:

    query constraints
    -> [(filePath, [index])]
    -> [(File, [index])] // map, needs cache
    -> [[section]] // map, needs cache
    -> [[[record]]] // joiner.joiner
    -> Range of record

And after reading cache's docs, I get that cache is perfect for converting a range whose front has side effects into a range whose popFront has the side effects. So if cache and map work together harmoniously, they should do the same trick as manually writing two ranges here.
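The last step of that data flow, flattening files of sections of records into one stream, can be sketched with plain arrays. The nested array below is made-up stand-in data; the real pipeline would produce the inner ranges lazily rather than holding them in memory.

```d
import std.algorithm : equal, joiner;

void main()
{
    // Made-up stand-in data: files -> sections -> records.
    string[][][] files = [
        [["r1", "r2"], ["r3"]], // first file: two sections
        [["r4"]],               // second file: one section
    ];

    // The first joiner flattens files into a range of sections,
    // the second flattens sections into a range of records.
    auto records = files.joiner.joiner;

    assert(records.equal(["r1", "r2", "r3", "r4"]));
}
```

This only shows the joiner.joiner step; it does not touch the map and cache stages that the bug report is about.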
Apr 07 2016
On Thursday, 7 April 2016 at 08:27:23 UTC, Edwin van Leeuwen wrote:
> After some testing it seems to get each element twice, calls front on the MapResult twice, on each element. The first two mkarray are both for first element, the second two for the second. You can solve this by caching the front call with: xs.map!(x=>mkarray(x)).cache.joiner;

There is another problem with cache: if I want another level of this map & joiner (which is my actual scenario, where I'm reading a bunch of files, and for each one I need to read multiple locations with seek and return a bunch of lines for each seek), adding cache results in a compiler error.

Simplified demo:

    import std.algorithm : cache, joiner, map;
    import std.stdio : writeln;

    auto read(int a)
    {
        writeln("read called!", a);
        return [0, a]; // second level
    }

    auto mkarray(int a)
    {
        writeln("mkarray called!", a);
        return [-a, a].map!(x=>read(x)).cache.joiner; // to avoid calling read twice
    }

    void main()
    {
        auto xs = [1,2 ,3, 4];
        auto r = xs.map!(x=>mkarray(x)).cache.joiner; // to avoid calling mkarray twice
        writeln(r);
    }

When compiled, I get the error:

    Error: open path skips field __caches_field_0
    source/app.d(19, 36): Error: template instance std.algorithm.iteration.cache!(MapResult!(__lambda1, int[])) error instantiating
Apr 07 2016
On Thursday, 7 April 2016 at 09:55:56 UTC, Puming wrote:
> When compiled, I get the error:
> Error: open path skips field __caches_field_0
> source/app.d(19, 36): Error: template instance std.algorithm.iteration.cache!(MapResult!(__lambda1, int[])) error instantiating

That seems like a bug to me and you might want to submit it to the bug tracker. Even converting it to an array first does not seem to work:

    import std.stdio : writeln;
    import std.algorithm : map, cache, joiner;
    import std.array : array;

    auto read(int a)
    {
        return [0, a]; // second level
    }

    auto mkarray(int a)
    {
        return [-a, a].map!(x=>read(x)).cache.joiner; // to avoid calling read twice
    }

    void main()
    {
        auto xs = [1,2 ,3, 4];
        auto r = xs.map!(x=>mkarray(x)).array;

        // Both lines below should be equal, but second does not compile
        [[0, -1, 0, 1], [0, -2, 0, 2], [0, -3, 0, 3], [0, -4, 0, 4]].cache.joiner.writeln;
        r.cache.joiner.writeln;
    }

The above results in the following error:

    /opt/compilers/dmd2/include/std/algorithm/iteration.d(326): Error: one path skips field __caches_field_0
    /d617/f62.d(19): Error: template instance std.algorithm.iteration.cache!(Result[]) error instantiating
Apr 07 2016
On Thursday, 7 April 2016 at 10:57:25 UTC, Edwin van Leeuwen wrote:
> On Thursday, 7 April 2016 at 09:55:56 UTC, Puming wrote:
>> [...]
> That seems like a bug to me and you might want to submit it to the bug tracker. Even converting it to an array first does not seem to work: [...]

Thanks. I just looked at the joiner code, but didn't find the source of the error. I'll submit a bug report.
Apr 07 2016