digitalmars.D.learn - Input ranges
- "Ulrich =?UTF-8?B?S8O8dHRsZXIi?= <kuettler gmail.com> (53/53) Apr 18 2015 It seems input ranges without any indirection in memory are not
- anonymous (12/59) Apr 19 2015 Yeah, byLine is dangerous. byLineCopy should probably have been
- "Ulrich =?UTF-8?B?S8O8dHRsZXIi?= <kuettler gmail.com> (17/25) Apr 19 2015 I agree. Yet I am convinced most (all?) proper input ranges read
- anonymous (22/35) Apr 19 2015 All ranges are input ranges, though. Input ranges are the least
- "Ulrich =?UTF-8?B?S8O8dHRsZXIi?= <kuettler gmail.com> (12/27) Apr 20 2015 Yes and no. It is reasonable to provide special implementations
It seems input ranges without any indirection in memory are not working well with algorithms. This seems to be understood by the D community. I did not know. Here is my story on the topic so far: Recently, I learned that I did not know input ranges much at all, totally misjudging std.range.refRange in its usefulness to input ranges: https://github.com/D-Programming-Language/phobos/pull/3123 At this point some experiments might be in order. (using 2.067.0) Input ranges from std.stdio are used for reading files. So assuming we create a file auto f = File("test.txt", "w"); f.writeln(iota(5).map!(a => repeat(to!string(a), 4)).joiner.joiner("\n")); f.close(); We should be able groupBy (chunkBy) its lines: writeln(File("test.txt").byLine.groupBy!((a,b) => a == b)); The result is just one group, that is all lines are considered equal: [["0", "0", "0", "0", "1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3", "4", "4", "4", "4"]] Alas, byLine reuses the same buffer for each line and thus groupBy keeps comparing each line with itself. There is a version of byLine that makes copies: writeln(File("test.txt").byLineCopy.groupBy!((a,b) => a == b)); Indeed, the result is as expected: [["0", "0", "0", "0"], ["1", "1", "1", "1"], ["2", "2", "2", "2"], ["3", "3", "3", "3"], ["4", "4", "4", "4"]] A final test with the undocumented byRecord method (the mapping after groupBy is for beauty only and does not change the result): writeln(File("test.txt") .byRecord!string("%s") .groupBy!((a,b) => a == b) .map!(map!(a => a[0]))); Here, the result is most peculiar: [["0", "0", "0", "0"], ["1", "1", "1"], ["2", "2", "2"], ["3", "3", "3"], ["4", "4", "4"]] Is byRecord broken? (It is undocumented after all.) In a way, because it does not contain any indirection. The current fields tuple is a simple member of the ByRecord struct. In contrast, the ByLineCopy struct is just a wrapper to a ref counted ByLineCopyImpl struct with a simple note: /* Ref-counting stops the source range's ByLineCopyImpl * from getting out of sync after the range is copied, e.g. * when accessing range.front, then using std.range.take, * then accessing range.front again. */ I am uncomfortable at this point. Simple and efficient input ranges fail in unexpected ways. Internal indirections make all the difference. It feels like input ranges are hiding something that should not be hidden. What am I missing?
Apr 18 2015
On Saturday, 18 April 2015 at 22:01:56 UTC, Ulrich Küttler wrote:Input ranges from std.stdio are used for reading files. So assuming we create a file auto f = File("test.txt", "w"); f.writeln(iota(5).map!(a => repeat(to!string(a), 4)).joiner.joiner("\n")); f.close(); We should be able groupBy (chunkBy) its lines: writeln(File("test.txt").byLine.groupBy!((a,b) => a == b)); The result is just one group, that is all lines are considered equal: [["0", "0", "0", "0", "1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3", "4", "4", "4", "4"]] Alas, byLine reuses the same buffer for each line and thus groupBy keeps comparing each line with itself. There is a version of byLine that makes copies: writeln(File("test.txt").byLineCopy.groupBy!((a,b) => a == b)); Indeed, the result is as expected: [["0", "0", "0", "0"], ["1", "1", "1", "1"], ["2", "2", "2", "2"], ["3", "3", "3", "3"], ["4", "4", "4", "4"]]Yeah, byLine is dangerous. byLineCopy should probably have been the default. Maybe we should rename byLine to byLineNoCopy (doing the proper deprecation dance, of course).A final test with the undocumented byRecord method (the mapping after groupBy is for beauty only and does not change the result): writeln(File("test.txt") .byRecord!string("%s") .groupBy!((a,b) => a == b) .map!(map!(a => a[0]))); Here, the result is most peculiar: [["0", "0", "0", "0"], ["1", "1", "1"], ["2", "2", "2"], ["3", "3", "3"], ["4", "4", "4"]] Is byRecord broken? (It is undocumented after all.) In a way, because it does not contain any indirection. The current fields tuple is a simple member of the ByRecord struct. In contrast, the ByLineCopy struct is just a wrapper to a ref counted ByLineCopyImpl struct with a simple note: /* Ref-counting stops the source range's ByLineCopyImpl * from getting out of sync after the range is copied, e.g. * when accessing range.front, then using std.range.take, * then accessing range.front again. */ I am uncomfortable at this point. Simple and efficient input ranges fail in unexpected ways. Internal indirections make all the difference. It feels like input ranges are hiding something that should not be hidden. What am I missing?I guess the problem is the mix of value and reference semantics. ByRecord's `current` is a value, but its `file` has reference semantics. So, a copy of a ByRecord affects one part of the original but not the other. Maybe copying should be ` disable`d for such ranges/structs. Then you couldn't pass it by value to groupBy. Instead you would have to use something like (the fixed version of) refRange, which works properly.
Apr 19 2015
On Sunday, 19 April 2015 at 11:33:26 UTC, anonymous wrote:I guess the problem is the mix of value and reference semantics. ByRecord's `current` is a value, but its `file` has reference semantics. So, a copy of a ByRecord affects one part of the original but not the other.I agree. Yet I am convinced most (all?) proper input ranges read input from an external source. (Reference semantic right there.) Input ranges are one-pass ranges after all. Therefore, reference semantics are required in any case (unless the use of the range is known beforehand.) groupBy is a nice example as it laboriously adds reference semantics to forward ranges but assumes input ranges to posses reference semantics by themselves.Maybe copying should be ` disable`d for such ranges/structs. Then you couldn't pass it by value to groupBy. Instead you would have to use something like (the fixed version of) refRange, which works properly.Again, I agree. Disallow copying for such ranges would prevent the error, still it would be a bit harsh. It is not just groupBy that goes astray. You can also get strange results from take: auto r = File("test.txt").byRecord!string("%s"); foreach (i; 0 .. 5) writeln(r.take(4).map!(a => a[0])); I was hoping to find some communication how input ranges should be done. At this point the solution in byLineCopy feels ad hoc.
Apr 19 2015
On Sunday, 19 April 2015 at 21:42:23 UTC, Ulrich Küttler wrote:groupBy is a nice example as it laboriously adds reference semantics to forward ranges but assumes input ranges to posses reference semantics by themselves.All ranges are input ranges, though. Input ranges are the least specialised category. I think it's a mistake to assume/require anything only for input ranges that are not forward ranges. I guess we could require reference semantics for all input ranges (including forward ranges and higher-ups). That would be a rather clean way out of this mess. But it would be a major effort. And I guess it would hurt performance, maybe a lot. [...]Again, I agree. Disallow copying for such ranges would prevent the error, still it would be a bit harsh. It is not just groupBy that goes astray. You can also get strange results from take: auto r = File("test.txt").byRecord!string("%s"); foreach (i; 0 .. 5) writeln(r.take(4).map!(a => a[0]));That would also not be possible if ByRecord had an ` disable this(this);`. But I'm not at all sure that this is the way to go. It's bound to be forgotten, and there are probably places where it's needlessly restrictive.I was hoping to find some communication how input ranges should be done.I'm right there with you. This is all a bit iffy. I suspect that this is an oversight in the design of ranges, as the documentation of std.range doesn't say anything on the topic. This may be too advanced for D.learn. I don't think Andrei and Walter come down here very often. Maybe take it to the general board.At this point the solution in byLineCopy feels ad hoc.I think byLineCopy solves a different problem. ByLine already has this: https://github.com/D-Programming-Language/phobos/blob/v2.067.0/std/stdio.d#L1592-L1598
Apr 19 2015
On Sunday, 19 April 2015 at 23:49:08 UTC, anonymous wrote:On Sunday, 19 April 2015 at 21:42:23 UTC, Ulrich Küttler wrote:Yes and no. It is reasonable to provide special implementations for ranges that are just input ranges. This is what groupBy does. The user gets a function that works with all kinds of ranges.groupBy is a nice example as it laboriously adds reference semantics to forward ranges but assumes input ranges to posses reference semantics by themselves.All ranges are input ranges, though. Input ranges are the least specialised category. I think it's a mistake to assume/require anything only for input ranges that are not forward ranges.I guess we could require reference semantics for all input ranges (including forward ranges and higher-ups). That would be a rather clean way out of this mess. But it would be a major effort. And I guess it would hurt performance, maybe a lot.Definitely, general reference semantics would solve a ton of difficulty. However, this would not be D anymore. This seems to be the catch: For optimal performance memory layout is important. You cannot hide it behind an interface.The same indirection in byLineCopy. This is what I was referring to: https://github.com/D-Programming-Language/phobos/blob/v2.067.0/std/stdio.d#L1783-L1789 Whoever did that knew very well what is going on.At this point the solution in byLineCopy feels ad hoc.I think byLineCopy solves a different problem. ByLine already has this: https://github.com/D-Programming-Language/phobos/blob/v2.067.0/std/stdio.d#L1592-L1598
Apr 20 2015