digitalmars.D.bugs - [Issue 9674] New: std.algorithm.filter problems with non-deterministic _input.front
- d-bugmail puremagic.com (148/148) Mar 09 2013 http://d.puremagic.com/issues/show_bug.cgi?id=9674
- d-bugmail puremagic.com (29/29) Mar 09 2013 http://d.puremagic.com/issues/show_bug.cgi?id=9674
- d-bugmail puremagic.com (8/12) Mar 09 2013 http://d.puremagic.com/issues/show_bug.cgi?id=9674
- d-bugmail puremagic.com (55/55) Mar 09 2013 http://d.puremagic.com/issues/show_bug.cgi?id=9674
- d-bugmail puremagic.com (7/14) Mar 09 2013 http://d.puremagic.com/issues/show_bug.cgi?id=9674
- d-bugmail puremagic.com (8/11) Mar 09 2013 http://d.puremagic.com/issues/show_bug.cgi?id=9674
- d-bugmail puremagic.com (10/12) Mar 09 2013 http://d.puremagic.com/issues/show_bug.cgi?id=9674
- d-bugmail puremagic.com (7/12) Mar 09 2013 http://d.puremagic.com/issues/show_bug.cgi?id=9674
- d-bugmail puremagic.com (17/25) Mar 09 2013 http://d.puremagic.com/issues/show_bug.cgi?id=9674
- d-bugmail puremagic.com (15/34) Mar 10 2013 http://d.puremagic.com/issues/show_bug.cgi?id=9674
http://d.puremagic.com/issues/show_bug.cgi?id=9674 Summary: std.algorithm.filter problems with non-deterministic _input.front Product: D Version: D2 Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Phobos AssignedTo: nobody puremagic.com ReportedBy: bearophile_hugs eml.cc This Python2.6 program: from itertools import imap, ifilter from sys import stdout def fun(x): stdout.write("*") return x list(ifilter(lambda x: True, imap(fun, xrange(5)))) Prints: ***** While this similar D program: import std.stdio: write; import std.array: array; import std.range: iota; import std.algorithm: map, filter; void main() { iota(5).map!((x) { write("*"); return x; }).filter!(_ => true).array; } Prints: ********** This is a problem if the function given to map is not deterministic, or if it's costly to evaluate. Removing both the iota() and the array() from the D code, using a counter, but keeping both map() and filter(): import std.algorithm: map, filter; int count = 0; void main() { auto r1 = [0, 0, 0, 0, 0]; auto r2 = map!((int x){ count++; return 0; })(r1); auto r3 = filter!((int y){ return true; })(r2); foreach (z; r3) {} assert(count == 10); // passes. } To better show what's going on, this is a shortened version of the filter(), it calls pred() only n times, and it calls _input.front more often: struct FilterResult(alias pred, Range) { Range _input; this(Range r) { _input = r; while (!_input.empty && !pred(_input.front)) { _input.popFront(); } } property bool empty() { return _input.empty; } void popFront() { do { _input.popFront(); } while (!_input.empty && !pred(_input.front)); } property auto ref front() { return _input.front; } } Calling _input.front more often means it assumes _input.front is _like_ a pure function. But generally in D (and Python) this is not true. I think at least it should be documented in std.algorithm ddocs that filter() assumes the front() of its argument range to act like a pure function (despite there isn't type sytem enforcement of this). But maybe it's better to change std.algorithm.filter. This modified filter() caches _input.front and it works with not deterministic _input.front functions too: import std.stdio, std.range, std.algorithm, std.traits, std.random; struct FilterResult2(alias pred, Range) { import std.traits: ForeachType; Range _input; ForeachType!Range rFront; this(Range r) { _input = r; while (!_input.empty) { this.rFront = _input.front; if (pred(this.rFront)) break; _input.popFront(); } } property bool empty() { return _input.empty; } void popFront() { do { _input.popFront(); if (_input.empty) break; this.rFront = _input.front; if (pred(this.rFront)) break; } while (true); } property auto ref front() { return this.rFront; } } FilterResult2!(pred, Range) filter2(alias pred, Range)(Range rs) { return FilterResult2!(pred, Range)(rs); } // Follows demo code------ auto distinct(Range)(Range r) if (isInputRange!Range) { bool[ForeachType!Range] mySet; return r.filter!((k) { if (k in mySet) return false; mySet[k] = true; return true; }); } /** This is a useful function, it yields lazily the distinct items from the input iterable. It is similar to uniq, but doesn't need the input range to be sorted. */ auto distinct2(Range)(Range r) if (isInputRange!Range) { bool[ForeachType!Range] mySet; return r.filter2!((k) { if (k in mySet) return false; mySet[k] = true; return true; }); } void main() { 100.iota.map!(_ => uniform(0, 10)) .distinct.array.sort.writeln; // Bad output. 100.iota.map!(_ => uniform(0, 10)) .distinct2.array.sort.writeln; // Good output. } What are the downsides of replacing std.algorithm.filter() with something similar to this filter2() (with the missing methods, Unqual, etc)? This topic was discussed at length in this thread: http://forum.dlang.org/thread/fcbtejylrwxfploajuto forum.dlang.org -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Mar 09 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9674 timon.gehr gmx.ch changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |timon.gehr gmx.ch A solution that does not require caching front is the following: struct Filter(alias a,R){ private R source; // ... private bool processed = false; void process(){ if(!processed) while(!source.empty && !a(source.front)) source.popFront(); processed = true; } property auto ref front(){ process(); return source.front; } property bool empty(){ process(); return source.empty; } void popFront(){ process(); source.popFront(); } } It also fixes the problem of filter not being properly lazy (i.e. currently the function call might loop forever, even if no element is actually requested later.) Of course, this might be slower than the current solution in case the compiler is unable to properly track the 'processed' field value. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Mar 09 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9674A solution that does not require caching front is the following: ... void popFront(){ process(); source.popFront(); }This should obviously reset the flag: void popFront(){ process(); source.popFront(); processed=false; } -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Mar 09 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9674 I am trying your code, but I see duplications in the output: import std.stdio: writeln; import std.algorithm: map, sort; import std.array: array; import std.range: iota, isInputRange; import std.random: uniform; import std.traits: ForeachType; struct FilterResult3(alias pred, Range) { private Range _input; this(Range r) { _input = r; } private bool processed = false; void process() { if (!processed) while (!_input.empty && !pred(_input.front)) _input.popFront(); processed = true; } property auto ref front() { process(); return _input.front; } property bool empty() { process(); return _input.empty; } void popFront() { process(); _input.popFront(); processed = false; } } FilterResult3!(pred, Range) filter3(alias pred, Range)(Range rs) { return FilterResult3!(pred, Range)(rs); } auto distinct3(Range)(Range r) if (isInputRange!Range) { bool[ForeachType!Range] mySet; return r.filter3!((k) { if (k in mySet) return false; mySet[k] = true; return true; }); } void main() { 100.iota.map!(_ => uniform(0, 10)) .distinct3.array.sort.writeln; // } -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Mar 09 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9674I am trying your code, but I see duplications in the output: ... void main() { 100.iota.map!(_ => uniform(0, 10)) .distinct3.array.sort.writeln; // }The code depends on front of the source range being consistent across calls. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Mar 09 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9674... The code depends on front of the source range being consistent across calls.(If that is not what is wanted, a lot of caching would have to be introduced into most std.algorithm/std.range ranges.) -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Mar 09 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9674The code depends on front of the source range being consistent across calls.front() is not tagged with "pure", so you can't depend on that.(If that is not what is wanted, a lot of caching would have to be introducedinto most std.algorithm/std.range ranges.) Otherwise I think the "pure" attribute needs be added in many of those Phobos places. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Mar 09 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9674How would that help? -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------The code depends on front of the source range being consistent across calls.front() is not tagged with "pure", ...
Mar 09 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9674 Marco Leise <Marco.Leise gmx.de> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |Marco.Leise gmx.dePure _does_ make it consistent across calls. That's how it helps. :) And arguably a range's front property - in theory - fits that shoe. As the user of a range I'd typically expect front to be a dumb, pure property. In this case it is incorrect to call map with a mapping function that accesses global state for example. Or front has to cache the last result, so it doesn't need to ever be recalculated. But then practically every "front" property should be cached and it starts to look like a design flaw. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------How would that help?The code depends on front of the source range being consistent across calls.front() is not tagged with "pure", ...
Mar 09 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9674Actually it does not. :) struct R{ int x; property int front()pure{ return x++; } } I guess you want const pure. (but there are still type system holes.)Pure _does_ make it consistent across calls. That's how it helps. :)How would that help?The code depends on front of the source range being consistent across calls.front() is not tagged with "pure", ...And arguably a range's front property - in theory - fits that shoe. As the user of a range I'd typically expect front to be a dumb, pure property. In this case it is incorrect to call map with a mapping function that accesses global state for example.One issue is that recalculation may be expensive, or it might be cheap, or even removed by CSE.Or front has to cache the last result, so it doesn't need to ever be recalculated. But then practically every "front" property should be cached and it starts to look like a design flaw.Agreed, it is not too clear what to do about this. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Mar 10 2013