www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Truly lazy ranges, transient .front, and std.range.Generator

reply Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
Hello all,

One of the design considerations I've been mulling over recently, in terms of 
random number generation functionality, is that while we talk of ranges being 
lazily evaluated, in fact this isn't strictly true.

Most ranges are in fact a mix of lazy and eager: lazy in their use of popFront, 
but eager in that the current value of .front is always pre-calculated, usually 
starting with the first value being calculated in the constructor.  This is
fine 
for many ranges, whose outcome is in any case deterministic, but it's not 
appropriate for many other cases, where the appropriate moment of evaluation of 
'front' is arguably _the moment where front is called_.

This kind of requirement clearly holds anywhere where the values of the range 
depend on some kind of external input, and that arguably should include things 
like sources of randomness (even if they are, under the hood, only
pseudo-random).

One possible support for this is given by std.range.Generator, which implements 
a very simple wrapper of a callable entity via the following range primitives:

     enum empty = false;

     auto ref front()  property
     {
         return fun();
     }

     void popFront() { }

Now, first, I should say that I think this is a good example of how the 
classification and concept of ranges is incomplete -- there is a case for an 
even simpler range than InputRange, one which _just_ has .empty and .front 
defined, and which we might call a TransientRange -- but second, it's 
problematic, inasmuch as it's an incomplete solution to the problem of truly 
lazy ranges.

In some cases we're going to want true laziness of evaluation, but _not_ 
transience of .front.  In these cases, the _first_ time .front is called, its 
value should be freshly evaluated; but thereafter, the value is cached, until 
.popFront() is called, at which point .front will be re-evaluated lazily the 
next time it's called.  Something like std.algorithm.cache is inappropriate 
here, precisely because it's eager in its calculation of the cached values.

As far as I can see, we don't have a valid solution for this case.  We have
some 
functionality in std.random -- e.g. randomCover and randomSample -- which 
implements rather shaky workarounds for that lack.

I can't imagine this isn't a known case/problem, but as far as I can see any 
discussion of it has been more on GitHub than here.  So before rushing off with 
Yet Another Custom Solution, I thought I'd raise the issue here, to ask what
the 
current state of thought is on these kinds of challenge, and what may be 
upcoming in people's roadmaps.

Thanks & best wishes,

     -- Joe
Aug 15 2015
next sibling parent reply "Alex Parrill" <initrd.gz gmail.com> writes:
On Saturday, 15 August 2015 at 10:06:13 UTC, Joseph Rushton 
Wakeling wrote:
 ...
I had this issue recently when reading from a command-line-style TCP connection; I needed to read the line up to the \n separator, but consuming the separator meant waiting for the next byte that would never arrive unless a new command was sent. So I made a wrapper range that evaluates the wrapped range's popFront only when front/empty is first called ("just in time"). Source code here: https://gist.github.com/ColonelThirtyTwo/0dfe76520efcda02d848 You can throw it in a UFCS chain anywhere except (for some reason) after something that takes a delegate template parameter like map. For example: auto reader = SocketReader(socket).joiner.jitRange.map!(byt => cast(char) byt);
Aug 16 2015
parent Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 17/08/15 00:33, Alex Parrill via Digitalmars-d wrote:
 On Saturday, 15 August 2015 at 10:06:13 UTC, Joseph Rushton Wakeling wrote:
 ...
I had this issue recently when reading from a command-line-style TCP connection; I needed to read the line up to the \n separator, but consuming the separator meant waiting for the next byte that would never arrive unless a new command was sent. So I made a wrapper range that evaluates the wrapped range's popFront only when front/empty is first called ("just in time"). Source code here: https://gist.github.com/ColonelThirtyTwo/0dfe76520efcda02d848 You can throw it in a UFCS chain anywhere except (for some reason) after something that takes a delegate template parameter like map. For example: auto reader = SocketReader(socket).joiner.jitRange.map!(byt => cast(char) byt);
Interesting, thanks for sharing that. I'm not at all surprised that an IO-based range should have similar issues to the one I describe; a comparable use-case I was thinking of was reading bytes from /dev/urandom. My own concern was whether there was any sort of generic functionality for constructing lazy-front ranges like this; I'll share more on my own approach replying to Harald Zealot below.
Aug 18 2015
prev sibling parent reply "HaraldZealot" <harald_zealot tut.by> writes:
On Saturday, 15 August 2015 at 10:06:13 UTC, Joseph Rushton 
Wakeling wrote:
 ...

 In some cases we're going to want true laziness of evaluation, 
 but _not_ transience of .front.  In these cases, the _first_ 
 time .front is called, its value should be freshly evaluated; 
 but thereafter, the value is cached, until .popFront() is 
 called, at which point .front will be re-evaluated lazily the 
 next time it's called.  Something like std.algorithm.cache is 
 inappropriate here, precisely because it's eager in its 
 calculation of the cached values.

 ...

     -- Joe
Do you mean something like that: ```d struct Range { public: enum empty = false; auto ref front() property { if(mustBeEvaluated) { cache = fun(); mustBeEvaluated = false; } return cache; } void popFront() { mustBeEvaluated = true; } private: ReturnType!fun cache; bool mustBeEvaluated = true; } ``` ?
Aug 18 2015
parent reply Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 18/08/15 10:59, HaraldZealot via Digitalmars-d wrote:
 On Saturday, 15 August 2015 at 10:06:13 UTC, Joseph Rushton Wakeling wrote:
 ...

 In some cases we're going to want true laziness of evaluation, but _not_
 transience of .front.  In these cases, the _first_ time .front is called, its
 value should be freshly evaluated; but thereafter, the value is cached, until
 .popFront() is called, at which point .front will be re-evaluated lazily the
 next time it's called.  Something like std.algorithm.cache is inappropriate
 here, precisely because it's eager in its calculation of the cached values.

 ...

     -- Joe
Do you mean something like that:
Yes, broadly like your example, although note that your version won't handle multiple popFront() calls in succession without any .front call in-between. My own approach, which I knocked up earlier this month, was to create a generic mixin template that could be used to construct a lazy input range: // --------------------------------------------------------- mixin template LazyGen(T) { T front__; bool evaluated__ = false; public: enum bool empty = false; T front() property { if (!this.evaluated__) { this.front__ = this.evaluateFront__(); this.evaluated__ = true; } return this.front__; } void popFront() { if (this.evaluated__) { this.evaluated__ = false; } else { this.front__ = this.evaluateFront__(); } assert(!this.evaluated__); } } // --------------------------------------------------------- ... which could be mixin'd to any struct or class defining an evaluateFront__ method; although I think in retrospect I might rewrite that a bit to be more generic, not making assumptions about persistent values or the .empty condition: // --------------------------------------------------------- mixin template LazyPopFront() { private bool evaluated__ = false; public T front() property { if (!this.evaluated__) { this.popFront__(); // the 'true', private popFront this.evaluated__ = true; } return this.front__(); // private property method } public void popFront() { if (this.evaluated__) { this.evaluated__ = false; } else { this.popFront__(); } assert(!this.evaluated__); } } // --------------------------------------------------------- There are probably some other subtleties that need to be taken into account here, but broadly I think the above covers the fundamental requirements for a range whose front is truly lazily evaluated. Anyway, I'll try and follow up in the not-too-distant future with some more concrete example of how this can be used for some interesting stuff with RNGs. It would be interesting to know if the above solves any use-cases for anyone else, too.
Aug 18 2015
parent reply "HaraldZealot" <harald_zealot tut.by> writes:
On Tuesday, 18 August 2015 at 19:37:47 UTC, Joseph Rushton 
Wakeling wrote:
 Yes, broadly like your example, although note that your version 
 won't handle multiple popFront() calls in succession without 
 any .front call in-between.
Good point! And you show acceptable solution. The last your example I prefer to parametrize with alias to functions: ```d mixin template LazyPopFront(alias frontImplementation, alias popFrontImplementation) ... ``` It gives more flexibility. One that worries me, if someone start to make odd things like create shared instance of this range.
Aug 18 2015
parent Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 18/08/15 22:47, HaraldZealot via Digitalmars-d wrote:
 The last your example I prefer to parametrize with alias to functions:

 ```d
 mixin template LazyPopFront(alias frontImplementation, alias
 popFrontImplementation)
 ...
 ```
 It gives more flexibility.
Yea, good call.
 One that worries me, if someone start to make odd things like create shared
 instance of this range.
What kind of issues did you have in mind?
Aug 18 2015