www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Ranges again

Construction and Initialisation
===============================

As previously discussed, it's annoying having the conflation of 
construction and initialisation. It leads to two options:

1) construction is initialisation. This is nasty for some ranges 
where the range is truly destructive, amongst a whole load of 
other problems.

2) lazy initialisation. This is annoying to implement (hello 
bugs) and potentially slow.


How about a convention to overcome this:


A new, optional, range primitive named initialize or prime is 
introduced that does the initialisation.

Constructors are not guaranteed to initialise a range if an 
initialize/prime member is present.

factory functions (e.g. std.range.iota for std.range.Iota) call 
the constructor, followed by initialise/prime if present, unless 
explicitly documented otherwise.

All entities that accept a range expect that range to be 
pre-initialised unless explicitly documented otherwise.


What we do:

Document it. Publicise it.

Change all phobos ranges that are implemented as private 
aggregates (struct/class) with public factory functions to the 
new convention. Note that many will not need any change at all as 
they have no nasty initialisation needs. This is a non-breaking 
change.

Consider changing some phobos ranges with public aggregates, 
where the improvement is sufficient to justify the potential 
breakage for people using the raw constructors.

Nothing else. All current code with it's various initialisation 
conventions will function exactly as before, any breakage is 
strictly controlled on a case-by-case basis for each individual 
range type.


Open Questions:
Does save always return an initialised range?


Iteration Protocol
==================

Proposal:

A) The   !empty -> front (as many times as you like*) -> popFront 
-> repeat   sequence is used for all ranges.

B) empty and front must appear to do what they say they do** when 
used according to the above, but may internally do whatever they 
want. One exception: Consecutive calls to front are *not* 
guaranteed to return the same value, as this doesn't play well 
with random access ranges, in particular std.algorithm.map.***

C) as a consequence of B: empty and front are both independently 
optional. Of course it is still an error to call front or 
popFront on an empty range.

D) Ranges are not guaranteed to be internally buffered (e.g. see 
caveat about front in B), but many will be for performance and/or 
correctness (see WRT streams below). Inevitably some sort of 
caching wrapper and a byEfficientChunk primitive are likely to 
emerge, whether standardised or not.

* not deliberately by design, more just as an accidental 
side-effect of B.
** non-destructive, does not advance the range, does what it says 
on the tin.
*** Also, a range might work with data that is being mutated 
while the range is iterating/being indexed/sliced.

WRT streams:

empty for streams requires (attempting to) read from the stream, 
which is in turn destructive. Thus, implementations of streams as 
ranges *must* be buffered, to preserve the illusion of empty 
being a simple non-destructive check. My understanding is that 
they will always be buffered on some level anyway for performance.


General notes on everything above
=================================
Ranges are not the solution to every possible iteration problem. 
They are not a panacea. I believe that attempts to overly 
generalise them will cause more harm than good. With some 
sensible conventions we can have a really great tool, but there 
will always be some situations where a different approach will be 
more performant, intuitive or safer. Yes, that means you 
sometimes can't use std.algorithm and std.range, which is sad, 
but you can't have everything.

Those who want to really squeeze every last bit of performance 
out of ranges or make use of non-conforming range-like objects 
can use UDAs to tag various properties of their types, and then 
have their generic algorithms special-case on those attributes 
for performance and/or correctness. This enables the concept of 
ranges to be extended without overly straining the core mechanics.
There might even be a place for some standardised UDAs in 
std.range and some special-casing in std.algorithm/range.

However we proceed, we need an "I don't satisfy this range 
requirement even though it looks like I do" indicator. I suggest 
using a UDA, introduced as standard (in std.range, not the 
language). E.g.  (std.range.notRandomAccessRange) myRange { ... } 
would fail isRandomAccessRange even if it contains the necessary 
primitives.



P.S.
Most of the parts of this have probably been proposed by other 
people before, which I've probably read and forgotten about. No 
originality claimed here :)
Apr 16 2014