digitalmars.D - output ranges: by ref or by value?

Andrei Alexandrescu (23/23) Dec 31 2009 Consider:

Michel Fortin (21/31) Dec 31 2009 I think modeling built-in arrays is the way to go as it makes less

Philippe Sigaud (28/48) Dec 31 2009 I agree. And arrays may well be the most used range anyway.

Andrei Alexandrescu (43/105) Jan 01 2010 Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays

Michel Fortin (23/27) Jan 01 2010 I agree and disagree. I wasn't proposing that ranges support ~=, and I
Jason House (9/36) Jan 01 2010 I worry about a growing level of convention with ranges. Another recent...

Andrei Alexandrescu (11/48) Jan 01 2010 I am implementing right now a change in the range interface mentioned in...

Jason House (3/58) Jan 01 2010 Or if they completely forgot that bit of convention and omit creating a ...

Michel Fortin (19/34) Jan 01 2010 I don't think this will be a problem for the optimizer.

Andrei Alexandrescu (4/42) Jan 01 2010 save() is not only for classes. It also distinguishes input ranges from

Michel Fortin (16/19) Jan 01 2010 Right. I still maintain that it's a bad approach. I've written a lot of

Andrei Alexandrescu (4/22) Jan 01 2010 I tried that, it makes input ranges next to unusable. save() is an

Michel Fortin (20/29) Jan 02 2010 I think I can see why. You can't have ref member and local variables

Andrei Alexandrescu (7/40) Jan 02 2010 That's an idea, and names are powerful, but I think it's reasonable to

Michel Fortin (21/26) Jan 02 2010 I'm not expecting a miracle from it either, it'd just be much less confu...

Andrei Alexandrescu (32/97) Jan 01 2010 It may be best to discuss this on an example:

Rainer Deyke (10/14) Jan 01 2010 Require that all ranges are structs. If you want to implement a range

Andrei Alexandrescu (4/14) Jan 01 2010 That's a good idea, but it doesn't work with covariant return types.
Andrei Alexandrescu (3/17) Jan 01 2010 Oh, besides it doesn't work for struct ranges that iterate one-pass stre...

Steven Schveighoffer (4/21) Jan 02 2010 What does save do in those cases?

Andrei Alexandrescu (21/46) Jan 02 2010 It provides a syntactic differentiation between input ranges and forward...

Steven Schveighoffer (13/59) Jan 02 2010 Would it not be as useful then to just define attributes on the type and...

Andrei Alexandrescu (10/74) Jan 02 2010 If you have an enum that says a range's state can be saved, then you

Steven Schveighoffer (43/63) Jan 02 2010 The function already exists -- opAssign. My point is that save would ju...

Andrei Alexandrescu (38/109) Jan 02 2010 You might mean this(this). Either doesn't help because you'd still be

Steven Schveighoffer (55/91) Jan 02 2010 Yes, that's what I meant :) I haven't gotten used to struct constructor...

Michel Fortin (27/43) Jan 02 2010 This is making me think...

Andrei Alexandrescu (6/50) Jan 02 2010 These are implementation matters. save() could lazily save information,

Michel Fortin (31/60) Jan 03 2010 The idea is that you would use save and restore like this:

Andrei Alexandrescu (40/102) Jan 02 2010 Why gratuitously limit the design? You're asking to replace this:

Steven Schveighoffer (62/160) Jan 03 2010 Well, one thing you could do is:

Andrei Alexandrescu (52/221) Jan 03 2010 isForwardRange will be defined to yield true if and only if the range

Steven Schveighoffer (128/281) Jan 03 2010 The answer is, you don't. Ranges via interfaces are slower than primiti...

Andrei Alexandrescu (93/283) Jan 03 2010 I understand we don't agree on this. To me, making containers work with

Steven Schveighoffer (34/86) Jan 03 2010 We are arguing in circles, so I will just stop :)

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Consider:

R2 copy(R1, R2)(R1 src, R2 tgt) {
    foreach (ref e; src) tgt.put(e);
    return tgt;
}

Currently output ranges are supposed to define only the put() method. 
However, you also want to copy using regular ranges as a target, hence 
the shim:

// pseudo-method
void put(R, E)(ref R tgt, E e) {
    tgt.front = e;
    tgt.popFront();
}

Now copying ranges is possible even when they don't define put().

An example of "pure" output range is ArrayAppender. It doesn't define 
anything interesting except put.

The question of this post is the following: should output ranges be 
passed by value or by reference? ArrayAppender uses an extra indirection 
to work properly when passed by value. But if we want to model built-in 
arrays' operator ~=, we'd need to request that all output ranges be 
passed by reference.

Any ideas - please share.


Andrei

Dec 31 2009

Michel Fortin <michel.fortin michelf.com> writes:

On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 The question of this post is the following: should output ranges be 
 passed by value or by reference? ArrayAppender uses an extra 
 indirection to work properly when passed by value. But if we want to 
 model built-in arrays' operator ~=, we'd need to request that all 
 output ranges be passed by reference.

I think modeling built-in arrays is the way to go as it makes less 
things to learn. In fact, it makes it easier to learn ranges because 
you can begin by learning arrays, then transpose this knowledge to 
ranges which are more abstract and harder to grasp.

Beside, an extra indirection is wasteful when you don't need it. It's 
easier to add a new layer of indirection when you need one than the 
reverse, so the primitive shouldn't require any indirection.


 // pseudo-method
 void put(R, E)(ref R tgt, E e) {
     tgt.front = e;
     tgt.popFront();
 }

I like that because it works especially well with arrays. Here's what 
I'm thinking of:

	char[10] buffer;
	char[] remainingSpace = buffer[];
	while (!remainingSpace.empty)
		remainingSpace.put(getc());

	// now buffer is full
	writeln(buffer);

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Dec 31 2009

Philippe Sigaud <philippe.sigaud gmail.com> writes:

On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin michelf.com>wrote:

 On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu <
 SeeWebsiteForEmail erdani.org> said:

  The question of this post is the following: should output ranges be passed
 by value or by reference? ArrayAppender uses an extra indirection to work
 properly when passed by value. But if we want to model built-in arrays'
 operator ~=, we'd need to request that all output ranges be passed by
 reference.

 I think modeling built-in arrays is the way to go as it makes less things
 to learn. In fact, it makes it easier to learn ranges because you can begin
 by learning arrays, then transpose this knowledge to ranges which are more
 abstract and harder to grasp.

I agree. And arrays may well be the most used range anyway.

Beside, an extra indirection is wasteful when you don't need it. It's easier
 to add a new layer of indirection when you need one than the reverse, so the
 primitive shouldn't require any indirection.


So (squint through sleep-deprived eyes:) that makes it by ref, right?



  // pseudo-method
 void put(R, E)(ref R tgt, E e) {
    tgt.front = e;
    tgt.popFront();
 }


A few random comments, sorry if they are not entirely coherent:

- this new put needs hasAssignableElements!R, right? What's in this case the
difference between isOutputRange!R and hasAssignableElements!R?

- should all higher-order ranges expose a put method if possible? (stride
comes to mind, but also take or filter).

- does that play nice with the new auto ref / ref template parameter from
2.038? It seems to me that this new feature will go hand in hand with this,
but I may be mistaken.

- your shim method will be used like this:

put(r,e);

whereas for an output range you use:

r.put(e);

and you cannot freely go from one form to the other, except for arrays,
which are output ranges anyway [*]. Does that mean that you must disseminate
static if ByRef/Assignable/Output/Whatever checks everywhere, to use either
put(r,e) or r.put(e)?

- what if R is a range of ranges (ie: if E is itself a range). Should put by
invoked recursively? What if its a chunked range?

- something I wanted to ask for a long time: does put really write to the
range as written in the docs or to the underlying container for which the
output range is but a 'writable' view? The container/range separation does
not exist for arrays, but is important for other cases.


  Philippe

[*] except if this transformation rule is implemented now?

Dec 31 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Philippe Sigaud wrote:
 On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin michelf.com 
 <mailto:michel.fortin michelf.com>> wrote:
 
     On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu
     <SeeWebsiteForEmail erdani.org
     <mailto:SeeWebsiteForEmail erdani.org>> said:
 
         The question of this post is the following: should output ranges
         be passed by value or by reference? ArrayAppender uses an extra
         indirection to work properly when passed by value. But if we
         want to model built-in arrays' operator ~=, we'd need to request
         that all output ranges be passed by reference.
 
 
     I think modeling built-in arrays is the way to go as it makes less
     things to learn. In fact, it makes it easier to learn ranges because
     you can begin by learning arrays, then transpose this knowledge to
     ranges which are more abstract and harder to grasp.
 
 
 I agree. And arrays may well be the most used range anyway.

Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays 
motivated by practical necessity. I don't want to propagate that quirk 
into ranges. The best output range is one that works properly when 
passed by value.

     Beside, an extra indirection is wasteful when you don't need it.
     It's easier to add a new layer of indirection when you need one than
     the reverse, so the primitive shouldn't require any indirection.
 
 
 So (squint through sleep-deprived eyes:) that makes it by ref, right?
  
 
 
         // pseudo-method
         void put(R, E)(ref R tgt, E e) {
            tgt.front = e;
            tgt.popFront();
         }

It doesn't. The ref in there is to pass tgt to the pseudo-method put, 
not to the function that invokes it.

 A few random comments, sorry if they are not entirely coherent:
 
 - this new put needs hasAssignableElements!R, right? What's in this case 
 the difference between isOutputRange!R and hasAssignableElements!R?

It's a good question. There are two possible designs:

1. In the current design, the difference is that hasAssignableElements!R 
does not imply the range may grow. Consider this:

auto a = new int[10], b = new int[10];
copy(a, b);

This should work. But this shouldn't:

auto a = new int[10], b = new int[5];
copy(a, b);

because copy does not grow the target. If you want to append to b, you 
write:

copy(a, appender(&b));

2. In the design sketched in 
http://www.informit.com/articles/printerfriendly.aspx?p=1407357, 
iteration is separated from access. In that approach, you'd have a 
one-pass range for both input and output.

I'm not sure which design is better. (Suggestions are welcome.) For a 
pure output range, it's awkward to define empty() (what should it 
return?) and it's also awkward to put elements by calling two functions 
front/popFront instead of one.

 - should all higher-order ranges expose a put method if possible? 
 (stride comes to mind, but also take or filter).

I don't think so. The generic put will take care of that.

 - does that play nice with the new auto ref / ref template parameter 
 from 2.038? It seems to me that this new feature will go hand in hand 
 with this, but I may be mistaken.

There's no obvious connection. The nice thing about auto ref is this:

struct SomeAdapterRange(R) if (isWhateverRange!R) {
    private R src;
     property auto ref front() {
       return src.front;
    }
}

You don't want to see how that looks today. Actually:

http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/range.d

Search the page for "mixin" :o}.

 - your shim method will be used like this:
 
 put(r,e);
 
 whereas for an output range you use:
 
 r.put(e);
 
 and you cannot freely go from one form to the other, except for arrays, 
 which are output ranges anyway [*]. Does that mean that you must 
 disseminate static if ByRef/Assignable/Output/Whatever checks 
 everywhere, to use either put(r,e) or r.put(e)?

The D language automatically rewrites the latter into the former.

 - what if R is a range of ranges (ie: if E is itself a range). Should 
 put by invoked recursively? What if its a chunked range?

I don't know.

 - something I wanted to ask for a long time: does put really write to 
 the range as written in the docs or to the underlying container for 
 which the output range is but a 'writable' view? The container/range 
 separation does not exist for arrays, but is important for other cases.

Depends on how the range is defined. Appender holds a pointer to an 
array and appends to it. But appender is a special-purpose range. A 
usual range cannot change the topology of the container it's under.


Andrei

Jan 01 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-01-01 09:47:58 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays 
 motivated by practical necessity. I don't want to propagate that quirk 
 into ranges. The best output range is one that works properly when 
 passed by value.

I agree and disagree. I wasn't proposing that ranges support ~=, and I 
don't thing it'd be a good idea, so I agree with you here.

But I still believe output ranges should behave like arrays. I was 
proposing that you model output ranges after buffers. A stream then 
becomes a buffer of infinite length.

Look at this example:

	char[10] buffer;
	char[] remainingSpace = buffer[];
	while (!remainingSpace.empty)
		remainingSpace.put(getc());

	// now buffer is full
	writeln(buffer);

Now rename "remainingSpace" for "outputStream" and it works fine, 
except for two things: "empty" sounds strange, and an output stream is 
never empty of remaining space: it's infinite length.

But conceptually, a buffer and a stream are almost the same, one having 
finite capacity while the other is infinite.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jan 01 2010

Jason House <jason.james.house gmail.com> writes:

Andrei Alexandrescu wrote:

 Philippe Sigaud wrote:
 On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin michelf.com
 <mailto:michel.fortin michelf.com>> wrote:
 
     On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu
     <SeeWebsiteForEmail erdani.org
     <mailto:SeeWebsiteForEmail erdani.org>> said:
 
         The question of this post is the following: should output ranges
         be passed by value or by reference? ArrayAppender uses an extra
         indirection to work properly when passed by value. But if we
         want to model built-in arrays' operator ~=, we'd need to request
         that all output ranges be passed by reference.
 
 
     I think modeling built-in arrays is the way to go as it makes less
     things to learn. In fact, it makes it easier to learn ranges because
     you can begin by learning arrays, then transpose this knowledge to
     ranges which are more abstract and harder to grasp.
 
 
 I agree. And arrays may well be the most used range anyway.

 
 Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays
 motivated by practical necessity. I don't want to propagate that quirk
 into ranges. The best output range is one that works properly when
 passed by value.

I worry about a growing level of convention with ranges.  Another recent 
range thread discussed the need to call consume after a successful call to 
startsWith.  If I violated convention and had a range class, things would 
fail miserably.  There would be no need to consume after a successful call 
to startsWith and the range would have a random number of elements removed 
on an unsuccessful call to startsWith. I'm pretty sure that early 
discussions of ranges claimed that they could be either structs and classes, 
but in practice that isn't the case.

Jan 01 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Jason House wrote:
 Andrei Alexandrescu wrote:
 
 Philippe Sigaud wrote:
 On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin michelf.com
 <mailto:michel.fortin michelf.com>> wrote:

     On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu
     <SeeWebsiteForEmail erdani.org
     <mailto:SeeWebsiteForEmail erdani.org>> said:

         The question of this post is the following: should output ranges
         be passed by value or by reference? ArrayAppender uses an extra
         indirection to work properly when passed by value. But if we
         want to model built-in arrays' operator ~=, we'd need to request
         that all output ranges be passed by reference.


     I think modeling built-in arrays is the way to go as it makes less
     things to learn. In fact, it makes it easier to learn ranges because
     you can begin by learning arrays, then transpose this knowledge to
     ranges which are more abstract and harder to grasp.


 I agree. And arrays may well be the most used range anyway.

 Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays
 motivated by practical necessity. I don't want to propagate that quirk
 into ranges. The best output range is one that works properly when
 passed by value.

 
 I worry about a growing level of convention with ranges.  Another recent 
 range thread discussed the need to call consume after a successful call to 
 startsWith.  If I violated convention and had a range class, things would 
 fail miserably.  There would be no need to consume after a successful call 
 to startsWith and the range would have a random number of elements removed 
 on an unsuccessful call to startsWith. I'm pretty sure that early 
 discussions of ranges claimed that they could be either structs and classes, 
 but in practice that isn't the case.

I am implementing right now a change in the range interface mentioned in 
http://www.informit.com/articles/printerfriendly.aspx?p=1407357, namely: 
add a function save() that saves the iteration state of a range.

With save() in tow, class ranges and struct ranges can be used the same 
way. True, if someone forgets to say

auto copy = r.save();

and instead says:

auto copy = r;

the behavior will indeed be different for class ranges and struct ranges.


Andrei

Jan 01 2010

Jason House <jason.james.house gmail.com> writes:

Andrei Alexandrescu Wrote:

 Jason House wrote:
 Andrei Alexandrescu wrote:
 
 Philippe Sigaud wrote:
 On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin michelf.com
 <mailto:michel.fortin michelf.com>> wrote:

     On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu
     <SeeWebsiteForEmail erdani.org
     <mailto:SeeWebsiteForEmail erdani.org>> said:

         The question of this post is the following: should output ranges
         be passed by value or by reference? ArrayAppender uses an extra
         indirection to work properly when passed by value. But if we
         want to model built-in arrays' operator ~=, we'd need to request
         that all output ranges be passed by reference.


     I think modeling built-in arrays is the way to go as it makes less
     things to learn. In fact, it makes it easier to learn ranges because
     you can begin by learning arrays, then transpose this knowledge to
     ranges which are more abstract and harder to grasp.


 I agree. And arrays may well be the most used range anyway.

 Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays
 motivated by practical necessity. I don't want to propagate that quirk
 into ranges. The best output range is one that works properly when
 passed by value.

 
 I worry about a growing level of convention with ranges.  Another recent 
 range thread discussed the need to call consume after a successful call to 
 startsWith.  If I violated convention and had a range class, things would 
 fail miserably.  There would be no need to consume after a successful call 
 to startsWith and the range would have a random number of elements removed 
 on an unsuccessful call to startsWith. I'm pretty sure that early 
 discussions of ranges claimed that they could be either structs and classes, 
 but in practice that isn't the case.

 
 I am implementing right now a change in the range interface mentioned in 
 http://www.informit.com/articles/printerfriendly.aspx?p=1407357, namely: 
 add a function save() that saves the iteration state of a range.
 
 With save() in tow, class ranges and struct ranges can be used the same 
 way. True, if someone forgets to say
 
 auto copy = r.save();
 
 and instead says:
 
 auto copy = r;
 
 the behavior will indeed be different for class ranges and struct ranges.

Or if they completely forgot that bit of convention and omit creating a
variable called save... Also, doesn't use of save degrade performance for
structs? Or does the inliner/optimizer remove the copy variable altogether?
 
 
 Andrei

Jan 01 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-01-01 14:10:40 -0500, Jason House <jason.james.house gmail.com> said:

 With save() in tow, class ranges and struct ranges can be used the same
 way. True, if someone forgets to say
 
 auto copy = r.save();
 
 and instead says:
 
 auto copy = r;
 
 the behavior will indeed be different for class ranges and struct ranges.

 
 Or if they completely forgot that bit of convention and omit creating a 
 variable called save... Also, doesn't use of save degrade performance 
 for structs? Or does the inliner/optimizer remove the copy variable 
 altogether?

I don't think this will be a problem for the optimizer.

But I say dump save(): it's more trouble than it's worth.

I'm sure it will be a problem for most programmers, as most will learn 
and test algorithms with arrays and struct ranges and not classes. 
"save()" is a detail too easy to miss. The benefit of supporting 
classes by asking everyone to remember adding save() seems tiny 
considering you can easily create a struct wrapper if you really need 
your range to be a class.

I know that the struct wrapper would probably force more copies of a 
range than necessary. But there are many ways this could be alleviated. 
For instance, the struct wrapper could use copy-on-write with a 
reference counter to detect when a copy is necessary. So while it'd be 
a little harder to design a range from a class, it'd be easier for 
everyone to use ranges correctly.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jan 01 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Michel Fortin wrote:
 On 2010-01-01 14:10:40 -0500, Jason House <jason.james.house gmail.com> 
 said:
 
 With save() in tow, class ranges and struct ranges can be used the same
 way. True, if someone forgets to say

 auto copy = r.save();

 and instead says:

 auto copy = r;

 the behavior will indeed be different for class ranges and struct 
 ranges.

 Or if they completely forgot that bit of convention and omit creating 
 a variable called save... Also, doesn't use of save degrade 
 performance for structs? Or does the inliner/optimizer remove the copy 
 variable altogether?

 
 I don't think this will be a problem for the optimizer.
 
 But I say dump save(): it's more trouble than it's worth.
 
 I'm sure it will be a problem for most programmers, as most will learn 
 and test algorithms with arrays and struct ranges and not classes. 
 "save()" is a detail too easy to miss. The benefit of supporting classes 
 by asking everyone to remember adding save() seems tiny considering you 
 can easily create a struct wrapper if you really need your range to be a 
 class.
 
 I know that the struct wrapper would probably force more copies of a 
 range than necessary. But there are many ways this could be alleviated. 
 For instance, the struct wrapper could use copy-on-write with a 
 reference counter to detect when a copy is necessary. So while it'd be a 
 little harder to design a range from a class, it'd be easier for 
 everyone to use ranges correctly.
 

save() is not only for classes. It also distinguishes input ranges from 
forward ranges. It's the primitive that STL didn't define but should have.

Andrei

Jan 01 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-01-01 15:53:42 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 save() is not only for classes. It also distinguishes input ranges from 
 forward ranges. It's the primitive that STL didn't define but should 
 have.

Right. I still maintain that it's a bad approach. I've written a lot of 
algorithms in C++ that worked with iterators, always assuming 
assignment would copy the state. Fortunately I didn't had to use input 
iterators with them, most of the time. But I did once or twice, and the 
thing was working slightly off.

What I'd do instead is somehow make input ranges non-copyable. They 
could be either passed by ref or moved, never copied. This way they 
would still behave exactly like array slices, only not copyable, and 
you get a compile-time error if you try to copy them which is 
infinitely better than a subtle change in behavior.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jan 01 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Michel Fortin wrote:
 On 2010-01-01 15:53:42 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 save() is not only for classes. It also distinguishes input ranges 
 from forward ranges. It's the primitive that STL didn't define but 
 should have.

 
 Right. I still maintain that it's a bad approach. I've written a lot of 
 algorithms in C++ that worked with iterators, always assuming assignment 
 would copy the state. Fortunately I didn't had to use input iterators 
 with them, most of the time. But I did once or twice, and the thing was 
 working slightly off.
 
 What I'd do instead is somehow make input ranges non-copyable. They 
 could be either passed by ref or moved, never copied. This way they 
 would still behave exactly like array slices, only not copyable, and you 
 get a compile-time error if you try to copy them which is infinitely 
 better than a subtle change in behavior.

I tried that, it makes input ranges next to unusable. save() is an 
imperfect but workable solution.

Andrei

Jan 01 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-01-01 17:54:12 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Michel Fortin wrote:
 What I'd do instead is somehow make input ranges non-copyable. They 
 could be either passed by ref or moved, never copied. This way they 
 would still behave exactly like array slices, only not copyable, and 
 you get a compile-time error if you try to copy them which is 
 infinitely better than a subtle change in behavior.

 
 I tried that, it makes input ranges next to unusable.

I think I can see why. You can't have ref member and local variables 
like in C++, so it's pretty hard to use references.


 save() is an imperfect but workable solution.

save() is an workable but error-prone solution.

Perhaps we could mitigate this by making people more aware of the 
difference instead. Couldn't we rename "input range" for "input stream"?

Currently you have ranges that behave one way and ranges that behave 
the other way, which is confusing. Having a different name for both 
would emphasize there is a difference. With different names, you're 
guarantied to get the "what's the difference?" question from newbies.

And it's simple to explain: "You can often use ranges and streams 
interchangeably, but for that to work you must use save() when you need 
a copy of the current state. Also, not all streams support save(). It's 
good practice to always use save() so that algorithms work for both for 
ranges and streams."


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jan 02 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Michel Fortin wrote:
 On 2010-01-01 17:54:12 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Michel Fortin wrote:
 What I'd do instead is somehow make input ranges non-copyable. They 
 could be either passed by ref or moved, never copied. This way they 
 would still behave exactly like array slices, only not copyable, and 
 you get a compile-time error if you try to copy them which is 
 infinitely better than a subtle change in behavior.

 I tried that, it makes input ranges next to unusable.

 
 I think I can see why. You can't have ref member and local variables 
 like in C++, so it's pretty hard to use references.
 
 
 save() is an imperfect but workable solution.

 
 save() is an workable but error-prone solution.
 
 Perhaps we could mitigate this by making people more aware of the 
 difference instead. Couldn't we rename "input range" for "input stream"?
 
 Currently you have ranges that behave one way and ranges that behave the 
 other way, which is confusing. Having a different name for both would 
 emphasize there is a difference. With different names, you're guarantied 
 to get the "what's the difference?" question from newbies.
 
 And it's simple to explain: "You can often use ranges and streams 
 interchangeably, but for that to work you must use save() when you need 
 a copy of the current state. Also, not all streams support save(). It's 
 good practice to always use save() so that algorithms work for both for 
 ranges and streams."

That's an idea, and names are powerful, but I think it's reasonable to 
not expect miracles from that name change. It has disadvantages too - 
"input range" vs. "forward range"  clarifies there's a conceptual 
relationship between the two, whereas "streams" are different from 
anything else.

Andrei

Jan 02 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-01-02 09:59:51 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 That's an idea, and names are powerful, but I think it's reasonable to 
 not expect miracles from that name change. It has disadvantages too - 
 "input range" vs. "forward range"  clarifies there's a conceptual 
 relationship between the two, whereas "streams" are different from 
 anything else.

I'm not expecting a miracle from it either, it'd just be much less confusing.

You could say that assignment of an input stream might or might not 
save its state (depending on the stream type) so you must call save() 
to save the state when working with streams, but ranges are guarantied 
to save their state on assignment, thus behaving more predictably and 
just like arrays. So if you're working only with ranges, not streams, 
you never need to worry about save().

A similar option would be to have both input ranges and input streams:

* input range:  by value semantics, no need for save()
* input stream: by reference semantics

A pointer to an input range would thus automatically qualify as an 
input stream, so it's easy to give an input range to a function taking 
an input stream. Well, except for stack-allocated ranges in SafeD for 
which you can't create a pointer. This pretty much break the idea, I 
think.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jan 02 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Jason House wrote:
 Andrei Alexandrescu Wrote:
 
 Jason House wrote:
 Andrei Alexandrescu wrote:
 
 Philippe Sigaud wrote:
 On Thu, Dec 31, 2009 at 16:47, Michel Fortin
 <michel.fortin michelf.com 
 <mailto:michel.fortin michelf.com>> wrote:
 
 On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org 
 <mailto:SeeWebsiteForEmail erdani.org>> said:
 
 The question of this post is the following: should output
 ranges be passed by value or by reference? ArrayAppender uses
 an extra indirection to work properly when passed by value.
 But if we want to model built-in arrays' operator ~=, we'd
 need to request that all output ranges be passed by
 reference.
 
 
 I think modeling built-in arrays is the way to go as it makes
 less things to learn. In fact, it makes it easier to learn
 ranges because you can begin by learning arrays, then
 transpose this knowledge to ranges which are more abstract
 and harder to grasp.
 
 
 I agree. And arrays may well be the most used range anyway.

 Upon more thinking, I'm leaning the other way. ~= is a quirk of
 arrays motivated by practical necessity. I don't want to
 propagate that quirk into ranges. The best output range is one
 that works properly when passed by value.

 I worry about a growing level of convention with ranges.  Another
 recent range thread discussed the need to call consume after a
 successful call to startsWith.  If I violated convention and had
 a range class, things would fail miserably.  There would be no
 need to consume after a successful call to startsWith and the
 range would have a random number of elements removed on an
 unsuccessful call to startsWith. I'm pretty sure that early 
 discussions of ranges claimed that they could be either structs
 and classes, but in practice that isn't the case.

 I am implementing right now a change in the range interface
 mentioned in 
 http://www.informit.com/articles/printerfriendly.aspx?p=1407357,
 namely: add a function save() that saves the iteration state of a
 range.
 
 With save() in tow, class ranges and struct ranges can be used the
 same way. True, if someone forgets to say
 
 auto copy = r.save();
 
 and instead says:
 
 auto copy = r;
 
 the behavior will indeed be different for class ranges and struct
 ranges.

 
 Or if they completely forgot that bit of convention and omit creating
 a variable called save... Also, doesn't use of save degrade
 performance for structs? Or does the inliner/optimizer remove the
 copy variable altogether?

It may be best to discuss this on an example:

/**
If $(D startsWith(r1, r2)), consume the corresponding elements off $(D
r1) and return $(D true). Otherwise, leave $(D r1) unchanged and
return $(D false).
  */
bool consume(R1, R2)(ref R1 r1, R2 r2)
         if (isForwardRange!R1 && isInputRange!R2)
{
     auto r = r1.save();
     while (!r2.empty && !r.empty && r.front == r2.front) {
         r.popFront();
         r2.popFront();
     }
     if (r2.empty) {
         r1 = r;
         return true;
     }
     return false;
}

For most structs, save() is very simple:

auto save() { return this; }

For classes, save() entails creating a new object:

auto save() { return new typeof(this)(this); }

If the implementor of consume() forgets to call save(), the situation is 
unpleasant albeit not catastrophic: for most struct ranges things will 
continue to work, but for class ranges the function will fail to perform 
to spec. I don't know how to improve on that.

Anyway, it's not entirely a convention. I'll change isForwardRange to 
require the existence of save().


Andrei

Jan 01 2010

Rainer Deyke <rainerd eldwood.com> writes:

Andrei Alexandrescu wrote:
 If the implementor of consume() forgets to call save(), the situation is
 unpleasant albeit not catastrophic: for most struct ranges things will
 continue to work, but for class ranges the function will fail to perform
 to spec. I don't know how to improve on that.

Require that all ranges are structs.  If you want to implement a range
as a class, use a wrapper struct that creates a new object in its
postblit function.  The wrapper struct can be made generic and placed in
the standard library.

Same performance as the current approach, slightly more effort on the
part of the range implementor, much easier and less error-prone on the
side of the range user.


-- 
Rainer Deyke - rainerd eldwood.com

Jan 01 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Rainer Deyke wrote:
 Andrei Alexandrescu wrote:
 If the implementor of consume() forgets to call save(), the situation is
 unpleasant albeit not catastrophic: for most struct ranges things will
 continue to work, but for class ranges the function will fail to perform
 to spec. I don't know how to improve on that.

 
 Require that all ranges are structs.  If you want to implement a range
 as a class, use a wrapper struct that creates a new object in its
 postblit function.  The wrapper struct can be made generic and placed in
 the standard library.

That's a good idea, but it doesn't work with covariant return types. 
Those are needed for the container hierarchy that I'm working on.


Andrei

Jan 01 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Rainer Deyke wrote:
 Andrei Alexandrescu wrote:
 If the implementor of consume() forgets to call save(), the situation is
 unpleasant albeit not catastrophic: for most struct ranges things will
 continue to work, but for class ranges the function will fail to perform
 to spec. I don't know how to improve on that.

 
 Require that all ranges are structs.  If you want to implement a range
 as a class, use a wrapper struct that creates a new object in its
 postblit function.  The wrapper struct can be made generic and placed in
 the standard library.
 
 Same performance as the current approach, slightly more effort on the
 part of the range implementor, much easier and less error-prone on the
 side of the range user.

Oh, besides it doesn't work for struct ranges that iterate one-pass streams.

Andrei

Jan 01 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 01 Jan 2010 18:45:35 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Rainer Deyke wrote:
 Andrei Alexandrescu wrote:
 If the implementor of consume() forgets to call save(), the situation  
 is
 unpleasant albeit not catastrophic: for most struct ranges things will
 continue to work, but for class ranges the function will fail to  
 perform
 to spec. I don't know how to improve on that.

  Require that all ranges are structs.  If you want to implement a range
 as a class, use a wrapper struct that creates a new object in its
 postblit function.  The wrapper struct can be made generic and placed in
 the standard library.
  Same performance as the current approach, slightly more effort on the
 part of the range implementor, much easier and less error-prone on the
 side of the range user.

 Oh, besides it doesn't work for struct ranges that iterate one-pass  
 streams.

What does save do in those cases?

-Steve

Jan 02 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Fri, 01 Jan 2010 18:45:35 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Rainer Deyke wrote:
 Andrei Alexandrescu wrote:
 If the implementor of consume() forgets to call save(), the 
 situation is
 unpleasant albeit not catastrophic: for most struct ranges things will
 continue to work, but for class ranges the function will fail to 
 perform
 to spec. I don't know how to improve on that.

  Require that all ranges are structs.  If you want to implement a range
 as a class, use a wrapper struct that creates a new object in its
 postblit function.  The wrapper struct can be made generic and placed in
 the standard library.
  Same performance as the current approach, slightly more effort on the
 part of the range implementor, much easier and less error-prone on the
 side of the range user.

 Oh, besides it doesn't work for struct ranges that iterate one-pass 
 streams.

 
 What does save do in those cases?
 
 -Steve

It provides a syntactic differentiation between input ranges and forward 
ranges.

STL's input iterators are syntactically indistinguishable from forward 
iterators. Because of that, all STL algorithms that expect forward 
ranges will also compile and run with input ranges, although never with 
the expected result. This has been a lingering problem with C++98, and a 
key motivator for concepts. Since syntactic differences cannot be used 
to distinguish between input and forward iterators, the reasoning went, 
we need something else as a discriminator - and that would be a concept: 
someone who defines an input iterator declares it obeys the input 
iterator concept, whereas someone defining a forward iterator declares 
it obeys the forward iterator concept.

During the meltdown of concepts, a number of people including Bjarne 
Stroustrup and myself have suggested that a simple workable solution 
would be to define an extra function a la "save" that is used in 
algorithms and only supported by forward iterators, but not by input 
iterators. Then, algorithms use save() and will correctly refuse to 
compile calls with input iterators. The remaining risk is that someone 
writes an algorithm and forgets to use save().


Andrei

Jan 02 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sat, 02 Jan 2010 13:06:25 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Fri, 01 Jan 2010 18:45:35 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Rainer Deyke wrote:
 Andrei Alexandrescu wrote:
 If the implementor of consume() forgets to call save(), the  
 situation is
 unpleasant albeit not catastrophic: for most struct ranges things  
 will
 continue to work, but for class ranges the function will fail to  
 perform
 to spec. I don't know how to improve on that.

  Require that all ranges are structs.  If you want to implement a  
 range
 as a class, use a wrapper struct that creates a new object in its
 postblit function.  The wrapper struct can be made generic and placed  
 in
 the standard library.
  Same performance as the current approach, slightly more effort on the
 part of the range implementor, much easier and less error-prone on the
 side of the range user.

 Oh, besides it doesn't work for struct ranges that iterate one-pass  
 streams.

  What does save do in those cases?
  -Steve

 It provides a syntactic differentiation between input ranges and forward  
 ranges.

 STL's input iterators are syntactically indistinguishable from forward  
 iterators. Because of that, all STL algorithms that expect forward  
 ranges will also compile and run with input ranges, although never with  
 the expected result. This has been a lingering problem with C++98, and a  
 key motivator for concepts. Since syntactic differences cannot be used  
 to distinguish between input and forward iterators, the reasoning went,  
 we need something else as a discriminator - and that would be a concept:  
 someone who defines an input iterator declares it obeys the input  
 iterator concept, whereas someone defining a forward iterator declares  
 it obeys the forward iterator concept.

 During the meltdown of concepts, a number of people including Bjarne  
 Stroustrup and myself have suggested that a simple workable solution  
 would be to define an extra function a la "save" that is used in  
 algorithms and only supported by forward iterators, but not by input  
 iterators. Then, algorithms use save() and will correctly refuse to  
 compile calls with input iterators. The remaining risk is that someone  
 writes an algorithm and forgets to use save().

Would it not be as useful then to just define attributes on the type and  
save a bunch of useless/weird looking code?

that is, have an enum value inside the type that defines its state can be  
saved.  It seems to me like save is the equivalent of that anyways, since  
its possible to forget to use it, and still have your code compile.

Basically, it appears to me that save either a) doesn't compile or b) is  
the equivalent of assignment.  It doesn't seem to add much to the  
functionality.

This is all except for classes, which I think have no business being  
ranges in the first place.

-Steve

Jan 02 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Sat, 02 Jan 2010 13:06:25 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 On Fri, 01 Jan 2010 18:45:35 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 Rainer Deyke wrote:
 Andrei Alexandrescu wrote:
 If the implementor of consume() forgets to call save(), the 
 situation is
 unpleasant albeit not catastrophic: for most struct ranges things 
 will
 continue to work, but for class ranges the function will fail to 
 perform
 to spec. I don't know how to improve on that.

  Require that all ranges are structs.  If you want to implement a 
 range
 as a class, use a wrapper struct that creates a new object in its
 postblit function.  The wrapper struct can be made generic and 
 placed in
 the standard library.
  Same performance as the current approach, slightly more effort on the
 part of the range implementor, much easier and less error-prone on the
 side of the range user.

 Oh, besides it doesn't work for struct ranges that iterate one-pass 
 streams.

  What does save do in those cases?
  -Steve

 It provides a syntactic differentiation between input ranges and 
 forward ranges.

 STL's input iterators are syntactically indistinguishable from forward 
 iterators. Because of that, all STL algorithms that expect forward 
 ranges will also compile and run with input ranges, although never 
 with the expected result. This has been a lingering problem with 
 C++98, and a key motivator for concepts. Since syntactic differences 
 cannot be used to distinguish between input and forward iterators, the 
 reasoning went, we need something else as a discriminator - and that 
 would be a concept: someone who defines an input iterator declares it 
 obeys the input iterator concept, whereas someone defining a forward 
 iterator declares it obeys the forward iterator concept.

 During the meltdown of concepts, a number of people including Bjarne 
 Stroustrup and myself have suggested that a simple workable solution 
 would be to define an extra function a la "save" that is used in 
 algorithms and only supported by forward iterators, but not by input 
 iterators. Then, algorithms use save() and will correctly refuse to 
 compile calls with input iterators. The remaining risk is that someone 
 writes an algorithm and forgets to use save().

 
 Would it not be as useful then to just define attributes on the type and 
 save a bunch of useless/weird looking code?
 
 that is, have an enum value inside the type that defines its state can 
 be saved.  It seems to me like save is the equivalent of that anyways, 
 since its possible to forget to use it, and still have your code compile.

If you have an enum that says a range's state can be saved, then you 
still need a function to effect that save :o). So you added, not 
removed, bloat.

 Basically, it appears to me that save either a) doesn't compile or b) is 
 the equivalent of assignment.  It doesn't seem to add much to the 
 functionality.

No, for class ranges save() is nontrivial (a sort of clone). I suspect 
even struct ranges for certain file-oriented stuff would do nontrivial 
work (e.g. open the file anew and fseek() on the current position etc.)

 This is all except for classes, which I think have no business being 
 ranges in the first place.

Well then how would the container hierarchy work? It does need range 
interfaces. How does dcollections deal with that?


Andrei

Jan 02 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sat, 02 Jan 2010 13:51:48 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
  Would it not be as useful then to just define attributes on the type  
 and save a bunch of useless/weird looking code?
  that is, have an enum value inside the type that defines its state can  
 be saved.  It seems to me like save is the equivalent of that anyways,  
 since its possible to forget to use it, and still have your code  
 compile.

 If you have an enum that says a range's state can be saved, then you  
 still need a function to effect that save :o). So you added, not  
 removed, bloat.

The function already exists -- opAssign.  My point is that save would just  
be the same as opAssign in most cases where you'd want to implement save.   
Cases where you'd want to implement save differently than opAssign are  
cases where you wouldn't want to use it in a generic fashion.

 Basically, it appears to me that save either a) doesn't compile or b)  
 is the equivalent of assignment.  It doesn't seem to add much to the  
 functionality.

 No, for class ranges save() is nontrivial (a sort of clone). I suspect  
 even struct ranges for certain file-oriented stuff would do nontrivial  
 work (e.g. open the file anew and fseek() on the current position etc.)

Classes should not be ranges.  And I hope that any algorithm that uses  
savable ranges would not be used on a file range.  For example, your  
consume function:

bool consume(R1, R2)(ref R1 r1, R2 r2)
         if (isForwardRange!R1 && isInputRange!R2)
{
     auto r = r1.save();  // open an extra file, and seek the file to the  
given point *just in case*?!
     while (!r2.empty && !r.empty && r.front == r2.front) {
         r.popFront();
         r2.popFront();
     }
     if (r2.empty) {
         r1 = r;  // note that this unaliases r1 from any other copies (not  
save()'d copies) that were made of it.
                  // also note that you now have opened more handles to the  
same file.  Calling this many times could
                  // consume quite a few handles.
         return true;
     }
     return false;
}

Here is a good exercise that would help clarify what we need: determine  
all the types of ranges you can think of that would need a special save  
function.

 This is all except for classes, which I think have no business being  
 ranges in the first place.

 Well then how would the container hierarchy work? It does need range  
 interfaces. How does dcollections deal with that?

A container is not a range.  A range of a container should not be a  
class.  For dcollections, I was planning on ranges being a struct with a  
begin and end cursor defining the part of the container referenced by the  
range.  Such a range can always be copied and modifying the copy through  
the range functions should not modify the original range.

I see a range as being useful for iteration or algorithms, but not for  
general use.  A great example is AAs.  Would you say that an AA *is* a  
range or should *provide* a range?  If it is a range, does that mean you  
remove elements as you call popFront?  Does that make any sense?  If it  
doesn't, then what happens if you add elements through another alias to  
that AA?

-Steve

Jan 02 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Sat, 02 Jan 2010 13:51:48 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
  Would it not be as useful then to just define attributes on the type 
 and save a bunch of useless/weird looking code?
  that is, have an enum value inside the type that defines its state 
 can be saved.  It seems to me like save is the equivalent of that 
 anyways, since its possible to forget to use it, and still have your 
 code compile.

 If you have an enum that says a range's state can be saved, then you 
 still need a function to effect that save :o). So you added, not 
 removed, bloat.

 
 The function already exists -- opAssign.  My point is that save would 
 just be the same as opAssign in most cases where you'd want to implement 
 save.  Cases where you'd want to implement save differently than 
 opAssign are cases where you wouldn't want to use it in a generic fashion.

You might mean this(this). Either doesn't help because you'd still be 
unable to differentiate between input ranges and forward ranges. Much of 
the point of save() is to mark a syntactic difference between input 
ranges and forward ranges. Input ranges still need this(this) and 
opAssign - those just have different semantics.

 Basically, it appears to me that save either a) doesn't compile or b) 
 is the equivalent of assignment.  It doesn't seem to add much to the 
 functionality.

 No, for class ranges save() is nontrivial (a sort of clone). I suspect 
 even struct ranges for certain file-oriented stuff would do nontrivial 
 work (e.g. open the file anew and fseek() on the current position etc.)

 
 Classes should not be ranges.  And I hope that any algorithm that uses 
 savable ranges would not be used on a file range.  For example, your 
 consume function:
 
 bool consume(R1, R2)(ref R1 r1, R2 r2)
         if (isForwardRange!R1 && isInputRange!R2)
 {
     auto r = r1.save();  // open an extra file, and seek the file to the 
 given point *just in case*?!
     while (!r2.empty && !r.empty && r.front == r2.front) {
         r.popFront();
         r2.popFront();
     }
     if (r2.empty) {
         r1 = r;  // note that this unaliases r1 from any other copies 
 (not save()'d copies) that were made of it.
                  // also note that you now have opened more handles to 
 the same file.  Calling this many times could
                  // consume quite a few handles.
         return true;
     }
     return false;
 }
 
 Here is a good exercise that would help clarify what we need: determine 
 all the types of ranges you can think of that would need a special save 
 function.

A range that defines save() is a forward range. save() creates an 
independent range from its source. The file etc. example was 
hypothetical but realistic.

 This is all except for classes, which I think have no business being 
 ranges in the first place.

 Well then how would the container hierarchy work? It does need range 
 interfaces. How does dcollections deal with that?

 
 A container is not a range.  A range of a container should not be a 
 class.  For dcollections, I was planning on ranges being a struct with a 
 begin and end cursor defining the part of the container referenced by 
 the range.  Such a range can always be copied and modifying the copy 
 through the range functions should not modify the original range.

Struct ranges won't work with a container hierarchy. If you define a 
container hierarchy (classes etc.) you'll also need a range hierarchy. 
Otherwise defining the inheritance relationship is impossible. Consider:

abstract class Container(E) { // most general container
      property bool empty();
     bool add(E element);
     E removeAny();
     InputRange!E opSlice();
}

That's what I'd expect any container worth its salt to implement: (a) 
test for empty; (b) add an element to the container; (c) remove some 
element from the container; (d) get a range that spans the entire 
container. (Probably removeAll() and a range-positioned remove() would 
make sense, too.)

My point is, how do you inherit this guy? Well by taking advantage of 
covariant return types:

abstract class Array(E) : Container!E {
      property bool empty();
     bool add(E element);
     E removeAny();
     RandomAccessRange!E opSlice();
     ... more stuff ...
}

Ergo, RandomAccessRange!E must inherit InputRange!E for covariance to 
kick in. The resulting setup is quite interesting: you either know you 
work with an Array and therefore you get a random-access range, or you 
just use the generic Container and get an input range.

 I see a range as being useful for iteration or algorithms, but not for 
 general use.  A great example is AAs.  Would you say that an AA *is* a 
 range or should *provide* a range?  If it is a range, does that mean you 
 remove elements as you call popFront?  Does that make any sense?  If it 
 doesn't, then what happens if you add elements through another alias to 
 that AA?

An AA provides several ranges - among which byKey and byValue.


Andrei

Jan 02 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sat, 02 Jan 2010 19:51:41 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
  The function already exists -- opAssign.  My point is that save would  
 just be the same as opAssign in most cases where you'd want to  
 implement save.  Cases where you'd want to implement save differently  
 than opAssign are cases where you wouldn't want to use it in a generic  
 fashion.

 You might mean this(this).

Yes, that's what I meant :)  I haven't gotten used to struct constructors.

 Either doesn't help because you'd still be unable to differentiate  
 between input ranges and forward ranges. Much of the point of save() is  
 to mark a syntactic difference between input ranges and forward ranges.  
 Input ranges still need this(this) and opAssign - those just have  
 different semantics.

That was the point of the enum.  It would be a synthetic difference, but  
IMO so is save.  If it turns out that the only usable times you implement  
save all look like:

auto save() { return *this; }

then save gives you nothing.  It's kind of a proof by induction (I think).

You say that algorithm A requires the ability to save the state of a  
range.  So I define a function save on a range which cannot be saved by a  
simple assignment (i.e. a file).  I use A on that range, and the results  
are not what I expect or kill performance or consume unneeded resources,  
I'd rather not use algorithm A on that range, negating the need to define  
a save function in the first place.

I think that's what we're going to end up with.

 A range that defines save() is a forward range. save() creates an  
 independent range from its source. The file etc. example was  
 hypothetical but realistic.

I meant what actual types of ranges, not categories, I don't know how to  
say this better...  i.e. the file range is one type, what are others?   
What I'm looking for is ranges that would otherwise be input ranges unless  
you define save on them.  For example, a file range without save is an  
input range, cannot be a forward range.  But if you define save, it is a  
forward range.  An array is an example of a range that can be a forward  
range without the save function.

My theory is, given this list of ranges, if you pair them with an  
algorithm that requires save capability, you wouldn't want to use that  
algorithm on it anyways (kinda like the consume example).

 Struct ranges won't work with a container hierarchy. If you define a  
 container hierarchy (classes etc.) you'll also need a range hierarchy.  
 Otherwise defining the inheritance relationship is impossible. Consider:

 abstract class Container(E) { // most general container
       property bool empty();
      bool add(E element);
      E removeAny();
      InputRange!E opSlice();
 }

 That's what I'd expect any container worth its salt to implement: (a)  
 test for empty; (b) add an element to the container; (c) remove some  
 element from the container; (d) get a range that spans the entire  
 container. (Probably removeAll() and a range-positioned remove() would  
 make sense, too.)

The range interface is compile-time only, there is no need to define it in  
the container interface.  opSlice does not need to be part of the  
interface, even if it is defined on all the container types.

opApply makes for a much better interface-friendly iteration anyways.

BTW, the primitives in dcollections are:

clear(); // clear all elements
remove(V v); // remove an element
contains(V v); // returns whether an element is contained in the colleciton
length(); // the length of the collection
dup(); // duplicate the collection
opApply(int delegate(ref V v) dg); // iterate the collection
opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the  
collection

That means it covers only empty in your list of must-have functions (via  
length() == 0).  Add is not a primitive because the Map collections  
shouldn't assign some random key to the new element.  removeAny is defined  
only on sets and multisets, but I'm not sure that it couldn't be moved to  
Collection, in fact, I'll probably do that.

Note that it's missing begin and end which are defined on every single  
container type (i.e. the equivalent of the all-elements range).  This is  
because those primitives return a struct that is different for every  
container type.

It also surpasses opSlice via opApply, since all an input range can do  
anyways is iterate.  In fact, opApply is more powerful because you can  
change elements and remove elements (via purging).  Plus it's more  
efficient than a range-via-interface.

 I see a range as being useful for iteration or algorithms, but not for  
 general use.  A great example is AAs.  Would you say that an AA *is* a  
 range or should *provide* a range?  If it is a range, does that mean  
 you remove elements as you call popFront?  Does that make any sense?   
 If it doesn't, then what happens if you add elements through another  
 alias to that AA?

 An AA provides several ranges - among which byKey and byValue.

I misunderstood your statement "[a container hierarchy] does need range  
interfaces."  I thought you meant containers themselves need to implement  
the range interface, I see now that isn't the case, so my bad.

-Steve

Jan 02 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-01-02 21:46:57 -0500, "Steven Schveighoffer" 
<schveiguy yahoo.com> said:

 That was the point of the enum.  It would be a synthetic difference, 
 but  IMO so is save.  If it turns out that the only usable times you 
 implement  save all look like:
 
 auto save() { return *this; }
 
 then save gives you nothing.  It's kind of a proof by induction (I think).
 
 You say that algorithm A requires the ability to save the state of a  
 range.  So I define a function save on a range which cannot be saved by 
 a  simple assignment (i.e. a file).  I use A on that range, and the 
 results  are not what I expect or kill performance or consume unneeded 
 resources,  I'd rather not use algorithm A on that range, negating the 
 need to define  a save function in the first place.
 
 I think that's what we're going to end up with.

This is making me think...

First, opening files silently whenever an algorithm feels the need to 
save its state is just madness. What if the operating system decide to 
pop a security window when opening a restricted file? What if the file 
has been deleted or replaced in its directory while your program is 
still hanging on the old inode?

One might want to save the position in a file in order to be able to 
return there later, but when you return there the last thing you 
actually want to do is to open the file a second time: what you might 
realistically want is to set the position to what it was before, 
calling fseek. So perhaps it would be more useful to have both save() 
to save the current state (the position in a file), and restore() which 
would restore the saved state (returning to the saved position in a 
file).

For ranges with reference semantics -- please call them streams! -- 
save() and restore() would work just as fpos and fseek for files, but 
they might also not be available like in a TCP stream or in a 
communication channel between threads.

For ranges such as array slices, save and restore would just copy the 
range in one or the other direction. The only reason to have them is so 
they can be used as streams.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jan 02 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Michel Fortin wrote:
 On 2010-01-02 21:46:57 -0500, "Steven Schveighoffer" 
 <schveiguy yahoo.com> said:
 
 That was the point of the enum.  It would be a synthetic difference, 
 but  IMO so is save.  If it turns out that the only usable times you 
 implement  save all look like:

 auto save() { return *this; }

 then save gives you nothing.  It's kind of a proof by induction (I 
 think).

 You say that algorithm A requires the ability to save the state of a  
 range.  So I define a function save on a range which cannot be saved 
 by a  simple assignment (i.e. a file).  I use A on that range, and the 
 results  are not what I expect or kill performance or consume unneeded 
 resources,  I'd rather not use algorithm A on that range, negating the 
 need to define  a save function in the first place.

 I think that's what we're going to end up with.

 
 This is making me think...
 
 First, opening files silently whenever an algorithm feels the need to 
 save its state is just madness. What if the operating system decide to 
 pop a security window when opening a restricted file? What if the file 
 has been deleted or replaced in its directory while your program is 
 still hanging on the old inode?
 
 One might want to save the position in a file in order to be able to 
 return there later, but when you return there the last thing you 
 actually want to do is to open the file a second time: what you might 
 realistically want is to set the position to what it was before, calling 
 fseek. So perhaps it would be more useful to have both save() to save 
 the current state (the position in a file), and restore() which would 
 restore the saved state (returning to the saved position in a file).

These are implementation matters. save() could lazily save information, 
duplicate the file handle (cheap on many OSs), etc.

 For ranges with reference semantics -- please call them streams! -- 
 save() and restore() would work just as fpos and fseek for files, but 
 they might also not be available like in a TCP stream or in a 
 communication channel between threads.
 
 For ranges such as array slices, save and restore would just copy the 
 range in one or the other direction. The only reason to have them is so 
 they can be used as streams.

This I don't understand. Array slices' save is:

T[] save(T[] array) { return array; }


Andrei

Jan 02 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-01-03 00:51:46 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 First, opening files silently whenever an algorithm feels the need to 
 save its state is just madness. What if the operating system decide to 
 pop a security window when opening a restricted file? What if the file 
 has been deleted or replaced in its directory while your program is 
 still hanging on the old inode?
 
 One might want to save the position in a file in order to be able to 
 return there later, but when you return there the last thing you 
 actually want to do is to open the file a second time: what you might 
 realistically want is to set the position to what it was before, 
 calling fseek. So perhaps it would be more useful to have both save() 
 to save the current state (the position in a file), and restore() which 
 would restore the saved state (returning to the saved position in a 
 file).

 
 These are implementation matters. save() could lazily save information, 
 duplicate the file handle (cheap on many OSs), etc.




 For ranges with reference semantics -- please call them streams! -- 
 save() and restore() would work just as fpos and fseek for files, but 
 they might also not be available like in a TCP stream or in a 
 communication channel between threads.
 
 For ranges such as array slices, save and restore would just copy the 
 range in one or the other direction. The only reason to have them is so 
 they can be used as streams.

 
 This I don't understand. Array slices' save is:
 
 T[] save(T[] array) { return array; }

The idea is that you would use save and restore like this:

	auto state = range.save();
	// ...
	range.restore(state);

For array slices, save doesn't really have to change, just define restore:

	T[] save(T[] array) { return array; }
	void restore(ref T[]�array, T[] state) { array = state; }

But with file handles:

	size_t save(FILE handle) { return fpos(handle); }
	void restore(FILE handle, size_t state) { fseek(handle, state); }

And for a range interface to be used in a class hierarchy:

	interface Range {
		Variant save();
		void restore(Variant state);
	}

This way you don't have to create a new range every time you need to 
start from a previously saved state, you can reuse the existing one 
which is much more efficient.

Also, the saved state can be anything, of any type, and doesn't need to 
be a range. So when implementing a non-trivial range you don't need to 
implement lazy initialization logic and all sort of complicated stuff 
just in case someone might save the state.

Note that with save defined like this you can't really duplicate the 
range in the general case. I'm not sure if this is good or bad. How 
many algorithms really require you to duplicate a range?

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jan 03 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 My theory is, given this list of ranges, if you pair them with an 
 algorithm that requires save capability, you wouldn't want to use that 
 algorithm on it anyways (kinda like the consume example).

Why gratuitously limit the design? You're asking to replace this:

R save() { return this; }

with:

enum thisIsAForwardRange = true;

Is there a reason? The former leaves in flexibility. The latter doesn't, 
for no good reason.

 Struct ranges won't work with a container hierarchy. If you define a 
 container hierarchy (classes etc.) you'll also need a range hierarchy. 
 Otherwise defining the inheritance relationship is impossible. Consider:

 abstract class Container(E) { // most general container
       property bool empty();
      bool add(E element);
      E removeAny();
      InputRange!E opSlice();
 }

 That's what I'd expect any container worth its salt to implement: (a) 
 test for empty; (b) add an element to the container; (c) remove some 
 element from the container; (d) get a range that spans the entire 
 container. (Probably removeAll() and a range-positioned remove() would 
 make sense, too.)

 
 The range interface is compile-time only, there is no need to define it 
 in the container interface.  opSlice does not need to be part of the 
 interface, even if it is defined on all the container types.
 
 opApply makes for a much better interface-friendly iteration anyways.

I want container users to be able to save iteration state and also to 
open up containers to std.algorithm and other goodies, so I'm shying 
away from opApply.

 BTW, the primitives in dcollections are:
 
 clear(); // clear all elements
 remove(V v); // remove an element

Search and remove? That's an odd primitive. Why wouldn't you offer an 
interface for iteration that allows an algorithm for search, and a 
primitive for positioned removal?

 contains(V v); // returns whether an element is contained in the colleciton

I don't think this belongs to primitives. It's O(n) for many containers 
and again it's a generic algorithm, not a member.

 length(); // the length of the collection

That's not a primitive either. std.algorithm.walkLength is. For me, all 
sorts of red flags and alarm buzzers go off when primitives are 
guaranteed that can't be implemented efficiently but by a subset of 
containers. You can't discount complexity as a implementation detail.

 dup(); // duplicate the collection
 opApply(int delegate(ref V v) dg); // iterate the collection
 opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the 
 collection
 
 That means it covers only empty in your list of must-have functions (via 
 length() == 0).

How do you implement length() for a singly-linked list? Is empty() going 
to take O(n)?

 Add is not a primitive because the Map collections 
 shouldn't assign some random key to the new element.  removeAny is 
 defined only on sets and multisets, but I'm not sure that it couldn't be 
 moved to Collection, in fact, I'll probably do that.

add is a primitive that takes Tuple!(K, V) as the element type.

 Note that it's missing begin and end which are defined on every single 
 container type (i.e. the equivalent of the all-elements range).  This is 
 because those primitives return a struct that is different for every 
 container type.

So you can't write container-independent iteration code unless you use 
opApply, in which case composition becomes tenuous.

 It also surpasses opSlice via opApply, since all an input range can do 
 anyways is iterate.  In fact, opApply is more powerful because you can 
 change elements and remove elements (via purging).  Plus it's more 
 efficient than a range-via-interface.

An input range is a subset of other (more powerful) ranges. It's also 
much more flexible. I agree that calling range primitives via interfaces 
can become an efficiency liability.

 I see a range as being useful for iteration or algorithms, but not 
 for general use.  A great example is AAs.  Would you say that an AA 
 *is* a range or should *provide* a range?  If it is a range, does 
 that mean you remove elements as you call popFront?  Does that make 
 any sense?  If it doesn't, then what happens if you add elements 
 through another alias to that AA?

 An AA provides several ranges - among which byKey and byValue.

 
 I misunderstood your statement "[a container hierarchy] does need range 
 interfaces."  I thought you meant containers themselves need to 
 implement the range interface, I see now that isn't the case, so my bad.

Yah, they'd offer it as a result of opSlice(). Covariant return types 
will ensure there's no virtual call when you know what container you 
operate on.

Above all: the primitive set for a container must be a small set of 
functions that (a) can be implemented by all containers within 
reasonable efficiency bounds, and (b) are container-specific, not 
generic. IMHO any container design that defines a search(Element) as a 
primitive is misguided. Searching is not a container primitive - it's an 
algorithm. Only more specialized containers (e.g. trees, hashes etc.) 
can afford to define search(Element) as a primitive. Linear search is a 
generic algorithm that works the same for everyone. It does not belong 
as a method of any container.


Andrei

Jan 02 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sun, 03 Jan 2010 00:49:08 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 My theory is, given this list of ranges, if you pair them with an  
 algorithm that requires save capability, you wouldn't want to use that  
 algorithm on it anyways (kinda like the consume example).

 Why gratuitously limit the design? You're asking to replace this:

 R save() { return this; }

 with:

 enum thisIsAForwardRange = true;

 Is there a reason? The former leaves in flexibility. The latter doesn't,  
 for no good reason.

Well, one thing you could do is:

enum thisIsAnInputRange = true;

and then no special implementation is needed for normal forward ranges.   
the other point is there is no special treatment needed inside algorithms  
-- the risk of forgetting to use save at the right points of the algorithm  
is higher than forgetting to say isForwardRange!(R) at the beginning of  
the function.

 Struct ranges won't work with a container hierarchy. If you define a  
 container hierarchy (classes etc.) you'll also need a range hierarchy.  
 Otherwise defining the inheritance relationship is impossible.  
 Consider:

 abstract class Container(E) { // most general container
       property bool empty();
      bool add(E element);
      E removeAny();
      InputRange!E opSlice();
 }

 That's what I'd expect any container worth its salt to implement: (a)  
 test for empty; (b) add an element to the container; (c) remove some  
 element from the container; (d) get a range that spans the entire  
 container. (Probably removeAll() and a range-positioned remove() would  
 make sense, too.)

  The range interface is compile-time only, there is no need to define  
 it in the container interface.  opSlice does not need to be part of the  
 interface, even if it is defined on all the container types.
  opApply makes for a much better interface-friendly iteration anyways.

 I want container users to be able to save iteration state and also to  
 open up containers to std.algorithm and other goodies, so I'm shying  
 away from opApply.

Not having opSlice be part of the interface itself does not preclude it  
 from implementing opSlice, and does not preclude using ranges of it in  
std.algorithm.  If I'm not mistaken, all functions in std.algorithm rely  
on compile time interfaces.  opApply allows for full input range  
functionality for things like copying and outputting where templates may  
not be desired.

 BTW, the primitives in dcollections are:
  clear(); // clear all elements
 remove(V v); // remove an element

 Search and remove? That's an odd primitive. Why wouldn't you offer an  
 interface for iteration that allows an algorithm for search, and a  
 primitive for positioned removal?

Search and positioned removal are also a primitives, but not defined on  
the interface.  remove was a primitive on Tango's containers, and  
dcollections was originally meant to be a replacement for Tango's  
containers.

I think the point is, if you have an interface reference, what would be  
the minimum functionality needed so that you could use a container without  
knowing its implementation.

 contains(V v); // returns whether an element is contained in the  
 colleciton

 I don't think this belongs to primitives. It's O(n) for many containers  
 and again it's a generic algorithm, not a member.

Again, it's part of the minimal usable interface.  It's not a generic  
algorithm, because some containers can implement this more efficiently.   
Plus, to use the generic algorithms, you would need to use interfaces as  
ranges which I think are completely useless.

 length(); // the length of the collection

 That's not a primitive either. std.algorithm.walkLength is. For me, all  
 sorts of red flags and alarm buzzers go off when primitives are  
 guaranteed that can't be implemented efficiently but by a subset of  
 containers. You can't discount complexity as a implementation detail.

All current dcollection containers have O(1) length.

 dup(); // duplicate the collection
 opApply(int delegate(ref V v) dg); // iterate the collection
 opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the  
 collection
  That means it covers only empty in your list of must-have functions  
 (via length() == 0).

 How do you implement length() for a singly-linked list? Is empty() going  
 to take O(n)?

first, dcollections' list implementation is doubly linked because all  
collections are forward and backward iterable.

second, even for singly linked lists, you can have either O(1) length or  
O(1) splicing (consuming a link list range into another linked list).   
Dcollections' default link list implementation uses O(1) length since I  
think splicing is a specialized requirement.

 Add is not a primitive because the Map collections shouldn't assign  
 some random key to the new element.  removeAny is defined only on sets  
 and multisets, but I'm not sure that it couldn't be moved to  
 Collection, in fact, I'll probably do that.

 add is a primitive that takes Tuple!(K, V) as the element type.

How do you define that on Container(V)?  on Map(K, V), set(K k, V v) is an  
interface method.

what you can do is define Map(K, V) as inheriting Container(Tuple!(K, V)),  
but then trying to use the container functions are very cumbersome.  In  
dcollections, Map(K, V) inherits Collection(V).

 Note that it's missing begin and end which are defined on every single  
 container type (i.e. the equivalent of the all-elements range).  This  
 is because those primitives return a struct that is different for every  
 container type.

 So you can't write container-independent iteration code unless you use  
 opApply, in which case composition becomes tenuous.

No, you can easily write container-independent iteration as long as you  
have the implementation.

If you use interfaces you can write opApply wrappers to do different  
things.  I'm not sure what you mean by composition.

 It also surpasses opSlice via opApply, since all an input range can do  
 anyways is iterate.  In fact, opApply is more powerful because you can  
 change elements and remove elements (via purging).  Plus it's more  
 efficient than a range-via-interface.

 An input range is a subset of other (more powerful) ranges. It's also  
 much more flexible. I agree that calling range primitives via interfaces  
 can become an efficiency liability.

How is it more flexible?  You can't replace data, and you can't remove  
data while iterating, both of which are possible with dcollection's  
primitives.  If you have a Container(E) which defines InputRange!E  
opSlice, how do you get at the more defined range definition?  casting?

 I see a range as being useful for iteration or algorithms, but not  
 for general use.  A great example is AAs.  Would you say that an AA  
 *is* a range or should *provide* a range?  If it is a range, does  
 that mean you remove elements as you call popFront?  Does that make  
 any sense?  If it doesn't, then what happens if you add elements  
 through another alias to that AA?

 An AA provides several ranges - among which byKey and byValue.

  I misunderstood your statement "[a container hierarchy] does need  
 range interfaces."  I thought you meant containers themselves need to  
 implement the range interface, I see now that isn't the case, so my bad.

 Yah, they'd offer it as a result of opSlice(). Covariant return types  
 will ensure there's no virtual call when you know what container you  
 operate on.

Not having opSlice on the interface guarantees you never have a virtual  
call for iteration :)  opApply mitigates the virtual call on the interface.

 Above all: the primitive set for a container must be a small set of  
 functions that (a) can be implemented by all containers within  
 reasonable efficiency bounds, and (b) are container-specific, not  
 generic. IMHO any container design that defines a search(Element) as a  
 primitive is misguided. Searching is not a container primitive - it's an  
 algorithm. Only more specialized containers (e.g. trees, hashes etc.)  
 can afford to define search(Element) as a primitive. Linear search is a  
 generic algorithm that works the same for everyone. It does not belong  
 as a method of any container.

If the minimal container design isn't usable without std.algorithm, then I  
don't think it's worth having interfaces.  If you need std.algorithm, you  
need the full implementation of the container because it's a compile-time  
interface.  Interface ranges are something that should be avoided, it's  
like having a programming language where everything has to be a class.

What you are saying seems completely incorrect to me: "since not all  
containers can implement fast search, I'm going to guarantee that *all*  
containers use a linear search via their interface.  *AND* I want to make  
each loop in the search algorithm call 3 virtual functions!"  How is that  
better than a search function that guarantees linear performance but gives  
the option of being as fast as possible with no loop virtual calls?  In  
fact, my implementation is guaranteed to be faster than std.algorithm  
because of the lack of virtual calls.

-Steve

Jan 03 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Sun, 03 Jan 2010 00:49:08 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 My theory is, given this list of ranges, if you pair them with an 
 algorithm that requires save capability, you wouldn't want to use 
 that algorithm on it anyways (kinda like the consume example).

 Why gratuitously limit the design? You're asking to replace this:

 R save() { return this; }

 with:

 enum thisIsAForwardRange = true;

 Is there a reason? The former leaves in flexibility. The latter 
 doesn't, for no good reason.

 
 Well, one thing you could do is:
 
 enum thisIsAnInputRange = true;
 
 and then no special implementation is needed for normal forward ranges.  
 the other point is there is no special treatment needed inside 
 algorithms -- the risk of forgetting to use save at the right points of 
 the algorithm is higher than forgetting to say isForwardRange!(R) at the 
 beginning of the function.

isForwardRange will be defined to yield true if and only if the range 
defines save. But I see you point - user code only asserts 
isForwardRange and then does not bother to use save(), just copies stuff 
around in confidence that copying does the right thing.

Thanks for this insight. I don't see how to reconcile that with class 
ranges and covariance.

 Not having opSlice be part of the interface itself does not preclude it 
 from implementing opSlice, and does not preclude using ranges of it in 
 std.algorithm.  If I'm not mistaken, all functions in std.algorithm rely 
 on compile time interfaces.  opApply allows for full input range 
 functionality for things like copying and outputting where templates may 
 not be desired.

The point is how much container-independent code can someone write by 
using the Container interface. If all you have is a Container, you can't 
use it with any range algorithm.

 BTW, the primitives in dcollections are:
  clear(); // clear all elements
 remove(V v); // remove an element

 Search and remove? That's an odd primitive. Why wouldn't you offer an 
 interface for iteration that allows an algorithm for search, and a 
 primitive for positioned removal?

 
 Search and positioned removal are also a primitives, but not defined on 
 the interface.  remove was a primitive on Tango's containers, and 
 dcollections was originally meant to be a replacement for Tango's 
 containers.
 
 I think the point is, if you have an interface reference, what would be 
 the minimum functionality needed so that you could use a container 
 without knowing its implementation.

Yes, and I think remove(V v) does not belong to the minimum 
functionality. It combines two functions (search and remove) and raises 
questions such as what happens to duplicate elements.

 contains(V v); // returns whether an element is contained in the 
 colleciton

 I don't think this belongs to primitives. It's O(n) for many 
 containers and again it's a generic algorithm, not a member.

 
 Again, it's part of the minimal usable interface.  It's not a generic 
 algorithm, because some containers can implement this more efficiently.

But this is exactly what I believe to be a mistake: you are abstracting 
away algorithmic complexity.

 Plus, to use the generic algorithms, you would need to use interfaces as 
 ranges which I think are completely useless.

Why?

 length(); // the length of the collection

 That's not a primitive either. std.algorithm.walkLength is. For me, 
 all sorts of red flags and alarm buzzers go off when primitives are 
 guaranteed that can't be implemented efficiently but by a subset of 
 containers. You can't discount complexity as a implementation detail.

 
 All current dcollection containers have O(1) length.

Some containers can't define O(1) length conveniently.

 dup(); // duplicate the collection
 opApply(int delegate(ref V v) dg); // iterate the collection
 opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the 
 collection
  That means it covers only empty in your list of must-have functions 
 (via length() == 0).

 How do you implement length() for a singly-linked list? Is empty() 
 going to take O(n)?

 
 first, dcollections' list implementation is doubly linked because all 
 collections are forward and backward iterable.
 
 second, even for singly linked lists, you can have either O(1) length or 
 O(1) splicing (consuming a link list range into another linked list).  
 Dcollections' default link list implementation uses O(1) length since I 
 think splicing is a specialized requirement.

Right. The question is how much pressure Container is putting on the 
implementation. I'd rather leave it to the list implementation to decide 
to store the length or not.

 Add is not a primitive because the Map collections shouldn't assign 
 some random key to the new element.  removeAny is defined only on 
 sets and multisets, but I'm not sure that it couldn't be moved to 
 Collection, in fact, I'll probably do that.

 add is a primitive that takes Tuple!(K, V) as the element type.

 
 How do you define that on Container(V)?  on Map(K, V), set(K k, V v) is 
 an interface method.

Map!(K, V) has Tuple!(K, V) as its element type.

 what you can do is define Map(K, V) as inheriting Container(Tuple!(K, 
 V)), but then trying to use the container functions are very 
 cumbersome.  In dcollections, Map(K, V) inherits Collection(V).
 
 Note that it's missing begin and end which are defined on every 
 single container type (i.e. the equivalent of the all-elements 
 range).  This is because those primitives return a struct that is 
 different for every container type.

 So you can't write container-independent iteration code unless you use 
 opApply, in which case composition becomes tenuous.

 
 No, you can easily write container-independent iteration as long as you 
 have the implementation.

In this context: container-independent = using the Container interface. 
This is the whole purpose of creating a container hierarchy. If the 
container design fosters knowing the implementation, maybe a class 
hierarchy is not the right choice in the first place.

 If you use interfaces you can write opApply wrappers to do different 
 things.  I'm not sure what you mean by composition.

For example, compose ranges a la retro or take.

 It also surpasses opSlice via opApply, since all an input range can 
 do anyways is iterate.  In fact, opApply is more powerful because you 
 can change elements and remove elements (via purging).  Plus it's 
 more efficient than a range-via-interface.

 An input range is a subset of other (more powerful) ranges. It's also 
 much more flexible. I agree that calling range primitives via 
 interfaces can become an efficiency liability.

 
 How is it more flexible?  You can't replace data, and you can't remove 
 data while iterating, both of which are possible with dcollection's 
 primitives.  If you have a Container(E) which defines InputRange!E 
 opSlice, how do you get at the more defined range definition?  casting?

You can replace data by assigning to range's elements. Removal is done 
via positioned remove (which I think needs to be a primitive).

 I see a range as being useful for iteration or algorithms, but not 
 for general use.  A great example is AAs.  Would you say that an AA 
 *is* a range or should *provide* a range?  If it is a range, does 
 that mean you remove elements as you call popFront?  Does that make 
 any sense?  If it doesn't, then what happens if you add elements 
 through another alias to that AA?

 An AA provides several ranges - among which byKey and byValue.

  I misunderstood your statement "[a container hierarchy] does need 
 range interfaces."  I thought you meant containers themselves need to 
 implement the range interface, I see now that isn't the case, so my bad.

 Yah, they'd offer it as a result of opSlice(). Covariant return types 
 will ensure there's no virtual call when you know what container you 
 operate on.

 
 Not having opSlice on the interface guarantees you never have a virtual 
 call for iteration :)  opApply mitigates the virtual call on the interface.

And takes away the ability to compose ranges and to use algorithms with 
the container.

 Above all: the primitive set for a container must be a small set of 
 functions that (a) can be implemented by all containers within 
 reasonable efficiency bounds, and (b) are container-specific, not 
 generic. IMHO any container design that defines a search(Element) as a 
 primitive is misguided. Searching is not a container primitive - it's 
 an algorithm. Only more specialized containers (e.g. trees, hashes 
 etc.) can afford to define search(Element) as a primitive. Linear 
 search is a generic algorithm that works the same for everyone. It 
 does not belong as a method of any container.

 
 If the minimal container design isn't usable without std.algorithm, then 
 I don't think it's worth having interfaces.

Why?

I think the other way: if the minimal container design is unusable with 
std.algorithm, the design took a wrong turn somewhere.

 If you need std.algorithm, 
 you need the full implementation of the container because it's a 
 compile-time interface.

How did you reach that conclusion? std.algorithm uses a syntactic 
interface that is obeyed by interfaces too. There's no problem.

 Interface ranges are something that should be 
 avoided, it's like having a programming language where everything has to 
 be a class.

I disagree. The negation of an implementation dogma can be just as 
limiting as the dogma. The way I see it, a design defines some 
desiderata. Then it does whatever it takes to fulfill them.

If one desideratum is to use a class hierachy to write 
container-independent code, then interface ranges naturally follow. 
There's no ifs and buts about it.

 What you are saying seems completely incorrect to me: "since not all 
 containers can implement fast search, I'm going to guarantee that *all* 
 containers use a linear search via their interface.

This is a misunderstanding. In the STL linear containers don't define 
find(), but associative containers do. That is the correct design.

 *AND* I want to 
 make each loop in the search algorithm call 3 virtual functions!"

Not necessarily. This happens only if you use the Container interface to 
write container-independent code. It is the cost it takes for the design 
to fulfill its desiderata.

 How 
 is that better than a search function that guarantees linear performance 
 but gives the option of being as fast as possible with no loop virtual 
 calls?

It is better because it doesn't write off search complexity as an 
implementation detail. "Linear or better" is not a good guarantee. A 
good guarantee is "From this node of the hierarchy down, this primitive 
is defined to guarantee O(log n) search". Linear search is a generic 
algorithm and does not belong to any container.


Andrei

Jan 03 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sun, 03 Jan 2010 09:25:25 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:

 Not having opSlice be part of the interface itself does not preclude it  
 from implementing opSlice, and does not preclude using ranges of it in  
 std.algorithm.  If I'm not mistaken, all functions in std.algorithm  
 rely on compile time interfaces.  opApply allows for full input range  
 functionality for things like copying and outputting where templates  
 may not be desired.

 The point is how much container-independent code can someone write by  
 using the Container interface. If all you have is a Container, you can't  
 use it with any range algorithm.

The answer is, you don't.  Ranges via interfaces are slower than primitive  
functions defined on the interface, so use what's best for interfaces when  
you have an interface and what's best for compile-time optimization when  
you have the full implementation.

 BTW, the primitives in dcollections are:
  clear(); // clear all elements
 remove(V v); // remove an element

 Search and remove? That's an odd primitive. Why wouldn't you offer an  
 interface for iteration that allows an algorithm for search, and a  
 primitive for positioned removal?

  Search and positioned removal are also a primitives, but not defined  
 on the interface.  remove was a primitive on Tango's containers, and  
 dcollections was originally meant to be a replacement for Tango's  
 containers.
  I think the point is, if you have an interface reference, what would  
 be the minimum functionality needed so that you could use a container  
 without knowing its implementation.

 Yes, and I think remove(V v) does not belong to the minimum  
 functionality. It combines two functions (search and remove) and raises  
 questions such as what happens to duplicate elements.

the function removes one value if it is in the container, zero if it is  
not.

Another primitive interface Multi(V) adds a removeAll(V v) function.

Again, it combines two functions that are awkward or difficult to  
implement using ranges via interfaces.

 contains(V v); // returns whether an element is contained in the  
 colleciton

 I don't think this belongs to primitives. It's O(n) for many  
 containers and again it's a generic algorithm, not a member.

  Again, it's part of the minimal usable interface.  It's not a generic  
 algorithm, because some containers can implement this more efficiently.

 But this is exactly what I believe to be a mistake: you are abstracting  
 away algorithmic complexity.

Not really, I say right in the docs that contains(V) can take O(n) time.   
There's no abstraction.  I don't say that the algorithmic complexity is  
defined by the implementation.  It's no different than std.algorithm's  
find.

 Plus, to use the generic algorithms, you would need to use interfaces  
 as ranges which I think are completely useless.

 Why?

3 virtual calls per iteration, no possibility to inline, and reference  
copy semantics.  The latter is the biggest problem for me.  I want to be  
able to save ranges (iterators) for later use on containers, and having to  
call to the heap for each save is a deal killer.

 length(); // the length of the collection

 That's not a primitive either. std.algorithm.walkLength is. For me,  
 all sorts of red flags and alarm buzzers go off when primitives are  
 guaranteed that can't be implemented efficiently but by a subset of  
 containers. You can't discount complexity as a implementation detail.

  All current dcollection containers have O(1) length.

 Some containers can't define O(1) length conveniently.

length is actually defined to return ~0 if it cannot compute the length  
quickly :)  Forgot to mention that detail.  So such containers have an  
"out" so to say.

In fact, length is part of a different primitive I defined: Iterator(V)

Iterator(V) defines length and opApply and can be implemented by any  
class, not just containers.  My original goal when I planned dcollections  
for tango was for things like LineInput file readers and such to implement  
the Iterator interface.  That way you could do cool stuff like:

container.add(new LineInput("file.txt")); // add all the lines from the  
file

but it didn't come to fruition, so Iterator is only implemented on  
dcollections...

 dup(); // duplicate the collection
 opApply(int delegate(ref V v) dg); // iterate the collection
 opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the  
 collection
  That means it covers only empty in your list of must-have functions  
 (via length() == 0).

 How do you implement length() for a singly-linked list? Is empty()  
 going to take O(n)?

  first, dcollections' list implementation is doubly linked because all  
 collections are forward and backward iterable.
  second, even for singly linked lists, you can have either O(1) length  
 or O(1) splicing (consuming a link list range into another linked  
 list).  Dcollections' default link list implementation uses O(1) length  
 since I think splicing is a specialized requirement.

 Right. The question is how much pressure Container is putting on the  
 implementation. I'd rather leave it to the list implementation to decide  
 to store the length or not.

That is possible with my design.  I realize I didn't fully disclose the  
nature of length before.  But of course, there aren't any containers in  
dcollections that *don't* define length.

However, you make it sound like there are so many containers that would  
want to not store the length.  There is only one -- linked lists, and only  
with special requirements (O(1) splicing)

I guess this all goes back to how empty is to be composed.  In  
dcollections, all containers implement O(1) length, so that is not an  
issue.  If you had a specialized linked list implementation, I'm not sure  
how you could compose it with my primitives.  But I don't consider it to  
be a worthwhile deficiency to fix.  most of the time, you care if  
something is empty because you are about to take an element, or iterate  
it.  Just call the thing you want to do, and there are ways to check if it  
didn't return anything.  In any case, empty could be defined as a  
primitive if necessary (in all dcollections it would be written return  
length == 0).

 Add is not a primitive because the Map collections shouldn't assign  
 some random key to the new element.  removeAny is defined only on  
 sets and multisets, but I'm not sure that it couldn't be moved to  
 Collection, in fact, I'll probably do that.

 add is a primitive that takes Tuple!(K, V) as the element type.

  How do you define that on Container(V)?  on Map(K, V), set(K k, V v)  
 is an interface method.

 Map!(K, V) has Tuple!(K, V) as its element type.

That makes things awkward.  For example, to remove, you must remove the  
K-V pair.  typically you only want to specify the K or the V.  I realize  
it doesn't make things awkward for your container interface since you  
don't *define* a remove function, but your container interface isn't very  
useful as a general container.

 Note that it's missing begin and end which are defined on every  
 single container type (i.e. the equivalent of the all-elements  
 range).  This is because those primitives return a struct that is  
 different for every container type.

 So you can't write container-independent iteration code unless you use  
 opApply, in which case composition becomes tenuous.

  No, you can easily write container-independent iteration as long as  
 you have the implementation.

 In this context: container-independent = using the Container interface.  
 This is the whole purpose of creating a container hierarchy. If the  
 container design fosters knowing the implementation, maybe a class  
 hierarchy is not the right choice in the first place.

 If you use interfaces you can write opApply wrappers to do different  
 things.  I'm not sure what you mean by composition.

 For example, compose ranges a la retro or take.

how do you compose retro on an input range?  that's all you got with your  
container interface.  It's the same for opApply.

take is simple:

class TakeIterator(V) : Iterator(V)
{
    private Iterator!V src;
    private uint nelems;
    public this(Iterator!V src, uint nelems) { this.src = src; this.nelems  
= nelems;}
    public int opApply(int delegate(ref V v) dg)
    {
       uint elems = 0;
       int result = 0;
       foreach(ref v; src)
       {
           if(elems++ > nelems || (result = dg(v)) != 0)
		break;
       }
       return result;
    }
    public uint length()
    {
       uint len = src.length();
       return len == NO_LENGTH_SUPPORT ? NO_LENGTH_SUPPORT : len < nelems ?  
len : nelems;
    }
}


 It also surpasses opSlice via opApply, since all an input range can  
 do anyways is iterate.  In fact, opApply is more powerful because you  
 can change elements and remove elements (via purging).  Plus it's  
 more efficient than a range-via-interface.

 An input range is a subset of other (more powerful) ranges. It's also  
 much more flexible. I agree that calling range primitives via  
 interfaces can become an efficiency liability.

  How is it more flexible?  You can't replace data, and you can't remove  
 data while iterating, both of which are possible with dcollection's  
 primitives.  If you have a Container(E) which defines InputRange!E  
 opSlice, how do you get at the more defined range definition?  casting?

 You can replace data by assigning to range's elements. Removal is done  
 via positioned remove (which I think needs to be a primitive).

That is available in dcollections, just not part of the interface.  Every  
container implements remove(cursor) function, but since cursors are  
implementation specific, it can't be part of the interface.

  Not having opSlice on the interface guarantees you never have a  
 virtual call for iteration :)  opApply mitigates the virtual call on  
 the interface.

 And takes away the ability to compose ranges and to use algorithms with  
 the container.

As long as you only have the interface and not the actual container.  I  
have no problem with that.

 Above all: the primitive set for a container must be a small set of  
 functions that (a) can be implemented by all containers within  
 reasonable efficiency bounds, and (b) are container-specific, not  
 generic. IMHO any container design that defines a search(Element) as a  
 primitive is misguided. Searching is not a container primitive - it's  
 an algorithm. Only more specialized containers (e.g. trees, hashes  
 etc.) can afford to define search(Element) as a primitive. Linear  
 search is a generic algorithm that works the same for everyone. It  
 does not belong as a method of any container.

  If the minimal container design isn't usable without std.algorithm,  
 then I don't think it's worth having interfaces.

 Why?

If I have to pull out std.algorithm to do simple things like remove a  
specific element from a container, where std.algorithm may not do the most  
efficient thing, then why the hell have an interface in the first place?   
If I have an interface, I want to make it do all the things all containers  
can do, not delegate it to some other part of the library.  I trust that  
the interface knows how to do container functions in the best way possible.

 I think the other way: if the minimal container design is unusable with  
 std.algorithm, the design took a wrong turn somewhere.

If the OO *interface* does not support std.algorithm, then that's good by  
me, seeing as how you have to use non-inlinable virtual functions to do  
simple tasks on ranges that cannot be copied without allocating on the  
heap.  I think absolutely containers should support std.algorithm via the  
method std.algorithm is usable by any other range -- compile-time  
interfaces.

 If you need std.algorithm, you need the full implementation of the  
 container because it's a compile-time interface.

 How did you reach that conclusion? std.algorithm uses a syntactic  
 interface that is obeyed by interfaces too. There's no problem.

It's a bastardized hack to use interfaces with std.algorithm.  I think  
it's as useful as functional qsort.  Yeah the big-O is the same, but the  
implementation sucks.

 Interface ranges are something that should be avoided, it's like having  
 a programming language where everything has to be a class.

 I disagree. The negation of an implementation dogma can be just as  
 limiting as the dogma. The way I see it, a design defines some  
 desiderata. Then it does whatever it takes to fulfill them.

You're starting to get too smart for me here :)  I have no idea what  
disrderata means.

 If one desideratum is to use a class hierachy to write  
 container-independent code, then interface ranges naturally follow.  
 There's no ifs and buts about it.

The container hierarchy can support two *different* paradigms.  First, the  
paradigm of runtime interfaces which may be useful for using containers to  
store pieces of data to go with an object.  Such as a socket cache that  
maps ip addresses to sockets.  This cache cares nothing about sorting or  
doing retro iteration on the socket cache.  It cares not about the  
implementation of the map, just that the map does map-like things (i.e.  
assign this key to this value, remove this key, etc.).

The other paradigm is to have access to std.algorithm in the most  
efficient way possible.  Such access requires full implementation  
knowledge to make the most efficient implementation of algorithms.  In  
fact, containers can *use* std.algorithm underneath their member functions  
if so desired.

 What you are saying seems completely incorrect to me: "since not all  
 containers can implement fast search, I'm going to guarantee that *all*  
 containers use a linear search via their interface.

 This is a misunderstanding. In the STL linear containers don't define  
 find(), but associative containers do. That is the correct design.

But STL isn't accessed without the full implementation, you are comparing  
apples to oranges here.

You said that find is something that should be relegated to  
std.algorithm.  Isn't std.algorithm's find a linear search?  If I have a  
Container!E, don't I need to use std.algorithm to search for an element  
with it's returned range?  How is this not forcing a linear search when  
using an interface to a container?  What is the answer to that, don't  
search for an element in a container unless you have the full  
implementation?  That works in dcollections too!

 *AND* I want to make each loop in the search algorithm call 3 virtual  
 functions!"

 Not necessarily. This happens only if you use the Container interface to  
 write container-independent code. It is the cost it takes for the design  
 to fulfill its desiderata.

Is that not the case we are discussing?

 How is that better than a search function that guarantees linear  
 performance but gives the option of being as fast as possible with no  
 loop virtual calls?

 It is better because it doesn't write off search complexity as an  
 implementation detail. "Linear or better" is not a good guarantee. A  
 good guarantee is "From this node of the hierarchy down, this primitive  
 is defined to guarantee O(log n) search". Linear search is a generic  
 algorithm and does not belong to any container.

That is available via the full implementation.  With the interface, it's  
the "best you can do", because that's all that is available.  "linear or  
better" is a better guarantee than "linear with guaranteed 3 no-inlining  
virtual function calls per element."

-Steve

Jan 03 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Sun, 03 Jan 2010 09:25:25 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:

 Not having opSlice be part of the interface itself does not preclude 
 it from implementing opSlice, and does not preclude using ranges of 
 it in std.algorithm.  If I'm not mistaken, all functions in 
 std.algorithm rely on compile time interfaces.  opApply allows for 
 full input range functionality for things like copying and outputting 
 where templates may not be desired.

 The point is how much container-independent code can someone write by 
 using the Container interface. If all you have is a Container, you 
 can't use it with any range algorithm.

 
 The answer is, you don't.  Ranges via interfaces are slower than 
 primitive functions defined on the interface, so use what's best for 
 interfaces when you have an interface and what's best for compile-time 
 optimization when you have the full implementation.

I understand we don't agree on this. To me, making containers work with 
algorithms is a drop-dead requirement so I will rest my case.

Nevertheless, I think there's one point that got missed. It's a tad 
subtle, and I find it pretty cool because it's the first time I used 
covariant returns for something interesting. Consider:

interface InputIter(T) { ... }
interface ForwardIter(T) : InputIter!T { ... }

class Container(T) {
     InputIter!T opSlice() { ... }
}

class Array(T) : Container(T) {
     class Iterator : ForwardIter!T {
	... all final functions ...
     }
     Iterator opSlice();
}

Now there are two use cases:

a) The user uses Array specifically.

auto a = new Array!int;
sort(a[]);

In this case, sort gets a range of type Array!(int).Iterator, which 
defines final primitives. Therefore, the compiler does not emit ONE 
virtual call throughout. I believe this point was lost.

b) The user wants generality and OOP-style so uses Container throughout:

Container!int a = new Array!int;
auto found = find(a[], 5);

This time find gets an InputIter!int, so it will use virtual calls for 
iteration.

The beauty of this design is that without any code duplication it 
clearly spans the spectrum between static knowledge and dynamic 
flexibility - within one design and one implementation. This is the 
design I want to pursue.

 Yes, and I think remove(V v) does not belong to the minimum 
 functionality. It combines two functions (search and remove) and 
 raises questions such as what happens to duplicate elements.

 
 the function removes one value if it is in the container, zero if it is 
 not.

So you need a way to do searching and a way to do positioned remove. Why 
combine them into one, when you can use both separately to great effect?

 Another primitive interface Multi(V) adds a removeAll(V v) function.

Why do you need a separate interface for removeAll? Are there containers 
that can remove one value but not all?

 Again, it combines two functions that are awkward or difficult to 
 implement using ranges via interfaces.

I contend that point.

 contains(V v); // returns whether an element is contained in the 
 colleciton

 I don't think this belongs to primitives. It's O(n) for many 
 containers and again it's a generic algorithm, not a member.

  Again, it's part of the minimal usable interface.  It's not a 
 generic algorithm, because some containers can implement this more 
 efficiently.

 But this is exactly what I believe to be a mistake: you are 
 abstracting away algorithmic complexity.

 
 Not really, I say right in the docs that contains(V) can take O(n) 
 time.  There's no abstraction.  I don't say that the algorithmic 
 complexity is defined by the implementation.  It's no different than 
 std.algorithm's find.
 
 Plus, to use the generic algorithms, you would need to use interfaces 
 as ranges which I think are completely useless.

 Why?

 
 3 virtual calls per iteration, no possibility to inline, and reference 
 copy semantics.  The latter is the biggest problem for me.  I want to be 
 able to save ranges (iterators) for later use on containers, and having 
 to call to the heap for each save is a deal killer.

Per my argument above, there may be zero virtual calls per iteration. (I 
want to make sure my point about covariance was understood even if it is 
not considered compelling.)

 However, you make it sound like there are so many containers that would 
 want to not store the length.  There is only one -- linked lists, and 
 only with special requirements (O(1) splicing)

I agree it's tenuous to leave length() out. It's a judgment call.

 Map!(K, V) has Tuple!(K, V) as its element type.

 
 That makes things awkward.  For example, to remove, you must remove the 
 K-V pair.  typically you only want to specify the K or the V.  I realize 
 it doesn't make things awkward for your container interface since you 
 don't *define* a remove function, but your container interface isn't 
 very useful as a general container.

It is. What is true is a converse of your statement - a general 
container isn't very useful as a map - which I'd agree with, and is 
sensible OO design. As long as a map is seen as a container, it _does_ 
store K-V pairs. If I know it's a map, great, it defines a different 
primitive for removing an element given its key.

 If you use interfaces you can write opApply wrappers to do different 
 things.  I'm not sure what you mean by composition.

 For example, compose ranges a la retro or take.

 
 how do you compose retro on an input range?  that's all you got with 
 your container interface.  It's the same for opApply.

You need to be a bit down in the hierarchy to use retro.

 take is simple:
 
 class TakeIterator(V) : Iterator(V)
 {
    private Iterator!V src;
    private uint nelems;
    public this(Iterator!V src, uint nelems) { this.src = src; 
 this.nelems = nelems;}
    public int opApply(int delegate(ref V v) dg)
    {
       uint elems = 0;
       int result = 0;
       foreach(ref v; src)
       {
           if(elems++ > nelems || (result = dg(v)) != 0)
         break;
       }
       return result;
    }
    public uint length()
    {
       uint len = src.length();
       return len == NO_LENGTH_SUPPORT ? NO_LENGTH_SUPPORT : len < nelems 
 ? len : nelems;
    }
 }

And then how do I compose take with something else, or pass it to 
std.algorithm.find()? This whole thing doesn't hold water.

 You can replace data by assigning to range's elements. Removal is done 
 via positioned remove (which I think needs to be a primitive).

 
 That is available in dcollections, just not part of the interface.  
 Every container implements remove(cursor) function, but since cursors 
 are implementation specific, it can't be part of the interface.

Well that's a problem with the design, no? Why then define a hierarchy? 
A simple set of unrelated types may be a better design.

 If I have to pull out std.algorithm to do simple things like remove a 
 specific element from a container, where std.algorithm may not do the 
 most efficient thing, then why the hell have an interface in the first 
 place?

Things can be arranged such that std.algorithm does do the best thing, 
and is also the most general and reusable approach. The STL has partly 
shown that. I plan to take that one step further - to show that 
container-independent code can be gainfully written with a hierarchy of 
containers (something that STL failed at).

 If I have an interface, I want to make it do all the things all 
 containers can do, not delegate it to some other part of the library.  I 
 trust that the interface knows how to do container functions in the best 
 way possible.

The interface must be expressive enough to let generic algorithms do 
their job.

 I think the other way: if the minimal container design is unusable 
 with std.algorithm, the design took a wrong turn somewhere.

 
 If the OO *interface* does not support std.algorithm, then that's good 
 by me, seeing as how you have to use non-inlinable virtual functions to 
 do simple tasks on ranges that cannot be copied without allocating on 
 the heap.

Did my covariance argument above convince you otherwise?

 I think absolutely containers should support std.algorithm 
 via the method std.algorithm is usable by any other range -- 
 compile-time interfaces.

I agree :o).

 I disagree. The negation of an implementation dogma can be just as 
 limiting as the dogma. The way I see it, a design defines some 
 desiderata. Then it does whatever it takes to fulfill them.

 
 You're starting to get too smart for me here :)  I have no idea what 
 disrderata means.

Come on, the Internet is there. I meant "essential goal".

 If one desideratum is to use a class hierachy to write 
 container-independent code, then interface ranges naturally follow. 
 There's no ifs and buts about it.

 
 The container hierarchy can support two *different* paradigms.  First, 
 the paradigm of runtime interfaces which may be useful for using 
 containers to store pieces of data to go with an object.  Such as a 
 socket cache that maps ip addresses to sockets.  This cache cares 
 nothing about sorting or doing retro iteration on the socket cache.  It 
 cares not about the implementation of the map, just that the map does 
 map-like things (i.e. assign this key to this value, remove this key, 
 etc.).
 
 The other paradigm is to have access to std.algorithm in the most 
 efficient way possible.  Such access requires full implementation 
 knowledge to make the most efficient implementation of algorithms.  In 
 fact, containers can *use* std.algorithm underneath their member 
 functions if so desired.

Yah. Covariance takes care of all of the above.

I'd almost agree with you as of a couple of months ago, when the whole 
covariance thing hadn't occurred to me.

 What you are saying seems completely incorrect to me: "since not all 
 containers can implement fast search, I'm going to guarantee that 
 *all* containers use a linear search via their interface.

 This is a misunderstanding. In the STL linear containers don't define 
 find(), but associative containers do. That is the correct design.

 
 But STL isn't accessed without the full implementation, you are 
 comparing apples to oranges here.
 
 You said that find is something that should be relegated to 
 std.algorithm.

YES. If a design must express define linear search in more than one 
place, then that design is suboptimal. This is an important goal I want 
to follow.

 Isn't std.algorithm's find a linear search?

Yes.

 If I have a 
 Container!E, don't I need to use std.algorithm to search for an element 
 with it's returned range?

No. It's like in the STL - either the container defines its own 
searching primitives (e.g. set or map), or you can always grab the 
container's range and use the well-known linear search defined by 
std.algorithm.

 How is this not forcing a linear search when 
 using an interface to a container?

It isn't forcing a linear search. It is fostering awareness of the 
capabilities needed by the compiler.

A design that defines a best-effort find() as a member is suboptimal. 
There's no guarantee it can provide. A putative user cannot tell 
anything about the complexity of their code that uses the find() member. 
A good design has the user say, hey, linear search is ok here; in this 
other place, I need a fast find so I can't use Container!T, I need 
AssociativeContainer!T.

 What is the answer to that, don't 
 search for an element in a container unless you have the full 
 implementation?  That works in dcollections too!

The shape and characteristic of the search is defined by the type of the 
container.

A great library that got mentioned here in the past:

http://www.gobosoft.com/eiffel/gobo/structure/container.html

I strongly recommend studying that design; it is very close to 
optimality. They define DS_SEARCHABLE as a subclass of DS_CONTAINER - as 
they should.

 *AND* I want to make each loop in the search algorithm call 3 virtual 
 functions!"

 Not necessarily. This happens only if you use the Container interface 
 to write container-independent code. It is the cost it takes for the 
 design to fulfill its desiderata.

 
 Is that not the case we are discussing?

No, there are two cases as I discussed above in this post.

 How is that better than a search function that guarantees linear 
 performance but gives the option of being as fast as possible with no 
 loop virtual calls?

 It is better because it doesn't write off search complexity as an 
 implementation detail. "Linear or better" is not a good guarantee. A 
 good guarantee is "From this node of the hierarchy down, this 
 primitive is defined to guarantee O(log n) search". Linear search is a 
 generic algorithm and does not belong to any container.

 
 That is available via the full implementation.  With the interface, it's 
 the "best you can do", because that's all that is available.  "linear or 
 better" is a better guarantee than "linear with guaranteed 3 no-inlining 
 virtual function calls per element."

I think one issue might be that you think "interface" and 
"implementation" with nothing in between. I'm talking "gradually more 
specialized interface", as the Gobo library above defines.


Andrei

Jan 03 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

We are arguing in circles, so I will just stop :)

I'll address the one point I think we both feel is most important below

On Sun, 03 Jan 2010 17:19:52 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Sun, 03 Jan 2010 09:25:25 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:

 Not having opSlice be part of the interface itself does not preclude  
 it from implementing opSlice, and does not preclude using ranges of  
 it in std.algorithm.  If I'm not mistaken, all functions in  
 std.algorithm rely on compile time interfaces.  opApply allows for  
 full input range functionality for things like copying and outputting  
 where templates may not be desired.

 The point is how much container-independent code can someone write by  
 using the Container interface. If all you have is a Container, you  
 can't use it with any range algorithm.

  The answer is, you don't.  Ranges via interfaces are slower than  
 primitive functions defined on the interface, so use what's best for  
 interfaces when you have an interface and what's best for compile-time  
 optimization when you have the full implementation.

 I understand we don't agree on this. To me, making containers work with  
 algorithms is a drop-dead requirement so I will rest my case.

 Nevertheless, I think there's one point that got missed. It's a tad  
 subtle, and I find it pretty cool because it's the first time I used  
 covariant returns for something interesting. Consider:

 interface InputIter(T) { ... }
 interface ForwardIter(T) : InputIter!T { ... }

 class Container(T) {
      InputIter!T opSlice() { ... }
 }

 class Array(T) : Container(T) {
      class Iterator : ForwardIter!T {
 	... all final functions ...
      }
      Iterator opSlice();
 }

 Now there are two use cases:

 a) The user uses Array specifically.

 auto a = new Array!int;
 sort(a[]);

 In this case, sort gets a range of type Array!(int).Iterator, which  
 defines final primitives. Therefore, the compiler does not emit ONE  
 virtual call throughout. I believe this point was lost.

 b) The user wants generality and OOP-style so uses Container throughout:

 Container!int a = new Array!int;
 auto found = find(a[], 5);

 This time find gets an InputIter!int, so it will use virtual calls for  
 iteration.

 The beauty of this design is that without any code duplication it  
 clearly spans the spectrum between static knowledge and dynamic  
 flexibility - within one design and one implementation. This is the  
 design I want to pursue.

I see why it is attractive to you.  But to me, algorithms are not the main  
driver for containers.  One thing we both agree on -- when you know the  
full implementation, algorithms from std.algorithm should be implemented  
as fast as possible.  Where we disagree is what is desired when the full  
implementation is not known.  I contend that the ability to link with  
std.algorithm isn't a requirement in that case, and is not worth  
complicating the whole world of ranges to do so (i.e. build std.algorithm  
just in case you have reference-type ranges with a "save" requirement).   
If you don't know the implementation, don't depend on std.algorithm to  
have all the answers, depend on the implementation which is abstracted  
correctly by the interface.

What this means is there will be some overlap in functions that are  
defined on the interfaces and functions defined in std.algorithm.  Most  
likely the overlapping functions both point to the same implementation  
(naturally, this should live in std.algorithm).  This is for convenience  
to people who want to use containers in the interface form, so they do not  
need to concern themselves with the abilities of std.algorithm, they just  
want containers that help them get work done.

There is still one flaw in your ideas for covariant types:  even though  
final functions can be used throughout and the possibility for inlining  
exists, you *still* need to use the heap to make copies of ranges.  That  
was and still is my biggest concern.

I think the rest of this whole post is based on our disagreement of these  
design choices, and it really doesn't seem productive to continue all the  
subtle points.

[rest of growing disagreement snipped]

BTW, I use covariant return types freely in dcollections and I agree it  
kicks ass.  It seems like black magic especially when you are returning  
possibly a class or interface which need to be handled differently.

-Steve

Jan 03 2010

D Programming

C/C++ Programming

Other

digitalmars.D - output ranges: by ref or by value?