digitalmars.D - protocol for using InputRanges
- Walter Bright (13/13) Mar 22 2014 It's become clear to me that we've underspecified what an InputRange is....
- Jakob Ovrum (4/8) Mar 22 2014 If `front` was destructive there would be little point in having
- Adam D. Ruppe (7/10) Mar 22 2014 Remember that front is a getter property, which means it should
- Rikki Cattermole (8/24) Mar 22 2014 It would be nice to have ranges full stop described. And how to
- Chris (13/43) Mar 27 2014 I agree. I've been using ranges for a while now and have tried
- Walter Bright (3/15) Mar 27 2014 Following the protocol empty-front-popFront always works.
- Chris (24/43) Mar 28 2014 Not only there. The whole issue of front and popFront has not
- w0rp (43/43) Mar 28 2014 I think something has gone wrong in this discussion somewhere. It
- Chris (14/14) Mar 28 2014 Earlier Walter wrote:
- Regan Heath (6/19) Mar 28 2014 You can build safety on top of performance. You cannot do the opposite....
- Chris (3/25) Mar 28 2014 But should unsafe+fast be the default or rather an option for
- Regan Heath (5/31) Mar 28 2014 Pass. My point was only that it needs to exist.
- Steven Schveighoffer (4/6) Mar 31 2014 -- when it is known empty will return false through logical deduction.
- Jonathan M Davis (42/61) Mar 23 2014 You definitely don't have to call empty before calling front if you know...
- Tommi (6/8) Mar 23 2014 I thought that "breaking existing code" meant either "causing
- Walter Bright (5/9) Mar 25 2014 This is simply not practical. The canonical example is a tty. You cannot...
- Marco Leise (22/40) Mar 23 2014 Looking at ranges as a user with all the subjectivity it
- w0rp (19/19) Mar 23 2014 I understand it like this.
- w0rp (6/6) Mar 23 2014 I should add that I have implemented some ranges where .front and
- Szymon Gatner (8/28) Mar 23 2014 That is not consistent with the first part of your reply and is
- w0rp (37/70) Mar 24 2014 You read wrong. Say you have this sequence of three numbers and
- Walter Bright (2/3) Mar 25 2014 This is another undocumented assumption.
- Szymon Gatner (15/23) Mar 23 2014 IMO: yes. Logic of empty() sould be const and not have side
- Szymon Gatner (7/34) Mar 23 2014 I tend to think about pair of C++ iterators, which are
- Steven Schveighoffer (24/29) Mar 24 2014 Here is the crux of the issue. I think aside from Walter's question, we ...
- Joseph Rushton Wakeling (17/26) Mar 23 2014 There are some not-very-nice exceptions to that in std.random, where in ...
- Timon Gehr (14/17) Mar 23 2014 An analogous problem exists and is more severe for RandomAccessRanges.
- monarch_dodra (10/32) Mar 24 2014 I have an open proposal for a "cache" range adaptor that somewhat
- Tommi (8/24) Mar 23 2014 Is InputRange supposed to be a one-pass range and am I supposed
- Steven Schveighoffer (4/34) Mar 24 2014 A range interface only works for a buffered stream, which naturally allo...
- Dicebot (7/23) Mar 24 2014 I think there is one design mistake with current InputRange rules
- Joseph Rushton Wakeling (16/21) Mar 24 2014 I floated some ideas along those lines, for a "first" method that would ...
- Dicebot (18/31) Mar 25 2014 I was thinking about something more simple. Current pattern is:
- Andrei Alexandrescu (4/33) Mar 25 2014 Suggestion: focusing on what to do within the present context is more
- Dicebot (3/6) Mar 25 2014 I do work on template argument pack & friends right now, leave me
- Chris (15/31) Mar 25 2014 In many cases it makes sense to put things into front, for
- Walter Bright (52/52) Mar 25 2014 It's pretty clear that:
- monarch_dodra (55/84) Mar 25 2014 http://dlang.org/phobos/std_range.html#isInputRange
- Walter Bright (26/83) Mar 25 2014 It's a reasonable requirement. If your range has an issue with this, it ...
- Andrei Alexandrescu (2/4) Mar 25 2014 This part I disagree with. -- Andrei
- Andrei Alexandrescu (2/3) Mar 25 2014 No, front requires a buffer in the range. -- Andrei
- Andrei Alexandrescu (20/57) Mar 25 2014 s/COMPLETELY/loosely/
- Timon Gehr (2/5) Mar 25 2014 'map' may fail this criterion. In this case, is the blame put on the use...
- monarch_dodra (4/13) Mar 25 2014 Depends how you define "same" though :/ But yeah.
- Walter Bright (20/60) Mar 25 2014 The throwing part, or the not good design part?
- Regan Heath (23/44) Mar 26 2014 Surely you'd simply start with a range of pointers to expensive-to-copy ...
- Steven Schveighoffer (6/51) Mar 26 2014 These two rules are not necessary if you know the range is not empty. Se...
- Steven Schveighoffer (5/23) Mar 26 2014 Gah, I didn't cut out the right rules. I meant the two rules that empty ...
- Regan Heath (8/31) Mar 26 2014 I see. I was thinking we ought to make empty mandatory to give more
- Steven Schveighoffer (7/14) Mar 26 2014 Yes, but when you know that empty is going to return false, there isn't ...
- monarch_dodra (8/12) Mar 26 2014 Not only that, but it's also a performance criteria: If you are
- Regan Heath (11/21) Mar 26 2014 What guarantees range2 is longer than range1? The isArray case checks
- monarch_dodra (13/29) Mar 26 2014 The interface: The target *shall* have enough room to accommodate
- Regan Heath (30/56) Mar 26 2014 Ok. So long as *something* is throwing that Error I am down with this.
- Daniel Murphy (3/6) Mar 26 2014 Some ranges will give you their length...
- Regan Heath (7/13) Mar 27 2014 Sure. And generally you could use it. copy() doesn't, and I was talkin...
- Regan Heath (14/29) Mar 26 2014 Sure, it's not required for some algorithms in some situations.
- Andrei Alexandrescu (4/19) Mar 26 2014 I think requiring users to call empty before front on input ranges is a
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (11/13) Mar 26 2014 Then the name should change to "ready". It makes sense to require
- Paulo Pinto (6/19) Mar 27 2014 Why not?
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (8/10) Mar 27 2014 Why not what? A query for "empty()" should not have any side
- Steven Schveighoffer (17/38) Mar 26 2014 if(!r.empty)
- Walter Bright (9/23) Mar 26 2014 The idea is that there are three functions: empty, front, and popFront. ...
- Steven Schveighoffer (12/41) Mar 26 2014 OK, but it's logical to assume you *can* avoid a call to empty if you kn...
- Walter Bright (6/14) Mar 26 2014 As with *any* API, if you look under the hood and make assumptions about...
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (11/35) Mar 27 2014 I was originally going to do that, but then I took a closer look
- Regan Heath (23/57) Mar 27 2014 u =
- Andrei Alexandrescu (6/10) Mar 27 2014 Probably we need to amend that. For efficient ranges, front() and
- Steven Schveighoffer (25/36) Mar 27 2014 he
- Andrei Alexandrescu (2/4) Mar 27 2014 filter comes to mind. -- Andrei
- monarch_dodra (6/19) Mar 27 2014 I just want to point out that I think allowing empty to have an
-
Steven Schveighoffer
(8/25)
Mar 27 2014
On Thu, 27 Mar 2014 12:02:16 -0400, monarch_dodra
- Andrei Alexandrescu (2/24) Mar 27 2014 Yah, agreed. -- Andrei
- Walter Bright (5/6) Mar 27 2014 Completely unworkable. To determine if a range is empty or not, it may h...
- Steven Schveighoffer (17/23) Mar 27 2014 In the land of ranges, it's construction and popFront that generally
- Walter Bright (7/17) Mar 27 2014 The range protocol is designed to work with streams. It's a giant fail i...
- Steven Schveighoffer (10/28) Mar 27 2014 A 1 byte buffered stream? If it's performance you are looking for, you
- Walter Bright (5/6) Mar 27 2014 Are ready to implement a parallel universe of stream based algorithms to...
- Steven Schveighoffer (7/14) Mar 28 2014 Not necessary. You just need to implement one range on top of a buffered...
- Andrei Alexandrescu (2/16) Mar 28 2014 It increasingly seems to me we need to do this. -- Andrei
- Steven Schveighoffer (5/25) Mar 28 2014 It is in the works. I need to finish it. I've been incorporating Dmitry'...
- monarch_dodra (13/14) Mar 29 2014 What I find funny here is that walter is saying we must enforce
- monarch_dodra (12/13) Mar 29 2014 Question: Should this new restriction only be applied to
- monarch_dodra (4/18) Mar 30 2014 Did this conversation die the instant I actually had something
- Walter Bright (3/5) Mar 28 2014 Meaning it must also write through a pointer if it did read something.
- Marco Leise (30/37) Mar 29 2014 I guess we all have a clear concept of streams in our mind
- Steven Schveighoffer (4/11) Mar 31 2014 I don't understand this point/question.
- Andrei Alexandrescu (4/8) Mar 27 2014 It's not a giant fail, we just need to adjust the notion.
- Walter Bright (5/12) Mar 27 2014 Are you suggesting that ranges needn't support streams?
- Johannes Pfau (8/28) Mar 28 2014 Ranges have equivalents in other languages:
- Walter Bright (4/10) Mar 28 2014 Do you see a point to be able to, in an algorithm, seamlessly swap a soc...
- Dmitry Olshansky (8/19) Mar 28 2014 Certainly NOT a socket. There is no escaping the fact that there are
- Walter Bright (5/23) Mar 28 2014 Yes, it does require a one element buffer. But seriously, does a one cha...
- Dmitry Olshansky (7/34) Mar 28 2014 WAT? The overhead is in issuing system calls, you'd want to do as little...
- Walter Bright (2/4) Mar 28 2014 That's why we have things like byLine().
- Dmitry Olshansky (6/11) Mar 28 2014 Which uses C's BUFFERED I/O and it reads from it byte by byte via getc.
- Walter Bright (2/5) Mar 28 2014 How about a PR to fix it?
- w0rp (9/16) Mar 28 2014 Yes. I hold the opinion that there is not a whole lot of reason
- Dmitry Olshansky (6/13) Mar 29 2014 Sorry, I'm not inclined to do any work on hacking arbitrary C runtime
- Andrei Alexandrescu (2/13) Mar 28 2014 byLine doesn't use getc. -- Andrei
- Dmitry Olshansky (11/25) Mar 29 2014 Indeed not every version. Linux looks almost OK.
- Johannes Pfau (24/41) Mar 28 2014 Sorry, but no. It sure sounds nice first, but I can't really imagine a
- w0rp (7/28) Mar 28 2014 I think a key is to offer something with gives you chunks at a
- Johannes Pfau (18/27) Mar 28 2014 byChunk is implemented on top of the file rawRead API though, and
- QAston (3/35) Mar 28 2014 There are stream iterators in C++:
- Dmitry Olshansky (8/20) Mar 28 2014 If streams are like hot raw sockets then certainly it doesn't make
- QAston (6/11) Mar 27 2014 The protocol is not intuitive. I'm not entirely against having to
- Walter Bright (10/11) Mar 27 2014 I find empty-front-popFront as perfectly intuitive. I don't find the cou...
- Paolo Invernizzi (10/23) Mar 28 2014 I _strongly_ agree with Walter: people learning D in my groups
- Regan Heath (6/8) Mar 28 2014 This is actually not true.
- Paolo Invernizzi (6/12) Mar 28 2014 What I'm meaning, it's that we don't care: we are always
- John Stahara (4/16) Mar 28 2014 To clarify for Mr. Invernizzi: the "we" to which he refers is the group
- Regan Heath (6/22) Mar 28 2014 Thanks, that was confusing me :)
- H. S. Teoh (7/27) Mar 28 2014 [...]
- Paolo Invernizzi (4/25) Mar 28 2014 Thank you John, that's exact: I'm talking about my colleagues
- Andrei Alexandrescu (3/9) Mar 27 2014 That's a good point too.
- monarch_dodra (9/23) Mar 27 2014 Yes, but the "observability" should be sort lived, since empty is
- RivenTheMage (5/19) Mar 28 2014 Maybe new kind of range can help?
- Walter Bright (2/10) Mar 27 2014 Right, it does say that, and it is seriously wrong.
- Steven Schveighoffer (17/40) Mar 27 2014 Like range.save. It's "required", but frequently omitted, without any
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (9/15) Mar 27 2014 You probably can run-time test this in Debug builds, but…
- Walter Bright (6/10) Mar 27 2014 Sure, but what is awkward about empty-front-popFront?
- Steven Schveighoffer (6/19) Mar 27 2014 e:
- Andrei Alexandrescu (2/6) Mar 27 2014 I think byXchar are important and are not streams. -- Andrei
- Steven Schveighoffer (4/10) Mar 27 2014 What's broken about those?
- Andrei Alexandrescu (2/12) Mar 27 2014 Speed.
- monarch_dodra (21/40) Mar 27 2014 I call shenanigans. This is the code:
- Andrei Alexandrescu (3/6) Mar 27 2014 I was on the verge, and we need to discuss this more. A constructor that...
- monarch_dodra (25/34) Mar 27 2014 I still think there's ambiguity in the word "lazy". I think this
- Walter Bright (5/6) Mar 27 2014 As mentioned earlier, requiring this of a range constructor makes it imp...
- Steven Schveighoffer (8/14) Mar 27 2014 Not impossible. Use lazy initialization.
- Walter Bright (7/16) Mar 27 2014 That still doesn't work, because then you could be calling front, and fr...
- Steven Schveighoffer (18/38) Mar 27 2014 In the cases where you know there is input, that's not true. What I am
- Walter Bright (17/33) Mar 27 2014 You can't use that on generic code, because generic code cannot assume t...
- Steven Schveighoffer (23/29) Mar 28 2014 There is a difference here that I want to establish.
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (19/54) Mar 28 2014 Does a flag "isInitialized" even have an effect on performance
- Steven Schveighoffer (27/42) Mar 28 2014 =
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (22/47) Mar 29 2014 I was more thinking of the fact that you need to read something
- Steven Schveighoffer (45/78) Mar 31 2014 :
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (11/36) Mar 31 2014 I've found the example I was talking about:
- Steven Schveighoffer (17/26) Mar 31 2014 d@puremagic.com
- Timon Gehr (7/13) Mar 27 2014 I feel uneasy about the assertion that code that assumes that 'empty'
- Timon Gehr (2/3) Mar 27 2014 arbitrarily restricting interactions to some 'protocol'
- Walter Bright (2/4) Mar 27 2014 What if get() fails?
- H. S. Teoh (26/31) Mar 27 2014 That's the whole point of Nullable!T: you can return null if get()
- monarch_dodra (23/45) Mar 28 2014 It's not that *you* know a particular range doesn't need "empty"
- Rainer Schuetze (101/130) Mar 26 2014 IIUC what you are proposing would be covered better by removing front
- Rainer Schuetze (10/15) Mar 26 2014 Ouch, pasted before fixing:
- Walter Bright (11/15) Mar 27 2014 Different things being ranged over make different decisions as to what g...
- Rainer Schuetze (6/22) Mar 27 2014 This loop is intuitive. Not being allowed to call empty or front
- Walter Bright (4/7) Mar 27 2014 I can concede that. But I can't concede being able to call front without...
- "Tobias =?UTF-8?B?TcO8bGxlciI=?= <troplin bluewin.ch> (24/34) Mar 28 2014 Disclaimer: I'm a C++ programmer just lurking here, I've never
- Szymon Gatner (11/33) Mar 28 2014 In c++ if you had a list or deque you would obviously did
- w0rp (10/45) Mar 28 2014 Well, it might seem kind of weird at first that an InputRange has
- "Oscar =?UTF-8?B?TWFydMOtbiI=?= <omarmed0000 hotmail.com> (15/50) Mar 30 2014 First sorry for my english.
- Walter Bright (5/21) Mar 27 2014 What happens if empty is called twice in a row? Two characters get read!...
- Rainer Schuetze (5/28) Mar 27 2014 Good catch. Somehow I pasted the wrong version, but posted the
- Walter Bright (3/4) Mar 27 2014 I think that is the very definition of looking under the hood in order t...
- Regan Heath (16/25) Mar 27 2014 if(r.empty)
- Jonathan M Davis (18/40) Mar 26 2014 I don't know. It's not the end of the world if we require it (at least w...
- Walter Bright (5/11) Mar 27 2014 Many things, like some ttys, can not be tested for being empty without r...
- =?UTF-8?B?Ikx1w61z?= Marques" (33/35) Mar 27 2014 Some thoughts:
- Daniel Murphy (17/20) Mar 29 2014 Not by itself - if a range's empty has returned true, calling empty
- Peter Alexander (5/7) Mar 29 2014 As someone who has missed this discussion, can I just say this:
It's become clear to me that we've underspecified what an InputRange is. The normal way to use it is: while (!r.empty) { auto e = r.front; ... do something with e ... r.popFront(); } no argument there. But there are two issues: 1. If you know the range is not empty, is it allowed to call r.front without calling r.empty first? If this is true, extra logic will need to be added to r.front in many cases. 2. Can r.front be called n times in a row? I.e. is calling front() destructive? If true, this means that r.front will have to cache a copy in many cases.
Mar 22 2014
On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:2. Can r.front be called n times in a row? I.e. is calling front() destructive? If true, this means that r.front will have to cache a copy in many cases.If `front` was destructive there would be little point in having it separate from `popFront`. I think it must be non-destructive to make sense.
Mar 22 2014
On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:auto e = r.front;Remember that front is a getter property, which means it should work like a variable. Typically, reading a variable is not destructive and needs no preparation. There's exceptions to the rule, but they almost always work this way, so ranges should too.If true, this means that r.front will have to cache a copy in many cases.Indeed, in many of my quick input ranges, I just make front a regular variable and popFront updates it to the next item.
Mar 22 2014
On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:It's become clear to me that we've underspecified what an InputRange is. The normal way to use it is: while (!r.empty) { auto e = r.front; ... do something with e ... r.popFront(); } no argument there. But there are two issues: 1. If you know the range is not empty, is it allowed to call r.front without calling r.empty first? If this is true, extra logic will need to be added to r.front in many cases. 2. Can r.front be called n times in a row? I.e. is calling front() destructive? If true, this means that r.front will have to cache a copy in many cases.It would be nice to have ranges full stop described. And how to use/make them in D. I have yet to learn them because of no official documentation on them. On this topic we also need to work on common patterns and creating documentation on e.g. the site or wiki for it. Saw that we do have a bit in the wiki under tutorials. Maybe if I get some time I'll work on that.
Mar 22 2014
On Sunday, 23 March 2014 at 01:07:27 UTC, Rikki Cattermole wrote:On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:I agree. I've been using ranges for a while now and have tried different implementations based on advice given on this forum and depending on each case. After reading this thread I looked at some of my ranges and I have to say I still have no clue as to what should and should _not_ be done (regardless of whether you _can_ do it). For a while I thought that it's my lack of understanding, but this thread shows that everyone has a different view of ranges. Guidelines with use cases would be great. I remember I mentioned this in another thread already. The thing is that experimenting without proper guidelines leaves your code in an inconsistent state where you have two or more ranges doing technically the same thing but each with a different logic.It's become clear to me that we've underspecified what an InputRange is. The normal way to use it is: while (!r.empty) { auto e = r.front; ... do something with e ... r.popFront(); } no argument there. But there are two issues: 1. If you know the range is not empty, is it allowed to call r.front without calling r.empty first? If this is true, extra logic will need to be added to r.front in many cases. 2. Can r.front be called n times in a row? I.e. is calling front() destructive? If true, this means that r.front will have to cache a copy in many cases.It would be nice to have ranges full stop described. And how to use/make them in D. I have yet to learn them because of no official documentation on them. On this topic we also need to work on common patterns and creating documentation on e.g. the site or wiki for it. Saw that we do have a bit in the wiki under tutorials. Maybe if I get some time I'll work on that.
Mar 27 2014
On 3/27/2014 3:21 AM, Chris wrote:I agree. I've been using ranges for a while now and have tried different implementations based on advice given on this forum and depending on each case. After reading this thread I looked at some of my ranges and I have to say I still have no clue as to what should and should _not_ be done (regardless of whether you _can_ do it). For a while I thought that it's my lack of understanding, but this thread shows that everyone has a different view of ranges. Guidelines with use cases would be great. I remember I mentioned this in another thread already. The thing is that experimenting without proper guidelines leaves your code in an inconsistent state where you have two or more ranges doing technically the same thing but each with a different logic.Following the protocol empty-front-popFront always works. Where the differences come in is when people skip calling empty.
Mar 27 2014
On Friday, 28 March 2014 at 04:58:28 UTC, Walter Bright wrote:On 3/27/2014 3:21 AM, Chris wrote:Not only there. The whole issue of front and popFront has not been made clear yet (unless it's my lack of understanding). What should and should not be done there, even if you stick to the basic protocol (!empty > front > popFront). I like the concept of ranges and use them a lot, because they help me to create independent components. But sometimes I think they are just glorified for-loops in the sense that we know more or less what the input is, and part of the reason why we have this thread is that we want everything to be specialized and generic at the same time. E.g. monarch_dodra wrote: "It's not that *you* know a particular range doesn't need "empty" called, it's that the algorithm you are using has already previously validated there are elements in it. For example, the splitter algorithm will first save a copy of its range, and then walk it, searching for the "splitting" elements. It then realizes it has walked N elements." But then it's no longer generic. If your range depends on another algorithm (that you assume or know to have checked something), then it's no longer generic. In many cases "generic" means an overhead of checks, and is anathema to performance tuning. I think it's the issue of generic vs. specialized / optimized that started this thread. And this is a tough one to sort out.I agree. I've been using ranges for a while now and have tried different implementations based on advice given on this forum and depending on each case. After reading this thread I looked at some of my ranges and I have to say I still have no clue as to what should and should _not_ be done (regardless of whether you _can_ do it). For a while I thought that it's my lack of understanding, but this thread shows that everyone has a different view of ranges. Guidelines with use cases would be great. I remember I mentioned this in another thread already. The thing is that experimenting without proper guidelines leaves your code in an inconsistent state where you have two or more ranges doing technically the same thing but each with a different logic.Following the protocol empty-front-popFront always works. Where the differences come in is when people skip calling empty.
Mar 28 2014
I think something has gone wrong in this discussion somewhere. It is like this. Do not assume .empty is const, this is unreasonable. Do not assume .front is const, this is unreasonable. .popFront has the only high-level observable difference in removing something from the front of an InputRange. I say it is unreasonable to expect .empty or .front to be const because a range which is strictly an InputRange (not a ForwardRange...) is likely to be doing some kind of work where stepping through its sequence of data is destructive. As soon as the thing is read in the implementation of the InputRange, you can't read that thing again. You've cached it or it is gone. Thus .empty and .front must be non-const and lazily evaluate caching the next value. I do not see it as a major issue to check .empty or call .front many times, even with this caching behaviour. Commonly this is as simple as checking whether or not you still have a value to provide inside the InputRange. We use assert(!empty) etc. because honestly I cannot imagine another way to make this safe. If you really, really hate this double-checking in .empty or .front, you could try to change InputRanges so they behave like Java Iterators. I believe this is a really bad idea. You have to start thinking about returning nullable types. You have to push the job of caching values on to users of the InputRanges, as in each and every algorithm. It's just a really bad choice, and with the kind of boxing you'll need to do for nullable types, you'll surely end off worse as far as performance goes anyway. Really, the range protocol is as Walter specified at the start of this thread. The documentation should reflect how .empty or .front may result in caching the next value. I really don't think this is a major performance problem worth worrying about, though. If it's really, really worth it for certain standard algorithms, perhaps something can be done in private range methods or similar to offer some kind of speed boost. ('package' privacy for std.* is feasible.) I don't think there is a lot to be gained from this in general, though. Streams are different, and may be more appropriate for some functions for performance reasons. I still prefer ranges for code where performance is less critical, because if you aren't very worried about performance, the range API is a lot nicer. I'm not dead set on maximum performance if it makes life a lot harder in my code, personally. I view these kinds of improvements as premature optimisations if I'm writing something new.
Mar 28 2014
Earlier Walter wrote: "I don't like being in the position of when I need high performance code, I have to implement my own ranges & algorithms, or telling customers they need to do so." I don't think there is a one size fits all. What if customers ask for maximum security? In any language, if I want high performance, I have to be prepared to walk on thin ice. If I want things to be safe and / or generic, I have to accept additonal checks (= perfomance penalties). I don't think that a language can solve the fundamental problems concerning programming / mathematical logic with all the contradictory demands involved. It can give us the tools to cope with those problems, but not solve them out of the box.
Mar 28 2014
On Fri, 28 Mar 2014 14:15:10 -0000, Chris <wendlec tcd.ie> wrote:Earlier Walter wrote: "I don't like being in the position of when I need high performance code, I have to implement my own ranges & algorithms, or telling customers they need to do so." I don't think there is a one size fits all. What if customers ask for maximum security? In any language, if I want high performance, I have to be prepared to walk on thin ice. If I want things to be safe and / or generic, I have to accept additonal checks (= perfomance penalties). I don't think that a language can solve the fundamental problems concerning programming / mathematical logic with all the contradictory demands involved. It can give us the tools to cope with those problems, but not solve them out of the box.You can build safety on top of performance. You cannot do the opposite. Meaning, one could wrap an unsafe/fast range with a safe/slower one. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 28 2014
On Friday, 28 March 2014 at 15:49:06 UTC, Regan Heath wrote:On Fri, 28 Mar 2014 14:15:10 -0000, Chris <wendlec tcd.ie> wrote:But should unsafe+fast be the default or rather an option for cases when you really need it?Earlier Walter wrote: "I don't like being in the position of when I need high performance code, I have to implement my own ranges & algorithms, or telling customers they need to do so." I don't think there is a one size fits all. What if customers ask for maximum security? In any language, if I want high performance, I have to be prepared to walk on thin ice. If I want things to be safe and / or generic, I have to accept additonal checks (= perfomance penalties). I don't think that a language can solve the fundamental problems concerning programming / mathematical logic with all the contradictory demands involved. It can give us the tools to cope with those problems, but not solve them out of the box.You can build safety on top of performance. You cannot do the opposite. Meaning, one could wrap an unsafe/fast range with a safe/slower one. R
Mar 28 2014
On Fri, 28 Mar 2014 16:04:29 -0000, Chris <wendlec tcd.ie> wrote:On Friday, 28 March 2014 at 15:49:06 UTC, Regan Heath wrote:Pass. My point was only that it needs to exist. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/On Fri, 28 Mar 2014 14:15:10 -0000, Chris <wendlec tcd.ie> wrote:But should unsafe+fast be the default or rather an option for cases when you really need it?Earlier Walter wrote: "I don't like being in the position of when I need high performance code, I have to implement my own ranges & algorithms, or telling customers they need to do so." I don't think there is a one size fits all. What if customers ask for maximum security? In any language, if I want high performance, I have to be prepared to walk on thin ice. If I want things to be safe and / or generic, I have to accept additonal checks (= perfomance penalties). I don't think that a language can solve the fundamental problems concerning programming / mathematical logic with all the contradictory demands involved. It can give us the tools to cope with those problems, but not solve them out of the box.You can build safety on top of performance. You cannot do the opposite. Meaning, one could wrap an unsafe/fast range with a safe/slower one. R
Mar 28 2014
On Fri, 28 Mar 2014 00:58:35 -0400, Walter Bright <newshound2 digitalmars.com> wrote:Following the protocol empty-front-popFront always works. Where the differences come in is when people skip calling empty.-- when it is known empty will return false through logical deduction. -Steve
Mar 31 2014
On Saturday, March 22, 2014 17:50:34 Walter Bright wrote:It's become clear to me that we've underspecified what an InputRange is. The normal way to use it is: while (!r.empty) { auto e = r.front; ... do something with e ... r.popFront(); } no argument there. But there are two issues: 1. If you know the range is not empty, is it allowed to call r.front without calling r.empty first? If this is true, extra logic will need to be added to r.front in many cases.You definitely don't have to call empty before calling front if you know that it's not empty. Both front and empty should normally be pure (or at least act that way) and essentially act like variables. In most cases, it works best for the work of the range to go in popFront. The exception is when you're dealing with a random-access range, since then any element could be accessed, making it so that you can't be doing the work in popFront. I think that we have a general agreement on this based on previous discussions, though it's certainly not unanimous.2. Can r.front be called n times in a row? I.e. is calling front() destructive? If true, this means that r.front will have to cache a copy in many cases.If calling front were destructive, that would break a lot of code. It's probably true that most range-based code should avoid calling front multiple times (in case front has to do more work than just return the value as well as to avoid copying the result if that happens on every call), though if front is auto ref, it could be more efficient to call it multiple times. So, it's not entirely clear-cut. But again, front and empty should normally function as if they were variables. They should be property functions and should be pure (or at least act like they're pure). I'm sure that a _lot_ of code will break if that isn't followed. There are corner cases which can get a bit mucky though - e.g. auto a = map!(to!string)(range); In this case, front is pure, but it returns a new value each time (albeit a value that's equal each time until popFront is called). And there's no real way to fix that if the resulting range is random access (though if it weren't, the work could go in popFront, which _would_ make it so that front always returned the same result). And there have been arguments over whether the result of front should be valid after popFront has been called (i.e. whether it's transient or not). A lot of code assumes that it will be, but we have some nasty exceptions (e.g. std.stdio.ByLine) - typically because front's a buffer which gets reused. IIRC, in those cases, Andrei favored saying that input ranges that weren't forward ranges could have a transient front but that forward ranges couldn't (which I tend to agree with, though I'd prefer that _no_ ranges have transient fronts, since it can really cause problems - e.g. std.array.array not working). I don't think that a consensus was reached on that though, since a few folks really liked using transient fronts with more complicated ranges. In general though, I think that most of us would agree that front and empty should be treated as properties - i.e. as if they were variables - and that they should have try to stick to those semantics as closely as possible. Ranges that stray from that seriously risk not working with a lot of range- based code. - Jonathan M Davis
Mar 23 2014
On Sunday, 23 March 2014 at 07:54:12 UTC, Jonathan M Davis wrote:If calling front were destructive, that would break a lot of code.I thought that "breaking existing code" meant either "causing existing code do something it wasn't supposed to do" or "causing existing code not compile", but you're using it in the meaning of "making existing code not conform to the language specification". Are you sure that's the correct use of the expression?
Mar 23 2014
On 3/23/2014 12:53 AM, Jonathan M Davis wrote:But again, front and empty should normally function as if they were variables. They should be property functions and should be pure (or at least act like they're pure).This is simply not practical. The canonical example is a tty. You cannot tell if a character is available unless you try to read it. That is NOT pure.I'm sure that a _lot_ of code will break if that isn't followed.It seems there have been a lot of undocumented assumptions about the behavior of ranges.
Mar 25 2014
Am Sat, 22 Mar 2014 17:50:34 -0700 schrieb Walter Bright <newshound2 digitalmars.com>:It's become clear to me that we've underspecified what an InputRange is. The normal way to use it is: while (!r.empty) { auto e = r.front; ... do something with e ... r.popFront(); } no argument there. But there are two issues: 1. If you know the range is not empty, is it allowed to call r.front without calling r.empty first? If this is true, extra logic will need to be added to r.front in many cases.Looking at ranges as a user with all the subjectivity it entails, I'd expect .empty to be a read-only property, a safe-const-pure-nothrow candidate. Actions that are not logically const are commonly encapsulated in methods (as opposed to read-only properties like .empty) and often also carry a verb in their name like "create", "update" or "append".2. Can r.front be called n times in a row? I.e. is calling front() destructive? If true, this means that r.front will have to cache a copy in many cases.On the other side of it were (IIRC) there are use cases for return-by-ref on .front: o .front gives access to a large struct. Return-by-ref can avoid a copy on the call site, but may result in several .front calls. o .front is a view into some struct which holds system resources and providing a copy requires internals as in the File struct. I both cases you deal with a range over value types that are costly to copy and where "auto e = r.front;" is to be avoided. I'm 100% unsure how to deal with this. -- Marco
Mar 23 2014
I understand it like this. * empty - Are there no more values? * front - Get me the current value. * popFront - Advance to the next value. In terms of how I implement an InputRange in general, I typically end up with this. * empty - Advance and cache "current value," return true if we ran out of values. * front - enforce(!empty), which in turn caches the current value, which we then return. * popFront - enforce(!empty), clear the cached value so we can later advance. So .front gives you the same thing again and again until you call popFront, you could indeed call .front before .empty, but you may get an exception. This obviously isn't how I implement all InputRanges, as there are better ways to write other ranges, such as ranges which operate on sets of integers or wrap arrays. This is just my last resort general case InputRange implementation. .front assignment obviously replaces the cached value.
Mar 23 2014
I should add that I have implemented some ranges where .front and .popFront are both nothrow, as !empty doesn't "advance and cache" for these ranges and the check is moved into an in{} contract. For these ranges, they tend to behave like arrays with bounds checking, only now the bounds checking is turned off by virtue of -release.
Mar 23 2014
On Sunday, 23 March 2014 at 09:26:53 UTC, w0rp wrote:I understand it like this. * empty - Are there no more values? * front - Get me the current value. * popFront - Advance to the next value.That is correct.In terms of how I implement an InputRange in general, I typically end up with this. * empty - Advance and cache "current value," return true if we ran out of values. * front - enforce(!empty), which in turn caches the current value, which we then return. * popFront - enforce(!empty), clear the cached value so we can later advance. So .front gives you the same thing again and again until you call popFront, you could indeed call .front before .empty, but you may get an exception. This obviously isn't how I implement all InputRanges, as there are better ways to write other ranges, such as ranges which operate on sets of integers or wrap arrays. This is just my last resort general case InputRange implementation. .front assignment obviously replaces the cached value.That is not consistent with the first part of your reply and is incorrect imho. Calling empty() twice should not destroy anything, even according to your understanding. Neither should front(). User should be able to call them as many times as he wishes getting same value every time. Only popFront() mutate the range.
Mar 23 2014
On Sunday, 23 March 2014 at 09:38:43 UTC, Szymon Gatner wrote:On Sunday, 23 March 2014 at 09:26:53 UTC, w0rp wrote:You read wrong. Say you have this sequence of three numbers and 'x' represents the cached value being empty, for a range 'r.' --- 1 2 3 x r.empty 1 2 3 ^ r.empty 1 2 3 ^ r.front == 1 r.popFront() 1 2 3 x r.empty 1 2 3 ^ --- front and popFront each enforce(!empty) first, so this works, etc. --- 1 2 3 x r.popFront() 1 2 3 ^ 1 2 3 x --- So it works for any sequence of values, and you can call .empty and .front as much as you want without changing the result. It just so happens that it's actually .empty which does the work of actually fetching the next value. There are better ways to implement this for certain ranges, but this works for anything.I understand it like this. * empty - Are there no more values? * front - Get me the current value. * popFront - Advance to the next value.That is correct.In terms of how I implement an InputRange in general, I typically end up with this. * empty - Advance and cache "current value," return true if we ran out of values. * front - enforce(!empty), which in turn caches the current value, which we then return. * popFront - enforce(!empty), clear the cached value so we can later advance. So .front gives you the same thing again and again until you call popFront, you could indeed call .front before .empty, but you may get an exception. This obviously isn't how I implement all InputRanges, as there are better ways to write other ranges, such as ranges which operate on sets of integers or wrap arrays. This is just my last resort general case InputRange implementation. .front assignment obviously replaces the cached value.That is not consistent with the first part of your reply and is incorrect imho. Calling empty() twice should not destroy anything, even according to your understanding. Neither should front(). User should be able to call them as many times as he wishes getting same value every time. Only popFront() mutate the range.
Mar 24 2014
On 3/23/2014 2:26 AM, w0rp wrote:.front assignment obviously replaces the cached value.This is another undocumented assumption.
Mar 25 2014
On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:1. If you know the range is not empty, is it allowed to call r.front without calling r.empty first?IMO: yes. Logic of empty() sould be const and not have side effects.If this is true, extra logic will need to be added to r.front in many cases. 2. Can r.front be called n times in a row? I.e. is calling front() destructive? If true, this means that r.front will have to cache a copy in many cases.Yes, all true. Not sure if there is something like that in Phobos but if for example you had RandomValueRange, ever call to front() should return the same random number until popFront() is called at which point internal front variable is being recalculated and cached. Since only popFront() is non-const, it is there where all the magic/mutation should happen. In my code I also have CachingRange which calls front() on underlying range the first time and then stores result internally. I use it for ranges which generate front() on the fly and I knot it is expensive. I could cache that calculation directly in that underlying range but CachingRange is reusable.
Mar 23 2014
On Sunday, 23 March 2014 at 09:34:28 UTC, Szymon Gatner wrote:On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:I tend to think about pair of C++ iterators, which are generalization of pointers. So for C pointers 'first' and 'last': *first -> front() first != last -> empty() ++first -> popFont() And I stick to semantics.1. If you know the range is not empty, is it allowed to call r.front without calling r.empty first?IMO: yes. Logic of empty() sould be const and not have side effects.If this is true, extra logic will need to be added to r.front in many cases. 2. Can r.front be called n times in a row? I.e. is calling front() destructive? If true, this means that r.front will have to cache a copy in many cases.Yes, all true. Not sure if there is something like that in Phobos but if for example you had RandomValueRange, ever call to front() should return the same random number until popFront() is called at which point internal front variable is being recalculated and cached. Since only popFront() is non-const, it is there where all the magic/mutation should happen. In my code I also have CachingRange which calls front() on underlying range the first time and then stores result internally. I use it for ranges which generate front() on the fly and I knot it is expensive. I could cache that calculation directly in that underlying range but CachingRange is reusable.
Mar 23 2014
On Sun, 23 Mar 2014 05:34:26 -0400, Szymon Gatner <noemail gmail.com> wrote:On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:Here is the crux of the issue. I think aside from Walter's question, we have situations where it is expensive to construct each value of a range. For a range that is never used, that is an expense that you don't have to pay. There was a post a little while ago about how File.byLine reads the first line on construction. Consider the case where stdin.byLine needs to be constructed, but not *iterated* before writing some information to stdout. This gives us a predicament that the first line has to be available before you can output any instructions. As we all know, stdin will block on first read, unless you piped in a file or something. This makes an interactive program simply incorrect. From the library point of view, caching the first line on construction is the most sensible thing -- This allows empty, front, and popFront to all be consistent across the entire iteration. Making empty or front do something different on the first access requires awkward machinery (a boolean indicating it should do so, plus a check every call thereafter). But from the user's point of view, sometimes you want lazy access to the first element, for logical reasons. I think in light of such possibilities, empty and front should not have to be const or pure. Logically, it can be viewed that way, but not technically. -Steve1. If you know the range is not empty, is it allowed to call r.front without calling r.empty first?IMO: yes. Logic of empty() sould be const and not have side effects.
Mar 24 2014
On 23/03/14 08:53, Jonathan M Davis wrote:But again, front and empty should normally function as if they were variables. They should be property functions and should be pure (or at least act like they're pure). I'm sure that a _lot_ of code will break if that isn't followed.There are some not-very-nice exceptions to that in std.random, where in many of the range types you have stuff along the following lines: auto front() property { if (notInitialized) { init(); } return _value; } see e.g. MersenneTwister, RandomSample, and others. It's all a nasty consequence of that RNGs-as-value-types problem we've discussed previously. However, the class-based std.random2 that is being discussed on the announce list fixes that: http://forum.dlang.org/thread/cyytvhixkqlbwkmiugex forum.dlang.orgIn general though, I think that most of us would agree that front and empty should be treated as properties - i.e. as if they were variables - and that they should have try to stick to those semantics as closely as possible. Ranges that stray from that seriously risk not working with a lot of range- based code.Yes, absolutely.
Mar 23 2014
On 03/23/2014 01:50 AM, Walter Bright wrote:2. Can r.front be called n times in a row? I.e. is calling front() destructive? If true, this means that r.front will have to cache a copy in many cases.An analogous problem exists and is more severe for RandomAccessRanges. import std.algorithm, std.range; class Obj{ this(int){} } void main(){ auto x=[1,2,3].map!(a=>new Obj(a)); auto a=x.front; auto b=x.front; auto c=x[0]; auto d=x[0]; assert(a is b); // fail assert(b is c); // fail assert(c is d); // fail }
Mar 23 2014
On Sunday, 23 March 2014 at 11:52:53 UTC, Timon Gehr wrote:On 03/23/2014 01:50 AM, Walter Bright wrote:I have an open proposal for a "cache" range adaptor that somewhat eliviates the problem: https://github.com/D-Programming-Language/phobos/pull/1364 But it restricts the range to bidirectional (for the reasons above). Actually, it's mimited to Bidir, but if you *do* use it as bidirectional, then 1 element will be evaluated twice. So I'm wondering if it might not be better to simply restrict it to forward...2. Can r.front be called n times in a row? I.e. is calling front() destructive? If true, this means that r.front will have to cache a copy in many cases.An analogous problem exists and is more severe for RandomAccessRanges. import std.algorithm, std.range; class Obj{ this(int){} } void main(){ auto x=[1,2,3].map!(a=>new Obj(a)); auto a=x.front; auto b=x.front; auto c=x[0]; auto d=x[0]; assert(a is b); // fail assert(b is c); // fail assert(c is d); // fail }
Mar 24 2014
On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:It's become clear to me that we've underspecified what an InputRange is. The normal way to use it is: while (!r.empty) { auto e = r.front; ... do something with e ... r.popFront(); } no argument there. But there are two issues: 1. If you know the range is not empty, is it allowed to call r.front without calling r.empty first? If this is true, extra logic will need to be added to r.front in many cases. 2. Can r.front be called n times in a row? I.e. is calling front() destructive? If true, this means that r.front will have to cache a copy in many cases.Is InputRange supposed to be a one-pass range and am I supposed to able to use InputRange as a lightweight wrapper for a stream? If the answer is yes, then I think the fundamental issue is that the empty-front-popFront interface is not optimal for something like InputRange. But given that's the interface we have, I think that InputRange's front must be allowed to be destructive because the stream it could potentially be wrapping is destructive.
Mar 23 2014
On Sun, 23 Mar 2014 09:00:17 -0400, Tommi <tommitissari hotmail.com> wrote:On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:A range interface only works for a buffered stream, which naturally allows caching. -SteveIt's become clear to me that we've underspecified what an InputRange is. The normal way to use it is: while (!r.empty) { auto e = r.front; ... do something with e ... r.popFront(); } no argument there. But there are two issues: 1. If you know the range is not empty, is it allowed to call r.front without calling r.empty first? If this is true, extra logic will need to be added to r.front in many cases. 2. Can r.front be called n times in a row? I.e. is calling front() destructive? If true, this means that r.front will have to cache a copy in many cases.Is InputRange supposed to be a one-pass range and am I supposed to able to use InputRange as a lightweight wrapper for a stream? If the answer is yes, then I think the fundamental issue is that the empty-front-popFront interface is not optimal for something like InputRange. But given that's the interface we have, I think that InputRange's front must be allowed to be destructive because the stream it could potentially be wrapping is destructive.
Mar 24 2014
On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:It's become clear to me that we've underspecified what an InputRange is. The normal way to use it is: while (!r.empty) { auto e = r.front; ... do something with e ... r.popFront(); } no argument there. But there are two issues: 1. If you know the range is not empty, is it allowed to call r.front without calling r.empty first? If this is true, extra logic will need to be added to r.front in many cases. 2. Can r.front be called n times in a row? I.e. is calling front() destructive? If true, this means that r.front will have to cache a copy in many cases.I think there is one design mistake with current InputRange rules that makes usage so inconsistent. We have `empty` but don't have any distinct `not yet started` state. If calling `popFront` at least once was required before accessing `front`, I can't imagine the case where having any non-trivial code in `front` would have been necessary.
Mar 24 2014
On 24/03/14 14:20, Dicebot wrote:I think there is one design mistake with current InputRange rules that makes usage so inconsistent. We have `empty` but don't have any distinct `not yet started` state. If calling `popFront` at least once was required before accessing `front`, I can't imagine the case where having any non-trivial code in `front` would have been necessary.I floated some ideas along those lines, for a "first" method that would be automatically called immediately before the first call to front, popFront, etc., whatever came first. The trouble is that it's so vague _when_ it should be called. Example from std.random: RandomSample not only has a .front which needs a check as to whether the first value has actually been selected, it also has an .index method which corresponds to the index of the chosen element of the range being sampled from. Now, how is a "first" method supposed to know which methods it should precede? Any method? Specified ones? In any case, even if we could get it to work, it'd be a convenience rather than a solution, and would still in effect result in a non-const .front. OTOH a requirement that .popFront() be called at least once before .front is accessed won't wash either. At least in the case of randomSample, the code required to generate the _first_ sample point is different from what is required to .popFront().
Mar 24 2014
On Monday, 24 March 2014 at 22:26:10 UTC, Joseph Rushton Wakeling wrote:On 24/03/14 14:20, Dicebot wrote:I was thinking about something more simple. Current pattern is: while (!r.empty) { auto useme = r.front; r.popFront; } And I think this would have been more practical: r.popFront; while (!r.empty) { // same } So every range is supposed to start in "init" state and provide data only after first popFront. No other changes. It is not a silver bullet but looks like an improvement over current design to me.I think there is one design mistake with current InputRange rules that makes usage so inconsistent. We have `empty` but don't have any distinct `not yet started` state. If calling `popFront` at least once was required before accessing `front`, I can't imagine the case where having any non-trivial code in `front` would have been necessary.I floated some ideas along those lines, for a "first" method that would be automatically called immediately before the first call to front, popFront, etc., whatever came first.
Mar 25 2014
On 3/25/14, 12:16 PM, Dicebot wrote:On Monday, 24 March 2014 at 22:26:10 UTC, Joseph Rushton Wakeling wrote:Suggestion: focusing on what to do within the present context is more productive. AndreiOn 24/03/14 14:20, Dicebot wrote:I was thinking about something more simple. Current pattern is: while (!r.empty) { auto useme = r.front; r.popFront; } And I think this would have been more practical: r.popFront; while (!r.empty) { // same } So every range is supposed to start in "init" state and provide data only after first popFront. No other changes. It is not a silver bullet but looks like an improvement over current design to me.I think there is one design mistake with current InputRange rules that makes usage so inconsistent. We have `empty` but don't have any distinct `not yet started` state. If calling `popFront` at least once was required before accessing `front`, I can't imagine the case where having any non-trivial code in `front` would have been necessary.I floated some ideas along those lines, for a "first" method that would be automatically called immediately before the first call to front, popFront, etc., whatever came first.
Mar 25 2014
On Tuesday, 25 March 2014 at 19:44:46 UTC, Andrei AlexanSuggestion: focusing on what to do within the present context is more productive. AndreiI do work on template argument pack & friends right now, leave me some fun! :P
Mar 25 2014
On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:It's become clear to me that we've underspecified what an InputRange is. The normal way to use it is: while (!r.empty) { auto e = r.front; ... do something with e ... r.popFront(); } no argument there. But there are two issues: 1. If you know the range is not empty, is it allowed to call r.front without calling r.empty first? If this is true, extra logic will need to be added to r.front in many cases. 2. Can r.front be called n times in a row? I.e. is calling front() destructive? If true, this means that r.front will have to cache a copy in many cases.In many cases it makes sense to put things into front, for example if you read a file line by line, parsing each line to create a dictionary entry. Although it has been said that any logic should go into popFront, because front can be called several times. Since Ranges usually perform some kind of filtering mechanism and return something new, it makes sense that things go into front, or at least should be allowed to go into front. In my experience it is a bit of a "the hen or the egg" problem, to get something you have to call front, but the something should only be done in popFront. Ranges are often simpler and faster to write, when you put things into front, because in most of my use cases front is not called many times in a row, just once for each item. I still haven't found the golden way(s) of writing ranges.
Mar 25 2014
It's pretty clear that: 1. the protocol is COMPLETELY undocumented and undefined. 2. in this vacuum, people (including me) have made all sorts of assumptions, and assumed those assumptions were universal and shared. Since ranges+algorithms are central to D, this is a very serious problem. There are two competing schools of thought: 1. maximum flexibility 2. maximum performance (1) has been explained well in this thread already. (2) not so much, so I'll stand up for that one. I've been recently involved in writing high performance apps, as anyone following ScopeBuffer knows. With such apps, it's clear that: 1. size matters - the fewer registers an object fits in, the faster it goes 2. every instruction matters - more operations == slower code 3. lazy seems to be faster, mainly because it eliminates the storage management cost of buffers and extra copying into and out of those buffers We want to appeal to the high performance coders. To maximize performance, ranges should be optimized to the inner loop case, which is: while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); } With the additional proviso that ranges are passed to algorithms by value, so they should be cheap to copy. Cheap to copy usually means them being small. A) I know that my range is not empty, so I can skip calling empty. Since front is guaranteed to succeed if !empty, this puts a requirement on many ranges that they have a non-trivial constructor that 'primes the pump'. Of course, priming may fail, and so construction may throw, which is not good design. Next, priming requires a buffer in the range. Buffers add size, making them slower to copy, and will often require another field saying if the buffer has data in it or not, further bumping the size. And lastly, it means that lazy ranges will be required to read the first element, even if the range isn't then subsequently used, which defeats what one would expect from a lazy range. By having mere construction of a range initiate reading from source, this makes the useful idiom of separating construction from use impractical. (For example, building a pipeline object, then sending it to another thread for execution.) All this saves for the user is one call to empty for the entire algorithm, at a cost incurred with every iteration. I.e. it selects O(n) to save O(1). B) I want to be able to call front multiple times in a row, and expect to get the same value. This can force the range to incorporate a one element buffer and a flag indicating the status of that buffer. The range instance gets bigger and more expensive to copy, and the cost of manipulating the flag and the buffer is added to every loop iteration. Note that the user of a range can trivially use: auto e = r.front; ... using e multiple times ... instead. The big problem with (A) and (B) is these costs become present in every range, despite being rarely needed and having simple alternatives. This is the wrong place to put cost and complexity. The user should be making these decisions based on his needs. Hence, I propose that the protocol for using input ranges be defined as: while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); } This makes it possible to build pipelines that are firehoses with no kinks or constrictions in them. It optimizes for the inner loop case, not boundary cases.
Mar 25 2014
On Tuesday, 25 March 2014 at 20:15:32 UTC, Walter Bright wrote:It's pretty clear that: 1. the protocol is COMPLETELY undocumented and undefined.http://dlang.org/phobos/std_range.html#isInputRange The semantics of an input range (not checkable during compilation) are assumed to be the following (r is an object of type R): r.empty returns false iff there is more data available in the range. r.front returns the current element in the range. It may return by value or by reference. Calling r.front is allowed only if calling r.empty has, or would have, returned false. r.popFront advances to the next element in the range. Calling r.popFront is allowed only if calling r.empty has, or would have, returned false.We want to appeal to the high performance coders. To maximize performance, ranges should be optimized to the inner loop case, which is: while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }This makes the assumption that r.front is copy constructible at all. It also makes the assumption that you want to operate on a copy, rather than the actual element in the range. Finally, it means having to declare a local object: It merely means shifting the burden of caching from one context to another. If the object is large, chances are you are better off just calling front instead of making a copy. Especially if the loop is trivial. If you want high performance, then arguably, just provide a O(1) front, and a O(1) empty. Also, certain ranges, such as "filter" *must* access the front of the previous range more than once. Unless you are suggesting we add a field for it to cache the result of the previous range?With the additional proviso that ranges are passed to algorithms by value, so they should be cheap to copy. Cheap to copy usually means them being small.Unfortunately yes. That said, any range that does anything will have at least two fields, one of which is a slice, or comparable to in terms of size, so it's going to be big anyways. So you *are* better off passing by ref if you can regardless, unless your range is *really* trivial. I agree that range sizes can be a problem.A) I know that my range is not empty, so I can skip calling empty. Since front is guaranteed to succeed if !empty, this puts a requirement on many ranges that they have a non-trivial constructor that 'primes the pump'. Of course, priming may fail, and so construction may throw, which is not good design.If the prime fails, then the range can simply be marked as empty. Then if you decide to skip calling empty anyways, it's your own fault.And lastly, it means that lazy ranges will be required to read the first element, even if the range isn't then subsequently used, which defeats what one would expect from a lazy range.I'm not yet convinced of adding special code for ranges that aren't used. I've heard of these kinds of ranges, but I've observed that when you declare one, it almost always ends up being used. I don't think we should waste efforts on this rare usecase. As for "Lazy", in range terms, it mostly only means you calculate things element at once, as you go up the range chain. As opposed to processing the entire input data, one transformation at a time. Evaluating an element "on use" as opposed to "1 instruction before use" doesn't make much of a change in this context.All this saves for the user is one call to empty for the entire algorithm, at a cost incurred with every iteration. I.e. it selects O(n) to save O(1).If code that was actually meant to *do* something was in popFront() to begin with, then there'd be no extra overhead. I've found that if you are creative enough, you can usually design the range in such a way that it works efficiently, lazilly, and without flags.Hence, I propose that the protocol for using input ranges be defined as: while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); } This makes it possible to build pipelines that are firehoses with no kinks or constrictions in them. It optimizes for the inner loop case, not boundary cases.I get where you are coming from, but it's simply not manageable in a generic fashion, if you want to be able to preserve all the power and the diversity of the ranges we have. The protocol you are suggesting would prevent us from doing a lot of the lovely things that ranges empowers us with.
Mar 25 2014
On 3/25/2014 1:56 PM, monarch_dodra wrote:http://dlang.org/phobos/std_range.html#isInputRange The semantics of an input range (not checkable during compilation) are assumed to be the following (r is an object of type R): r.empty returns false iff there is more data available in the range. r.front returns the current element in the range. It may return by value or by reference. Calling r.front is allowed only if calling r.empty has, or would have, returned false. r.popFront advances to the next element in the range. Calling r.popFront is allowed only if calling r.empty has, or would have, returned false.I overlooked that. Thanks.It's a reasonable requirement. If your range has an issue with this, it can return a pointer to the element, and the element can be a struct with access functions. Then, the pointer will work as well as a copy.We want to appeal to the high performance coders. To maximize performance, ranges should be optimized to the inner loop case, which is: while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }This makes the assumption that r.front is copy constructible at all. It also makes the assumption that you want to operate on a copy, rather than the actual element in the range.Finally, it means having to declare a local object: It merely means shifting the burden of caching from one context to another. If the object is large, chances are you are better off just calling front instead of making a copy. Especially if the loop is trivial.This does come up as an issue, and is solvable by returning a pointer as I described. It's up to the designer of the range.If you want high performance, then arguably, just provide a O(1) front, and a O(1) empty.I don't think the issues can be waved away so easily, or I wouldn't have brought this up.Also, certain ranges, such as "filter" *must* access the front of the previous range more than once. Unless you are suggesting we add a field for it to cache the result of the previous range?This is putting the cost where it belongs - when needed on the user of a range, rather than on all ranges. It's the "pay only for what you need" idea rather than "pay regardless".Many very useful ranges are trivial. Or at least they should be. An array, for example, is a trivial range.With the additional proviso that ranges are passed to algorithms by value, so they should be cheap to copy. Cheap to copy usually means them being small.Unfortunately yes. That said, any range that does anything will have at least two fields, one of which is a slice, or comparable to in terms of size, so it's going to be big anyways. So you *are* better off passing by ref if you can regardless, unless your range is *really* trivial.Yes, one can add state flags to indicate failed construction, which I'll argue is an ugly design. After all, construction is supposed to construct an object or fail, not leave the 'constructed' object in a zombie state.A) I know that my range is not empty, so I can skip calling empty. Since front is guaranteed to succeed if !empty, this puts a requirement on many ranges that they have a non-trivial constructor that 'primes the pump'. Of course, priming may fail, and so construction may throw, which is not good design.If the prime fails, then the range can simply be marked as empty. Then if you decide to skip calling empty anyways, it's your own fault.throw this entire category of use cases under the bus just to handle a convenience of not needing to call empty in some cases?And lastly, it means that lazy ranges will be required to read the first element, even if the range isn't then subsequently used, which defeats what one would expect from a lazy range.I'm not yet convinced of adding special code for ranges that aren't used. I've heard of these kinds of ranges, but I've observed that when you declare one, it almost always ends up being used. I don't think we should waste efforts on this rare usecase.Evaluating an element "on use" as opposed to "1 instruction before use" doesn't make much of a change in this context.Except that it requires the use to start upon construction, which defeats any hope of separating construction of a pipeline from using it.I've found that if you are creative enough, you can usually design the range in such a way that it works efficiently, lazilly, and without flags.That's not been my experience.I get where you are coming from, but it's simply not manageable in a generic fashion, if you want to be able to preserve all the power and the diversity of the ranges we have. The protocol you are suggesting would prevent us from doing a lot of the lovely things that ranges empowers us with.Please show me such a case. Note that I've shown above that this power and diversity throws an entire use case category under the bus. Secondly, in my experiments with ranges, the power and diversity results in slower pipelines.
Mar 25 2014
On 3/25/14, 1:15 PM, Walter Bright wrote:Of course, priming may fail, and so construction may throw, which is not good design.This part I disagree with. -- Andrei
Mar 25 2014
On 3/25/14, 1:15 PM, Walter Bright wrote:Next, priming requires a buffer in the range.No, front requires a buffer in the range. -- Andrei
Mar 25 2014
I thought I had only 1-2 comments, but I have a few more. On 3/25/14, 1:15 PM, Walter Bright wrote:1. the protocol is COMPLETELY undocumented and undefined.s/COMPLETELY/loosely/Since ranges+algorithms are central to D, this is a very serious problem.Agreed.We want to appeal to the high performance coders. To maximize performance, ranges should be optimized to the inner loop case, which is: while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }Actually generic code often prefers: while (!r.empty) { ... r.front ...; r.popFront(); } That saves copying r.front if it returns by ref. A bunch of algos in std do that.Since front is guaranteed to succeed if !empty, this puts a requirement on many ranges that they have a non-trivial constructor that 'primes the pump'. Of course, priming may fail, and so construction may throw, which is not good design.Again, I disagree with this assertion.Next, priming requires a buffer in the range.Priming has nothing do to with the range being buffered. The entire design of ranges is a one-element buffer.Buffers add size, making them slower to copy, and will often require another field saying if the buffer has data in it or not, further bumping the size.That's an argument for adding an unbuffered range abstraction.All this saves for the user is one call to empty for the entire algorithm, at a cost incurred with every iteration. I.e. it selects O(n) to save O(1).I don't understand how that O() reasoning works.B) I want to be able to call front multiple times in a row, and expect to get the same value.Correct.This can force the range to incorporate a one element buffer and a flag indicating the status of that buffer.It may, but many ranges naturally work that way (e.g. arrays).The range instance gets bigger and more expensive to copy, and the cost of manipulating the flag and the buffer is added to every loop iteration. Note that the user of a range can trivially use: auto e = r.front; ... using e multiple times ... instead.That would pessimize code using arrays of large structs.The big problem with (A) and (B) is these costs become present in every range, despite being rarely needed and having simple alternatives. This is the wrong place to put cost and complexity. The user should be making these decisions based on his needs. Hence, I propose that the protocol for using input ranges be defined as: while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); } This makes it possible to build pipelines that are firehoses with no kinks or constrictions in them. It optimizes for the inner loop case, not boundary cases.The proposed protocol pessimizes arrays of large structs and will trip the unwary if calling r.front again returns something else. I don't think the proposal is good. Andrei
Mar 25 2014
On 03/25/2014 10:29 PM, Andrei Alexandrescu wrote:'map' may fail this criterion. In this case, is the blame put on the user?B) I want to be able to call front multiple times in a row, and expect to get the same value.Correct.
Mar 25 2014
On Tuesday, 25 March 2014 at 22:54:14 UTC, Timon Gehr wrote:On 03/25/2014 10:29 PM, Andrei Alexandrescu wrote:Depends how you define "same" though :/ But yeah. Also, shameless plug about my "cache" range adaptor: https://github.com/D-Programming-Language/phobos/pull/1364'map' may fail this criterion. In this case, is the blame put on the user?B) I want to be able to call front multiple times in a row, and expect to get the same value.Correct.
Mar 25 2014
On 3/25/2014 2:29 PM, Andrei Alexandrescu wrote:Yes, I agree. The idiom foreach (ref e; r) { ... } must also work.We want to appeal to the high performance coders. To maximize performance, ranges should be optimized to the inner loop case, which is: while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }Actually generic code often prefers: while (!r.empty) { ... r.front ...; r.popFront(); } That saves copying r.front if it returns by ref.The throwing part, or the not good design part?Since front is guaranteed to succeed if !empty, this puts a requirement on many ranges that they have a non-trivial constructor that 'primes the pump'. Of course, priming may fail, and so construction may throw, which is not good design.Again, I disagree with this assertion.I don't know how that would fit into the ecosystem. In any case, with the protocol I suggest, the designer of the range is free to distribute the functionality into the three functions, however it makes best sense. With arbitrarily calling any in any order, then all three need to have some sort of signalling mechanism between them.Next, priming requires a buffer in the range.Priming has nothing do to with the range being buffered. The entire design of ranges is a one-element buffer.Buffers add size, making them slower to copy, and will often require another field saying if the buffer has data in it or not, further bumping the size.That's an argument for adding an unbuffered range abstraction.I meant adding extra signalling with every iteration of the loop, rather than requiring the user to call empty before front.All this saves for the user is one call to empty for the entire algorithm, at a cost incurred with every iteration. I.e. it selects O(n) to save O(1).I don't understand how that O() reasoning works.Sure, but there are far more range types than arrays.B) I want to be able to call front multiple times in a row, and expect to get the same value.Correct.This can force the range to incorporate a one element buffer and a flag indicating the status of that buffer.It may, but many ranges naturally work that way (e.g. arrays).You're already requiring copying with the buffering requirement. And besides, if I was creating a range of expensive-to-copy objects, I would add a layer of indirection to cheapen it.The range instance gets bigger and more expensive to copy, and the cost of manipulating the flag and the buffer is added to every loop iteration. Note that the user of a range can trivially use: auto e = r.front; ... using e multiple times ... instead.That would pessimize code using arrays of large structs.The proposed protocol pessimizes arrays of large structsNot really more than the existing protocol does (i.e. required buffering).and will trip the unwary if calling r.front again returns something else.I'm not proposing that calling them wrongly would make things unsafe. But I don't think it's unreasonable to expect the protocol to be followed, just like file open/read/close.I don't think the proposal is good.I don't like being in the position of when I need high performance code, I have to implement my own ranges & algorithms, or telling customers they need to do so.
Mar 25 2014
On Tue, 25 Mar 2014 23:22:18 -0000, Walter Bright <newshound2 digitalmars.com> wrote:On 3/25/2014 2:29 PM, Andrei Alexandrescu wrote:Surely you'd simply start with a range of pointers to expensive-to-copy objects? Or, return them by reference from the underlying range/array/source. You want to avoid *ever* copying them except explicitly where required.You're already requiring copying with the buffering requirement. And besides, if I was creating a range of expensive-to-copy objects, I would add a layer of indirection to cheapen it.The range instance gets bigger and more expensive to copy, and the cost of manipulating the flag and the buffer is added to every loop iteration. Note that the user of a range can trivially use: auto e = r.front; ... using e multiple times ... instead.That would pessimize code using arrays of large structs.My immediate expectation upon encountering ranges was that r.front would return the same item repeatedly until r.popFront was called. Breaking that guarantee will trip a *lot* of people up. IMO the rules should be something like: - r.empty WILL return false if there is more data available in the range. - r.empty MUST be called before r.front, r.front WILL succeed if r.empty returned false. - r.front WILL repeatably return the current element in the range. It MAY return by value or by reference. - r.empty SHOULD be called before r.popFront, otherwise r.popFront MAY do nothing or throw (could also make this one a 'MUST') - r.popFront WILL advance to the next element in the range. - "lazy" ranges SHOULD delay initialisation until r.empty is called R -- Using Opera's revolutionary email client: http://www.opera.com/mail/The proposed protocol pessimizes arrays of large structsNot really more than the existing protocol does (i.e. required buffering).and will trip the unwary if calling r.front again returns something else.I'm not proposing that calling them wrongly would make things unsafe. But I don't think it's unreasonable to expect the protocol to be followed, just like file open/read/close.
Mar 26 2014
On Wed, 26 Mar 2014 06:46:26 -0400, Regan Heath <regan netmail.co.nz> wrote:On Tue, 25 Mar 2014 23:22:18 -0000, Walter Bright <newshound2 digitalmars.com> wrote:These two rules are not necessary if you know the range is not empty. See the conversation inside this pull: https://github.com/D-Programming-Language/phobos/pull/1987 -SteveOn 3/25/2014 2:29 PM, Andrei Alexandrescu wrote:Surely you'd simply start with a range of pointers to expensive-to-copy objects? Or, return them by reference from the underlying range/array/source. You want to avoid *ever* copying them except explicitly where required.You're already requiring copying with the buffering requirement. And besides, if I was creating a range of expensive-to-copy objects, I would add a layer of indirection to cheapen it.The range instance gets bigger and more expensive to copy, and the cost of manipulating the flag and the buffer is added to every loop iteration. Note that the user of a range can trivially use: auto e = r.front; ... using e multiple times ... instead.That would pessimize code using arrays of large structs.My immediate expectation upon encountering ranges was that r.front would return the same item repeatedly until r.popFront was called. Breaking that guarantee will trip a *lot* of people up. IMO the rules should be something like: - r.empty WILL return false if there is more data available in the range. - r.empty MUST be called before r.front, r.front WILL succeed if r.empty returned false. - r.front WILL repeatably return the current element in the range. It MAY return by value or by reference. - r.empty SHOULD be called before r.popFront, otherwise r.popFront MAY do nothing or throw (could also make this one a 'MUST') - r.popFront WILL advance to the next element in the range.The proposed protocol pessimizes arrays of large structsNot really more than the existing protocol does (i.e. required buffering).and will trip the unwary if calling r.front again returns something else.I'm not proposing that calling them wrongly would make things unsafe. But I don't think it's unreasonable to expect the protocol to be followed, just like file open/read/close.
Mar 26 2014
On Wed, 26 Mar 2014 08:29:15 -0400, Steven Schveighoffer <schveiguy yahoo.com> wrote:On Wed, 26 Mar 2014 06:46:26 -0400, Regan Heath <regan netmail.co.nz> wrote:Gah, I didn't cut out the right rules. I meant the two rules that empty must be called before others. Those are not necessary. -SteveIMO the rules should be something like: - r.empty WILL return false if there is more data available in the range. - r.empty MUST be called before r.front, r.front WILL succeed if r.empty returned false. - r.front WILL repeatably return the current element in the range. It MAY return by value or by reference. - r.empty SHOULD be called before r.popFront, otherwise r.popFront MAY do nothing or throw (could also make this one a 'MUST') - r.popFront WILL advance to the next element in the range.These two rules are not necessary if you know the range is not empty. See the conversation inside this pull: https://github.com/D-Programming-Language/phobos/pull/1987
Mar 26 2014
On Wed, 26 Mar 2014 12:30:53 -0000, Steven Schveighoffer <schveiguy yahoo.com> wrote:On Wed, 26 Mar 2014 08:29:15 -0400, Steven Schveighoffer <schveiguy yahoo.com> wrote:I see. I was thinking we ought to make empty mandatory to give more guaranteed structure for range implementors, so lazy initialisation can be done in one place only, etc etc. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/On Wed, 26 Mar 2014 06:46:26 -0400, Regan Heath <regan netmail.co.nz> wrote:Gah, I didn't cut out the right rules. I meant the two rules that empty must be called before others. Those are not necessary.IMO the rules should be something like: - r.empty WILL return false if there is more data available in the range. - r.empty MUST be called before r.front, r.front WILL succeed if r.empty returned false. - r.front WILL repeatably return the current element in the range. It MAY return by value or by reference. - r.empty SHOULD be called before r.popFront, otherwise r.popFront MAY do nothing or throw (could also make this one a 'MUST') - r.popFront WILL advance to the next element in the range.These two rules are not necessary if you know the range is not empty. See the conversation inside this pull: https://github.com/D-Programming-Language/phobos/pull/1987
Mar 26 2014
On Wed, 26 Mar 2014 11:09:04 -0400, Regan Heath <regan netmail.co.nz> wrote:On Wed, 26 Mar 2014 12:30:53 -0000, Steven Schveighoffer <schveiguy yahoo.com> wrote:Yes, but when you know that empty is going to return false, there isn't any logical reason to call it. It is an awkward requirement. I had the same thinking as you, why pay for an extra check for all 3 calls? But there was already evidence that people were avoiding empty. -SteveGah, I didn't cut out the right rules. I meant the two rules that empty must be called before others. Those are not necessary.I see. I was thinking we ought to make empty mandatory to give more guaranteed structure for range implementors, so lazy initialisation can be done in one place only, etc etc.
Mar 26 2014
On Wednesday, 26 March 2014 at 15:37:38 UTC, Steven Schveighoffer wrote:Yes, but when you know that empty is going to return false, there isn't any logical reason to call it. It is an awkward requirement. -SteveNot only that, but it's also a performance criteria: If you are iterating on two ranges at once (think "copy"), then you *know* "range2" is longer than "range1", even if you don't know its length. Why pay for "range2.empty", when you know it'll always be false? There is a noticeable performance difference if you *don't* check.
Mar 26 2014
On Wed, 26 Mar 2014 16:38:57 -0000, monarch_dodra <monarchdodra gmail.com> wrote:On Wednesday, 26 March 2014 at 15:37:38 UTC, Steven Schveighoffer wrote:What guarantees range2 is longer than range1? The isArray case checks explicitly, but the generic one doesn't. Is it a property of being an output range that it will expand as required, or..Yes, but when you know that empty is going to return false, there isn't any logical reason to call it. It is an awkward requirement. -SteveNot only that, but it's also a performance criteria: If you are iterating on two ranges at once (think "copy"), then you *know* "range2" is longer than "range1", even if you don't know its length.Why pay for "range2.empty", when you know it'll always be false? There is a noticeable performance difference if you *don't* check.But aren't you instead paying for 2 checks in front and 2 in popFront, so 4 checks vs 1? Or is the argument that these 4 checks cannot be removed even if we mandate r.empty is called before r.front/popFront. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 26 2014
On Wednesday, 26 March 2014 at 16:55:48 UTC, Regan Heath wrote:On Wed, 26 Mar 2014 16:38:57 -0000, monarch_dodraThe interface: The target *shall* have enough room to accommodate origin. Failure to meet that criteria is an Error. Output ranges may or may not auto expand as required. Arguably, it's a design flaw I don't want to get started on.Not only that, but it's also a performance criteria: If you are iterating on two ranges at once (think "copy"), then you *know* "range2" is longer than "range1", even if you don't know its length.What guarantees range2 is longer than range1? The isArray case checks explicitly, but the generic one doesn't. Is it a property of being an output range that it will expand as required, or..I don't know what checks you are talking about. Most ranges don't check anything on front or popFront. They merely assume they are in a state that where they can do their job. Failure to meet this condition prior to the call (note I didn't say "failure to check for"), means Error. It's really not much different from when doing an strcpy. "++p" and "*p" don't check anything. The fact that ranges *can* check doesn't always mean they should.Why pay for "range2.empty", when you know it'll always be false? There is a noticeable performance difference if you *don't* check.But aren't you instead paying for 2 checks in front and 2 in popFront, so 4 checks vs 1? Or is the argument that these 4 checks cannot be removed even if we mandate r.empty is called before r.front/popFront.
Mar 26 2014
On Wed, 26 Mar 2014 17:32:30 -0000, monarch_dodra <monarchdodra gmail.com> wrote:On Wednesday, 26 March 2014 at 16:55:48 UTC, Regan Heath wrote:Ok. So long as *something* is throwing that Error I am down with this.On Wed, 26 Mar 2014 16:38:57 -0000, monarch_dodraThe interface: The target *shall* have enough room to accommodate origin. Failure to meet that criteria is an Error.Not only that, but it's also a performance criteria: If you are iterating on two ranges at once (think "copy"), then you *know* "range2" is longer than "range1", even if you don't know its length.What guarantees range2 is longer than range1? The isArray case checks explicitly, but the generic one doesn't. Is it a property of being an output range that it will expand as required, or..Output ranges may or may not auto expand as required. Arguably, it's a design flaw I don't want to get started on.:)Ok.. but lets take a naive range of say int with a 1 element cache in the member variable "int cache;". The simplest possible front would just be "return cache;". But, if cache hasn't been populated yet it's not going to throw an Error, it's just going to be wrong. So, presumably front has to check another boolean to verify it's been populated and throw an Error if not. That's one of the checks I meant. A typical loop over a range will call front one or more times, so you pay for that check 1 or more times per loop. popFront in this example doesn't need to check anything, it just populates cache regardless, as does empty. But, I imagine there are ranges which need some initial setup, and they have to do it somewhere, and they need to check they have done it in empty, front and popFront for every call. It's those checks we'd like to avoid if we can. So.. if we mandate that empty MUST be called, then they could just be done in one place, empty. However, in this situation nothing would be enforcing that requirement (in release anyway) and things could just go wrong. So, perhaps the checks always need to be there and we gain nothing by mandating empty is called first. idunno.I don't know what checks you are talking about. Most ranges don't check anything on front or popFront. They merely assume they are in a state that where they can do their job. Failure to meet this condition prior to the call (note I didn't say "failure to check for"), means Error.Why pay for "range2.empty", when you know it'll always be false? There is a noticeable performance difference if you *don't* check.But aren't you instead paying for 2 checks in front and 2 in popFront, so 4 checks vs 1? Or is the argument that these 4 checks cannot be removed even if we mandate r.empty is called before r.front/popFront.It's really not much different from when doing an strcpy. "++p" and "*p" don't check anything. The fact that ranges *can* check doesn't always mean they should.Sure. For performance reasons they might not, but isn't this just a tiny bit safer if we mandate empty must be called and do one check there.. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 26 2014
"Regan Heath" wrote in message news:op.xdb9a9v354xghj puck.auriga.bhead.co.uk...What guarantees range2 is longer than range1? The isArray case checks explicitly, but the generic one doesn't. Is it a property of being an output range that it will expand as required, or..Some ranges will give you their length...
Mar 26 2014
On Thu, 27 Mar 2014 02:44:13 -0000, Daniel Murphy <yebbliesnospam gmail.com> wrote:"Regan Heath" wrote in message news:op.xdb9a9v354xghj puck.auriga.bhead.co.uk...Sure. And generally you could use it. copy() doesn't, and I was talking specifically about that example. :) R -- Using Opera's revolutionary email client: http://www.opera.com/mail/What guarantees range2 is longer than range1? The isArray case checks explicitly, but the generic one doesn't. Is it a property of being an output range that it will expand as required, or..Some ranges will give you their length...
Mar 27 2014
On Wed, 26 Mar 2014 15:37:38 -0000, Steven Schveighoffer <schveiguy yahoo.com> wrote:On Wed, 26 Mar 2014 11:09:04 -0400, Regan Heath <regan netmail.co.nz> wrote:Sure, it's not required for some algorithms in some situations.On Wed, 26 Mar 2014 12:30:53 -0000, Steven Schveighoffer <schveiguy yahoo.com> wrote:Yes, but when you know that empty is going to return false, there isn't any logical reason to call it. It is an awkward requirement.Gah, I didn't cut out the right rules. I meant the two rules that empty must be called before others. Those are not necessary.I see. I was thinking we ought to make empty mandatory to give more guaranteed structure for range implementors, so lazy initialisation can be done in one place only, etc etc.I had the same thinking as you, why pay for an extra check for all 3 calls? But there was already evidence that people were avoiding empty.Sure, as above, makes perfect sense. It seemed from this thread that there was some confusion about how ranges should be written and used, and I thought it might help if the requirements were more fixed, that's all. If r.empty was mandatory then every range implementer would have a place to lazily initialise, r.front would be simpler, r.popFront too. Basically it would lower the bar for "good" range implementations. We might just need better documentation tho. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 26 2014
On 3/26/14, 8:37 AM, Steven Schveighoffer wrote:On Wed, 26 Mar 2014 11:09:04 -0400, Regan Heath <regan netmail.co.nz> wrote:I think requiring users to call empty before front on input ranges is a concession we should make. AndreiOn Wed, 26 Mar 2014 12:30:53 -0000, Steven Schveighoffer <schveiguy yahoo.com> wrote:Yes, but when you know that empty is going to return false, there isn't any logical reason to call it. It is an awkward requirement. I had the same thinking as you, why pay for an extra check for all 3 calls? But there was already evidence that people were avoiding empty.Gah, I didn't cut out the right rules. I meant the two rules that empty must be called before others. Those are not necessary.I see. I was thinking we ought to make empty mandatory to give more guaranteed structure for range implementors, so lazy initialisation can be done in one place only, etc etc.
Mar 26 2014
On Wednesday, 26 March 2014 at 17:36:08 UTC, Andrei Alexandrescu wrote:I think requiring users to call empty before front on input ranges is a concession we should make.Then the name should change to "ready". It makes sense to require the user to check that the range is "ready", but not to check that it is "not empty". This will also make more sense for async implementations that will block if "not ready". IMO the whole interface needs rethinking if you want to gracefully support async data streams where you need to distinguish between: "ready" vs "empty", "front" vs "firstavailable". Both quick-sort, merge-sort, filter and map work well with async data streams.
Mar 26 2014
On Wednesday, 26 March 2014 at 18:04:44 UTC, Ola Fosheim Grøstad wrote:On Wednesday, 26 March 2014 at 17:36:08 UTC, Andrei Alexandrescu wrote:Why not? It is how iterators work in most OO languages. -- PauloI think requiring users to call empty before front on input ranges is a concession we should make.Then the name should change to "ready". It makes sense to require the user to check that the range is "ready", but not to check that it is "not empty". This will also make more sense for async implementations that will block if "not ready". IMO the whole interface needs rethinking if you want to gracefully support async data streams where you need to distinguish between: "ready" vs "empty", "front" vs "firstavailable". Both quick-sort, merge-sort, filter and map work well with async data streams.
Mar 27 2014
On Thursday, 27 March 2014 at 08:13:28 UTC, Paulo Pinto wrote:Why not? It is how iterators work in most OO languages.Why not what? A query for "empty()" should not have any side effects. In what library does that happen? Tests should in general not have side effects unless the name suggest so. Anyway, I wish D was more clear about use scenarios. If D is going to be efficient in the server space then it should also make low latency programming easier, meaning supporting async collections out-of-the-box.
Mar 27 2014
On Wed, 26 Mar 2014 13:36:08 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:On 3/26/14, 8:37 AM, Steven Schveighoffer wrote:if(!r.empty) { auto r2 = map!(x => x * 2)(r); do { auto x = r2.front; ... } while(!r2.empty); } Should we be required to re-verify that r2 is not empty before using it? It clearly is not, and would be an artificial requirement (one that map likely would not enforce!). This sounds so much like a convention that simply won't be followed, and the result will be perfectly valid code. -SteveOn Wed, 26 Mar 2014 11:09:04 -0400, Regan Heath <regan netmail.co.nz> wrote:I think requiring users to call empty before front on input ranges is a concession we should make.On Wed, 26 Mar 2014 12:30:53 -0000, Steven Schveighoffer <schveiguy yahoo.com> wrote:Yes, but when you know that empty is going to return false, there isn't any logical reason to call it. It is an awkward requirement. I had the same thinking as you, why pay for an extra check for all 3 calls? But there was already evidence that people were avoiding empty.Gah, I didn't cut out the right rules. I meant the two rules that empty must be called before others. Those are not necessary.I see. I was thinking we ought to make empty mandatory to give more guaranteed structure for range implementors, so lazy initialisation can be done in one place only, etc etc.
Mar 26 2014
On 3/26/2014 7:19 PM, Steven Schveighoffer wrote:if(!r.empty) { auto r2 = map!(x => x * 2)(r); do { auto x = r2.front; ... } while(!r2.empty); } Should we be required to re-verify that r2 is not empty before using it? It clearly is not, and would be an artificial requirement (one that map likely would not enforce!). This sounds so much like a convention that simply won't be followed, and the result will be perfectly valid code.The idea is that there are three functions: empty, front, and popFront. The functionality of the range will be distributed between these 3 functions. Different things being ranged over will naturally split their operations into the 3 functions in different ways. If the 3 functions can be called in any order, then extra logic has to be added to them to signal to each other. This slows things down. To write generic code calling ranges, it's necessary to stop assuming what is happening under the hood of empty, front, and popFront, and stick to the protocol.
Mar 26 2014
On Wed, 26 Mar 2014 22:48:16 -0400, Walter Bright <newshound2 digitalmars.com> wrote:On 3/26/2014 7:19 PM, Steven Schveighoffer wrote:OK, but it's logical to assume you *can* avoid a call to empty if you know what's going on under the hood, no? Then at that point, you have lost the requirement -- people will avoid calling empty because they can get away with it, and then altering the under-the-hood requirements cause code breakage later. Case in point, the pull request I referenced, the author originally tried to just use empty to lazily initialize filter, but it failed due to existing code in phobos that did not call empty on filtered data before processing. He had to instrument all 3 calls. -Steveif(!r.empty) { auto r2 = map!(x => x * 2)(r); do { auto x = r2.front; ... } while(!r2.empty); } Should we be required to re-verify that r2 is not empty before using it? It clearly is not, and would be an artificial requirement (one that map likely would not enforce!). This sounds so much like a convention that simply won't be followed, and the result will be perfectly valid code.The idea is that there are three functions: empty, front, and popFront. The functionality of the range will be distributed between these 3 functions. Different things being ranged over will naturally split their operations into the 3 functions in different ways. If the 3 functions can be called in any order, then extra logic has to be added to them to signal to each other. This slows things down. To write generic code calling ranges, it's necessary to stop assuming what is happening under the hood of empty, front, and popFront, and stick to the protocol.
Mar 26 2014
On 3/26/2014 7:55 PM, Steven Schveighoffer wrote:OK, but it's logical to assume you *can* avoid a call to empty if you know what's going on under the hood, no? Then at that point, you have lost the requirement -- people will avoid calling empty because they can get away with it, and then altering the under-the-hood requirements cause code breakage later. Case in point, the pull request I referenced, the author originally tried to just use empty to lazily initialize filter, but it failed due to existing code in phobos that did not call empty on filtered data before processing. He had to instrument all 3 calls.As with *any* API, if you look under the hood and make assumptions about the behavior based on a particular implementation, assumptions that are not part of the API, the risk of breakage inevitably follows. If you've identified Phobos code that uses ranges but does not follow the protocol, the Phobos code is broken - please file a bugzilla issue on it.
Mar 26 2014
On Thursday, 27 March 2014 at 04:17:16 UTC, Walter Bright wrote:On 3/26/2014 7:55 PM, Steven Schveighoffer wrote:I was originally going to do that, but then I took a closer look at the documentation, which says ([1] in the documentation of `isInputRange()`): "Calling r.front is allowed only if calling r.empty has, or would have, returned false." (And the same for `popFront()`.) That is, the documentation more or less explicitly states that you don't actually need to call `empty` if you know it returned `true`. [1] http://dlang.org/phobos/std_range.htmlOK, but it's logical to assume you *can* avoid a call to empty if you know what's going on under the hood, no? Then at that point, you have lost the requirement -- people will avoid calling empty because they can get away with it, and then altering the under-the-hood requirements cause code breakage later. Case in point, the pull request I referenced, the author originally tried to just use empty to lazily initialize filter, but it failed due to existing code in phobos that did not call empty on filtered data before processing. He had to instrument all 3 calls.As with *any* API, if you look under the hood and make assumptions about the behavior based on a particular implementation, assumptions that are not part of the API, the risk of breakage inevitably follows. If you've identified Phobos code that uses ranges but does not follow the protocol, the Phobos code is broken - please file a bugzilla issue on it.
Mar 27 2014
On Thu, 27 Mar 2014 10:49:42 -0000, Marc Sch=FCtz <schuetzm gmx.net> wro= te:On Thursday, 27 March 2014 at 04:17:16 UTC, Walter Bright wrote:u =On 3/26/2014 7:55 PM, Steven Schveighoffer wrote:OK, but it's logical to assume you *can* avoid a call to empty if yo=t =know what's going on under the hood, no? Then at that point, you have los==the requirement -- people will avoid calling empty because they can get =away with it, and then altering the under-the-hood requirements cause code ==breakage later. Case in point, the pull request I referenced, the author originally =tried to just use empty to lazily initialize filter, but it failed due to =. =existing code in phobos that did not call empty on filtered data before processing=He had to instrument all 3 calls.As with *any* API, if you look under the hood and make assumptions ==about the behavior based on a particular implementation, assumptions =.that are not part of the API, the risk of breakage inevitably follows==If you've identified Phobos code that uses ranges but does not follow=e =the protocol, the Phobos code is broken - please file a bugzilla issu=e =on it.I was originally going to do that, but then I took a closer look at th=documentation, which says ([1] in the documentation of `isInputRange()=`):"Calling r.front is allowed only if calling r.empty has, or would have=, =returned false." (And the same for `popFront()`.) That is, the documentation more or less explicitly states that you don='t =actually need to call `empty` if you know it returned `true`. [1] http://dlang.org/phobos/std_range.htmlThat's because up until now we've made no attempt to set this in stone, = = and as such many interpretations have surfaced. This documentation woul= d, = of course, change to match the final decision made. R -- = Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 27 2014
On 3/27/14, 3:49 AM, "Marc Schütz" <schuetzm gmx.net>" wrote:I was originally going to do that, but then I took a closer look at the documentation, which says ([1] in the documentation of `isInputRange()`): "Calling r.front is allowed only if calling r.empty has, or would have, returned false."Probably we need to amend that. For efficient ranges, front() and popFront() should only be guaranteed to work if either empty() or length() were evaluated first, and they returned false or nonzero, respectively. Andrei
Mar 27 2014
On Thu, 27 Mar 2014 11:19:15 -0400, Andrei Alexandrescu = <SeeWebsiteForEmail erdani.org> wrote:On 3/27/14, 3:49 AM, "Marc Sch=C3=BCtz" <schuetzm gmx.net>" wrote:heI was originally going to do that, but then I took a closer look at t=documentation, which says ([1] in the documentation of =e,`isInputRange()`): "Calling r.front is allowed only if calling r.empty has, or would hav=returned false."Probably we need to amend that. For efficient ranges, front() and =popFront() should only be guaranteed to work if either empty() or =length() were evaluated first, and they returned false or nonzero, =respectively.This is a misleading statement. An array is probably the most efficient = of = ranges, which does not require this. We should be more specific here. For certain types of *nonempty* ranges,= = front is only guaranteed to work if empty/length was called before front= = is called. I would not necessarily lump popFront into that requirement = automatically, popFront may be able to cope with not having cached the = first element without losing efficiency. Questions to answer: 1. What types of ranges does this rule apply to? 2. What is the cost of not requiring this in terms of efficiency and eas= e = of implementation. 3. Is there a better mechanism to use than misusing empty? should we = introduce another property/call? At the moment, the only candidates I can think of are lazy caching range= s. = How many of those exist? -Steve
Mar 27 2014
On 3/27/14, 8:44 AM, Steven Schveighoffer wrote:At the moment, the only candidates I can think of are lazy caching ranges. How many of those exist?filter comes to mind. -- Andrei
Mar 27 2014
On Thursday, 27 March 2014 at 15:19:37 UTC, Andrei Alexandrescu wrote:On 3/27/14, 3:49 AM, "Marc Schütz" <schuetzm gmx.net>" wrote:I just want to point out that I think allowing empty to have an *observable* side effect will have cataclysmic repercussion in validating a release build, what with all our "assert(!empty);" calls and all.I was originally going to do that, but then I took a closer look at the documentation, which says ([1] in the documentation of `isInputRange()`): "Calling r.front is allowed only if calling r.empty has, or would have, returned false."Probably we need to amend that. For efficient ranges, front() and popFront() should only be guaranteed to work if either empty() or length() were evaluated first, and they returned false or nonzero, respectively.
Mar 27 2014
On Thu, 27 Mar 2014 12:02:16 -0400, monarch_dodra <monarchdodra gmail.co= m> = wrote:On Thursday, 27 March 2014 at 15:19:37 UTC, Andrei Alexandrescu wrote:=theOn 3/27/14, 3:49 AM, "Marc Sch=C3=BCtz" <schuetzm gmx.net>" wrote:I was originally going to do that, but then I took a closer look at =documentation, which says ([1] in the documentation of =ve,`isInputRange()`): "Calling r.front is allowed only if calling r.empty has, or would ha=returned false."Probably we need to amend that. For efficient ranges, front() and =popFront() should only be guaranteed to work if either empty() or =length() were evaluated first, and they returned false or nonzero, =respectively.I just want to point out that I think allowing empty to have an =*observable* side effect will have cataclysmic repercussion in =validating a release build, what with all our "assert(!empty);" calls ==and all.This is an excellent point. assert(!empty) is everywhere. -Steve
Mar 27 2014
On 3/27/14, 11:53 AM, Steven Schveighoffer wrote:On Thu, 27 Mar 2014 12:02:16 -0400, monarch_dodra <monarchdodra gmail.com> wrote:Yah, agreed. -- AndreiOn Thursday, 27 March 2014 at 15:19:37 UTC, Andrei Alexandrescu wrote:This is an excellent point. assert(!empty) is everywhere. -SteveOn 3/27/14, 3:49 AM, "Marc Schütz" <schuetzm gmx.net>" wrote:I just want to point out that I think allowing empty to have an *observable* side effect will have cataclysmic repercussion in validating a release build, what with all our "assert(!empty);" calls and all.I was originally going to do that, but then I took a closer look at the documentation, which says ([1] in the documentation of `isInputRange()`): "Calling r.front is allowed only if calling r.empty has, or would have, returned false."Probably we need to amend that. For efficient ranges, front() and popFront() should only be guaranteed to work if either empty() or length() were evaluated first, and they returned false or nonzero, respectively.
Mar 27 2014
On 3/27/2014 12:50 PM, Andrei Alexandrescu wrote:Yah, agreed. -- AndreiCompletely unworkable. To determine if a range is empty or not, it may have to actually read from its input. TTYs are an example. Although empty may then cache the result, and not read the second time it is called, an observer of TTY could see that an item was read from the TTY.
Mar 27 2014
On Thu, 27 Mar 2014 16:58:12 -0400, Walter Bright <newshound2 digitalmars.com> wrote:On 3/27/2014 12:50 PM, Andrei Alexandrescu wrote:In the land of ranges, it's construction and popFront that generally advances the range. Empty does not. This is why assert(!empty) is all over the place, it's considered to be non-destructive. Note that a stream makes a terrible range, simply because of what you say -- reading is destructive, and determining emptiness is dependent on reading. You need a buffered stream to make a good range, and then the logic becomes much more straightforward. The awkwardness for shoehorning streams into ranges I see is that the advancement (popFront) and the check for termination (empty) are decoupled, when in reality they are synced for a stream. This REQUIRES a sticky bit for popFront to communicate with empty on whether it is EOF or not. Even a single byte buffer is not enough, you need a bool to indicate the stream is done. -SteveYah, agreed. -- AndreiCompletely unworkable. To determine if a range is empty or not, it may have to actually read from its input. TTYs are an example. Although empty may then cache the result, and not read the second time it is called, an observer of TTY could see that an item was read from the TTY.
Mar 27 2014
On 3/27/2014 2:13 PM, Steven Schveighoffer wrote:Note that a stream makes a terrible range, simply because of what you say -- reading is destructive, and determining emptiness is dependent on reading. You need a buffered stream to make a good range, and then the logic becomes much more straightforward.The range becomes the one element buffer in this case. It is completely workable.The awkwardness for shoehorning streams into ranges I see is that the advancement (popFront) and the check for termination (empty) are decoupled, when in reality they are synced for a stream. This REQUIRES a sticky bit for popFront to communicate with empty on whether it is EOF or not.The range protocol is designed to work with streams. It's a giant fail if they do not, or if you want to create a separate, non-range universe to deal with streams.Even a single byte buffer is not enough, you need a bool to indicate the stream is done.Right. But empty for a stream still has to read. Just follow the protocol, and the range will work, even with streams.
Mar 27 2014
On Thu, 27 Mar 2014 17:24:11 -0400, Walter Bright <newshound2 digitalmars.com> wrote:On 3/27/2014 2:13 PM, Steven Schveighoffer wrote:A 1 byte buffered stream? If it's performance you are looking for, you will not find it there.Note that a stream makes a terrible range, simply because of what you say -- reading is destructive, and determining emptiness is dependent on reading. You need a buffered stream to make a good range, and then the logic becomes much more straightforward.The range becomes the one element buffer in this case. It is completely workable.The range protocol is designed to work with streams. It's a giant fail if they do not, or if you want to create a separate, non-range universe to deal with streams.Ranges work well on top of buffered streams, but not AS streams.A stream requires one primitive -- read. In one function call, you get the data you want, you can tell if it's empty, and the stream object does not need to concern itself with projecting a cached element. Adding range primitives on top of a stream does not make sense. -SteveEven a single byte buffer is not enough, you need a bool to indicate the stream is done.Right. But empty for a stream still has to read. Just follow the protocol, and the range will work, even with streams.
Mar 27 2014
On 3/27/2014 2:31 PM, Steven Schveighoffer wrote:Adding range primitives on top of a stream does not make sense.Are ready to implement a parallel universe of stream based algorithms to go alongside all the range based ones and be ready to constantly justify that to everyone? I'm also curious what a generic read() should do when the stream is empty.
Mar 27 2014
On Thu, 27 Mar 2014 20:25:56 -0400, Walter Bright <newshound2 digitalmars.com> wrote:On 3/27/2014 2:31 PM, Steven Schveighoffer wrote:Not necessary. You just need to implement one range on top of a buffered stream, and then it works with all other algorithms that accept input ranges.Adding range primitives on top of a stream does not make sense.Are ready to implement a parallel universe of stream based algorithms to go alongside all the range based ones and be ready to constantly justify that to everyone?I'm also curious what a generic read() should do when the stream is empty.Returns 0 elements read. -Steve
Mar 28 2014
On 3/28/14, 3:42 AM, Steven Schveighoffer wrote:On Thu, 27 Mar 2014 20:25:56 -0400, Walter Bright <newshound2 digitalmars.com> wrote:It increasingly seems to me we need to do this. -- AndreiOn 3/27/2014 2:31 PM, Steven Schveighoffer wrote:Not necessary. You just need to implement one range on top of a buffered stream, and then it works with all other algorithms that accept input ranges.Adding range primitives on top of a stream does not make sense.Are ready to implement a parallel universe of stream based algorithms to go alongside all the range based ones and be ready to constantly justify that to everyone?I'm also curious what a generic read() should do when the stream is empty.Returns 0 elements read.
Mar 28 2014
On Fri, 28 Mar 2014 20:14:39 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:On 3/28/14, 3:42 AM, Steven Schveighoffer wrote:It is in the works. I need to finish it. I've been incorporating Dmitry's I/O buffer into my design. -SteveOn Thu, 27 Mar 2014 20:25:56 -0400, Walter Bright <newshound2 digitalmars.com> wrote:It increasingly seems to me we need to do this. -- AndreiOn 3/27/2014 2:31 PM, Steven Schveighoffer wrote:Not necessary. You just need to implement one range on top of a buffered stream, and then it works with all other algorithms that accept input ranges.Adding range primitives on top of a stream does not make sense.Are ready to implement a parallel universe of stream based algorithms to go alongside all the range based ones and be ready to constantly justify that to everyone?I'm also curious what a generic read() should do when the stream is empty.Returns 0 elements read.
Mar 28 2014
On Saturday, 29 March 2014 at 00:14:40 UTC, Andrei Alexandrescu wrote:It increasingly seems to me we need to do this. -- AndreiWhat I find funny here is that walter is saying we must enforce that empty always be called in the range - "for performance". Yet about a year ago, he made a proposal for a "SentinelInputRange", where you could do away with empty altogether because "tl,dr: PERFORMANCE!". http://forum.dlang.org/thread/kgmatj$1v8b$1 digitalmars.com If you need to deviate from the rules for extra performance, then fine. Just document that it is a lower level range, with useage limitations. However, making a global design change because *1* type of range needs it, "for performance", I'm not fine with.
Mar 29 2014
On Saturday, 29 March 2014 at 00:14:40 UTC, Andrei Alexandrescu wrote:It increasingly seems to me we need to do this. -- AndreiQuestion: Should this new restriction only be applied to *Input*Ranges? If so, then it might make more sense. Ranges that also satisfy the *Forward* interface (which has save), or specifically, the *Output* interface (that doesn't even have empty) should not have this obligation. In particular, output ranges *write*, so usually can't even buffer data. *Input*-only Ranges are the ones that are usually implemented over streams, and that are most subject to the "test empty" issue too, so they are the ones that need this restriction. Could this maybe be a good solution for everyone?
Mar 29 2014
On Saturday, 29 March 2014 at 09:06:45 UTC, monarch_dodra wrote:On Saturday, 29 March 2014 at 00:14:40 UTC, Andrei Alexandrescu wrote:Did this conversation die the instant I actually had something smart to say? andralex, WalterBright ? Thoughts?It increasingly seems to me we need to do this. -- AndreiQuestion: Should this new restriction only be applied to *Input*Ranges? If so, then it might make more sense. Ranges that also satisfy the *Forward* interface (which has save), or specifically, the *Output* interface (that doesn't even have empty) should not have this obligation. In particular, output ranges *write*, so usually can't even buffer data. *Input*-only Ranges are the ones that are usually implemented over streams, and that are most subject to the "test empty" issue too, so they are the ones that need this restriction. Could this maybe be a good solution for everyone?
Mar 30 2014
On 3/28/2014 3:42 AM, Steven Schveighoffer wrote:Meaning it must also write through a pointer if it did read something. How is that faster than a 1 element buffer?I'm also curious what a generic read() should do when the stream is empty.Returns 0 elements read.
Mar 28 2014
Am Fri, 28 Mar 2014 19:23:29 -0700 schrieb Walter Bright <newshound2 digitalmars.com>:On 3/28/2014 3:42 AM, Steven Schveighoffer wrote:I guess we all have a clear concept of streams in our mind from all kinds of programming languages. They typically operate on bytes, have an EOF flag and offer read/write methods that you pass a byte pointer and a length into. The result is the count of bytes read or written. Optionally they have an "available" property and handle any combination of the following: o all basic data types of the programming language o POD structs o formatted strings o bitwise operations o seeking after calculating offsets They are used for I/O on heterogeneous data like binary file formats or network protocols. Ranges on the other hand work on sequences of items of the same type, which is a small subset of what streams are supposed to support. While there should be a connection between both worlds, one cannot replace the other. There will always be a separate raw stream type in Phobos.I'm also curious what a generic read() should do when the stream is empty.Returns 0 elements read.Meaning it must also write through a pointer if it did read something. How is that faster than a 1 element buffer?You can write your stream in such a way that you "map" a memory area. E.g. you call "stream.waitForSoManyBytes(123);" and then "ubyte[] arr = stream.mapMemory(123);" where arr is then a slice into the stream's internal buffer. (This also requires a mechanism to release the buffer after use, so the stream can reuse it: stream.doneWithSoManyBytes(123);) -- Marco
Mar 29 2014
On Fri, 28 Mar 2014 22:23:29 -0400, Walter Bright <newshound2 digitalmars.com> wrote:On 3/28/2014 3:42 AM, Steven Schveighoffer wrote:I don't understand this point/question. -SteveMeaning it must also write through a pointer if it did read something. How is that faster than a 1 element buffer?I'm also curious what a generic read() should do when the stream is empty.Returns 0 elements read.
Mar 31 2014
On 3/27/14, 2:24 PM, Walter Bright wrote:The range protocol is designed to work with streams.It's designed to work with containers.It's a giant fail if they do not, or if you want to create a separate, non-range universe to deal with streams.It's not a giant fail, we just need to adjust the notion. Andrei
Mar 27 2014
On 3/27/2014 2:56 PM, Andrei Alexandrescu wrote:On 3/27/14, 2:24 PM, Walter Bright wrote:I know we talked about streams when we designed it.The range protocol is designed to work with streams.It's designed to work with containers.Are you suggesting that ranges needn't support streams? Note also that I suggested a way Steven could create an adapter with the behavior he desired, yet still adhere to protocol. No notion adjustments required.It's a giant fail if they do not, or if you want to create a separate, non-range universe to deal with streams.It's not a giant fail, we just need to adjust the notion.
Mar 27 2014
Am Thu, 27 Mar 2014 17:20:25 -0700 schrieb Walter Bright <newshound2 digitalmars.com>:On 3/27/2014 2:56 PM, Andrei Alexandrescu wrote:Ranges have equivalents in other languages: iterators in c++, Iterator in java all these languages have special stream types for raw data. I don't think it's bad if we also have streams/ranges separate in D.On 3/27/14, 2:24 PM, Walter Bright wrote:I know we talked about streams when we designed it.The range protocol is designed to work with streams.It's designed to work with containers.Are you suggesting that ranges needn't support streams? Note also that I suggested a way Steven could create an adapter with the behavior he desired, yet still adhere to protocol. No notion adjustments required.It's a giant fail if they do not, or if you want to create a separate, non-range universe to deal with streams.It's not a giant fail, we just need to adjust the notion.
Mar 28 2014
On 3/28/2014 1:32 AM, Johannes Pfau wrote:Ranges have equivalents in other languages: iterators in c++, Iterator in java all these languages have special stream types for raw data. I don't think it's bad if we also have streams/ranges separate in D.Do you see a point to be able to, in an algorithm, seamlessly swap a socket with a string? empty-front-popFront works with streams and non-streams. Why break this?
Mar 28 2014
28-Mar-2014 13:55, Walter Bright пишет:On 3/28/2014 1:32 AM, Johannes Pfau wrote:Certainly NOT a socket. There is no escaping the fact that there are specifics to unbuffered direct streams. What you mention only makes sense with buffering either implicit or (I'd prefer) explicit.Ranges have equivalents in other languages: iterators in c++, Iterator in java all these languages have special stream types for raw data. I don't think it's bad if we also have streams/ranges separate in D.Do you see a point to be able to, in an algorithm, seamlessly swap a socket with a string?empty-front-popFront works with streams and non-streams. Why break this?Pipe dream. -- Dmitry Olshansky
Mar 28 2014
On 3/28/2014 9:48 AM, Dmitry Olshansky wrote:28-Mar-2014 13:55, Walter Bright пишет:Yes, it does require a one element buffer. But seriously, does a one character buffer from a socket have a measurable impact on reading from a network? I'm an efficiency wonk as much or more than anyone, and this appears to me to be a false savings.On 3/28/2014 1:32 AM, Johannes Pfau wrote:Certainly NOT a socket. There is no escaping the fact that there are specifics to unbuffered direct streams. What you mention only makes sense with buffering either implicit or (I'd prefer) explicit.Ranges have equivalents in other languages: iterators in c++, Iterator in java all these languages have special stream types for raw data. I don't think it's bad if we also have streams/ranges separate in D.Do you see a point to be able to, in an algorithm, seamlessly swap a socket with a string?
Mar 28 2014
28-Mar-2014 21:07, Walter Bright пишет:On 3/28/2014 9:48 AM, Dmitry Olshansky wrote:WAT? The overhead is in issuing system calls, you'd want to do as little of them as possible. Reading byte by byte is an exemplar of idiocy in I/O code.28-Mar-2014 13:55, Walter Bright пишет:Yes, it does require a one element buffer. But seriously, does a one character buffer from a socket have a measurable impact on reading from a network?On 3/28/2014 1:32 AM, Johannes Pfau wrote:Certainly NOT a socket. There is no escaping the fact that there are specifics to unbuffered direct streams. What you mention only makes sense with buffering either implicit or (I'd prefer) explicit.Ranges have equivalents in other languages: iterators in c++, Iterator in java all these languages have special stream types for raw data. I don't think it's bad if we also have streams/ranges separate in D.Do you see a point to be able to, in an algorithm, seamlessly swap a socket with a string?I'm an efficiency wonk as much or more than anyone, and this appears to me to be a false savings.Oh crap. This is very wrong. Do you often work with I/O and networking? -- Dmitry Olshansky
Mar 28 2014
On 3/28/2014 10:11 AM, Dmitry Olshansky wrote:WAT? The overhead is in issuing system calls, you'd want to do as little of them as possible. Reading byte by byte is an exemplar of idiocy in I/O code.That's why we have things like byLine().
Mar 28 2014
28-Mar-2014 22:29, Walter Bright пишет:On 3/28/2014 10:11 AM, Dmitry Olshansky wrote:Which uses C's BUFFERED I/O and it reads from it byte by byte via getc. Even though sys calls are amortized by C runtime, we have a function call per byte. No wonder it's SLOW. -- Dmitry OlshanskyWAT? The overhead is in issuing system calls, you'd want to do as little of them as possible. Reading byte by byte is an exemplar of idiocy in I/O code.That's why we have things like byLine().
Mar 28 2014
On 3/28/2014 11:40 AM, Dmitry Olshansky wrote:Which uses C's BUFFERED I/O and it reads from it byte by byte via getc. Even though sys calls are amortized by C runtime, we have a function call per byte. No wonder it's SLOW.How about a PR to fix it?
Mar 28 2014
On Friday, 28 March 2014 at 22:05:33 UTC, Walter Bright wrote:On 3/28/2014 11:40 AM, Dmitry Olshansky wrote:Yes. I hold the opinion that there is not a whole lot of reason why something like byChunk can't be fast like we might desire. If it's down to getting chunks of data the right way, whatever that is, and the problem is an IO bound problem, then I don't see why we can't submit pull requests to offer optimisations on it. I don't think we need to go down this "the way it does IO isn't fast enough so we need to replace it" route. Just optimise the existing thing more.Which uses C's BUFFERED I/O and it reads from it byte by byte via getc. Even though sys calls are amortized by C runtime, we have a function call per byte. No wonder it's SLOW.How about a PR to fix it?
Mar 28 2014
29-Mar-2014 02:05, Walter Bright пишет:On 3/28/2014 11:40 AM, Dmitry Olshansky wrote:Sorry, I'm not inclined to do any work on hacking arbitrary C runtime libraries. Too much work on reverse engineering and making sure it stays in sync. We need new I/O package and hopefully Steven has something brewing. -- Dmitry OlshanskyWhich uses C's BUFFERED I/O and it reads from it byte by byte via getc. Even though sys calls are amortized by C runtime, we have a function call per byte. No wonder it's SLOW.How about a PR to fix it?
Mar 29 2014
On 3/28/14, 11:40 AM, Dmitry Olshansky wrote:28-Mar-2014 22:29, Walter Bright пишет:byLine doesn't use getc. -- AndreiOn 3/28/2014 10:11 AM, Dmitry Olshansky wrote:Which uses C's BUFFERED I/O and it reads from it byte by byte via getc. Even though sys calls are amortized by C runtime, we have a function call per byte. No wonder it's SLOW.WAT? The overhead is in issuing system calls, you'd want to do as little of them as possible. Reading byte by byte is an exemplar of idiocy in I/O code.That's why we have things like byLine().
Mar 28 2014
29-Mar-2014 06:47, Andrei Alexandrescu пишет:On 3/28/14, 11:40 AM, Dmitry Olshansky wrote:Indeed not every version. Linux looks almost OK. There is still this stuff in empty: https://github.com/D-Programming-Language/phobos/blob/master/std/stdio.d#L1604 And looking through the code I see that Win64, OSX and FreeBSD versions still use getc. Win32 hacks right into DMC run-time, and on linux getdelim is used. The point about ranges only ever making sense over buffered I/O still stands. Preferably not C runtime. -- Dmitry Olshansky28-Mar-2014 22:29, Walter Bright пишет:byLine doesn't use getc. -- AndreiOn 3/28/2014 10:11 AM, Dmitry Olshansky wrote:Which uses C's BUFFERED I/O and it reads from it byte by byte via getc. Even though sys calls are amortized by C runtime, we have a function call per byte. No wonder it's SLOW.WAT? The overhead is in issuing system calls, you'd want to do as little of them as possible. Reading byte by byte is an exemplar of idiocy in I/O code.That's why we have things like byLine().
Mar 29 2014
Am Fri, 28 Mar 2014 02:55:45 -0700 schrieb Walter Bright <newshound2 digitalmars.com>:On 3/28/2014 1:32 AM, Johannes Pfau wrote:Sorry, but no. It sure sounds nice first, but I can't really imagine a use case where I would want any generic algorithms to work directly on a socket. Having a look on the cheat sheet of std.algorithm 99% of these algorithms do not make sense to operate on sockets, those that do (count, until) would be bad in performance terms when operating directly byte per byte. You at least want buffered reads when doing IO operations. If you then extend range interface to give access to an internal buffer you just reinvented streams.Ranges have equivalents in other languages: iterators in c++, Iterator in java all these languages have special stream types for raw data. I don't think it's bad if we also have streams/ranges separate in D.Do you see a point to be able to, in an algorithm, seamlessly swap a socket with a string?empty-front-popFront works with streams and non-streams. Why break this?It 'works' with streams but it's way too slow. You don't want to read byte-per-byte. Of course you can always implement ranges on top of streams. Usually these will not provide byte-per-byte access but efficient higher level abstractions (byLine, byChunk, decodeText). The point is you can implement ranges on streams easily, but you can't use ranges as the generic primitive for raw data. What's the element type of a data range? ubyte - performance sucks ubyte[n], ubyte[] now you have a range of ranges, most algorithms wont work as expected (find, count, ...). (the call empty/don't call empty discussion is completely unrelated to this, btw. You can implement ranges on streams either way, but again, using ranges for raw data streams is not a good idea.)
Mar 28 2014
On Friday, 28 March 2014 at 16:59:05 UTC, Johannes Pfau wrote:It 'works' with streams but it's way too slow. You don't want to read byte-per-byte. Of course you can always implement ranges on top of streams. Usually these will not provide byte-per-byte access but efficient higher level abstractions (byLine, byChunk, decodeText). The point is you can implement ranges on streams easily, but you can't use ranges as the generic primitive for raw data. What's the element type of a data range? ubyte - performance sucks ubyte[n], ubyte[] now you have a range of ranges, most algorithms wont work as expected (find, count, ...). (the call empty/don't call empty discussion is completely unrelated to this, btw. You can implement ranges on streams either way, but again, using ranges for raw data streams is not a good idea.)I think a key is to offer something with gives you chunks at a time right at the top, and the use .joiner on that. I read files this way currently. auto fileByteRange = File("something").byChunk(chunkSize).joiner; I believe this to be a very good way to get good performance without losing the functionality of std.algorithm.
Mar 28 2014
Am Fri, 28 Mar 2014 17:22:26 +0000 schrieb "w0rp" <devw0rp gmail.com>:I think a key is to offer something with gives you chunks at a time right at the top, and the use .joiner on that. I read files this way currently. auto fileByteRange = File("something").byChunk(chunkSize).joiner;byChunk is implemented on top of the file rawRead API though, and that's a stream API ;-) As said before implementing ranges on top of streams is fine, but if you want ranges to replace streams as the lowest level interface you'll either suffer from performance issues or you'll have to extend the range interface and effectively make it a stream interface. (For example byChunk doesn't offer a way to provide a buffer. I'd expect a low level API to offer this, but it'll complicate range API a lot. File.rawRead on the other hand provides exactly that and you can implement byChunk on top of rawRead easily. The other way round is not as easy). BTW: If this code performs well of course depends what you with that fileByteRange range. For example if you only read the complete file into a memory buffer joiner would reduce performance significantly.I believe this to be a very good way to get good performance without losing the functionality of std.algorithm.Yes, that's exactly how ranges/streams should interface, there's no real problem for users. stream.getSomeRange().rangeAPICalls....
Mar 28 2014
On Friday, 28 March 2014 at 08:34:08 UTC, Johannes Pfau wrote:Am Thu, 27 Mar 2014 17:20:25 -0700 schrieb Walter Bright <newshound2 digitalmars.com>:There are stream iterators in C++: http://www.cplusplus.com/reference/iterator/istream_iterator/On 3/27/2014 2:56 PM, Andrei Alexandrescu wrote:Ranges have equivalents in other languages: iterators in c++, Iterator in java all these languages have special stream types for raw data. I don't think it's bad if we also have streams/ranges separate in D.On 3/27/14, 2:24 PM, Walter Bright wrote:I know we talked about streams when we designed it.The range protocol is designed to work with streams.It's designed to work with containers.Are you suggesting that ranges needn't support streams? Note also that I suggested a way Steven could create an adapter with the behavior he desired, yet still adhere to protocol. No notion adjustments required.It's a giant fail if they do not, or if you want to create a separate, non-range universe to deal with streams.It's not a giant fail, we just need to adjust the notion.
Mar 28 2014
28-Mar-2014 04:20, Walter Bright пишет:On 3/27/2014 2:56 PM, Andrei Alexandrescu wrote:If streams are like hot raw sockets then certainly it doesn't make sense. Ranges work nice when there are some "slots" down below, that is a buffer.On 3/27/14, 2:24 PM, Walter Bright wrote:I know we talked about streams when we designed it.The range protocol is designed to work with streams.It's designed to work with containers.Yes they "don't support streams". They are an abstraction somewhere on top of buffering. This is where they fit nicely. -- Dmitry OlshanskyAre you suggesting that ranges needn't support streams?It's a giant fail if they do not, or if you want to create a separate, non-range universe to deal with streams.It's not a giant fail, we just need to adjust the notion.
Mar 28 2014
The protocol is not intuitive. I'm not entirely against having to call something to do lazy initialization but it's not like anyone will expect empty property to lazily initialize. Empty and front could be methods added by a mixin template which would use efficient methods with strict protocol implemented by the user.Even a single byte buffer is not enough, you need a bool to indicate the stream is done.Right. But empty for a stream still has to read. Just follow the protocol, and the range will work, even with streams.
Mar 27 2014
On 3/27/2014 3:23 PM, QAston wrote:The protocol is not intuitive.I find empty-front-popFront as perfectly intuitive. I don't find the counter proposals, which come with baggage like constructors that may fail, and front() that may fail in unspecified ways, or throwing entire paradigms out the window, as intuitive. But I concede that other people think differently. Not everyone thinks the same. But consider this: floating point math is not intuitive. There has never been a shortage of proposals to make fp intuitive, but they've all failed because they are impractical. Sometimes ya gotta go with what works.
Mar 27 2014
On Friday, 28 March 2014 at 00:41:42 UTC, Walter Bright wrote:On 3/27/2014 3:23 PM, QAston wrote:I _strongly_ agree with Walter: people learning D in my groups have no problems with the empty-front-popFront sequence. Please don't complicate or change the notion of range: you can find an adjustment that don't break code, but for sure that will break the mindset of people. For what concern us, everyone here is happy with the fact that empty *must* be checked prior to front/popFront. -- PaoloThe protocol is not intuitive.I find empty-front-popFront as perfectly intuitive. I don't find the counter proposals, which come with baggage like constructors that may fail, and front() that may fail in unspecified ways, or throwing entire paradigms out the window, as intuitive. But I concede that other people think differently. Not everyone thinks the same. But consider this: floating point math is not intuitive. There has never been a shortage of proposals to make fp intuitive, but they've all failed because they are impractical. Sometimes ya gotta go with what works.
Mar 28 2014
On Fri, 28 Mar 2014 08:59:34 -0000, Paolo Invernizzi <paolo.invernizzi no.address> wrote:For what concern us, everyone here is happy with the fact that empty *must* be checked prior to front/popFront.This is actually not true. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 28 2014
On Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote:On Fri, 28 Mar 2014 08:59:34 -0000, Paolo Invernizzi <paolo.invernizzi no.address> wrote:What I'm meaning, it's that we don't care: we are always respecting the sequence "empty > front > pop", and everybody here find it natural. -- PaoloFor what concern us, everyone here is happy with the fact that empty *must* be checked prior to front/popFront.This is actually not true. R
Mar 28 2014
On Fri, 28 Mar 2014 16:23:11 +0000, Paolo Invernizzi wrote:On Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote:To clarify for Mr. Invernizzi: the "we" to which he refers is the group of people he works with, and /not/ the members of this newsgroup. --jjsOn Fri, 28 Mar 2014 08:59:34 -0000, Paolo Invernizzi <paolo.invernizzi no.address> wrote:What I'm meaning, it's that we don't care: we are always respecting the sequence "empty > front > pop", and everybody here find it natural.For what concern us, everyone here is happy with the fact that empty *must* be checked prior to front/popFront.This is actually not true. R
Mar 28 2014
On Fri, 28 Mar 2014 16:30:36 -0000, John Stahara <john.stahara+dlang gmail.com> wrote:On Fri, 28 Mar 2014 16:23:11 +0000, Paolo Invernizzi wrote:Thanks, that was confusing me :) R -- Using Opera's revolutionary email client: http://www.opera.com/mail/On Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote:To clarify for Mr. Invernizzi: the "we" to which he refers is the group of people he works with, and /not/ the members of this newsgroup. --jjsOn Fri, 28 Mar 2014 08:59:34 -0000, Paolo Invernizzi <paolo.invernizzi no.address> wrote:What I'm meaning, it's that we don't care: we are always respecting the sequence "empty > front > pop", and everybody here find it natural.For what concern us, everyone here is happy with the fact that empty *must* be checked prior to front/popFront.This is actually not true. R
Mar 28 2014
On Fri, Mar 28, 2014 at 04:30:36PM +0000, John Stahara wrote:On Fri, 28 Mar 2014 16:23:11 +0000, Paolo Invernizzi wrote:[...] FWIW I find it perfectly natural as well. (But I know not everyone agrees. :P) T -- The two rules of success: 1. Don't tell everything you know. -- YHLOn Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote:To clarify for Mr. Invernizzi: the "we" to which he refers is the group of people he works with, and /not/ the members of this newsgroup.On Fri, 28 Mar 2014 08:59:34 -0000, Paolo Invernizzi <paolo.invernizzi no.address> wrote:What I'm meaning, it's that we don't care: we are always respecting the sequence "empty > front > pop", and everybody here find it natural.For what concern us, everyone here is happy with the fact that empty *must* be checked prior to front/popFront.This is actually not true. R
Mar 28 2014
On Friday, 28 March 2014 at 16:30:36 UTC, John Stahara wrote:On Fri, 28 Mar 2014 16:23:11 +0000, Paolo Invernizzi wrote:Thank you John, that's exact: I'm talking about my colleagues working with D. -- PaoloOn Friday, 28 March 2014 at 09:30:25 UTC, Regan Heath wrote:To clarify for Mr. Invernizzi: the "we" to which he refers is the group of people he works with, and /not/ the members of this newsgroup. --jjsOn Fri, 28 Mar 2014 08:59:34 -0000, Paolo Invernizzi <paolo.invernizzi no.address> wrote:What I'm meaning, it's that we don't care: we are always respecting the sequence "empty > front > pop", and everybody here find it natural.For what concern us, everyone here is happy with the fact that empty *must* be checked prior to front/popFront.This is actually not true. R
Mar 28 2014
On 3/27/14, 1:58 PM, Walter Bright wrote:On 3/27/2014 12:50 PM, Andrei Alexandrescu wrote:That's a good point too. AndreiYah, agreed. -- AndreiCompletely unworkable. To determine if a range is empty or not, it may have to actually read from its input. TTYs are an example. Although empty may then cache the result, and not read the second time it is called, an observer of TTY could see that an item was read from the TTY.
Mar 27 2014
On Thursday, 27 March 2014 at 21:54:00 UTC, Andrei Alexandrescu wrote:On 3/27/14, 1:58 PM, Walter Bright wrote:Yes, but the "observability" should be sort lived, since empty is virtually guaranteed to be followed by "front" anyways. If both front/empty do the same (lazy) operation on the stream, then whether or not empty was called isn't really "observable" (to a certain extent). In contrast, having front completly fail if empty was NOT called, means "empty" has a hell of a lot of observable side effect.On 3/27/2014 12:50 PM, Andrei Alexandrescu wrote:That's a good point too. AndreiYah, agreed. -- AndreiCompletely unworkable. To determine if a range is empty or not, it may have to actually read from its input. TTYs are an example. Although empty may then cache the result, and not read the second time it is called, an observer of TTY could see that an item was read from the TTY.
Mar 27 2014
On Thursday, 27 March 2014 at 21:54:00 UTC, Andrei Alexandrescu wrote:On 3/27/14, 1:58 PM, Walter Bright wrote:Maybe new kind of range can help? Which has "bool popFront()" instead of "bool empty() + void popFront()".On 3/27/2014 12:50 PM, Andrei Alexandrescu wrote:That's a good point too. AndreiYah, agreed. -- AndreiCompletely unworkable. To determine if a range is empty or not, it may have to actually read from its input. TTYs are an example. Although empty may then cache the result, and not read the second time it is called, an observer of TTY could see that an item was read from the TTY.
Mar 28 2014
On 3/27/2014 3:49 AM, "Marc Schütz" <schuetzm gmx.net>" wrote:I was originally going to do that, but then I took a closer look at the documentation, which says ([1] in the documentation of `isInputRange()`): "Calling r.front is allowed only if calling r.empty has, or would have, returned false." (And the same for `popFront()`.) That is, the documentation more or less explicitly states that you don't actually need to call `empty` if you know it returned `true`. [1] http://dlang.org/phobos/std_range.htmlRight, it does say that, and it is seriously wrong.
Mar 27 2014
On Thu, 27 Mar 2014 00:17:21 -0400, Walter Bright <newshound2 digitalmars.com> wrote:On 3/26/2014 7:55 PM, Steven Schveighoffer wrote:Like range.save. It's "required", but frequently omitted, without any consequences. I'm not saying requiring it would be an invalid decision, I'm saying requiring it would be a futile gesture -- the requirement would be ignored. What happens when one of our "clients" code breaks severely because we make a change in phobos that assumes empty is always called first on a newly-created range?OK, but it's logical to assume you *can* avoid a call to empty if you know what's going on under the hood, no? Then at that point, you have lost the requirement -- people will avoid calling empty because they can get away with it, and then altering the under-the-hood requirements cause code breakage later. Case in point, the pull request I referenced, the author originally tried to just use empty to lazily initialize filter, but it failed due to existing code in phobos that did not call empty on filtered data before processing. He had to instrument all 3 calls.As with *any* API, if you look under the hood and make assumptions about the behavior based on a particular implementation, assumptions that are not part of the API, the risk of breakage inevitably follows.If you've identified Phobos code that uses ranges but does not follow the protocol, the Phobos code is broken - please file a bugzilla issue on it.I think we should work on making a protocol that does not require awkward calls, rather than alienating developers who don't follow the awkward protocol. BTW, I think there has been recent talk about not focusing on dust when we are laying bricks. This would have my vote as useless dust. This does not solve the problem of streams-as-ranges, because streams don't make good ranges. It doesn't really solve any problems as far as I can tell. -Steve
Mar 27 2014
On Thursday, 27 March 2014 at 12:35:04 UTC, Steven Schveighoffer wrote:What happens when one of our "clients" code breaks severely because we make a change in phobos that assumes empty is always called first on a newly-created range?You probably can run-time test this in Debug builds, but…I think we should work on making a protocol that does not require awkward calls, rather than alienating developers who don't follow the awkward protocol.This is it. There is way too much awkwardness in D already. And essentially, the closer the design is to mainstream the more annoying it is when you diverge and are inconsistent. Code should behave as expected without any need for memorizing. Having to differentiate between dialects is so much harder than differentiating between orthogonal languages.
Mar 27 2014
On 3/27/2014 5:45 AM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com>" wrote:On Thursday, 27 March 2014 at 12:35:04 UTC, Steven Schveighoffer wrote:Sure, but what is awkward about empty-front-popFront? You can still always use: foreach (e; range) { ... } which will follow the correct protocol automatically.I think we should work on making a protocol that does not require awkward calls, rather than alienating developers who don't follow the awkward protocol.This is it.
Mar 27 2014
On Thu, 27 Mar 2014 17:06:35 -0400, Walter Bright = <newshound2 digitalmars.com> wrote:On 3/27/2014 5:45 AM, "Ola Fosheim Gr=C3=B8stad" =<ola.fosheim.grostad+dlang gmail.com>" wrote:e:On Thursday, 27 March 2014 at 12:35:04 UTC, Steven Schveighoffer wrot=I think we should work on making a protocol that does not require =d =awkward calls, rather than alienating developers who don't follow the awkwar=If this is all we needed, ranges would never have been invented. -SteveSure, but what is awkward about empty-front-popFront? You can still always use: foreach (e; range) { ... } which will follow the correct protocol automatically.protocol.This is it.
Mar 27 2014
On 3/27/14, 5:35 AM, Steven Schveighoffer wrote:BTW, I think there has been recent talk about not focusing on dust when we are laying bricks. This would have my vote as useless dust. This does not solve the problem of streams-as-ranges, because streams don't make good ranges. It doesn't really solve any problems as far as I can tell.I think byXchar are important and are not streams. -- Andrei
Mar 27 2014
On Thu, 27 Mar 2014 11:40:59 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:On 3/27/14, 5:35 AM, Steven Schveighoffer wrote:What's broken about those? -SteveBTW, I think there has been recent talk about not focusing on dust when we are laying bricks. This would have my vote as useless dust. This does not solve the problem of streams-as-ranges, because streams don't make good ranges. It doesn't really solve any problems as far as I can tell.I think byXchar are important and are not streams. -- Andrei
Mar 27 2014
On 3/27/14, 8:44 AM, Steven Schveighoffer wrote:On Thu, 27 Mar 2014 11:40:59 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Speed.On 3/27/14, 5:35 AM, Steven Schveighoffer wrote:What's broken about those?BTW, I think there has been recent talk about not focusing on dust when we are laying bricks. This would have my vote as useless dust. This does not solve the problem of streams-as-ranges, because streams don't make good ranges. It doesn't really solve any problems as far as I can tell.I think byXchar are important and are not streams. -- Andrei
Mar 27 2014
On Thursday, 27 March 2014 at 15:45:14 UTC, Andrei Alexandrescu wrote:On 3/27/14, 8:44 AM, Steven Schveighoffer wrote:I call shenanigans. This is the code: property bool empty() { if (nLeft) return false; if (r.empty) return true; If you initialized the next element in both constructor and popFront, then you'd get rid of both these checks. Furthermore, popFront() has a guaranteed "1 call" per element. Weareas "empty" and "front" may be called several times in a row for a single element. If you want efficiency, stop being "member-wise" lazy, and put some eagerness in your code.On Thu, 27 Mar 2014 11:40:59 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Speed.On 3/27/14, 5:35 AM, Steven Schveighoffer wrote:What's broken about those?BTW, I think there has been recent talk about not focusing on dust when we are laying bricks. This would have my vote as useless dust. This does not solve the problem of streams-as-ranges, because streams don't make good ranges. It doesn't really solve any problems as far as I can tell.I think byXchar are important and are not streams. -- Andreifilter comes to mind. -- AndreiYou *just* turned down a pull that did exactly that, for exactly the same reasons. Have "byXChar" function like filter: construction and popFront do the work, and front/empty is 0(1), and const, and strongly pure to boot. I *know* the compiler likes that in terms of speed.
Mar 27 2014
On 3/27/14, 9:12 AM, monarch_dodra wrote:I was on the verge, and we need to discuss this more. A constructor that does arbitrary work for a lazy range doesn't sit well. -- Andreifilter comes to mind. -- AndreiYou *just* turned down a pull that did exactly that, for exactly the same reasons.
Mar 27 2014
On Thursday, 27 March 2014 at 19:03:26 UTC, Andrei Alexandrescu wrote:On 3/27/14, 9:12 AM, monarch_dodra wrote:I still think there's ambiguity in the word "lazy". I think this is the distinction most make: Not lazy: //---- auto r = getSomeRange(); auto arr = someRange.array(); foreach( ref e ; arr) doSomething; remove!fun(arr); return arr; //---- Lazy: //---- return getSomeRange() .map!doSomething() .filter!fun() .array(); //---- What you are talking about (IMO) is better described as "eager" vs "not eager". "A constructor does some work for a lazy range that does eager processing of its elements": I think this is a better description, and I think is acceptable.I was on the verge, and we need to discuss this more. A constructor that does arbitrary work for a lazy range doesn't sit well. -- Andreifilter comes to mind. -- AndreiYou *just* turned down a pull that did exactly that, for exactly the same reasons.
Mar 27 2014
On 3/27/2014 9:12 AM, monarch_dodra wrote:If you initialized the next element in both constructorAs mentioned earlier, requiring this of a range constructor makes it impossible to separate composition of a range with using it. As also mentioned earlier, this throws under the bus entire categories of use cases for ranges, all to avoid a single call to 'empty'.
Mar 27 2014
On Thu, 27 Mar 2014 16:53:35 -0400, Walter Bright <newshound2 digitalmars.com> wrote:On 3/27/2014 9:12 AM, monarch_dodra wrote:Not impossible. Use lazy initialization.If you initialized the next element in both constructorAs mentioned earlier, requiring this of a range constructor makes it impossible to separate composition of a range with using it.As also mentioned earlier, this throws under the bus entire categories of use cases for ranges, all to avoid a single call to 'empty'.This is going to look weird: r.empty; // now we can use it... It only makes sense if you are using empty in your processing of the range. -Steve
Mar 27 2014
On 3/27/2014 1:59 PM, Steven Schveighoffer wrote:On Thu, 27 Mar 2014 16:53:35 -0400, Walter Bright <newshound2 digitalmars.com> wrote:That still doesn't work, because then you could be calling front, and front might fail if there's no input. What is wrong with following the protocol? Why must we introduce all these style LINQ, etc.? For what gain?On 3/27/2014 9:12 AM, monarch_dodra wrote:Not impossible. Use lazy initialization.If you initialized the next element in both constructorAs mentioned earlier, requiring this of a range constructor makes it impossible to separate composition of a range with using it.It only makes sense if you are using empty in your processing of the range.I.e. follow the danged protocol and it works.
Mar 27 2014
On Thu, 27 Mar 2014 17:28:05 -0400, Walter Bright <newshound2 digitalmars.com> wrote:On 3/27/2014 1:59 PM, Steven Schveighoffer wrote:In the cases where you know there is input, that's not true. What I am protesting is the idea that you *know* input exists, you must still call empty. Front is enough in that case.On Thu, 27 Mar 2014 16:53:35 -0400, Walter Bright <newshound2 digitalmars.com> wrote:That still doesn't work, because then you could be calling front, and front might fail if there's no input.On 3/27/2014 9:12 AM, monarch_dodra wrote:Not impossible. Use lazy initialization.If you initialized the next element in both constructorAs mentioned earlier, requiring this of a range constructor makes it impossible to separate composition of a range with using it.What is wrong with following the protocol? Why must we introduce all these caveats for ranges, like streams not working, front that can fail,There is no gain to requiring empty on ranges where you logically prove they are not empty. It's a wasted call. For things where you're not sure if they are empty or not (e.g. an input file), of course empty is required to be called, and of course it may do some processing to determine that. But I would only expect that on the first call. Subsequent checks would already be done by popFront.I mean, if you are calling empty regularly, because you aren't sure whether the elements are there. That's not always the case, sometimes you are sure the elements are there. Requiring a call to empty to do anything other than to check whether a range is empty, is just plain wrong. I'm sure there's better ways to implement what you want without stitching the functionality onto empty. -SteveIt only makes sense if you are using empty in your processing of the range.I.e. follow the danged protocol and it works.
Mar 27 2014
On 3/27/2014 2:46 PM, Steven Schveighoffer wrote:In the cases where you know there is input, that's not true.The range doesn't know what you know.What I am protesting is the idea that you *know* input exists, you must still call empty. Front is enough in that case.You can't use that on generic code, because generic code cannot assume that front will not fail.I know that you want to get rid of empty. But getting rid of empty means that front may fail. This is why there is an empty, and any generic code MUST respect that. What you can do is, in your range: enum empty = true; and then it won't cost anything to call it.What is wrong with following the protocol? Why must we introduce all these style LINQ, etc.? For what gain?There is no gain to requiring empty on ranges where you logically prove they are not empty. It's a wasted call. For things where you're not sure if they are empty or not (e.g. an input file), of course empty is required to be called, and of course it may do some processing to determine that. But I would only expect that on the first call. Subsequent checks would already be done by popFront.I mean, if you are calling empty regularly, because you aren't sure whether the elements are there. That's not always the case, sometimes you are sure the elements are there.Then use an adapter range that has: enum empty = true; and forwards the front and popFront() calls to its parameter. You'll get the optimization you want, and won't violate protocol.Requiring a call to empty to do anything other than to check whether a range is empty, is just plain wrong.There are solid reasons to require it (i.e. streams), and you've responded that ranges needn't even support streams. If I have interpreted your position correctly, I cannot agree with it.
Mar 27 2014
On Thu, 27 Mar 2014 19:52:52 -0400, Walter Bright <newshound2 digitalmars.com> wrote:On 3/27/2014 2:46 PM, Steven Schveighoffer wrote:There is a difference here that I want to establish. First, I completely agree that for generic code, for when an unknown range is passed in, empty is required to be called to verify that front is valid. Second, while I agree that empty should be required to verify front is valid on unknown ranges, it should not be required, when it is known that it's not empty. What you are proposing is that we take advantage of the requirement for empty on unknown ranges to require it on known ones. I disagree, the code looks awkward and incorrect. When I know something is empty, calling empty again doesn't give me information. instead empty now becomes "is it empty, and if it needs priming, prime it." I'd rather see another primitive, since empty does not convey that message. A counter proposal -- What if we create a global method in std.range called prime. prime(r) will call r.prime if it exists, otherwise calls r.empty, and ignores the result. Code that wants to work with "primable" ranges, can call that function, and it's harmless for normal ranges, but will allow primable ranges to fetch the first element. I think penalizing all range-using code to force a non-functional empty call would 1. artificially invalidate lots of currently valid code and 2. look completely awkward in cases where it's not normally necessary. -SteveWhat I am protesting is the idea that you *know* input exists, you must still call empty. Front is enough in that case.You can't use that on generic code, because generic code cannot assume that front will not fail.
Mar 28 2014
On Thursday, 27 March 2014 at 16:12:36 UTC, monarch_dodra wrote:On Thursday, 27 March 2014 at 15:45:14 UTC, Andrei Alexandrescu wrote:... but of course lose laziness.On 3/27/14, 8:44 AM, Steven Schveighoffer wrote:I call shenanigans. This is the code: property bool empty() { if (nLeft) return false; if (r.empty) return true; If you initialized the next element in both constructor and popFront, then you'd get rid of both these checks.On Thu, 27 Mar 2014 11:40:59 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Speed.On 3/27/14, 5:35 AM, Steven Schveighoffer wrote:What's broken about those?BTW, I think there has been recent talk about not focusing on dust when we are laying bricks. This would have my vote as useless dust. This does not solve the problem of streams-as-ranges, because streams don't make good ranges. It doesn't really solve any problems as far as I can tell.I think byXchar are important and are not streams. -- AndreiFurthermore, popFront() has a guaranteed "1 call" per element. Weareas "empty" and "front" may be called several times in a row for a single element. If you want efficiency, stop being "member-wise" lazy, and put some eagerness in your code.Does a flag "isInitialized" even have an effect on performance where it matters? It should only be relevant in tight loops, and there I expect a moderately well-working optimizer to inline calls to `empty` and `front`. It can then see that it has just set "isInitialized" to true/false, and remove the checks completely, effectively generating code as if the value were only fetched/computed in `empty`. That only leaves the larger struct size as a potential problem. However I've seen LDC being able to "decompose" a struct into separate variables, so I guess it can remove the unnecessary member in our case. In other situations, where you already have more complex code, an additional conditional branch will not have as much weight anyway. Does somehow have numbers for realistic programs, with GDC/LDC/DMD? I suspect this entire discussion is moot, because we can easily have non-eagerly initialized ranges without sacrificing performance in most situations.
Mar 28 2014
On Fri, 28 Mar 2014 07:47:22 -0400, Marc Sch=C3=BCtz <schuetzm gmx.net> = wrote:On Thursday, 27 March 2014 at 16:12:36 UTC, monarch_dodra wrote:=I call shenanigans. This is the code: property bool empty() { if (nLeft) return false; if (r.empty) return true; If you initialized the next element in both constructor and popFront,=In this case, laziness is not critical. Decoding the element is an O(1) = = operation, and when looping through, you will decode it anyway. When processing a 20 element string, the cost of checking to see if you = = have decoded on every empty or front call may override the front-loaded = = cost of decoding the first element on construction. It's sure to add to = = the cost if you are processing all 20 elements, since you decode them al= l = anyway. On other ranges, it's more important when the first element costs a lot = to = fetch. HOWEVER, it's not critically important to delay that unless you a= re = not going to process that element. For example, if you are foreach'ing = over all the elements, the delay doesn't matter. I'd rather the higher level code decide whether to delay or not, dependi= ng = on the situation. Requiring a protocol change for such detailed knowledg= e = seems unbalanced. -Stevethen you'd get rid of both these checks.... but of course lose laziness.
Mar 28 2014
On Saturday, 29 March 2014 at 01:36:46 UTC, Steven Schveighoffer wrote:On Fri, 28 Mar 2014 07:47:22 -0400, Marc Schütz <schuetzm gmx.net> wrote:I was more thinking of the fact that you need to read something on construction, rather than on consumption, and this reading might be noticeable. There was the example of `stdin.byLine().filter(...)` (or something similar, don't remember exactly), which reads from stdin on construction. This changes the behaviour of the program, because the read operation will (probably) block. I'd suggest to make it a requirement for ranges and algorithms _not_ to start consuming the underlying data until one of empty/front/popFront is called, even if that has a negative effect on performance. That's why I was asking for performance numbers, to see whether there even is an effect. If there isn't, that's just another argument for adding that requirement. This is then, IMO, a very acceptable additional burden to place on the writers of ranges. I agree, however, that it's not a good idea to change the range protocol, i.e. what _users_ of ranges have to abide by. That would be a breaking change, and it would be an especially bad one because there I see no way to detect that a user failed to call `empty` in an iteration if they knew that there are more elements available.On Thursday, 27 March 2014 at 16:12:36 UTC, monarch_dodra wrote:In this case, laziness is not critical. Decoding the element is an O(1) operation, and when looping through, you will decode it anyway. When processing a 20 element string, the cost of checking to see if you have decoded on every empty or front call may override the front-loaded cost of decoding the first element on construction. It's sure to add to the cost if you are processing all 20 elements, since you decode them all anyway. On other ranges, it's more important when the first element costs a lot to fetch. HOWEVER, it's not critically important to delay that unless you are not going to process that element. For example, if you are foreach'ing over all the elements, the delay doesn't matter. I'd rather the higher level code decide whether to delay or not, depending on the situation. Requiring a protocol change for such detailed knowledge seems unbalanced.If you initialized the next element in both constructor and popFront, then you'd get rid of both these checks.... but of course lose laziness.
Mar 29 2014
On Sat, 29 Mar 2014 17:02:30 -0400, Marc Sch=C3=BCtz <schuetzm gmx.net> = wrote:On Saturday, 29 March 2014 at 01:36:46 UTC, Steven Schveighoffer wrote=:t> =On Fri, 28 Mar 2014 07:47:22 -0400, Marc Sch=C3=BCtz <schuetzm gmx.ne=t, =wrote:On Thursday, 27 March 2014 at 16:12:36 UTC, monarch_dodra wrote:If you initialized the next element in both constructor and popFron=1) =In this case, laziness is not critical. Decoding the element is an O(=then you'd get rid of both these checks.... but of course lose laziness.ou =operation, and when looping through, you will decode it anyway. When processing a 20 element string, the cost of checking to see if y=ed =have decoded on every empty or front call may override the front-load=to =cost of decoding the first element on construction. It's sure to add ==the cost if you are processing all 20 elements, since you decode them=ot =all anyway. On other ranges, it's more important when the first element costs a l==to fetch. HOWEVER, it's not critically important to delay that unless=you are not going to process that element. For example, if you are =foreach'ing over all the elements, the delay doesn't matter. I'd rather the higher level code decide whether to delay or not, =depending on the situation. Requiring a protocol change for such =detailed knowledge seems unbalanced.I was more thinking of the fact that you need to read something on =construction, rather than on consumption, and this reading might be =noticeable. There was the example of `stdin.byLine().filter(...)` (or ==something similar, don't remember exactly), which reads from stdin on ==construction. This changes the behaviour of the program, because the =read operation will (probably) block.Blocking operations may have a noticeable impact, but they may not. It = depends on what you do between construction and processing. For example, if you have: foreach(l; stdin.byLine) Lazily fetching the first line makes no operational difference whatsoeve= r, = even if it blocks, because you're immediately going to process it. But if it *is* lazily fetched, you are paying some possibly small, but = nonzero, cost for that laziness that you didn't need to pay. This is why I said it's not critically important to delay unless you are= = not going to process the first element. Because there is no primitive to= = "prime" the range, we must do it on a property fetch, or on construction= . = But after that, popFront is a perfectly reasonable place to get the next= = element. All this fuss is over the *first* element in the range. I think providin= g = a mechanism to decide whether you want it now or later is reasonable. Fo= r = example a lazy range wrapper. Note, however, the case Andrei was arguing about was one of the string = decoders. When you are blocking for input, hell, even if you aren't = blocking, but need to call system calls to get it, the performance cost = of = checking a boolean every call is negligible. But when you are decoding 1= -4 = bytes of data in memory, checking a boolean becomes significant. -Steve
Mar 31 2014
On Monday, 31 March 2014 at 11:40:11 UTC, Steven Schveighoffer wrote:Blocking operations may have a noticeable impact, but they may not. It depends on what you do between construction and processing. For example, if you have: foreach(l; stdin.byLine) Lazily fetching the first line makes no operational difference whatsoever, even if it blocks, because you're immediately going to process it. But if it *is* lazily fetched, you are paying some possibly small, but nonzero, cost for that laziness that you didn't need to pay. This is why I said it's not critically important to delay unless you are not going to process the first element. Because there is no primitive to "prime" the range, we must do it on a property fetch, or on construction. But after that, popFront is a perfectly reasonable place to get the next element. All this fuss is over the *first* element in the range. I think providing a mechanism to decide whether you want it now or later is reasonable. For example a lazy range wrapper.I've found the example I was talking about: http://forum.dlang.org/thread/mailman.323.1393458346.6445.digitalmars-d puremagic.com I misremembered, filter wasn't even involved. But similar situations may arise from using filter or another range that eagerly fetches the first element, even if its source doesn't.Note, however, the case Andrei was arguing about was one of the string decoders. When you are blocking for input, hell, even if you aren't blocking, but need to call system calls to get it, the performance cost of checking a boolean every call is negligible. But when you are decoding 1-4 bytes of data in memory, checking a boolean becomes significant.That's what I meant by a tight loop. My hypothesis is that in such cases the optimizers of at least GDC and LDC are good enough to remove the check for the boolean completely, and that at least LDC can then remove the boolean itself from the range structure.
Mar 31 2014
On Mon, 31 Mar 2014 13:43:05 -0400, Marc Sch=C3=BCtz <schuetzm gmx.net> = wrote:I've found the example I was talking about: http://forum.dlang.org/thread/mailman.323.1393458346.6445.digitalmars-=d puremagic.comI misremembered, filter wasn't even involved. But similar situations m=ay =arise from using filter or another range that eagerly fetches the firs=t =element, even if its source doesn't.Sure, but fetching lazily adds cost. It's possible to add it as an optio= n = when needed, but it's not easy to reclaim the cost if it's the default = implementation.That's what I meant by a tight loop. My hypothesis is that in such cas=es =the optimizers of at least GDC and LDC are good enough to remove the =check for the boolean completely, and that at least LDC can then remov=e =the boolean itself from the range structure.I suspect that this is an optimization that will break down when not don= e = in the "right way." I'd rather be able to choose, do I need lazy = evaluation or not? The cases where lazy evaluation is needed are not = common. -Steve
Mar 31 2014
On 03/27/2014 05:17 AM, Walter Bright wrote:As with *any* API, if you look under the hood and make assumptions about the behavior based on a particular implementation, assumptions that are not part of the API, the risk of breakage inevitably follows. If you've identified Phobos code that uses ranges but does not follow the protocol, the Phobos code is broken - please file a bugzilla issue on it.I feel uneasy about the assertion that code that assumes that 'empty' checks for an empty range is broken. If the API does not allow fast range implementations without arbitrarily restrictions on some 'protocol', then this is a problem with the API. Maybe empty/front/popFront should just be a single primitive, like: Nullable!T get();
Mar 27 2014
On 03/27/2014 11:48 PM, Timon Gehr wrote:arbitrarily restrictions on some 'protocol'arbitrarily restricting interactions to some 'protocol'
Mar 27 2014
On 3/27/2014 3:48 PM, Timon Gehr wrote:Maybe empty/front/popFront should just be a single primitive, like: Nullable!T get();What if get() fails?
Mar 27 2014
On Thu, Mar 27, 2014 at 05:11:55PM -0700, Walter Bright wrote:On 3/27/2014 3:48 PM, Timon Gehr wrote:That's the whole point of Nullable!T: you can return null if get() fails. I can't say I like this, though. There are many cases in my code that need separation of .empty from .front from .popFront, such as recursive descent parsers or code that does if-match-then-consume algorithms. One of the *nice* things about the current range API is that that code works with arrays, streams, and lots of other stuff, without needing to care about implementation details. Changing it to get() will require lots of manual buffering and reintroduce lots of ugly boilerplate that I had to live with in C++. I'm with Walter on this one. Generic code should NOT assume anything about a range it was given, and therefore it should call .empty before calling .front or .popFront. If you "know" that a particular range doesn't require calling .empty beforehand, then by definition your code is no longer generic, and you just have to live with the consequences of that. Nothing stops you, for example, from exposing a .get function or whatever else, *in addition* to the standard range API. Then code that knows how to deal with .get will use it, and your range remains usable with other generic code. Nothing about the range API dictates that .empty cannot do useful work like initialize buffers. That should be an implementation detail that generic code shouldn't rely on. T -- The early bird gets the worm. Moral: ewww...Maybe empty/front/popFront should just be a single primitive, like: Nullable!T get();What if get() fails?
Mar 27 2014
On Friday, 28 March 2014 at 02:14:40 UTC, H. S. Teoh wrote:I'm with Walter on this one. Generic code should NOT assume anything about a range it was given, and therefore it should call .empty before calling .front or .popFront. If you "know" that a particular range doesn't require calling .empty beforehand, then by definition your code is no longer generic, and you just have to live with the consequences of that. Nothing stops you, for example, from exposing a .get function or whatever else, *in addition* to the standard range API. Then code that knows how to deal with .get will use it, and your range remains usable with other generic code.It's not that *you* know a particular range doesn't need "empty" called, it's that the algorithm you are using has already previously validated there are elements in it. For example, the splitter algorithm will first save a copy of its range, and then walk it, searching for the "splitting" elements. It then realizes it has walked N elements. From there, it takes the original range, and packs it into a "takeExactly(N)" of the original range. Iterating that "takeExactly(N)" is faster than a raw "take", *because* "takeExactly" was already promised that the range holds N elements, and as such, *doesn't* check for empty. Ditto for "findSplit". And again, there are functions, such as "copy", then simply *require* that a certain range have at least a certain amount of elements. Why check for it, if not providing it is a violation of its interface. On Thursday, 27 March 2014 at 23:52:46 UTC, Walter Bright wrote:I know that you want to get rid of empty. But getting rid of empty means that front may fail. This is why there is an empty, and any generic code MUST respect that.Front only fails *if* the range is empty. Not if you fail to call empty. Generic code respects that.What you can do is, in your range: enum empty = true;That's not the same. That's making the assumption your range will *never* be empty, which is a whole other concept (infinite ranges).
Mar 28 2014
On 27.03.2014 03:48, Walter Bright wrote:On 3/26/2014 7:19 PM, Steven Schveighoffer wrote:IIUC what you are proposing would be covered better by removing front and return the value by popFront. Instead of forcing strong, breaking and sometimes unintuitive semantics we could also improve dmd's optimizer. This caching range example: /////////////////////////////////// T getc(T)(); struct irange(T) { bool _cached; T _cache; this(T[] arr) { _cached = false; } bool empty() { if(_cached) return false; _cache = getc!T(); return (_cache < 0); } T front() { empty(); return _cache; } void popFront() { _cached = false; } } int foo(irange!int rg) { int sum = 0; while(!rg.empty) { sum += rg.front; rg.popFront; } return sum; } /////////////////////////////////// generates this code with "dmd2.065 -O -inline -release -noboundscheck -m64": _D7irange23fooFS7irange213__T6irangeTiZ6irangeZi: 0000: push rbp 0001: mov rbp,rsp 0004: push rax 0005: push rsi 0006: mov qword ptr [rbp+10h],rcx 000A: xor esi,esi 000C: lea rcx,[rbp+10h] 0010: sub rsp,20h 0014: call _D7irange213__T6irangeTiZ6irange5emptyMFZb 0019: add rsp,20h 001D: xor al,1 001F: je 0050 0021: lea rcx,[rbp+10h] 0025: sub rsp,20h 0029: call _D7irange213__T6irangeTiZ6irange5emptyMFZb 002E: add rsp,20h 0032: mov eax,dword ptr [rbp+14h] 0035: add esi,eax 0037: mov byte ptr [rbp+10h],0 003B: lea rcx,[rbp+10h] 003F: sub rsp,20h 0043: call _D7irange213__T6irangeTiZ6irange5emptyMFZb 0048: add rsp,20h 004C: xor al,1 004E: jne 0021 0050: mov eax,esi 0052: pop rsi 0053: lea rsp,[rbp] 0057: pop rbp 0058: ret _D7irange213__T6irangeTiZ6irange5emptyMFZb: 0000: push rbp 0001: mov rbp,rsp 0004: mov qword ptr [rbp+10h],rcx 0008: cmp byte ptr [rcx],0 000B: je 0011 000D: xor eax,eax 000F: pop rbp 0010: ret 0011: sub rsp,20h 0015: call _D7irange211__T4getcTiZ4getcFZi 001A: add rsp,20h 001E: mov rcx,qword ptr [rbp+10h] 0022: mov dword ptr [rcx+4],eax 0025: shr eax,1Fh 0028: mov byte ptr [rcx],al 002A: xor al,1 002C: pop rbp 002D: ret and this with gdc (-O2 on godbolt.org): int example.foo(example.irange!(int).irange): mov rax, rdi push rbx xor ebx, ebx shr rax, 32 test dil, dil je .L5 .L2: add ebx, eax .L5: call int example.getc!(int).getc() test eax, eax js .L2 mov eax, ebx pop rbx ret All traces of the caching but the initial state check have been removed, no extra calls.if(!r.empty) { auto r2 = map!(x => x * 2)(r); do { auto x = r2.front; ... } while(!r2.empty); } Should we be required to re-verify that r2 is not empty before using it? It clearly is not, and would be an artificial requirement (one that map likely would not enforce!). This sounds so much like a convention that simply won't be followed, and the result will be perfectly valid code.The idea is that there are three functions: empty, front, and popFront. The functionality of the range will be distributed between these 3 functions. Different things being ranged over will naturally split their operations into the 3 functions in different ways. If the 3 functions can be called in any order, then extra logic has to be added to them to signal to each other. This slows things down. To write generic code calling ranges, it's necessary to stop assuming what is happening under the hood of empty, front, and popFront, and stick to the protocol.
Mar 26 2014
On 27.03.2014 07:53, Rainer Schuetze wrote:bool empty() { if(_cached) return false; _cache = getc!T(); return (_cache < 0); }Ouch, pasted before fixing: bool empty() { if(_cached) return false; _cache = getc!T(); _cached = _cache >= 0; // this line added return !_cached; } The asm is with an unintended "_cached = _cache < 0;", but that's the same, just accepting negative input instead of non-negative.
Mar 26 2014
On 3/26/2014 11:53 PM, Rainer Schuetze wrote:IIUC what you are proposing would be covered better by removing front and return the value by popFront.Different things being ranged over make different decisions as to what goes into each function.Instead of forcing strong, breaking and sometimes unintuitive semanticsI don't understand what is so unintuitive about: while (!empty) { ... front ...; popFront(); } What I do find unintuitive and unattractive is all those flags in the range implementation.we could also improve dmd's optimizer.Yes, that's true. The thing about optimizers, though, is they work on patterns. If your code doesn't quite match the right pattern, it won't tickle the optimization you might be expecting. The pattern for recognizing the ROL and ROR patterns is one such.
Mar 27 2014
On 27.03.2014 09:42, Walter Bright wrote:On 3/26/2014 11:53 PM, Rainer Schuetze wrote:This loop is intuitive. Not being allowed to call empty or front multiple times or not at all is unintuitive. They should not be named as if they are properties then.IIUC what you are proposing would be covered better by removing front and return the value by popFront.Different things being ranged over make different decisions as to what goes into each function.Instead of forcing strong, breaking and sometimes unintuitive semanticsI don't understand what is so unintuitive about: while (!empty) { ... front ...; popFront(); }What I do find unintuitive and unattractive is all those flags in the range implementation.I guess gcc doesn't use patterns here, but value propagation and notices that all the checks are unnecessary.we could also improve dmd's optimizer.Yes, that's true. The thing about optimizers, though, is they work on patterns. If your code doesn't quite match the right pattern, it won't tickle the optimization you might be expecting. The pattern for recognizing the ROL and ROR patterns is one such.
Mar 27 2014
On 3/27/2014 12:21 PM, Rainer Schuetze wrote:This loop is intuitive. Not being allowed to call empty or front multiple times or not at all is unintuitive. They should not be named as if they are properties then.I can concede that. But I can't concede being able to call front without first calling empty, or calling popFront without calling empty and front, or requiring 'pump priming' in the constructor.
Mar 27 2014
On Thursday, 27 March 2014 at 20:49:16 UTC, Walter Bright wrote:On 3/27/2014 12:21 PM, Rainer Schuetze wrote:Disclaimer: I'm a C++ programmer just lurking here, I've never actually used D. I find it very counter-intuitive that 'empty' is required before front or popFront. Since 'pump priming' in the constructor isn't wanted either, i'd suggest the following protocol: while (popFront()) { front; } popFront is then required to return !empty. 'empty' as a separate property getter can stay but is not required for the protocol. This way, it's clear that the work to fetch the next element is always done in popFront. Generally I find dependecies between functions problematic that require a specific call sequence. If they can be removed, they should. With my proposed solution, there's still one minor dependency, namely that front is not valid before the first popFront. This could be solved by again combining the two, as proposed by someone else in this thread. TobiThis loop is intuitive. Not being allowed to call empty or front multiple times or not at all is unintuitive. They should not be named as if they are properties then.I can concede that. But I can't concede being able to call front without first calling empty, or calling popFront without calling empty and front, or requiring 'pump priming' in the constructor.
Mar 28 2014
On Friday, 28 March 2014 at 23:14:56 UTC, Tobias Müller wrote:On Thursday, 27 March 2014 at 20:49:16 UTC, Walter Bright wrote:In c++ if you had a list or deque you would obviously did if(empty()) before calling front() too, and the before a call to pop_front(). Container is kind of range and in c++ it looks exactly the same: while (!cont.empty()) { auto& var = cont.front(); cont.pop_front(); // consume } 3 operations are needed.On 3/27/2014 12:21 PM, Rainer Schuetze wrote:Disclaimer: I'm a C++ programmer just lurking here, I've never actually used D. I find it very counter-intuitive that 'empty' is required before front or popFront. Since 'pump priming' in the constructor isn't wanted either, i'd suggest the following protocol: while (popFront()) { front; }This loop is intuitive. Not being allowed to call empty or front multiple times or not at all is unintuitive. They should not be named as if they are properties then.I can concede that. But I can't concede being able to call front without first calling empty, or calling popFront without calling empty and front, or requiring 'pump priming' in the constructor.
Mar 28 2014
On Friday, 28 March 2014 at 23:14:56 UTC, Tobias Müller wrote:On Thursday, 27 March 2014 at 20:49:16 UTC, Walter Bright wrote:Well, it might seem kind of weird at first that an InputRange has these concepts in three distinct methods, but they really are three separate operations. After I have spent enough time working with Java Iterators, I have found it much nicer to be able to get the current value or see if I'm at the end or not without having to worry about caching the current value myself. Also as I've said before it gives you some opportunity to make trying to fetch something out of an empty range less error prone, because you don't have to check for a null value, etc.On 3/27/2014 12:21 PM, Rainer Schuetze wrote:Disclaimer: I'm a C++ programmer just lurking here, I've never actually used D. I find it very counter-intuitive that 'empty' is required before front or popFront. Since 'pump priming' in the constructor isn't wanted either, i'd suggest the following protocol: while (popFront()) { front; } popFront is then required to return !empty. 'empty' as a separate property getter can stay but is not required for the protocol. This way, it's clear that the work to fetch the next element is always done in popFront. Generally I find dependecies between functions problematic that require a specific call sequence. If they can be removed, they should. With my proposed solution, there's still one minor dependency, namely that front is not valid before the first popFront. This could be solved by again combining the two, as proposed by someone else in this thread. TobiThis loop is intuitive. Not being allowed to call empty or front multiple times or not at all is unintuitive. They should not be named as if they are properties then.I can concede that. But I can't concede being able to call front without first calling empty, or calling popFront without calling empty and front, or requiring 'pump priming' in the constructor.
Mar 28 2014
On Friday, 28 March 2014 at 23:14:56 UTC, Tobias Müller wrote:On Thursday, 27 March 2014 at 20:49:16 UTC, Walter Bright wrote:First sorry for my english. I agree. All work is done in "bool popFront()" (well, this name may no longer be the most appropriate). In this function the first / next item is obtained and caches. Returns true if any element, false otherwise. "front" return de cached element as now, as many times as desired and without side effects. "empty" function is no longer necessary, but it might be useful to keep changing the return type for example to int: 1 - Sure there is more elements 0 - Sure there is NO more elements -1 - I don't known. Try popFront to know if more element Of course this function as "front", without side effectsOn 3/27/2014 12:21 PM, Rainer Schuetze wrote:Disclaimer: I'm a C++ programmer just lurking here, I've never actually used D. I find it very counter-intuitive that 'empty' is required before front or popFront. Since 'pump priming' in the constructor isn't wanted either, i'd suggest the following protocol: while (popFront()) { front; } popFront is then required to return !empty. 'empty' as a separate property getter can stay but is not required for the protocol. This way, it's clear that the work to fetch the next element is always done in popFront. Generally I find dependecies between functions problematic that require a specific call sequence. If they can be removed, they should. With my proposed solution, there's still one minor dependency, namely that front is not valid before the first popFront. This could be solved by again combining the two, as proposed by someone else in this thread. TobiThis loop is intuitive. Not being allowed to call empty or front multiple times or not at all is unintuitive. They should not be named as if they are properties then.I can concede that. But I can't concede being able to call front without first calling empty, or calling popFront without calling empty and front, or requiring 'pump priming' in the constructor.
Mar 30 2014
On 3/26/2014 11:53 PM, Rainer Schuetze wrote:This caching range example: /////////////////////////////////// T getc(T)(); struct irange(T) { bool _cached; T _cache; this(T[] arr) { _cached = false; } bool empty() { if(_cached) return false; _cache = getc!T(); return (_cache < 0);What happens if empty is called twice in a row? Two characters get read! That isn't right.} T front() { empty(); return _cache; }What happens if empty returns true? EOF? I don't think that's intuitive. You could have front throw, but that prevents anyone from building nothrow ranges.void popFront() { _cached = false; } }
Mar 27 2014
On 27.03.2014 10:06, Walter Bright wrote:On 3/26/2014 11:53 PM, Rainer Schuetze wrote:Good catch. Somehow I pasted the wrong version, but posted the corrections a few minutes later.This caching range example: /////////////////////////////////// T getc(T)(); struct irange(T) { bool _cached; T _cache; this(T[] arr) { _cached = false; } bool empty() { if(_cached) return false; _cache = getc!T(); return (_cache < 0);What happens if empty is called twice in a row? Two characters get read! That isn't right.The caller is told to guarantee that empty must not return true. I just didn't wanted to repeat the stuff in empty. GDC removed it anyway...} T front() { empty(); return _cache; }What happens if empty returns true? EOF? I don't think that's intuitive. You could have front throw, but that prevents anyone from building nothrow ranges.
Mar 27 2014
On 3/27/2014 12:24 PM, Rainer Schuetze wrote:The caller is told to guarantee that empty must not return true.I think that is the very definition of looking under the hood in order to violate protocol. A range CANNOT be written generically and support that.
Mar 27 2014
On Thu, 27 Mar 2014 02:19:13 -0000, Steven Schveighoffer <schveiguy yahoo.com> wrote:if(!r.empty) { auto r2 = map!(x => x * 2)(r); do { auto x = r2.front; ... } while(!r2.empty); }if(r.empty) return; auto r2 = map!(x => x * 2)(r); while(!r2.empty) { auto x = r2.front; ... r2.popFront(); //bug fix for your version which I noticed because I followed "the pattern" :D } ahh.. much better. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 27 2014
On Wednesday, March 26, 2014 10:36:08 Andrei Alexandrescu wrote:On 3/26/14, 8:37 AM, Steven Schveighoffer wrote:I don't know. It's not the end of the world if we require it (at least with a range that doesn't have length), since you're almost always going to be forced to call empty anyway, because ranges are generally used in generic code. However, it seems really off to me for there to be any real work going on in empty, and I'd be leery of any range which actually required empty to be called before calling front (though I definitely think that calling front on an empty range should not be considered okay and would expect that to generally be undefined behavior). Regardless, if the range has length, then I'd think that quite a few of the algorithms which actually used length wouldn't actually need to call empty, making calling empty unnecessary overhead. But for the most part, I think that a lot of the weird cases go away when you end up with length and random-access ranges and the like, since that usually means that the range isn't generative but likely has an array underneath (though map certainly has its quirks thanks to the fact that front could technically allocate a new object on every call, and it can be random-access). - Jonathan M DavisOn Wed, 26 Mar 2014 11:09:04 -0400, Regan Heath <regan netmail.co.nz> wrote:I think requiring users to call empty before front on input ranges is a concession we should make.On Wed, 26 Mar 2014 12:30:53 -0000, Steven Schveighoffer <schveiguy yahoo.com> wrote:Yes, but when you know that empty is going to return false, there isn't any logical reason to call it. It is an awkward requirement. I had the same thinking as you, why pay for an extra check for all 3 calls? But there was already evidence that people were avoiding empty.Gah, I didn't cut out the right rules. I meant the two rules that empty must be called before others. Those are not necessary.I see. I was thinking we ought to make empty mandatory to give more guaranteed structure for range implementors, so lazy initialisation can be done in one place only, etc etc.
Mar 26 2014
On 3/26/2014 11:48 PM, Jonathan M Davis wrote:However, it seems really off to me for there to be any real work going on in empty,Many things, like some ttys, can not be tested for being empty without reading it, and reading of a tty is destructive.and I'd be leery of any range which actually required empty to be called before calling front (though I definitely think that calling front on an empty range should not be considered okay and would expect that to generally be undefined behavior).That's exactly why empty should be called before front - because front is expected to succeed.
Mar 27 2014
On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:It's become clear to me that we've underspecified what an InputRange is.Some thoughts: - A while ago I realized I did not know how some of those underspecified details should be implemented (I think my initial doubt was if the range was expected/required to enforce(!empty)). Assuming it was just my ignorance (and incomplete or hard to find documentation), I asked in the IRC channel and, IIRC, the answer I got there was that ranges are an abstract concept, so such minutia were supposed to be implementation details. From this forum thread and that IRC feedback, it seems that, indeed, people have been operating under an illusion of a shared universal assumption. I'm very happy to see this addressed. - (Lots of text deleted because I was no longer sure about it ...) - Is it allowed for an InputRange to become empty and then not empty again? For a finite RandomAccessRange to increase in length? E.g.: // Example 1: (with "outside" control flow) if(someQueue.empty) addStuffToQueue(); assert(!someQueue.empty); // Example 2: (with only range-related control flow) if(!someQueue.empty) { auto n = someQueue.length; someQueue.popFront(); assert(someQueue.length == n-1); // can this fail? } - Is it allowed to not call front? E.g.: void dropAll(Range)(Range range) { while(!empty) range.popFront(); }
Mar 27 2014
""Luís Marques " wrote in message news:exmfmqftpykoaxdgluit forum.dlang.org...- Is it allowed for an InputRange to become empty and then not empty again? For a finite RandomAccessRange to increase in length? E.g.:Not by itself - if a range's empty has returned true, calling empty repeatedly should continue to return true. If you change it externally (ie not by using the range interface) then it can become non-empty. eg int[] arr; assert(arr.empty); arr ~= 3; // no longer empty, but we used a method outside the range interface to change it. I don't think a random number generator or some hardware device producing more data would be a good reason to change empty - range.empty is not asking 'is there more data ready', it's asking 'are there items left to iterate over'.- Is it allowed to not call front? E.g.:Yes.
Mar 29 2014
On Sunday, 23 March 2014 at 00:50:34 UTC, Walter Bright wrote:It's become clear to me that we've underspecified what an InputRange is.As someone who has missed this discussion, can I just say this: can we please document the result somewhere, and possibly even announce it clearly so that people know that something has changed?
Mar 29 2014