digitalmars.D - Ranges and/versus iterators
- clueless bystander (10/10) Mar 23 2010 Watching D evolve from the outside there seems to be a lot of ongoing di...
- Lars T. Kyllingstad (7/20) Mar 23 2010 I'm probably not the right person to answer your question, since I have
- clueless bystander (6/31) Mar 23 2010 Thanks Lars. I'm not quite willing to accept that 15 pages is succinct ...
- clueless bystander (7/32) Mar 23 2010 Yes, well, thanks again. The first 7 pages seemed to have plausible arg...
- Lars T. Kyllingstad (13/47) Mar 23 2010 The reason ranges are not popular is because they haven't had time to
- Andrei Alexandrescu (16/48) Mar 23 2010 Thank you for your interest. The article
- Jesse Phillips (16/29) Mar 23 2010 I think a good way to learn ranges is by looking at the interface to the...
- Igor Lesik (6/7) Mar 23 2010 how=0A>it differs from the iterator concept (if that's appropriate???) ...
- Fawzi Mohamed (22/22) Mar 23 2010 Andrei, as the topic just came up a comment on the range interface.
- Andrei Alexandrescu (13/34) Mar 23 2010 We've discussed this extensively, and I lost sleep over this simple
- Steven Schveighoffer (16/54) Mar 23 2010 A while back, you identified one of the best interfaces for input ranges...
- Andrei Alexandrescu (7/65) Mar 23 2010 I'd gladly reconsider E* getNext(), and I like it a lot, but that
- Fawzi Mohamed (15/51) Mar 23 2010 charset=US-ASCII;
- grauzone (2/10) Mar 23 2010 Nullable!(E) getNext(); ?
- Andrei Alexandrescu (3/13) Mar 23 2010 And if returning a reference...?
- grauzone (10/27) Mar 23 2010 Extend auto ref to template parameters:
- Fawzi Mohamed (15/28) Mar 23 2010 yes I agree that it is a difficult problem, the single function works
- Andrei Alexandrescu (4/33) Mar 23 2010 next(ref T) works well _only_ on streams. It works badly on containers.
- Steven Schveighoffer (20/43) Mar 23 2010 First, a range backed by getchar is about as useful as functional qsort ...
- Andrei Alexandrescu (3/28) Mar 23 2010 Actually I need one. Think fscanf, i.e. unformat() for streams.
- Andrei Alexandrescu (27/64) Mar 23 2010 I agree. But it's one thing to read and pass along, and a different
- Fawzi Mohamed (69/88) Mar 24 2010 I had a pushBack (I called that unget) in http://github.com/fawzi/blip/b...
- Andrei Alexandrescu (19/79) Mar 24 2010 Thanks for sharing your design with me. Yes, peek() is more flexible
- Fawzi Mohamed (76/183) Mar 24 2010 well for stdout by default I use locking to ensure writing writes
- Fawzi Mohamed (20/43) Mar 25 2010 charset=US-ASCII;
- Andrei Alexandrescu (8/22) Mar 25 2010 I see. So what you're saying is that maybe the entire idea of iterating
- Fawzi Mohamed (5/11) Mar 24 2010 I forgot to say, that one of the main pita with that approach is
- Fawzi Mohamed (12/23) Mar 24 2010 if one would have methods
- Fawzi Mohamed (12/23) Mar 24 2010 if one would have methods
- Neal Becker (2/15) Mar 27 2010 Wow, what a great email address!
Watching D evolve from the outside there seems to be a lot of ongoing discussion on this newsgroup about the D range idiom which is somehow opposed to conventional thinking about iterators. Can someone please explain in plain words just exactly what a range is and how it differs from the iterator concept (if that's appropriate???) and what are the benefits from a data modeling and access perspective. Sure, I'm clueless, though suspect many other bystanders would appreciate a succinct heads-up. Thanks, clueless bystander
Mar 23 2010
clueless bystander wrote:Watching D evolve from the outside there seems to be a lot of ongoing discussion on this newsgroup about the D range idiom which is somehow opposed to conventional thinking about iterators. Can someone please explain in plain words just exactly what a range is and how it differs from the iterator concept (if that's appropriate???) and what are the benefits from a data modeling and access perspective. Sure, I'm clueless, though suspect many other bystanders would appreciate a succinct heads-up. Thanks, clueless bystanderI'm probably not the right person to answer your question, since I have virtually no experience with C++ iterators. Instead I'll just refer you to Andrei's own article on the subject: http://www.informit.com/articles/article.aspx?p=1407357 Please don't hesitate to ask again if it didn't clear things up for you. :) -Lars
Mar 23 2010
Lars T. Kyllingstad Wrote:clueless bystander wrote:Thanks Lars. I'm not quite willing to accept that 15 pages is succinct but if that's what it takes to build up the background then that's what it takes. I'm up to page 3 now and, btw, like the way the author *does not* mince his words: "Such matters as a polynomial slowdown were too mundane to hinder the power of S-lists, so some functional programmers got imbued with an attitude of contempt toward arrays and associative arrays, data structures essential to many algorithms." c.b.Watching D evolve from the outside there seems to be a lot of ongoing discussion on this newsgroup about the D range idiom which is somehow opposed to conventional thinking about iterators. Can someone please explain in plain words just exactly what a range is and how it differs from the iterator concept (if that's appropriate???) and what are the benefits from a data modeling and access perspective. Sure, I'm clueless, though suspect many other bystanders would appreciate a succinct heads-up. Thanks, clueless bystanderI'm probably not the right person to answer your question, since I have virtually no experience with C++ iterators. Instead I'll just refer you to Andrei's own article on the subject: http://www.informit.com/articles/article.aspx?p=1407357 Please don't hesitate to ask again if it didn't clear things up for you. :) -Lars
Mar 23 2010
Lars T. Kyllingstad Wrote:clueless bystander wrote:Yes, well, thanks again. The first 7 pages seemed to have plausible arguments but the going get tough thereafter. Maybe the reason ranges are not popular is that they are hard to explain even though they might be simple and obvious in hindsight. Sigh, c.b.Watching D evolve from the outside there seems to be a lot of ongoing discussion on this newsgroup about the D range idiom which is somehow opposed to conventional thinking about iterators. Can someone please explain in plain words just exactly what a range is and how it differs from the iterator concept (if that's appropriate???) and what are the benefits from a data modeling and access perspective. Sure, I'm clueless, though suspect many other bystanders would appreciate a succinct heads-up. Thanks, clueless bystanderI'm probably not the right person to answer your question, since I have virtually no experience with C++ iterators. Instead I'll just refer you to Andrei's own article on the subject: http://www.informit.com/articles/article.aspx?p=1407357 Please don't hesitate to ask again if it didn't clear things up for you. :) -Lars
Mar 23 2010
clueless bystander wrote:Lars T. Kyllingstad Wrote:The reason ranges are not popular is because they haven't had time to become popular yet. The range concept itself is rather new, not much older than the article I referred you to, and AFAIK it's only been implemented in the D2 standard library. I'm sure there are several people on this forum that can give you a satisfactory (and succinct) answer. You may want to check back in a few hours, when activity picks up. FWIW, I don't think the concept of ranges is very hard to grasp. I just haven't used them that much, that's why I don't want to be the one to answer your question (and, like I said before, I have never used C++ iterators so I can't compare them either). -Larsclueless bystander wrote:Yes, well, thanks again. The first 7 pages seemed to have plausible arguments but the going get tough thereafter. Maybe the reason ranges are not popular is that they are hard to explain even though they might be simple and obvious in hindsight. Sigh, c.b.Watching D evolve from the outside there seems to be a lot of ongoing discussion on this newsgroup about the D range idiom which is somehow opposed to conventional thinking about iterators. Can someone please explain in plain words just exactly what a range is and how it differs from the iterator concept (if that's appropriate???) and what are the benefits from a data modeling and access perspective. Sure, I'm clueless, though suspect many other bystanders would appreciate a succinct heads-up. Thanks, clueless bystanderI'm probably not the right person to answer your question, since I have virtually no experience with C++ iterators. Instead I'll just refer you to Andrei's own article on the subject: http://www.informit.com/articles/article.aspx?p=1407357 Please don't hesitate to ask again if it didn't clear things up for you. :) -Lars
Mar 23 2010
On 03/23/2010 06:58 AM, clueless bystander wrote:Lars T. Kyllingstad Wrote:Thank you for your interest. The article (http://erdani.com/publications/on-iteration.html) is long because following my keynote talk at BoostCon 2009, I've received a flurry of emails asking me to flesh out the ranges design in greater detail and to better motivate them. That article not only describes the design, but it also explains the historical artifacts that led to today's imperfect state of affairs, motivates defining ranges with categories, and gives examples. The price for such thoroughness is - sorry - size. If you are familiar with STL iterators and GoF-style iterators (with hasmore/get/advance), ranges are so simple, they define themselves: a range is a GoF-style iterator that recognizes the necessity of STL's iterator categories (input, forward, bidirectional, and random). All the rest is aftermath. Andreiclueless bystander wrote:Yes, well, thanks again. The first 7 pages seemed to have plausible arguments but the going get tough thereafter. Maybe the reason ranges are not popular is that they are hard to explain even though they might be simple and obvious in hindsight. Sigh, c.b.Watching D evolve from the outside there seems to be a lot of ongoing discussion on this newsgroup about the D range idiom which is somehow opposed to conventional thinking about iterators. Can someone please explain in plain words just exactly what a range is and how it differs from the iterator concept (if that's appropriate???) and what are the benefits from a data modeling and access perspective. Sure, I'm clueless, though suspect many other bystanders would appreciate a succinct heads-up. Thanks, clueless bystanderI'm probably not the right person to answer your question, since I have virtually no experience with C++ iterators. Instead I'll just refer you to Andrei's own article on the subject: http://www.informit.com/articles/article.aspx?p=1407357 Please don't hesitate to ask again if it didn't clear things up for you. :) -Lars
Mar 23 2010
I think a good way to learn ranges is by looking at the interface to them, compared to those used by other languages. For one thing, C++ iterators are way more complicated than what you will find in many other languages; for example Java's iterator interface[1] is similar to a D range[2]. In both languages the interface is 3 functions (there are many types of ranges which require more, but only one kind of iterator). Java: bool hasNext() E next() void remove() D: bool empty() E front() void popFront() These look almost identical but the semantics are very different. For example, hasNext() requires a look-ahead, while empty() does not. This is important since you may not be able to look ahead in a range. next() performs the equivalent of calling front(); popFront(); And remove() has nothing to do with the iterator as it performs on the underlining collection. Removing the look-ahead is probably one of the biggest improvements over Java's iterator. 1. http://java.sun.com/j2se/1.5.0/docs/api/java/util/Iterator.html 2. http://digitalmars.com/d/2.0/phobos/std_range.html#isInputRange clueless bystander Wrote:Watching D evolve from the outside there seems to be a lot of ongoing discussion on this newsgroup about the D range idiom which is somehow opposed to conventional thinking about iterators. Can someone please explain in plain words just exactly what a range is and how it differs from the iterator concept (if that's appropriate???) and what are the benefits from a data modeling and access perspective. Sure, I'm clueless, though suspect many other bystanders would appreciate a succinct heads-up. Thanks, clueless bystander
Mar 23 2010
Can someone please explain in plain words just exactly what a range is and=how=0A>it differs from the iterator concept (if that's appropriate???) and= what are the benefits=0A>from a data modeling and access perspective.=0A= =0ACheck Andrei's presentation "Iterators must go":=0Ahttp://www.slideshare= .net/rawwell/iteratorsmustgo=0A=0AI=A0am interested myself to know how rang= es=0Aare envisioned to be in D. Does the book talk=0Aabout ranges?=0A=0A=0A= =0A
Mar 23 2010
Andrei, as the topic just came up a comment on the range interface. Just for plain forward iterators iterators having bool empty() E front() void popFront() makes the interface non reentrant. For that purpose having a single function is better. I use bool popFront(ref T t) // returns true if there is a next element, and in that case returns it in t this can be used by several consumers concurrently without problems and creating filters, combiners,... is simple. Another advantage is that a single object can implement several iterators. A disadvantage is that even if there is a single iterator D makes type inference cumbersome, i.e. you cannot simply use auto, as in a loop you have to declare the variable before using it as the loop is T a; while (it.popFront(a)){ //... }
Mar 23 2010
On 03/23/2010 02:45 PM, Fawzi Mohamed wrote:Andrei, as the topic just came up a comment on the range interface. Just for plain forward iterators iterators having bool empty() E front() void popFront() makes the interface non reentrant. For that purpose having a single function is better. I use bool popFront(ref T t) // returns true if there is a next element, and in that case returns it in t this can be used by several consumers concurrently without problems and creating filters, combiners,... is simple. Another advantage is that a single object can implement several iterators. A disadvantage is that even if there is a single iterator D makes type inference cumbersome, i.e. you cannot simply use auto, as in a loop you have to declare the variable before using it as the loop is T a; while (it.popFront(a)){ //... }We've discussed this extensively, and I lost sleep over this simple matter more than once. The main problem with bool popFront(ref E) is that it doesn't work meaningfully for containers that expose references to their elements. The interface with front() leaves it to the range to return E or ref E. An alternative is this: bool empty(); ref E getNext(); // ref or no ref I'm thinking seriously of defining input ranges that way. The underlying notion is that you always move forward - getting an element is simultaneous with moving to the next. Andrei
Mar 23 2010
On Tue, 23 Mar 2010 16:34:24 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:On 03/23/2010 02:45 PM, Fawzi Mohamed wrote:A while back, you identified one of the best interfaces for input ranges: E* getNext(); Which allows for null returns when no data is left. The drawback is that E must be either referenced or allocated on the heap (providing storage to the function is an option). But the killer issue was that safeD would not allow it. However, in recent times, you have hinted that safeD may allow pointers, but disallow bad pointer operations. In light of this, can we reconsider this interface, or other alternatives using pointers? I've always felt that if we were to define ranges for streams in a non-awkward way, we would need an "all in one" operation, since not only does getting data from the range move the range, but checking for empty might also move the range (empty on a stream means you tried to read and got nothing). -SteveAndrei, as the topic just came up a comment on the range interface. Just for plain forward iterators iterators having bool empty() E front() void popFront() makes the interface non reentrant. For that purpose having a single function is better. I use bool popFront(ref T t) // returns true if there is a next element, and in that case returns it in t this can be used by several consumers concurrently without problems and creating filters, combiners,... is simple. Another advantage is that a single object can implement several iterators. A disadvantage is that even if there is a single iterator D makes type inference cumbersome, i.e. you cannot simply use auto, as in a loop you have to declare the variable before using it as the loop is T a; while (it.popFront(a)){ //... }We've discussed this extensively, and I lost sleep over this simple matter more than once. The main problem with bool popFront(ref E) is that it doesn't work meaningfully for containers that expose references to their elements. The interface with front() leaves it to the range to return E or ref E. An alternative is this: bool empty(); ref E getNext(); // ref or no ref I'm thinking seriously of defining input ranges that way. The underlying notion is that you always move forward - getting an element is simultaneous with moving to the next.
Mar 23 2010
On 03/23/2010 03:46 PM, Steven Schveighoffer wrote:On Tue, 23 Mar 2010 16:34:24 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:I'd gladly reconsider E* getNext(), and I like it a lot, but that doesn't accommodate ranges that want to return rvalues without storing them (e.g. a range using getchar() as a back-end, and generally streams that don't correspond to stuff stored in memory). If it's not in memory, there's no pointer to it. AndreiOn 03/23/2010 02:45 PM, Fawzi Mohamed wrote:A while back, you identified one of the best interfaces for input ranges: E* getNext(); Which allows for null returns when no data is left. The drawback is that E must be either referenced or allocated on the heap (providing storage to the function is an option). But the killer issue was that safeD would not allow it. However, in recent times, you have hinted that safeD may allow pointers, but disallow bad pointer operations. In light of this, can we reconsider this interface, or other alternatives using pointers? I've always felt that if we were to define ranges for streams in a non-awkward way, we would need an "all in one" operation, since not only does getting data from the range move the range, but checking for empty might also move the range (empty on a stream means you tried to read and got nothing).Andrei, as the topic just came up a comment on the range interface. Just for plain forward iterators iterators having bool empty() E front() void popFront() makes the interface non reentrant. For that purpose having a single function is better. I use bool popFront(ref T t) // returns true if there is a next element, and in that case returns it in t this can be used by several consumers concurrently without problems and creating filters, combiners,... is simple. Another advantage is that a single object can implement several iterators. A disadvantage is that even if there is a single iterator D makes type inference cumbersome, i.e. you cannot simply use auto, as in a loop you have to declare the variable before using it as the loop is T a; while (it.popFront(a)){ //... }We've discussed this extensively, and I lost sleep over this simple matter more than once. The main problem with bool popFront(ref E) is that it doesn't work meaningfully for containers that expose references to their elements. The interface with front() leaves it to the range to return E or ref E. An alternative is this: bool empty(); ref E getNext(); // ref or no ref I'm thinking seriously of defining input ranges that way. The underlying notion is that you always move forward - getting an element is simultaneous with moving to the next.
Mar 23 2010
charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit On 23-mar-10, at 21:51, Andrei Alexandrescu wrote:On 03/23/2010 03:46 PM, Steven Schveighoffer wrote:yes that also makes filters/combiners really nice to write. the basic thing is that you have to return two things, 1. if there is more and 2. if yes the element.On Tue, 23 Mar 2010 16:34:24 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote: [...] A while back, you identified one of the best interfaces for input ranges: E* getNext(); Which allows for null returns when no data is left. The drawback is that E must be either referenced or allocated on the heap (providing storage to the function is an option). But the killer issue was that safeD would not allow it. However, in recent times, you have hinted that safeD may allow pointers, but disallow bad pointer operations. In light of this, can we reconsider this interface, or other alternatives using pointers? I've always felt that if we were to define ranges for streams in a non-awkward way, we would need an "all in one" operation, since not only does getting data from the range move the range, but checking for empty might also move the range (empty on a stream means you tried to read and got nothing).I'd gladly reconsider E* getNext(), and I like it a lot, but that doesn't accommodate ranges that want to return rvalues without storing them (e.g. a range using getchar() as a back-end, and generally streams that don't correspond to stuff stored in memory). If it's not in memory, there's no pointer to it.E* getNext would probably also be workable, at the cost of storing one element. But then as andrei correctly points out one still cannot expect the pointer to be valid after one iteration, so as soon as you don't have memory storage you loose thread safety... Now reentrancy/thread safety is not necessarily the most important thing for iterators, but I like that my queues, sources of work can have the same interface.
Mar 23 2010
Steven Schveighoffer wrote:A while back, you identified one of the best interfaces for input ranges: E* getNext(); Which allows for null returns when no data is left. The drawback is that E must be either referenced or allocated on the heap (providing storage to the function is an option). But the killer issue was that safeD would not allow it. However, in recent times, you have hintedNullable!(E) getNext(); ?
Mar 23 2010
On 03/23/2010 04:06 PM, grauzone wrote:Steven Schveighoffer wrote:And if returning a reference...? AndreiA while back, you identified one of the best interfaces for input ranges: E* getNext(); Which allows for null returns when no data is left. The drawback is that E must be either referenced or allocated on the heap (providing storage to the function is an option). But the killer issue was that safeD would not allow it. However, in recent times, you have hintedNullable!(E) getNext(); ?
Mar 23 2010
Andrei Alexandrescu wrote:On 03/23/2010 04:06 PM, grauzone wrote:Extend auto ref to template parameters: struct Nullable(auto ref T) { ... } T would be actually a reference type if and only if you could return a reference to the variable the template parameter was inferred from from a SafeD function. Basically, the compiler would know that references to T can be passed around freely. (SafeD allows ref returns under circumstances.) Not a solution I would prefer, but in the spirit of the design of D2 and SafeD in general.Steven Schveighoffer wrote:And if returning a reference...?A while back, you identified one of the best interfaces for input ranges: E* getNext(); Which allows for null returns when no data is left. The drawback is that E must be either referenced or allocated on the heap (providing storage to the function is an option). But the killer issue was that safeD would not allow it. However, in recent times, you have hintedNullable!(E) getNext(); ?Andrei
Mar 23 2010
On 23-mar-10, at 21:34, Andrei Alexandrescu wrote:[...] We've discussed this extensively, and I lost sleep over this simple matter more than once. The main problem with bool popFront(ref E) is that it doesn't work meaningfully for containers that expose references to their elements.yes I agree that it is a difficult problem, the single function works well in the basic iterator case, but does not generalize well to modifiable values. In most cases I resorted to returning pointers. The templates that generate opApply (still D1.0 ;) from that is smart enough to remove the pointer when possible as the ref already gives that. Still not perfect, as always there are tradeoffs...The interface with front() leaves it to the range to return E or ref E. An alternative is this: bool empty(); ref E getNext(); // ref or no ref I'm thinking seriously of defining input ranges that way. The underlying notion is that you always move forward - getting an element is simultaneous with moving to the next.already better (for basic iterators), but still not reentrant, if another thread executes between empty and getNext... anyway any choice has some drawbacks... I like bool next(ref T) because it works well also for streams... and somehow (using T* or not depending on the type) can accommodate all iteration needs. Not perfectly nice, but workable. Fawzi
Mar 23 2010
On 03/23/2010 05:41 PM, Fawzi Mohamed wrote:On 23-mar-10, at 21:34, Andrei Alexandrescu wrote:It can't :o).[...] We've discussed this extensively, and I lost sleep over this simple matter more than once. The main problem with bool popFront(ref E) is that it doesn't work meaningfully for containers that expose references to their elements.yes I agree that it is a difficult problem, the single function works well in the basic iterator case, but does not generalize well to modifiable values. In most cases I resorted to returning pointers. The templates that generate opApply (still D1.0 ;) from that is smart enough to remove the pointer when possible as the ref already gives that. Still not perfect, as always there are tradeoffs...The interface with front() leaves it to the range to return E or ref E. An alternative is this: bool empty(); ref E getNext(); // ref or no ref I'm thinking seriously of defining input ranges that way. The underlying notion is that you always move forward - getting an element is simultaneous with moving to the next.already better (for basic iterators), but still not reentrant, if another thread executes between empty and getNext...anyway any choice has some drawbacks... I like bool next(ref T) because it works well also for streams... and somehow (using T* or not depending on the type) can accommodate all iteration needs. Not perfectly nice, but workable.next(ref T) works well _only_ on streams. It works badly on containers. Andrei
Mar 23 2010
Andrei Alexandrescu Wrote:On 03/23/2010 03:46 PM, Steven Schveighoffer wrote:First, a range backed by getchar is about as useful as functional qsort ;) Second, you *have* to read data into memory. Even with the ranges as they currently are, you have to read into memory. At least this is less awkward. Take for instance a line iterator. You have to read enough to see the line terminator, but you most likely do not read *exactly* to the line terminator, so you just read in chunks until you get a line, then return the pointer to the data. It works actually quite elegantly. Third, the memory could be supplied by the caller. For instance, if you wrote the function like this: E* getNext(E* buf = null); Then foreach could do something like this: foreach(e; streamrange) => E _e; while(auto e = streamrange.getNext(&_e)) To avoid heap allocation. Of course, heap allocation would be the default if buf is null. Tango does this sort of trick quite often, and it makes the I/O code extremely fast. Also, another thing to think about is we can generalize the return type to satisfying the condition: iff range is empty then cast(bool)range.getNext == false. This means as long as your range cannot return a null element for a non-empty return, it is OK not to use a pointer. For example, the line iterator again... it can be written like: const(char)[] getNext() because you will only ever return a null const(char)[] when there is no data left. I don't think we should give up on trying to make a stream range that is not awkward, I really dislike the way today's input ranges map to streams. -SteveA while back, you identified one of the best interfaces for input ranges: E* getNext(); Which allows for null returns when no data is left. The drawback is that E must be either referenced or allocated on the heap (providing storage to the function is an option). But the killer issue was that safeD would not allow it. However, in recent times, you have hinted that safeD may allow pointers, but disallow bad pointer operations. In light of this, can we reconsider this interface, or other alternatives using pointers? I've always felt that if we were to define ranges for streams in a non-awkward way, we would need an "all in one" operation, since not only does getting data from the range move the range, but checking for empty might also move the range (empty on a stream means you tried to read and got nothing).I'd gladly reconsider E* getNext(), and I like it a lot, but that doesn't accommodate ranges that want to return rvalues without storing them (e.g. a range using getchar() as a back-end, and generally streams that don't correspond to stuff stored in memory). If it's not in memory, there's no pointer to it.
Mar 23 2010
On 03/23/2010 09:12 PM, Steven Schveighoffer wrote:Andrei Alexandrescu Wrote:Actually I need one. Think fscanf, i.e. unformat() for streams. AndreiOn 03/23/2010 03:46 PM, Steven Schveighoffer wrote:First, a range backed by getchar is about as useful as functional qsort ;)A while back, you identified one of the best interfaces for input ranges: E* getNext(); Which allows for null returns when no data is left. The drawback is that E must be either referenced or allocated on the heap (providing storage to the function is an option). But the killer issue was that safeD would not allow it. However, in recent times, you have hinted that safeD may allow pointers, but disallow bad pointer operations. In light of this, can we reconsider this interface, or other alternatives using pointers? I've always felt that if we were to define ranges for streams in a non-awkward way, we would need an "all in one" operation, since not only does getting data from the range move the range, but checking for empty might also move the range (empty on a stream means you tried to read and got nothing).I'd gladly reconsider E* getNext(), and I like it a lot, but that doesn't accommodate ranges that want to return rvalues without storing them (e.g. a range using getchar() as a back-end, and generally streams that don't correspond to stuff stored in memory). If it's not in memory, there's no pointer to it.
Mar 23 2010
On 03/23/2010 09:12 PM, Steven Schveighoffer wrote:Andrei Alexandrescu Wrote:I agree. But it's one thing to read and pass along, and a different thing to read and keep in a buffer inside the range.I'd gladly reconsider E* getNext(), and I like it a lot, but that doesn't accommodate ranges that want to return rvalues without storing them (e.g. a range using getchar() as a back-end, and generally streams that don't correspond to stuff stored in memory). If it's not in memory, there's no pointer to it.Second, you *have* to read data into memory. Even with the ranges as they currently are, you have to read into memory. At least this is less awkward.Take for instance a line iterator. You have to read enough to see the line terminator, but you most likely do not read *exactly* to the line terminator, so you just read in chunks until you get a line, then return the pointer to the data. It works actually quite elegantly.I disagree about the elegance part. If the range arrogates the right to use its own buffering, then when you decide you're done with that range and try to read some more from the stream, you discover data has been lost. The Phobos file I/O functions all avoid doing any more buffering than the backing FILE* does. They achieve performance by locking the file once with flockfile/funlockfile and then using fgetc_unlocked(). This puts me in real trouble with the formatted reading functions (a la fscanf but generalized to all input ranges), which I'm gestating about. The problem with the current API is that if you call input.front(), it will call fgetc(). But then say I decide I'm done with the range, as is the case with e.g. reading an integer and stopping at the first non-digit. That non-digit character will be lost. So there's a need to say, hey, put this guy back because whoever reads after me will need to look at it. So I need a putBackFront() or something (which would call fungetc()). I wish things were simpler.Third, the memory could be supplied by the caller. For instance, if you wrote the function like this: E* getNext(E* buf = null); Then foreach could do something like this: foreach(e; streamrange) => E _e; while(auto e = streamrange.getNext(&_e)) To avoid heap allocation. Of course, heap allocation would be the default if buf is null. Tango does this sort of trick quite often, and it makes the I/O code extremely fast.The problem is that that speed doesn't translate very well to in-memory containers. For containers it's preferable to pass null so you get a pointer to the actual element; for streams it's preferable to not pass null. So it's difficult to write code that works well for both.Also, another thing to think about is we can generalize the return type to satisfying the condition: iff range is empty then cast(bool)range.getNext == false. This means as long as your range cannot return a null element for a non-empty return, it is OK not to use a pointer. For example, the line iterator again... it can be written like: const(char)[] getNext() because you will only ever return a null const(char)[] when there is no data left.I see, but if I'm looking for ints? I'll have to return a pointer - or a nullable or something.I don't think we should give up on trying to make a stream range that is not awkward, I really dislike the way today's input ranges map to streams.Me too. Let's keep on looking, I have the feeling something good is right behind the corner. But then I felt that way for a year :o). Andrei
Mar 23 2010
On 24-mar-10, at 03:51, Andrei Alexandrescu wrote:The Phobos file I/O functions all avoid doing any more buffering than the backing FILE* does. They achieve performance by locking the file once with flockfile/funlockfile and then using fgetc_unlocked(). This puts me in real trouble with the formatted reading functions (a la fscanf but generalized to all input ranges), which I'm gestating about. The problem with the current API is that if you call input.front(), it will call fgetc(). But then say I decide I'm done with the range, as is the case with e.g. reading an integer and stopping at the first non-digit. That non-digit character will be lost. So there's a need to say, hey, put this guy back because whoever reads after me will need to look at it. So I need a putBackFront() or something (which would call fungetc()). I wish things were simpler.I had a pushBack (I called that unget) in http://github.com/fawzi/blip/blob/master/blip/text/TextParser.d , but I recently removed that in favor of a peek function that I think is much more flexible. What I did is to base most parsing on CharReaders (for example the char based ones from BasicIO): {{{ /// extent of a slice of a buffer enum SliceExtent{ Partial, Maximal, ToEnd } /// a delegate that reads in from a character source alias size_t delegate(char[]buf, SliceExtent slice,out bool iterate) CharReader; /// a handler of CharReader, returns true if something was read alias bool delegate(CharReader)CharReaderHandler; }}} a char reader reads from the given buffer buf, and can either request more (by returning EOF), or eat some characters out of it. If it sets iterate to true it wants to iterate with the eaten buffer (useful to for example skip undefined amount of whitespace that might overflow the buffer). Once you have that you can easily create a Peeker structure that wraps a CharReader, and exposes a CharReaded that tries to match it, but always eats 0 characters, even if the match was successful. With it you can have a peek method that returns true if the CharReader that you pass in matches, false if it does not match, and what you want if the buffer is too small to resolve the issue. Most of these things are templates that work for any type T. Then I built buffered types that using a size_t delegate(T[]) give a Reader based interface. All this is not based on single elements anymore, but on arrays (ranges? :), but I think that is what is needed for efficient i/o.On 03/23/2010 09:12 PM, Steven Schveighoffer wrote:give a try to bool popFront(ref T) ( or next, or another name, or even just a delegate with that signature) I was surprised how well it works, not perfect but better than the other alternatives I had tried. loop on a T[] array: bool popFront(ref T* el); mapped to opApply(int delegate(ref T x) loopBody); loop on a source of elements T: bool popFront(ref T el); mapped to opApply(int delegate(ref T x) loopBody); (well the ref there does not make much sense, but that is how opApply works to avoid the explosion of opApply). All it takes is a check for pointers in the templates, and dereference the type of opApply. also direct loops often look reasonable thanks to with D automatically dereferencing with ".": while (it.popFront(el)){ el.doSomething; } yes assigning stuff directly to el, and not its components you need a T* iterator and you have to write *el=... and x= *el but that is not so terrible. filter applied on an iterator it is just bool popNext(ref T el){ while (it.popNext(el)){ if (acceptable(el)){ return true; } } return false; } combiners of iterators are likewise quite simple to write.I don't think we should give up on trying to make a stream range that is not awkward, I really dislike the way today's input ranges map to streams.Me too. Let's keep on looking, I have the feeling something good is right behind the corner. But then I felt that way for a year :o).
Mar 24 2010
On 03/24/2010 09:00 AM, Fawzi Mohamed wrote:On 24-mar-10, at 03:51, Andrei Alexandrescu wrote:Thanks for sharing your design with me. Yes, peek() is more flexible than get/unget, but I'm under the stdio tyranny. In fact I just realized something - I could call setvbuf(_handle, null, _IONBF, 0) whenever I bind a File to a FILE*. That way File can do its own buffering and can implement peek() etc. I wonder if we need to worry about sharing, because e.g. several threads would want to write to stdout.The Phobos file I/O functions all avoid doing any more buffering than the backing FILE* does. They achieve performance by locking the file once with flockfile/funlockfile and then using fgetc_unlocked(). This puts me in real trouble with the formatted reading functions (a la fscanf but generalized to all input ranges), which I'm gestating about. The problem with the current API is that if you call input.front(), it will call fgetc(). But then say I decide I'm done with the range, as is the case with e.g. reading an integer and stopping at the first non-digit. That non-digit character will be lost. So there's a need to say, hey, put this guy back because whoever reads after me will need to look at it. So I need a putBackFront() or something (which would call fungetc()). I wish things were simpler.I had a pushBack (I called that unget) in http://github.com/fawzi/blip/blob/master/blip/text/TextParser.d , but I recently removed that in favor of a peek function that I think is much more flexible.What I did is to base most parsing on CharReaders (for example the char based ones from BasicIO): {{{ /// extent of a slice of a buffer enum SliceExtent{ Partial, Maximal, ToEnd } /// a delegate that reads in from a character source alias size_t delegate(char[]buf, SliceExtent slice,out bool iterate) CharReader; /// a handler of CharReader, returns true if something was read alias bool delegate(CharReader)CharReaderHandler; }}} a char reader reads from the given buffer buf, and can either request more (by returning EOF), or eat some characters out of it. If it sets iterate to true it wants to iterate with the eaten buffer (useful to for example skip undefined amount of whitespace that might overflow the buffer). Once you have that you can easily create a Peeker structure that wraps a CharReader, and exposes a CharReaded that tries to match it, but always eats 0 characters, even if the match was successful. With it you can have a peek method that returns true if the CharReader that you pass in matches, false if it does not match, and what you want if the buffer is too small to resolve the issue. Most of these things are templates that work for any type T.Wait, if you called it CharReader, how come it works with any type T? Or are you referring to T as the parsed type?Then I built buffered types that using a size_t delegate(T[]) give a Reader based interface. All this is not based on single elements anymore, but on arrays (ranges? :), but I think that is what is needed for efficient i/o.Sounds good, but I wonder why you use delegates instead of classes. Is that for simplicity? I confess it's not 100% clear to me how the delegates are supposed to be used in concert, particularly why there's a need for both CharReader and CharReaderHandler.So arrays have a different interface than streams. It looks like you can't write code that works uniformly for both, because for some you need the * and for some you don't. Did I understand that correctly? AndreiOn 03/23/2010 09:12 PM, Steven Schveighoffer wrote:give a try to bool popFront(ref T) ( or next, or another name, or even just a delegate with that signature) I was surprised how well it works, not perfect but better than the other alternatives I had tried. loop on a T[] array: bool popFront(ref T* el);I don't think we should give up on trying to make a stream range that is not awkward, I really dislike the way today's input ranges map to streams.Me too. Let's keep on looking, I have the feeling something good is right behind the corner. But then I felt that way for a year :o).
Mar 24 2010
On 24-mar-10, at 23:29, Andrei Alexandrescu wrote:On 03/24/2010 09:00 AM, Fawzi Mohamed wrote:well for stdout by default I use locking to ensure writing writes chunks atomically. I would say that by default streams imply sequence, so can safely be non threadsafe. stdout, stderr and logging are exceptions, there at least chunks should be written atomically.On 24-mar-10, at 03:51, Andrei Alexandrescu wrote:Thanks for sharing your design with me. Yes, peek() is more flexible than get/unget, but I'm under the stdio tyranny. In fact I just realized something - I could call setvbuf(_handle, null, _IONBF, 0) whenever I bind a File to a FILE*. That way File can do its own buffering and can implement peek() etc. I wonder if we need to worry about sharing, because e.g. several threads would want to write to stdout.The Phobos file I/O functions all avoid doing any more buffering than the backing FILE* does. They achieve performance by locking the file once with flockfile/funlockfile and then using fgetc_unlocked(). This puts me in real trouble with the formatted reading functions (a la fscanf but generalized to all input ranges), which I'm gestating about. The problem with the current API is that if you call input.front(), it will call fgetc(). But then say I decide I'm done with the range, as is the case with e.g. reading an integer and stopping at the first non-digit. That non-digit character will be lost. So there's a need to say, hey, put this guy back because whoever reads after me will need to look at it. So I need a putBackFront() or something (which would call fungetc()). I wish things were simpler.I had a pushBack (I called that unget) in http://github.com/fawzi/blip/blob/master/blip/text/TextParser.d , but I recently removed that in favor of a peek function that I think is much more flexible.Well I presented the CharReader for simplicity, and that is indeed only for chars, but most things can be generalized, and indeed if you look at the Reader(T) interface in http://github.com/fawzi/blip/blob/master/blip/io/BasicIO.d or blip.text.TextParser or similar they are templated with a generic type T. For TextParser I was thinking T=char,wchar or dchar, whereas others cases are even more generic.What I did is to base most parsing on CharReaders (for example the char based ones from BasicIO): {{{ /// extent of a slice of a buffer enum SliceExtent{ Partial, Maximal, ToEnd } /// a delegate that reads in from a character source alias size_t delegate(char[]buf, SliceExtent slice,out bool iterate) CharReader; /// a handler of CharReader, returns true if something was read alias bool delegate(CharReader)CharReaderHandler; }}} a char reader reads from the given buffer buf, and can either request more (by returning EOF), or eat some characters out of it. If it sets iterate to true it wants to iterate with the eaten buffer (useful to for example skip undefined amount of whitespace that might overflow the buffer). Once you have that you can easily create a Peeker structure that wraps a CharReader, and exposes a CharReaded that tries to match it, but always eats 0 characters, even if the match was successful. With it you can have a peek method that returns true if the CharReader that you pass in matches, false if it does not match, and what you want if the buffer is too small to resolve the issue. Most of these things are templates that work for any type T.Wait, if you called it CharReader, how come it works with any type T? Or are you referring to T as the parsed type?there are both, and both have their place. delegates are very simple and can be easily built on the fly, I like that very much, they reduce the code footprint of various things. More complex behaviour is better captured by classes, and indeed there are (also in BasicIO) the following interfaces: interface OutStreamI{ void rawWriteStr(char[]); void rawWriteStr(wchar[]); void rawWriteStr(dchar[]); void rawWrite(void[]); CharSink charSink(); BinSink binSink(); void flush(); void close(); } /// a reader of elements of type T interface Reader(T){ /// read some data into the given buffer size_t readSome(T[]); /// character reader handler bool handleReader(size_t delegate(T[], SliceExtent slice,out bool iterate) r); /// shutdown the input source void shutdownInput(); } /// one or more readers interface MultiReader{ enum Mode{ Binary=1, Char=2, Wchar=4, Dchar=8 } /// returns the modes this reader supports uint modes(); /// returns the native modes of this reader (less overhead) uint nativeModes(); Reader!(char) readerChar(); Reader!(wchar) readerWchar(); Reader!(dchar) readerDchar(); Reader!(void) readerBin(); void shutdownInput(); } there are classes that can create the more full fledged objects out of delegates.Then I built buffered types that using a size_t delegate(T[]) give a Reader based interface. All this is not based on single elements anymore, but on arrays (ranges? :), but I think that is what is needed for efficient i/o.Sounds good, but I wonder why you use delegates instead of classes. Is that for simplicity?I confess it's not 100% clear to me how the delegates are supposed to be used in concert, particularly why there's a need for both CharReader and CharReaderHandler.mainly one needs CharReader, which is a method that reads something. CharReaderHandler is there just for completeness, it is a delegate of a method that actually reads, but normally one simply uses that method, i.e. it uses a Reader!(T).handleReader method...well the foreach loop is the same, but the iteration loop is indeed different in the sense that one uses a pointer to an element and the other the element itself. one can write code that removes the pointer that is there (dereferencing it, or doing and inline function with subsequent call which allows you to reuse the same variable name): void myF(ref x){ // code } myF(*x); (that is a nice trick that I used several times). But yes there *is* a difference and the difference is that with arrays you might modify the element, modifying the stored value, whereas with streams you can't. This conceptual difference and if reflected in the interface. One can then discuss if immutable arrays should be iterated with immutable pointers or with values (i.e. copying) just as streams are.So arrays have a different interface than streams. It looks like you can't write code that works uniformly for both, because for some you need the * and for some you don't. Did I understand that correctly?On 03/23/2010 09:12 PM, Steven Schveighoffer wrote:give a try to bool popFront(ref T) ( or next, or another name, or even just a delegate with that signature) I was surprised how well it works, not perfect but better than the other alternatives I had tried. loop on a T[] array: bool popFront(ref T* el);I don't think we should give up on trying to make a stream range that is not awkward, I really dislike the way today's input ranges map to streams.Me too. Let's keep on looking, I have the feeling something good is right behind the corner. But then I felt that way for a year :o).Andrei
Mar 24 2010
charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit On 25-mar-10, at 00:09, Fawzi Mohamed wrote:On 24-mar-10, at 23:29, Andrei Alexandrescu wrote:thinking more about this, you are right something that returns a ref can be used exactly the same way as something that returns a value if one takes the value with auto val=returnRefOrVal; can be used as value exactly in the same way. Whereas something that returns a T or a T* , need an explicit conversion. That is easy to do, and one can even easily wrap the delegate in place with something that returns T instead of T*, but the conversion has to be explicit (before feeding it to the code), or explicitly tested for in the code. In practice I hadn't real problems due to this, but it is something that is uglier than ref return. On the other hand it is easier to know if you might modify the value that you received expecting to modify the underlying structure.[...] So arrays have a different interface than streams. It looks like you can't write code that works uniformly for both, because for some you need the * and for some you don't. Did I understand that correctly?well the foreach loop is the same, but the iteration loop is indeed different in the sense that one uses a pointer to an element and the other the element itself. one can write code that removes the pointer that is there (dereferencing it, or doing and inline function with subsequent call which allows you to reuse the same variable name): void myF(ref x){ // code } myF(*x); (that is a nice trick that I used several times). But yes there *is* a difference and the difference is that with arrays you might modify the element, modifying the stored value, whereas with streams you can't. This conceptual difference and if reflected in the interface. One can then discuss if immutable arrays should be iterated with immutable pointers or with values (i.e. copying) just as streams are.
Mar 25 2010
On 03/25/2010 05:32 AM, Fawzi Mohamed wrote:thinking more about this, you are right something that returns a ref can be used exactly the same way as something that returns a value if one takes the value with auto val=returnRefOrVal; can be used as value exactly in the same way. Whereas something that returns a T or a T* , need an explicit conversion. That is easy to do, and one can even easily wrap the delegate in place with something that returns T instead of T*, but the conversion has to be explicit (before feeding it to the code), or explicitly tested for in the code. In practice I hadn't real problems due to this, but it is something that is uglier than ref return. On the other hand it is easier to know if you might modify the value that you received expecting to modify the underlying structure.I see. So what you're saying is that maybe the entire idea of iterating streams and arrays in a unified way may be problematic, because you can do different things with the elements of the two. While I partially agree with that, there are a lot of things that one can do the same way over a stream or a collection, and I wasn't able to find a way to do that that's reasonably efficient for both. Andrei
Mar 25 2010
On 24-mar-10, at 15:00, Fawzi Mohamed wrote:[...] give a try to bool popFront(ref T) ( or next, or another name, or even just a delegate with that signature) I was surprised how well it works, not perfect but better than the other alternatives I had tried.I forgot to say, that one of the main pita with that approach is having to declare the arguments before using them, but should you decide that it is indeed a good alternative I have no doubt that you could find a good syntactic sugar that Walter could implement... :)
Mar 24 2010
On 24-mar-10, at 15:11, Fawzi Mohamed wrote:On 24-mar-10, at 15:00, Fawzi Mohamed wrote:if one would have methods bool f(ref T) as valid iterators then syntactic sugar replacing expr(f(auto a)); with static if(is(f S==function)){ S args; expr(f(args)); } else {static assert(0);} would be nice.[...] give a try to bool popFront(ref T) ( or next, or another name, or even just a delegate with that signature) I was surprised how well it works, not perfect but better than the other alternatives I had tried.I forgot to say, that one of the main pita with that approach is having to declare the arguments before using them, but should you decide that it is indeed a good alternative I have no doubt that you could find a good syntactic sugar that Walter could implement... :)
Mar 24 2010
On 24-mar-10, at 15:11, Fawzi Mohamed wrote:On 24-mar-10, at 15:00, Fawzi Mohamed wrote:if one would have methods bool f(ref T) as valid iterators then syntactic sugar replacing expr(f(auto a)); with static if(is(f S==function)){ S args; expr(f(args)); } else {static assert(0);} would be nice.[...] give a try to bool popFront(ref T) ( or next, or another name, or even just a delegate with that signature) I was surprised how well it works, not perfect but better than the other alternatives I had tried.I forgot to say, that one of the main pita with that approach is having to declare the arguments before using them, but should you decide that it is indeed a good alternative I have no doubt that you could find a good syntactic sugar that Walter could implement... :)
Mar 24 2010
clueless bystander wrote:Watching D evolve from the outside there seems to be a lot of ongoing discussion on this newsgroup about the D range idiom which is somehow opposed to conventional thinking about iterators. Can someone please explain in plain words just exactly what a range is and how it differs from the iterator concept (if that's appropriate???) and what are the benefits from a data modeling and access perspective. Sure, I'm clueless, though suspect many other bystanders would appreciate a succinct heads-up. Thanks, clueless bystanderWow, what a great email address!
Mar 27 2010