digitalmars.D - Lots of low hanging fruit in Phobos
- Walter Bright (12/12) Mar 06 2014 A major goal for D in the short term is to reduce reliance in Phobos on ...
- Vladimir Panteleev (7/12) Mar 06 2014 Would be pretty neat if std.string and std.regex would work with
- Steven Schveighoffer (5/9) Mar 06 2014 Perhaps he means many of the functions only accept const(Char)[], where ...
- Dmitry Olshansky (9/20) Mar 06 2014 Exactly. I've been toying with idea of having generic notion of
- Peter Alexander (3/8) Mar 06 2014 The same way std.algorithm does: through equality predicates.
- Walter Bright (4/8) Mar 06 2014 Use 'static if' to detect the element type.
- Meta (7/22) Mar 06 2014 Seems like a good idea to reduce memory usage wherever possible,
- Walter Bright (11/16) Mar 06 2014 1. By using template specializations, algorithms can be custom optimized...
- H. S. Teoh (6/16) Mar 06 2014 [...]
- Adam D. Ruppe (11/12) Mar 06 2014 I think most the string functions should be transformative, like
- Walter Bright (23/36) Mar 06 2014 A great question. I tend to regard output ranges as suitable for a conta...
- H. S. Teoh (31/73) Mar 06 2014 After some thought, and also some recent experiences, I've come to the
- Walter Bright (3/4) Mar 06 2014 I know. But we need to make the effort, at least for Phobos. The payoff ...
- bearophile (6/8) Mar 06 2014 Some persons have quickly shot down this idea that I have
- Nick Sabalausky (22/34) Mar 06 2014 Yes, this has been #1 on my wishlist for ranges for some time. We really...
- Walter Bright (1/1) Mar 06 2014 These are good ideas, but for now we need to just bite the bullet and fi...
- Nick Sabalausky (2/4) Mar 06 2014 Agreed.
- Timon Gehr (2/6) Mar 07 2014 I think that the separation of front and popFront causes much of this.
- H. S. Teoh (12/20) Mar 07 2014 I'm on the fence about that one. On the one hand, some ranges that do
- Walter Bright (3/5) Mar 07 2014 I usually start writing an InputRange by pasting in a boilerplate versio...
- Adam D. Ruppe (40/42) Mar 07 2014 Eh, I don't think it is a big deal and would be fairly limited
- bearophile (12/33) Mar 07 2014 In generally this is rather bad, unless you are a C programmer
- bearophile (5/7) Mar 08 2014 The optimization for Python yield is present in F# too:
- bearophile (4/5) Mar 08 2014 I have not yet found the paper, but I think this was the topic:
- bearophile (7/8) Mar 08 2014 Here it is:
- Adam D. Ruppe (35/38) Mar 07 2014 I slapped this together to see if we can do it with a mixin
- bearophile (9/12) Mar 07 2014 In general a typed syntax could be:
- H. S. Teoh (34/52) Mar 07 2014 [...]
- Timon Gehr (5/31) Mar 08 2014 This does not do the same thing. It computes the first row even if it is...
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (3/7) Mar 09 2014 Please have a look at this pull request:
- Nick Sabalausky (35/74) Mar 08 2014 Oh it would definitely be for nothing more advanced than a forward
- Nick Sabalausky (5/11) Mar 08 2014 The troublesome thing about that is anyone who writes a generator in D
- Timon Gehr (4/23) Mar 08 2014 The drawback is that without the compiler being really clever, ranges
- H. S. Teoh (5/33) Mar 08 2014 We can achieve at least forward ranges with coroutines, if they're pure.
- Piotr Szturmaj (2/4) Mar 08 2014 https://github.com/pszturmaj/dgenerators
- Nick Sabalausky (19/23) Mar 08 2014 Yea, there is that approach, which is nice in certain ways. I did the
- w0rp (9/10) Mar 08 2014 I think that's what I would go for. yield with co-routines could
- bearophile (8/10) Mar 09 2014 Yet there's no need for that. You can have your pie and eat it
- Peter Alexander (4/14) Mar 09 2014 How does this handle recursive generators, e.g. like you would
- Tobias Pankrath (3/19) Mar 09 2014 The generators mentioned are stack-less. You'd need to maintain
- Peter Alexander (3/24) Mar 09 2014 That's unfortunate. I find that non-recursive generators are
- bearophile (106/109) Mar 09 2014 A Python 2.x test program:
- Jakob Ovrum (16/29) Mar 06 2014 While I prefer this approach most of the time, I fear it has a
- Walter Bright (11/26) Mar 06 2014 std.buffer.scopebuffer was developed for that purpose, and as the discus...
- Dmitry Olshansky (5/10) Mar 07 2014 Fun is most of std.algorithm is special-cased on strings. So we have
A major goal for D in the short term is to reduce reliance in Phobos on the GC. I was looking at std.string last night, and I noticed a couple things: 1. The inputs are constrained to being strings. This is overly restrictive, the inputs should be InputRanges. 2. The outputs should be a range, too. In fact, the string functions should become algorithms. Then they won't need to allocate any memory at all. The existing functions should not be removed, but perhaps rewritten as wrappers around the algorithm versions. I've found that rewriting traditional code, which is what std.string is now, in terms of algorithms is a bit mind-bending. But it's well worth it, and fun. So who wants to step up? Don't have to do the whole thing in one go, just pick a function and do that one.
Mar 06 2014
On Thursday, 6 March 2014 at 21:26:45 UTC, Walter Bright wrote:The existing functions should not be removed, but perhaps rewritten as wrappers around the algorithm versions.How does one handle case sensitivity for ranges of abstract types?I've found that rewriting traditional code, which is what std.string is now, in terms of algorithms is a bit mind-bending. But it's well worth it, and fun.Would be pretty neat if std.string and std.regex would work with char-like types which actually carry more data per character. That way, it'd be possible to do string/regex transforms (search & replace, etc.) but keep track where exactly each character came from.
Mar 06 2014
On Thu, 06 Mar 2014 16:30:46 -0500, Vladimir Panteleev <vladimir thecybershadow.net> wrote:On Thursday, 6 March 2014 at 21:26:45 UTC, Walter Bright wrote:Perhaps he means many of the functions only accept const(Char)[], where Char can be templatized, but it really should accept any range. -SteveThe existing functions should not be removed, but perhaps rewritten as wrappers around the algorithm versions.How does one handle case sensitivity for ranges of abstract types?
Mar 06 2014
07-Mar-2014 01:30, Vladimir Panteleev пишет:On Thursday, 6 March 2014 at 21:26:45 UTC, Walter Bright wrote:+1The existing functions should not be removed, but perhaps rewritten as wrappers around the algorithm versions.How does one handle case sensitivity for ranges of abstract types?Exactly. I've been toying with idea of having generic notion of Alphabets for more then a year now. That would also be generalizing code unit / code point stuff of Unicode, into legacy encodings and beyond. One case I had in mind is the very limited (A, C, T, G) alphabet in bio-informatics. -- Dmitry OlshanskyI've found that rewriting traditional code, which is what std.string is now, in terms of algorithms is a bit mind-bending. But it's well worth it, and fun.Would be pretty neat if std.string and std.regex would work with char-like types which actually carry more data per character. That way, it'd be possible to do string/regex transforms (search & replace, etc.) but keep track where exactly each character came from.
Mar 06 2014
On Thursday, 6 March 2014 at 21:30:47 UTC, Vladimir Panteleev wrote:On Thursday, 6 March 2014 at 21:26:45 UTC, Walter Bright wrote:The same way std.algorithm does: through equality predicates.The existing functions should not be removed, but perhaps rewritten as wrappers around the algorithm versions.How does one handle case sensitivity for ranges of abstract types?
Mar 06 2014
On 3/6/2014 1:30 PM, Vladimir Panteleev wrote:On Thursday, 6 March 2014 at 21:26:45 UTC, Walter Bright wrote:Use 'static if' to detect the element type. And besides, I had meant that an algorithm should work on an InputRange of char's, and not be restricted to char[].The existing functions should not be removed, but perhaps rewritten as wrappers around the algorithm versions.How does one handle case sensitivity for ranges of abstract types?
Mar 06 2014
On Thursday, 6 March 2014 at 21:26:45 UTC, Walter Bright wrote:A major goal for D in the short term is to reduce reliance in Phobos on the GC. I was looking at std.string last night, and I noticed a couple things: 1. The inputs are constrained to being strings. This is overly restrictive, the inputs should be InputRanges. 2. The outputs should be a range, too. In fact, the string functions should become algorithms. Then they won't need to allocate any memory at all. The existing functions should not be removed, but perhaps rewritten as wrappers around the algorithm versions. I've found that rewriting traditional code, which is what std.string is now, in terms of algorithms is a bit mind-bending. But it's well worth it, and fun. So who wants to step up? Don't have to do the whole thing in one go, just pick a function and do that one.Seems like a good idea to reduce memory usage wherever possible, but I thought that the reason std.string exists (and duplicates some functionality that exists elsewhere) is to provide string-specific versions that were either optimized specifically for strings, or have completely different functionality due to working with strings.
Mar 06 2014
On 3/6/2014 1:40 PM, Meta wrote:Seems like a good idea to reduce memory usage wherever possible, but I thought that the reason std.string exists (and duplicates some functionality that exists elsewhere) is to provide string-specific versions that were either optimized specifically for strings, or have completely different functionality due to working with strings.1. By using template specializations, algorithms can be custom optimized for certain inputs. 2. I expect the developer of these algorithms to do performance profiling of them vs the originals, and address any problems. 3. I've done some similar algorithms, and was able to achieve performance parity, and sometimes even do better (because no memory needed to be allocated). std.string was one of the first Phobos modules written, was written by myself, and long predates notions of ranges and algorithms. It has evolved since then, but its fundamental nature has not changed. It's time for that to change to a modern, D style.
Mar 06 2014
On Thu, Mar 06, 2014 at 01:26:46PM -0800, Walter Bright wrote:A major goal for D in the short term is to reduce reliance in Phobos on the GC. I was looking at std.string last night, and I noticed a couple things: 1. The inputs are constrained to being strings. This is overly restrictive, the inputs should be InputRanges. 2. The outputs should be a range, too. In fact, the string functions should become algorithms. Then they won't need to allocate any memory at all.[...] What about using output ranges? T -- What do you call optometrist jokes? Vitreous humor.
Mar 06 2014
On Friday, 7 March 2014 at 00:44:45 UTC, H. S. Teoh wrote:What about using output ranges?I think most the string functions should be transformative, like std.algorithm.map, so they take an input range and return an input range. This lets them chain most easily, letting the user sink them into a particular range at the end. Though we could do a bit of magic to both take an output range and return an input range for it (which can also be backward-compatible, as we talked about in a thread a month or so ago), the most straightforward way is surely to treat it all like map.
Mar 06 2014
On 3/6/2014 4:43 PM, H. S. Teoh wrote:On Thu, Mar 06, 2014 at 01:26:46PM -0800, Walter Bright wrote:A great question. I tend to regard output ranges as suitable for a container to expose. An algorithm reads an InputRange, and presents its output as another InputRange. This is so that algorithms can be easily chained together by the user, and the last algorithm in the chain would be std.algorithm.copy(OutputRange). For example, we could implement toStringz() as an algorithm that looks like: auto toStringz(S)(S s) { return chain(s, repeat(0, 1)) } and use it like this: string s = ...; char[] buffer = ...; s.toStringz.copy(buffer); Note how the algorithm version of toStringz does not allocate any memory. buffer would be the output range, but note how what it actually is is, and how its memory is allocated, is selected by the user. (std.buffer.scopebuffer is ideal for this sort of usage.) std.file is loaded with calls to toStringz(), each of which leaks memory quite unnecessarily. Using an algorithm version of toStringz(), along with scopebuffer, will neatly eliminated these leaks. You might be tempted to think that these are just little leaks, who cares. But I recently wrote a D program that did tens of thousands of file operations, and those little leaks turned into a raging river.A major goal for D in the short term is to reduce reliance in Phobos on the GC. I was looking at std.string last night, and I noticed a couple things: 1. The inputs are constrained to being strings. This is overly restrictive, the inputs should be InputRanges. 2. The outputs should be a range, too. In fact, the string functions should become algorithms. Then they won't need to allocate any memory at all.[...] What about using output ranges?
Mar 06 2014
On Thu, Mar 06, 2014 at 05:15:45PM -0800, Walter Bright wrote:On 3/6/2014 4:43 PM, H. S. Teoh wrote:After some thought, and also some recent experiences, I've come to the same conclusion. Output ranges are basically sinks, which means they are no good for further chaining (attempting to do so is an exercise in frustration). So pretty much everything that generates or transforms data should return an input range, and output ranges should only be used when you're going to stop processing the data at that point, e.g., save it into a buffer, write to a file, etc.. Unfortunately, input ranges are somewhat tedious to write -- nice foreach loops have to be broken up into .empty, .front, .popFront, which is a lot of boilerplate code and not so nice in inner loops (though I suppose the idea is to improve compiler inlining to handle these cases). I wonder if a mid- to long-term goal for D would be to ease writing input ranges by moving some of the boilerplate into the language, or providing range builder facilities in Phobos. The most nagging part of writing input ranges in the current language is the inability to use foreach to generate the returned elements. (Well, you *can* do that to a buffer array and then return the array, but that kinda defeats the purpose of GC avoidance.) Some kind of built-in coroutine syntax with 'yield' would help things a LOT.On Thu, Mar 06, 2014 at 01:26:46PM -0800, Walter Bright wrote:A great question. I tend to regard output ranges as suitable for a container to expose. An algorithm reads an InputRange, and presents its output as another InputRange. This is so that algorithms can be easily chained together by the user, and the last algorithm in the chain would be std.algorithm.copy(OutputRange).A major goal for D in the short term is to reduce reliance in Phobos on the GC. I was looking at std.string last night, and I noticed a couple things: 1. The inputs are constrained to being strings. This is overly restrictive, the inputs should be InputRanges. 2. The outputs should be a range, too. In fact, the string functions should become algorithms. Then they won't need to allocate any memory at all.[...] What about using output ranges?For example, we could implement toStringz() as an algorithm that looks like: auto toStringz(S)(S s) { return chain(s, repeat(0, 1)) } and use it like this: string s = ...; char[] buffer = ...; s.toStringz.copy(buffer); Note how the algorithm version of toStringz does not allocate any memory. buffer would be the output range, but note how what it actually is is, and how its memory is allocated, is selected by the user.Yes, I like this. This is the right way to go. [...]You might be tempted to think that these are just little leaks, who cares. But I recently wrote a D program that did tens of thousands of file operations, and those little leaks turned into a raging river.Yeah, I'm currently working on a program that generates potentially huge numbers of n-dimensional coordinates, and currently the code does this using ranges that return array literals (ouch). This is pretty bad for performance and also for GC pressure, so I'm thinking to restructure the code some time in the near future to get rid of obligatory allocations, much like how you describe it above. T -- Customer support: the art of getting your clients to pay for your own incompetence.
Mar 06 2014
On 3/6/2014 5:37 PM, H. S. Teoh wrote:Unfortunately, input ranges are somewhat tedious to writeI know. But we need to make the effort, at least for Phobos. The payoff is it makes user code much less effort.
Mar 06 2014
H. S. Teoh:Some kind of built-in coroutine syntax with 'yield' would help things a LOT.Some persons have quickly shot down this idea that I have suggested: https://d.puremagic.com/issues/show_bug.cgi?id=11880 Bye, bearophile
Mar 06 2014
On 3/6/2014 8:37 PM, H. S. Teoh wrote:Unfortunately, input ranges are somewhat tedious to write -- nice foreach loops have to be broken up into .empty, .front, .popFront, which is a lot of boilerplate code and not so nice in inner loops (though I suppose the idea is to improve compiler inlining to handle these cases). I wonder if a mid- to long-term goal for D would be to ease writing input ranges by moving some of the boilerplate into the language, or providing range builder facilities in Phobos. The most nagging part of writing input ranges in the current language is the inability to use foreach to generate the returned elements. (Well, you *can* do that to a buffer array and then return the array, but that kinda defeats the purpose of GC avoidance.) Some kind of built-in coroutine syntax with 'yield' would help things a LOT.need a way to make input ranges (maybe even forward ranges if we're automatically converted behind-the-scenes into stackless fibers (ie, into state machines). A while back, I tried doing a library solution for this via regular fibers, but there turned out to be a lot of overhead due to the fiber's context-switching [1]. We need to take inspiration from two things: C's "protothreads" IMO. In both cases, the body of a coroutine is automagically transformed into a big switch statement. Yield statements then become roughly "return tuple([yielded value], [resume point]); case [resume point]:...". And then invoking the coroutine again automatically passes in the last [resume point] returned so the big switch() can jump straight to where it left off. The main difference in our case would be that we'd also auto-generate all the boilerplate rigging for this to be an input range. [1] http://semitwist.com/articles/article/view/combine-coroutines-and-input-ranges-for-dead-simple-d-iteration [2] C's "protothreads" library: http://dunkels.com/adam/pt/expansion.html
Mar 06 2014
These are good ideas, but for now we need to just bite the bullet and fix Phobos.
Mar 06 2014
On 3/6/2014 10:59 PM, Walter Bright wrote:These are good ideas, but for now we need to just bite the bullet and fix Phobos.Agreed.
Mar 06 2014
On 03/07/2014 02:37 AM, H. S. Teoh wrote:Unfortunately, input ranges are somewhat tedious to write -- nice foreach loops have to be broken up into .empty, .front, .popFront, which is a lot of boilerplate code and not so nice in inner loops (though I suppose the idea is to improve compiler inlining to handle these cases).I think that the separation of front and popFront causes much of this.
Mar 07 2014
On Sat, Mar 08, 2014 at 01:14:29AM +0100, Timon Gehr wrote:On 03/07/2014 02:37 AM, H. S. Teoh wrote:I'm on the fence about that one. On the one hand, some ranges that do complex calculations in .front are better off with .popAndReturnFront. But on the other hand, there are also algorithms for which you don't want to advance the range just to pick off its front element. Since the separated semantics can't be achieved with a non-separated range API, I lean slightly towards favoring the current split approach. Having a way to auto-generate input range boilerplate, though, would be really, *really* nice. Coroutine-style code would be ideal. T -- Bomb technician: If I'm running, try to keep up.Unfortunately, input ranges are somewhat tedious to write -- nice foreach loops have to be broken up into .empty, .front, .popFront, which is a lot of boilerplate code and not so nice in inner loops (though I suppose the idea is to improve compiler inlining to handle these cases).I think that the separation of front and popFront causes much of this.
Mar 07 2014
On 3/7/2014 5:09 PM, H. S. Teoh wrote:Having a way to auto-generate input range boilerplate, though, would be really, *really* nice. Coroutine-style code would be ideal.I usually start writing an InputRange by pasting in a boilerplate version, then modifying it.
Mar 07 2014
On Saturday, 8 March 2014 at 01:10:38 UTC, H. S. Teoh wrote:Having a way to auto-generate input range boilerplate, though, would be really, *really* nice.Eh, I don't think it is a big deal and would be fairly limited compared to the current setup. If you use a fiber or state variable or something for the yield this yield that trick, how do you go backward? Random access? I think the best the yield stuff can do is maybe forward range and maybe infinite (probably with an annotation of some sort, since otherwise, the infiniteness wouldn't be obvious at compile time). So the best we're looking to automate is input or perhaps forward ranges. And how hard are these really to write? yield query(string q) { auto result = c_query(toStringz(q)); while(!HasRow(result)) yield GetNextRow(result); } OK, that is kinda nice, but, is the status quo so bad? (BTW the reason I went with some kind of C database api here is everything else I could think of are actually pretty short when using std.algorithm functions to help define them.) struct query { private Result result; this(string q) { result = c_query(toStringz(q)); if(!empty) popFront(); } Row front; property bool empty() { return HasRow(result); } void popFront() in { assert(!empty); } body { front = GetNextRow(result); } } It is certainly a bit longer, but it isn't that bad, and is easily extended to other range capabilities. Translating recursive iteration to a range does take a bit more, you need to track your local variables and put them in a stack of your own, but even that isn't too hard (though a bit wordier). I guess the whole yield thing can be kinda nice, I'm just not sure it is that big of a win given its other limitations compared to full ranges.
Mar 07 2014
Adam D. Ruppe:struct query { private Result result; this(string q) { result = c_query(toStringz(q)); if(!empty) popFront(); } Row front; property bool empty() { return HasRow(result); } void popFront() in { assert(!empty); } body { front = GetNextRow(result); } } It is certainly a bit longer, but it isn't that bad, and is easily extended to other range capabilities.Your code is badly formatted.Translating recursive iteration to a range does take a bit more, you need to track your local variables and put them in a stack of your own, but even that isn't too hard (though a bit wordier).In generally this is rather bad, unless you are a C programmer used to use tweezers and needles to implement your own hash set every other day :-(I guess the whole yield thing can be kinda nice, I'm just not sure it is that big of a win given its other limitations compared to full ranges.yield is limited, but in a large amount of cases it's enough, especially if it's well implemented (now Python 3 has yield that is usable for coroutines too, and it has recently added another optimization). For the other cases you can use normal range protocols as you have done. Bye, bearophile
Mar 07 2014
(now Python 3 has yield that is usable for coroutines too, and it has recently added another optimization).http://theburningmonk.com/2011/09/fsharp-yield-vs-yield/ Bye, bearophile
Mar 08 2014
I have not yet found the paper, but I think this was the topic: http://blogs.msdn.com/b/dsyme/archive/2007/09/22/some-details-on-f-computation-expressions-aka-monadic-or-workflow-syntax.aspx Bye, bearophile
Mar 08 2014
I have not yet found the paper,Here it is: http://tomasp.net/academic/papers/computation-zoo/ More on the same: http://en.wikibooks.org/wiki/F_Sharp_Programming/Computation_Expressions http://msdn.microsoft.com/en-us/library/dd233182.aspx Bye, bearophile
Mar 08 2014
On Saturday, 8 March 2014 at 02:15:19 UTC, Adam D. Ruppe wrote:I guess the whole yield thing can be kinda nice, I'm just not sure it is that big of a win given its other limitations compared to full ranges.I slapped this together to see if we can do it with a mixin template: http://arsdnet.net/dcode/next.d struct ShortRange { int next() { switch(callNumber) { case 0: return 50; case 1: return 60; case 2: return 70; case 3: return finished(); default: assert(0); } } mixin MakeShortRange!(); } Wordier than yield ShortRange() { yield 50; yield 60; yield 70; } but the same idea. (In fact, i'm sure the whole yield thing should pretty much just rewrite into something like this anyway). My hypothetical from the previous post would be written: struct ShortRange { Result result; this(string s) { result = c_query(s); } Row next() { if(!HasRow(result)) return finished(); return GetNextRow(result); } mixin MakeShortRange!(); } Which is a bit shorter and perhaps nicer to use than the long form range.
Mar 07 2014
Adam D. Ruppe:Wordier than yield ShortRange() { yield 50; yield 60; yield 70; } but the same idea.In general a typed syntax could be: yield(int) ShortRange() { yield 50; yield 60; yield 70; } Or even: yield(auto) ShortRange() { yield 50; yield 60; yield 70; } So yield(auto) could be sugar for yield(auto): yield ShortRange() { yield 50; yield 60; yield 70; } Bye, bearophile
Mar 07 2014
On Sat, Mar 08, 2014 at 03:27:15AM +0000, bearophile wrote:Adam D. Ruppe:[...] Yield syntax is generally not a big deal with simple ranges, but once you start needing recursion and branching logic, things become significantly hairier without yield. For example: yield ComplicatedRange(int arg) { if (arg == 0) { foreach (i; 0 .. 10) yield i; } else if (arg >= 1 && arg <= 5) { yield 10; yield 20; yield 30; } else if (arg >= 6) { foreach (i; 10 .. 20) { if (i >= 5 && i < 9) { yield 40; yield 50; } else { yield i*2; } } } } With yield, the branching logic is clear and relatively easy to follow. Without yield, this will turn into a nasty mess of nested state variables, due to the different numbers of yielded items within nested conditional blocks. (Of course, you could argue that if things get this complicated, one should be splitting it up into multiple composed ranges instead, but not everything is readily decomposible.) T -- Bare foot: (n.) A device for locating thumb tacks on the floor.Wordier than yield ShortRange() { yield 50; yield 60; yield 70; } but the same idea.In general a typed syntax could be: yield(int) ShortRange() { yield 50; yield 60; yield 70; } Or even: yield(auto) ShortRange() { yield 50; yield 60; yield 70; } So yield(auto) could be sugar for yield(auto): yield ShortRange() { yield 50; yield 60; yield 70; }
Mar 07 2014
On 03/08/2014 03:15 AM, Adam D. Ruppe wrote:So the best we're looking to automate is input or perhaps forward ranges. And how hard are these really to write? yield query(string q) { auto result = c_query(toStringz(q)); while(!HasRow(result)) yield GetNextRow(result); } OK, that is kinda nice, but, is the status quo so bad? (BTW the reason I went with some kind of C database api here is everything else I could think of are actually pretty short when using std.algorithm functions to help define them.) struct query { private Result result; this(string q) { result = c_query(toStringz(q)); if(!empty) popFront(); } Row front; property bool empty() { return HasRow(result); } void popFront() in { assert(!empty); } body { front = GetNextRow(result); } } It is certainly a bit longer, but it isn't that bad, and is easily extended to other range capabilities.This does not do the same thing. It computes the first row even if it is never requested. std.algorithm.filter suffers from the same problem. This is part of the reason why I think the separation of front and popFront is suboptimal design.
Mar 08 2014
On Saturday, 8 March 2014 at 09:25:38 UTC, Timon Gehr wrote:This does not do the same thing. It computes the first row even if it is never requested. std.algorithm.filter suffers from the same problem. This is part of the reason why I think the separation of front and popFront is suboptimal design.Please have a look at this pull request: https://github.com/D-Programming-Language/phobos/pull/1987
Mar 09 2014
On 3/7/2014 9:15 PM, Adam D. Ruppe wrote:On Saturday, 8 March 2014 at 01:10:38 UTC, H. S. Teoh wrote:Oh it would definitely be for nothing more advanced than a forward range. It's a coroutine, ie a generator function, so the elements are determined by execution flow. Since execution can't go in reverse or random-access, anything beyond forward range is ruled out. But you knew that :) Forward range should be possible though (at least for pure-ish coroutines anyway), since all you'd have to do is clone the coroutine's internal state structure (which it must already have anyway, since you can't have coroutines without it). While it couldn't be used for bidirectional or random-access ranges, it would still be a big help for generators - ie the whole point of coroutines in the first place. Other languages have coroutines - we'd have coroutines that can be used as ranges :)Having a way to auto-generate input range boilerplate, though, would be really, *really* nice.Eh, I don't think it is a big deal and would be fairly limited compared to the current setup. If you use a fiber or state variable or something for the yield this yield that trick, how do you go backward? Random access?I think the best the yield stuff can do is maybe forward range and maybe infinite (probably with an annotation of some sort, since otherwise, the infiniteness wouldn't be obvious at compile time). So the best we're looking to automate is input or perhaps forward ranges. And how hard are these really to write? yield query(string q) { auto result = c_query(toStringz(q)); while(!HasRow(result)) yield GetNextRow(result); } OK, that is kinda nice, but, is the status quo so bad? (BTW the reason I went with some kind of C database api here is everything else I could think of are actually pretty short when using std.algorithm functions to help define them.) struct query { private Result result; this(string q) { result = c_query(toStringz(q)); if(!empty) popFront(); } Row front; property bool empty() { return HasRow(result); } void popFront() in { assert(!empty); } body { front = GetNextRow(result); } } It is certainly a bit longer, but it isn't that bad, and is easily extended to other range capabilities. Translating recursive iteration to a range does take a bit more, you need to track your local variables and put them in a stack of your own, but even that isn't too hard (though a bit wordier). I guess the whole yield thing can be kinda nice, I'm just not sure it is that big of a win given its other limitations compared to full ranges.I dunno, what you wrote sounds to me like a pretty convincing argument in favor of coroutine ranges. ;) Keep in mind, making *all* ranges easier to write isn't the charter here, and it doesn't need to be: Bidirectional and random-access ranges ARE more complicated than generators, so I think they're already just about as easy to write as we *can* make them. Instead, the problem we're looking at solving here is exactly what you've described above: Making generators in D is more complicated and boilerplate-y than it really needs to be (unless you want to give up on ranges and use opApply - which still isn't particularly great without that mixin you suggested a couple years ago to cover up the return value ugliness). Generators may be a subset of ranges, but they're an important subset. generators. What really itches at me is that we're so tantalizingly close with opApply. The only real problem with it (aside from the return value system being kinda awkward compared to a "yield" statement) is that it can't be used as a range. And it can't be converted to a range without using Phobos fibers which imposes too much of an overhead to be a one-size-fits-all technique for generators.
Mar 08 2014
On 3/8/2014 5:56 AM, Nick Sabalausky wrote:What really itches at me is that we're so tantalizingly close with opApply. The only real problem with it (aside from the return value system being kinda awkward compared to a "yield" statement) is that it can't be used as a range. And it can't be converted to a range without using Phobos fibers which imposes too much of an overhead to be a one-size-fits-all technique for generators.The troublesome thing about that is anyone who writes a generator in D is forced to choose between [mostly] straightforward implementation logic (ie opApply) *or* range compatibility. I think we can do better than to force an ugly choice like that.
Mar 08 2014
On 03/08/2014 02:09 AM, H. S. Teoh wrote:On Sat, Mar 08, 2014 at 01:14:29AM +0100, Timon Gehr wrote:I agree that there are trade-offs involved.On 03/07/2014 02:37 AM, H. S. Teoh wrote:I'm on the fence about that one. On the one hand, some ranges that do complex calculations in .front are better off with .popAndReturnFront. But on the other hand, there are also algorithms for which you don't want to advance the range just to pick off its front element. Since the separated semantics can't be achieved with a non-separated range API, I lean slightly towards favoring the current split approach. ...Unfortunately, input ranges are somewhat tedious to write -- nice foreach loops have to be broken up into .empty, .front, .popFront, which is a lot of boilerplate code and not so nice in inner loops (though I suppose the idea is to improve compiler inlining to handle these cases).I think that the separation of front and popFront causes much of this.Having a way to auto-generate input range boilerplate, though, would be really, *really* nice. Coroutine-style code would be ideal. ...The drawback is that without the compiler being really clever, ranges thus defined will be just input ranges.
Mar 08 2014
On Sat, Mar 08, 2014 at 10:38:50AM +0100, Timon Gehr wrote:On 03/08/2014 02:09 AM, H. S. Teoh wrote:We can achieve at least forward ranges with coroutines, if they're pure. T -- Your inconsistency is the only consistent thing about you! -- KDOn Sat, Mar 08, 2014 at 01:14:29AM +0100, Timon Gehr wrote:I agree that there are trade-offs involved.On 03/07/2014 02:37 AM, H. S. Teoh wrote:I'm on the fence about that one. On the one hand, some ranges that do complex calculations in .front are better off with .popAndReturnFront. But on the other hand, there are also algorithms for which you don't want to advance the range just to pick off its front element. Since the separated semantics can't be achieved with a non-separated range API, I lean slightly towards favoring the current split approach. ...Unfortunately, input ranges are somewhat tedious to write -- nice foreach loops have to be broken up into .empty, .front, .popFront, which is a lot of boilerplate code and not so nice in inner loops (though I suppose the idea is to improve compiler inlining to handle these cases).I think that the separation of front and popFront causes much of this.Having a way to auto-generate input range boilerplate, though, would be really, *really* nice. Coroutine-style code would be ideal. ...The drawback is that without the compiler being really clever, ranges thus defined will be just input ranges.
Mar 08 2014
W dniu 2014-03-08 02:09, H. S. Teoh pisze:Having a way to auto-generate input range boilerplate, though, would be really, *really* nice. Coroutine-style code would be ideal.https://github.com/pszturmaj/dgenerators
Mar 08 2014
On 3/8/2014 5:58 PM, Piotr Szturmaj wrote:W dniu 2014-03-08 02:09, H. S. Teoh pisze:Yea, there is that approach, which is nice in certain ways. I did the same kinda thing a couple years ago[1], but your API appears much, much nicer. The unfortunate downside, though, is that as jerro demonstrated[2] there's a high overhead when using fibers for iteration. It likely wouldn't be an issue for some things, like IO, but for other things it could be a problem. So what we end up with is an unfortunate three-way choice anytime someone needs a generator in D: 1. Give up on range compatibility (opApply) 2. Give up on straightforward implementation (input/forward range) 3. Give up on maximum performance (fiber-based coroutine range) D's just dancing all around the ideal solution for generators without ever actually hitting the mark. Which I find frustrating because I know that's a target D's capable of nailing. [1] http://semitwist.com/articles/article/view/combine-coroutines-and-input-ranges-for-dead-simple-d-iteration [2] http://forum.dlang.org/thread/jno6o5$qtb$1 digitalmars.comHaving a way to auto-generate input range boilerplate, though, would be really, *really* nice. Coroutine-style code would be ideal.https://github.com/pszturmaj/dgenerators
Mar 08 2014
On Saturday, 8 March 2014 at 23:23:21 UTC, Nick Sabalausky wrote:3. Give up on maximum performance (fiber-based coroutine range)I think that's what I would go for. yield with co-routines could be a quick and dirty way to create an InputRange at some performance cost. Then writing the InputRange by hand would then be an optimisation. I think I tend to write things in such a way where I have room to optimise in such a way, but I almost always reach for the easier to write, less performant option first. (It's a bonus when the easier way is also the more performant way.)
Mar 08 2014
w0rp:Yet there's no need for that. You can have your pie and eat it too, with a small cost. D can rewrite code that contains a yield in an efficient finite state machine (this is what ShedSkin same). Bye, bearophile3. Give up on maximum performance (fiber-based coroutine range)I think that's what I would go for.
Mar 09 2014
On Sunday, 9 March 2014 at 10:30:37 UTC, bearophile wrote:w0rp:How does this handle recursive generators, e.g. like you would see with a depth-first tree traversal? The "state" in this case is a call stack. How is the memory allocated for the stack?Yet there's no need for that. You can have your pie and eat it too, with a small cost. D can rewrite code that contains a yield in an efficient finite state machine (this is what does the same).3. Give up on maximum performance (fiber-based coroutine range)I think that's what I would go for.
Mar 09 2014
On Sunday, 9 March 2014 at 11:00:13 UTC, Peter Alexander wrote:On Sunday, 9 March 2014 at 10:30:37 UTC, bearophile wrote:The generators mentioned are stack-less. You'd need to maintain the stack yourself.w0rp:How does this handle recursive generators, e.g. like you would see with a depth-first tree traversal? The "state" in this case is a call stack. How is the memory allocated for the stack?Yet there's no need for that. You can have your pie and eat it too, with a small cost. D can rewrite code that contains a yield in an efficient finite state machine (this is what does the same).3. Give up on maximum performance (fiber-based coroutine range)I think that's what I would go for.
Mar 09 2014
On Sunday, 9 March 2014 at 11:17:14 UTC, Tobias Pankrath wrote:On Sunday, 9 March 2014 at 11:00:13 UTC, Peter Alexander wrote:That's unfortunate. I find that non-recursive generators are pretty easy to write as ranges anyway :-/On Sunday, 9 March 2014 at 10:30:37 UTC, bearophile wrote:The generators mentioned are stack-less. You'd need to maintain the stack yourself.w0rp:How does this handle recursive generators, e.g. like you would see with a depth-first tree traversal? The "state" in this case is a call stack. How is the memory allocated for the stack?Yet there's no need for that. You can have your pie and eat it too, with a small cost. D can rewrite code that contains a yield in an efficient finite state machine (this is what compiler does the same).3. Give up on maximum performance (fiber-based coroutine range)I think that's what I would go for.
Mar 09 2014
Peter Alexander:How does this handle recursive generators, e.g. like you would see with a depth-first tree traversal? The "state" in this case is a call stack. How is the memory allocated for the stack?A Python 2.x test program: class Node: def __init__(self, data, left, right): self.data = data self.left = left self.right = right def in_order(node): if node is not None: for n in in_order(node.left): yield n yield node.data for n in in_order(node.right): yield n def main(): tree = Node(1, Node(2, Node(4, Node(7, None, None), None), Node(5, None, None)), Node(3, Node(6, Node(8, None, None), Node(9, None, None)), None)) for n in in_order(tree): print n, print main() It prints: 7 4 2 5 1 8 6 9 3 ShedSkin translates it to some C++ code, here I show only the in_order function: class __gen_in_order : public __iter<__ss_int> { public: Node *node; __iter<__ss_int> *__0, *__1, *__4, *__5; __ss_int __2, __6, n; __iter<__ss_int>::for_in_loop __3, __7; int __last_yield; __gen_in_order(Node *node) { this->node = node; __last_yield = -1; } __ss_int __get_next() { switch(__last_yield) { case 0: goto __after_yield_0; case 1: goto __after_yield_1; case 2: goto __after_yield_2; default: break; } if ((node!=NULL)) { FOR_IN(n,in_order(node->left),0,2,3) __last_yield = 0; __result = n; return __result; __after_yield_0:; END_FOR __last_yield = 1; __result = node->data; return __result; __after_yield_1:; FOR_IN(n,in_order(node->right),4,6,7) __last_yield = 2; __result = n; return __result; __after_yield_2:; END_FOR } __stop_iteration = true; } }; __iter<__ss_int> *in_order(Node *node) { return new __gen_in_order(node); } Where FOR_IN is a macro to some foreach-like code: #define FOR_IN(e, iter, temp, i, t) \ { \ Note this is not efficient, because it could generate O(n^2) code, but this is not fault of ShedSkin, as the original Python code has the same problem. They have recently (in Python 3.3) added to Python the syntax "yield from" that will allow to remove that performance problem (currently I think the performance is not improved): http://legacy.python.org/dev/peps/pep-0380/ Look especially at the section regarding optimization: http://legacy.python.org/dev/peps/pep-0380/#optimisations With this improvement the in_order function becomes: def in_order(node): if node is not None: yield from in_order(node.left) yield node.data yield from in_order(node.right) kinds of yield are "yield" and "yield!": http://theburningmonk.com/2011/09/fsharp-yield-vs-yield/ http://stackoverflow.com/questions/3500488/f-yield-operator-implementation-and-possible-c-sharp-equivalents Bye, bearophile
Mar 09 2014
On Friday, 7 March 2014 at 01:15:43 UTC, Walter Bright wrote:On 3/6/2014 4:43 PM, H. S. Teoh wrote:While I prefer this approach most of the time, I fear it has a performance cost over sink-style algorithms (whether they use an output range or build an array result). I guess the question is whether this cost is generally offset by gains in the memory allocation code or not.What about using output ranges?A great question. I tend to regard output ranges as suitable for a container to expose. An algorithm reads an InputRange, and presents its output as another InputRange. This is so that algorithms can be easily chained together by the user, and the last algorithm in the chain would be std.algorithm.copy(OutputRange).For example, we could implement toStringz() as an algorithm that looks like: auto toStringz(S)(S s) { return chain(s, repeat(0, 1)) }std.range.only is more suitable than `repeat` here (and I don't know if `chain` would let you chain a range of characters with a range of integers): auto toStringz(S)(S s) if (is(Unqual!(ElementType!S) == dchar)) { return s.chain('\0'.only()); } Either way, perhaps the most serious problem is that when using `copy` to write the result to an array, both UTF decoding and re-encoding takes place (the result is always a range of `dchar`).
Mar 06 2014
On 3/6/2014 5:54 PM, Jakob Ovrum wrote:While I prefer this approach most of the time, I fear it has a performance cost over sink-style algorithms (whether they use an output range or build an array result). I guess the question is whether this cost is generally offset by gains in the memory allocation code or not.std.buffer.scopebuffer was developed for that purpose, and as the discussion on why it is the way it is shows, it was developed with extensive use of a profiler to ensure there was no speed compromise from using it.Yes, I missed that. I'm still learning about the proper way to use ranges.For example, we could implement toStringz() as an algorithm that looks like: auto toStringz(S)(S s) { return chain(s, repeat(0, 1)) }std.range.only is more suitable than `repeat` here(and I don't know if `chain` would let you chain a range of characters with a range of integers):My code fragment is untested and pretty much off the cuff. A real version would be a bit more complex, as you suggest.Either way, perhaps the most serious problem is that when using `copy` to write the result to an array, both UTF decoding and re-encoding takes place (the result is always a range of `dchar`).I know. I struggled with that myself, and I object to the design decision that auto-decodes and auto-encodes any ranges that deal with char's. It automatically turns fast code into molasses. I wound up casting everything to ubytes, ugh. But it's too late to change that now, sigh.
Mar 06 2014
07-Mar-2014 01:26, Walter Bright пишет:A major goal for D in the short term is to reduce reliance in Phobos on the GC. I was looking at std.string last night, and I noticed a couple things: 1. The inputs are constrained to being strings. This is overly restrictive, the inputs should be InputRanges.Fun is most of std.algorithm is special-cased on strings. So we have double volume of crap. -- Dmitry Olshansky
Mar 07 2014