digitalmars.D - Yet another strike against the current AA implementation
- dsimcha (23/23) Apr 26 2009 So I tried to create a struct to allow iteration over the keys or values...
- Michel Fortin (8/16) Apr 26 2009 Hum, are you saying opApply superior when it comes to iterating in a
- dsimcha (12/24) Apr 26 2009 Exactly. On the other hand, you lose a lot of flexibility with opApply....
- Michel Fortin (10/30) Apr 26 2009 Indeed. I certainly agree that both ranges and opApply have their place.
- dsimcha (28/33) Apr 26 2009 You probably have a point. If the range interface is "good enough" for ...
- Andrei Alexandrescu (4/36) Apr 26 2009 I think, all other things being equal, that opApply tends to be slower
- Steven Schveighoffer (15/51) Apr 27 2009 It depends. The range must have either inline-able functions, or must N...
- Andrei Alexandrescu (6/25) Apr 27 2009 This is a very good point. If a range offers virtual
- Andrei Alexandrescu (7/33) Apr 26 2009 1. Depth of iteration is low, so a short vector optimization will most
- Christopher Wright (9/12) Apr 26 2009 When you know the type beforehand or can use templates, that is, rather
- Andrei Alexandrescu (5/19) Apr 26 2009 Yah, that's a good motivation to change how hashes are currently
- dsimcha (64/76) Apr 26 2009 Rule number 1 of all performance discussions: Always measure if possibl...
- Robert Fraser (4/95) Apr 26 2009 Did you compile that with optimization/inlining? If not, it's not really...
- dsimcha (3/6) Apr 27 2009 Yes, -O -inline -release. I didn't bother stating that explicitly b/c i...
- Michel Fortin (11/15) Apr 27 2009 Nice.
- Daniel Keep (6/20) Apr 27 2009 Inlining does not automatically make things faster. Case in point:
- bearophile (5/12) Apr 27 2009 That's true in general: if you inline too much code you may end up with ...
- Frits van Bommel (8/23) Apr 27 2009 For non-virtual opApply this is exactly what LDC does[1] at the higher
- bearophile (4/7) Apr 27 2009 When you find a bit of free time please time it and show the timings :-)
- bearophile (57/58) Apr 27 2009 Very good, thank you. I have run your code, with a bit of changes:
- Andrei Alexandrescu (10/22) Apr 27 2009 Thanks for the numbers. The way I'd read them is not to look at the
- Frits van Bommel (19/32) Apr 27 2009 I edited this code to work with ldc (D1) + Tango, and saw the Direct and...
- Michel Fortin (12/16) Apr 28 2009 Thank you for your timings. I think it shows my point: that by
- Daniel Keep (7/21) Apr 28 2009 Not true. Here's an excellent reason to use ranges over opApply: you
- Michel Fortin (15/37) Apr 28 2009 I guess I removed too much context from the above posts. We're just
- downs (2/27) Apr 28 2009
- Steven Schveighoffer (4/26) Apr 28 2009 read: less overhead than full threads, not less overhead than ranges ;)
- grauzone (5/24) Apr 28 2009 First, don't they have LOTS of memory overhead, because each fiber needs...
- Joel C. Salomon (5/17) Apr 28 2009 A fiber-specific stack needn’t be very large. A few KB is often enough
- downs (2/19) Apr 28 2009 Furthermore, using anonymous mmapped files the memory is only allocated ...
- Georg Wrede (25/40) Apr 29 2009 The Commodore-64 had a stack of 0.25kB shared between your Basic
- Andrei Alexandrescu (8/41) Apr 29 2009 Since C64 programs have gotten a lot bigger, programming style favors
- Georg Wrede (16/58) Apr 29 2009 The old size estimation problem. No matter how big you decide on,
- Joel C. Salomon (6/17) Apr 29 2009 How much rewriting of the stack would have to happen? Relocating the
- Andrei Alexandrescu (4/5) Apr 29 2009 In a way it is trivial... if you accept a 7% added overhead per function...
- bearophile (11/12) Apr 29 2009 Now Delphi and FreePascal allow to switch that on and off. From the page...
- Daniel Keep (3/19) Apr 29 2009 Doesn't the OS protect you from stack overflows?
- Leandro Lucarella (8/13) Apr 29 2009 Using page protection have no overhead, unless you actually hit a stack
- Andrei Alexandrescu (17/38) Apr 29 2009 You can't be more serious than Mark Twain when giving advice on the
- Sean Kelly (9/46) Apr 29 2009 You could have the GC report stack sizes when it performs a collection.
- Georg Wrede (33/72) Apr 29 2009 Two methods come to mind.(1)
- Andrei Alexandrescu (11/36) Apr 29 2009 I'd be amazed if hello.d were your prototype of a useful program :o).
- Georg Wrede (13/49) Apr 29 2009 Somebody more resourceful than I might /prove/ some maximum for any
- Andrei Alexandrescu (12/17) Apr 29 2009 There's no need for a lot of expertise, it's a trivial fact for anyone
- Don (5/27) Apr 29 2009 I think in the first instance it was at least to get an idea of how much...
- Georg Wrede (9/63) Apr 30 2009 Having looked into things, it turns out I'm the one that now suspects
- Andrei Alexandrescu (10/25) Apr 30 2009 What I referred to was inferring a thread's actual stack requirements
- Georg Wrede (48/75) Apr 30 2009 Could it be that you're looking at this too much through the eyes of a
- Andrei Alexandrescu (22/111) Apr 30 2009 I am sorry, I need to fix a few mistakes in this post and then I'll have...
- Georg Wrede (2/4) Apr 30 2009 Sigh.
- bearophile (5/8) Apr 30 2009 See also sys.setrecursionlimit() to lift that limit where useful. You ca...
- Georg Wrede (3/18) Apr 30 2009 Python seems a nice language. It's a shame that it's a bit slow. Heh,
- Sean Kelly (9/19) Apr 29 2009 One thing I don't understand is how the OS can manage essentially
- Georg Wrede (4/24) Apr 29 2009 Well, unbounded and unbounded. It just means there's no set value. The
- Walter Bright (7/11) Apr 29 2009 No way in hell is there a reliable way for a compiler to estimate stack
- bearophile (35/47) Apr 28 2009 I'll try to zip two ranges that return the leaves of two different binar...
- dsimcha (14/45) Apr 28 2009 This can be a simple example to show to attract people to D2 language :-...
- bearophile (10/13) Apr 28 2009 Yes, sorry, I have found it few seconds after sending the post. I am dum...
- Steve Teale (9/35) Apr 26 2009 Ranges, ranges! That's all I hear these days, and it looks to me like th...
- dsimcha (18/53) Apr 26 2009 is implemented - like x ~= c to add a character to the end of an array -
- Leandro Lucarella (11/15) Apr 26 2009 A PEP[1]-like process would be great.
- Andrei Alexandrescu (17/21) Apr 26 2009 There was. Incidentally, it was called "RFC on range design for D2".
- Georg Wrede (27/57) Apr 27 2009 There's an illusion. And that illusion comes from the D newsgroups
- Derek Parnell (12/19) Apr 27 2009 Really?! I still find them very confusing. Maybe its just the formatting
- Andrei Alexandrescu (3/10) Apr 27 2009 Suggestions are always welcome.
- bearophile (5/7) Apr 27 2009 I find templates in D1 easy to use in most situations. And template cons...
- Andrei Alexandrescu (4/12) Apr 27 2009 Yes, well put. I think it would be great to define a digitalmars.d2 or
- Simen Kjaeraas (6/16) Apr 27 2009 And what would we do the day D987 came out? Start another group called
- Christopher Wright (2/22) Apr 27 2009 How about digitalmars.dnext?
- Denis Koroskin (2/21) Apr 27 2009 Maybe digitalmars.future? :)
So I tried to create a struct to allow iteration over the keys or values of a builtin associative array using a lazy forward range. This would allow one to pass the keys or values of an AA to any function that expects a regular old forward range w/o any heap activity, eager copying, etc. and with O(1) auxiliary memory usage. Now that ranges are the lingua franca of iteration in D, this seems like a pretty necessary feature. The problem is that D's current AAs are based on a hash table with collision resolution done using binary trees. I guess this decision was made to allow for O(log N) worst case performance even when there are ridiculous amounts of hash collisions. However, this is a very bad decision because, in exchange for more bulletproof performance in corner cases, you lose: 1. Every element is now (void*).sizeof bytes larger because you have to store a left and right pointer. 2. You can't iterate over a binary tree in O(1) auxiliary space, at least not in any way I'm aware of. This means that a lazy range to iterate over the keys or values is relatively useless because it would generate heap activity anyhow to store the stack or queue necessary to iterate over the tree, so you may as well just use the current .keys or .values properties, which just return an array. (Because of the way that ranges are designed, you can't just use the call stack, unless you use fibers or something, which is obviously not a win from an efficiency point of view.) Both of these are important in the more common, general case of sane hashing performance.
Apr 26 2009
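To make point 2 of the post concrete, here is a minimal sketch of a forward range over the keys of a binary tree. Node and TreeKeyRange are hypothetical illustrations, not the actual AA internals; the explicit node stack is exactly the O(depth) auxiliary storage, and the heap activity, that the post objects to.

import std.stdio;

struct Node(K)
{
    K key;
    Node* left, right;
}

// Forward-range view over the keys of a binary tree.  It needs O(depth)
// auxiliary storage -- an explicit stack of pending nodes -- because the
// call stack is not available between two calls to popFront.
struct TreeKeyRange(K)
{
    Node!(K)*[] stack;            // explicit stack, typically heap-allocated

    this(Node!(K)* root) { pushLeftSpine(root); }

    bool empty() { return stack.length == 0; }
    K front()    { return stack[$ - 1].key; }

    void popFront()
    {
        auto n = stack[$ - 1];
        stack = stack[0 .. $ - 1];
        pushLeftSpine(n.right);
    }

    private void pushLeftSpine(Node!(K)* n)
    {
        for (; n !is null; n = n.left)
            stack ~= n;           // the heap activity the post complains about
    }
}

void main()
{
    auto root = new Node!(int)(2, new Node!(int)(1, null, null),
                                  new Node!(int)(3, null, null));
    for (auto r = TreeKeyRange!(int)(root); !r.empty(); r.popFront())
        writeln(r.front());       // 1, 2, 3 -- in order, but the node stack lives on the heap
}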
On 2009-04-26 11:21:54 -0400, dsimcha <dsimcha yahoo.com> said:2. You can't iterate over a binary tree in O(1) auxiliary space, at least not in any way I'm aware of. This means that a lazy range to iterate over the keys or values is relatively useless because it would generate heap activity anyhow to store the stack or queue necessary to iterate over the tree, so you may as well just use the current .keys or .values properties, which just return an array. (Because of the way that ranges are designed, you can't just use the call stack, unless you use fibers or something, which is obviously not a win from an efficiency point of view.)Hum, are you saying opApply is superior when it comes to iterating in a tree? It seems that with opApply you could use the stack by recursively calling opApply with the given delegate on each one of the branches. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Apr 26 2009
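A minimal sketch of the recursion Michel describes: opApply walks into the branches itself, so the machine call stack does the bookkeeping and no auxiliary allocation is needed. Node and Tree are the same kind of hypothetical tree types as in the sketch above, not the real AA implementation.

import std.stdio;

struct Node(K)
{
    K key;
    Node* left, right;
}

struct Tree(K)
{
    Node!(K)* root;

    int opApply(int delegate(ref K) dg)
    {
        return walk(root, dg);
    }

    private static int walk(Node!(K)* n, int delegate(ref K) dg)
    {
        if (n is null) return 0;
        if (auto r = walk(n.left, dg)) return r;   // left subtree
        if (auto r = dg(n.key)) return r;          // "yield" this key to the foreach body
        return walk(n.right, dg);                  // right subtree
    }
}

void main()
{
    auto t = Tree!(int)(new Node!(int)(2, new Node!(int)(1, null, null),
                                          new Node!(int)(3, null, null)));
    foreach (int k; t)    // lowered onto opApply; the call stack does the bookkeeping
        writeln(k);       // 1, 2, 3
}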
== Quote from Michel Fortin (michel.fortin michelf.com)'s articleOn 2009-04-26 11:21:54 -0400, dsimcha <dsimcha yahoo.com> said:Exactly. On the other hand, you lose a lot of flexibility with opApply. If you tried to port most of std.range to operate on the opApply interface instead of the forward range interface, I doubt you'd get very far. IMHO, though, opApply should *not* be deprecated. opApply and ranges attempt to solve similar problems, but not the same problem. Ranges attempt to solve the problem of iterating over some object with maximum flexibility from the point of view of the user of the object. opApply attempts to solve the problem of iterating with maximum flexibility from the point of view of the implementer of the object. In some cases, like the one you just described, the latter can be better. However, it might make sense to document and give examples of this tradeoff somewhere.2. You can't iterate over a binary tree in O(1) auxiliary space, at least not in any way I'm aware of. This means that a lazy range to iterate over the keys or values is relatively useless because it would generate heap activity anyhow to store the stack or queue necessary to iterate over the tree, so you may as well just use the current .keys or .values properties, which just return an array. (Because of the way that ranges are designed, you can't just use the call stack, unless you use fibers or something, which is obviously not a win from an efficiency point of view.)Hum, are you saying opApply superior when it comes to iterating in a tree? It seems that with opApply you could use the stack by recursively calling opApply with the given delegate on each one of the branches.
Apr 26 2009
On 2009-04-26 11:46:51 -0400, dsimcha <dsimcha yahoo.com> said:== Quote from Michel Fortin (michel.fortin michelf.com)'s articleIndeed. I certainly agree that both ranges and opApply have their place. So what the implementer can do with opApply is write an optimized iteration algorithm for use with foreach. Which may mean that when both opApply and ranges are available for generating foreach's code, the compiler should favor opApply. Currently, I believe it's the reverse. -- Michel Fortin michel.fortin michelf.com http://michelf.com/Hum, are you saying opApply superior when it comes to iterating in a tree? It seems that with opApply you could use the stack by recursively calling opApply with the given delegate on each one of the branches.Exactly. On the other hand, you lose a lot of flexibility with opApply. If you tried to port most of std.range to operate on the opApply interface instead fo the forward range interface, I doubt you'd get very far. IMHO, though, opApply should *not* be deprecated. opApply and ranges attempt to solve similar problems, but not the same problem. Ranges attempt to solve the problem of iterating over some object with maximum flexibility from the point of view of the user of the object. opApply attempts to solve the problem of iterating with maximum flexibility from the point of view of the implementer of the object. In some cases, like the one you just described, the latter can be better.
Apr 26 2009
== Quote from Michel Fortin (michel.fortin michelf.com)'s articleIndeed. I certainly agree that both ranges and opApply have their place. So what the implementer can do with opApply is write an optimized iteration algorithm for use with foreach. Which may mean that when both opApply and ranges are available for generating foreach's code, the compiler should favor opApply. Currently, I believe it's the reverse.You probably have a point. If the range interface is "good enough" for the implementer of the object to write an optimal implementation without defining opApply, then the implementer will probably not define opApply. On the other hand, it could make sense to define a range interface for when the user of the object needs that flexibility and an opApply interface that is more efficient internally in the same object. What I think it really boils down to, and what should be emphasized in the documentation, is control of the stack. If the implementer of the object needs control of the stack during iteration, use opApply to get this control. Otherwise, use ranges to allow the user more control over the iteration process. However, what I care more about, upon thinking more about it, is that the concept of an iterable gets defined "officially" in Phobos and in the documentation. An iterable is any object of type T such that the following code works: foreach(elem; T.init) { // do stuff. } This can be considered a superset of input ranges, since all input ranges are iterables but not all iterables are input ranges. Of course, stuff that works like: foreach(key, value; T.init) { // do stuff. } uses opApply but would not be considered an iterable because it iterates more than one item. Therefore, strictly speaking, the model is broken, but these kinds of oddball situations are few enough and far enough between that I think they can be ignored, at least until the issue of making ranges do this kind of thing too is addressed. The idea, then, is that if all you need is an iterable for some generic function in Phobos or in any 3rd party library, make the constraint be that it only requires an iterable. To encourage this, an isIterable template should be included in Phobos, and std.range.ElementType should be generalized to return the element type for all iterables, not just ranges.
Apr 26 2009
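A rough sketch of the isIterable test and the generalized element-type extractor dsimcha asks for. The names are made up here (they were not Phobos symbols at the time), but the does-this-compile typeof trick is the standard way to write them:

// True if "foreach (elem; T.init)" compiles, i.e. T is an iterable in the
// sense described in the post (ranges, opApply types, arrays, ...).
template isIterable(T)
{
    enum bool isIterable = is(typeof({
        foreach (elem; T.init) {}
    }));
}

// Element type of any iterable, not just of ranges: infer it from what
// foreach would hand us.
template IterableElementType(T)
{
    alias typeof({
        foreach (elem; T.init)
            return elem;
        assert(0);
    }()) IterableElementType;
}

static assert(isIterable!(int[]));
static assert(!isIterable!(int));
static assert(is(IterableElementType!(int[]) == int));

void main() {}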
Michel Fortin wrote:On 2009-04-26 11:46:51 -0400, dsimcha <dsimcha yahoo.com> said:I think, all other things being equal, that opApply tends to be slower than iterators. Andrei== Quote from Michel Fortin (michel.fortin michelf.com)'s articleIndeed. I certainly agree that both ranges and opApply have their place. So what the implementer can do with opApply is write an optimized iteration algorithm for use with foreach. Which may mean that when both opApply and ranges are available for generating foreach's code, the compiler should favor opApply. Currently, I believe it's the reverse.Hum, are you saying opApply superior when it comes to iterating in a tree? It seems that with opApply you could use the stack by recursively calling opApply with the given delegate on each one of the branches.Exactly. On the other hand, you lose a lot of flexibility with opApply. If you tried to port most of std.range to operate on the opApply interface instead fo the forward range interface, I doubt you'd get very far. IMHO, though, opApply should *not* be deprecated. opApply and ranges attempt to solve similar problems, but not the same problem. Ranges attempt to solve the problem of iterating over some object with maximum flexibility from the point of view of the user of the object. opApply attempts to solve the problem of iterating with maximum flexibility from the point of view of the implementer of the object. In some cases, like the one you just described, the latter can be better.
Apr 26 2009
On Sun, 26 Apr 2009 21:20:32 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Michel Fortin wrote:It depends. The range must have either inline-able functions, or must NOT call virtual functions. Otherwise, it could be worse. there is also the factor that you are using an inner function with opApply, so if you regularly access variables outside the foreach loop, then those add up. Observe that in opapply style, you do one delegate call per loop iteration. With range style, you do three calls on the range per loop iteration (empty, front, popFront). If those range calls call virtual calls or ARE virtual calls, then range loses. Of course, the difference might be washed if you do lots of accesses to an outer variable in your foreach loop, which require a pointer dereference vs. a stack dereference. We should quantify this and see what the actual performance is. -SteveOn 2009-04-26 11:46:51 -0400, dsimcha <dsimcha yahoo.com> said:I think, all other things being equal, that opApply tends to be slower than iterators.== Quote from Michel Fortin (michel.fortin michelf.com)'s articleIndeed. I certainly agree that both ranges and opApply have their place. So what the implementer can do with opApply is write an optimized iteration algorithm for use with foreach. Which may mean that when both opApply and ranges are available for generating foreach's code, the compiler should favor opApply. Currently, I believe it's the reverse.Hum, are you saying opApply superior when it comes to iterating in a tree? It seems that with opApply you could use the stack by recursively calling opApply with the given delegate on each one of the branches.Exactly. On the other hand, you lose a lot of flexibility with opApply. If you tried to port most of std.range to operate on the opApply interface instead fo the forward range interface, I doubt you'd get very far. IMHO, though, opApply should *not* be deprecated. opApply and ranges attempt to solve similar problems, but not the same problem. Ranges attempt to solve the problem of iterating over some object with maximum flexibility from the point of view of the user of the object. opApply attempts to solve the problem of iterating with maximum flexibility from the point of view of the implementer of the object. In some cases, like the one you just described, the latter can be better.
Apr 27 2009
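To make the call-count argument concrete, here is a hand-written approximation of what the two foreach forms boil down to. This is a sketch of the lowering, not the exact code any particular compiler emits; Counter is a toy type that supports both protocols so the two loops can be spelled out side by side.

import std.stdio;

struct Counter   // toy type supporting both iteration protocols
{
    int n, limit;

    // range primitives
    bool empty() { return n >= limit; }
    int front() { return n; }
    void popFront() { ++n; }

    // opApply equivalent of the same iteration
    int opApply(int delegate(ref int) dg)
    {
        for (int i = n; i < limit; ++i)
            if (auto r = dg(i)) return r;
        return 0;
    }
}

void main()
{
    auto c = Counter(0, 3);

    // Roughly what foreach over the *range* interface lowers to:
    // three member calls (empty, front, popFront) per iteration.
    for (auto r = c; !r.empty(); r.popFront())
        writeln(r.front());

    // Roughly what foreach over *opApply* lowers to: one indirect
    // (delegate) call per iteration, with the body wrapped in a delegate
    // whose return value encodes break/continue.
    c.opApply((ref int i) { writeln(i); return 0; });
}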
Steven Schveighoffer wrote:On Sun, 26 Apr 2009 21:20:32 -0400, Andrei AlexandrescuThis is a very good point. If a range offers virtual empty/popFront/front, then opApply is a faster proposition. This makes for a good use case for making opApply the preferred way of iteration, if present (if you just do foreach and opApply is present, use it). AndreiI think, all other things being equal, that opApply tends to be slower than iterators.It depends. The range must have either inline-able functions, or must NOT call virtual functions. Otherwise, it could be worse. there is also the factor that you are using an inner function with opApply, so if you regularly access variables outside the foreach loop, then those add up. Observe that in opapply style, you do one delegate call per loop iteration. With range style, you do three calls on the range per loop iteration (empty, front, popFront). If those range calls call virtual calls or ARE virtual calls, then range loses. Of course, the difference might be washed if you do lots of accesses to an outer variable in your foreach loop, which require a pointer dereference vs. a stack dereference. We should quantify this and see what the actual performance is.
Apr 27 2009
dsimcha wrote:== Quote from Michel Fortin (michel.fortin michelf.com)'s article1. Depth of iteration is low, so a short vector optimization will most of the time solve the problem without allocating extra memory. 2. I haven't measured, but the cost of the indirect call is large enough to make me suspect that opApply is not as efficient as it's cracked to be, even when compared with an iterator. AndreiOn 2009-04-26 11:21:54 -0400, dsimcha <dsimcha yahoo.com> said:Exactly. On the other hand, you lose a lot of flexibility with opApply. If you tried to port most of std.range to operate on the opApply interface instead fo the forward range interface, I doubt you'd get very far. IMHO, though, opApply should *not* be deprecated. opApply and ranges attempt to solve similar problems, but not the same problem. Ranges attempt to solve the problem of iterating over some object with maximum flexibility from the point of view of the user of the object. opApply attempts to solve the problem of iterating with maximum flexibility from the point of view of the implementer of the object. In some cases, like the one you just described, the latter can be better. However, it might make sense to document and give examples of this tradeoff somewhere.2. You can't iterate over a binary tree in O(1) auxiliary space, at least not in any way I'm aware of. This means that a lazy range to iterate over the keys or values is relatively useless because it would generate heap activity anyhow to store the stack or queue necessary to iterate over the tree, so you may as well just use the current .keys or .values properties, which just return an array. (Because of the way that ranges are designed, you can't just use the call stack, unless you use fibers or something, which is obviously not a win from an efficiency point of view.)Hum, are you saying opApply superior when it comes to iterating in a tree? It seems that with opApply you could use the stack by recursively calling opApply with the given delegate on each one of the branches.
Apr 26 2009
Andrei Alexandrescu wrote:2. I haven't measured, but the cost of the indirect call is large enough to make me suspect that opApply is not as efficient as it's cracked to be, even when compared with an iterator.When you know the type beforehand or can use templates, that is, rather than wrapping your range struct in a wrapper class. If you can't use a template for whatever reason, ranges are going to suck -- three virtual calls rather than one. I don't usually care sufficiently about performance to worry about whether a call is virtual or not, but you brought that issue up before. And I imagine that, most of the time, you will know the range type in advance.
Apr 26 2009
Christopher Wright wrote:Andrei Alexandrescu wrote:Yah, that's a good motivation to change how hashes are currently handled. But above I was referring to the cost of opApply's callback, which adds to the costs you mention. Andrei2. I haven't measured, but the cost of the indirect call is large enough to make me suspect that opApply is not as efficient as it's cracked to be, even when compared with an iterator.When you know the type beforehand or can use templates, that is, rather than wrapping your range struct in a wrapper class. If you can't use a template for whatever reason, ranges are going to suck -- three virtual calls rather than one. I don't usually care sufficiently about performance to worry about whether a call is virtual or not, but you brought that issue up before. And I imagine that, most of the time, you will know the range type in advance.
Apr 26 2009
== Quote from Christopher Wright (dhasenan gmail.com)'s articleAndrei Alexandrescu wrote:Rule number 1 of all performance discussions: Always measure if possible.

import std.stdio, std.perf;

struct Direct {
    uint n;
    uint front() { return n; }
    void popFront() { n++; }
    bool empty() { return n >= 1_000_000_000; }
}

struct Apply {
    int opApply(int delegate(ref uint) dg) {
        int res = 0;
        foreach(i; 0U..1_000_000_000) {
            res = dg(i);
            if(res) break;
        }
        return res;
    }
}

class Virtual {
    uint n;
    uint front() { return n; }
    void popFront() { n++; }
    bool empty() { return n >= 1_000_000_000; }
}

void main() {
    scope pc = new PerformanceCounter;
    pc.start;
    foreach(elem; Direct.init) {}
    pc.stop;
    writeln("Direct: ", pc.milliseconds);

    pc.start;
    auto v = new Virtual;
    foreach(elem; v) {}
    pc.stop;
    writeln("Virtual: ", pc.milliseconds);

    pc.start;
    foreach(elem; Apply.init) {}
    pc.stop;
    writeln("opApply: ", pc.milliseconds);
}

Output:
Direct: 2343
Virtual: 5695
opApply: 3014

Bottom line is that these pretty much come in the order you would expect them to, but there are no particularly drastic differences between any of them. To put these timings in perspective, 5700 ms for 1 billion iterations is roughly (on a 2.7 GHz machine) 15 clock cycles per iteration. How often does anyone really have code that is performance critical *and* where the contents of the loop don't take long enough to dwarf the 15 clock cycles per iteration loop overhead *and* you need the iteration to be polymorphic?2. I haven't measured, but the cost of the indirect call is large enough to make me suspect that opApply is not as efficient as it's cracked to be, even when compared with an iterator.When you know the type beforehand or can use templates, that is, rather than wrapping your range struct in a wrapper class. If you can't use a template for whatever reason, ranges are going to suck -- three virtual calls rather than one. I don't usually care sufficiently about performance to worry about whether a call is virtual or not, but you brought that issue up before. And I imagine that, most of the time, you will know the range type in advance.
Apr 26 2009
dsimcha wrote:== Quote from Christopher Wright (dhasenan gmail.com)'s articleDid you compile that with optimization/inlining? If not, it's not really a fair test... and if so why isn't the compiler getting rid of those pointless loops?Andrei Alexandrescu wrote:Rule number 1 of all performance discussions: Always measure if possible. import std.stdio, std.perf; struct Direct { uint n; uint front() { return n; } void popFront() { n++; } bool empty() { return n >= 1_000_000_000; } } struct Apply { int opApply(int delegate(ref uint) dg) { int res = 0; foreach(i; 0U..1_000_000_000) { res = dg(i); if(res) break; } return res; } } class Virtual { uint n; uint front() { return n; } void popFront() { n++; } bool empty() { return n >= 1_000_000_000; } } void main() { scope pc = new PerformanceCounter; pc.start; foreach(elem; Direct.init) {} pc.stop; writeln("Direct: ", pc.milliseconds); pc.start; auto v = new Virtual; foreach(elem; v) {} pc.stop; writeln("Virtual: ", pc.milliseconds); pc.start; foreach(elem; Apply.init) {} pc.stop; writeln("opApply: ", pc.milliseconds); } Output: Direct: 2343 Virtual: 5695 opApply: 3014 Bottom line is that these pretty much come in the order you would expect them to, but there are no particularly drastic differences between any of them. To put these timings in perspective, 5700 ms for 1 billion iterations is roughly (on a 2.7 GHz machine) 15 clock cycles per iteration. How often does anyone really have code that is performance critical *and* where the contents of the loop don't take long enough to dwarf the 15 clock cycles per iteration loop overhead *and* you need the iteration to be polymorphic?2. I haven't measured, but the cost of the indirect call is large enough to make me suspect that opApply is not as efficient as it's cracked to be, even when compared with an iterator.When you know the type beforehand or can use templates, that is, rather than wrapping your range struct in a wrapper class. If you can't use a template for whatever reason, ranges are going to suck -- three virtual calls rather than one. I don't usually care sufficiently about performance to worry about whether a call is virtual or not, but you brought that issue up before. And I imagine that, most of the time, you will know the range type in advance.
Apr 26 2009
== Quote from Robert Fraser (fraserofthenight gmail.com)'s articleDid you compile that with optimization/inlining? If not, it's not really a fair test... and if so why isn't the compiler getting rid of those pointless loops?Yes, -O -inline -release. I didn't bother stating that explicitly b/c it's the standard way of doing benchmarks.
Apr 27 2009
On 2009-04-27 00:50:23 -0400, dsimcha <dsimcha yahoo.com> said:Output: Direct: 2343 Virtual: 5695 opApply: 3014Nice. Isn't there room for optimisation on the compiler side though? I mean, the compiler could inline opApply, and while doing so it could notice that the delegate is constant and inline it too. I suspect that adding this optimisation would put opApply at the same performance level as ranges. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Apr 27 2009
Michel Fortin wrote:On 2009-04-27 00:50:23 -0400, dsimcha <dsimcha yahoo.com> said:Inlining does not automatically make things faster. Case in point: downs' raytracer stacy actually slows down when you inline certain parts. The penalty of not fitting in cache was overwhelming the penalty from using virtual methods. -- DanielOutput: Direct: 2343 Virtual: 5695 opApply: 3014Nice. Isn't there room for optimisation on the compiler side though? I mean, the compiler could inline opApply, and while doing so it could notice that the delegate is constant and inline it too. I suspect that adding this optimisation would put opApply at the same performance level than ranges.
Apr 27 2009
Michel Fortin:Daniel Keep:Isn't there room for optimisation on the compiler side though? I mean, the compiler could inline opApply, and while doing so it could notice that the delegate is constant and inline it too. I suspect that adding this optimisation would put opApply at the same performance level than ranges.Inlining does not automatically make things faster.That's true in general: if you inline too much code you may end up with too much code to be moved inside and out of the caches, reducing the performance. I agree with Michel, if done wisely by the compiler some inlining may help with such loops. Bye, bearophile
Apr 27 2009
Michel Fortin wrote:On 2009-04-27 00:50:23 -0400, dsimcha <dsimcha yahoo.com> said:For non-virtual opApply this is exactly what LDC does[1] at the higher optimization levels :). (Assuming both opApply and the foreach body are deemed inline-worthy by the inliner) [1]: Since trunk r1219, which was about 11 days ago.Output: Direct: 2343 Virtual: 5695 opApply: 3014Nice. Isn't there room for optimisation on the compiler side though? I mean, the compiler could inline opApply, and while doing so it could notice that the delegate is constant and inline it too.I suspect that adding this optimisation would put opApply at the same performance level than ranges.I haven't actually measured to see if this is true, but there should indeed be very little difference since this optimization essentially turns opApply into a regular loop (and LDC does this before any loop optimizations run).
Apr 27 2009
Frits van Bommel:I haven't actually measured to see if this is true, but there should indeed be very little difference since this optimization essentially turns opApply into a regular loop (and LDC does this before any loop optimizations run).When you find a bit of free time please time it and show the timings :-) Bye, bearophile
Apr 27 2009
dsimcha:Rule number 1 of all performance discussions: Always measure if possible.Very good, thank you. I have run your code, with a bit of changes:

import std.stdio: writeln;
import std.perf: PerformanceCounter;

enum uint N = 1_000_000_000;

struct Direct {
    uint n;
    uint front() { return n; }
    void popFront() { n++; }
    bool empty() { return n >= N; }
}

struct Apply {
    int opApply(int delegate(ref uint) dg) {
        int res;
        foreach (i; 0U .. N) {
            res = dg(i);
            if(res) break;
        }
        return res;
    }
}

class Virtual {
    uint n;
    uint front() { return n; }
    void popFront() { n++; }
    bool empty() { return n >= N; }
}

void main() {
    scope pc = new PerformanceCounter;
    pc.start;
    foreach (elem; Direct.init) {}
    pc.stop;
    writeln("Direct: ", pc.milliseconds);

    pc.start;
    foreach (elem; Apply.init) {}
    pc.stop;
    writeln("opApply: ", pc.milliseconds);

    pc.start;
    auto v = new Virtual;
    foreach (elem; v) {}
    pc.stop;
    writeln("Virtual: ", pc.milliseconds);

    pc.start;
    scope auto v2 = new Virtual;
    foreach (elem; v2) {}
    pc.stop;
    writeln("Scoped virtual: ", pc.milliseconds);
}

As you see I have added a version with scoped class at the bottom hoping to improve performance a bit, but the result is the opposite, can you explain why?

Direct: 2699
opApply: 3520
Virtual: 7543
Scoped virtual: 8550

Run on a Core2 2 GHz, on WinXP. Bye, bearophile
Apr 27 2009
dsimcha wrote:Output: Direct: 2343 Virtual: 5695 opApply: 3014 Bottom line is that these pretty much come in the order you would expect them to, but there are no particularly drastic differences between any of them. To put these timings in perspective, 5700 ms for 1 billion iterations is roughly (on a 2.7 GHz machine) 15 clock cycles per iteration. How often does anyone really have code that is performance critical *and* where the contents of the loop don't take long enough to dwarf the 15 clock cycles per iteration loop overhead *and* you need the iteration to be polymorphic?Thanks for the numbers. The way I'd read them is not to look at the milliseconds but at the fractions. Loops with opApply have an extra 28% overhead, and I think that doesn't make them look very good. Bearophile's numbers show a 30% overhead. I have measurements that show more dramatic pessimization, but I never keep those around. I know from my early tests for std.algorithm that, had I used opApply in it, I would have essentially never used an algorithm if I could help it. Andrei
Apr 27 2009
dsimcha wrote: [snip]Output: Direct: 2343 Virtual: 5695 opApply: 3014 Bottom line is that these pretty much come in the order you would expect them to, but there are no particularly drastic differences between any of them. To put these timings in perspective, 5700 ms for 1 billion iterations is roughly (on a 2.7 GHz machine) 15 clock cycles per iteration. How often does anyone really have code that is performance critical *and* where the contents of the loop don't take long enough to dwarf the 15 clock cycles per iteration loop overhead *and* you need the iteration to be polymorphic?I edited this code to work with ldc (D1) + Tango, and saw the Direct and opApply cases generate identical code (inc, cmp, jne, with the loop counter in a register) [1], so they're equally fast (modulo process scheduling randomness). Virtual was roughly 10 times slower on my machine. (with ldc) Unfortunately, I can't directly compare timings between ldc and dmd because dmd is likely at a disadvantage due to being 32-bit in a 64-bit world. Although... the Virtual case takes about equal time with ldc- and dmd-compiled code, so maybe the slowness of Direct/dmd when compared to Direct/ldc (the dmd code is a factor 3 slower) is due to it apparently not register-allocating the loop variable. The opApply case was another factor 2 slower than Direct with dmd on my machine, probably because opApply and the loop body don't get inlined. It seems gdc is the only compiler to realize the first loop can be completely optimized away. It's again about equally fast for Virtual, but for opApply it's roughly a factor 3 slower than ldc; it seems to inline only opApply itself, not the loop body. [1]: -O3 -release (with inlining), x86_64
Apr 27 2009
On 2009-04-27 10:51:22 -0400, Frits van Bommel <fvbommel REMwOVExCAPSs.nl> said:I edited this code to work with ldc (D1) + Tango, and saw the Direct and opApply cases generate identical code (inc, cmp, jne, with the loop counter in a register) [1], so they're equally fast (modulo process scheduling randomness).Thank you for your timings. I think it shows my point: that by preferring ranges over opApply we're just optimising around a deficiency in DMD's optimizer. I'm thinking... with proper inlining, perhaps we could take the notion of ranges out of the compiler and just define a generic opApply in std.range that uses front, popFront, and empty. :-) Perhaps. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Apr 28 2009
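A sketch of what such a generic opApply could look like: a mixin, here called RangeOpApply (a made-up name), that defines opApply purely in terms of the range's own empty/front/popFront. Whether foreach would then pick the opApply or the range primitives is exactly the preference question debated above, so the example drives opApply explicitly.

import std.stdio;

// Hypothetical mixin: an opApply written purely in terms of the range's
// own empty/front/popFront, roughly the "generic opApply in std.range" idea.
template RangeOpApply(Elem)
{
    int opApply(int delegate(ref Elem) dg)
    {
        for (auto r = this; !r.empty(); r.popFront())
        {
            auto e = r.front();
            if (auto res = dg(e)) return res;
        }
        return 0;
    }
}

struct Iota   // toy range: n, n+1, ..., limit-1
{
    int n, limit;
    bool empty() { return n >= limit; }
    int front() { return n; }
    void popFront() { ++n; }

    mixin RangeOpApply!(int);   // opApply now forwards to the primitives
}

void main()
{
    auto r = Iota(0, 3);
    // Drive the iteration through the generated opApply explicitly:
    r.opApply((ref int i) { writeln(i); return 0; });   // 0, 1, 2
}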
Michel Fortin wrote:On 2009-04-27 10:51:22 -0400, Frits van Bommel <fvbommel REMwOVExCAPSs.nl> said:Not true. Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation.I edited this code to work with ldc (D1) + Tango, and saw the Direct and opApply cases generate identical code (inc, cmp, jne, with the loop counter in a register) [1], so they're equally fast (modulo process scheduling randomness).Thank you for your timings. I think it shows my point: that by prefering ranges over opApply we're just optimising around a deficiency in DMD's optimizer.I'm thinking... with proper inlining, perhaps we could take the notion of ranges out of the compiler and just define a generic opApply in std.range that use front, popFront, and empty. :-) Perhaps.I suspect supporting ranges is just much easier. -- Daniel
Apr 28 2009
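Daniel's point in code: zip works because the caller pulls from both sequences and can interleave them freely, which is precisely the control that opApply's inversion takes away. A hand-rolled lockstep loop over two toy ranges (not the std.range zip implementation):

import std.stdio;

struct Count   // toy range: n, n+1, ..., limit-1
{
    int n, limit;
    bool empty() { return n >= limit; }
    int front() { return n; }
    void popFront() { ++n; }
}

void main()
{
    auto a = Count(0, 3);      // 0, 1, 2
    auto b = Count(10, 13);    // 10, 11, 12

    // Lockstep ("zip") iteration: the caller pulls one element from each
    // range per step.  With two opApply-only containers this interleaving
    // is impossible without fibers, because each opApply insists on
    // running its own loop to completion and merely calls us back.
    while (!a.empty() && !b.empty())
    {
        writeln(a.front(), " ", b.front());
        a.popFront();
        b.popFront();
    }
}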
On 2009-04-28 06:33:13 -0400, Daniel Keep <daniel.keep.lists gmail.com> said:Michel Fortin wrote:I guess I removed too much context from the above posts. We're just timing various foreach implementations. You're right when you say ranges are more versatile than opApply, and I'm all for keeping both ranges and opApply. I just want the compiler to prefer opApply over ranges when both are available when generating code for foreach, since with opApply you sometimes can optimize things in a way that you can't with ranges.On 2009-04-27 10:51:22 -0400, Frits van Bommel <fvbommel REMwOVExCAPSs.nl> said:Not true. Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation.I edited this code to work with ldc (D1) + Tango, and saw the Direct and opApply cases generate identical code (inc, cmp, jne, with the loop counter in a register) [1], so they're equally fast (modulo process scheduling randomness).Thank you for your timings. I think it shows my point: that by prefering ranges over opApply we're just optimising around a deficiency in DMD's optimizer.Sure, especially since they're already implemented in the compiler. Inlining of delegates known at compile-time would have a greater reach though. -- Michel Fortin michel.fortin michelf.com http://michelf.com/I'm thinking... with proper inlining, perhaps we could take the notion of ranges out of the compiler and just define a generic opApply in std.range that use front, popFront, and empty. :-) Perhaps.I suspect supporting ranges is just much easier.
Apr 28 2009
Daniel Keep wrote:Michel Fortin wrote:Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.On 2009-04-27 10:51:22 -0400, Frits van Bommel <fvbommel REMwOVExCAPSs.nl> said:Not true. Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation.I edited this code to work with ldc (D1) + Tango, and saw the Direct and opApply cases generate identical code (inc, cmp, jne, with the loop counter in a register) [1], so they're equally fast (modulo process scheduling randomness).Thank you for your timings. I think it shows my point: that by prefering ranges over opApply we're just optimising around a deficiency in DMD's optimizer.I'm thinking... with proper inlining, perhaps we could take the notion of ranges out of the compiler and just define a generic opApply in std.range that use front, popFront, and empty. :-) Perhaps.I suspect supporting ranges is just much easier. -- Daniel
Apr 28 2009
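For completeness, a sketch of the fiber trick downs is referring to: run the opApply loop inside a core.thread.Fiber and yield at each element, turning push-style iteration into a pull-style range at the cost of a dedicated stack plus two context switches per element. The types here are illustrative, not library code.

import core.thread;
import std.stdio;

// A container that only exposes opApply (imagine a tree walk inside).
struct OnlyApply
{
    int[] data;
    int opApply(int delegate(ref int) dg)
    {
        foreach (ref x; data)
            if (auto r = dg(x)) return r;
        return 0;
    }
}

// Run the opApply loop inside a Fiber and suspend at every element,
// exposing the result through the usual empty/front/popFront trio.
class ApplyRange
{
    private Fiber fib;
    private int current;
    private bool done;

    this(OnlyApply agg)
    {
        fib = new Fiber({
            agg.opApply((ref int x) {
                current = x;
                Fiber.yield();      // suspend until the next popFront
                return 0;
            });
            done = true;            // opApply ran to completion
        });
        fib.call();                 // advance to the first element (or the end)
    }

    bool empty() { return done; }
    int front() { return current; }
    void popFront() { fib.call(); }
}

void main()
{
    auto r = new ApplyRange(OnlyApply([1, 2, 3]));
    for (; !r.empty(); r.popFront())
        writeln(r.front());         // 1, 2, 3
}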
On Tue, 28 Apr 2009 07:35:22 -0400, downs <default_357-line yahoo.de> wrote:Daniel Keep wrote:read: less overhead than full threads, not less overhead than ranges ;) -SteveMichel Fortin wrote:Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.On 2009-04-27 10:51:22 -0400, Frits van Bommel <fvbommel REMwOVExCAPSs.nl> said:Not true. Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation.I edited this code to work with ldc (D1) + Tango, and saw the Direct and opApply cases generate identical code (inc, cmp, jne, with the loop counter in a register) [1], so they're equally fast (modulo process scheduling randomness).Thank you for your timings. I think it shows my point: that by prefering ranges over opApply we're just optimising around a deficiency in DMD's optimizer.
Apr 28 2009
downs wrote:Daniel Keep wrote:First, don't they have LOTS of memory overhead, because each fiber needs a new/separate stack? Second, isn't the overhead still relatively high compared to simple function calls?Michel Fortin wrote:Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.On 2009-04-27 10:51:22 -0400, Frits van Bommel <fvbommel REMwOVExCAPSs.nl> said:Not true. Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation.I edited this code to work with ldc (D1) + Tango, and saw the Direct and opApply cases generate identical code (inc, cmp, jne, with the loop counter in a register) [1], so they're equally fast (modulo process scheduling randomness).Thank you for your timings. I think it shows my point: that by prefering ranges over opApply we're just optimising around a deficiency in DMD's optimizer.
Apr 28 2009
grauzone wrote:downs wrote:A fiber-specific stack needn’t be very large. A few KB is often enough even for long-running threads; if the call stack is only going to be a few levels deep you might get away with a few hundred bytes each. —Joel SalomonDaniel Keep wrote:First, don't they have LOTS of memory overhead, because each fiber needs a new/separate stack?Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation.Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.
Apr 28 2009
Joel C. Salomon wrote:grauzone wrote:Furthermore, using anonymous mmapped files the memory is only allocated when accessed - the only thing taken is address space. Which, granted, can be limiting. Luckily 64-bit will fix that. :)downs wrote:A fiber-specific stack needn’t be very large. A few KB is often enough even for long-running threads; if the call stack is only going to be a few levels deep you might get away with a few hundred bytes each. —Joel SalomonDaniel Keep wrote:First, don't they have LOTS of memory overhead, because each fiber needs a new/separate stack?Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation.Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.
Apr 28 2009
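What downs describes, in a minimal POSIX sketch: reserve a large anonymous, private mapping for a fiber stack and let the kernel commit pages only when they are touched. The module and flag names are the druntime POSIX bindings as best understood here (MAP_ANON is a common extension whose spelling varies by platform); this is not how druntime itself allocates fiber stacks.

import core.sys.posix.sys.mman;
import std.stdio;

// Reserve a large region of address space for a fiber stack.  With an
// anonymous private mapping, the kernel backs pages with real memory only
// once they are touched, so a generous reservation stays cheap until the
// fiber actually grows that deep.
void* reserveStack(size_t size)
{
    void* p = mmap(null, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    return (p == MAP_FAILED) ? null : p;
}

void main()
{
    enum size_t size = 1024 * 1024;          // 1 MB of address space
    auto stack = reserveStack(size);
    if (stack is null)
    {
        writeln("mmap failed");
        return;
    }
    // Nothing is committed yet; touching a page commits just that page.
    (cast(ubyte*) stack)[size - 1] = 1;
    munmap(stack, size);
}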
downs wrote:Joel C. Salomon wrote:The Commodore-64 had a stack of 0.25kB shared between your Basic program, subroutine calls, and the operating system. While not unheard of, it still was unusual to have to make amendments to program design because of stack limitations, even in large programs. *If* we had a convenient way to record the high-water mark of stack usage for functions (and maybe also threads), then people could have new insights on how little stack space is needed. This would be /very/ convenient once we start using thread specific stacks, because the stack space has to be allocated in advance, hopefully not wasting huge amounts of space "just in case". Andrei, Walter??? PS, on the 6502 (the C64 CPU), the stack was hard wired to 0100-01FF. This made the processor vastly simpler, which was essential for cost savings in an era where an additional transistor in the processor did cost an arm and a leg. The area before the stack was used with 8-bit addressing, which was faster, and traditionally it was used more or less as an extension to processor registers, because the 6502 only had the X, Y, A, PC, and Flags registers. The area after the stack was free for the circuit board designer to use in any way he liked. On the C64 this was used for ROM, memory mapped peripherals, IO, the heap, and the program area. Interestingly, self-modifying code was used for keyboard input, to enhance speed and code size. This concept was copied from the VIC-20, which came with 3k of usable RAM.grauzone wrote:downs wrote:A fiber-specific stack needn’t be very large. A few KB is often enough even for long-running threads; if the call stack is only going to be a few levels deep you might get away with a few hundred bytes each.Daniel Keep wrote:First, don't they have LOTS of memory overhead, because each fiber needs a new/separate stack?Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation.Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.
Apr 29 2009
Georg Wrede wrote:downs wrote:Since C64 programs have gotten a lot bigger, programming style favors deeper call stacks, and (mutually) recursive functions are not a rarity. Computing the stack depth even for non-recursive functions is an interprocedural analysis so it's difficult to do modularly. The stack is also unchecked so going with a "good enough" approximation is bound to be risky. AndreiJoel C. Salomon wrote:The Commodore-64 had a stack of 0.25kB shared between your Basic program, subroutine calls, and the operating system. While not unheard of, it still was unusual to have to make amendments to program design because of stack limitations, even in large programs. *If* we had a convenient way to record the high-water mark of stack usage for functions (and maybe also threads), then people could have new insights on how little stack space is needed. This would be /very/ convenient once we start using thread specific stacks, because the stack space has to be allocated in advance, hopefully not wasting huge amounts of space "just in case". Andrei, Walter???grauzone wrote:downs wrote:A fiber-specific stack needn’t be very large. A few KB is often enough even for long-running threads; if the call stack is only going to be a few levels deep you might get away with a few hundred bytes each.Daniel Keep wrote:First, don't they have LOTS of memory overhead, because each fiber needs a new/separate stack?Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation.Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.
Apr 29 2009
Andrei Alexandrescu wrote:Georg Wrede wrote:The old size estimation problem. No matter how big you decide on, somewhere is an idiot who will blow that size too. A runtime solution just occurred to me. The newly allocated stack for a thread is zeroed first, right? Well then, just before the thread finishes, one could scan the stack area, and find the first address that contains a non-zero value. This isn't bullet proof in theory, but hey, the aim is to help in /estimating/ the needed stack size. If this shows the maximum stack usage as 160 bytes, then today's youngsters could allocate, say, 0.25k, (or even 1k) instead of the 50k they'd otherwise allocate. Incidentally, this raises another thought: what if the stack was extendable? Like D arrays? A stack overflow would trigger a relocation of the stack, and then continue. Being able to have hundreds of small stacks would be useful in other things than threads, too.downs wrote:Since C64 programs have gotten a lot bigger, programming style favors deeper call stacks, and (mutually) recursive functions are not a rarity. Computing the stack depth even for non-recursive functions is an interprocedural analysis so it's difficult to do modularly. The stack is also unchecked so going with a "good enough" approximation is bound to be risky.Joel C. Salomon wrote:The Commodore-64 had a stack of 0.25kB shared between your Basic program, subroutine calls, and the operating system. While not unheard of, it still was unusual to have to make amendments to program design because of stack limitations, even in large programs. *If* we had a convenient way to record the high-water mark of stack usage for functions (and maybe also threads), then people could have new insights on how little stack space is needed. This would be /very/ convenient once we start using thread specific stacks, because the stack space has to be allocated in advance, hopefully not wasting huge amounts of space "just in case". Andrei, Walter???grauzone wrote:downs wrote:A fiber-specific stack needn’t be very large. A few KB is often enough even for long-running threads; if the call stack is only going to be a few levels deep you might get away with a few hundred bytes each.Daniel Keep wrote:First, don't they have LOTS of memory overhead, because each fiber needs a new/separate stack?Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation.Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.
Apr 29 2009
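Georg's estimation trick, sketched: if the stack region was zero-filled before the thread or fiber ran, the untouched part is still zero afterwards, so scanning from the deep end gives a rough high-water mark. This assumes a downward-growing stack and that the deepest frame was not itself all zeros, so it is only an estimate.

import std.stdio;

// Scan a previously zero-filled stack region from its deep end (the low
// addresses, assuming a downward-growing stack) for the first non-zero
// byte; everything above that point was actually used.
size_t stackHighWater(const(ubyte)[] stackRegion)
{
    foreach (i, b; stackRegion)
        if (b != 0)
            return stackRegion.length - i;   // bytes actually used
    return 0;                                // never touched at all
}

void main()
{
    auto region = new ubyte[4096];           // new arrays are zero-filled
    region[$ - 200 .. $] = 42;               // pretend 200 bytes were used
    writeln(stackHighWater(region));         // prints 200
}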
Georg Wrede wrote:*If* we had a convenient way to record the high-water mark of stack usage for functions (and maybe also threads), then people could have new insights on how little stack space is needed. This would be /very/ convenient once we start using thread specific stacks, because the stack space has to be allocated in advance, hopefully not wasting huge amounts of space "just in case".What do you mean by “thread-specific stacks”?Incidentally, this raises another thought: what if the stack was extendable? Like D arrays? A stack overflow would trigger a relocation of the stack, and then continue. Being able to have hundreds of small stacks would be useful in other things than threads, too.How much rewriting of the stack would have to happen? Relocating the stack will invalidate any pointers to automatic (in the C sense) variables. Detecting stack overflow at runtime might also not be trivial. —Joel Salomon
Apr 29 2009
Joel C. Salomon wrote:Detecting stack overflow at runtime might also not be trivial.In a way it is trivial... if you accept a 7% added overhead per function call :o). (The number is what I remember it cost Borland Pascal.) Andrei
Apr 29 2009
Andrei Alexandrescu:[Detecting stack overflow at runtime] In a way it is trivial... if you accept a 7% added overhead per function call :o). (The number is what I remember costed Borland Pascal.)<Now Delphi and FreePascal allow to switch that on and off. From the page: http://www.freepascal.org/docs-html/prog/progsu100.html 1.2.24 $S : Stack checking The {$S+} directive tells the compiler to generate stack checking code. This generates code to check if a stack overflow occurred, i.e. to see whether the stack has grown beyond its maximally allowed size. If the stack grows beyond the maximum size, then a run-time error is generated, and the program will exit with exit code 202. Specifying {$S-} will turn generation of stack-checking code off. The command line compiler switch -Ct has the same effect as the {$S+} directive. By default, no stack checking is performed. I have never timed programs to test how much you have to pay if you want stack checking code. The compiler is free, and suitable testing programs can be found in the Shootout site, so if you want you don't need much time to test this. Bye, bearophile
Apr 29 2009
bearophile wrote:Andrei Alexandrescu:Doesn't the OS protect you from stack overflows? -- Daniel[Detecting stack overflow at runtime] In a way it is trivial... if you accept a 7% added overhead per function call :o). (The number is what I remember costed Borland Pascal.)<Now Delphi and FreePascal allow to switch that on and off. From the page: http://www.freepascal.org/docs-html/prog/progsu100.html 1.2.24 $S : Stack checking The {$S+} directive tells the compiler to generate stack checking code. This generates code to check if a stack overflow occurred, i.e. to see whether the stack has grown beyond its maximally allowed size. If the stack grows beyond the maximum size, then a run-time error is generated, and the program will exit with exit code 202. Specifying {$S-} will turn generation of stack-checking code off. The command line compiler switch -Ct has the same effect as the {$S+} directive. By default, no stack checking is performed. I have never timed programs to test how much you have to pay if you want stack checking code. The compiler is free, and suitable testing programs can be found in the Shootout site, so if you want you don't need much time to test this. Bye, bearophile
Apr 29 2009
Andrei Alexandrescu, on April 29 at 09:58 you wrote:Joel C. Salomon wrote:Using page protection has no overhead, unless you actually hit a stack overflow. -- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/ ---------------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ----------------------------------------------------------------------------Detecting stack overflow at runtime might also not be trivial.In a way it is trivial... if you accept a 7% added overhead per function call :o). (The number is what I remember costed Borland Pascal.)
Apr 29 2009
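What page protection means concretely here, as a minimal POSIX sketch: an mprotect'd PROT_NONE guard page below the stack costs nothing per function call; an overflow simply faults when it runs into it. The 4 KB page size is an assumption, and this is an illustration rather than how any runtime actually sets up its stacks.

import core.sys.posix.sys.mman;

// Put an inaccessible guard page below a downward-growing stack region.
// There is no per-call cost; a stack overflow simply faults the moment it
// runs into the PROT_NONE page instead of silently scribbling onward.
void* allocGuardedStack(size_t size)
{
    enum size_t pageSize = 4096;             // assumption: 4 KB pages
    void* base = mmap(null, size + pageSize, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANON, -1, 0);
    if (base == MAP_FAILED) return null;

    mprotect(base, pageSize, PROT_NONE);     // lowest page becomes the guard
    return cast(ubyte*) base + pageSize;     // start of the usable region
}

void main()
{
    auto stack = allocGuardedStack(64 * 1024);
    assert(stack !is null);
    // Writing just below `stack` (inside the guard page) would now fault
    // immediately rather than corrupting whatever happens to live there.
}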
Georg Wrede wrote:Andrei Alexandrescu wrote:You can't be more serious than Mark Twain when giving advice on the stock market: "Buy some stock. If it goes up, sell it. If it doesn't go up, don't buy it."Computing the stack depth even for non-recursive functions is an interprocedural analysis so it's difficult to do modularly. The stack is also unchecked so going with a "good enough" approximation is bound to be risky.The old size estimation problem. No matter how big you decide on, somewhere is an idiot who will blow that size too. A runtime solution just occurred to me. The newly allocated stack for a thread is zeroed first, right? Well then, just before the thread finishes, one could scan the stack area, and find the first address that contains a non-zero value. This isn't bullet proof in theory, but hey, the aim is to help in /estimating/ the needed stack size.Incidentally, this raises another thought: what if the stack was extendable? Like D arrays? A stack overflow would trigger a relocation of the stack, and then continue. Being able to have hundreds of small stacks would be useful in other things than threads, too.That would be great. I don't know whether it can be done. By the way, I think you're underestimating how much stack the OS gives to the stack. On my system: $ grep PTHREAD_STACK /usr/include/**/*.h /usr/include/bits/local_lim.h:#define PTHREAD_STACK_MIN 16384 /usr/include/pthread.h: minimal size of the block must be PTHREAD_STACK_MIN.*/ /usr/include/pthread.h: to be started. This size must never be less than PTHREAD_STACK_MIN $ _ You can't have a stack smaller than 16KB as far as I understand. I seem to recall the default stack size is much bigger. Andrei
Apr 29 2009
Andrei Alexandrescu wrote:Georg Wrede wrote:You could have the GC report stack sizes when it performs a collection. No guarantee it would be a high-water mark, but it's something.Andrei Alexandrescu wrote:Computing the stack depth even for non-recursive functions is an interprocedural analysis so it's difficult to do modularly. The stack is also unchecked so going with a "good enough" approximation is bound to be risky.The old size estimation problem. No matter how big you decide on, somewhere is an idiot who will blow that size too. A runtime solution just occurred to me. The newly allocated stack for a thread is zeroed first, right? Well then, just before the thread finishes, one could scan the stack area, and find the first address that contains a non-zero value. This isn't bullet proof in theory, but hey, the aim is to help in /estimating/ the needed stack size.Windows works this way, and it's possible to do in user code (the Fiber implementation has some of the groundwork for this). Beyond "usable" stack is a guard page that triggers a stack extension when hit. But I think if you grow the stack past the guard page without ever touching it you can still blow the stack.Incidentally, this raises another thought: what if the stack was extendable? Like D arrays? A stack overflow would trigger a relocation of the stack, and then continue. Being able to have hundreds of small stacks would be useful in other things than threads, too.That would be great. I don't know whether it can be done.By the way, I think you're underestimating how much stack the OS gives to the stack. On my system: $ grep PTHREAD_STACK /usr/include/**/*.h /usr/include/bits/local_lim.h:#define PTHREAD_STACK_MIN 16384 /usr/include/pthread.h: minimal size of the block must be PTHREAD_STACK_MIN.*/ /usr/include/pthread.h: to be started. This size must never be less than PTHREAD_STACK_MIN $ _ You can't have a stack smaller than 16KB as far as I understand. I seem to recall the default stack size is much bigger.On Windows I think the default size is 1MB.
Apr 29 2009
Andrei Alexandrescu wrote:Georg Wrede wrote:I'd be amazed if the stack usage of hello.d varied from run to run.Andrei Alexandrescu wrote:You can't be more serious than Mark Twain when giving advice on the stock market: "Buy some stock. If it goes up, sell it. If it doesn't go up, don't buy it."Computing the stack depth even for non-recursive functions is an interprocedural analysis so it's difficult to do modularly. The stack is also unchecked so going with a "good enough" approximation is bound to be risky.The old size estimation problem. No matter how big you decide on, somewhere is an idiot who will blow that size too. A runtime solution just occurred to me. The newly allocated stack for a thread is zeroed first, right? Well then, just before the thread finishes, one could scan the stack area, and find the first address that contains a non-zero value. This isn't bullet proof in theory, but hey, the aim is to help in /estimating/ the needed stack size.Two methods come to mind.(1)Incidentally, this raises another thought: what if the stack was extendable? Like D arrays? A stack overflow would trigger a relocation of the stack, and then continue. Being able to have hundreds of small stacks would be useful in other things than threads, too.That would be great. I don't know whether it can be done.By the way, I think you're underestimating how much stack the OS gives to the stack. On my system: $ grep PTHREAD_STACK /usr/include/**/*.h /usr/include/bits/local_lim.h:#define PTHREAD_STACK_MIN 16384 /usr/include/pthread.h: minimal size of the block must be PTHREAD_STACK_MIN.*/ /usr/include/pthread.h: to be started. This size must never be less than PTHREAD_STACK_MINSeems like that minimum means the OS won't start a new thread with less [stack] space left on the computer. Gotta be /some/ value, big enough to be fine with virtually any app.You can't have a stack smaller than 16KB as far as I understand. I seem to recall the default stack size is much bigger.Totally not having researched this... The page http://www.unix.com/high-level-programming/34632-how-find-out-stack-size-occupied-process.html states that "Most Linux systems have no stack limits set" (didn't have time to find a definitive reference, yet). Also, PTHREAD_STACK_MIN would be the main process stack size. If we're running co-operative multitasking, or continuations or some such, then the other stacks really can be smaller. The important thing to know is, does "somebody else" than our program use the current stack? If not, then we're free to allocate "just enough" (in practice much more) to the stacks. IIUC, our stack is only used by our app, its library calls, and the OS API calls. (1) Using several stacks: What does one need to use several stacks? Let's assume we have the "regular" stack, and then we temporarily want to use another. First we allocate space for the new stack. Then we disable interrupts, (we could even save processor state), change the stack pointer to point to the new stack. Enable interrupts, and go on with the program (i.e. call the routine we want to have using the new stack). Once it finishes, disable interrupts, reset the stack pointer, enable interrupts. Knowing in advance the needed stack space helps here. Especially if instances of the same code are to run with each stack, then all we need is an array of stack spaces. :-) The other thing: suppose we really don't know the stack usage. 
Then each routine can use the same stack while it runs, and at task switch we copy the actually used portion of the stack to the heap. Next time we want to run that task, we copy the data back from the heap. I admit this isn't done in an evening, but it's doable.
Apr 29 2009
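For what it's worth, druntime already ships most of the "several stacks" machinery sketched above: core.thread.Fiber runs a routine on its own, separately sized stack and does the stack-pointer switching for you. A minimal sketch; the 16 KB figure is an arbitrary illustrative choice, not a recommendation:

import core.thread : Fiber;
import std.stdio : writeln;

void worker()
{
    writeln("running on the fiber's own small stack");
    Fiber.yield();                    // hand control back to the caller
    writeln("resumed later on that same stack");
}

void main()
{
    auto f = new Fiber(&worker, 16 * 1024);  // second argument: stack size
    f.call();                         // switch to the fiber's stack
    f.call();                         // resume after the yield
}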
Georg Wrede wrote:Andrei Alexandrescu wrote:I'd be amazed if hello.d were your prototype of a useful program :o). Let's face it. Any program that uses mutual recursion of any kind (even when the user did not set out to use recursion) or that uses alloca (fewer), is bound to have the consumed stack dependent on the input. Even without recursion, various traces of a program depend on a data and have different stack depths. That's why your estimate is utterly useless. I can't believe you brought it up as anything else but jesting (maybe you are and I'm not getting it). I can't even believe I'm replying! :o) AndreiGeorg Wrede wrote:I'd be amazed if the stack usage of hello.d varied from run to run.Andrei Alexandrescu wrote:You can't be more serious than Mark Twain when giving advice on the stock market: "Buy some stock. If it goes up, sell it. If it doesn't go up, don't buy it."Computing the stack depth even for non-recursive functions is an interprocedural analysis so it's difficult to do modularly. The stack is also unchecked so going with a "good enough" approximation is bound to be risky.The old size estimation problem. No matter how big you decide on, somewhere is an idiot who will blow that size too. A runtime solution just occurred to me. The newly allocated stack for a thread is zeroed first, right? Well then, just before the thread finishes, one could scan the stack area, and find the first address that contains a non-zero value. This isn't bullet proof in theory, but hey, the aim is to help in /estimating/ the needed stack size.
Apr 29 2009
Andrei Alexandrescu wrote:Georg Wrede wrote:Somebody more resourceful than I might /prove/ some maximum for any program that doesn't use recursion (be it implicit or explicit recursion). Hello.d attempted to be an example of such a program.Andrei Alexandrescu wrote:I'd be amazed if hello.d were your prototype of a useful program :o).Georg Wrede wrote:I'd be amazed if the stack usage of hello.d varied from run to run.Andrei Alexandrescu wrote:You can't be more serious than Mark Twain when giving advice on the stock market: "Buy some stock. If it goes up, sell it. If it doesn't go up, don't buy it."Computing the stack depth even for non-recursive functions is an interprocedural analysis so it's difficult to do modularly. The stack is also unchecked so going with a "good enough" approximation is bound to be risky.The old size estimation problem. No matter how big you decide on, somewhere is an idiot who will blow that size too. A runtime solution just occurred to me. The newly allocated stack for a thread is zeroed first, right? Well then, just before the thread finishes, one could scan the stack area, and find the first address that contains a non-zero value. This isn't bullet proof in theory, but hey, the aim is to help in /estimating/ the needed stack size.Let's face it. Any program that uses mutual recursion of any kind (even when the user did not set out to use recursion) or that uses alloca (fewer), is bound to have the consumed stack dependent on the input.Of course. And there's a limit to the stack size, ultimately at the hardware level. For any increase in available stack size, the probability of an out-of-stack exception decreases, never reaching zero. I agree.Even without recursion, various traces of a program depend on a data and have different stack depths. That's why your estimate is utterly useless.It'd be useless if we were attempting to give 100.000% guarantees.I can't believe you brought it up as anything else but jesting (maybe you are and I'm not getting it).Jesting on a level where you might not get it, wouldn't be fun. And definitely boring and confusing to the rest of the crowd. :-) Maybe I'll have to dig up prior art, or something. Clearly I'm not qualified to properly defend the validity of this idea.
Apr 29 2009
Georg Wrede wrote:Somebody more resourceful than I might /prove/ some maximum for any program that doesn't use recursion (be it implicit or explicit recursion). Hello.d attempted to be an example of such a program.There's no need for a lot of expertise, it's a trivial fact for anyone who's written even a toy compiler. The tidbit that's difficult is the interprocedural part: in order to compute the stack consumed by any function, you need to know the stack consumed by all functions it invokes.Maybe I'll have to dig up prior art, or something. Clearly I'm not qualified to properly defend the validity of this idea.I think my failure is seeing what the idea is. My understanding is that you use one trace of a run to estimate the (average? minimum? maximum?) stack needed by future runs of a thread. It's unclear to me how that estimate would be used, what sort of information it provides (if any - I highly doubt it tells much), and generally how you'd integrate that into a compiler. Andrei
Apr 29 2009
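The interprocedural computation Andrei describes is easy to picture for the recursion-free case: a function's worst-case requirement is its own frame plus the largest requirement among its callees. A hedged sketch in present-day D, with the frame sizes and call graph invented purely for illustration:

import std.algorithm : max;

size_t[string]   frameSize;   // per-function frame size, in bytes
string[][string] callees;     // call-graph edges

// Worst-case stack use of fn, assuming the call graph is acyclic.
size_t worstCase(string fn, ref size_t[string] memo)
{
    if (auto cached = fn in memo)
        return *cached;
    size_t deepest = 0;
    foreach (callee; callees.get(fn, null))
        deepest = max(deepest, worstCase(callee, memo));
    auto total = frameSize[fn] + deepest;
    memo[fn] = total;
    return total;
}

void main()
{
    frameSize["main"]  = 64;
    frameSize["parse"] = 256;
    frameSize["emit"]  = 128;
    callees["main"]    = ["parse", "emit"];

    size_t[string] memo;
    assert(worstCase("main", memo) == 64 + 256);
}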
Andrei Alexandrescu wrote:Georg Wrede wrote:I think in the first instance it was at least to get an idea of how much stack space you're typically using. I must confess to having absolutely no idea of how much stack space my D apps are using. It'd be an interesting thing to know.Somebody more resourceful than I might /prove/ some maximum for any program that doesn't use recursion (be it implicit or explicit recursion). Hello.d attempted to be an example of such a program.There's no need for a lot of expertise, it's a trivial fact for anyone who's written even a toy compiler. The tidbit that's difficult is the interprocedural part: in order to compute the stack consumed by any function, you need to know the stack consumed by all functions it invokes.Maybe I'll have to dig up prior art, or something. Clearly I'm not qualified to properly defend the validity of this idea.I think my failure is seeing what the idea is. My understanding is that you use one trace of a run to estimate the (average? minimum? maximum?) stack needed by future runs of a thread. It's unclear to me how that estimate would be used, what sort of information it provides (if any - I highly doubt it tells much), and generally how you'd integrate that into a compiler.Andrei
Apr 29 2009
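For anyone who wants that number today, a crude way to get it is to record the address of a local at the top of main and subtract the address of a local taken at the point of interest; this assumes a downward-growing stack and ignores optimizer effects, so treat the result as a rough figure. The names here are invented for illustration:

import std.stdio : writefln;

void* stackTop;   // address of a local captured at the top of main

void report(string where)
{
    byte marker;
    // Rough depth: distance between main's local and this one.
    writefln("%s: roughly %s bytes of stack in use",
             where, (cast(size_t) stackTop) - (cast(size_t) &marker));
}

void recurse(int n)
{
    if (n == 0) { report("deepest point"); return; }
    recurse(n - 1);
}

void main()
{
    byte marker;
    stackTop = &marker;
    recurse(100);
}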
Georg Wrede wrote:Andrei Alexandrescu wrote:Having looked into things, it turns out I'm the one that now suspects you of jesting.Georg Wrede wrote:Somebody more resourceful than I might /prove/ some maximum for any program that doesn't use recursion (be it implicit or explicit recursion). Hello.d attempted to be an example of such a program.Andrei Alexandrescu wrote:I'd be amazed if hello.d were your prototype of a useful program :o).Georg Wrede wrote:I'd be amazed if the stack usage of hello.d varied from run to run.Andrei Alexandrescu wrote:You can't be more serious than Mark Twain when giving advice on the stock market: "Buy some stock. If it goes up, sell it. If it doesn't go up, don't buy it."Computing the stack depth even for non-recursive functions is an interprocedural analysis so it's difficult to do modularly. The stack is also unchecked so going with a "good enough" approximation is bound to be risky.The old size estimation problem. No matter how big you decide on, somewhere is an idiot who will blow that size too. A runtime solution just occurred to me. The newly allocated stack for a thread is zeroed first, right? Well then, just before the thread finishes, one could scan the stack area, and find the first address that contains a non-zero value. This isn't bullet proof in theory, but hey, the aim is to help in /estimating/ the needed stack size.Let's face it. Any program that uses mutual recursion of any kind (even when the user did not set out to use recursion) or that uses alloca (fewer), is bound to have the consumed stack dependent on the input.Of course. And there's a limit to the stack size, ultimately at the hardware level. For any increase in available stack size, the probability of an out-of-stack exception decreases, never reaching zero. I agree.Even without recursion, various traces of a program depend on a data and have different stack depths. That's why your estimate is utterly useless.It'd be useless if we were attempting to give 100.000% guarantees.I can't believe you brought it up as anything else but jesting (maybe you are and I'm not getting it).Maybe I'll have to dig up prior art, or something. Clearly I'm not qualified to properly defend the validity of this idea.There is prior art, by the truckload. A /very/ short slide show here: http://zope.stackless.com/StacklessEuroPy.ppt I'd kill to get that in D. And about your comments on stack size, seems regular Python has an in-built limit on recursion, at 1000 deep. That should be diametrically opposite your stance on stack size.
Apr 30 2009
Georg Wrede wrote:What I referred to was inferring a thread's actual stack requirements from one trace of its run. That's just untenable.Having looked into things, it turns out I'm the one that now suspects you of jesting.I can't believe you brought it up as anything else but jesting (maybe you are and I'm not getting it).Interesting, just you can't claim you were talking about that all along. I mean, come on! You say one thing that was untenable, and now you come up with a different one that is. And you were right all along?Maybe I'll have to dig up prior art, or something. Clearly I'm not qualified to properly defend the validity of this idea.There is prior art, by the truckload. A /very/ short slide show here: http://zope.stackless.com/StacklessEuroPy.ppt I'd kill to get that in D.And about your comments on stack size, seems regular Python has an in-built limit on recursion, at 1000 deep. That should be diametrically opposite your stance on stack size.Python has apparently set a maximum to have a guarantee it won't crash when given a certain minimum stack size. That's nice, but I'm not quite seeing how that's opposite to what I said in this discussion. Andrei
Apr 30 2009
Andrei Alexandrescu wrote:Georg Wrede wrote:Could it be that you're looking at this too much through the eyes of a compiler developer? With source code that can include templates, it should be no surprise that a programmer can write a file which results in preposterous stack needs. This is certainly the case with C++, for example.Having looked into things, it turns out I'm the one that now suspects you of jesting.I can't believe you brought it up as anything else but jesting (maybe you are and I'm not getting it).What I referred to was inferring a thread's actual stack requirements from one trace of its run.Of course you'd give different input data, a little like you give different input data to your unit tests.That's just untenable.Agreed, IF there's a possibility of recursion happening, the variation in input data is not guaranteed, etc. But for practical purposes (read: most of my programs, and presumably those of the other programmers'), the situation is different. For a particular application, we expect some explicit or implicit constraints on the input data. We also know (er, expect) whether (and to what extent) there will be recursion. Additionally (as with any real-life application), we accept that if the input data goes wild, the application /doesn't/ have to cope with it. (Making nuke-proof apps where reasonably good is sufficient is just gratuitous work.) For example, a robust application, instead of accepting and coping with *anything*, instead has *documented limitations*. IMHO, that is the right way to "fix infinity problems". Not demanding limitless stack for all kinds of programs.I was actually talking about *two* things. If you remember, I was talking about measuring actually used stack depth. Then I got the idea of having separate stacks for threads. Two separate things in a post. (Now that I think more about it, I must have heard about stackless Python somewhere. But as I've seen others do, this time I, too, believed I invented something that I've actually seen somewhere.)Interesting, just you can't claim you were talking about that all along. I mean, come on! You say one thing that was untenable, and now you come up with a different one that is. And you were right all along?Maybe I'll have to dig up prior art, or something. Clearly I'm not qualified to properly defend the validity of this idea.There is prior art, by the truckload. A /very/ short slide show here: http://zope.stackless.com/StacklessEuroPy.ppt I'd kill to get that in D.Nope. It's because - They don't want the entire computer thrashing each time some idiot has a typo in his recursive algorithm. - They genuinely don't think you usually need preposterous levels of recursion. And if you do, then just change the limit. Python is for Real Computers (as opposed to Windows). There it's polite to make your apps (here, the Python interpreter) have some decency as to how much resources they take. The idea is that there are hundreds of other users, and dozens of daemons running critical apps, round the clock. If developers didn't know (by measurement or analysis) the reasonable stack size needed, there would be no mobile phones, no set-top boxes with hard disks, no GPS receivers, no digital PBX appliances, no solid state firewalls, no ADSL boxes, no programmable calculators...... Dammit, any pocket calculator that has parenthesis buttons needs a stack. And that stack space is predetermined by the manufacturer. 
I'm still maintaining that the stack space needs of regular applications are much less than people tend to think. Last time I checked, you ran out of stack on a name brand calculator with just ten levels of parentheses. That's not much. On the other hand, in real life, not too many people in the world have ever seen a calculator run out of stack space.And about your comments on stack size, seems regular Python has an in-built limit on recursion, at 1000 deep. That should be diametrically opposite your stance on stack size.Python has apparently set a maximum to have a guarantee it won't crash when given a certain minimum stack size. That's nice, but I'm not quite seeing how that's opposite to what I said in this discussion.
Apr 30 2009
Georg Wrede wrote:Andrei Alexandrescu wrote:I am sorry, I need to fix a few mistakes in this post and then I'll have to leave this conversation as it becomes ungainful for us both. The above is wrong, or at least contains a claim that you'd need to support better. Stack size requirements grow with the prevalence of short functions. Those are a style of programming not particularly correlated with templates. I don't have lots of small functions that call one another in templated code, any more than in regular code.Georg Wrede wrote:Could it be that you're looking at this too much through the eyes of a compiler developer? With souce code that can include templates, it should be no surprise that a programmer can write a file which results in preposterous stack needs. This is certainly the case with C++, for example.Having looked into things, it turns out I'm the one that now suspects you of jesting.I can't believe you brought it up as anything else but jesting (maybe you are and I'm not getting it).This is utter nonsense. Practical vs. theoretical has absolutely nothing to do with this all. Forget recursion. I'm talking about straight "if"s that create different program traces that in turn require different stack sizes. Collecting one and saying it's fine "for practical purposes" is just nonsense. What's so complicated about this?What I referred to was inferring a thread's actual stack requirements from one trace of its run.Of course you'd give different input data, a little like you give different input data to your unit tests.That's just untenable.Agreed, IF there's a possibility of recursion happening, the variation in input data is not guaranteed, etc. But for practical purposes (read: most of my programs, and presumably those of the other programmers'), the situation is different.For a particular application, we expect some explicit or implicit constarints on the input data. We also know (er, expect) whether (and to what extent) there will be recursion. Additionally (as with any real-life application), we accept that if the input data goes wild, the application /doesn't/ have to cope with it. (Making nuke proof apps where reasonably good is sufficient, is just gratuituous work.)Again, just to drive the point home - recursion has nothing to do with. Your inferences are starting from a faulty premise.For example, a robust application, instead of accepting and coping with *anything*, instead has *documented limitations*. IMHO, that is the right way to "fix infinity problems". Not demanding limitless stack for all kinds of programs.Threads already have separate stacks. (You might have said more.)I was actually talking about *two* things. If you remember, I was talking about measuring actually used stack depth. Then I got the idea of having separate stacks for threads. Two separate things in a post.Interesting, just you can't claim you were talking about that all along. I mean, come on! You say one thing that was untenable, and now you come up with a different one that is. And you were right all along?Maybe I'll have to dig up prior art, or something. Clearly I'm not qualified to properly defend the validity of this idea.There is prior art, by the truckload. A /very/ short slide show here: http://zope.stackless.com/StacklessEuroPy.ppt I'd kill to get that in D.(Now that I think more about it, I must have heard about stackless Python somewhere. But as I've seen others do, this time I, too, believed I invented something that I've actually seen somewhere.)Fortran was stackless for... 
for a long time :o). An activation record in Fortran is in static memory. To call a Fortran function, you deposit arguments in that static storage, issue the CALL, and wait.Sure, I'm not contending that. But that's a rather different claim than some others you made. Andrei
Apr 30 2009
Andrei Alexandrescu wrote:I am sorry, I need to fix a few mistakes in this post and then I'll have to leave this conversation as it becomes ungainful for us both.Sigh.
Apr 30 2009
Georg Wrede:And about your comments on stack size, seems regular Python has an in-built limit on recursion, at 1000 deep. That should be diametrically opposite your stance on stack size.See also sys.setrecursionlimit() to lift that limit where useful. You can also define how deep a traceback to show when an error occurs (to avoid flooding the shell, for example). Python is a high-level language, so naturally people expect it to be fit for recursive algorithms, because they are sometimes shorter and more natural (when they work on nested data structures). But the low stack limit (and the very low performance of function calls) often forces people to manually translate recursive algorithms into iterative ones. I hate this. Bye, bearophile
Apr 30 2009
bearophile wrote:Georg Wrede:Python seems a nice language. It's a shame that it's a bit slow. Heh, that's good for D.And about your comments on stack size, seems regular Python has an in-built limit on recursion, at 1000 deep. That should be diametrically opposite your stance on stack size.See also sys.setrecursionlimit() to lift that limit where useful. You can also define how much deep to show the when an error occurs (to avoid flooding the shell, for example). Python is a high-level language, so naturally people expect it to be fit for recursive algorithms, because they are sometimes shorter and more natural (when they work on nested data structures). But the low stack limit (and the very low performance of function calls) forces people to often manually translate recursive algorithms into iterative ones. I hate this.
Apr 30 2009
Georg Wrede wrote:Andrei Alexandrescu wrote:One thing I don't understand is how the OS can manage essentially unbounded stack growth in a multithreaded program with a flat address space. Is it simply that the physical memory for each stack is mapped into the same virtual address space when a context switch occurs? And if so, how does this work with multiple cores? The only alternative I can think of would be to dedicate a virtual address range to each stack, which would set a definite upper bound on the stack size for each thread, and wasted/unused memory between stacks.You can't have a stack smaller than 16KB as far as I understand. I seem to recall the default stack size is much bigger.Totally not having researched this... The page http://www.unix.com/high-level-programming/34632-how-find-out-stack-size-occupied-process.html states that "Most Linux systems have no stack limits set" (didn't have time to find a definitive reference, yet).
Apr 29 2009
Sean Kelly wrote:Georg Wrede wrote:Well, unbounded and unbounded. It just means there's no set value. The stack can grow as long as there is space. And the virtual address range is more than there's physical memory, so I guess they can say no limit.Andrei Alexandrescu wrote:One thing I don't understand is how the OS can manage essentially unbounded stack growth in a multithreaded program with a flat address space. Is it simply that the physical memory for each stack is mapped into the same virtual address space when a context switch occurs? And if so, how does this work with multiple cores? The only alternative I can think of would be to dedicate a virtual address range to each stack, which would set a definite upper bound on the stack size for each thread, and wasted/unused memory between stacks.You can't have a stack smaller than 16KB as far as I understand. I seem to recall the default stack size is much bigger.Totally not having researched this... The page http://www.unix.com/high-level-programming/34632-how-find-out-stack-size-occupied-process.html states that "Most Linux systems have no stack limits set" (didn't have time to find a definitive reference, yet).
Apr 29 2009
Andrei Alexandrescu wrote:Computing the stack depth even for non-recursive functions is an interprocedural analysis so it's difficult to do modularly. The stack is also unchecked so going with a "good enough" approximation is bound to be risky.No way in hell is there a reliable way for a compiler to estimate stack depth. Heck, just a virtual function call blows it. The stack, however, is checked. The virtual memory system puts a "guard" page following the end of the stack, so running off the end produces an exception. If there is unallocated space beyond that, the operating system will allocate more pages, then resume from the exception.
Apr 29 2009
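A hedged POSIX sketch of the guard-page arrangement Walter describes, for readers who want to see the mechanism spelled out: reserve a block, then revoke access to its lowest page so that running past the usable area faults instead of silently scribbling over whatever lies below. The page size and block size are illustrative, constant names such as MAP_ANON can differ slightly between platforms, and a real implementation would query the page size and check every return value:

import core.sys.posix.sys.mman;
import core.stdc.stdio : printf;

void main()
{
    enum pageSize  = 4096;            // assumed; query it at runtime in real code
    enum blockSize = 16 * pageSize;

    auto mem = mmap(null, blockSize, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANON, -1, 0);
    assert(mem != MAP_FAILED);

    // The lowest page becomes the guard: touching it raises a fault, which is
    // how an overflow of the usable area gets detected.
    mprotect(mem, pageSize, PROT_NONE);

    auto usable = cast(ubyte*) mem + pageSize;
    printf("usable stack area: %p .. %p\n", usable, usable + (blockSize - pageSize));

    munmap(mem, blockSize);
}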
Some more notes about Phobos2. Some of the things I say may be wrong because my experience with Phobos2 is limited still.

Daniel Keep:Not true. Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation.

I'll try to zip two ranges that return the leaves of two different binary trees. This can be a simple example to show to attract people to the D2 language :-)

So I've tried using zip:

import std.range: zip;
import std.stdio: writeln;
void main() {
    auto a = [1, 2, 3];
    string[] b = ["a", "b", "c"];
    foreach (xy; zip(a, b))
        writeln(xy.at!(0), " ", xy.at!(1));
}

That doesn't work:

...\dmd\src\phobos\std\range.d(1847): Error: template instance Zip!(int[3u],immutable(char)[][]) does not match template declaration Zip(R...) if (R.length && allSatisfy!(isInputRange,R))

probably because 'a' is a static array. Is isInputRange false for static arrays?

------------
The most basic and useful range is the one of integer numbers. How can I create with Phobos2 lazy and eager ranges like the following ones?

range(1, 5)
[1, 2, 3, 4]
range(5, 1, -1)
[5, 4, 3, 2]
list(xrange(5, 10))
[5, 6, 7, 8, 9]
list(xrange(5, 10, 2))
[5, 7, 9]

Similar ranges are useful with map, zip, and in many other situations. (I hope the x..y range syntax of D2 foreach will evolve into a syntax that can be used everywhere an integral range can be used.)

------------
The docs say about "map":Multiple functions can be passed to map. In that case, the element type of map is a tuple containing one element for each function.< [...] foreach (e; map!("a + a", "a * a")(arr1))

Having the possibility to map two functions in parallel may be useful in some situations (where for example computing the items is costly), but it can be even more useful to map a function that takes two or more arguments. An example:

map(lambda x, y: x*y, ["a", "b", "c"], xrange(1, 4))
['a', 'bb', 'ccc']

If you don't have that you are forced to use a zip inside the map, but then you are also forced to change the D2 mapping function to something like: (p){ return p.at!(0) * p.at!(1); }

----------
all() and any() functions are useful to test whether all or any items of an iterable are true (with an optional mapping function too). They are useful as templates too, so I suggest renaming allSatisfy as "All" and "anySatisfy" as "Any". A similar template named "Map" can be created, if not already present, to map a given template to a typetuple.

In future I'll probably write more notes, suggestions and questions about Phobos2. I hope such notes are very useful.

Bye,
bearophile
Apr 28 2009
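For the record, slicing the static array is enough to hand zip a proper range, and in current Phobos the tuple elements are indexed with [0]/[1] rather than at!(). A hedged version of the example above that compiles against a present-day compiler:

import std.range : zip;
import std.stdio : writeln;

void main()
{
    int[3] a = [1, 2, 3];            // explicitly static, as in the failing case
    string[] b = ["a", "b", "c"];

    // a[] is a dynamic slice over the static array's storage, and slices
    // satisfy the range interface.
    foreach (xy; zip(a[], b))
        writeln(xy[0], " ", xy[1]);
}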
== Quote from bearophile (bearophileHUGS lycos.com)'s articleSome more notes about Phobos2. Some of the things I say may be wrong because myexperience with Phobos2 is limited still.Daniel Keep:This can be a simple example to show to attract people to D2 language :-)Not true. Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation.I'll try to zip two ranges that return the leaves of two different binary trees.So I've tried using zip: import std.range: zip; import std.stdio: writeln; void main() { auto a = [1, 2, 3]; string[] b = ["a", "b", "c"]; foreach (xy; zip(a, b)) writeln(xy.at!(0), " ", xy.at!(1)); } That doesn't work: ...\dmd\src\phobos\std\range.d(1847): Error: template instanceZip!(int[3u],immutable(char)[][]) does not match template declaration Zip(R...) if (R.length && allSatisfy!(isInputRange,R))probably because 'a' is a static array. Is isInputRange false for static arrays?Yes, because the range interface for arrays relies on std.array being imported and using extension method syntax. This has bitten me a bunch of times, too. The functions in std.array take dynamic arrays and static arrays are not implicitly convertible to dynamic arrays, except in the specific case of binding an array literal to a variable. Whether they should be, I don't know.------------ The most basic and useful range is the one of integer numbers. How can I createwith Phobos2 lazy and eager ranges like the following ones?used everywhere an integral range can be used). I think you're looking for std.range.iota, although it doesn't work so well right now because of bug 2871.[1, 2, 3, 4]range(1, 5)[5, 4, 3, 2]range(5, 1, -1)[5, 6, 7, 8, 9]list(xrange(5, 10))[5, 7, 9] Similar ranges are useful with map,zip, and in many other situations. (I hope the x..y range syntax of D2 foreach will evolve in a syntax that can belist(xrange(5, 10, 2))
Apr 28 2009
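As a short illustration of std.range.iota standing in for the Python calls quoted above, here is how it behaves in current Phobos, with std.array.array doing the eager conversion:

import std.array : array;
import std.range : iota;
import std.stdio : writeln;

void main()
{
    writeln(iota(1, 5).array);       // [1, 2, 3, 4]
    writeln(iota(5, 1, -1).array);   // [5, 4, 3, 2]
    writeln(iota(5, 10).array);      // [5, 6, 7, 8, 9]
    writeln(iota(5, 10, 2).array);   // [5, 7, 9]
}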
dsimcha:Whether they should be, I don't know.<In my D1 dlibs most things digest static arrays too (but they don't let you iterate on them or return them, that's a limit of the language). I don't know D2 enough yet to give you a better answer, but I think it's better to remove similar limits from the language. zip-ing two static arrays has to be a legit operation.I think you're looking for std.range.iota,<Yes, sorry, I found it a few seconds after sending the post. I am dumb.although it doesn't work so well right now because of bug 2871.<And I have found the bug, of course... :-) I can also see Denis Koroskin has suggested a small fix: http://d.puremagic.com/issues/show_bug.cgi?id=2871 Regarding iota: it's better for the third argument of iota to default to 1, because most of the time you want a step = 1. In Python range() goes one step further: if you give just one argument it's meant to be the second one and the first defaults to zero. In practice this causes no problems because in Python you use range()/xrange() all the time. In D I may accept iota having 2-3 arguments because it will probably be used much less often. Bye and thank you, bearophile
Apr 28 2009
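As it happens, current iota already has the defaults asked for here: a single argument counts from zero, and the two-argument form implies a step of 1. A small hedged illustration:

import std.algorithm : equal;
import std.range : iota;

void main()
{
    assert(iota(5).equal([0, 1, 2, 3, 4]));      // one argument: starts at 0
    assert(iota(1, 5).equal([1, 2, 3, 4]));      // two arguments: step of 1
    assert(iota(5, 10, 2).equal([5, 7, 9]));     // explicit step
}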
dsimcha Wrote:So I tried to create a struct to allow iteration over the keys or values of a builtin associative array using a lazy forward range. This would allow one to pass the keys or values of an AA to any function that expects a regular old forward range w/o any heap activity, eager copying, etc. and with O(1) auxiliary memory usage. Now that ranges are the lingua franca of iteration in D, this seems like a pretty necessary feature. The problem is that D's current AAs are based on a hash table with collision resolution done using binary trees. I guess this decision was made to allow for O(log N) worst case performance even when there are ridiculous amounts of hash collisions. However, this is a very bad decision because, in exchange for more bulletproof performance in corner cases, you lose: 1. Every element is now (void*).sizeof bytes larger because you have to store a left and right pointer. 2. You can't iterate over a binary tree in O(1) auxiliary space, at least not in any way I'm aware of. This means that a lazy range to iterate over the keys or values is relatively useless because it would generate heap activity anyhow to store the stack or queue necessary to iterate over the tree, so you may as well just use the current .keys or .values properties, which just return an array. (Because of the way that ranges are designed, you can't just use the call stack, unless you use fibers or something, which is obviously not a win from an efficiency point of view.) Both of these are important in the more common, general case of sane hashing performance.Ranges, ranges! That's all I hear these days, and it looks to me like the continuing advance of D toward being a complete meta-language. Where do I see ranges described in terms that an old hand can understand? I'm constantly having to roll my own in many areas when I see how the meta stuff is implemented - like x ~= c to add a character to the end of an array - reallocation every time? I thought D was supposed to be a systems programming language, not something that was guaranteed to win the universe obfuscated code competition. I've been trying to keep my projects up to date and compilable with D2.0xx, but I think I'm going to give up on that and rewrite them for whatever the current version of D1 is. I seriously think that the crew who are driving the development should realize that not all computer programmers are going to have an IQ of 140, and that any practical language should be comprehensible to a majority of its users. Maybe the problem I'm complaining about is just a lack of documentation. Generating said from comments really does not hack it - comments are always skimped, and usually lie. Before something like Ranges are implemented, there should be some sort of RFC process where they are properly described rather than a reliance on D users to have read every thread of the newsgroup, and remembered it all.
Apr 26 2009
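On the x ~= c worry: in today's runtime an append first asks the GC whether the block has spare capacity, so it does not reallocate on every append, and for append-heavy loops std.array.appender keeps its own capacity and sidesteps those lookups entirely. A minimal sketch against current Phobos:

import std.array : appender;

void main()
{
    auto buf = appender!string();
    foreach (c; "hello, world")      // append one character at a time
        buf.put(c);
    assert(buf.data == "hello, world");
}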
== Quote from Steve Teale (steve.teale britseyeview.com)'s articledsimcha Wrote:continuing advance of D toward being a complete meta-language.So I tried to create a struct to allow iteration over the keys or values of a builtin associative array using a lazy forward range. This would allow one to pass the keys or values of an AA to any function that expects a regular old forward range w/o any heap activity, eager copying, etc. and with O(1) auxiliary memory usage. Now that ranges are the lingua franca of iteration in D, this seems like a pretty necessary feature. The problem is that D's current AAs are based on a hash table with collision resolution done using binary trees. I guess this decision was made to allow for O(log N) worst case performance even when there are ridiculous amounts of hash collisions. However, this is a very bad decision because, in exchange for more bulletproof performance in corner cases, you lose: 1. Every element is now (void*).sizeof bytes larger because you have to store a left and right pointer. 2. You can't iterate over a binary tree in O(1) auxiliary space, at least not in any way I'm aware of. This means that a lazy range to iterate over the keys or values is relatively useless because it would generate heap activity anyhow to store the stack or queue necessary to iterate over the tree, so you may as well just use the current .keys or .values properties, which just return an array. (Because of the way that ranges are designed, you can't just use the call stack, unless you use fibers or something, which is obviously not a win from an efficiency point of view.) Both of these are important in the more common, general case of sane hashing performance.Ranges, ranges! That's all I hear these days, and it looks to me like theWhere do I see ranges described in terms that an old hand can understand? I'm constantly having to roll my own in many areas when I see how the meta stuffis implemented - like x ~= c to add a character to the end of an array - reallocation every time?I thought D was supposed to be a systems programming language, not somethingthat was guaranteed to win the universe obfuscated code competition.I've been trying to keep my projects up to date and compilable with D2.0xx, butI think I'm going to give up on that and rewrite them for whatever the current version of D1 is.I seriously think that the crew who are driving the development should realizethat not all computer programmers are going to have an IQ of 140, and that any practical language should be comprehensible to a majority of its users.Maybe the problem I'm complaining about is just a lack of documentation.Generating said from comments really does not hack it - comments are always skimped, and usually lie.Before something like Ranges are implemented, there should be some sort of RFCprocess where they are properly described rather than a reliance on D users to have read every thread of the newsgroup, and remembered it all. But there was an RFC, and it was even called that. It just happened to take place on this newsgroup. Also, keep in mind that D2 is still in alpha. Worrying about how to explain this stuff to people who aren't heavily involved with D at this point would slow down the pace of evolution and generate more confusion. The time to do this is when the dust has settled a little.
Apr 26 2009
Steve Teale, on 26 April at 14:39 you wrote to me:Before something like Ranges are implemented, there should be some sort of RFC process where they are properly described rather than a reliance on D users to have read every thread of the newsgroup, and remembered it all.A PEP[1]-like process would be great. [1] http://www.python.org/dev/peps/ -- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/ ---------------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------------- <o_O> parakenotengobarraespaciadora <o_O> aver <o_O> estoyarreglandolabarraporkeserompiounapatita
Apr 26 2009
Steve Teale wrote:Before something like Ranges are implemented, there should be some sort of RFC process where they are properly described rather than a reliance on D users to have read every thread of the newsgroup, and remembered it all.There was. Incidentally, it was called "RFC on range design for D2". I understand your frustration, but if you think for a minute, you realize your comments are uncalled for. We have discussed ranges at length and the community has had a long time to establish semantics and even names. I have (early, often, and repeatedly) warned the community that there will be a Phobos update that is bound to break a lot of code. I actively tried to massage several breaking changes into one single release so as to sweeten the pill. That all took a lot of time and thought from several people (Walter, Sean, and myself) who have better things to do. Now it would be great if you put yourself in our shoes, read your comments again, and tell me how they reflect on the writer. You mention you want to see ranges described in terms that an old hand can understand. That's great. But that one good request was marred by comments that show you are more interested in reaffirming your prejudice, than in overcoming it. Andrei
Apr 26 2009
Steve Teale wrote:Ranges, ranges! That's all I hear these days, and it looks to me like the continuing advance of D toward being a complete meta-language. Where do I see ranges described in terms that an old hand can understand? I'm constantly having to roll my own in many areas when I see how the meta stuff is implemented - like x ~= c to add a character to the end of an array - reallocation every time? I thought D was supposed to be a systems programming language, not something that was guaranteed to win the universe obfuscated code competition. I've been trying to keep my projects up to date and compilable with D2.0xx, but I think I'm going to give up on that and rewrite them for whatever the current version of D1 is. I seriously think that the crew who are driving the development should realize that not all computer programmers are going to have an IQ of 140, and that any practical language should be comprehensible to a majority of its users.It will, be patient. See below.Maybe the problem I'm complaining about is just a lack of documentation. Generating said from comments really does not hack it - comments are always skimped, and usually lie. Before something like Ranges are implemented, there should be some sort of RFC process where they are properly described rather than a reliance on D users to have read every thread of the newsgroup, and remembered it all.There's an illusion. And that illusion comes from the D newsgroups having "wrong" names. The D2 newsgroup should have a name like "D2 discussion -- for D language development folks, enter at your own risk". And a *D1* newsgroup should then be for anybody who actually uses the language for something. Currently, actually, the D.learn newsgroup has sort-of assumed this functionality. Being as it is (a de facto D2 development newsgroup), d.D will contain legthy discussions that meander and roam, possibly for months, before something crystallises, and the issue is settled (at least for the time). A lack of proper documentation belongs to this setup. (But then, one really has to congratulate Andrei, Phobos docs have *never* been as good as they are today!!!) About ranges: I suspect that once D2 is solidified (and we start working on D3 :-) ), ranges will be easy to use, easy to understand, and practical. Incidentally, that same complaint could have been said about templates. It took more than a year to get the discussion settled down a bit, and today templates seem not impossible at all to begin using (unless you want to begin right with the hairy stuff). And earlier, it took two months of intense discussion here before Unicode, UTF and their relation to char, wchar and dchar got generally understood. Today you can program in D without much thinking about UTF, and things "just work". Even abroad. But to sum it all, any project whose purpose is anything else than learning D2, should be written in D1.
Apr 27 2009
On Mon, 27 Apr 2009 11:32:30 +0300, Georg Wrede wrote:Phobos docs have *never* been as good as they are today!!!Really?! I still find them very confusing. Maybe it's just the formatting, but I often find it so hard to understand or find what I'm after that I refer to the source code to work it out.About ranges: I suspect that once D2 is solidified (and we start working on D3 :-) ), ranges will be easy to use, easy to understand, and practical. Incidentally, that same complaint could have been said about templates. It took more than a year to get the discussion settled down a bit, and today templates seem not impossible at all to begin using (unless you want to begin right with the hairy stuff).It must be me, but I still find D2 templates as clear as mud. The syntax is butt-ugly, counter-intuitive, and reader-hostile, IMHO. I'm sure that C++ is worse, but that is not the issue. I'll have to wait until somebody writes the "D Templates for Dummies" book. -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
Apr 27 2009
Derek Parnell wrote:On Mon, 27 Apr 2009 11:32:30 +0300, Georg Wrede wrote:Suggestions are always welcome. AndreiPhobos docs have *never* been as good as they are today!!!Really?! I still find them very confusing. Maybe its just the formatting but I often find it so hard to understand or find what I'm after that I refer to the source code to work it out.
Apr 27 2009
Derek Parnell:[...] I still find D2 templates as clear as mud. The syntax is butt-ugly, counter intuitive, and reader-hostile, IMHO.I find templates in D1 easy to use in most situations. And template constraints of D2 too are easy. What kind of syntax/semantics do you suggest to improve the current situation? If you have ideas I am interested. Bye, bearophile
Apr 27 2009
Georg Wrede wrote:There's an illusion. And that illusion comes from the D newsgroups having "wrong" names. The D2 newsgroup should have a name like "D2 discussion -- for D language development folks, enter at your own risk". And a *D1* newsgroup should then be for anybody who actually uses the language for something. Currently, actually, the D.learn newsgroup has sort-of assumed this functionality.Yes, well put. I think it would be great to define a digitalmars.d2 or digitalmars.d2-design newsgroup. Andrei
Apr 27 2009
Andrei Alexandrescu wrote:Georg Wrede wrote:And what would we do the day D987 came out? Start another group called digitalmars.d987-design? Or just create one called digitalmars.d.design, and use that no matter the version of D being discussed? -- SimenThere's an illusion. And that illusion comes from the D newsgroups having "wrong" names. The D2 newsgroup should have a name like "D2 discussion -- for D language development folks, enter at your own risk". And a *D1* newsgroup should then be for anybody who actually uses the language for something. Currently, actually, the D.learn newsgroup has sort-of assumed this functionality.Yes, well put. I think it would be great to define a digitalmars.d2 or digitalmars.d2-design newsgroup.
Apr 27 2009
Simen Kjaeraas wrote:Andrei Alexandrescu wrote:How about digitalmars.dnext?Georg Wrede wrote:And what would we do the day D987 came out? Start another group called digitalmars.d987-design? Or just create one called digitalmars.d.design, and use that no matter the version of D being discussed? -- SimenThere's an illusion. And that illusion comes from the D newsgroups having "wrong" names. The D2 newsgroup should have a name like "D2 discussion -- for D language development folks, enter at your own risk". And a *D1* newsgroup should then be for anybody who actually uses the language for something. Currently, actually, the D.learn newsgroup has sort-of assumed this functionality.Yes, well put. I think it would be great to define a digitalmars.d2 or digitalmars.d2-design newsgroup.
Apr 27 2009
On Tue, 28 Apr 2009 01:38:27 +0400, Christopher Wright <dhasenan gmail.com> wrote:Simen Kjaeraas wrote:Maybe digitalmars.future? :)Andrei Alexandrescu wrote:How about digitalmars.dnext?Georg Wrede wrote:And what would we do the day D987 came out? Start another group called digitalmars.d987-design? Or just create one called digitalmars.d.design, and use that no matter the version of D being discussed? -- SimenThere's an illusion. And that illusion comes from the D newsgroups having "wrong" names. The D2 newsgroup should have a name like "D2 discussion -- for D language development folks, enter at your own risk". And a *D1* newsgroup should then be for anybody who actually uses the language for something. Currently, actually, the D.learn newsgroup has sort-of assumed this functionality.Yes, well put. I think it would be great to define a digitalmars.d2 or digitalmars.d2-design newsgroup.
Apr 27 2009