digitalmars.D - Performance penalty for using ranges
- Paul Jurczak (49/49) Aug 25 2013 I wrote a few versions of a palindrome testing function and
- bearophile (68/73) Aug 25 2013 I have modified and improved and fixed your code a little:
- Paul Jurczak (37/60) Aug 25 2013 [..]
- Joseph Rushton Wakeling (6/7) Aug 25 2013 It's never used, so notating it like this means you'll never clash with ...
- monarch_dodra (10/17) Aug 25 2013 It never clashes untill you nest two foreach, and then you have
- Joseph Rushton Wakeling (2/9) Aug 25 2013 Good call. :-)
- Andrei Alexandrescu (5/18) Aug 25 2013 I don't see why the excitement around anonymous bindings. It's a rare
- bearophile (19/22) Aug 25 2013 It's not a rare case, I can test this searching in my code for
- bearophile (4/6) Aug 25 2013 Beside my code, if you want I can show 16 URLs to use cases of
- Joseph Rushton Wakeling (4/7) Aug 25 2013 Every little helps. :-)
- monarch_dodra (9/30) Aug 26 2013 I think "the excitement" is an overstatement. The subject comes
- Paul Jurczak (59/62) Aug 25 2013 [..]
- bearophile (34/61) Aug 25 2013 I have used _ as variable iteration name to remind the person
- Joseph Rushton Wakeling (20/24) Aug 25 2013 It's slightly annoying that one can't readily get immutability to play n...
- bearophile (15/25) Aug 25 2013 Thankfully foreach_reverse was not deprecated:
- monarch_dodra (2/12) Aug 25 2013 If somebody were to implement this, I would *love* to review it.
- H. S. Teoh (14/28) Aug 25 2013 If somebody were to attempt to implement this, I'd recommend taking this
- Joseph Rushton Wakeling (4/5) Aug 25 2013 Really? I thought it was viewed these days as some kind of awful abomin...
- Timon Gehr (2/5) Aug 25 2013 Why would that be the case?
- bearophile (13/17) Aug 25 2013 In the real world, where we live, there is a thing named
- Timon Gehr (2/8) Aug 25 2013 This is uncalled for.
- Joseph Rushton Wakeling (6/7) Aug 25 2013 I'm sure you didn't mean it, but your original remark, "Why would that b...
- Timon Gehr (5/10) Aug 25 2013 It was a genuine question. I assumed that the inliner actually works.
- bearophile (7/8) Aug 25 2013 I didn't mean to offend, sorry for the unneeded (light) sarcasm.
- Timon Gehr (3/7) Aug 25 2013 Well, this kind of thing does not propagate too well over the Internet.
- Joseph Rushton Wakeling (8/9) Aug 25 2013 I don't know why it _need_ be the case, but it _is_ the case in practice...
I wrote a few versions of a palindrome testing function and noticed that versions using ranges (isPalindrome1..isPalindrome3) are 3 times slower than array indexing one (isPalindrome0). Is there a more efficient way of using ranges? Timing was done on Windows with DMD 2.063.2 (32-bit) dmd -O -noboundscheck -inline -release. // 6.9ms bool isPalindrome0(T)(T[] s) { auto i = 0; for (; i < s.length/2 && s[i] == s[$-i-1]; ++i) {} return i == s.length/2; } // 21ms bool isPalindrome1(T)(T[] s) { while (s.length > 1 && s.front == s.back) { s.popBack(); s.popFront(); } return s.length <= 1; } // 21ms bool isPalindrome2(T)(T[] s) { for (; s.length > 1 && s.front == s.back; s.popBack(), s.popFront()) {} return s.length <= 1; } // 21.4ms bool isPalindrome3(T)(T s) { foreach (i; 0..s.length/2) if (s[i] != s[$-i-1]) return false; return true; } int test() { int a[10]; int n = 0; foreach (i; 0..1_000_000) { a[4] = !a[4]; n += isPalindrome0(a); } return n; } void main() { enum N = 16; auto t = benchmark!(test)(N); writeln(cast(float)t[0].msecs/N); writeln(test); }
Aug 25 2013
Paul Jurczak:I wrote a few versions of a palindrome testing function and noticed that versions using ranges (isPalindrome1..isPalindrome3) are 3 times slower than array indexing one (isPalindrome0). Is there a more efficient way of using ranges?I have modified and improved and fixed your code a little: import std.stdio, std.datetime, std.array, std.typetuple, std.range, std.algorithm; bool isPalindrome0(T)(in T[] s) pure nothrow { size_t i = 0; for (; i < s.length / 2 && s[i] == s[$ - i - 1]; ++i) {} return i == s.length / 2; } bool isPalindrome1(T)(T[] s) pure nothrow { while (s.length > 1 && s.front == s.back) { s.popBack; s.popFront; } return s.length <= 1; } bool isPalindrome2(T)(T[] s) pure nothrow { for (; s.length > 1 && s.front == s.back; s.popBack, s.popFront) {} return s.length <= 1; } bool isPalindrome3(T)(in T[] s) pure nothrow { foreach (immutable i; 0 .. s.length / 2) if (s[i] != s[$ - i - 1]) return false; return true; } /// A high-level version. bool isPalindrome4(T)(in T[] s) /*pure nothrow*/ { return s.retro.equal(s); } int test(alias F)(in int nLoops) /*pure nothrow*/ { int[10] a; typeof(return) n = 0; foreach (immutable _; 0 .. nLoops) { a[4] = !a[4]; n += F(a); } return n; } void main() { enum size_t nLoops = 60_000_000; StopWatch sw; foreach (alg; TypeTuple!(isPalindrome0, isPalindrome1, isPalindrome2, isPalindrome3, isPalindrome4)) { sw.reset; sw.start; immutable n = test!alg(nLoops); sw.stop; writeln(n, " ", sw.peek.msecs); } } Using a compiler with a better optimizing back-end (especially with a better inlining), the timings show that the faster versions are the second and third, but all of them have a very similar performance (the last one reads the whole array twice, so it's a little slower, as expected): ...>ldmd2 -O -release -inline -noboundscheck -run test2.d 30000000 1150 30000000 1073 30000000 1038 30000000 1215 30000000 1621 Someday a function like isPalindrome4 will be pure & nothrow. Bye, bearophile
Aug 25 2013
On Sunday, 25 August 2013 at 14:54:04 UTC, bearophile wrote:I have modified and improved and fixed your code a little:[..] Thank you for your analysis. I've run it on Linux with DMD64 2.063.2: dmd -O -noboundscheck -inline -release and relative timings are different than on Windows: 500000 7160 500000 18588 500000 18592 500000 14466 500000 28368 I'm also running it with LDC, but it reports timings too good to be true - something meaningful is getting optimized away and I'm trying to find out why. I have a few newbie questions below:int test(alias F)(in int nLoops) /*pure nothrow*/ { int[10] a; typeof(return) n = 0; foreach (immutable _; 0 .. nLoops) { a[4] = !a[4]; n += F(a); } return n; }What is the purpose of "immutable _;" above? Why not "i;"?void main() { enum size_t nLoops = 60_000_000; StopWatch sw; foreach (alg; TypeTuple!(isPalindrome0, isPalindrome1, isPalindrome2, isPalindrome3, isPalindrome4)) { sw.reset; sw.start; immutable n = test!alg(nLoops); sw.stop; writeln(n, " ", sw.peek.msecs); } }Same question here: why immutable? I wanted to get more stable run-to-run results, so I'm looking for minimums: void main() { enum N = 100; enum M = 1_000_000; StopWatch sw; foreach (alg; TypeTuple!(isPalindrome0, isPalindrome1, isPalindrome2, isPalindrome3, isPalindrome4)) { int [N] results; long[N] timings; foreach (i; 0..N) { sw.reset; sw.start; results[i] = test!alg(M); sw.stop; timings[i] = sw.peek.usecs; } writeln(results.reduce!min, " ", timings.reduce!min); } }
Aug 25 2013
On 25/08/13 20:07, Paul Jurczak wrote:What is the purpose of "immutable _;" above? Why not "i;"?It's never used, so notating it like this means you'll never clash with another variable name by accident (e.g. if somewhere else in the function you declare an int i). Had never thought of that before but it's a handy trick, I'll have to make use of it in my own code ...
Aug 25 2013
On Sunday, 25 August 2013 at 18:13:39 UTC, Joseph Rushton Wakeling wrote:On 25/08/13 20:07, Paul Jurczak wrote:It never clashes untill you nest two foreach, and then you have to use __ ... foreach( _ ; 0 .. M) foreach( __ ; 0 .. N) ... I have an enhancement request to simply allow anonymous iteration: foreach( ; 0 .. N) http://d.puremagic.com/issues/show_bug.cgi?id=9009What is the purpose of "immutable _;" above? Why not "i;"?It's never used, so notating it like this means you'll never clash with another variable name by accident (e.g. if somewhere else in the function you declare an int i). Had never thought of that before but it's a handy trick, I'll have to make use of it in my own code ...
Aug 25 2013
On 25/08/13 21:10, monarch_dodra wrote:It never clashes untill you nest two foreach, and then you have to use __ ... foreach( _ ; 0 .. M) foreach( __ ; 0 .. N) ... I have an enhancement request to simply allow anonymous iteration: foreach( ; 0 .. N) http://d.puremagic.com/issues/show_bug.cgi?id=9009Good call. :-)
Aug 25 2013
On 8/25/13 12:46 PM, Joseph Rushton Wakeling wrote:On 25/08/13 21:10, monarch_dodra wrote:I don't see why the excitement around anonymous bindings. It's a rare case and it's not like we're running out of symbols, particularly given they're by definition written only once :o). AndreiIt never clashes untill you nest two foreach, and then you have to use __ ... foreach( _ ; 0 .. M) foreach( __ ; 0 .. N) ... I have an enhancement request to simply allow anonymous iteration: foreach( ; 0 .. N) http://d.puremagic.com/issues/show_bug.cgi?id=9009Good call. :-)
Aug 25 2013
Andrei Alexandrescu:I don't see why the excitement around anonymous bindings. It's a rare case and it's not like we're running out of symbols, particularly given they're by definition written only once :o).It's not a rare case, I can test this searching in my code for "foreach (_;". And while not essential, anonymous loops are useful because adding an useless variable to your code increases its complexity a little without giving something back. It's just like adding an useless variable inside one of your functions. In C/C++ I look for the "unused variable" warnings (a warning that is currently missing in D). When you define a function, and you don't want to use its second argument, D allows you to not give it a name, despite you are not running out of symbols: void foo(int x, int) {} When I see code with a loop with a variable, to understand the loop I first of all search how such variable is used inside the loop. If the loop variable is anonymous this tells me something important about the loop, and reduces the noise. Bye, bearophile
Aug 25 2013
It's not a rare case, I can test this searching in my code for "foreach (_;".Beside my code, if you want I can show 16 URLs to use cases of "anonymous iteration" in D code. Bye, bearophile
Aug 25 2013
On 25/08/13 23:17, Andrei Alexandrescu wrote:I don't see why the excitement around anonymous bindings. It's a rare case and it's not like we're running out of symbols, particularly given they're by definition written only once :o).Every little helps. :-) (Don't know if you get the cultural reference on that, it probably helps to be British...:-)
Aug 25 2013
On Sunday, 25 August 2013 at 21:17:43 UTC, Andrei Alexandrescu wrote:On 8/25/13 12:46 PM, Joseph Rushton Wakeling wrote:I think "the excitement" is an overstatement. The subject comes up every now and then. The "foreach(_)" semantic is getting a bit more common, but it tends to confuse newbies, that ask about it. At that point, I simply point out that there is this ER, and people tend to agree it would be a nice addition. I agree it's not like its a real problem or anything, but it would be nice to have. Is all.On 25/08/13 21:10, monarch_dodra wrote:I don't see why the excitement around anonymous bindings. It's a rare case and it's not like we're running out of symbols, particularly given they're by definition written only once :o). AndreiIt never clashes untill you nest two foreach, and then you have to use __ ... foreach( _ ; 0 .. M) foreach( __ ; 0 .. N) ... I have an enhancement request to simply allow anonymous iteration: foreach( ; 0 .. N) http://d.puremagic.com/issues/show_bug.cgi?id=9009Good call. :-)
Aug 26 2013
On Sunday, 25 August 2013 at 18:07:33 UTC, Paul Jurczak wrote: [..]I'm also running it with LDC, but it reports timings too good to be true - something meaningful is getting optimized away and I'm trying to find out why.[..] It seems that LDC optimizer is smarter than I expected and it was able to optimize away comparison on constant elements of array a in my test: int test(alias F)(int N) { int a[10]; int n = 0; foreach (immutable _; 0..N) { a[4] = !a[4]; n += F(a); } return n; } I modified it, so there is no constant elements and all cases of pairwise comparison are exercised: int test(alias F)(int N) { enum L = 10; int a[L]; int n = 0; foreach (int i; 0..N) { auto j = (i%L + L/2)%L; a[j] = !a[j]; n += F(a); } return n; } Here are the results: Xubuntu 13.04 64-bit Core i5 3450S 2.8GHz (3.5GHz turbo): LDC 0.11.0: ldmd2 -m64 -O -noboundscheck -inline -release 100000 5284 100000 4265 100000 4268 100000 5717 100000 4265 GDC 4.8.1: gdc -m64 -march=native -fno-bounds-check -frename-registers -frelease -O3 100000 4612 100000 5717 100000 5718 100000 4984 100000 11546 DMD64 2.063.2: dmd -O -noboundscheck -inline -release 100000 8350 100000 14615 100000 14597 100000 14270 100000 18509 Windows 7 64-bit Core i5 2500 3.4GHz turbo: DMD 2.063.2 (32-bit) dmd -O -noboundscheck -inline -release 100000 9803 100000 17050 100000 17031 100000 20851 100000 20283 I'm stunned by LDC ability to optimize isPalindrome4 to the best performance level of the whole group. Other compilers placed it at the bottom of performance rank.
Aug 25 2013
Paul Jurczak:I have used _ as variable iteration name to remind the person that is reading the code that loop variable name is not used inside the loop itself. When you have more of those variables you can number them as _1, _2, or use no more than two underscores __. And there the usage of immutable is to disallow the mutation the iteration variable inside the loop. In my opinion, to keep code clean and less bug-prone D should on make foreach indexes immutable on default. This saves you from bugs when you iterate on struct items: struct Foo { int x; } auto foos = [Foo(10), Foo(20)]; foreach (f; foos) f.x++; It's not uncommon in those cases to forget a "ref" and think foos you from this class of bugs. I have asked this in past, but D lacks a "mutable" keyword for the situations when you want to actually mutate the foreach variable, and generally Walter didn't sound very interested on this. So as workaround for this language design problem, it's a good habit to always put a const/immutable on foreach variables, unless you don't wait it, and unless something forces you to not use a const.int test(alias F)(in int nLoops) /*pure nothrow*/ { int[10] a; typeof(return) n = 0; foreach (immutable _; 0 .. nLoops) { a[4] = !a[4]; n += F(a); } return n; }What is the purpose of "immutable _;" above? Why not "i;"?It's explained here: http://blog.knatten.org/2011/11/11/disempower-every-variable/ In short all your variables/function arguments should be tagged as const/immutable unless the semantics of your algorithm needs them to mutate, or unless some limit in D/Phobos forbids you to (like when you have created a lazy range, currently you can't assign it to const and then use it). Bye, bearophilevoid main() { enum size_t nLoops = 60_000_000; StopWatch sw; foreach (alg; TypeTuple!(isPalindrome0, isPalindrome1, isPalindrome2, isPalindrome3, isPalindrome4)) { sw.reset; sw.start; immutable n = test!alg(nLoops); sw.stop; writeln(n, " ", sw.peek.msecs); } }Same question here: why immutable?
Aug 25 2013
On 25/08/13 22:22, bearophile wrote:In short all your variables/function arguments should be tagged as const/immutable unless the semantics of your algorithm needs them to mutate, or unless some limit in D/Phobos forbids you to (like when you have created a lazy range, currently you can't assign it to const and then use it).It's slightly annoying that one can't readily get immutability to play nice with more general iterations than i .. j. For example if you consider the loop, for (i = 10; i > 0; --i) { ... } You can't make i immutable for obvious reasons but it's not clear to me how you can get a performant version of that with immutable i. You could do something like [*], foreach (immutable i; iota(1, 11).retro) { ... } ... but that will be slow and inefficient. Ditto for cases like for (i = 0; i < n; i += 2) { ... } which you could write as foreach (immutable i; iota(0, n, 2)) { ... } ... but again, that will be slow compared to the for loop or a foreach across an interval. [* Hijacking of discussion: a while back I think I floated the idea of generalizing iota() with closed/open boundary conditions similar to those found in std.random.uniform; so e.g. you could do iota!"[]"(0, 10) and the upper bound would be included in the values returned. Would be useful for cases like these.]
Aug 25 2013
Joseph Rushton Wakeling:It's slightly annoying that one can't readily get immutability to play nice with more general iterations than i .. j. For example if you consider the loop, for (i = 10; i > 0; --i) { ... }Thankfully foreach_reverse was not deprecated: void main() { import std.stdio; for (auto i = 10; i > 0; --i) write(i, " "); writeln; foreach_reverse (immutable i; 1 .. 11) write(i, " "); writeln; }[* Hijacking of discussion: a while back I think I floated the idea of generalizing iota() with closed/open boundary conditions similar to those found in std.random.uniform; so e.g. you could do iota!"[]"(0, 10) and the upper bound would be included in the values returned. Would be useful for cases like these.]Yes, it's a kind of necessary enhancement: http://d.puremagic.com/issues/show_bug.cgi?id=10466 Bye, bearophile
Aug 25 2013
On Sunday, 25 August 2013 at 21:01:54 UTC, bearophile wrote:If somebody were to implement this, I would *love* to review it.[* Hijacking of discussion: a while back I think I floated the idea of generalizing iota() with closed/open boundary conditions similar to those found in std.random.uniform; so e.g. you could do iota!"[]"(0, 10) and the upper bound would be included in the values returned. Would be useful for cases like these.]Yes, it's a kind of necessary enhancement: http://d.puremagic.com/issues/show_bug.cgi?id=10466 Bye, bearophile
Aug 25 2013
On Sun, Aug 25, 2013 at 11:10:39PM +0200, monarch_dodra wrote:On Sunday, 25 August 2013 at 21:01:54 UTC, bearophile wrote:If somebody were to attempt to implement this, I'd recommend taking this issue into account as well: http://d.puremagic.com/issues/show_bug.cgi?id=10762 (which is a generalization of: http://d.puremagic.com/issues/show_bug.cgi?id=6447) I don't know if the total amount of changes would warrant rewriting iota entirely in order to keep the complexity of the implementation minimal. (The nice thing about D unittests is that rewriting code is much less scary because if the unittests are complete enough, they will catch any regressions right off the bat.) T -- WINDOWS = Will Install Needless Data On Whole System -- CompuManIf somebody were to implement this, I would *love* to review it.[* Hijacking of discussion: a while back I think I floated the idea of generalizing iota() with closed/open boundary conditions similar to those found in std.random.uniform; so e.g. you could do iota!"[]"(0, 10) and the upper bound would be included in the values returned. Would be useful for cases like these.]Yes, it's a kind of necessary enhancement: http://d.puremagic.com/issues/show_bug.cgi?id=10466 Bye, bearophile
Aug 25 2013
On 25/08/13 23:01, bearophile wrote:Thankfully foreach_reverse was not deprecated:Really? I thought it was viewed these days as some kind of awful abomination that should never have been in the language. (I'm not sure I understand why, but equally I'm not sure I care to open the can of worms of finding out why...)
Aug 25 2013
On 08/25/2013 10:42 PM, Joseph Rushton Wakeling wrote:foreach (immutable i; iota(0, n, 2)) { ... } ... but again, that will be slow compared to the for loop or a foreach across an interval.Why would that be the case?
Aug 25 2013
Timon Gehr:In the real world, where we live, there is a thing named "abstraction penalty", it's something that asks you to pay something when there is an interface between separated subsystems. To avoid paying that penalty at run-time in your programs you need a smarter compiler, that usually requires more time to be developed, and usually requires more time to do its work (this means slower compilations, see the LDC2 compiler compared to DMD. LDC2 was able to remove most of the run-time penalty, but when it compiles it's slower than DMD). So in the real life we have to take compromises all the time. Bye, bearophile... but again, that will be slow compared to the for loop or a foreach across an interval.Why would that be the case?
Aug 25 2013
On 08/25/2013 11:45 PM, bearophile wrote:Timon Gehr:This is uncalled for.In the real world, where we live, ...... but again, that will be slow compared to the for loop or a foreach across an interval.Why would that be the case?
Aug 25 2013
On 26/08/13 00:36, Timon Gehr wrote:This is uncalled for.I'm sure you didn't mean it, but your original remark, "Why would that be the case?", came across as flip and dismissive about a serious practical problem of working with D, that anyone caring about performance has run into. I don't mean to defend or attack anyone, but I'm not surprised you got a sarcastic response.
Aug 25 2013
On 08/26/2013 12:46 AM, Joseph Rushton Wakeling wrote:On 26/08/13 00:36, Timon Gehr wrote:It was a genuine question. I assumed that the inliner actually works. But you are right, DMD does not even seem to inline iota, and gdc appears to fail to vectorize some iota-loops even though it vectorizes the range-foreach equivalent.This is uncalled for.I'm sure you didn't mean it, but your original remark, "Why would that be the case?", came across as flip and dismissive about a serious practical problem
Aug 25 2013
Timon Gehr:This is uncalled for.I didn't mean to offend, sorry for the unneeded (light) sarcasm. While generally you seem expert in computer science and many other things, in several of your post you seem to dismiss a little too much practical considerations. Bye, bearophile
Aug 25 2013
On 08/26/2013 01:18 AM, bearophile wrote:Timon Gehr:No offence taken.This is uncalled for.I didn't mean to offend,sorry for the unneeded (light) sarcasm.Well, this kind of thing does not propagate too well over the Internet.
Aug 25 2013
On 25/08/13 23:11, Timon Gehr wrote:Why would that be the case?I don't know why it _need_ be the case, but it _is_ the case in practice right now. The precise example I cited of a for() loop with decreasing value, I tried replacing using retro and iota -- and it was significantly slower. Ditto foreach's over iota() -- even though in principle there is no conceptual difference between foreach(i; m .. n) and foreach(i; iota(m, n)), and it ought to be possible for the two to reduce to the same machine code, there is right now a performance hit.
Aug 25 2013