www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Good project: stride() with constant stride value

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Currently we have a very useful stride() function that allows spanning a 
random access range with a specified step, e.g. 0, 3, 6, 9, ... for step 3.

I've run some measurements recently and it turns out a 
compile-time-known stride is a lot faster than a variable. So I was 
thinking to improve Stride(R) to take an additional parameter: Stride(R, 
size_t step = 0). If step is 0, then use a runtime-valued stride as 
until now. If nonzero, Stride should use that compile-time step.

Takers?


Andrei
Mar 04 2016
next sibling parent Seb <seb wilzba.ch> writes:
On Friday, 4 March 2016 at 16:45:42 UTC, Andrei Alexandrescu 
wrote:
 Currently we have a very useful stride() function that allows 
 spanning a random access range with a specified step, e.g. 0, 
 3, 6, 9, ... for step 3.

 I've run some measurements recently and it turns out a 
 compile-time-known stride is a lot faster than a variable. So I 
 was thinking to improve Stride(R) to take an additional 
 parameter: Stride(R, size_t step = 0). If step is 0, then use a 
 runtime-valued stride as until now. If nonzero, Stride should 
 use that compile-time step.

 Takers?


 Andrei
Sounds like fun :) - anything special that I should worry/care about?
Mar 04 2016
prev sibling next sibling parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Friday, 4 March 2016 at 16:45:42 UTC, Andrei Alexandrescu 
wrote:
 Currently we have a very useful stride() function that allows 
 spanning a random access range with a specified step, e.g. 0, 
 3, 6, 9, ... for step 3.

 I've run some measurements recently and it turns out a 
 compile-time-known stride is a lot faster than a variable. So I 
 was thinking to improve Stride(R) to take an additional 
 parameter: Stride(R, size_t step = 0). If step is 0, then use a 
 runtime-valued stride as until now. If nonzero, Stride should 
 use that compile-time step.

 Takers?


 Andrei
Surely after inlining (I mean real inlining, not dmd) it makes no difference, a constant is a constant? I remember doing tests of things like that and finding that not only did it not make a difference to performance, ldc produced near-identical asm either way.
Mar 04 2016
parent reply kinke <noone nowhere.com> writes:
On Friday, 4 March 2016 at 17:49:09 UTC, John Colvin wrote:
 Surely after inlining (I mean real inlining, not dmd) it makes 
 no difference, a constant is a constant?

 I remember doing tests of things like that and finding that not 
 only did it not make a difference to performance, ldc produced 
 near-identical asm either way.
Then let's not complicate Phobos please. I'm really no friend of special semantics for `step == 0` and stuff like that. Let's keep code as readable and simple as possible, especially in the standard libraries, and let the compilers do their job at optimizing low-level stuff for release builds. More templates surely impact compilation speed, and that's where DMD shines.
Mar 04 2016
next sibling parent jmh530 <john.michael.hall gmail.com> writes:
On Friday, 4 March 2016 at 18:40:58 UTC, kinke wrote:
 Then let's not complicate Phobos please. I'm really no friend 
 of special semantics for `step == 0` and stuff like that. Let's 
 keep code as readable and simple as possible, especially in the 
 standard libraries, and let the compilers do their job at 
 optimizing low-level stuff for release builds.
 More templates surely impact compilation speed, and that's 
 where DMD shines.
Stride is already a template. The compiler would just pick the right template to instantiate. Can't imagine that would be a significant impact on compilation speed.
Mar 04 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
kinke <noone nowhere.com> wrote:
 On Friday, 4 March 2016 at 17:49:09 UTC, John Colvin wrote:
 Surely after inlining (I mean real inlining, not dmd) it makes 
 no difference, a constant is a constant?
 
 I remember doing tests of things like that and finding that not 
 only did it not make a difference to performance, ldc produced 
 near-identical asm either way.
Then let's not complicate Phobos please. I'm really no friend of special semantics for `step == 0` and stuff like that. Let's keep code as readable and simple as possible, especially in the standard libraries, and let the compilers do their job at optimizing low-level stuff for release builds. More templates surely impact compilation speed, and that's where DMD shines.
This is just speculation. When the stride is passed to larger functions the value of the stride is long lost. I understand the desire for nice and simple code but sadly the stdlib is not a good place for it - everything must be tightly optimized. The value of the project stands. -- Andrei
Mar 04 2016
next sibling parent reply Meta <jared771 gmail.com> writes:
On Friday, 4 March 2016 at 20:14:41 UTC, Andrei Alexandrescu 
wrote:
 This is just speculation. When the stride is passed to larger 
 functions the value of the stride is long lost.

 I understand the desire for nice and simple code but sadly the 
 stdlib is not a good place for it - everything must be tightly 
 optimized. The value of the project stands. -- Andrei
It's easy to implement but isn't this an optimization that LDC/GDC would already do if the stride is known at compile time? Should we be optimizing the standard library for DMD when (speculation but probably true) it's the only one that can't perform such an optimization?
Mar 04 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/04/2016 04:19 PM, Meta wrote:
 On Friday, 4 March 2016 at 20:14:41 UTC, Andrei Alexandrescu wrote:
 This is just speculation. When the stride is passed to larger
 functions the value of the stride is long lost.

 I understand the desire for nice and simple code but sadly the stdlib
 is not a good place for it - everything must be tightly optimized. The
 value of the project stands. -- Andrei
It's easy to implement but isn't this an optimization that LDC/GDC would already do if the stride is known at compile time?
Not generally, no. -- Andrei
Mar 04 2016
prev sibling next sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Fri, Mar 04, 2016 at 08:14:41PM +0000, Andrei Alexandrescu via Digitalmars-d
wrote:
 kinke <noone nowhere.com> wrote:
 On Friday, 4 March 2016 at 17:49:09 UTC, John Colvin wrote:
 Surely after inlining (I mean real inlining, not dmd) it makes no
 difference, a constant is a constant?
 
 I remember doing tests of things like that and finding that not
 only did it not make a difference to performance, ldc produced
 near-identical asm either way.
Then let's not complicate Phobos please. I'm really no friend of special semantics for `step == 0` and stuff like that. Let's keep code as readable and simple as possible, especially in the standard libraries, and let the compilers do their job at optimizing low-level stuff for release builds. More templates surely impact compilation speed, and that's where DMD shines.
This is just speculation. When the stride is passed to larger functions the value of the stride is long lost. I understand the desire for nice and simple code but sadly the stdlib is not a good place for it - everything must be tightly optimized. The value of the project stands. -- Andrei
Why not rather improve dmd optimization, so that such manual optimizations are no longer necessary? T -- English has the lovely word "defenestrate", meaning "to execute by throwing someone out a window", or more recently "to remove Windows from a computer and replace it with something useful". :-) -- John Cowan
Mar 04 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/04/2016 04:19 PM, H. S. Teoh via Digitalmars-d wrote:
 Why not rather improve dmd optimization, so that such manual
 optimizations are no longer necessary?
As I mentioned, optimizing the use of stride in large (non-inlined) functions is a tall order. -- Andrei
Mar 04 2016
parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Friday, 4 March 2016 at 23:33:40 UTC, Andrei Alexandrescu 
wrote:
 On 03/04/2016 04:19 PM, H. S. Teoh via Digitalmars-d wrote:
 Why not rather improve dmd optimization, so that such manual
 optimizations are no longer necessary?
As I mentioned, optimizing the use of stride in large (non-inlined) functions is a tall order. -- Andrei
It seems to me that if the stride is available in the calling scope as usable for a stride template parameter (i.e. as a compile-time value ) then it would also be just as available to the optimiser after the trivial inlining of stride (note not any arbitrarily complex code that contains stride, just stride). Sure, if you nest it away in un-inlineable constructs then it won't be easily optimised, but you wouldn't be able to use it as a template parameter then anyway. Do you have a concrete example where the optimisation(s) you want to occur cannot be done with `stride` as it is?
Mar 04 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/04/2016 09:50 PM, John Colvin wrote:
 On Friday, 4 March 2016 at 23:33:40 UTC, Andrei Alexandrescu wrote:
 On 03/04/2016 04:19 PM, H. S. Teoh via Digitalmars-d wrote:
 Why not rather improve dmd optimization, so that such manual
 optimizations are no longer necessary?
As I mentioned, optimizing the use of stride in large (non-inlined) functions is a tall order. -- Andrei
It seems to me that if the stride is available in the calling scope as usable for a stride template parameter (i.e. as a compile-time value ) then it would also be just as available to the optimiser after the trivial inlining of stride (note not any arbitrarily complex code that contains stride, just stride). Sure, if you nest it away in un-inlineable constructs then it won't be easily optimised, but you wouldn't be able to use it as a template parameter then anyway.
Again, this is just speculation, and not very credible. Why push it? Of course we could sit all day discussing what a sufficiently smart or sufficiently dumb compiler is and is not liable to do. The matter of fact is (a) yes, specializing a function for a particular parameter value is a known optimization; (b) it is mostly used for virtual methods to avoid vcall overheads, and in particular the compiler won't generate two separate large-ish functions for two separate parameter values.
 Do you have a concrete example where the optimisation(s) you want to
 occur cannot be done with `stride` as it is?
Of course. I've been working on such for a month. I cannot make the samples public for the time being. Point is, there's no need to. Andrei
Mar 05 2016
prev sibling parent kinke <noone nowhere.com> writes:
On Friday, 4 March 2016 at 20:14:41 UTC, Andrei Alexandrescu 
wrote:
 This is just speculation. When the stride is passed to larger 
 functions the value of the stride is long lost.

 I understand the desire for nice and simple code but sadly the 
 stdlib is not a good place for it - everything must be tightly 
 optimized. The value of the project stands. -- Andrei
With that argument, we might end up with druntime and Phobos completely in manually-tweaked inline assembly to compensate for simpler back-ends. I'm obviously exaggerating, but unless you can show that a compile-time version really provides a significant boost for optimized GDC/LDC builds too, I don't see the project's value.
Mar 05 2016
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, 4 March 2016 at 16:45:42 UTC, Andrei Alexandrescu 
wrote:
 Currently we have a very useful stride() function that allows 
 spanning a random access range with a specified step, e.g. 0, 
 3, 6, 9, ... for step 3.

 I've run some measurements recently and it turns out a 
 compile-time-known stride is a lot faster than a variable. So I 
 was thinking to improve Stride(R) to take an additional 
 parameter: Stride(R, size_t step = 0). If step is 0, then use a 
 runtime-valued stride as until now. If nonzero, Stride should 
 use that compile-time step.

 Takers?
IMHO, it would be cleaner to make them separate templates so that we don't have to give some special meaning to step == 0. And if it made sense for them to share their implementation, we could still have a helper template that did the step == 0 so that it was hidden from the user. - Jonathan M Davis
Mar 04 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/04/2016 04:32 PM, Jonathan M Davis wrote:
 IMHO, it would be cleaner to make them separate templates so that we
 don't have to give some special meaning to step == 0. And if it made
 sense for them to share their implementation, we could still have a
 helper template that did the step == 0 so that it was hidden from the user.
It's definitely simpler, easier to understand, and less code to specialize for step = 0. I do a lot of that in std.allocator. -- Andrei
Mar 04 2016
prev sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, 4 March 2016 at 16:45:42 UTC, Andrei Alexandrescu 
wrote:
 Currently we have a very useful stride() function that allows 
 spanning a random access range with a specified step, e.g. 0, 
 3, 6, 9, ... for step 3.

 I've run some measurements recently and it turns out a 
 compile-time-known stride is a lot faster than a variable. So I 
 was thinking to improve Stride(R) to take an additional 
 parameter: Stride(R, size_t step = 0). If step is 0, then use a 
 runtime-valued stride as until now. If nonzero, Stride should 
 use that compile-time step.

 Takers?
This makes me wonder if something like iota would benefit from a similar optimization. It's frequently the case that all of the arguments to iota are known at compile time. Now, iota isn't wrapping another range (unlike stride), so maybe that would make the difference, and iota wouldn't particularly benefit from having its arguments be template arguments, but I have to wonder. - Jonathan M Davis
Mar 04 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/04/2016 11:34 PM, Jonathan M Davis wrote:
 This makes me wonder if something like iota would benefit from a similar
 optimization.
Certainly. -- Andrei
Mar 05 2016
parent Seb <seb wilzba.ch> writes:
On Saturday, 5 March 2016 at 13:15:46 UTC, Andrei Alexandrescu 
wrote:
 On 03/04/2016 11:34 PM, Jonathan M Davis wrote:
 This makes me wonder if something like iota would benefit from 
 a similar
 optimization.
Certainly. -- Andrei
I haven't done much with CTFE yet, but how would one get the type there right? iota accepts any Integral type, so the following seems a bit bulky.. ``` template Iota(T) { auto i(T end, T step = 1)() { return iota(end, step); } auto i(T begin, T end, T step = 1)() { return iota(begin, end, step); } } unittest { import std.algorithm: equal; static assert(Iota!int.i!(0, 10, 1).equal([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])); } ```
Mar 05 2016