digitalmars.D - Good project: stride() with constant stride value

Andrei Alexandrescu (9/9) Mar 04 2016 Currently we have a very useful stride() function that allows spanning a...

Seb (4/15) Mar 04 2016 Sounds like fun :) - anything special that I should worry/care
John Colvin (7/18) Mar 04 2016 Surely after inlining (I mean real inlining, not dmd) it makes no

kinke (8/13) Mar 04 2016 Then let's not complicate Phobos please. I'm really no friend of

jmh530 (4/11) Mar 04 2016 Stride is already a template. The compiler would just pick the
Andrei Alexandrescu (6/22) Mar 04 2016 This is just speculation. When the stride is passed to larger functions ...

Meta (7/12) Mar 04 2016 It's easy to implement but isn't this an optimization that

Andrei Alexandrescu (2/11) Mar 04 2016 Not generally, no. -- Andrei

H. S. Teoh via Digitalmars-d (6/29) Mar 04 2016 Why not rather improve dmd optimization, so that such manual

Andrei Alexandrescu (3/5) Mar 04 2016 As I mentioned, optimizing the use of stride in large (non-inlined)

John Colvin (12/17) Mar 04 2016 It seems to me that if the stride is available in the calling

Andrei Alexandrescu (11/28) Mar 05 2016 Again, this is just speculation, and not very credible. Why push it? Of

kinke (8/13) Mar 05 2016 With that argument, we might end up with druntime and Phobos

Jonathan M Davis (8/18) Mar 04 2016 IMHO, it would be cleaner to make them separate templates so that

Andrei Alexandrescu (3/7) Mar 04 2016 It's definitely simpler, easier to understand, and less code to

Jonathan M Davis (9/19) Mar 04 2016 This makes me wonder if something like iota would benefit from a

Andrei Alexandrescu (2/4) Mar 05 2016 Certainly. -- Andrei

Seb (25/30) Mar 05 2016 I haven't done much with CTFE yet, but how would one get the type

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Currently we have a very useful stride() function that allows spanning a 
random access range with a specified step, e.g. 0, 3, 6, 9, ... for step 3.

I've run some measurements recently and it turns out a 
compile-time-known stride is a lot faster than a variable. So I was 
thinking to improve Stride(R) to take an additional parameter: Stride(R, 
size_t step = 0). If step is 0, then use a runtime-valued stride as 
until now. If nonzero, Stride should use that compile-time step.

Takers?


Andrei

Mar 04 2016

Seb <seb wilzba.ch> writes:

On Friday, 4 March 2016 at 16:45:42 UTC, Andrei Alexandrescu 
wrote:
 Currently we have a very useful stride() function that allows 
 spanning a random access range with a specified step, e.g. 0, 
 3, 6, 9, ... for step 3.

 I've run some measurements recently and it turns out a 
 compile-time-known stride is a lot faster than a variable. So I 
 was thinking to improve Stride(R) to take an additional 
 parameter: Stride(R, size_t step = 0). If step is 0, then use a 
 runtime-valued stride as until now. If nonzero, Stride should 
 use that compile-time step.

 Takers?


 Andrei

Sounds like fun :) - anything special that I should worry/care 
about?

Mar 04 2016

John Colvin <john.loughran.colvin gmail.com> writes:

On Friday, 4 March 2016 at 16:45:42 UTC, Andrei Alexandrescu 
wrote:
 Currently we have a very useful stride() function that allows 
 spanning a random access range with a specified step, e.g. 0, 
 3, 6, 9, ... for step 3.

 I've run some measurements recently and it turns out a 
 compile-time-known stride is a lot faster than a variable. So I 
 was thinking to improve Stride(R) to take an additional 
 parameter: Stride(R, size_t step = 0). If step is 0, then use a 
 runtime-valued stride as until now. If nonzero, Stride should 
 use that compile-time step.

 Takers?


 Andrei

Surely after inlining (I mean real inlining, not dmd) it makes no 
difference, a constant is a constant?

I remember doing tests of things like that and finding that not 
only did it not make a difference to performance, ldc produced 
near-identical asm either way.

Mar 04 2016

kinke <noone nowhere.com> writes:

On Friday, 4 March 2016 at 17:49:09 UTC, John Colvin wrote:
 Surely after inlining (I mean real inlining, not dmd) it makes 
 no difference, a constant is a constant?

 I remember doing tests of things like that and finding that not 
 only did it not make a difference to performance, ldc produced 
 near-identical asm either way.

Then let's not complicate Phobos please. I'm really no friend of 
special semantics for `step == 0` and stuff like that. Let's keep 
code as readable and simple as possible, especially in the 
standard libraries, and let the compilers do their job at 
optimizing low-level stuff for release builds.
More templates surely impact compilation speed, and that's where 
DMD shines.

Mar 04 2016

jmh530 <john.michael.hall gmail.com> writes:

On Friday, 4 March 2016 at 18:40:58 UTC, kinke wrote:
 Then let's not complicate Phobos please. I'm really no friend 
 of special semantics for `step == 0` and stuff like that. Let's 
 keep code as readable and simple as possible, especially in the 
 standard libraries, and let the compilers do their job at 
 optimizing low-level stuff for release builds.
 More templates surely impact compilation speed, and that's 
 where DMD shines.

Stride is already a template. The compiler would just pick the 
right template to instantiate. Can't imagine that would be a 
significant impact on compilation speed.

Mar 04 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

kinke <noone nowhere.com> wrote:
 On Friday, 4 March 2016 at 17:49:09 UTC, John Colvin wrote:
 Surely after inlining (I mean real inlining, not dmd) it makes 
 no difference, a constant is a constant?
 
 I remember doing tests of things like that and finding that not 
 only did it not make a difference to performance, ldc produced 
 near-identical asm either way.

 
 Then let's not complicate Phobos please. I'm really no friend of 
 special semantics for `step == 0` and stuff like that. Let's keep 
 code as readable and simple as possible, especially in the 
 standard libraries, and let the compilers do their job at 
 optimizing low-level stuff for release builds.
 More templates surely impact compilation speed, and that's where 
 DMD shines.
 

This is just speculation. When the stride is passed to larger functions the
value of the stride is long lost.

I understand the desire for nice and simple code but sadly the stdlib is
not a good place for it - everything must be tightly optimized. The value
of the project stands. -- Andrei

Mar 04 2016

Meta <jared771 gmail.com> writes:

On Friday, 4 March 2016 at 20:14:41 UTC, Andrei Alexandrescu 
wrote:
 This is just speculation. When the stride is passed to larger 
 functions the value of the stride is long lost.

 I understand the desire for nice and simple code but sadly the 
 stdlib is not a good place for it - everything must be tightly 
 optimized. The value of the project stands. -- Andrei

It's easy to implement but isn't this an optimization that 
LDC/GDC would already do if the stride is known at compile time? 
Should we be optimizing the standard library for DMD when 
(speculation but probably true) it's the only one that can't 
perform such an optimization?

Mar 04 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/04/2016 04:19 PM, Meta wrote:
 On Friday, 4 March 2016 at 20:14:41 UTC, Andrei Alexandrescu wrote:
 This is just speculation. When the stride is passed to larger
 functions the value of the stride is long lost.

 I understand the desire for nice and simple code but sadly the stdlib
 is not a good place for it - everything must be tightly optimized. The
 value of the project stands. -- Andrei

 It's easy to implement but isn't this an optimization that LDC/GDC would
 already do if the stride is known at compile time?

Not generally, no. -- Andrei

Mar 04 2016

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Fri, Mar 04, 2016 at 08:14:41PM +0000, Andrei Alexandrescu via Digitalmars-d
wrote:
 kinke <noone nowhere.com> wrote:
 On Friday, 4 March 2016 at 17:49:09 UTC, John Colvin wrote:
 Surely after inlining (I mean real inlining, not dmd) it makes no
 difference, a constant is a constant?
 
 I remember doing tests of things like that and finding that not
 only did it not make a difference to performance, ldc produced
 near-identical asm either way.

 
 Then let's not complicate Phobos please. I'm really no friend of
 special semantics for `step == 0` and stuff like that. Let's keep
 code as readable and simple as possible, especially in the standard
 libraries, and let the compilers do their job at optimizing
 low-level stuff for release builds.  More templates surely impact
 compilation speed, and that's where DMD shines.
 

 
 This is just speculation. When the stride is passed to larger
 functions the value of the stride is long lost.
 
 I understand the desire for nice and simple code but sadly the stdlib
 is not a good place for it - everything must be tightly optimized. The
 value of the project stands. -- Andrei

Why not rather improve dmd optimization, so that such manual
optimizations are no longer necessary?


T

-- 
English has the lovely word "defenestrate", meaning "to execute by throwing
someone out a window", or more recently "to remove Windows from a computer and
replace it with something useful". :-) -- John Cowan

Mar 04 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/04/2016 04:19 PM, H. S. Teoh via Digitalmars-d wrote:
 Why not rather improve dmd optimization, so that such manual
 optimizations are no longer necessary?

As I mentioned, optimizing the use of stride in large (non-inlined) 
functions is a tall order. -- Andrei

Mar 04 2016

John Colvin <john.loughran.colvin gmail.com> writes:

On Friday, 4 March 2016 at 23:33:40 UTC, Andrei Alexandrescu 
wrote:
 On 03/04/2016 04:19 PM, H. S. Teoh via Digitalmars-d wrote:
 Why not rather improve dmd optimization, so that such manual
 optimizations are no longer necessary?

 As I mentioned, optimizing the use of stride in large 
 (non-inlined) functions is a tall order. -- Andrei

It seems to me that if the stride is available in the calling 
scope as usable for a stride template parameter (i.e. as a 
compile-time value ) then it would also be just as available to 
the optimiser after the trivial inlining of stride (note not any 
arbitrarily complex code that contains stride, just stride).

Sure, if you nest it away in un-inlineable constructs then it 
won't be easily optimised, but you wouldn't be able to use it as 
a template parameter then anyway.

Do you have a concrete example where the optimisation(s) you want 
to occur cannot be done with `stride` as it is?

Mar 04 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/04/2016 09:50 PM, John Colvin wrote:
 On Friday, 4 March 2016 at 23:33:40 UTC, Andrei Alexandrescu wrote:
 On 03/04/2016 04:19 PM, H. S. Teoh via Digitalmars-d wrote:
 Why not rather improve dmd optimization, so that such manual
 optimizations are no longer necessary?

 As I mentioned, optimizing the use of stride in large (non-inlined)
 functions is a tall order. -- Andrei

 It seems to me that if the stride is available in the calling scope as
 usable for a stride template parameter (i.e. as a compile-time value )
 then it would also be just as available to the optimiser after the
 trivial inlining of stride (note not any arbitrarily complex code that
 contains stride, just stride).

 Sure, if you nest it away in un-inlineable constructs then it won't be
 easily optimised, but you wouldn't be able to use it as a template
 parameter then anyway.

Again, this is just speculation, and not very credible. Why push it? Of 
course we could sit all day discussing what a sufficiently smart or 
sufficiently dumb compiler is and is not liable to do. The matter of 
fact is (a) yes, specializing a function for a particular parameter 
value is a known optimization; (b) it is mostly used for virtual methods 
to avoid vcall overheads, and in particular the compiler won't generate 
two separate large-ish functions for two separate parameter values.

 Do you have a concrete example where the optimisation(s) you want to
 occur cannot be done with `stride` as it is?

Of course. I've been working on such for a month. I cannot make the 
samples public for the time being. Point is, there's no need to.


Andrei

Mar 05 2016

kinke <noone nowhere.com> writes:

On Friday, 4 March 2016 at 20:14:41 UTC, Andrei Alexandrescu 
wrote:
 This is just speculation. When the stride is passed to larger 
 functions the value of the stride is long lost.

 I understand the desire for nice and simple code but sadly the 
 stdlib is not a good place for it - everything must be tightly 
 optimized. The value of the project stands. -- Andrei

With that argument, we might end up with druntime and Phobos 
completely in manually-tweaked inline assembly to compensate for 
simpler back-ends. I'm obviously exaggerating, but unless you can 
show that a compile-time version really provides a significant 
boost for optimized GDC/LDC builds too, I don't see the project's 
value.

Mar 05 2016

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Friday, 4 March 2016 at 16:45:42 UTC, Andrei Alexandrescu 
wrote:
 Currently we have a very useful stride() function that allows 
 spanning a random access range with a specified step, e.g. 0, 
 3, 6, 9, ... for step 3.

 I've run some measurements recently and it turns out a 
 compile-time-known stride is a lot faster than a variable. So I 
 was thinking to improve Stride(R) to take an additional 
 parameter: Stride(R, size_t step = 0). If step is 0, then use a 
 runtime-valued stride as until now. If nonzero, Stride should 
 use that compile-time step.

 Takers?

IMHO, it would be cleaner to make them separate templates so that 
we don't have to give some special meaning to step == 0. And if 
it made sense for them to share their implementation, we could 
still have a helper template that did the step == 0 so that it 
was hidden from the user.

- Jonathan M Davis

Mar 04 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/04/2016 04:32 PM, Jonathan M Davis wrote:
 IMHO, it would be cleaner to make them separate templates so that we
 don't have to give some special meaning to step == 0. And if it made
 sense for them to share their implementation, we could still have a
 helper template that did the step == 0 so that it was hidden from the user.

It's definitely simpler, easier to understand, and less code to 
specialize for step = 0. I do a lot of that in std.allocator. -- Andrei

Mar 04 2016

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Friday, 4 March 2016 at 16:45:42 UTC, Andrei Alexandrescu 
wrote:
 Currently we have a very useful stride() function that allows 
 spanning a random access range with a specified step, e.g. 0, 
 3, 6, 9, ... for step 3.

 I've run some measurements recently and it turns out a 
 compile-time-known stride is a lot faster than a variable. So I 
 was thinking to improve Stride(R) to take an additional 
 parameter: Stride(R, size_t step = 0). If step is 0, then use a 
 runtime-valued stride as until now. If nonzero, Stride should 
 use that compile-time step.

 Takers?

This makes me wonder if something like iota would benefit from a 
similar optimization. It's frequently the case that all of the 
arguments to iota are known at compile time. Now, iota isn't 
wrapping another range (unlike stride), so maybe that would make 
the difference, and iota wouldn't particularly benefit from 
having its arguments be template arguments, but I have to wonder.

- Jonathan M Davis

Mar 04 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/04/2016 11:34 PM, Jonathan M Davis wrote:
 This makes me wonder if something like iota would benefit from a similar
 optimization.

Certainly. -- Andrei

Mar 05 2016

Seb <seb wilzba.ch> writes:

On Saturday, 5 March 2016 at 13:15:46 UTC, Andrei Alexandrescu 
wrote:
 On 03/04/2016 11:34 PM, Jonathan M Davis wrote:
 This makes me wonder if something like iota would benefit from 
 a similar
 optimization.

 Certainly. -- Andrei

I haven't done much with CTFE yet, but how would one get the type 
there right?
iota accepts any Integral type, so the following seems a bit 
bulky..

```
template Iota(T)
{
     auto i(T end, T step = 1)()
     {
         return iota(end, step);
     }
     auto i(T begin, T end, T step = 1)()
     {
         return iota(begin, end, step);
     }
}

unittest
{
     import std.algorithm: equal;
     static assert(Iota!int.i!(0, 10, 1).equal([0, 1, 2, 3, 4, 5, 
6, 7, 8, 9]));
}
```

Mar 05 2016

D Programming

C/C++ Programming

Other

digitalmars.D - Good project: stride() with constant stride value