digitalmars.D - Good project: stride() with constant stride value
- Andrei Alexandrescu (9/9) Mar 04 2016 Currently we have a very useful stride() function that allows spanning a...
- Seb (4/15) Mar 04 2016 Sounds like fun :) - anything special that I should worry/care
- John Colvin (7/18) Mar 04 2016 Surely after inlining (I mean real inlining, not dmd) it makes no
- kinke (8/13) Mar 04 2016 Then let's not complicate Phobos please. I'm really no friend of
- jmh530 (4/11) Mar 04 2016 Stride is already a template. The compiler would just pick the
- Andrei Alexandrescu (6/22) Mar 04 2016 This is just speculation. When the stride is passed to larger functions ...
- Meta (7/12) Mar 04 2016 It's easy to implement but isn't this an optimization that
- Andrei Alexandrescu (2/11) Mar 04 2016 Not generally, no. -- Andrei
- H. S. Teoh via Digitalmars-d (6/29) Mar 04 2016 Why not rather improve dmd optimization, so that such manual
- Andrei Alexandrescu (3/5) Mar 04 2016 As I mentioned, optimizing the use of stride in large (non-inlined)
- John Colvin (12/17) Mar 04 2016 It seems to me that if the stride is available in the calling
- Andrei Alexandrescu (11/28) Mar 05 2016 Again, this is just speculation, and not very credible. Why push it? Of
- kinke (8/13) Mar 05 2016 With that argument, we might end up with druntime and Phobos
- Jonathan M Davis (8/18) Mar 04 2016 IMHO, it would be cleaner to make them separate templates so that
- Andrei Alexandrescu (3/7) Mar 04 2016 It's definitely simpler, easier to understand, and less code to
- Jonathan M Davis (9/19) Mar 04 2016 This makes me wonder if something like iota would benefit from a
- Andrei Alexandrescu (2/4) Mar 05 2016 Certainly. -- Andrei
- Seb (25/30) Mar 05 2016 I haven't done much with CTFE yet, but how would one get the type
Currently we have a very useful stride() function that allows spanning a random access range with a specified step, e.g. 0, 3, 6, 9, ... for step 3. I've run some measurements recently and it turns out a compile-time-known stride is a lot faster than a variable. So I was thinking to improve Stride(R) to take an additional parameter: Stride(R, size_t step = 0). If step is 0, then use a runtime-valued stride as until now. If nonzero, Stride should use that compile-time step. Takers? Andrei
Mar 04 2016
On Friday, 4 March 2016 at 16:45:42 UTC, Andrei Alexandrescu wrote:Currently we have a very useful stride() function that allows spanning a random access range with a specified step, e.g. 0, 3, 6, 9, ... for step 3. I've run some measurements recently and it turns out a compile-time-known stride is a lot faster than a variable. So I was thinking to improve Stride(R) to take an additional parameter: Stride(R, size_t step = 0). If step is 0, then use a runtime-valued stride as until now. If nonzero, Stride should use that compile-time step. Takers? AndreiSounds like fun :) - anything special that I should worry/care about?
Mar 04 2016
On Friday, 4 March 2016 at 16:45:42 UTC, Andrei Alexandrescu wrote:Currently we have a very useful stride() function that allows spanning a random access range with a specified step, e.g. 0, 3, 6, 9, ... for step 3. I've run some measurements recently and it turns out a compile-time-known stride is a lot faster than a variable. So I was thinking to improve Stride(R) to take an additional parameter: Stride(R, size_t step = 0). If step is 0, then use a runtime-valued stride as until now. If nonzero, Stride should use that compile-time step. Takers? AndreiSurely after inlining (I mean real inlining, not dmd) it makes no difference, a constant is a constant? I remember doing tests of things like that and finding that not only did it not make a difference to performance, ldc produced near-identical asm either way.
Mar 04 2016
On Friday, 4 March 2016 at 17:49:09 UTC, John Colvin wrote:Surely after inlining (I mean real inlining, not dmd) it makes no difference, a constant is a constant? I remember doing tests of things like that and finding that not only did it not make a difference to performance, ldc produced near-identical asm either way.Then let's not complicate Phobos please. I'm really no friend of special semantics for `step == 0` and stuff like that. Let's keep code as readable and simple as possible, especially in the standard libraries, and let the compilers do their job at optimizing low-level stuff for release builds. More templates surely impact compilation speed, and that's where DMD shines.
Mar 04 2016
On Friday, 4 March 2016 at 18:40:58 UTC, kinke wrote:Then let's not complicate Phobos please. I'm really no friend of special semantics for `step == 0` and stuff like that. Let's keep code as readable and simple as possible, especially in the standard libraries, and let the compilers do their job at optimizing low-level stuff for release builds. More templates surely impact compilation speed, and that's where DMD shines.Stride is already a template. The compiler would just pick the right template to instantiate. Can't imagine that would be a significant impact on compilation speed.
Mar 04 2016
kinke <noone nowhere.com> wrote:On Friday, 4 March 2016 at 17:49:09 UTC, John Colvin wrote:This is just speculation. When the stride is passed to larger functions the value of the stride is long lost. I understand the desire for nice and simple code but sadly the stdlib is not a good place for it - everything must be tightly optimized. The value of the project stands. -- AndreiSurely after inlining (I mean real inlining, not dmd) it makes no difference, a constant is a constant? I remember doing tests of things like that and finding that not only did it not make a difference to performance, ldc produced near-identical asm either way.Then let's not complicate Phobos please. I'm really no friend of special semantics for `step == 0` and stuff like that. Let's keep code as readable and simple as possible, especially in the standard libraries, and let the compilers do their job at optimizing low-level stuff for release builds. More templates surely impact compilation speed, and that's where DMD shines.
Mar 04 2016
On Friday, 4 March 2016 at 20:14:41 UTC, Andrei Alexandrescu wrote:This is just speculation. When the stride is passed to larger functions the value of the stride is long lost. I understand the desire for nice and simple code but sadly the stdlib is not a good place for it - everything must be tightly optimized. The value of the project stands. -- AndreiIt's easy to implement but isn't this an optimization that LDC/GDC would already do if the stride is known at compile time? Should we be optimizing the standard library for DMD when (speculation but probably true) it's the only one that can't perform such an optimization?
Mar 04 2016
On 03/04/2016 04:19 PM, Meta wrote:On Friday, 4 March 2016 at 20:14:41 UTC, Andrei Alexandrescu wrote:Not generally, no. -- AndreiThis is just speculation. When the stride is passed to larger functions the value of the stride is long lost. I understand the desire for nice and simple code but sadly the stdlib is not a good place for it - everything must be tightly optimized. The value of the project stands. -- AndreiIt's easy to implement but isn't this an optimization that LDC/GDC would already do if the stride is known at compile time?
Mar 04 2016
On Fri, Mar 04, 2016 at 08:14:41PM +0000, Andrei Alexandrescu via Digitalmars-d wrote:kinke <noone nowhere.com> wrote:Why not rather improve dmd optimization, so that such manual optimizations are no longer necessary? T -- English has the lovely word "defenestrate", meaning "to execute by throwing someone out a window", or more recently "to remove Windows from a computer and replace it with something useful". :-) -- John CowanOn Friday, 4 March 2016 at 17:49:09 UTC, John Colvin wrote:This is just speculation. When the stride is passed to larger functions the value of the stride is long lost. I understand the desire for nice and simple code but sadly the stdlib is not a good place for it - everything must be tightly optimized. The value of the project stands. -- AndreiSurely after inlining (I mean real inlining, not dmd) it makes no difference, a constant is a constant? I remember doing tests of things like that and finding that not only did it not make a difference to performance, ldc produced near-identical asm either way.Then let's not complicate Phobos please. I'm really no friend of special semantics for `step == 0` and stuff like that. Let's keep code as readable and simple as possible, especially in the standard libraries, and let the compilers do their job at optimizing low-level stuff for release builds. More templates surely impact compilation speed, and that's where DMD shines.
Mar 04 2016
On 03/04/2016 04:19 PM, H. S. Teoh via Digitalmars-d wrote:Why not rather improve dmd optimization, so that such manual optimizations are no longer necessary?As I mentioned, optimizing the use of stride in large (non-inlined) functions is a tall order. -- Andrei
Mar 04 2016
On Friday, 4 March 2016 at 23:33:40 UTC, Andrei Alexandrescu wrote:On 03/04/2016 04:19 PM, H. S. Teoh via Digitalmars-d wrote:It seems to me that if the stride is available in the calling scope as usable for a stride template parameter (i.e. as a compile-time value ) then it would also be just as available to the optimiser after the trivial inlining of stride (note not any arbitrarily complex code that contains stride, just stride). Sure, if you nest it away in un-inlineable constructs then it won't be easily optimised, but you wouldn't be able to use it as a template parameter then anyway. Do you have a concrete example where the optimisation(s) you want to occur cannot be done with `stride` as it is?Why not rather improve dmd optimization, so that such manual optimizations are no longer necessary?As I mentioned, optimizing the use of stride in large (non-inlined) functions is a tall order. -- Andrei
Mar 04 2016
On 03/04/2016 09:50 PM, John Colvin wrote:On Friday, 4 March 2016 at 23:33:40 UTC, Andrei Alexandrescu wrote:Again, this is just speculation, and not very credible. Why push it? Of course we could sit all day discussing what a sufficiently smart or sufficiently dumb compiler is and is not liable to do. The matter of fact is (a) yes, specializing a function for a particular parameter value is a known optimization; (b) it is mostly used for virtual methods to avoid vcall overheads, and in particular the compiler won't generate two separate large-ish functions for two separate parameter values.On 03/04/2016 04:19 PM, H. S. Teoh via Digitalmars-d wrote:It seems to me that if the stride is available in the calling scope as usable for a stride template parameter (i.e. as a compile-time value ) then it would also be just as available to the optimiser after the trivial inlining of stride (note not any arbitrarily complex code that contains stride, just stride). Sure, if you nest it away in un-inlineable constructs then it won't be easily optimised, but you wouldn't be able to use it as a template parameter then anyway.Why not rather improve dmd optimization, so that such manual optimizations are no longer necessary?As I mentioned, optimizing the use of stride in large (non-inlined) functions is a tall order. -- AndreiDo you have a concrete example where the optimisation(s) you want to occur cannot be done with `stride` as it is?Of course. I've been working on such for a month. I cannot make the samples public for the time being. Point is, there's no need to. Andrei
Mar 05 2016
On Friday, 4 March 2016 at 20:14:41 UTC, Andrei Alexandrescu wrote:This is just speculation. When the stride is passed to larger functions the value of the stride is long lost. I understand the desire for nice and simple code but sadly the stdlib is not a good place for it - everything must be tightly optimized. The value of the project stands. -- AndreiWith that argument, we might end up with druntime and Phobos completely in manually-tweaked inline assembly to compensate for simpler back-ends. I'm obviously exaggerating, but unless you can show that a compile-time version really provides a significant boost for optimized GDC/LDC builds too, I don't see the project's value.
Mar 05 2016
On Friday, 4 March 2016 at 16:45:42 UTC, Andrei Alexandrescu wrote:Currently we have a very useful stride() function that allows spanning a random access range with a specified step, e.g. 0, 3, 6, 9, ... for step 3. I've run some measurements recently and it turns out a compile-time-known stride is a lot faster than a variable. So I was thinking to improve Stride(R) to take an additional parameter: Stride(R, size_t step = 0). If step is 0, then use a runtime-valued stride as until now. If nonzero, Stride should use that compile-time step. Takers?IMHO, it would be cleaner to make them separate templates so that we don't have to give some special meaning to step == 0. And if it made sense for them to share their implementation, we could still have a helper template that did the step == 0 so that it was hidden from the user. - Jonathan M Davis
Mar 04 2016
On 03/04/2016 04:32 PM, Jonathan M Davis wrote:IMHO, it would be cleaner to make them separate templates so that we don't have to give some special meaning to step == 0. And if it made sense for them to share their implementation, we could still have a helper template that did the step == 0 so that it was hidden from the user.It's definitely simpler, easier to understand, and less code to specialize for step = 0. I do a lot of that in std.allocator. -- Andrei
Mar 04 2016
On Friday, 4 March 2016 at 16:45:42 UTC, Andrei Alexandrescu wrote:Currently we have a very useful stride() function that allows spanning a random access range with a specified step, e.g. 0, 3, 6, 9, ... for step 3. I've run some measurements recently and it turns out a compile-time-known stride is a lot faster than a variable. So I was thinking to improve Stride(R) to take an additional parameter: Stride(R, size_t step = 0). If step is 0, then use a runtime-valued stride as until now. If nonzero, Stride should use that compile-time step. Takers?This makes me wonder if something like iota would benefit from a similar optimization. It's frequently the case that all of the arguments to iota are known at compile time. Now, iota isn't wrapping another range (unlike stride), so maybe that would make the difference, and iota wouldn't particularly benefit from having its arguments be template arguments, but I have to wonder. - Jonathan M Davis
Mar 04 2016
On 03/04/2016 11:34 PM, Jonathan M Davis wrote:This makes me wonder if something like iota would benefit from a similar optimization.Certainly. -- Andrei
Mar 05 2016
On Saturday, 5 March 2016 at 13:15:46 UTC, Andrei Alexandrescu wrote:On 03/04/2016 11:34 PM, Jonathan M Davis wrote:I haven't done much with CTFE yet, but how would one get the type there right? iota accepts any Integral type, so the following seems a bit bulky.. ``` template Iota(T) { auto i(T end, T step = 1)() { return iota(end, step); } auto i(T begin, T end, T step = 1)() { return iota(begin, end, step); } } unittest { import std.algorithm: equal; static assert(Iota!int.i!(0, 10, 1).equal([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])); } ```This makes me wonder if something like iota would benefit from a similar optimization.Certainly. -- Andrei
Mar 05 2016