digitalmars.D - Divide & Conquer divides, but doesn't conquer

Andrei Alexandrescu (67/67) May 24 2020 Consider staticMap's current version (there are a few upcoming changes

Timon Gehr (4/8) May 24 2020 Assuming the cost of a template instantiation is linear in the number of...

Andrei Alexandrescu (45/54) May 24 2020 Of course. However, I'm assuming most of a template instantiation's cost...

Timon Gehr (3/6) May 24 2020 I don't think this is a realistic assumption with the current compiler

Andrei Alexandrescu (23/23) May 24 2020 Another one:

Steven Schveighoffer (7/32) May 24 2020 Because it doesn't work for large sets unless you do this.

Andrei Alexandrescu (2/38) May 24 2020 I see, thanks.

Walter Bright (3/4) May 25 2020 So we don't waste time on this point again:

Adam D. Ruppe (43/43) May 24 2020 I ran a test to compare the two implementations. First was

Walter Bright (3/5) May 24 2020 Thank you. This is good information. I'm curious how the new staticMap c...

Adam D. Ruppe (8/11) May 25 2020 Using dmd master: 1.8s 1.2G

Walter Bright (2/4) May 25 2020 A 15% speed win is a big deal! Especially for such a simple change.

Stefan Koch (2/7) May 25 2020 How much of a deal is 100% ? Enough to talk about including ... ?

Nick Treleaven (3/5) May 27 2020 How would a type function compare to S... ? Ideally we'd just use

Stefan Koch (4/9) May 27 2020 It's gonna be a while until they can be used.

Stefan Koch (2/12) May 27 2020 Assuming you mean the benchmark I posted.

Andrei Alexandrescu (4/12) May 25 2020 This is reasonable. I've seen a *lot* worse in optimizing libraries. A
Stefan Koch (28/34) May 25 2020 This has brought down the compile time of my test indeed.

Adam D. Ruppe (3/6) May 25 2020 Does that include the other shortcut prs? I wonder if they would

Stefan Koch (4/10) May 25 2020 Open it I suspect.

FeepingCreature (23/31) May 24 2020 Could you also try with this version, please?

Max Samukha (4/8) May 25 2020 How could you miss the state-of-the-art?) Preallocate the result

Andrei Alexandrescu (2/13) May 25 2020 static foreach is quadratic? Is this a matter of principle, or QoI?

Stefan Koch (8/22) May 25 2020 Yes it is.

Stefan Koch (2/3) May 25 2020 Could have been 19 ?
Andrei Alexandrescu (2/24) May 25 2020 How do you mean that? static foreach does not create a scope.

Stefan Koch (11/40) May 25 2020 The scope I am talking about is what dmd calls Scope.

Adam D. Ruppe (12/14) May 25 2020 I have a hard time seeing why it HAS to be this way. Regular
Timon Gehr (13/29) May 25 2020 QoI. The original `static foreach` implementation in my compiler

Stefan Koch (2/9) May 25 2020 What do you do about the scopes?

Tove (2/13) May 25 2020 Can't they be elided if the body contains no declarations?
Timon Gehr (2/14) May 25 2020 Nothing special. Why would I have to?

Stefan Koch (3/17) May 25 2020 You need a manifest constant for you IV (induction variable), no?

Timon Gehr (3/23) May 25 2020 Sure, so I create one forwarding scope per iteration. This is not

Stefan Koch (4/21) May 25 2020 Since Forwarding scopes where not in the language before you

Timon Gehr (8/30) May 25 2020 It's a scope that forwards new symbols to its parent scope on insertion....

Stefan Koch (4/20) May 25 2020 Thank a lot for the explanation!
Adam D. Ruppe (7/10) May 25 2020 That would actually be really useful in some places. Like

Stefan Koch (4/14) May 25 2020 you want type functions :)

Andrei Alexandrescu (3/35) May 25 2020 Thank you! It would be great if you could contribute or shepherd a
Walter Bright (3/15) May 25 2020 This is worth looking in to. I've had customers tell me they don't use s...

Max Samukha (5/9) May 25 2020 State-of-the-art is to preallocate the result string and use the

Adam D. Ruppe (30/32) May 25 2020 In this case, it doesn't matter much because the string is

Max Samukha (4/28) May 25 2020 Yes, I overlooked the enum. My version of static map just calls

Adam D. Ruppe (11/15) May 25 2020 lol, also called it staticMapHelper when doing this myself!

Adam D. Ruppe (9/10) May 25 2020 That was using the new dmd master, more specifically it was 1.6s
Andrei Alexandrescu (2/17) May 25 2020 So it seems a mixin-based version + Max's precalculated length = win!

Max Samukha (3/5) May 25 2020 Precalculated length is Adam's, and the link is to his blog. I

Adam D. Ruppe (6/7) May 25 2020 I stole it from Stefan too IIRC.

Max Samukha (4/11) May 26 2020 Preallocating buffers for CTFE is not a new idea, but the string

Andrei Alexandrescu (10/43) May 25 2020 Simpler:

FeepingCreature (7/47) May 24 2020 To add to the good previous points: the divide-and-conquer

Andrei Alexandrescu (3/55) May 25 2020 Not sure I understand. A,...,H are assumed to be distinct. The number of...

Paul Backus (16/26) May 25 2020 I think the idea is that you can potentially re-use template

johnsmith101 (2/57) May 26 2020 Thank u so much for taking time to help.

Stefan Koch (3/6) May 25 2020 Manu's DIP does solve this problem.

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Consider staticMap's current version (there are a few upcoming changes 
that don't change the point):

template staticMap(alias F, T...)
{
     static if (T.length == 0)
     {
         alias staticMap = AliasSeq!();
     }
     else static if (T.length == 1)
     {
         alias staticMap = AliasSeq!(F!(T[0]));
     }
     else
     {
         alias staticMap =
             AliasSeq!(
                 staticMap!(F, T[ 0  .. $/2]),
                 staticMap!(F, T[$/2 ..  $ ]));
     }
}

In the instantiation staticMap!(pred, A, B, C, D, E, F, G, H), recursion 
will instantiate:

staticMap!(pred, A, B, C, D)
staticMap!(pred, A, B)
staticMap!(pred, A)
staticMap!(pred, B)
staticMap!(pred, C, D)
staticMap!(pred, C)
staticMap!(pred, D)
staticMap!(pred, E. F, G, H)
staticMap!(pred, E, F)
staticMap!(pred, E)
staticMap!(pred, F)
staticMap!(pred, G, H)
staticMap!(pred, G)
staticMap!(pred, H)

Total: 15 instantiations including original. This is because a full tree 
with 8 leaves is being created. Generally, a tree with N leaves has 
about 2N - 1 nodes total (exactly so when N is a power of 2).

The more naive version would simply use linear decomposition:

template staticMap(alias F, T...)
{
     static if (T.length == 0)
     {
         alias staticMap = AliasSeq!();
     }
     else
     {
         alias staticMap =
             AliasSeq!(
                 F!(T[0]),
                 staticMap!(F, T[1 .. $]));
     }
}

There' no more need to handle the 1-element case, which is already a 
plus. But that's not the topic here. The instantiations created are:

staticMap!(pred, B, C, D, E, F, G, H)
staticMap!(pred, C, D, E, F, G, H)
staticMap!(pred, D, E, F, G, H)
staticMap!(pred, E, F, G, H)
staticMap!(pred, F, G, H)
staticMap!(pred, G, H)
staticMap!(pred, H)

Total: 8 instantiations including original. That's about half the 
current number of instantiations.

So it seems divide et impera doesn't quite help with certain recursive 
templates.

May 24 2020

Timon Gehr <timon.gehr gmx.ch> writes:

On 25.05.20 01:40, Andrei Alexandrescu wrote:
 ...
 
 So it seems divide et impera doesn't quite help with certain recursive 
 templates.

Assuming the cost of a template instantiation is linear in the number of 
template arguments, the first version has running time Θ(n log n) while 
the second version has running time Θ(n²).

May 24 2020

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/24/20 8:44 PM, Timon Gehr wrote:
 On 25.05.20 01:40, Andrei Alexandrescu wrote:
 ...

 So it seems divide et impera doesn't quite help with certain recursive 
 templates.

 
 Assuming the cost of a template instantiation is linear in the number of 
 template arguments, the first version has running time Θ(n log n) while 
 the second version has running time Θ(n²).

Of course. However, I'm assuming most of a template instantiation's cost 
is fixed.

There are other algorithms looking at in std.meta. Consider Reverse:

template Reverse(TList...)
{
     static if (TList.length <= 1)
     {
         alias Reverse = TList;
     }
     else
     {
         alias Reverse =
             AliasSeq!(
                 Reverse!(TList[$/2 ..  $ ]),
                 Reverse!(TList[ 0  .. $/2]));
     }
}

The recurrence formula for complexity is:

T(0) = k;
T(1) = k;
T(n) = 2 * T(n/2) + k

Again, I assume instantiating AliasSeq with two lists of any lengths has 
the same cost k. That makes complexity linear. So is the complexity of 
the simpler implementation:

template Reverse(TList...)
{
     static if (TList.length <= 1)
     {
         alias Reverse = TList;
     }
     else
     {
         alias Reverse =
             AliasSeq!(Reverse!(TList[1  .. $]), TList[0]);
     }
}

But this insantiates half the templates, so it goes easier on resources. 
This makes the use of the halving strategy not only an unnecessary 
distraction, but a less efficient choice.

There are other unnecessary (and/or unnecessarily baroque) artifacts in 
std.meta, such as EraseAllN. Far as I can tell it's unnecessary because 
NoDuplicates can be implemented without it (and if I'm not wrong, at the 
same complexity).

std.meta is due for a serious review.

May 24 2020

Timon Gehr <timon.gehr gmx.ch> writes:

On 25.05.20 03:25, Andrei Alexandrescu wrote:
 
 Again, I assume instantiating AliasSeq with two lists of any lengths has 
 the same cost k.

I don't think this is a realistic assumption with the current compiler 
implementation. Expression tuples are stored as arrays.

May 24 2020

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Another one:

template Filter(alias pred, TList...)
{
     static if (TList.length == 0)
     {
         alias Filter = AliasSeq!();
     }
     else static if (TList.length == 1)
     {
         static if (pred!(TList[0]))
             alias Filter = AliasSeq!(TList[0]);
         else
             alias Filter = AliasSeq!();
     }
     else
     {
         alias Filter =
             AliasSeq!(
                 Filter!(pred, TList[ 0  .. $/2]),
                 Filter!(pred, TList[$/2 ..  $ ]));
     }
}

Why cut the problem size in half if it doesn't improve complexity?

May 24 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 5/24/20 9:48 PM, Andrei Alexandrescu wrote:
 Another one:
 
 template Filter(alias pred, TList...)
 {
      static if (TList.length == 0)
      {
          alias Filter = AliasSeq!();
      }
      else static if (TList.length == 1)
      {
          static if (pred!(TList[0]))
              alias Filter = AliasSeq!(TList[0]);
          else
              alias Filter = AliasSeq!();
      }
      else
      {
          alias Filter =
              AliasSeq!(
                  Filter!(pred, TList[ 0  .. $/2]),
                  Filter!(pred, TList[$/2 ..  $ ]));
      }
 }
 
 Why cut the problem size in half if it doesn't improve complexity?

Because it doesn't work for large sets unless you do this.

If you do a "linear" recursive template, it fails after about 1000 
levels. A divide-and-conquer will not get that deep.

So in essence, it's a solution for a compiler restriction, not for 
performance.

-Steve

May 24 2020

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/24/20 10:32 PM, Steven Schveighoffer wrote:
 On 5/24/20 9:48 PM, Andrei Alexandrescu wrote:
 Another one:

 template Filter(alias pred, TList...)
 {
      static if (TList.length == 0)
      {
          alias Filter = AliasSeq!();
      }
      else static if (TList.length == 1)
      {
          static if (pred!(TList[0]))
              alias Filter = AliasSeq!(TList[0]);
          else
              alias Filter = AliasSeq!();
      }
      else
      {
          alias Filter =
              AliasSeq!(
                  Filter!(pred, TList[ 0  .. $/2]),
                  Filter!(pred, TList[$/2 ..  $ ]));
      }
 }

 Why cut the problem size in half if it doesn't improve complexity?

 
 Because it doesn't work for large sets unless you do this.
 
 If you do a "linear" recursive template, it fails after about 1000 
 levels. A divide-and-conquer will not get that deep.
 
 So in essence, it's a solution for a compiler restriction, not for 
 performance.
 
 -Steve

I see, thanks.

May 24 2020

Walter Bright <newshound2 digitalmars.com> writes:

On 5/24/2020 8:10 PM, Andrei Alexandrescu wrote:
 I see, thanks.

So we don't waste time on this point again:

https://github.com/dlang/phobos/pull/7497

May 25 2020

Adam D. Ruppe <destructionator gmail.com> writes:

I ran a test to compare the two implementations. First was 
comparing 10,000 references to the same instance.

0.47user 0.11system 0:00.58elapsed 100%CPU (0avgtext+0avgdata 
399688maxresident)k

No significant difference between implementations. This is 
expected since it caches based on arguments.

Second test compared Phobos' impl vs the simple impl but this 
time with 10,000 different random instantiations: 
http://arsdnet.net/dcode/ala.d

I'd comment one impl out, uncomment the other, rerun 
/usr/bin/test dmd ala.d as well as dmd -c ala.d.

                simple impl     phobos impl
compile time:         3.1s            3.7s
dmd peak RAM:         1.5G            1.9G
  binary size:         6.2M            6.2M

The generated files (bin and object) are the same between them, 
but the simple implementation beats the Phobos impl in both 
compile time and dmd memory.

Note that all the user-visible static maps had 9 arguments, but 
the two separate implementations do different things.

It is possible my test is flawed, I encourage you to create your 
own and see what happens.


I also tried a string mixin version inside the static map 
template and it took 4.0s and used 1.2G of memory. Win on RAM, 
but lost on speed. (It was not the CTFE that hit it - that got 
cached - it was the mixin being run on each unique argument list).

And a hand-written static map because the number of arguments is 
known ahead of time: 1.4s compile, 0.7G memory. Absolutely 
crushed it. It might be worth seeing what lengths of static map 
are most common in the wild and just having hand expansions. 
Adding static if has a cost though (upped my test to 1.5s/720MB). 
But not as high as template constraints (1.5s/740MB). But this is 
all getting increasingly artificial.

So I am willing to say number of template instantiations is the 
probable key metric.


All this was done on a stock DMD64 D Compiler v2.091.1 on linux. 
Several enhancements were merged to dmd master recently that may 
change the results btw.

quick test: simple impl: 2s, 1.57G. phobos impl: 2.3s, 1.97G

hand impl: 1.1s, 725M. w/ static if: 1.2s, 775M

Decent time improvement across the board... but the binary got 
bigger at 6.4M :( (prolly druntime's fault more than anything 
else but idk, im not focusing on that rn)

May 24 2020

Walter Bright <newshound2 digitalmars.com> writes:

On 5/24/2020 8:19 PM, Adam D. Ruppe wrote:
 I ran a test to compare the two implementations. First was comparing 10,000 
 references to the same instance.

Thank you. This is good information. I'm curious how the new staticMap compares:

https://github.com/dlang/phobos/pull/7490

May 24 2020

Adam D. Ruppe <destructionator gmail.com> writes:

On Monday, 25 May 2020 at 05:55:07 UTC, Walter Bright wrote:
 Thank you. This is good information. I'm curious how the new 
 staticMap compares:

 https://github.com/dlang/phobos/pull/7490

Using dmd master: 1.8s 1.2G
Using released dmd: 2.1s, 1.1G

(I think something in dmd master has a higher fixed memory cost, 
so remember to compare apples and apples.)

So yes, the static if version is an improvement over what it has 
now, at least for these cases. We should probably make another 
test rig to see more variety...

May 25 2020

Walter Bright <newshound2 digitalmars.com> writes:

On 5/25/2020 6:27 AM, Adam D. Ruppe wrote:
 Using dmd master: 1.8s 1.2G
 Using released dmd: 2.1s, 1.1G

A 15% speed win is a big deal! Especially for such a simple change.

May 25 2020

Stefan Koch <uplink.coder googlemail.com> writes:

On Monday, 25 May 2020 at 21:25:33 UTC, Walter Bright wrote:
 On 5/25/2020 6:27 AM, Adam D. Ruppe wrote:
 Using dmd master: 1.8s 1.2G
 Using released dmd: 2.1s, 1.1G

 A 15% speed win is a big deal! Especially for such a simple 
 change.

How much of a deal is 100% ? Enough to talk about including ... ?

May 25 2020

Nick Treleaven <nick geany.org> writes:

On Monday, 25 May 2020 at 21:29:00 UTC, Stefan Koch wrote:
 How much of a deal is 100% ? Enough to talk about including ... 
 ?

How would a type function compare to S... ? Ideally we'd just use 
a type function.

May 27 2020

Stefan Koch <uplink.coder googlemail.com> writes:

On Wednesday, 27 May 2020 at 19:05:51 UTC, Nick Treleaven wrote:
 On Monday, 25 May 2020 at 21:29:00 UTC, Stefan Koch wrote:
 How much of a deal is 100% ? Enough to talk about including 
 ... ?

 How would a type function compare to S... ? Ideally we'd just 
 use a type function.

It's gonna be a while until they can be used.
For that usecase you couldn't use a typefunction at all.
type functions don't work on values.

May 27 2020

Stefan Koch <uplink.coder googlemail.com> writes:

On Wednesday, 27 May 2020 at 20:20:37 UTC, Stefan Koch wrote:
 On Wednesday, 27 May 2020 at 19:05:51 UTC, Nick Treleaven wrote:
 On Monday, 25 May 2020 at 21:29:00 UTC, Stefan Koch wrote:
 How much of a deal is 100% ? Enough to talk about including 
 ... ?

 How would a type function compare to S... ? Ideally we'd just 
 use a type function.

 It's gonna be a while until they can be used.
 For that usecase you couldn't use a typefunction at all.
 type functions don't work on values.

Assuming you mean the benchmark I posted.

May 27 2020

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/25/20 1:55 AM, Walter Bright wrote:
 On 5/24/2020 8:19 PM, Adam D. Ruppe wrote:
 I ran a test to compare the two implementations. First was comparing 
 10,000 references to the same instance.

 
 Thank you. This is good information. I'm curious how the new staticMap 
 compares:
 
 https://github.com/dlang/phobos/pull/7490

This is reasonable. I've seen a *lot* worse in optimizing libraries. A 
*LOT*. Another possible improvement: binary search through the special 
cases.

May 25 2020

Stefan Koch <uplink.coder googlemail.com> writes:

On Monday, 25 May 2020 at 05:55:07 UTC, Walter Bright wrote:
 On 5/24/2020 8:19 PM, Adam D. Ruppe wrote:
 I ran a test to compare the two implementations. First was 
 comparing 10,000 references to the same instance.

 Thank you. This is good information. I'm curious how the new 
 staticMap compares:

 https://github.com/dlang/phobos/pull/7490

This has brought down the compile time of my test indeed.
Almost but not quite down to `...`

Let me post my test.

I measured that for my test your staticMap is 1.2x faster than 
the regular the version before the pull.

--
   './dmd.sh sm.d -c -version=Walter' ran
     1.21 ± 0.10 times faster than './dmd.sh sm.d -c'
--


I measured that our `...` is about 1.5x faster than  than your 
improved staticMap

--
   './dmd.sh sm.d -c -version=DotDotDot' ran
     1.49 ± 0.18 times faster than './dmd.sh sm.d -c 
-version=Walter'
--

Which should make our version about 2 times faster than the 
baseline
and indeed

--
   './dmd.sh sm.d -c -version=DotDotDot' ran
     2.05 ± 0.23 times faster than './dmd.sh sm.d -c'
--



The source used file used for the benchmark is here:
https://gist.github.com/UplinkCoder/9772f27cf088f08901b44fb57935aaaf

The dmd used is at commit 2f23e3348e9a927c025637a4852d4d77beb997d3
which is on manu's fork of the compiler.

May 25 2020

Adam D. Ruppe <destructionator gmail.com> writes:

On Monday, 25 May 2020 at 15:59:43 UTC, Stefan Koch wrote:
 The dmd used is at commit 
 2f23e3348e9a927c025637a4852d4d77beb997d3
 which is on manu's fork of the compiler.

Does that include the other shortcut prs? I wonder if they would 
further open or close the gap

May 25 2020

Stefan Koch <uplink.coder googlemail.com> writes:

On Monday, 25 May 2020 at 20:57:07 UTC, Adam D. Ruppe wrote:
 On Monday, 25 May 2020 at 15:59:43 UTC, Stefan Koch wrote:
 The dmd used is at commit 
 2f23e3348e9a927c025637a4852d4d77beb997d3
 which is on manu's fork of the compiler.

 Does that include the other shortcut prs? I wonder if they 
 would further open or close the gap

Open it I suspect.
The `...` version would be around 20x faster if it could create 
the new sequence as one thing.

May 25 2020

FeepingCreature <feepingcreature gmail.com> writes:

On Monday, 25 May 2020 at 03:19:51 UTC, Adam D. Ruppe wrote:
 All this was done on a stock DMD64 D Compiler v2.091.1 on 
 linux. Several enhancements were merged to dmd master recently 
 that may change the results btw.

 quick test: simple impl: 2s, 1.57G. phobos impl: 2.3s, 1.97G

 hand impl: 1.1s, 725M. w/ static if: 1.2s, 775M

 Decent time improvement across the board... but the binary got 
 bigger at 6.4M :( (prolly druntime's fault more than anything 
 else but idk, im not focusing on that rn)

Could you also try with this version, please?

template staticMap(alias F, T...)
{
     mixin(staticMapHelper!(T.length));
}

private enum staticMapHelper(size_t length) = 
staticMapHelperGen!(length);

private string staticMapHelperGen(size_t length)()
{
     string res = "alias staticMap = AliasSeq!(";
     static foreach (i; 0 .. length)
     {
         if (i) res ~= ", ";
         res ~= "F!(T[" ~ i.stringof ~ "])";
     }
     res ~= ");";
     return res;
}

static foreach and stringof were used so I didn't have to pull in 
`format` or `to!string`, which had issues in std.meta.

I'm seeing some improvement over 2.086 at least; it should be 
about equivalent to the hand-unrolled version in master.

May 24 2020

Max Samukha <maxsamukha gmail.com> writes:

On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:

 static foreach and stringof were used so I didn't have to pull 
 in `format` or `to!string`, which had issues in std.meta.

 I'm seeing some improvement over 2.086 at least; it should be 
 about equivalent to the hand-unrolled version in master.

How could you miss the state-of-the-art?) Preallocate the result 
string and use the string counter hack!
http://dpldocs.info/this-week-in-d/Blog.Posted_2020_05_11.html#dmd-and-static-foreach

May 25 2020

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/25/20 3:15 AM, Max Samukha wrote:
 On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:
 
 static foreach and stringof were used so I didn't have to pull in 
 `format` or `to!string`, which had issues in std.meta.

 I'm seeing some improvement over 2.086 at least; it should be about 
 equivalent to the hand-unrolled version in master.

 
 How could you miss the state-of-the-art?) Preallocate the result string 
 and use the string counter hack!
 http://dpldocs.info/this-week-in-d/Blog.Posted_2020_05_11.html#dm
-and-static-foreach 

static foreach is quadratic? Is this a matter of principle, or QoI?

May 25 2020

Stefan Koch <uplink.coder googlemail.com> writes:

On Monday, 25 May 2020 at 16:04:24 UTC, Andrei Alexandrescu wrote:
 On 5/25/20 3:15 AM, Max Samukha wrote:
 On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:
 
 static foreach and stringof were used so I didn't have to 
 pull in `format` or `to!string`, which had issues in std.meta.

 I'm seeing some improvement over 2.086 at least; it should be 
 about equivalent to the hand-unrolled version in master.

 
 How could you miss the state-of-the-art?) Preallocate the 
 result string and use the string counter hack!
 http://dpldocs.info/this-week-in-d/Blog.Posted_2020_05_11.html#dmd-and-static-foreach

 static foreach is quadratic? Is this a matter of principle, or 
 QoI?

Yes it is.
I did present this fact at the pre-event to Dconf 2018.

As for your other question it's both. The QoI is not stellar.
But it is fundamentally limited by having to create independent 
scopes.

--
Stefan

May 25 2020

Stefan Koch <uplink.coder googlemail.com> writes:

On Monday, 25 May 2020 at 16:07:56 UTC, Stefan Koch wrote:

 I did present this fact at the pre-event to Dconf 2018.

Could have been 19 ?

May 25 2020

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/25/20 12:07 PM, Stefan Koch wrote:
 On Monday, 25 May 2020 at 16:04:24 UTC, Andrei Alexandrescu wrote:
 On 5/25/20 3:15 AM, Max Samukha wrote:
 On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:

 static foreach and stringof were used so I didn't have to pull in 
 `format` or `to!string`, which had issues in std.meta.

 I'm seeing some improvement over 2.086 at least; it should be about 
 equivalent to the hand-unrolled version in master.

 How could you miss the state-of-the-art?) Preallocate the result 
 string and use the string counter hack!
 http://dpldocs.info/this-week-in-d/Blog.Posted_2020_05_11.html#dm
-and-static-foreach 

 static foreach is quadratic? Is this a matter of principle, or QoI?

 
 Yes it is.
 I did present this fact at the pre-event to Dconf 2018.
 
 As for your other question it's both. The QoI is not stellar.
 But it is fundamentally limited by having to create independent scopes.

How do you mean that? static foreach does not create a scope.

May 25 2020

Stefan Koch <uplink.coder googlemail.com> writes:

On Monday, 25 May 2020 at 18:33:13 UTC, Andrei Alexandrescu wrote:
 On 5/25/20 12:07 PM, Stefan Koch wrote:
 On Monday, 25 May 2020 at 16:04:24 UTC, Andrei Alexandrescu 
 wrote:
 On 5/25/20 3:15 AM, Max Samukha wrote:
 On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature 
 wrote:

 static foreach and stringof were used so I didn't have to 
 pull in `format` or `to!string`, which had issues in 
 std.meta.

 I'm seeing some improvement over 2.086 at least; it should 
 be about equivalent to the hand-unrolled version in master.

 How could you miss the state-of-the-art?) Preallocate the 
 result string and use the string counter hack!
 http://dpldocs.info/this-week-in-d/Blog.Posted_2020_05_11.html#dmd-and-static-foreach

 static foreach is quadratic? Is this a matter of principle, 
 or QoI?

 
 Yes it is.
 I did present this fact at the pre-event to Dconf 2018.
 
 As for your other question it's both. The QoI is not stellar.
 But it is fundamentally limited by having to create 
 independent scopes.

 How do you mean that? static foreach does not create a scope.

The scope I am talking about is what dmd calls Scope.
This has to do with compiler internals.
static foreach has a manifest variable as IV.
That means it introduces a static name.
That name for example I has to resolve to a different value on 
every iteration.
This violates the rules of the language outside of static foreach.
Therefore the a construct has to be introduces which allows 
special behavior
inside a foreach body.

May 25 2020

Adam D. Ruppe <destructionator gmail.com> writes:

On Monday, 25 May 2020 at 16:04:24 UTC, Andrei Alexandrescu wrote:
 static foreach is quadratic? Is this a matter of principle, or 
 QoI?

I have a hard time seeing why it HAS to be this way. Regular 
tuple foreach is better in most cases, string mixins are better 
in most cases (if written with appropriate care).

There might be some cases where it is fundamental - I just 
honestly don't know - and it is possible automatically 
differentiating those cases is impossible and/or more trouble 
than it is worth, but I continue to believe our primary problem 
is the implementation right now.

The problem is idk what the fix is. I spent a couple hours 
looking at the code a couple weeks ago and didn't find a 
solution. But that doesn't mean there isn't one.

May 25 2020

Timon Gehr <timon.gehr gmx.ch> writes:

On 25.05.20 18:04, Andrei Alexandrescu wrote:
 On 5/25/20 3:15 AM, Max Samukha wrote:
 On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:

 static foreach and stringof were used so I didn't have to pull in 
 `format` or `to!string`, which had issues in std.meta.

 I'm seeing some improvement over 2.086 at least; it should be about 
 equivalent to the hand-unrolled version in master.

 How could you miss the state-of-the-art?) Preallocate the result 
 string and use the string counter hack!
 http://dpldocs.info/this-week-in-d/Blog.Posted_2020_05_11.html#dm
-and-static-foreach 

 
 
 static foreach is quadratic? Is this a matter of principle, or QoI?
 

QoI. The original `static foreach` implementation in my compiler 
frontend runs in linear time, even though it is implemented in a very 
similar way. Note that even `iota(n).array` has superlinear running time 
with DMD's CTFE.

The implementation of `static foreach` in DMD is based on lowering, and 
the existing "tuple" foreach implementation. I suspect the culprit for 
bad performance for very large `static foreach` is a slow expression 
"tuple" implementation, but I am not sure.
Therefore, maybe performance could be improved by modifying the foreach 
expansion code such that it can accept a CTFE array instead of an 
expression tuple. One could also add a special-case implementation for 
range `static foreach` to work around inefficient CTFE.

May 25 2020

Stefan Koch <uplink.coder googlemail.com> writes:

On Monday, 25 May 2020 at 16:46:16 UTC, Timon Gehr wrote:
 On 25.05.20 18:04, Andrei Alexandrescu wrote:
 [...]

 QoI. The original `static foreach` implementation in my 
 compiler frontend runs in linear time, even though it is 
 implemented in a very similar way. Note that even 
 `iota(n).array` has superlinear running time with DMD's CTFE.

 [...]

What do you do about the scopes?

May 25 2020

Tove <tove fransson.se> writes:

On Monday, 25 May 2020 at 16:57:48 UTC, Stefan Koch wrote:
 On Monday, 25 May 2020 at 16:46:16 UTC, Timon Gehr wrote:
 On 25.05.20 18:04, Andrei Alexandrescu wrote:
 [...]

 QoI. The original `static foreach` implementation in my 
 compiler frontend runs in linear time, even though it is 
 implemented in a very similar way. Note that even 
 `iota(n).array` has superlinear running time with DMD's CTFE.

 [...]

 What do you do about the scopes?

Can't they be elided if the body contains no declarations?

May 25 2020

Timon Gehr <timon.gehr gmx.ch> writes:

On 25.05.20 18:57, Stefan Koch wrote:
 On Monday, 25 May 2020 at 16:46:16 UTC, Timon Gehr wrote:
 On 25.05.20 18:04, Andrei Alexandrescu wrote:
 [...]

 QoI. The original `static foreach` implementation in my compiler 
 frontend runs in linear time, even though it is implemented in a very 
 similar way. Note that even `iota(n).array` has superlinear running 
 time with DMD's CTFE.

 [...]

 
 What do you do about the scopes?

Nothing special. Why would I have to?

May 25 2020

Stefan Koch <uplink.coder googlemail.com> writes:

On Monday, 25 May 2020 at 17:16:12 UTC, Timon Gehr wrote:
 On 25.05.20 18:57, Stefan Koch wrote:
 On Monday, 25 May 2020 at 16:46:16 UTC, Timon Gehr wrote:
 On 25.05.20 18:04, Andrei Alexandrescu wrote:
 [...]

 QoI. The original `static foreach` implementation in my 
 compiler frontend runs in linear time, even though it is 
 implemented in a very similar way. Note that even 
 `iota(n).array` has superlinear running time with DMD's CTFE.

 [...]

 
 What do you do about the scopes?

 Nothing special. Why would I have to?

You need a manifest constant for you IV (induction variable), no?
Which would clash with the one previously introduced.

May 25 2020

Timon Gehr <timon.gehr gmx.ch> writes:

On 25.05.20 19:21, Stefan Koch wrote:
 On Monday, 25 May 2020 at 17:16:12 UTC, Timon Gehr wrote:
 On 25.05.20 18:57, Stefan Koch wrote:
 On Monday, 25 May 2020 at 16:46:16 UTC, Timon Gehr wrote:
 On 25.05.20 18:04, Andrei Alexandrescu wrote:
 [...]

 QoI. The original `static foreach` implementation in my compiler 
 frontend runs in linear time, even though it is implemented in a 
 very similar way. Note that even `iota(n).array` has superlinear 
 running time with DMD's CTFE.

 [...]

 What do you do about the scopes?

 Nothing special. Why would I have to?

 
 You need a manifest constant for you IV (induction variable), no?
 Which would clash with the one previously introduced.
 

Sure, so I create one forwarding scope per iteration. This is not 
particularly expensive, it's constant overhead per iteration.

May 25 2020

Stefan Koch <uplink.coder googlemail.com> writes:

On Monday, 25 May 2020 at 17:23:50 UTC, Timon Gehr wrote:
 On 25.05.20 19:21, Stefan Koch wrote:
 On Monday, 25 May 2020 at 17:16:12 UTC, Timon Gehr wrote:
 On 25.05.20 18:57, Stefan Koch wrote:
 On Monday, 25 May 2020 at 16:46:16 UTC, Timon Gehr wrote:
 [...]

 What do you do about the scopes?

 Nothing special. Why would I have to?

 
 You need a manifest constant for you IV (induction variable), 
 no?
 Which would clash with the one previously introduced.
 

 Sure, so I create one forwarding scope per iteration. This is 
 not particularly expensive, it's constant overhead per 
 iteration.

Since Forwarding scopes where not in the language before you 
introduced static if.
Could you give a quick explanation how they work?

May 25 2020

Timon Gehr <timon.gehr gmx.ch> writes:

On 25.05.20 19:25, Stefan Koch wrote:
 On Monday, 25 May 2020 at 17:23:50 UTC, Timon Gehr wrote:
 On 25.05.20 19:21, Stefan Koch wrote:
 On Monday, 25 May 2020 at 17:16:12 UTC, Timon Gehr wrote:
 On 25.05.20 18:57, Stefan Koch wrote:
 On Monday, 25 May 2020 at 16:46:16 UTC, Timon Gehr wrote:
 [...]

 What do you do about the scopes?

 Nothing special. Why would I have to?

 You need a manifest constant for you IV (induction variable), no?
 Which would clash with the one previously introduced.

 Sure, so I create one forwarding scope per iteration. This is not 
 particularly expensive, it's constant overhead per iteration.

 
 Since Forwarding scopes where not in the language before you introduced 
 static if.
 Could you give a quick explanation how they work?
 

It's a scope that forwards new symbols to its parent scope on insertion. 
This way, each iteration of the `static foreach` will have its own local 
scope with distinct loop variables, but any declarations that are 
generated in the `static foreach` body will skip this scope and instead 
populate the parent scope. (As an experiment, the original 
implementation also had `__local` declarations to insert symbols into 
the local scope without forwarding.)

May 25 2020

Stefan Koch <uplink.coder googlemail.com> writes:

On Monday, 25 May 2020 at 17:34:36 UTC, Timon Gehr wrote:
 On 25.05.20 19:25, Stefan Koch wrote:
 On Monday, 25 May 2020 at 17:23:50 UTC, Timon Gehr wrote:
 [...]

 
 Since Forwarding scopes where not in the language before you 
 introduced static if.
 Could you give a quick explanation how they work?
 

 It's a scope that forwards new symbols to its parent scope on 
 insertion. This way, each iteration of the `static foreach` 
 will have its own local scope with distinct loop variables, but 
 any declarations that are generated in the `static foreach` 
 body will skip this scope and instead populate the parent 
 scope. (As an experiment, the original implementation also had 
 `__local` declarations to insert symbols into the local scope 
 without forwarding.)

Thank a lot for the explanation!
I was wondering about this for a while.
That's a clever solution.

May 25 2020

Adam D. Ruppe <destructionator gmail.com> writes:

On Monday, 25 May 2020 at 17:34:36 UTC, Timon Gehr wrote:
 (As an experiment, the original implementation also had 
 `__local` declarations to insert symbols into the local scope 
 without forwarding.)

That would actually be really useful in some places. Like

static foreach(memberName; __traits(allMembers, f)) {
      __local alias member = __traits(getMember, f, memberName);
}

Though of course it is also possible to do such with staticMap in 
places.

May 25 2020

Stefan Koch <uplink.coder googlemail.com> writes:

On Monday, 25 May 2020 at 20:47:42 UTC, Adam D. Ruppe wrote:
 On Monday, 25 May 2020 at 17:34:36 UTC, Timon Gehr wrote:
 (As an experiment, the original implementation also had 
 `__local` declarations to insert symbols into the local scope 
 without forwarding.)

 That would actually be really useful in some places. Like

 static foreach(memberName; __traits(allMembers, f)) {
      __local alias member = __traits(getMember, f, memberName);
 }

 Though of course it is also possible to do such with staticMap 
 in places.

you want type functions :)
What you are doing there works by default since declarations are 
temporary.

May 25 2020

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/25/20 12:46 PM, Timon Gehr wrote:
 On 25.05.20 18:04, Andrei Alexandrescu wrote:
 On 5/25/20 3:15 AM, Max Samukha wrote:
 On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:

 static foreach and stringof were used so I didn't have to pull in 
 `format` or `to!string`, which had issues in std.meta.

 I'm seeing some improvement over 2.086 at least; it should be about 
 equivalent to the hand-unrolled version in master.

 How could you miss the state-of-the-art?) Preallocate the result 
 string and use the string counter hack!
 http://dpldocs.info/this-week-in-d/Blog.Posted_2020_05_11.html#dm
-and-static-foreach 



 static foreach is quadratic? Is this a matter of principle, or QoI?

 
 QoI. The original `static foreach` implementation in my compiler 
 frontend runs in linear time, even though it is implemented in a very 
 similar way. Note that even `iota(n).array` has superlinear running time 
 with DMD's CTFE.
 
 The implementation of `static foreach` in DMD is based on lowering, and 
 the existing "tuple" foreach implementation. I suspect the culprit for 
 bad performance for very large `static foreach` is a slow expression 
 "tuple" implementation, but I am not sure.
 Therefore, maybe performance could be improved by modifying the foreach 
 expansion code such that it can accept a CTFE array instead of an 
 expression tuple. One could also add a special-case implementation for 
 range `static foreach` to work around inefficient CTFE.

Thank you! It would be great if you could contribute or shepherd a 
contribution to dmd. Anything quadratic fails at scale.

May 25 2020

Walter Bright <newshound2 digitalmars.com> writes:

On 5/25/2020 9:46 AM, Timon Gehr wrote:
 QoI. The original `static foreach` implementation in my compiler frontend runs 
 in linear time, even though it is implemented in a very similar way. Note that 
 even `iota(n).array` has superlinear running time with DMD's CTFE.
 
 The implementation of `static foreach` in DMD is based on lowering, and the 
 existing "tuple" foreach implementation. I suspect the culprit for bad 
 performance for very large `static foreach` is a slow expression "tuple" 
 implementation, but I am not sure.
 Therefore, maybe performance could be improved by modifying the foreach 
 expansion code such that it can accept a CTFE array instead of an expression 
 tuple. One could also add a special-case implementation for range `static 
 foreach` to work around inefficient CTFE.

This is worth looking in to. I've had customers tell me they don't use static 
foreach due to performance problems.

May 25 2020

Max Samukha <maxsamukha gmail.com> writes:

On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:

 static foreach and stringof were used so I didn't have to pull 
 in `format` or `to!string`, which had issues in std.meta.

 I'm seeing some improvement over 2.086 at least; it should be 
 about equivalent to the hand-unrolled version in master.

State-of-the-art is to preallocate the result string and use the 
string counter hack)) 
http://dpldocs.info/this-week-in-d/Blog.Posted_2020_05_11.html#dmd-and-static-foreach

Stefan's work could make all this unnecessary but alas.

May 25 2020

Adam D. Ruppe <destructionator gmail.com> writes:

On Monday, 25 May 2020 at 07:26:52 UTC, Max Samukha wrote:
 State-of-the-art is to preallocate the result string and use 
 the string counter hack))

In this case, it doesn't matter much because the string is 
generated only once per unique length, then it is cached thanks 
to the template arg in the helper enum.

So the CTFE only actually happens once, however you do it this 
way.

The preallocation becomes important when the generated string 
becomes large because it avoids the compiler leaking 
intermediates and wasting a LOT of time realloc and copying again 
- when it is just a few hundred bytes like this it doesn't really 
matter.

When the CTFE expression finishes btw the compiler does free the 
memory pool. So even if you did this little thing over and over, 
it can be OK. But again when the generated string becomes very 
large, the peak memory can be prohibitive (it reaches tens of 
gigabytes of pure waste surprisingly quickly).

BTW on ctfe cache vs recalculate it depends on how much reuse you 
have. In my jni.d, the first version had an inline string replace 
in a loop. It took 30 GB of memory [!!!] and took several minutes 
to compile.

Simply moving that out and pregenerating a `static immutable` so 
CTFE was only invoked once, took it down to 3 GB and 25 seconds. 
Still slow, but huge, huge improvement.

But that's because there were only actually two variants of that 
string and it was fairly large. So caching made sense (static 
immutable btw is a bit better than a helper template holding the 
result - both cache but it is like a single variable vs a hash 
table; in fact that's how it is implemented). But if there were 
hundreds of variants instead of two, the cache itself would get 
very expensive.

May 25 2020

Max Samukha <maxsamukha gmail.com> writes:

On Monday, 25 May 2020 at 13:13:11 UTC, Adam D. Ruppe wrote:

 In this case, it doesn't matter much because the string is 
 generated only once per unique length, then it is cached thanks 
 to the template arg in the helper enum.

Yes, I overlooked the enum. My version of static map just calls 
the generating function directly.

 So the CTFE only actually happens once, however you do it this 
 way.
+ When the CTFE expression finishes btw the compiler does free
 the memory pool. So even if you did this little thing over and 
 over, it can be OK. But again when the generated string becomes 
 very large, the peak memory can be prohibitive (it reaches tens 
 of gigabytes of pure waste surprisingly quickly).

 BTW on ctfe cache vs recalculate it depends on how much reuse 
 you have. In my jni.d, the first version had an inline string 
 replace in a loop. It took 30 GB of memory [!!!] and took 
 several minutes to compile.

 Simply moving that out and pregenerating a `static immutable` 
 so CTFE was only invoked once, took it down to 3 GB and 25 
 seconds. Still slow, but huge, huge improvement.

 But that's because there were only actually two variants of 
 that string and it was fairly large. So caching made sense 
 (static immutable btw is a bit better than a helper template 
 holding the result - both cache but it is like a single 
 variable vs a hash table; in fact that's how it is implemented).
 But if there were hundreds of variants instead of two, the 
 cache itself would get very expensive.

Right. This is applicable to memoization in general.

May 25 2020

Adam D. Ruppe <destructionator gmail.com> writes:

On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:
 private enum staticMapHelper(size_t length) = 
 staticMapHelperGen!(length);

lol, also called it staticMapHelper when doing this myself!

Solid result though: 2s, 810MB.

I'm kinda surprised it beat the crap out of my mixin version.

Mine was 3s, 1.2G using a mixin expression (alias staticMap = 
mixin(helper()). Changing it to a mixin declaration (the alias is 
in the mixin, like yours) brought it to 2.2s, 924MB. Interesting. 
And changing the helper from a template to a function gives 
another small win: 2.2s but 906MB.


 static foreach and stringof were used so I didn't have to pull 
 in `format` or `to!string`, which had issues in std.meta.

Yeah, format and to!string are both surprisingly heavy. Just 
using to!string(int) even in runtime stuff has a large hit.

May 25 2020

Adam D. Ruppe <destructionator gmail.com> writes:

On Monday, 25 May 2020 at 13:00:59 UTC, Adam D. Ruppe wrote:
 Solid result though: 2s, 810MB.

That was using the new dmd master, more specifically it was 1.6s 
that I rounded up. With the released dmd, 1.9s, 750MB.

either way, the mixin helps here - but that may not be true if 
there was a variety of lengths. This test did fixed length 
because I wanted to specifically prove that arg length isn't as 
important as number of instantiations here...

we should prolly test a variety of lengths too, it could change a 
lot.

May 25 2020

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/25/20 9:00 AM, Adam D. Ruppe wrote:
 On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:
 private enum staticMapHelper(size_t length) = 
 staticMapHelperGen!(length);

 
 lol, also called it staticMapHelper when doing this myself!
 
 Solid result though: 2s, 810MB.
 
 I'm kinda surprised it beat the crap out of my mixin version.
 
 Mine was 3s, 1.2G using a mixin expression (alias staticMap = 
 mixin(helper()). Changing it to a mixin declaration (the alias is in the 
 mixin, like yours) brought it to 2.2s, 924MB. Interesting. And changing 
 the helper from a template to a function gives another small win: 2.2s 
 but 906MB.

So it seems a mixin-based version + Max's precalculated length = win!

May 25 2020

Max Samukha <maxsamukha gmail.com> writes:

On Monday, 25 May 2020 at 16:07:06 UTC, Andrei Alexandrescu wrote:

 So it seems a mixin-based version + Max's precalculated length 
 = win!

Precalculated length is Adam's, and the link is to his blog. I 
should have stated that clearly in my post. Sorry.

May 25 2020

Adam D. Ruppe <destructionator gmail.com> writes:

On Monday, 25 May 2020 at 17:08:32 UTC, Max Samukha wrote:
 Precalculated length is Adam's

I stole it from Stefan too IIRC.

The way I see it, it is all just D community knowledge. A lot of 
the things I know just come from helping other users through 
their problems too - they point out something that is cool or is 
broken and by working on it together we both learn.

May 25 2020

Max Samukha <maxsamukha gmail.com> writes:

On Monday, 25 May 2020 at 20:50:19 UTC, Adam D. Ruppe wrote:
 On Monday, 25 May 2020 at 17:08:32 UTC, Max Samukha wrote:
 Precalculated length is Adam's

 I stole it from Stefan too IIRC.

Preallocating buffers for CTFE is not a new idea, but the string 
counter hack was new to me.

 The way I see it, it is all just D community knowledge. A lot 
 of the things I know just come from helping other users through 
 their problems too - they point out something that is cool or 
 is broken and by working on it together we both learn.

Agree.

May 26 2020

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/25/20 2:18 AM, FeepingCreature wrote:
 On Monday, 25 May 2020 at 03:19:51 UTC, Adam D. Ruppe wrote:
 All this was done on a stock DMD64 D Compiler v2.091.1 on linux. 
 Several enhancements were merged to dmd master recently that may 
 change the results btw.

 quick test: simple impl: 2s, 1.57G. phobos impl: 2.3s, 1.97G

 hand impl: 1.1s, 725M. w/ static if: 1.2s, 775M

 Decent time improvement across the board... but the binary got bigger 
 at 6.4M :( (prolly druntime's fault more than anything else but idk, 
 im not focusing on that rn)

 
 Could you also try with this version, please?
 
 template staticMap(alias F, T...)
 {
      mixin(staticMapHelper!(T.length));
 }
 
 private enum staticMapHelper(size_t length) = staticMapHelperGen!(length);
 
 private string staticMapHelperGen(size_t length)()
 {
      string res = "alias staticMap = AliasSeq!(";
      static foreach (i; 0 .. length)
      {
          if (i) res ~= ", ";
          res ~= "F!(T[" ~ i.stringof ~ "])";
      }
      res ~= ");";
      return res;
 }

Simpler:

private string staticMapHelperGen(size_t length)()
{
     string res = "alias staticMap = AliasSeq!(";
     static foreach (i; 0 .. length)
         res ~= "F!(T[" ~ i.stringof ~ "]),";
     res ~= ");";
     return res;
}

May 25 2020

FeepingCreature <feepingcreature gmail.com> writes:

On Sunday, 24 May 2020 at 23:40:48 UTC, Andrei Alexandrescu wrote:
 Consider staticMap's current version (there are a few upcoming 
 changes that don't change the point):
 
...

 In the instantiation staticMap!(pred, A, B, C, D, E, F, G, H), 
 recursion will instantiate:

 staticMap!(pred, A, B, C, D)
 staticMap!(pred, A, B)
 staticMap!(pred, A)
 staticMap!(pred, B)
 staticMap!(pred, C, D)
 staticMap!(pred, C)
 staticMap!(pred, D)
 staticMap!(pred, E. F, G, H)
 staticMap!(pred, E, F)
 staticMap!(pred, E)
 staticMap!(pred, F)
 staticMap!(pred, G, H)
 staticMap!(pred, G)
 staticMap!(pred, H)

 Total: 15 instantiations including original. This is because a 
 full tree with 8 leaves is being created. Generally, a tree 
 with N leaves has about 2N - 1 nodes total (exactly so when N 
 is a power of 2).

 The more naive version would simply use linear decomposition:
 ...
 There' no more need to handle the 1-element case, which is 
 already a plus. But that's not the topic here. The 
 instantiations created are:

 staticMap!(pred, B, C, D, E, F, G, H)
 staticMap!(pred, C, D, E, F, G, H)
 staticMap!(pred, D, E, F, G, H)
 staticMap!(pred, E, F, G, H)
 staticMap!(pred, F, G, H)
 staticMap!(pred, G, H)
 staticMap!(pred, H)

 Total: 8 instantiations including original. That's about half 
 the current number of instantiations.

 So it seems divide et impera doesn't quite help with certain 
 recursive templates.

To add to the good previous points: the divide-and-conquer 
strategy can take advantage of caching for the leaf 
instantiations with one parameter. That is, 8 template 
instantiations can trivially be reused, and the others can 
sometimes be reused if they hit a common subtree. The linear 
instantiation can only be reused once or twice at the very end.

May 24 2020

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/25/20 1:59 AM, FeepingCreature wrote:
 On Sunday, 24 May 2020 at 23:40:48 UTC, Andrei Alexandrescu wrote:
 Consider staticMap's current version (there are a few upcoming changes 
 that don't change the point):

 ...

 In the instantiation staticMap!(pred, A, B, C, D, E, F, G, H), 
 recursion will instantiate:

 staticMap!(pred, A, B, C, D)
 staticMap!(pred, A, B)
 staticMap!(pred, A)
 staticMap!(pred, B)
 staticMap!(pred, C, D)
 staticMap!(pred, C)
 staticMap!(pred, D)
 staticMap!(pred, E. F, G, H)
 staticMap!(pred, E, F)
 staticMap!(pred, E)
 staticMap!(pred, F)
 staticMap!(pred, G, H)
 staticMap!(pred, G)
 staticMap!(pred, H)

 Total: 15 instantiations including original. This is because a full 
 tree with 8 leaves is being created. Generally, a tree with N leaves 
 has about 2N - 1 nodes total (exactly so when N is a power of 2).

 The more naive version would simply use linear decomposition:
 ...
 There' no more need to handle the 1-element case, which is already a 
 plus. But that's not the topic here. The instantiations created are:

 staticMap!(pred, B, C, D, E, F, G, H)
 staticMap!(pred, C, D, E, F, G, H)
 staticMap!(pred, D, E, F, G, H)
 staticMap!(pred, E, F, G, H)
 staticMap!(pred, F, G, H)
 staticMap!(pred, G, H)
 staticMap!(pred, H)

 Total: 8 instantiations including original. That's about half the 
 current number of instantiations.

 So it seems divide et impera doesn't quite help with certain recursive 
 templates.

 
 To add to the good previous points: the divide-and-conquer strategy can 
 take advantage of caching for the leaf instantiations with one 
 parameter. That is, 8 template instantiations can trivially be reused, 
 and the others can sometimes be reused if they hit a common subtree. The 
 linear instantiation can only be reused once or twice at the very end.

Not sure I understand. A,...,H are assumed to be distinct. The number of 
templates generated is the same regardless of memoization.

May 25 2020

Paul Backus <snarwin gmail.com> writes:

On Monday, 25 May 2020 at 15:59:57 UTC, Andrei Alexandrescu wrote:
 On 5/25/20 1:59 AM, FeepingCreature wrote:
 To add to the good previous points: the divide-and-conquer 
 strategy can take advantage of caching for the leaf 
 instantiations with one parameter. That is, 8 template 
 instantiations can trivially be reused, and the others can 
 sometimes be reused if they hit a common subtree. The linear 
 instantiation can only be reused once or twice at the very end.

 Not sure I understand. A,...,H are assumed to be distinct. The 
 number of templates generated is the same regardless of 
 memoization.

I think the idea is that you can potentially re-use template 
instances across separate calls to staticMap. So for example, if 
you have `staticMap!(A, B, C, D)` and `staticMap!(A, B, E, F)`, 
you get:

staticMap!(pred, A, B, C, D)
staticMap!(pred, A, B)
staticMap!(pred, A)
staticMap!(pred, B)
staticMap!(pred, C, D)
staticMap!(pred, C)
staticMap!(pred, D)

staticMap!(pred, A, B, E, F)
staticMap!(pred, E, F)
staticMap!(pred, E)
staticMap!(pred, F)

May 25 2020

johnsmith101 <johnsmith908412 gmail.com> writes:

On Monday, 25 May 2020 at 05:59:41 UTC, FeepingCreature wrote:
 On Sunday, 24 May 2020 at 23:40:48 UTC, Andrei Alexandrescu 
 wrote:
 Consider staticMap's current version (there are a few upcoming 
 changes that don't change the point):
 
...

 In the instantiation staticMap!(pred, A, B, C, D, E, F, G, H), 
 recursion will instantiate:

 staticMap!(pred, A, B, C, D)
 staticMap!(pred, A, B)
 staticMap!(pred, A)
 staticMap!(pred, B)
 staticMap!(pred, C, D)
 staticMap!(pred, C)
 staticMap!(pred, D)
 staticMap!(pred, E. F, G, H)
 staticMap!(pred, E, F)
 staticMap!(pred, E)
 staticMap!(pred, F)
 staticMap!(pred, G, H)
 staticMap!(pred, G)
 staticMap!(pred, H)

 Total: 15 instantiations including original. This is because a 
 full tree with 8 leaves is being created. Generally, a tree 
 with N leaves has about 2N - 1 nodes total (exactly so when N 
 is a power of 2).

 The more naive version would simply use linear decomposition:
 ...
 There' no more need to handle the 1-element case, which is 
 already a plus. But that's not the topic here. The 
 instantiations created are:

 staticMap!(pred, B, C, D, E, F, G, H)
 staticMap!(pred, C, D, E, F, G, H)
 staticMap!(pred, D, E, F, G, H)
 staticMap!(pred, E, F, G, H)
 staticMap!(pred, F, G, H)
 staticMap!(pred, G, H)
 staticMap!(pred, H)

 Total: 8 instantiations including original. That's about half 
 the current number of instantiations.

 So it seems divide et impera doesn't quite help with certain 
 recursive templates.

 To add to the good previous points: the divide-and-conquer 
 strategy can take advantage of caching for the leaf 
 instantiations with one parameter. That is, 8 template 
 instantiations can trivially be reused, and the others can 
 sometimes be reused if they hit a common subtree. The linear 
 instantiation can only be reused once or twice at the very end.

Thank u so much for taking time to help.

May 26 2020

Stefan Koch <uplink.coder googlemail.com> writes:

On Sunday, 24 May 2020 at 23:40:48 UTC, Andrei Alexandrescu wrote:
 Consider staticMap's current version (there are a few upcoming 
 changes that don't change the point):

 [...]

Manu's DIP does solve this problem.
In O(n) guaranteed.

May 25 2020

D Programming

C/C++ Programming

Other

digitalmars.D - Divide & Conquer divides, but doesn't conquer