digitalmars.D - Divide & Conquer divides, but doesn't conquer
- Andrei Alexandrescu (67/67) May 24 2020 Consider staticMap's current version (there are a few upcoming changes
- Timon Gehr (4/8) May 24 2020 Assuming the cost of a template instantiation is linear in the number of...
- Andrei Alexandrescu (45/54) May 24 2020 Of course. However, I'm assuming most of a template instantiation's cost...
- Timon Gehr (3/6) May 24 2020 I don't think this is a realistic assumption with the current compiler
- Andrei Alexandrescu (23/23) May 24 2020 Another one:
- Steven Schveighoffer (7/32) May 24 2020 Because it doesn't work for large sets unless you do this.
- Andrei Alexandrescu (2/38) May 24 2020 I see, thanks.
- Walter Bright (3/4) May 25 2020 So we don't waste time on this point again:
- Adam D. Ruppe (43/43) May 24 2020 I ran a test to compare the two implementations. First was
- Walter Bright (3/5) May 24 2020 Thank you. This is good information. I'm curious how the new staticMap c...
- Adam D. Ruppe (8/11) May 25 2020 Using dmd master: 1.8s 1.2G
- Walter Bright (2/4) May 25 2020 A 15% speed win is a big deal! Especially for such a simple change.
- Stefan Koch (2/7) May 25 2020 How much of a deal is 100% ? Enough to talk about including ... ?
- Nick Treleaven (3/5) May 27 2020 How would a type function compare to S... ? Ideally we'd just use
- Stefan Koch (4/9) May 27 2020 It's gonna be a while until they can be used.
- Stefan Koch (2/12) May 27 2020 Assuming you mean the benchmark I posted.
- Andrei Alexandrescu (4/12) May 25 2020 This is reasonable. I've seen a *lot* worse in optimizing libraries. A
- Stefan Koch (28/34) May 25 2020 This has brought down the compile time of my test indeed.
- Adam D. Ruppe (3/6) May 25 2020 Does that include the other shortcut prs? I wonder if they would
- Stefan Koch (4/10) May 25 2020 Open it I suspect.
- FeepingCreature (23/31) May 24 2020 Could you also try with this version, please?
- Max Samukha (4/8) May 25 2020 How could you miss the state-of-the-art?) Preallocate the result
- Andrei Alexandrescu (2/13) May 25 2020 static foreach is quadratic? Is this a matter of principle, or QoI?
- Stefan Koch (8/22) May 25 2020 Yes it is.
- Stefan Koch (2/3) May 25 2020 Could have been 19 ?
- Andrei Alexandrescu (2/24) May 25 2020 How do you mean that? static foreach does not create a scope.
- Stefan Koch (11/40) May 25 2020 The scope I am talking about is what dmd calls Scope.
- Adam D. Ruppe (12/14) May 25 2020 I have a hard time seeing why it HAS to be this way. Regular
- Timon Gehr (13/29) May 25 2020 QoI. The original `static foreach` implementation in my compiler
- Stefan Koch (2/9) May 25 2020 What do you do about the scopes?
- Tove (2/13) May 25 2020 Can't they be elided if the body contains no declarations?
- Timon Gehr (2/14) May 25 2020 Nothing special. Why would I have to?
- Stefan Koch (3/17) May 25 2020 You need a manifest constant for you IV (induction variable), no?
- Timon Gehr (3/23) May 25 2020 Sure, so I create one forwarding scope per iteration. This is not
- Stefan Koch (4/21) May 25 2020 Since Forwarding scopes where not in the language before you
- Timon Gehr (8/30) May 25 2020 It's a scope that forwards new symbols to its parent scope on insertion....
- Stefan Koch (4/20) May 25 2020 Thank a lot for the explanation!
- Adam D. Ruppe (7/10) May 25 2020 That would actually be really useful in some places. Like
- Stefan Koch (4/14) May 25 2020 you want type functions :)
- Andrei Alexandrescu (3/35) May 25 2020 Thank you! It would be great if you could contribute or shepherd a
- Walter Bright (3/15) May 25 2020 This is worth looking in to. I've had customers tell me they don't use s...
- Max Samukha (5/9) May 25 2020 State-of-the-art is to preallocate the result string and use the
- Adam D. Ruppe (30/32) May 25 2020 In this case, it doesn't matter much because the string is
- Max Samukha (4/28) May 25 2020 Yes, I overlooked the enum. My version of static map just calls
- Adam D. Ruppe (11/15) May 25 2020 lol, also called it staticMapHelper when doing this myself!
- Adam D. Ruppe (9/10) May 25 2020 That was using the new dmd master, more specifically it was 1.6s
- Andrei Alexandrescu (2/17) May 25 2020 So it seems a mixin-based version + Max's precalculated length = win!
- Max Samukha (3/5) May 25 2020 Precalculated length is Adam's, and the link is to his blog. I
- Adam D. Ruppe (6/7) May 25 2020 I stole it from Stefan too IIRC.
- Max Samukha (4/11) May 26 2020 Preallocating buffers for CTFE is not a new idea, but the string
- Andrei Alexandrescu (10/43) May 25 2020 Simpler:
- FeepingCreature (7/47) May 24 2020 To add to the good previous points: the divide-and-conquer
- Andrei Alexandrescu (3/55) May 25 2020 Not sure I understand. A,...,H are assumed to be distinct. The number of...
- Paul Backus (16/26) May 25 2020 I think the idea is that you can potentially re-use template
- johnsmith101 (2/57) May 26 2020 Thank u so much for taking time to help.
- Stefan Koch (3/6) May 25 2020 Manu's DIP does solve this problem.
Consider staticMap's current version (there are a few upcoming changes that don't change the point): template staticMap(alias F, T...) { static if (T.length == 0) { alias staticMap = AliasSeq!(); } else static if (T.length == 1) { alias staticMap = AliasSeq!(F!(T[0])); } else { alias staticMap = AliasSeq!( staticMap!(F, T[ 0 .. $/2]), staticMap!(F, T[$/2 .. $ ])); } } In the instantiation staticMap!(pred, A, B, C, D, E, F, G, H), recursion will instantiate: staticMap!(pred, A, B, C, D) staticMap!(pred, A, B) staticMap!(pred, A) staticMap!(pred, B) staticMap!(pred, C, D) staticMap!(pred, C) staticMap!(pred, D) staticMap!(pred, E. F, G, H) staticMap!(pred, E, F) staticMap!(pred, E) staticMap!(pred, F) staticMap!(pred, G, H) staticMap!(pred, G) staticMap!(pred, H) Total: 15 instantiations including original. This is because a full tree with 8 leaves is being created. Generally, a tree with N leaves has about 2N - 1 nodes total (exactly so when N is a power of 2). The more naive version would simply use linear decomposition: template staticMap(alias F, T...) { static if (T.length == 0) { alias staticMap = AliasSeq!(); } else { alias staticMap = AliasSeq!( F!(T[0]), staticMap!(F, T[1 .. $])); } } There' no more need to handle the 1-element case, which is already a plus. But that's not the topic here. The instantiations created are: staticMap!(pred, B, C, D, E, F, G, H) staticMap!(pred, C, D, E, F, G, H) staticMap!(pred, D, E, F, G, H) staticMap!(pred, E, F, G, H) staticMap!(pred, F, G, H) staticMap!(pred, G, H) staticMap!(pred, H) Total: 8 instantiations including original. That's about half the current number of instantiations. So it seems divide et impera doesn't quite help with certain recursive templates.
May 24 2020
On 25.05.20 01:40, Andrei Alexandrescu wrote:... So it seems divide et impera doesn't quite help with certain recursive templates.Assuming the cost of a template instantiation is linear in the number of template arguments, the first version has running time Θ(n log n) while the second version has running time Θ(n²).
May 24 2020
On 5/24/20 8:44 PM, Timon Gehr wrote:On 25.05.20 01:40, Andrei Alexandrescu wrote:Of course. However, I'm assuming most of a template instantiation's cost is fixed. There are other algorithms looking at in std.meta. Consider Reverse: template Reverse(TList...) { static if (TList.length <= 1) { alias Reverse = TList; } else { alias Reverse = AliasSeq!( Reverse!(TList[$/2 .. $ ]), Reverse!(TList[ 0 .. $/2])); } } The recurrence formula for complexity is: T(0) = k; T(1) = k; T(n) = 2 * T(n/2) + k Again, I assume instantiating AliasSeq with two lists of any lengths has the same cost k. That makes complexity linear. So is the complexity of the simpler implementation: template Reverse(TList...) { static if (TList.length <= 1) { alias Reverse = TList; } else { alias Reverse = AliasSeq!(Reverse!(TList[1 .. $]), TList[0]); } } But this insantiates half the templates, so it goes easier on resources. This makes the use of the halving strategy not only an unnecessary distraction, but a less efficient choice. There are other unnecessary (and/or unnecessarily baroque) artifacts in std.meta, such as EraseAllN. Far as I can tell it's unnecessary because NoDuplicates can be implemented without it (and if I'm not wrong, at the same complexity). std.meta is due for a serious review.... So it seems divide et impera doesn't quite help with certain recursive templates.Assuming the cost of a template instantiation is linear in the number of template arguments, the first version has running time Θ(n log n) while the second version has running time Θ(n²).
May 24 2020
On 25.05.20 03:25, Andrei Alexandrescu wrote:Again, I assume instantiating AliasSeq with two lists of any lengths has the same cost k.I don't think this is a realistic assumption with the current compiler implementation. Expression tuples are stored as arrays.
May 24 2020
Another one: template Filter(alias pred, TList...) { static if (TList.length == 0) { alias Filter = AliasSeq!(); } else static if (TList.length == 1) { static if (pred!(TList[0])) alias Filter = AliasSeq!(TList[0]); else alias Filter = AliasSeq!(); } else { alias Filter = AliasSeq!( Filter!(pred, TList[ 0 .. $/2]), Filter!(pred, TList[$/2 .. $ ])); } } Why cut the problem size in half if it doesn't improve complexity?
May 24 2020
On 5/24/20 9:48 PM, Andrei Alexandrescu wrote:Another one: template Filter(alias pred, TList...) { static if (TList.length == 0) { alias Filter = AliasSeq!(); } else static if (TList.length == 1) { static if (pred!(TList[0])) alias Filter = AliasSeq!(TList[0]); else alias Filter = AliasSeq!(); } else { alias Filter = AliasSeq!( Filter!(pred, TList[ 0 .. $/2]), Filter!(pred, TList[$/2 .. $ ])); } } Why cut the problem size in half if it doesn't improve complexity?Because it doesn't work for large sets unless you do this. If you do a "linear" recursive template, it fails after about 1000 levels. A divide-and-conquer will not get that deep. So in essence, it's a solution for a compiler restriction, not for performance. -Steve
May 24 2020
On 5/24/20 10:32 PM, Steven Schveighoffer wrote:On 5/24/20 9:48 PM, Andrei Alexandrescu wrote:I see, thanks.Another one: template Filter(alias pred, TList...) { static if (TList.length == 0) { alias Filter = AliasSeq!(); } else static if (TList.length == 1) { static if (pred!(TList[0])) alias Filter = AliasSeq!(TList[0]); else alias Filter = AliasSeq!(); } else { alias Filter = AliasSeq!( Filter!(pred, TList[ 0 .. $/2]), Filter!(pred, TList[$/2 .. $ ])); } } Why cut the problem size in half if it doesn't improve complexity?Because it doesn't work for large sets unless you do this. If you do a "linear" recursive template, it fails after about 1000 levels. A divide-and-conquer will not get that deep. So in essence, it's a solution for a compiler restriction, not for performance. -Steve
May 24 2020
On 5/24/2020 8:10 PM, Andrei Alexandrescu wrote:I see, thanks.So we don't waste time on this point again: https://github.com/dlang/phobos/pull/7497
May 25 2020
I ran a test to compare the two implementations. First was comparing 10,000 references to the same instance. 0.47user 0.11system 0:00.58elapsed 100%CPU (0avgtext+0avgdata 399688maxresident)k No significant difference between implementations. This is expected since it caches based on arguments. Second test compared Phobos' impl vs the simple impl but this time with 10,000 different random instantiations: http://arsdnet.net/dcode/ala.d I'd comment one impl out, uncomment the other, rerun /usr/bin/test dmd ala.d as well as dmd -c ala.d. simple impl phobos impl compile time: 3.1s 3.7s dmd peak RAM: 1.5G 1.9G binary size: 6.2M 6.2M The generated files (bin and object) are the same between them, but the simple implementation beats the Phobos impl in both compile time and dmd memory. Note that all the user-visible static maps had 9 arguments, but the two separate implementations do different things. It is possible my test is flawed, I encourage you to create your own and see what happens. I also tried a string mixin version inside the static map template and it took 4.0s and used 1.2G of memory. Win on RAM, but lost on speed. (It was not the CTFE that hit it - that got cached - it was the mixin being run on each unique argument list). And a hand-written static map because the number of arguments is known ahead of time: 1.4s compile, 0.7G memory. Absolutely crushed it. It might be worth seeing what lengths of static map are most common in the wild and just having hand expansions. Adding static if has a cost though (upped my test to 1.5s/720MB). But not as high as template constraints (1.5s/740MB). But this is all getting increasingly artificial. So I am willing to say number of template instantiations is the probable key metric. All this was done on a stock DMD64 D Compiler v2.091.1 on linux. Several enhancements were merged to dmd master recently that may change the results btw. quick test: simple impl: 2s, 1.57G. phobos impl: 2.3s, 1.97G hand impl: 1.1s, 725M. w/ static if: 1.2s, 775M Decent time improvement across the board... but the binary got bigger at 6.4M :( (prolly druntime's fault more than anything else but idk, im not focusing on that rn)
May 24 2020
On 5/24/2020 8:19 PM, Adam D. Ruppe wrote:I ran a test to compare the two implementations. First was comparing 10,000 references to the same instance.Thank you. This is good information. I'm curious how the new staticMap compares: https://github.com/dlang/phobos/pull/7490
May 24 2020
On Monday, 25 May 2020 at 05:55:07 UTC, Walter Bright wrote:Thank you. This is good information. I'm curious how the new staticMap compares: https://github.com/dlang/phobos/pull/7490Using dmd master: 1.8s 1.2G Using released dmd: 2.1s, 1.1G (I think something in dmd master has a higher fixed memory cost, so remember to compare apples and apples.) So yes, the static if version is an improvement over what it has now, at least for these cases. We should probably make another test rig to see more variety...
May 25 2020
On 5/25/2020 6:27 AM, Adam D. Ruppe wrote:Using dmd master: 1.8s 1.2G Using released dmd: 2.1s, 1.1GA 15% speed win is a big deal! Especially for such a simple change.
May 25 2020
On Monday, 25 May 2020 at 21:25:33 UTC, Walter Bright wrote:On 5/25/2020 6:27 AM, Adam D. Ruppe wrote:How much of a deal is 100% ? Enough to talk about including ... ?Using dmd master: 1.8s 1.2G Using released dmd: 2.1s, 1.1GA 15% speed win is a big deal! Especially for such a simple change.
May 25 2020
On Monday, 25 May 2020 at 21:29:00 UTC, Stefan Koch wrote:How much of a deal is 100% ? Enough to talk about including ... ?How would a type function compare to S... ? Ideally we'd just use a type function.
May 27 2020
On Wednesday, 27 May 2020 at 19:05:51 UTC, Nick Treleaven wrote:On Monday, 25 May 2020 at 21:29:00 UTC, Stefan Koch wrote:It's gonna be a while until they can be used. For that usecase you couldn't use a typefunction at all. type functions don't work on values.How much of a deal is 100% ? Enough to talk about including ... ?How would a type function compare to S... ? Ideally we'd just use a type function.
May 27 2020
On Wednesday, 27 May 2020 at 20:20:37 UTC, Stefan Koch wrote:On Wednesday, 27 May 2020 at 19:05:51 UTC, Nick Treleaven wrote:Assuming you mean the benchmark I posted.On Monday, 25 May 2020 at 21:29:00 UTC, Stefan Koch wrote:It's gonna be a while until they can be used. For that usecase you couldn't use a typefunction at all. type functions don't work on values.How much of a deal is 100% ? Enough to talk about including ... ?How would a type function compare to S... ? Ideally we'd just use a type function.
May 27 2020
On 5/25/20 1:55 AM, Walter Bright wrote:On 5/24/2020 8:19 PM, Adam D. Ruppe wrote:This is reasonable. I've seen a *lot* worse in optimizing libraries. A *LOT*. Another possible improvement: binary search through the special cases.I ran a test to compare the two implementations. First was comparing 10,000 references to the same instance.Thank you. This is good information. I'm curious how the new staticMap compares: https://github.com/dlang/phobos/pull/7490
May 25 2020
On Monday, 25 May 2020 at 05:55:07 UTC, Walter Bright wrote:On 5/24/2020 8:19 PM, Adam D. Ruppe wrote:This has brought down the compile time of my test indeed. Almost but not quite down to `...` Let me post my test. I measured that for my test your staticMap is 1.2x faster than the regular the version before the pull. -- './dmd.sh sm.d -c -version=Walter' ran 1.21 ± 0.10 times faster than './dmd.sh sm.d -c' -- I measured that our `...` is about 1.5x faster than than your improved staticMap -- './dmd.sh sm.d -c -version=DotDotDot' ran 1.49 ± 0.18 times faster than './dmd.sh sm.d -c -version=Walter' -- Which should make our version about 2 times faster than the baseline and indeed -- './dmd.sh sm.d -c -version=DotDotDot' ran 2.05 ± 0.23 times faster than './dmd.sh sm.d -c' -- The source used file used for the benchmark is here: https://gist.github.com/UplinkCoder/9772f27cf088f08901b44fb57935aaaf The dmd used is at commit 2f23e3348e9a927c025637a4852d4d77beb997d3 which is on manu's fork of the compiler.I ran a test to compare the two implementations. First was comparing 10,000 references to the same instance.Thank you. This is good information. I'm curious how the new staticMap compares: https://github.com/dlang/phobos/pull/7490
May 25 2020
On Monday, 25 May 2020 at 15:59:43 UTC, Stefan Koch wrote:The dmd used is at commit 2f23e3348e9a927c025637a4852d4d77beb997d3 which is on manu's fork of the compiler.Does that include the other shortcut prs? I wonder if they would further open or close the gap
May 25 2020
On Monday, 25 May 2020 at 20:57:07 UTC, Adam D. Ruppe wrote:On Monday, 25 May 2020 at 15:59:43 UTC, Stefan Koch wrote:Open it I suspect. The `...` version would be around 20x faster if it could create the new sequence as one thing.The dmd used is at commit 2f23e3348e9a927c025637a4852d4d77beb997d3 which is on manu's fork of the compiler.Does that include the other shortcut prs? I wonder if they would further open or close the gap
May 25 2020
On Monday, 25 May 2020 at 03:19:51 UTC, Adam D. Ruppe wrote:All this was done on a stock DMD64 D Compiler v2.091.1 on linux. Several enhancements were merged to dmd master recently that may change the results btw. quick test: simple impl: 2s, 1.57G. phobos impl: 2.3s, 1.97G hand impl: 1.1s, 725M. w/ static if: 1.2s, 775M Decent time improvement across the board... but the binary got bigger at 6.4M :( (prolly druntime's fault more than anything else but idk, im not focusing on that rn)Could you also try with this version, please? template staticMap(alias F, T...) { mixin(staticMapHelper!(T.length)); } private enum staticMapHelper(size_t length) = staticMapHelperGen!(length); private string staticMapHelperGen(size_t length)() { string res = "alias staticMap = AliasSeq!("; static foreach (i; 0 .. length) { if (i) res ~= ", "; res ~= "F!(T[" ~ i.stringof ~ "])"; } res ~= ");"; return res; } static foreach and stringof were used so I didn't have to pull in `format` or `to!string`, which had issues in std.meta. I'm seeing some improvement over 2.086 at least; it should be about equivalent to the hand-unrolled version in master.
May 24 2020
On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:static foreach and stringof were used so I didn't have to pull in `format` or `to!string`, which had issues in std.meta. I'm seeing some improvement over 2.086 at least; it should be about equivalent to the hand-unrolled version in master.How could you miss the state-of-the-art?) Preallocate the result string and use the string counter hack! http://dpldocs.info/this-week-in-d/Blog.Posted_2020_05_11.html#dmd-and-static-foreach
May 25 2020
On 5/25/20 3:15 AM, Max Samukha wrote:On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:static foreach is quadratic? Is this a matter of principle, or QoI?static foreach and stringof were used so I didn't have to pull in `format` or `to!string`, which had issues in std.meta. I'm seeing some improvement over 2.086 at least; it should be about equivalent to the hand-unrolled version in master.How could you miss the state-of-the-art?) Preallocate the result string and use the string counter hack! http://dpldocs.info/this-week-in-d/Blog.Posted_2020_05_11.html#dm -and-static-foreach
May 25 2020
On Monday, 25 May 2020 at 16:04:24 UTC, Andrei Alexandrescu wrote:On 5/25/20 3:15 AM, Max Samukha wrote:Yes it is. I did present this fact at the pre-event to Dconf 2018. As for your other question it's both. The QoI is not stellar. But it is fundamentally limited by having to create independent scopes. -- StefanOn Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:static foreach is quadratic? Is this a matter of principle, or QoI?static foreach and stringof were used so I didn't have to pull in `format` or `to!string`, which had issues in std.meta. I'm seeing some improvement over 2.086 at least; it should be about equivalent to the hand-unrolled version in master.How could you miss the state-of-the-art?) Preallocate the result string and use the string counter hack! http://dpldocs.info/this-week-in-d/Blog.Posted_2020_05_11.html#dmd-and-static-foreach
May 25 2020
On Monday, 25 May 2020 at 16:07:56 UTC, Stefan Koch wrote:I did present this fact at the pre-event to Dconf 2018.Could have been 19 ?
May 25 2020
On 5/25/20 12:07 PM, Stefan Koch wrote:On Monday, 25 May 2020 at 16:04:24 UTC, Andrei Alexandrescu wrote:How do you mean that? static foreach does not create a scope.On 5/25/20 3:15 AM, Max Samukha wrote:Yes it is. I did present this fact at the pre-event to Dconf 2018. As for your other question it's both. The QoI is not stellar. But it is fundamentally limited by having to create independent scopes.On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:static foreach is quadratic? Is this a matter of principle, or QoI?static foreach and stringof were used so I didn't have to pull in `format` or `to!string`, which had issues in std.meta. I'm seeing some improvement over 2.086 at least; it should be about equivalent to the hand-unrolled version in master.How could you miss the state-of-the-art?) Preallocate the result string and use the string counter hack! http://dpldocs.info/this-week-in-d/Blog.Posted_2020_05_11.html#dm -and-static-foreach
May 25 2020
On Monday, 25 May 2020 at 18:33:13 UTC, Andrei Alexandrescu wrote:On 5/25/20 12:07 PM, Stefan Koch wrote:The scope I am talking about is what dmd calls Scope. This has to do with compiler internals. static foreach has a manifest variable as IV. That means it introduces a static name. That name for example I has to resolve to a different value on every iteration. This violates the rules of the language outside of static foreach. Therefore the a construct has to be introduces which allows special behavior inside a foreach body.On Monday, 25 May 2020 at 16:04:24 UTC, Andrei Alexandrescu wrote:How do you mean that? static foreach does not create a scope.On 5/25/20 3:15 AM, Max Samukha wrote:Yes it is. I did present this fact at the pre-event to Dconf 2018. As for your other question it's both. The QoI is not stellar. But it is fundamentally limited by having to create independent scopes.On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:static foreach is quadratic? Is this a matter of principle, or QoI?static foreach and stringof were used so I didn't have to pull in `format` or `to!string`, which had issues in std.meta. I'm seeing some improvement over 2.086 at least; it should be about equivalent to the hand-unrolled version in master.How could you miss the state-of-the-art?) Preallocate the result string and use the string counter hack! http://dpldocs.info/this-week-in-d/Blog.Posted_2020_05_11.html#dmd-and-static-foreach
May 25 2020
On Monday, 25 May 2020 at 16:04:24 UTC, Andrei Alexandrescu wrote:static foreach is quadratic? Is this a matter of principle, or QoI?I have a hard time seeing why it HAS to be this way. Regular tuple foreach is better in most cases, string mixins are better in most cases (if written with appropriate care). There might be some cases where it is fundamental - I just honestly don't know - and it is possible automatically differentiating those cases is impossible and/or more trouble than it is worth, but I continue to believe our primary problem is the implementation right now. The problem is idk what the fix is. I spent a couple hours looking at the code a couple weeks ago and didn't find a solution. But that doesn't mean there isn't one.
May 25 2020
On 25.05.20 18:04, Andrei Alexandrescu wrote:On 5/25/20 3:15 AM, Max Samukha wrote:QoI. The original `static foreach` implementation in my compiler frontend runs in linear time, even though it is implemented in a very similar way. Note that even `iota(n).array` has superlinear running time with DMD's CTFE. The implementation of `static foreach` in DMD is based on lowering, and the existing "tuple" foreach implementation. I suspect the culprit for bad performance for very large `static foreach` is a slow expression "tuple" implementation, but I am not sure. Therefore, maybe performance could be improved by modifying the foreach expansion code such that it can accept a CTFE array instead of an expression tuple. One could also add a special-case implementation for range `static foreach` to work around inefficient CTFE.On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:static foreach is quadratic? Is this a matter of principle, or QoI?static foreach and stringof were used so I didn't have to pull in `format` or `to!string`, which had issues in std.meta. I'm seeing some improvement over 2.086 at least; it should be about equivalent to the hand-unrolled version in master.How could you miss the state-of-the-art?) Preallocate the result string and use the string counter hack! http://dpldocs.info/this-week-in-d/Blog.Posted_2020_05_11.html#dm -and-static-foreach
May 25 2020
On Monday, 25 May 2020 at 16:46:16 UTC, Timon Gehr wrote:On 25.05.20 18:04, Andrei Alexandrescu wrote:What do you do about the scopes?[...]QoI. The original `static foreach` implementation in my compiler frontend runs in linear time, even though it is implemented in a very similar way. Note that even `iota(n).array` has superlinear running time with DMD's CTFE. [...]
May 25 2020
On Monday, 25 May 2020 at 16:57:48 UTC, Stefan Koch wrote:On Monday, 25 May 2020 at 16:46:16 UTC, Timon Gehr wrote:Can't they be elided if the body contains no declarations?On 25.05.20 18:04, Andrei Alexandrescu wrote:What do you do about the scopes?[...]QoI. The original `static foreach` implementation in my compiler frontend runs in linear time, even though it is implemented in a very similar way. Note that even `iota(n).array` has superlinear running time with DMD's CTFE. [...]
May 25 2020
On 25.05.20 18:57, Stefan Koch wrote:On Monday, 25 May 2020 at 16:46:16 UTC, Timon Gehr wrote:Nothing special. Why would I have to?On 25.05.20 18:04, Andrei Alexandrescu wrote:What do you do about the scopes?[...]QoI. The original `static foreach` implementation in my compiler frontend runs in linear time, even though it is implemented in a very similar way. Note that even `iota(n).array` has superlinear running time with DMD's CTFE. [...]
May 25 2020
On Monday, 25 May 2020 at 17:16:12 UTC, Timon Gehr wrote:On 25.05.20 18:57, Stefan Koch wrote:You need a manifest constant for you IV (induction variable), no? Which would clash with the one previously introduced.On Monday, 25 May 2020 at 16:46:16 UTC, Timon Gehr wrote:Nothing special. Why would I have to?On 25.05.20 18:04, Andrei Alexandrescu wrote:What do you do about the scopes?[...]QoI. The original `static foreach` implementation in my compiler frontend runs in linear time, even though it is implemented in a very similar way. Note that even `iota(n).array` has superlinear running time with DMD's CTFE. [...]
May 25 2020
On 25.05.20 19:21, Stefan Koch wrote:On Monday, 25 May 2020 at 17:16:12 UTC, Timon Gehr wrote:Sure, so I create one forwarding scope per iteration. This is not particularly expensive, it's constant overhead per iteration.On 25.05.20 18:57, Stefan Koch wrote:You need a manifest constant for you IV (induction variable), no? Which would clash with the one previously introduced.On Monday, 25 May 2020 at 16:46:16 UTC, Timon Gehr wrote:Nothing special. Why would I have to?On 25.05.20 18:04, Andrei Alexandrescu wrote:What do you do about the scopes?[...]QoI. The original `static foreach` implementation in my compiler frontend runs in linear time, even though it is implemented in a very similar way. Note that even `iota(n).array` has superlinear running time with DMD's CTFE. [...]
May 25 2020
On Monday, 25 May 2020 at 17:23:50 UTC, Timon Gehr wrote:On 25.05.20 19:21, Stefan Koch wrote:Since Forwarding scopes where not in the language before you introduced static if. Could you give a quick explanation how they work?On Monday, 25 May 2020 at 17:16:12 UTC, Timon Gehr wrote:Sure, so I create one forwarding scope per iteration. This is not particularly expensive, it's constant overhead per iteration.On 25.05.20 18:57, Stefan Koch wrote:You need a manifest constant for you IV (induction variable), no? Which would clash with the one previously introduced.On Monday, 25 May 2020 at 16:46:16 UTC, Timon Gehr wrote:Nothing special. Why would I have to?[...]What do you do about the scopes?
May 25 2020
On 25.05.20 19:25, Stefan Koch wrote:On Monday, 25 May 2020 at 17:23:50 UTC, Timon Gehr wrote:It's a scope that forwards new symbols to its parent scope on insertion. This way, each iteration of the `static foreach` will have its own local scope with distinct loop variables, but any declarations that are generated in the `static foreach` body will skip this scope and instead populate the parent scope. (As an experiment, the original implementation also had `__local` declarations to insert symbols into the local scope without forwarding.)On 25.05.20 19:21, Stefan Koch wrote:Since Forwarding scopes where not in the language before you introduced static if. Could you give a quick explanation how they work?On Monday, 25 May 2020 at 17:16:12 UTC, Timon Gehr wrote:Sure, so I create one forwarding scope per iteration. This is not particularly expensive, it's constant overhead per iteration.On 25.05.20 18:57, Stefan Koch wrote:You need a manifest constant for you IV (induction variable), no? Which would clash with the one previously introduced.On Monday, 25 May 2020 at 16:46:16 UTC, Timon Gehr wrote:Nothing special. Why would I have to?[...]What do you do about the scopes?
May 25 2020
On Monday, 25 May 2020 at 17:34:36 UTC, Timon Gehr wrote:On 25.05.20 19:25, Stefan Koch wrote:Thank a lot for the explanation! I was wondering about this for a while. That's a clever solution.On Monday, 25 May 2020 at 17:23:50 UTC, Timon Gehr wrote:It's a scope that forwards new symbols to its parent scope on insertion. This way, each iteration of the `static foreach` will have its own local scope with distinct loop variables, but any declarations that are generated in the `static foreach` body will skip this scope and instead populate the parent scope. (As an experiment, the original implementation also had `__local` declarations to insert symbols into the local scope without forwarding.)[...]Since Forwarding scopes where not in the language before you introduced static if. Could you give a quick explanation how they work?
May 25 2020
On Monday, 25 May 2020 at 17:34:36 UTC, Timon Gehr wrote:(As an experiment, the original implementation also had `__local` declarations to insert symbols into the local scope without forwarding.)That would actually be really useful in some places. Like static foreach(memberName; __traits(allMembers, f)) { __local alias member = __traits(getMember, f, memberName); } Though of course it is also possible to do such with staticMap in places.
May 25 2020
On Monday, 25 May 2020 at 20:47:42 UTC, Adam D. Ruppe wrote:On Monday, 25 May 2020 at 17:34:36 UTC, Timon Gehr wrote:you want type functions :) What you are doing there works by default since declarations are temporary.(As an experiment, the original implementation also had `__local` declarations to insert symbols into the local scope without forwarding.)That would actually be really useful in some places. Like static foreach(memberName; __traits(allMembers, f)) { __local alias member = __traits(getMember, f, memberName); } Though of course it is also possible to do such with staticMap in places.
May 25 2020
On 5/25/20 12:46 PM, Timon Gehr wrote:On 25.05.20 18:04, Andrei Alexandrescu wrote:Thank you! It would be great if you could contribute or shepherd a contribution to dmd. Anything quadratic fails at scale.On 5/25/20 3:15 AM, Max Samukha wrote:QoI. The original `static foreach` implementation in my compiler frontend runs in linear time, even though it is implemented in a very similar way. Note that even `iota(n).array` has superlinear running time with DMD's CTFE. The implementation of `static foreach` in DMD is based on lowering, and the existing "tuple" foreach implementation. I suspect the culprit for bad performance for very large `static foreach` is a slow expression "tuple" implementation, but I am not sure. Therefore, maybe performance could be improved by modifying the foreach expansion code such that it can accept a CTFE array instead of an expression tuple. One could also add a special-case implementation for range `static foreach` to work around inefficient CTFE.On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:static foreach is quadratic? Is this a matter of principle, or QoI?static foreach and stringof were used so I didn't have to pull in `format` or `to!string`, which had issues in std.meta. I'm seeing some improvement over 2.086 at least; it should be about equivalent to the hand-unrolled version in master.How could you miss the state-of-the-art?) Preallocate the result string and use the string counter hack! http://dpldocs.info/this-week-in-d/Blog.Posted_2020_05_11.html#dm -and-static-foreach
May 25 2020
On 5/25/2020 9:46 AM, Timon Gehr wrote:QoI. The original `static foreach` implementation in my compiler frontend runs in linear time, even though it is implemented in a very similar way. Note that even `iota(n).array` has superlinear running time with DMD's CTFE. The implementation of `static foreach` in DMD is based on lowering, and the existing "tuple" foreach implementation. I suspect the culprit for bad performance for very large `static foreach` is a slow expression "tuple" implementation, but I am not sure. Therefore, maybe performance could be improved by modifying the foreach expansion code such that it can accept a CTFE array instead of an expression tuple. One could also add a special-case implementation for range `static foreach` to work around inefficient CTFE.This is worth looking in to. I've had customers tell me they don't use static foreach due to performance problems.
May 25 2020
On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:static foreach and stringof were used so I didn't have to pull in `format` or `to!string`, which had issues in std.meta. I'm seeing some improvement over 2.086 at least; it should be about equivalent to the hand-unrolled version in master.State-of-the-art is to preallocate the result string and use the string counter hack)) http://dpldocs.info/this-week-in-d/Blog.Posted_2020_05_11.html#dmd-and-static-foreach Stefan's work could make all this unnecessary but alas.
May 25 2020
On Monday, 25 May 2020 at 07:26:52 UTC, Max Samukha wrote:State-of-the-art is to preallocate the result string and use the string counter hack))In this case, it doesn't matter much because the string is generated only once per unique length, then it is cached thanks to the template arg in the helper enum. So the CTFE only actually happens once, however you do it this way. The preallocation becomes important when the generated string becomes large because it avoids the compiler leaking intermediates and wasting a LOT of time realloc and copying again - when it is just a few hundred bytes like this it doesn't really matter. When the CTFE expression finishes btw the compiler does free the memory pool. So even if you did this little thing over and over, it can be OK. But again when the generated string becomes very large, the peak memory can be prohibitive (it reaches tens of gigabytes of pure waste surprisingly quickly). BTW on ctfe cache vs recalculate it depends on how much reuse you have. In my jni.d, the first version had an inline string replace in a loop. It took 30 GB of memory [!!!] and took several minutes to compile. Simply moving that out and pregenerating a `static immutable` so CTFE was only invoked once, took it down to 3 GB and 25 seconds. Still slow, but huge, huge improvement. But that's because there were only actually two variants of that string and it was fairly large. So caching made sense (static immutable btw is a bit better than a helper template holding the result - both cache but it is like a single variable vs a hash table; in fact that's how it is implemented). But if there were hundreds of variants instead of two, the cache itself would get very expensive.
May 25 2020
On Monday, 25 May 2020 at 13:13:11 UTC, Adam D. Ruppe wrote:In this case, it doesn't matter much because the string is generated only once per unique length, then it is cached thanks to the template arg in the helper enum.Yes, I overlooked the enum. My version of static map just calls the generating function directly.So the CTFE only actually happens once, however you do it this way. + When the CTFE expression finishes btw the compiler does free the memory pool. So even if you did this little thing over and over, it can be OK. But again when the generated string becomes very large, the peak memory can be prohibitive (it reaches tens of gigabytes of pure waste surprisingly quickly). BTW on ctfe cache vs recalculate it depends on how much reuse you have. In my jni.d, the first version had an inline string replace in a loop. It took 30 GB of memory [!!!] and took several minutes to compile. Simply moving that out and pregenerating a `static immutable` so CTFE was only invoked once, took it down to 3 GB and 25 seconds. Still slow, but huge, huge improvement. But that's because there were only actually two variants of that string and it was fairly large. So caching made sense (static immutable btw is a bit better than a helper template holding the result - both cache but it is like a single variable vs a hash table; in fact that's how it is implemented). But if there were hundreds of variants instead of two, the cache itself would get very expensive.Right. This is applicable to memoization in general.
May 25 2020
On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:private enum staticMapHelper(size_t length) = staticMapHelperGen!(length);lol, also called it staticMapHelper when doing this myself! Solid result though: 2s, 810MB. I'm kinda surprised it beat the crap out of my mixin version. Mine was 3s, 1.2G using a mixin expression (alias staticMap = mixin(helper()). Changing it to a mixin declaration (the alias is in the mixin, like yours) brought it to 2.2s, 924MB. Interesting. And changing the helper from a template to a function gives another small win: 2.2s but 906MB.static foreach and stringof were used so I didn't have to pull in `format` or `to!string`, which had issues in std.meta.Yeah, format and to!string are both surprisingly heavy. Just using to!string(int) even in runtime stuff has a large hit.
May 25 2020
On Monday, 25 May 2020 at 13:00:59 UTC, Adam D. Ruppe wrote:Solid result though: 2s, 810MB.That was using the new dmd master, more specifically it was 1.6s that I rounded up. With the released dmd, 1.9s, 750MB. either way, the mixin helps here - but that may not be true if there was a variety of lengths. This test did fixed length because I wanted to specifically prove that arg length isn't as important as number of instantiations here... we should prolly test a variety of lengths too, it could change a lot.
May 25 2020
On 5/25/20 9:00 AM, Adam D. Ruppe wrote:On Monday, 25 May 2020 at 06:18:13 UTC, FeepingCreature wrote:So it seems a mixin-based version + Max's precalculated length = win!private enum staticMapHelper(size_t length) = staticMapHelperGen!(length);lol, also called it staticMapHelper when doing this myself! Solid result though: 2s, 810MB. I'm kinda surprised it beat the crap out of my mixin version. Mine was 3s, 1.2G using a mixin expression (alias staticMap = mixin(helper()). Changing it to a mixin declaration (the alias is in the mixin, like yours) brought it to 2.2s, 924MB. Interesting. And changing the helper from a template to a function gives another small win: 2.2s but 906MB.
May 25 2020
On Monday, 25 May 2020 at 16:07:06 UTC, Andrei Alexandrescu wrote:So it seems a mixin-based version + Max's precalculated length = win!Precalculated length is Adam's, and the link is to his blog. I should have stated that clearly in my post. Sorry.
May 25 2020
On Monday, 25 May 2020 at 17:08:32 UTC, Max Samukha wrote:Precalculated length is Adam'sI stole it from Stefan too IIRC. The way I see it, it is all just D community knowledge. A lot of the things I know just come from helping other users through their problems too - they point out something that is cool or is broken and by working on it together we both learn.
May 25 2020
On Monday, 25 May 2020 at 20:50:19 UTC, Adam D. Ruppe wrote:On Monday, 25 May 2020 at 17:08:32 UTC, Max Samukha wrote:Preallocating buffers for CTFE is not a new idea, but the string counter hack was new to me.Precalculated length is Adam'sI stole it from Stefan too IIRC.The way I see it, it is all just D community knowledge. A lot of the things I know just come from helping other users through their problems too - they point out something that is cool or is broken and by working on it together we both learn.Agree.
May 26 2020
On 5/25/20 2:18 AM, FeepingCreature wrote:On Monday, 25 May 2020 at 03:19:51 UTC, Adam D. Ruppe wrote:Simpler: private string staticMapHelperGen(size_t length)() { string res = "alias staticMap = AliasSeq!("; static foreach (i; 0 .. length) res ~= "F!(T[" ~ i.stringof ~ "]),"; res ~= ");"; return res; }All this was done on a stock DMD64 D Compiler v2.091.1 on linux. Several enhancements were merged to dmd master recently that may change the results btw. quick test: simple impl: 2s, 1.57G. phobos impl: 2.3s, 1.97G hand impl: 1.1s, 725M. w/ static if: 1.2s, 775M Decent time improvement across the board... but the binary got bigger at 6.4M :( (prolly druntime's fault more than anything else but idk, im not focusing on that rn)Could you also try with this version, please? template staticMap(alias F, T...) { mixin(staticMapHelper!(T.length)); } private enum staticMapHelper(size_t length) = staticMapHelperGen!(length); private string staticMapHelperGen(size_t length)() { string res = "alias staticMap = AliasSeq!("; static foreach (i; 0 .. length) { if (i) res ~= ", "; res ~= "F!(T[" ~ i.stringof ~ "])"; } res ~= ");"; return res; }
May 25 2020
On Sunday, 24 May 2020 at 23:40:48 UTC, Andrei Alexandrescu wrote:Consider staticMap's current version (there are a few upcoming changes that don't change the point): ... In the instantiation staticMap!(pred, A, B, C, D, E, F, G, H), recursion will instantiate: staticMap!(pred, A, B, C, D) staticMap!(pred, A, B) staticMap!(pred, A) staticMap!(pred, B) staticMap!(pred, C, D) staticMap!(pred, C) staticMap!(pred, D) staticMap!(pred, E. F, G, H) staticMap!(pred, E, F) staticMap!(pred, E) staticMap!(pred, F) staticMap!(pred, G, H) staticMap!(pred, G) staticMap!(pred, H) Total: 15 instantiations including original. This is because a full tree with 8 leaves is being created. Generally, a tree with N leaves has about 2N - 1 nodes total (exactly so when N is a power of 2). The more naive version would simply use linear decomposition: ... There' no more need to handle the 1-element case, which is already a plus. But that's not the topic here. The instantiations created are: staticMap!(pred, B, C, D, E, F, G, H) staticMap!(pred, C, D, E, F, G, H) staticMap!(pred, D, E, F, G, H) staticMap!(pred, E, F, G, H) staticMap!(pred, F, G, H) staticMap!(pred, G, H) staticMap!(pred, H) Total: 8 instantiations including original. That's about half the current number of instantiations. So it seems divide et impera doesn't quite help with certain recursive templates.To add to the good previous points: the divide-and-conquer strategy can take advantage of caching for the leaf instantiations with one parameter. That is, 8 template instantiations can trivially be reused, and the others can sometimes be reused if they hit a common subtree. The linear instantiation can only be reused once or twice at the very end.
May 24 2020
On 5/25/20 1:59 AM, FeepingCreature wrote:On Sunday, 24 May 2020 at 23:40:48 UTC, Andrei Alexandrescu wrote:Not sure I understand. A,...,H are assumed to be distinct. The number of templates generated is the same regardless of memoization.Consider staticMap's current version (there are a few upcoming changes that don't change the point): ... In the instantiation staticMap!(pred, A, B, C, D, E, F, G, H), recursion will instantiate: staticMap!(pred, A, B, C, D) staticMap!(pred, A, B) staticMap!(pred, A) staticMap!(pred, B) staticMap!(pred, C, D) staticMap!(pred, C) staticMap!(pred, D) staticMap!(pred, E. F, G, H) staticMap!(pred, E, F) staticMap!(pred, E) staticMap!(pred, F) staticMap!(pred, G, H) staticMap!(pred, G) staticMap!(pred, H) Total: 15 instantiations including original. This is because a full tree with 8 leaves is being created. Generally, a tree with N leaves has about 2N - 1 nodes total (exactly so when N is a power of 2). The more naive version would simply use linear decomposition: ... There' no more need to handle the 1-element case, which is already a plus. But that's not the topic here. The instantiations created are: staticMap!(pred, B, C, D, E, F, G, H) staticMap!(pred, C, D, E, F, G, H) staticMap!(pred, D, E, F, G, H) staticMap!(pred, E, F, G, H) staticMap!(pred, F, G, H) staticMap!(pred, G, H) staticMap!(pred, H) Total: 8 instantiations including original. That's about half the current number of instantiations. So it seems divide et impera doesn't quite help with certain recursive templates.To add to the good previous points: the divide-and-conquer strategy can take advantage of caching for the leaf instantiations with one parameter. That is, 8 template instantiations can trivially be reused, and the others can sometimes be reused if they hit a common subtree. The linear instantiation can only be reused once or twice at the very end.
May 25 2020
On Monday, 25 May 2020 at 15:59:57 UTC, Andrei Alexandrescu wrote:On 5/25/20 1:59 AM, FeepingCreature wrote:I think the idea is that you can potentially re-use template instances across separate calls to staticMap. So for example, if you have `staticMap!(A, B, C, D)` and `staticMap!(A, B, E, F)`, you get: staticMap!(pred, A, B, C, D) staticMap!(pred, A, B) staticMap!(pred, A) staticMap!(pred, B) staticMap!(pred, C, D) staticMap!(pred, C) staticMap!(pred, D) staticMap!(pred, A, B, E, F) staticMap!(pred, E, F) staticMap!(pred, E) staticMap!(pred, F)To add to the good previous points: the divide-and-conquer strategy can take advantage of caching for the leaf instantiations with one parameter. That is, 8 template instantiations can trivially be reused, and the others can sometimes be reused if they hit a common subtree. The linear instantiation can only be reused once or twice at the very end.Not sure I understand. A,...,H are assumed to be distinct. The number of templates generated is the same regardless of memoization.
May 25 2020
On Monday, 25 May 2020 at 05:59:41 UTC, FeepingCreature wrote:On Sunday, 24 May 2020 at 23:40:48 UTC, Andrei Alexandrescu wrote:Thank u so much for taking time to help.Consider staticMap's current version (there are a few upcoming changes that don't change the point): ... In the instantiation staticMap!(pred, A, B, C, D, E, F, G, H), recursion will instantiate: staticMap!(pred, A, B, C, D) staticMap!(pred, A, B) staticMap!(pred, A) staticMap!(pred, B) staticMap!(pred, C, D) staticMap!(pred, C) staticMap!(pred, D) staticMap!(pred, E. F, G, H) staticMap!(pred, E, F) staticMap!(pred, E) staticMap!(pred, F) staticMap!(pred, G, H) staticMap!(pred, G) staticMap!(pred, H) Total: 15 instantiations including original. This is because a full tree with 8 leaves is being created. Generally, a tree with N leaves has about 2N - 1 nodes total (exactly so when N is a power of 2). The more naive version would simply use linear decomposition: ... There' no more need to handle the 1-element case, which is already a plus. But that's not the topic here. The instantiations created are: staticMap!(pred, B, C, D, E, F, G, H) staticMap!(pred, C, D, E, F, G, H) staticMap!(pred, D, E, F, G, H) staticMap!(pred, E, F, G, H) staticMap!(pred, F, G, H) staticMap!(pred, G, H) staticMap!(pred, H) Total: 8 instantiations including original. That's about half the current number of instantiations. So it seems divide et impera doesn't quite help with certain recursive templates.To add to the good previous points: the divide-and-conquer strategy can take advantage of caching for the leaf instantiations with one parameter. That is, 8 template instantiations can trivially be reused, and the others can sometimes be reused if they hit a common subtree. The linear instantiation can only be reused once or twice at the very end.
May 26 2020
On Sunday, 24 May 2020 at 23:40:48 UTC, Andrei Alexandrescu wrote:Consider staticMap's current version (there are a few upcoming changes that don't change the point): [...]Manu's DIP does solve this problem. In O(n) guaranteed.
May 25 2020