
digitalmars.D.learn - join() in CTFE very low performance

realhet <real_het hotmail.com> writes:
Hello,

I have an array of arrays of strings (a 2D table) encapsulated in a struct.

The first few rows look like this.
```d
enum TBL_niceExpressionTemplates =
(表([
	[q{/+Note: Name+/},q{/+Note: Example+/},q{/+Note: Pattern+/},q{/+Note: op+/},q{/+Note: Style+/},q{/+Note: Syntax+/},q{/+Note: Class+/},q{/+Note: Scripts  init:  text  node  draw  ui+/}],
	[q{null_},q{},q{/+Code:+/},q{""},q{dim},q{Whitespace},q{NiceExpression},q{}],
	[q{magnitude},q{(magnitude(a))},q{/+Code: (op(expr))+/},q{"magnitude"},q{dim},q{Symbol},q{NiceExpression},q{ text: put(operator); op(0);  node: put('|'); op(0); put('|'); }],
	[q{normalize},q{(normalize(a))},q{/+Code: (op(expr))+/},q{"normalize"},q{dim},q{Symbol},q{NiceExpression},q{ text: put(operator); op(0);  node: put('‖'); op(0); put('‖'); }],
	...
]));
```
I have around 50 rows total, not much for a computer.

Then I use the 'table' and try to generate an actual static 
immutable array of runtime structs.

I use my own makeNiceExpressionTemplate(string[] args) function to convert those table rows into the structs used at runtime.

```d
static immutable niceExpressionTemplates = 
TBL_niceExpressionTemplates.rows.map!makeNiceExpressionTemplate.array;
```
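A minimal sketch of this fast path, assuming a tiny two-row table. `Table`, `Template`, and `makeTemplate` are hypothetical stand-ins for `表`, the runtime struct, and `makeNiceExpressionTemplate`; the real ones have more columns. The conversion runs entirely in CTFE and never goes through source text.

```d
import std.algorithm : map;
import std.array : array;
import std.stdio : writeln;

// Hypothetical stand-in for the 表 wrapper: just the table cells.
struct Table { string[][] rows; }

// Hypothetical runtime struct built from one table row.
struct Template { string name, example, op; }

// Converts one row of cells into a runtime struct; callable in CTFE.
Template makeTemplate(string[] cells)
{
    return Template(cells[0], cells[1], cells[2]);
}

enum tbl = Table([
    ["magnitude", "(magnitude(a))", `"magnitude"`],
    ["normalize", "(normalize(a))", `"normalize"`],
]);

// The initializer is evaluated at compile time; the cells stay as data.
static immutable templates = tbl.rows.map!makeTemplate.array;

void main()
{
    writeln(templates);
}
```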
This approach is perfectly fine: it must take under a millisecond, since I can't even see it in the ftime-trace output.

But when I try to put this table together using a string mixin, it becomes extremely slow:
```d
import std.algorithm : map;
import std.array : join;
import std.conv : text;

mixin(iq{static immutable niceExpressionTemplates =
[$(TBL_niceExpressionTemplates.rows.map!((r)=>(iq{makeNiceExpressionTemplate($(r.text))}.text)).join(','))]; }.text);
```

It took 2.6 seconds!!!

It concatenates 50+ strings like this one:
`makeNiceExpressionTemplate(["null_", "", "/+Code:+/", "\"\"", "dim", "Whitespace", "NiceExpression", ""]),`
into a single long string, wraps an array declaration 'container' around it, and then passes the result to mixin().
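To make the round trip concrete, here is a minimal sketch of the same mechanism on a two-row table: each row is turned back into D source with `std.conv.text`, the pieces are joined, and the result is fed to `mixin`. `Template`, `makeTemplate`, `rows`, and `code` are hypothetical names; this only demonstrates what the compiler is asked to do, it does not measure or explain the slowdown.

```d
import std.algorithm : map;
import std.array : join;
import std.conv : text;
import std.stdio : writeln;

// Hypothetical runtime struct and converter.
struct Template { string name, op; }
Template makeTemplate(string[] cells) { return Template(cells[0], cells[1]); }

enum string[][] rows = [["null_", `""`], ["magnitude", `"magnitude"`]];

// Step 1: each row becomes source text such as
//   makeTemplate(["null_", "\"\""])
// which means formatting a string[] with quotes and escapes at compile time.
enum string code = "static immutable templates = ["
    ~ rows.map!(r => "makeTemplate(" ~ r.text ~ ")").join(",")
    ~ "];";

// Step 2: the compiler parses that text again and evaluates it.
mixin(code);

void main()
{
    writeln(code);
    writeln(templates);
}
```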

Then the weird thing happens:
even though makeNiceExpressionTemplate() is always called with the same kind of parameter (a string array), the code for it is generated again and again, exactly as many times as the join() template is executed on it.

I narrowed down the code as much as possible:
```d
static immutable very_slow_operation = 
TBL_niceExpressionTemplates.rows.map!text.join(',');
```
It is a combination of text(), join(), and formatting string 
arrays to text.
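One possible workaround, untested for speed, is to skip `text()` for the string-array-to-source step and use a small hand-written quoter, so CTFE never has to go through the `std.format` machinery for it. `quote` and `rowToSource` are hypothetical helpers; they escape only what a D double-quoted literal strictly requires (backslash and double quote), since raw newlines and UTF-8 are legal inside such a literal.

```d
import std.stdio : writeln;

// Escapes a string as a D double-quoted literal. Only '\\' and '"' are
// escaped; all other bytes (including UTF-8 and newlines) are copied as-is.
string quote(string s)
{
    string r = `"`;
    foreach (char c; s)
    {
        if (c == '\\' || c == '"')
            r ~= '\\';
        r ~= c;
    }
    return r ~ `"`;
}

// Formats a string[] as a D array literal, e.g. ["a", "b"].
string rowToSource(string[] row)
{
    string r = "[";
    foreach (i, cell; row)
    {
        if (i) r ~= ", ";
        r ~= quote(cell);
    }
    return r ~ "]";
}

// Round-trip check: the generated literal parses back to the same data.
enum string[] sample = ["null_", `""`, `back\slash`, "‖"];
enum string src = rowToSource(sample);
static assert(mixin(src) == sample);

void main()
{
    writeln(src);
}
```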

My only question is: why?
What exactly should I avoid, and why does join() recompile its iterations from scratch every time?

Thank You in advance!
Jan 04
monkyyy <crazymonkyyy gmail.com> writes:
On Saturday, 4 January 2025 at 13:56:47 UTC, realhet wrote:
I'd check that it is a string and not some sort of lazy wrapper type getting the worst of both worlds. I usually use `enum string[]` for mixin-y things when possible. I don't know what style you're using, and the q{} may have some extra logic.
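A small sketch of the difference being suggested: a range pipeline stored in an untyped `enum` stays a lazy range type, while spelling out `string[]` forces the result to be materialized once (here with `.array`). `exclaim`, `lazyNames`, and `eagerNames` are made-up names for illustration.

```d
import std.algorithm : map;
import std.array : array;
import std.range.primitives : isInputRange;
import std.stdio : writeln;

// Hypothetical per-element transformation.
string exclaim(string s) { return s ~ "!"; }

// Untyped: this is a MapResult range; iterating it re-runs exclaim.
enum lazyNames = ["a", "b"].map!exclaim;

// Typed: the declaration only accepts a string[], so the result must be
// materialized with .array (without it, this line would not compile).
enum string[] eagerNames = ["a", "b"].map!exclaim.array;

static assert(isInputRange!(typeof(lazyNames)));
static assert(!is(typeof(lazyNames) == string[]));

void main()
{
    writeln(lazyNames, " ", eagerNames);
}
```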
Jan 04
realhet <real_het hotmail.com> writes:
On Saturday, 4 January 2025 at 15:34:27 UTC, monkyyy wrote:
 On Saturday, 4 January 2025 at 13:56:47 UTC, realhet wrote:
 Id check that it is a string and not some sort of lazy wrapper 
 type doing worse of both world things; I usually use `enum 
 string[]` for mixin-y things when possible, idk what style your 
 doing and the q{} may have some extra logic.
That style indeed makes no sense in text mode. Here's what it looks like in graphic mode: https://youtu.be/8brvCoMaWyQ

At the end of the video I try out 4 versions. The first one is super fast (--ftime-trace): it puts the table in the global scope, and the mixin just injects a simple transformation expression on it, so the string[][] table is transformed into a Struct[] at compile time.

The big difference is that the other 3 versions put the data onto the string surface of the mixin. They use the universal std text() template function to do that, and that works extremely slowly in the context of compile time. When I look at the --ftime-trace output, I see text() and formatValue() everywhere.

It feels like the CT version of text() does everything with the limited but safe tools of the CT environment, something like ctRegex. It looks like they discover their parameter signatures from scratch every time; they can't remember that they already compiled text(string[]), formatValue(string), etc. These CT things (text, format, regex) are awesome, but I guess I should avoid them while using string mixins.

Maybe those 'lazy wrappers' you mentioned can be inside text()?

My wrapper struct, named with a Chinese character, is really simple: `struct S{ string[][] rows; }`, whose first member is the table rows with the cells.

q{} has always worked perfectly; I have no fear of it. There are also no problems with the new goodies, iq{} and $(): I'm testing them like crazy, and I really like them. Only the very complex stuff behaves strangely at CT: text, format...
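The fast first version can be sketched roughly like this, assuming the table is a named compile-time value: the string mixin only contains a short expression that refers to the table by name, so none of the cells are converted back into text. `Table`, `Template`, `makeTemplate`, and `declareFrom` are hypothetical names.

```d
import std.algorithm : map;
import std.array : array;
import std.stdio : writeln;

struct Table { string[][] rows; }
struct Template { string name, op; }
Template makeTemplate(string[] cells) { return Template(cells[0], cells[1]); }

enum tbl = Table([["magnitude", "|"], ["normalize", "‖"]]);

// Builds a declaration that names the table instead of embedding its cells;
// the generated text has the same length no matter how many rows there are.
string declareFrom(string target, string tableName)
{
    return "static immutable " ~ target ~ " = "
        ~ tableName ~ ".rows.map!makeTemplate.array;";
}

mixin(declareFrom("templates", "tbl"));

void main()
{
    writeln(templates);
}
```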
Jan 04
realhet <real_het hotmail.com> writes:
On Saturday, 4 January 2025 at 19:54:19 UTC, realhet wrote:
 On Saturday, 4 January 2025 at 15:34:27 UTC, monkyyy wrote:
 On Saturday, 4 January 2025 at 13:56:47 UTC, realhet wrote:
Only the very complex stuff works weird in CT -> text, format...
I think I've found the solution. It was so simple that I wasn't able to see it :D

```d
mixin template INJECTOR_TEMPLATE(表 table, string script)
{
    mixin(script);
}
```

The proper way to pass a large amount of data through the 'membrane' between compile time and run time is a *mixin template*, NOT a string mixin combined with the safest and most platform-independent version of the text() function (for arrays and structs). With mixin template arguments, the data always stays in binary form; no slow textual form is needed.

It seems I like to learn the hard way. But it's so difficult to stop thinking the good old C preprocessor way. The classic leg-shooting way of thinking is still strong in me :D
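A self-contained sketch of how such a mixin template might be used, assuming a two-column table. `Table`, `Template`, `makeTemplate`, and `Injector` are hypothetical stand-ins for `表`, the runtime struct, `makeNiceExpressionTemplate`, and `INJECTOR_TEMPLATE`. The table arrives as a template value parameter, and the script string only names it.

```d
import std.algorithm : map;
import std.array : array;
import std.stdio : writeln;

// Hypothetical stand-in for the 表 wrapper struct.
struct Table { string[][] rows; }
struct Template { string name, op; }
Template makeTemplate(string[] cells) { return Template(cells[0], cells[1]); }

// Same shape as INJECTOR_TEMPLATE: the table is passed as a compile-time
// value, so the script can use it directly instead of re-parsing its cells.
mixin template Injector(Table table, string script)
{
    mixin(script);
}

enum tbl = Table([["magnitude", "|"], ["normalize", "‖"]]);

// The declaration inside the script becomes visible in this module.
mixin Injector!(tbl, q{
    static immutable templates = table.rows.map!makeTemplate.array;
});

void main()
{
    writeln(templates);
}
```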
Jan 04
monkyyy <crazymonkyyy gmail.com> writes:
On Saturday, 4 January 2025 at 19:54:19 UTC, realhet wrote:
 
 It looks like they discover their parameter signatures every 
 time from zero.

 Maybe those 'lazy wrappers' you mentioned can be inside text()?
While it would usually be correct to assume I'm being informal, in this case I wasn't: from functional programming, "lazy" vs. "eager" are formal terms in English; I don't know what they are in Chinese. That is the definition. I suggest adding "memoized" as well; for any computation you can have the following outcomes:

- lazy: may run 0 times or an infinite number of times; doesn't allocate
- eager: will run once; will always allocate
- memo: may run 0 times or once; may allocate

There are times when CT is worse when lazy, but the std is (and should stay) lazy by default, so I'd suggest a pattern of typing your enums, `enum string foo = ...`, to try to get an eager result.
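The lazy/eager difference can be checked by counting calls. The sketch below iterates a `map` range twice and compares it with the same pipeline materialized by `.array`; both functions are also evaluated at compile time via `enum`. `lazyCalls` and `eagerCalls` are made-up names, and the memoized variant is not covered.

```d
import std.algorithm : map;
import std.array : array;
import std.stdio : writeln;

// A lazy map range recomputes each element on every pass,
// so two passes over 3 elements call the lambda 6 times.
int lazyCalls()
{
    int calls;
    auto r = [1, 2, 3].map!((int x) { ++calls; return x * 2; });
    foreach (pass; 0 .. 2)
        foreach (e; r) {}
    return calls;
}

// .array evaluates each element once up front; later passes
// only read the stored results, so the lambda runs 3 times.
int eagerCalls()
{
    int calls;
    auto a = [1, 2, 3].map!((int x) { ++calls; return x * 2; }).array;
    foreach (pass; 0 .. 2)
        foreach (e; a) {}
    return calls;
}

// The same counts hold when evaluated at compile time.
enum ctLazy = lazyCalls();
enum ctEager = eagerCalls();

void main()
{
    writeln("lazy: ", ctLazy, ", eager: ", ctEager);
}
```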
Jan 04