
digitalmars.D - dconf dod parser is 20% slower theory

reply monkyyy <crazymonkyyy gmail.com> writes:
https://www.youtube.com/watch?v=s_1OG9GwyOw

https://github.com/burner/Darser/blob/dbe6269aba4abce3b0c6ac07d00da6c86ee7c758/presentation/example.d#L150

```d
struct Parser {
	Document[] documents;
	Definitions[] definitionss;
	Definition[] definitions;
	OperationDefinition[] operationDefinitions;
	SelectionSet[] selectionSets;
	OperationType[] operationTypes;
	Selections[] selectionss;
	Selection[] selections;
	FragmentSpread[] fragmentSpreads;
	InlineFragment[] inlineFragments;
	Field[] fields;
	FieldName[] fieldNames;
	Arguments[] argumentss;
	ArgumentList[] argumentLists;
	Argument[] arguments;
	FragmentDefinition[] fragmentDefinitions;
	Directives[] directivess;
	Directive[] directives;
	VariableDefinitions[] variableDefinitionss;
	VariableDefinitionList[] variableDefinitionLists;
	VariableDefinition[] variableDefinitions;
	Variable[] variables;
	DefaultValue[] defaultValues;
	ValueOrVariable[] valueOrVariables;
	Value[] values;
	Type[] types;
	// ... (excerpt; more node arrays follow in the linked file)
}
```

this isn't quite right for SoA

this is adding an extra layer of indirection

if you have some n² iteration, let's say:
```d
foreach(i; 0..selections.length){
	foreach(j; 0..fields.length){
		ref selection() => selections[i];
		ref field() => fields[j];
		foo(selection, field);
	}
}
```

The pointer math isn't being automagically done at *compile time* 
by the optimizer the way it would be here:


```d
struct Parser(int length){
   Selection[length] selections;
   Field[length] fields;
   // ...
}
```

(the slice version can't be fully optimized at compile time: the 
offset between selections[3] and fields[3] changes at runtime, while 
in the fixed-length struct it's a compile-time constant)
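
A tiny check of that offset claim (stand-in element types and a 
hypothetical `ParserSoA`, not the real node structs): inside the 
fixed-length struct the distance between the two arrays is a 
compile-time constant, so reaching fields[j] from the struct base is 
plain base-plus-known-offset math.

```d
// Stand-in element types, not the real parser nodes.
struct Selection { int id; }
struct Field     { int id; }

struct ParserSoA(int length)
{
	Selection[length] selections;
	Field[length]     fields;
}

// The gap between the two arrays is fixed at compile time:
static assert(ParserSoA!4.fields.offsetof - ParserSoA!4.selections.offsetof
	== 4 * Selection.sizeof);
```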

in the original code there's extra indirection, a bounds check 
that may not disappear (because `*.length == all the other *.length` 
is not guaranteed), etc.
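
To make the bounds-check point concrete, here is a minimal sketch 
(stand-in `Selection`/`Field` types and a hypothetical `visit`, not 
the real parser): with parallel slices the loop only proves the index 
against one length, so whether the second check disappears depends on 
the optimizer and on hints like the assert below.

```d
// Stand-in types and a hypothetical visit(), not the real parser.
struct Selection { int id; }
struct Field     { int id; }

void visit(ref Selection s, ref Field f) { /* ... */ }

void walkPairs(Selection[] selections, Field[] fields)
{
	// The loop only establishes i < selections.length; i < fields.length is a
	// separate fact, so its bounds check may survive unless the optimizer can
	// use this hint (or bounds checks are compiled out entirely).
	assert(selections.length == fields.length);

	foreach (i; 0 .. selections.length)
		visit(selections[i], fields[i]);
}
```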

I've given up on AoSoA, it probably needs compiler support, but 
that's the theory for a reason; I think at the lib level we should 
have a different target for making some bread-and-butter DOD patterns
Jul 14
parent reply Robert Schadek <rburners gmail.com> writes:
On Monday, 14 July 2025 at 08:01:42 UTC, monkyyy wrote:

Thank you for taking an interest.

I thought about this a bit more and I think one issue is also 
that stuff is loaded one L1 cache line at a time, and the .sizeof 
of each struct is a lot smaller than one L1 cache line, leading to 
waste. The parse tree being a tree will also likely require quite a 
bit of index/pointer chasing, resulting in more loads.
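
A rough illustration of the size mismatch (hypothetical node layouts, 
not the real Darser structs): a small node uses only a fraction of 
the 64-byte line that gets loaded for it, so a walk that hops between 
arrays wastes most of each load.

```d
// Hypothetical node layouts, not the real Darser structs.
enum cacheLineBytes = 64;

struct FieldName { string name; }                         // 16 bytes on 64-bit
struct Field     { size_t fieldNameIdx; size_t argsIdx; } // 16 bytes

pragma(msg, Field.sizeof);                  // node size, printed at compile time
pragma(msg, cacheLineBytes / Field.sizeof); // how many such nodes share one line
```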

That being said, what would be nice to try is to turn this

```d
struct Parser {
	Document[] documents;
	Definitions[] definitionss;
	Definition[] definitions;
	OperationDefinition[] operationDefinitions;
	SelectionSet[] selectionSets;
}
```

into

```d
import std.sumtype;

alias Nodes = SumType!(Document
, Definitions
, Definition
, OperationDefinition
, SelectionSet);

struct Parser {
   Nodes[] nodes;
}
```
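
For illustration, one way that single array could be consumed (a 
sketch, not code from the branch) is a plain front-to-back scan with 
std.sumtype.match, building on the Nodes/Parser definitions above:

```d
// Sketch only: assumes the Nodes/Parser definitions (and import std.sumtype)
// from the snippet above.
void walk(ref Parser p)
{
	foreach (ref node; p.nodes)   // one contiguous allocation, scanned in order
	{
		node.match!(
			(ref OperationDefinition op) { /* handle this node kind */ },
			(ref SelectionSet ss)        { /* ... */ },
			(_) { /* catch-all for the remaining kinds */ }
		);
	}
}
```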

The problem then is that the tree, turned into an array, should be 
accessed basically front to back to minimize cache misses. That 
might be difficult as it's a tree, but hey, there is always 
something. If you're interested, the parser generator branch that 
produces the individual structs is here: 
https://github.com/burner/Darser/tree/ddd_parser
Jul 18
parent monkyyy <crazymonkyyy gmail.com> writes:
On Friday, 18 July 2025 at 08:53:23 UTC, Robert Schadek wrote:
 
> SumType
> struct Parser {
>    Nodes[] nodes;
I'm still seeing an extra layer of indirection. Consider `int[100]`, 
`int[]` and `int[int]`, and the data 1->3, 3->2, 2->4, 4->(-1).

The ideal is `int[100]`: it will allocate either on the stack or in 
global scope as fast as possible, indexing it should always be 
inlinable, and I trust even the dumbest prefetcher to see "oh, a 1 
wants to see [3]".

`int[int]` is far from ideal, but it's a hash, maybe the prefetcher 
decides to untangle it, and I imagine you used it in places for your 
non-data-oriented way.

`int[]` has 3 pieces that need to be calculated at runtime: 
`int[].ptr`, `int[].length` and `int[3]`. The prefetcher has to 
a) think that "3 is probably in that", b) the asm has to keep an 
`int[].ptr` register around, taking up some cache line, c) the 
prefetcher has to be able to combine `(int[].ptr + 3*int.sizeof)`, 
which may as well be the hash of the hashtable, that's math with 
several steps, d) then a context swap to druntime's 
`assert(3 < int[].length)`, e) then swap back.

`int[]` is fast enough *as a range* because it only takes 2 steps to 
find [0], but after that it's just ++ing while checking a length, an 
ancient C for loop; the AAs are likely better for trees unless you go 
all the way.

You have a big old block of memory; put it in a big static array, or 
inline the void* math, for it to be data-oriented design. I'm 
struggling with getting it to work with, you know, none of my DIPs 
ever being accepted, especially none of the metaprogramming ones (my 
current best solution is opend-dependent); but I think you'd need a 
pattern of structs that accept pointer abstractions from an array:

```d
struct foo(P){
	// ...
	P left;
	P right;
}
struct array(alias T, int N){
	alias P = ...;
	T!P[N] data;
	// ...
}
array!(foo, 100) bar;
```
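
A fleshed-out guess at that pattern (hypothetical names, not code 
from the thread): the pointer abstraction is just an index into one 
static array, so the whole tree lives in a single block of memory and 
dereferencing stays plain, inlinable math.

```d
// Hypothetical sketch of the index-as-pointer pattern, not code from the thread.
struct Node(P)
{
	int value;
	P left  = P(-1);   // -1 stands in for null
	P right = P(-1);
}

struct Pool(alias T, int N)
{
	struct P { int i; }   // the "pointer" handed to T: just an index
	T!P[N] data;          // one contiguous block, no per-node allocations
	int used;

	P alloc()          { return P(used++); }
	ref T!P deref(P p) { return data[p.i]; }
}

void demo()
{
	Pool!(Node, 100) pool;
	auto root = pool.alloc();
	auto kid  = pool.alloc();
	pool.deref(root).left = kid;
	pool.deref(kid).value = 42;
}
```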
Jul 18