digitalmars.D - dconf dod parser is 20% slower theory
monkyyy (Jul 14):

https://www.youtube.com/watch?v=s_1OG9GwyOw
https://github.com/burner/Darser/blob/dbe6269aba4abce3b0c6ac07d00da6c86ee7c758/presentation/example.d#L150

```d
struct Parser {
    Document[] documents;
    Definitions[] definitionss;
    Definition[] definitions;
    OperationDefinition[] operationDefinitions;
    SelectionSet[] selectionSets;
    OperationType[] operationTypes;
    Selections[] selectionss;
    Selection[] selections;
    FragmentSpread[] fragmentSpreads;
    InlineFragment[] inlineFragments;
    Field[] fields;
    FieldName[] fieldNames;
    Arguments[] argumentss;
    ArgumentList[] argumentLists;
    Argument[] arguments;
    FragmentDefinition[] fragmentDefinitions;
    Directives[] directivess;
    Directive[] directives;
    VariableDefinitions[] variableDefinitionss;
    VariableDefinitionList[] variableDefinitionLists;
    VariableDefinition[] variableDefinitions;
    Variable[] variables;
    DefaultValue[] defaultValues;
    ValueOrVariable[] valueOrVariables;
    Value[] values;
    Type[] types;
}
```

This isn't quite right for SoA; this is adding an extra layer of indirection. If you have some n^2 iteration, let's say:

```d
foreach(i; 0 .. selections.length){
    foreach(j; 0 .. fields.length){
        ref selection() => selections[i];
        ref field() => fields[j];
        foo(selection, field);
    }
}
```

the pointer math isn't being automagically done at *compile time* by the optimizer in the way this would:

```d
struct Parser(int length){
    Selection[length] selections;
    Field[length] fields;
    ...
}
```

(the slice version can't be fully optimized at compile time because `selections[3].offset - fields[3].offset` changes at runtime). In the original code there's extra indirection, a bounds check that may not disappear (because `*.length == all other *.length` is not guaranteed), etc.

I've given up on AoSoA, it probably needs compiler support, but that's the theory for a reason; I think at the lib level we should have a different target for making some bread-and-butter DOD patterns.
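A minimal sketch of the fixed-length, single-struct SoA layout the post above argues for; `Selection`, `Field`, `SoaParser`, `capacity`, and `foo` are placeholder names invented for illustration, not the actual Darser types, and this is just one way the idea could look:

```d
// Placeholder node types standing in for the real parser structs.
struct Selection { int target; }
struct Field     { int name; }

struct SoaParser(int capacity)
{
    // All columns share one compile-time capacity, so the offset between
    // selections[i] and fields[i] is a compile-time constant within the struct.
    Selection[capacity] selections;
    Field[capacity]     fields;
    size_t              used; // one length for every column

    void process()
    {
        // One bound (`used`) covers every column; there are no per-slice
        // .ptr/.length loads into separately allocated blocks.
        foreach (i; 0 .. used)
            foreach (j; 0 .. used)
                foo(selections[i], fields[j]);
    }
}

void foo(ref Selection s, ref Field f) { /* placeholder work */ }

void main()
{
    SoaParser!100 p;  // one flat block on the stack, no GC allocation
    p.used = 3;
    p.process();
}
```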
Robert Schadek (Jul 18):

On Monday, 14 July 2025 at 08:01:42 UTC, monkyyy wrote:

Thank you for taking an interest. I thought about this a bit more, and I think one issue is also that stuff is loaded one L1 cache line at a time, and the .sizeof of each struct is a lot smaller than one L1 cache line, leading to waste. The parse tree being a tree will likely require quite a bit of index/pointer chasing, resulting in more loads.

That being said, what would be nice to try is to turn this:

```d
struct Parser {
    Document[] documents;
    Definitions[] definitionss;
    Definition[] definitions;
    OperationDefinition[] operationDefinitions;
    SelectionSet[] selectionSets;
}
```

into

```d
alias Nodes = SumType!(Document
    , Definitions
    , Definition
    , OperationDefinition
    , SelectionSet);

struct Parser {
    Nodes[] nodes;
}
```

The problem then is that the tree turned into an array should be accessed basically front to back to minimize cache misses. That might be difficult as it's a tree, but hey, there is always something.

If you're interested, the parser generator branch that gives the individual structs is here: https://github.com/burner/Darser/tree/ddd_parser
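A minimal sketch of the single-array `SumType` layout suggested above, using `std.sumtype`; `Document`, `Definition`, and their fields are placeholders rather than the real Darser node types:

```d
import std.stdio;
import std.sumtype;

struct Document   { int firstChild; } // child links as indices into `nodes`
struct Definition { int name; }

alias Node = SumType!(Document, Definition);

struct Parser
{
    Node[] nodes; // one contiguous allocation for every kind of node
}

void main()
{
    Parser p;
    p.nodes = [Node(Document(1)), Node(Definition(42))];

    // Walking the array front to back keeps the accesses sequential,
    // which is the cache-friendly order mentioned in the post.
    foreach (ref n; p.nodes)
        n.match!(
            (Document d)   => writeln("document, first child at ", d.firstChild),
            (Definition d) => writeln("definition ", d.name)
        );
}
```

Because every node lives in one `Node[]`, child links can be stored as indices into the same array, which is what makes a front-to-back traversal order possible at all.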
monkyyy (Jul 18):

On Friday, 18 July 2025 at 08:53:23 UTC, Robert Schadek wrote:
> [...] SumType [...]
> struct Parser { Nodes[] nodes; [...]

I'm still seeing an extra layer of indirection.

Consider `int[100]`, `int[]` and `int[int]`, and the data 1->3, 3->2, 2->4, 4->(-1).

The ideal is `int[100]`: it will allocate either on the stack or in global scope as fast as possible, indexing it should always be inlinable, and I trust even the dumbest prefetcher to see "oh, a 1 wants to see [3]".

`int[int]` is far from ideal, but maybe the prefetcher decides to untangle the hash, and I imagine you used it in places for your non-data-oriented way.

`int[]` has 3 pieces that need to be calculated at runtime: `int[].ptr`, `int[].length` and `int[3]`. a) The prefetcher has to think that "3 is probably in that"; b) the asm has to keep an `int[].ptr` register around, taking up some cache line; c) the prefetcher has to be able to combine `(int[].ptr + 3*int.sizeof)`, which may as well be the hash of the hashtable, that's math with several steps; d) then a context swap to druntime's `assert(3 < int[].length)`; e) then swap back.

---

`int[]` is fast enough *as a range* because it only takes 2 steps to find `[0]`, and then it's ++ing while checking a length, an ancient C for loop; the AAs are likely better for trees unless you go all the way.

---

You have a big old block of memory; put it in a big static array or inline the `void*` math for it to be data-oriented design.

I'm struggling with getting it to work with, you know, none of my DIPs ever being accepted, especially none of the metaprogramming ones (my current best solution is OpenD-dependent); but I think you'd need a pattern of structs that accept pointer abstractions from an array:

```d
struct foo(P){
    ...
    P left;
    P right;
}
struct array(alias T, int N){
    alias P = ...
    T!P[N] data;
    ...
}
array!(foo, 100) bar;
```
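One way the "pointer abstractions from an array" pattern sketched above could be filled in, assuming indices play the role of pointers; the names `Tree`, `Pool`, and `Handle` are invented for illustration, and this is not a proposed library API:

```d
// A node type parameterized on its "pointer" abstraction.
struct Tree(P)
{
    int value;
    P left;
    P right;
}

struct Pool(alias T, int N)
{
    // A "pointer" is just an index into the pool's static array.
    struct Handle
    {
        int index = -1;                      // -1 plays the role of null
        bool isNull() const { return index < 0; }
    }

    alias P = Handle;
    T!P[N] data;                             // one flat block, stack or static storage
    int used;

    P alloc() { return P(used++); }          // no capacity check; sketch only

    ref T!P deref(P p) { return data[p.index]; }
}

void main()
{
    Pool!(Tree, 100) pool;                   // 100 nodes, no heap allocation

    auto root = pool.alloc();
    auto kid  = pool.alloc();
    pool.deref(root).value = 1;
    pool.deref(root).left  = kid;
    pool.deref(kid).value  = 2;

    assert(pool.deref(pool.deref(root).left).value == 2);
}
```

Handles are plain ints, so nodes stay small, the whole pool is one block of memory, and "dereferencing" compiles down to base plus index times size, which is the inlinable indexing the post asks for.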