digitalmars.D.learn - Method template optimization that works in C++, but not very well D
- Vladimir (42/42) Feb 19 2014 I`m learning templates in D language and I want to thank the
- bearophile (35/59) Feb 19 2014 The L1 code cache is small and it's easy to create cache misses,
- Vladimir (8/11) Feb 19 2014 I apologize that I did not checked the style guide. I rewrote my
- =?UTF-8?B?QWxpIMOHZWhyZWxp?= (10/32) Feb 19 2014 Can you show the C++ code perhaps on a simpler task? I would expect D's
- Vladimir (17/28) Feb 19 2014 Yes, isUseField is generated at runtime. For example, user can
- Vladimir (2/3) Feb 19 2014 I forgot to say that if you want to compile this program with
- =?UTF-8?B?QWxpIMOHZWhyZWxp?= (27/29) Feb 20 2014 Notes:
- Vladimir (9/25) Feb 21 2014 This version of the program looks more graceful than mine, and
I`m learning templates in D language and I want to thank the creators of the language for this. There is an educational task: process table entries, if you know which column to use. Simple way to do this: int ProcessTable(.....) { int result = 0; foreach (ref row; table) result += ProccessRow(row, isUseField); return result; } int ProccessRow(.....) { .... foreach (i, value; TableRow.Fields) if (isUseField[i]) //doing something .... } In c++ I can create in compile-time special vesrion ProccessRow function for various combinations of the array "isUseField" values. This reduces the total number of run-time operations in most cases. I implemented such idea, but result confused me. I used several compilers: dmd(version: DMD32 D Compiler v2.063.2, options: -inline -release -O -noboundscheck), ldmd (version: the LLVM D compiler (67a1a3) based on DMD v2.063.2 and LLVM 3.2svn; options: -inline -release -O -noboundscheck) and gdc (version: 4.8.1, options: -O3 -march=native -frelease -fno-bounds-check). Result dmd : simple processing - 39 ms; template processing - 20 ms. Result ldmd: simple processing - 8 ms; template processing - 12 ms. Result gdc : simple processing - 8 ms; template processing - 14 ms. Where I'm wrong? How I can rewrite the program to make effect more evident? Or may be, this optimization is meaningless in D...? Full version my very simplified example such task: https://gist.github.com/Vladimir-Z/1a1755ce91cb0e7636b5 Thanks for your advice.
Feb 19 2014
Vladimir:Where I'm wrong? How I can rewrite the program to make effect more evident? Or may be, this optimization is meaningless in D...?The L1 code cache is small and it's easy to create cache misses, so increasing the code size leads to nonlinear changes in the run-time.https://gist.github.com/Vladimir-Z/1a1755ce91cb0e7636b5Small general suggestions for your code (untested!):enum TableSize = 5000000;=> In generale in D the name of functions and variabiles start with a lower case. enum tableSize = 5_000_000;struct Row //of table=> struct Row /// Row of table.enum NumberOptions = 1 << NumberField; // = pow(2, NumberField)=> enum NumberOptions = 2 ^^ numberField;uint [NumberField] Fields;=> uint[numberField] Fields;Row GenerateRow() { Row row; row.Fields[0] = rand() % 10; row.Fields[1] = (rand() % 1)*(rand() % 10); return row;=> Row generateRow() { return Row(uniform(0, 10), uniform(0, 2) * uniform(0, 10));int ProccessRow(in Row TableRow, ref bool[] isUseField) { int sum = 0; foreach (i, value; TableRow.Fields) { if (isUseField[i]) sum += value * value - i;=> int proccessRow(in Row TableRow, in bool[] isUseField) pure nothrow { int sum = 0; foreach (immutable i, immutable value; tableRow.fields) { if (isUseField[i]) sum += value ^^ 2 - i;template TypeTuple(T...) { alias T TypeTuple; }=> alias TypeTuple(Types...) = Types;ProccessRowPtr [Row.NumberOptions] filters;=> __gshared proccessRowPtr[row.numberOptions] filters; Bye, bearophile
Feb 19 2014
On Wednesday, 19 February 2014 at 15:40:41 UTC, bearophile wrote:... In generale in D the name of functions and variabiles start with a lower case....I apologize that I did not checked the style guide. I rewrote my example according yours recommendation. Thank you.alias TypeTuple(Types...) = Types;This does not work for me. Error: function declaration without return type. (Note that constructors are always named 'this') Error: no identifier for declarator TypeTuple(T...) Error: alias cannot have initializer
Feb 19 2014
On 02/19/2014 07:13 AM, Vladimir wrote:I`m learning templates in D language and I want to thank the creators of the language for this. There is an educational task: process table entries, if you know which column to use.[...]In c++ I can create in compile-time special vesrion ProccessRow function for various combinations of the array "isUseField" values. This reduces the total number of run-time operations in most cases.Can you show the C++ code perhaps on a simpler task? I would expect D's compile-time features to be superior to C++'s in all cases. (And I would like to be educated if it is not so. :) )I implemented such idea, but result confused me. I used several compilers: dmd(version: DMD32 D Compiler v2.063.2, options: -inline -release -O -noboundscheck), ldmd (version: the LLVM D compiler (67a1a3) based on DMD v2.063.2 and LLVM 3.2svn; options: -inline -release -O -noboundscheck) and gdc (version: 4.8.1, options: -O3 -march=native -frelease -fno-bounds-check). Result dmd : simple processing - 39 ms; template processing - 20 ms. Result ldmd: simple processing - 8 ms; template processing - 12 ms. Result gdc : simple processing - 8 ms; template processing - 14 ms. Where I'm wrong? How I can rewrite the program to make effect more evident?I would like to understand the request first. Neither ProccessRow() nor OptimizedProcessTable() take isUseField as template parameters. Besides, isUseField is generated at runtime. No, I don't understand. :)Or may be, this optimization is meaningless in D...?I doubt it.Full version my very simplified example such task: https://gist.github.com/Vladimir-Z/1a1755ce91cb0e7636b5 Thanks for your advice.Ali
Feb 19 2014
I would like to understand the request first. Neither ProccessRow() nor OptimizedProcessTable() take isUseField as template parameters. Besides, isUseField is generated at runtime. No, I don't understand. :)Yes, isUseField is generated at runtime. For example, user can select the column that he want to sort. And he chose the first column. This means that isUseField[0] = true; isUseField[1] = false. The function getIndex must return 1. And want to call compile-time generated function proccessRow_1 (..) { //do something only with the field1 without additional checks. } instead of full version proccessRow. I admit that too much simplified example.Can you show the C++ code perhaps on a simpler task? I would expect D's compile-time features to be superior to C++'s in all cases. (And I would like to be educated if it is not so. :) )My example is a simplification of this https://github.com/AlexeyAB/cpp_find_order/blob/06a80340ff403c4693bfe0ff8b80584f029c71a3/main.cpp. I gave a link to the most simple implementation of the three versions of the program. Code looked slightly horrible for me at first:)I also doubt it.Or may be, this optimization is meaningless in D...?I doubt it.Full version my very simplified example such task: https://gist.github.com/Vladimir-Z/1a1755ce91cb0e7636b5
Feb 19 2014
https://github.com/AlexeyAB/cpp_find_order/blob/06a80340ff403c4693bfe0ff8b80584f029c71a3/main.cpp.I forgot to say that if you want to compile this program with help gcc you must use options: -O3 -march=native -std=gnu++11
Feb 19 2014
On 02/19/2014 10:18 PM, Vladimir wrote:Notes: * It is "process", not "proccess". :) * isUseField is not used by proccessRowTemplate(). Removing it from the parameter list (and doing necessary modifications elsewhere) reduced the optimized time from 18ms to 15ms for me. * Applied ^^ operator for proccessRowTemplate as well. (No effect on performance.) * I liked the simple template syntax for proccessRowTemplate: int proccessRowTemplate(size_t optionVariant)(in Row table) { int sum = 0; foreach(size_t i; StaticRange!(Row.numberField)) { static if (optionVariant & 1<<i) { sum += table.Fields[i] ^^ 2 - i; } } return sum; } Results on my system after that change: Process table result: 142459540 Simple processing: 35 ms Process table result: 142459540 Template processing: 15 ms AliFull version my very simplified example such task: https://gist.github.com/Vladimir-Z/1a1755ce91cb0e7636b5
Feb 20 2014
On Thursday, 20 February 2014 at 22:36:58 UTC, Ali Çehreli wrote:int proccessRowTemplate(size_t optionVariant)(in Row table) { int sum = 0; foreach(size_t i; StaticRange!(Row.numberField)) { static if (optionVariant & 1<<i) { sum += table.Fields[i] ^^ 2 - i; } } return sum; }This version of the program looks more graceful than mine, and works faster. My example in github is updated.Results on my system after that change:... Simple processing: 35 ms ... Template processing: 15 msI got same result in my system for dmd. It is good. But for ldc and gdc the result is almost unchanged. May be there are more ideas why this optimization does not help in cases gdc and ldc? I looked at the llvm-ir code my example ... At first glance, there is exactly what it should be. In any case, thank you
Feb 21 2014