www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Method template optimization that works in C++, but not very well D

reply "Vladimir" <v.s.zverev gmail.com> writes:
I`m learning templates in D language and I want to thank the 
creators of the language for this.
  	
There is an educational task: process table entries, if you know 
which column to use.

Simple way to do this:
int ProcessTable(.....)
{
     int result = 0;

     foreach (ref row; table)
         result += ProccessRow(row, isUseField);

     return result;
}

int ProccessRow(.....)
{
     ....
     foreach (i, value; TableRow.Fields)
         if (isUseField[i])
            //doing something
     ....
}

In c++ I can create in compile-time special vesrion ProccessRow 
function for various combinations of the array "isUseField" 
values. This reduces the total number of run-time operations in 
most cases.

I implemented such idea, but result confused me. I used several 
compilers: dmd(version: DMD32 D Compiler v2.063.2, options: 
-inline -release -O -noboundscheck), ldmd (version: the LLVM D 
compiler (67a1a3)  based on DMD v2.063.2 and LLVM 3.2svn; 
options: -inline -release -O -noboundscheck) and gdc (version: 
4.8.1, options: -O3 -march=native -frelease -fno-bounds-check).

     Result dmd : simple processing - 39 ms; template processing - 
20 ms.
     Result ldmd: simple processing -  8 ms; template processing - 
12 ms.
     Result gdc : simple processing -  8 ms; template processing - 
14 ms.

Where I'm wrong? How  I  can rewrite the program to make effect 
more evident? Or may be, this optimization is meaningless in D...?

Full version my very simplified example such task: 
https://gist.github.com/Vladimir-Z/1a1755ce91cb0e7636b5

Thanks for your advice.
Feb 19 2014
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Vladimir:

 Where I'm wrong? How  I  can rewrite the program to make effect 
 more evident? Or may be, this optimization is meaningless in 
 D...?
The L1 code cache is small and it's easy to create cache misses, so increasing the code size leads to nonlinear changes in the run-time.
 https://gist.github.com/Vladimir-Z/1a1755ce91cb0e7636b5
Small general suggestions for your code (untested!):
 enum TableSize = 5000000;
=> In generale in D the name of functions and variabiles start with a lower case. enum tableSize = 5_000_000;
 struct Row //of table
=> struct Row /// Row of table.
    enum NumberOptions = 1 << NumberField; // = pow(2, 
 NumberField)
=> enum NumberOptions = 2 ^^ numberField;
    uint [NumberField] Fields;
=> uint[numberField] Fields;
 Row GenerateRow()
 {
     Row row;
     row.Fields[0] = rand() % 10;
     row.Fields[1] = (rand() % 1)*(rand() % 10);
     return row;
=> Row generateRow() { return Row(uniform(0, 10), uniform(0, 2) * uniform(0, 10));
 int ProccessRow(in Row TableRow, ref bool[] isUseField)
 {
     int sum = 0;
     foreach (i, value; TableRow.Fields)
     {
         if (isUseField[i])
             sum += value * value - i;
=> int proccessRow(in Row TableRow, in bool[] isUseField) pure nothrow { int sum = 0; foreach (immutable i, immutable value; tableRow.fields) { if (isUseField[i]) sum += value ^^ 2 - i;
 template TypeTuple(T...) { alias T TypeTuple; }
=> alias TypeTuple(Types...) = Types;
 ProccessRowPtr [Row.NumberOptions] filters;
=> __gshared proccessRowPtr[row.numberOptions] filters; Bye, bearophile
Feb 19 2014
parent "Vladimir" <v.s.zverev gmail.com> writes:
On Wednesday, 19 February 2014 at 15:40:41 UTC, bearophile wrote:

 ... In generale in D the name of functions and variabiles start 
 with a lower case....
I apologize that I did not checked the style guide. I rewrote my example according yours recommendation. Thank you.
 alias TypeTuple(Types...) = Types;
This does not work for me. Error: function declaration without return type. (Note that constructors are always named 'this') Error: no identifier for declarator TypeTuple(T...) Error: alias cannot have initializer
Feb 19 2014
prev sibling parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 02/19/2014 07:13 AM, Vladimir wrote:

 I`m learning templates in D language and I want to thank the creators of
 the language for this.

 There is an educational task: process table entries, if you know which
 column to use.
[...]
 In c++ I can create in compile-time special vesrion ProccessRow function
 for various combinations of the array "isUseField" values. This reduces
 the total number of run-time operations in most cases.
Can you show the C++ code perhaps on a simpler task? I would expect D's compile-time features to be superior to C++'s in all cases. (And I would like to be educated if it is not so. :) )
 I implemented such idea, but result confused me. I used several
 compilers: dmd(version: DMD32 D Compiler v2.063.2, options: -inline
 -release -O -noboundscheck), ldmd (version: the LLVM D compiler
 (67a1a3)  based on DMD v2.063.2 and LLVM 3.2svn; options: -inline
 -release -O -noboundscheck) and gdc (version: 4.8.1, options: -O3
 -march=native -frelease -fno-bounds-check).

      Result dmd : simple processing - 39 ms; template processing - 20 ms.
      Result ldmd: simple processing -  8 ms; template processing - 12 ms.
      Result gdc : simple processing -  8 ms; template processing - 14 ms.

 Where I'm wrong? How  I  can rewrite the program to make effect more
 evident?
I would like to understand the request first. Neither ProccessRow() nor OptimizedProcessTable() take isUseField as template parameters. Besides, isUseField is generated at runtime. No, I don't understand. :)
 Or may be, this optimization is meaningless in D...?
I doubt it.
 Full version my very simplified example such task:
 https://gist.github.com/Vladimir-Z/1a1755ce91cb0e7636b5

 Thanks for your advice.
Ali
Feb 19 2014
parent reply "Vladimir" <v.s.zverev gmail.com> writes:
 I would like to understand the request first. Neither 
 ProccessRow() nor OptimizedProcessTable() take isUseField as 
 template parameters. Besides, isUseField is generated at 
 runtime. No, I don't understand. :)
Yes, isUseField is generated at runtime. For example, user can select the column that he want to sort. And he chose the first column. This means that isUseField[0] = true; isUseField[1] = false. The function getIndex must return 1. And want to call compile-time generated function proccessRow_1 (..) { //do something only with the field1 without additional checks. } instead of full version proccessRow. I admit that too much simplified example.
 Can you show the C++ code perhaps on a simpler task? I would 
 expect D's compile-time features to be superior to C++'s in all 
 cases. (And I would like to be educated if it is not so. :) )
My example is a simplification of this https://github.com/AlexeyAB/cpp_find_order/blob/06a80340ff403c4693bfe0ff8b80584f029c71a3/main.cpp. I gave a link to the most simple implementation of the three versions of the program. Code looked slightly horrible for me at first:)
 Or may be, this optimization is meaningless in D...?
I doubt it.
I also doubt it.
Full version my very simplified example such task:  
https://gist.github.com/Vladimir-Z/1a1755ce91cb0e7636b5
Feb 19 2014
next sibling parent "Vladimir" <v.s.zverev gmail.com> writes:
 https://github.com/AlexeyAB/cpp_find_order/blob/06a80340ff403c4693bfe0ff8b80584f029c71a3/main.cpp.
I forgot to say that if you want to compile this program with help gcc you must use options: -O3 -march=native -std=gnu++11
Feb 19 2014
prev sibling parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 02/19/2014 10:18 PM, Vladimir wrote:

 Full version my very simplified example such task:
 https://gist.github.com/Vladimir-Z/1a1755ce91cb0e7636b5
Notes: * It is "process", not "proccess". :) * isUseField is not used by proccessRowTemplate(). Removing it from the parameter list (and doing necessary modifications elsewhere) reduced the optimized time from 18ms to 15ms for me. * Applied ^^ operator for proccessRowTemplate as well. (No effect on performance.) * I liked the simple template syntax for proccessRowTemplate: int proccessRowTemplate(size_t optionVariant)(in Row table) { int sum = 0; foreach(size_t i; StaticRange!(Row.numberField)) { static if (optionVariant & 1<<i) { sum += table.Fields[i] ^^ 2 - i; } } return sum; } Results on my system after that change: Process table result: 142459540 Simple processing: 35 ms Process table result: 142459540 Template processing: 15 ms Ali
Feb 20 2014
parent "Vladimir" <v.s.zverev gmail.com> writes:
On Thursday, 20 February 2014 at 22:36:58 UTC, Ali Çehreli wrote:
 int proccessRowTemplate(size_t optionVariant)(in Row table)
 {
     int sum = 0;
     foreach(size_t i; StaticRange!(Row.numberField))
     {
         static if (optionVariant & 1<<i)
         {
             sum += table.Fields[i] ^^ 2 - i;
         }
     }
     return sum;
 }
This version of the program looks more graceful than mine, and works faster. My example in github is updated.
 Results on my system after that change:...
 Simple processing: 35 ms
 ...
 Template processing: 15 ms
I got same result in my system for dmd. It is good. But for ldc and gdc the result is almost unchanged. May be there are more ideas why this optimization does not help in cases gdc and ldc? I looked at the llvm-ir code my example ... At first glance, there is exactly what it should be. In any case, thank you
Feb 21 2014