
digitalmars.D.learn - Alias/template for value

reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
Hi all,

When playing with the graph library code, I noticed something odd.

Here's a function to calculate the neighbours of a vertex:

        auto neighbours(immutable size_t v)
        {
            immutable size_t start = _sumTail[v] + _sumHead[v];
            immutable size_t end = _sumTail[v + 1] + _sumHead[v + 1];
            if(!_cacheNeighbours[v])
            {
                size_t j = start;
                foreach (i; _sumTail[v] .. _sumTail[v + 1])
                {
                    _neighbours[j] = _head[_indexTail[i]];
                    ++j;
                }
                foreach (i; _sumHead[v] .. _sumHead[v + 1])
                {
                    _neighbours[j] = _tail[_indexHead[i]];
                    ++j;
                }
                assert(j == end);
                _cacheNeighbours[v] = true;
            }
            return _neighbours[start .. end];
        }

Now, I noticed that if, instead of declaring the variables start and end, I
manually write out these expressions in the code, I get a small but
consistent speedup in the program.

So, I'm curious: (i) why? I'd have assumed the compiler could optimize away
unnecessary variables like this; and (ii) is there a way of declaring
start/end such that at compile time the correct expression is substituted
wherever it's needed?

I'm guessing some kind of template solution, but I didn't get it working (I
probably need to study templates more:-).
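
One direction I can imagine (just a sketch, with simplified stand-in arrays
rather than the real graph members; I haven't verified it changes the
generated code at all) is to keep the expressions as strings and mixin()
them where they're needed, so no named temporary is ever declared:

        import std.stdio;

        // Hypothetical stand-ins for the _sumTail/_sumHead members above.
        size_t[] sumTail = [0, 2, 5];
        size_t[] sumHead = [0, 1, 3];

        // Token strings holding the expressions; mixin() pastes them in
        // textually at compile time, so no named temporary is declared.
        enum startExpr = q{ sumTail[v] + sumHead[v] };
        enum endExpr   = q{ sumTail[v + 1] + sumHead[v + 1] };

        size_t[2] bounds(size_t v)
        {
            return [mixin(startExpr), mixin(endExpr)];
        }

        void main()
        {
            writeln(bounds(1));  // prints [3, 8]
        }

Of course that's only a textual substitution, so the optimizer presumably
sees the same expressions either way; whether it changes anything is exactly
what the asm would show.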

(... knocks head against wall to try and dislodge current micro-optimization
obsession ...)
Jul 31 2013
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Joseph Rushton Wakeling:

 Now, I noticed that if, instead of declaring the variables start and end, I
 manually write out these expressions in the code, I get a small but
 consistent speedup in the program.

 So, I'm curious: (i) why? I'd have assumed the compiler could optimize away
 unnecessary variables like this,
Take a look at the asm. And try to compile with ldc2 too.

If gdc is missing such a small optimization, then I suggest writing a small
program that shows the problem and filing a gdc bug report (that also
contains the resulting asm).

Bye,
bearophile
Jul 31 2013
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 07/31/2013 01:53 PM, bearophile wrote:
 Take a look at the asm. And try to compile with ldc2 too.
In this case I observed it with both GDC and LDC.
 If gdc is missing such a small optimization, then I suggest writing a small
 program that shows the problem and filing a gdc bug report (that also
 contains the resulting asm).
Good call, I'll try and do this.
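
Something like this is probably the shape of it (the names and data here are
just stand-ins, not the real graph code); compiling with e.g. gdc -O2 -S or
ldc2 -O3 -c -output-s and diffing the assembly of the two functions should
show whether the locals make any difference:

        // repro.d -- same expressions, with and without the locals
        size_t[] sumTail = [0, 2, 5];
        size_t[] sumHead = [0, 1, 3];

        size_t spanWithLocals(size_t v)
        {
            immutable size_t start = sumTail[v] + sumHead[v];
            immutable size_t end   = sumTail[v + 1] + sumHead[v + 1];
            return end - start;
        }

        size_t spanWrittenOut(size_t v)
        {
            return (sumTail[v + 1] + sumHead[v + 1])
                 - (sumTail[v] + sumHead[v]);
        }

If those two come out identical at -O, then the difference I'm seeing
presumably lives somewhere else in the real code.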
Jul 31 2013
prev sibling next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 31 July 2013 at 11:15:31 UTC, Joseph Rushton 
Wakeling wrote:
 [...]

 Now, I noticed that if, instead of declaring the variables start and end, I
 manually write out these expressions in the code, I get a small but
 consistent speedup in the program.

 So, I'm curious: (i) why? I'd have assumed the compiler could optimize away
 unnecessary variables like this; and (ii) is there a way of declaring
 start/end such that at compile time the correct expression is substituted
 wherever it's needed?
compiler/version/flags?

The answer to what's happening might be obvious from the assembly code for
the function, if you posted it.

Also, have you tried it on a different cpu?
Jul 31 2013
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 07/31/2013 01:56 PM, John Colvin wrote:
 compiler/version/flags?
gdmd and ldmd2, both with -O -release -inline.
 The answer to what's happening might be obvious from the assembly code for the
 function, if you posted it.
Yes, I'll get onto that. I'm completely inexperienced with assembly, but I think it's clear I have to learn ...
 Also, have you tried it on a different cpu?
No, but can and will.
Jul 31 2013
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Joseph Rushton Wakeling:

 I'm completely inexperienced with assembly, but I
 think it's clear I have to learn ...
Learning to write very efficient asm takes several months, or a couple of
years or more. Learning to write working asm takes weeks or a few months.
Learning to read a little asm takes hours or a couple of days.

Bye,
bearophile
Jul 31 2013
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jul 31, 2013 at 04:54:40PM +0200, bearophile wrote:
 Joseph Rushton Wakeling:
 
I'm completely inexperienced with assembly, but I
think it's clear I have to learn ...
 Learning to write very efficient asm takes several months, or a couple of
 years or more. Learning to write working asm takes weeks or a few months.
 Learning to read a little asm takes hours or a couple of days.
[...]

It's completely worth the effort though, IMO. I know many people will
disagree with me, but Knuth puts it best:

	By understanding a machine-oriented language, the programmer will
	tend to use a much more efficient method; it is much closer to
	reality.
	-- D. Knuth

	People who are more than casually interested in computers should
	have at least some idea of what the underlying hardware is like.
	Otherwise the programs they write will be pretty weird.
	-- D. Knuth

T

-- 
My program has no bugs!  Only unintentional features...
Jul 31 2013
parent reply "Dicebot" <public dicebot.lv> writes:
On Wednesday, 31 July 2013 at 14:58:24 UTC, H. S. Teoh wrote:
 	People who are more than casually interested in computers should
 	have at least some idea of what the underlying hardware is like.
 	Otherwise the programs they write will be pretty weird.
 	-- D. Knuth
Well, while I do agree in general, there is a huge difference between understanding how h/w executes machine code and casually reading assembly listings ;)
Jul 31 2013
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jul 31, 2013 at 05:07:26PM +0200, Dicebot wrote:
 On Wednesday, 31 July 2013 at 14:58:24 UTC, H. S. Teoh wrote:
	People who are more than casually interested in computers should
	have at least some idea of what the underlying hardware is like.
	Otherwise the programs they write will be pretty weird.
	-- D. Knuth
Well, while I do agree in general, there is a huge difference between understanding how h/w executes machine code and casually reading assembly listings ;)
I still find the ability to read machine code invaluable when debugging,
esp. when trying to determine the cause of segfaults. At least at my day
job, segfaults do log a stacktrace to a crash log, but there is no core
dump, and on customer installations there are no debugging symbols either,
so all you have is a list of register values and stack return addresses.
Being able to map that to the code by reading the disassembly of the
executable allows you to pinpoint the cause of the problem, which would
otherwise take many days of building and rebuilding test images; or worse,
if the problem is only intermittent, it might take 5 months before QA is
able to reproduce the problem reliably.

In the past month or so, I've fixed at least two bugs this way, one of which
only happens in a very specific network congestion state that is pretty much
impossible to reproduce without knowing what the problem was. Reading the
disassembly allowed me to determine exactly which statement was
dereferencing a null pointer, so that later, when I was tracing through the
execution path, I immediately recognized a particular race condition that
would trigger the precise symptoms I observed.

Plus, I've identified the cause of a handful of DMD wrong-code bugs by
reading the disassembly of said wrong code. ;-) It's a pity I don't have the
time to familiarize myself with DMD internals, otherwise I'd submit bug
fixes too.

T

-- 
"You know, maybe we don't *need* enemies."
"Yeah, best friends are about all I can take." -- Calvin & Hobbes
Jul 31 2013
prev sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 31 July 2013 at 15:07:28 UTC, Dicebot wrote:
 On Wednesday, 31 July 2013 at 14:58:24 UTC, H. S. Teoh wrote:
 	People who are more than casually interested in computers should
 	have at least some idea of what the underlying hardware is like.
 	Otherwise the programs they write will be pretty weird.
 	-- D. Knuth
Well, while I do agree in general, there is a huge difference between understanding how h/w executes machine code and casually reading assembly listings ;)
I disagree. There is nothing in asm except how the machine works (plus a few
convenience features, e.g. not having to read actual opcodes in octal/hex).
If you understand how the h/w works, then you understand assembly code. It
would simply be a matter of getting used to the notation.

Plus, I would argue that learning assembly is a great path to understanding
the machine itself. To be honest, I can't imagine it any other way.
Jul 31 2013
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 07/31/2013 04:56 PM, H. S. Teoh wrote:
 It's completely worth the effort though, IMO.
It will mean I can finally talk programming with my dad, who _only_ knows how to program in assembly (and still does so regularly:-)
Jul 31 2013
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jul 31, 2013 at 05:08:01PM +0200, Joseph Rushton Wakeling wrote:
 On 07/31/2013 04:56 PM, H. S. Teoh wrote:
 It's completely worth the effort though, IMO.
It will mean I can finally talk programming with my dad, who _only_ knows how to program in assembly (and still does so regularly:-)
Really? There's still a market for that stuff? I want in! ;-)

T

-- 
Try to keep an open mind, but not so open your brain falls out. -- theboz
Jul 31 2013
prev sibling parent reply "JS" <js.mdnq gmail.com> writes:
On Wednesday, 31 July 2013 at 11:15:31 UTC, Joseph Rushton 
Wakeling wrote:
 [...]

 Now, I noticed that if, instead of declaring the variables start and end, I
 manually write out these expressions in the code, I get a small but
 consistent speedup in the program.

 So, I'm curious: (i) why? I'd have assumed the compiler could optimize away
 unnecessary variables like this; and (ii) is there a way of declaring
 start/end such that at compile time the correct expression is substituted
 wherever it's needed?
There is no telling; as bear said, compare the disassemblies (you can try it
on dpaste and compare). I do not believe D's code generation is optimal from
what I have seen (at least for DMD).

It could be cache misses due to pipelining issues (the order of instructions
matters) or other weird stuff. It could be that when you alias start and
end, D ends up using a different algorithm that somehow changes the code
generation.

I don't think D's code generation is even close to being as mature as that
of many of the common C++ compilers.
Jul 31 2013
parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 31 July 2013 at 12:01:43 UTC, JS wrote:
 I don't think D's code generation is even close to being as mature as that
 of many of the common C++ compilers.
dmd's code generation isn't as good as e.g. gcc/clang/icc.

ldc and gdc use the LLVM and GCC backends respectively and therefore, by
definition, benefit from the same code generation.
Jul 31 2013
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jul 31, 2013 at 06:08:23PM +0200, John Colvin wrote:
 On Wednesday, 31 July 2013 at 12:01:43 UTC, JS wrote:
 I don't think D's code generation is even close to being as mature as that
 of many of the common C++ compilers.
 dmd's code generation isn't as good as e.g. gcc/clang/icc.

 ldc and gdc use the LLVM and GCC backends respectively and therefore, by
 definition, benefit from the same code generation.

Keep in mind that the LLVM and GCC backends have *huge* teams of developers
contributing their expertise to improve the respective optimizers, whereas
there are significantly fewer people working on dmd's optimizer. No matter
how skilled Walter is (and I don't doubt his skill -- I mean, just look at
D), he can only do so much as one person.

T

-- 
Political correctness: socially-sanctioned hypocrisy.
Jul 31 2013