
digitalmars.D.learn - Alias/template for value

reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
Hi all,

When playing with the graph library code, I noticed something odd.

Here's a function to calculate the neighbours of a vertex:

        auto neighbours(immutable size_t v)
        {
            immutable size_t start = _sumTail[v] + _sumHead[v];
            immutable size_t end = _sumTail[v + 1] + _sumHead[v + 1];
            if(!_cacheNeighbours[v])
            {
                size_t j = start;
                foreach (i; _sumTail[v] .. _sumTail[v + 1])
                {
                    _neighbours[j] = _head[_indexTail[i]];
                    ++j;
                }
                foreach (i; _sumHead[v] .. _sumHead[v + 1])
                {
                    _neighbours[j] = _tail[_indexHead[i]];
                    ++j;
                }
                assert(j == end);
                _cacheNeighbours[v] = true;
            }
            return _neighbours[start .. end];
        }

Now, I noticed that if, instead of declaring the variables start and end, I
manually write out these expressions in the code, I get a small but
consistent speedup in the program.

So, I'm curious: (i) why? I'd have assumed the compiler could optimize away
unnecessary variables like this; and (ii) is there a way of declaring
start/end such that at compile time the correct expression is substituted
wherever it's needed?

I'm guessing some kind of template solution, but I didn't get it working (I
probably need to study templates more:-).
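
One direction I can imagine (just a sketch, with simplified stand-in arrays
rather than the real graph members; I haven't verified it changes the
generated code at all) is to keep the expressions as strings and mixin()
them where they're needed, so no named temporary is ever declared:

        import std.stdio;

        // Hypothetical stand-ins for the _sumTail/_sumHead members above.
        size_t[] sumTail = [0, 2, 5];
        size_t[] sumHead = [0, 1, 3];

        // Token strings holding the expressions; mixin() pastes them in
        // textually at compile time, so no named temporary is declared.
        enum startExpr = q{ sumTail[v] + sumHead[v] };
        enum endExpr   = q{ sumTail[v + 1] + sumHead[v + 1] };

        size_t[2] bounds(size_t v)
        {
            return [mixin(startExpr), mixin(endExpr)];
        }

        void main()
        {
            writeln(bounds(1));  // prints [3, 8]
        }

Of course that's only a textual substitution, so the optimizer presumably
sees the same expressions either way; whether it changes anything is exactly
what the asm would show.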

(... knocks head against wall to try and dislodge current micro-optimization
obsession ...)
Jul 31 2013
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Joseph Rushton Wakeling:

 Now, I noticed that if, instead of declaring the variables start and end, I
 manually write out these expressions in the code, I get a small but
 consistent speedup in the program.

 So, I'm curious: (i) why? I'd have assumed the compiler could optimize away
 unnecessary variables like this,
Take a look at the asm. And try to compile with ldc2 too.

If gdc is missing such a small optimization, then I suggest writing a small
program that shows the problem and filing a gdc bug report (that also
contains the resulting asm).

Bye,
bearophile
Jul 31 2013
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 07/31/2013 01:53 PM, bearophile wrote:
 Take a look at the asm. And try to compile with ldc2 too.
In this case I observed it with both GDC and LDC.
 If gdc is missing such a small optimization, then I suggest writing a small
 program that shows the problem and filing a gdc bug report (that also
 contains the resulting asm).
Good call, I'll try and do this.
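
Something like this is probably the shape of it (the names and data here are
just stand-ins, not the real graph code); compiling with e.g. gdc -O2 -S or
ldc2 -O3 -c -output-s and diffing the assembly of the two functions should
show whether the locals make any difference:

        // repro.d -- same expressions, with and without the locals
        size_t[] sumTail = [0, 2, 5];
        size_t[] sumHead = [0, 1, 3];

        size_t spanWithLocals(size_t v)
        {
            immutable size_t start = sumTail[v] + sumHead[v];
            immutable size_t end   = sumTail[v + 1] + sumHead[v + 1];
            return end - start;
        }

        size_t spanWrittenOut(size_t v)
        {
            return (sumTail[v + 1] + sumHead[v + 1])
                 - (sumTail[v] + sumHead[v]);
        }

If those two come out identical at -O, then the difference I'm seeing
presumably lives somewhere else in the real code.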
Jul 31 2013
prev sibling next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 31 July 2013 at 11:15:31 UTC, Joseph Rushton 
Wakeling wrote:
 [...]

 Now, I noticed that if, instead of declaring the variables start and end, I
 manually write out these expressions in the code, I get a small but
 consistent speedup in the program.

 So, I'm curious: (i) why? I'd have assumed the compiler could optimize away
 unnecessary variables like this; and (ii) is there a way of declaring
 start/end such that at compile time the correct expression is substituted
 wherever it's needed?
compiler/version/flags?

The answer to what's happening might be obvious from the assembly code for
the function, if you posted it.

Also, have you tried it on a different cpu?
Jul 31 2013
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 07/31/2013 01:56 PM, John Colvin wrote:
 compiler/version/flags?
gdmd and ldmd2, both with -O -release -inline.
 The answer to what's happening might be obvious from the assembly code for the
 function, if you posted it.
Yes, I'll get onto that. I'm completely inexperienced with assembly, but I think it's clear I have to learn ...
 Also, have you tried it on a different cpu?
No, but can and will.
Jul 31 2013
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Joseph Rushton Wakeling:

 I'm completely inexperienced with assembly, but I
 think it's clear I have to learn ...
Learning to write very efficient asm takes several months, or a couple of
years or more. Learning to write working asm takes weeks or a few months.
Learning to read a little asm takes hours or a couple of days.

Bye,
bearophile
Jul 31 2013
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jul 31, 2013 at 04:54:40PM +0200, bearophile wrote:
 Joseph Rushton Wakeling:
 
I'm completely inexperienced with assembly, but I
think it's clear I have to learn ...
 Learning to write very efficient asm takes several months, or a couple of
 years or more. Learning to write working asm takes weeks or a few months.
 Learning to read a little asm takes hours or a couple of days.
[...]

It's completely worth the effort though, IMO. I know many people will
disagree with me, but Knuth puts it best:

	By understanding a machine-oriented language, the programmer will
	tend to use a much more efficient method; it is much closer to
	reality.
	-- D. Knuth

	People who are more than casually interested in computers should
	have at least some idea of what the underlying hardware is like.
	Otherwise the programs they write will be pretty weird.
	-- D. Knuth

T

-- 
My program has no bugs!  Only unintentional features...
Jul 31 2013
parent reply "Dicebot" <public dicebot.lv> writes:
On Wednesday, 31 July 2013 at 14:58:24 UTC, H. S. Teoh wrote:
 	People who are more than casually interested in computers should
 	have at least some idea of what the underlying hardware is like.
 	Otherwise the programs they write will be pretty weird.
 	-- D. Knuth
Well, while I do agree in general, there is a huge difference between understanding how h/w executes machine code and casually reading assembly listings ;)
Jul 31 2013
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jul 31, 2013 at 05:07:26PM +0200, Dicebot wrote:
 On Wednesday, 31 July 2013 at 14:58:24 UTC, H. S. Teoh wrote:
	People who are more than casually interested in computers should
	have at least some idea of what the underlying hardware is like.
	Otherwise the programs they write will be pretty weird.
	-- D. Knuth
Well, while I do agree in general, there is a huge difference between understanding how h/w executes machine code and casually reading assembly listings ;)
I still find the ability to read machine code invaluable when debugging,
esp. when trying to determine the cause of segfaults. At least at my day
job, segfaults do log a stacktrace to a crash log, but there is no core
dump, and on customer installations there are no debugging symbols either,
so all you have is a list of register values and stack return addresses.
Being able to map that to the code by reading the disassembly of the
executable allows you to pinpoint the cause of the problem, which would
otherwise take many days of building and rebuilding test images; or worse,
if the problem is only intermittent, it might take 5 months before QA is
able to reproduce the problem reliably.

In the past month or so, I've fixed at least two bugs this way, one of which
only happens in a very specific network congestion state that is pretty much
impossible to reproduce without knowing what the problem was. Reading the
disassembly allowed me to determine exactly which statement was
dereferencing a null pointer, so that later, when I was tracing through the
execution path, I immediately recognized a particular race condition that
would trigger the precise symptoms I observed.

Plus, I've identified the cause of a handful of DMD wrong-code bugs by
reading the disassembly of said wrong code. ;-) It's a pity I don't have the
time to familiarize myself with DMD internals, otherwise I'd submit bug
fixes too.

T

-- 
"You know, maybe we don't *need* enemies."
"Yeah, best friends are about all I can take." -- Calvin & Hobbes
Jul 31 2013
prev sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 31 July 2013 at 15:07:28 UTC, Dicebot wrote:
 On Wednesday, 31 July 2013 at 14:58:24 UTC, H. S. Teoh wrote:
 	People who are more than casually interested in computers should
 	have at least some idea of what the underlying hardware is like.
 	Otherwise the programs they write will be pretty weird.
 	-- D. Knuth
Well, while I do agree in general, there is a huge difference between understanding how h/w executes machine code and casually reading assembly listings ;)
I disagree. There is nothing in asm except how the machine works (plus a few
convenience features, e.g. not having to read actual opcodes in octal/hex).
If you understand how the h/w works, then you understand assembly code. It
would simply be a matter of getting used to the notation.

Plus, I would argue that learning assembly is a great path to understanding
the machine itself. To be honest, I can't imagine it any other way.
Jul 31 2013
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 07/31/2013 04:56 PM, H. S. Teoh wrote:
 It's completely worth the effort though, IMO.
It will mean I can finally talk programming with my dad, who _only_ knows how to program in assembly (and still does so regularly:-)
Jul 31 2013
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jul 31, 2013 at 05:08:01PM +0200, Joseph Rushton Wakeling wrote:
 On 07/31/2013 04:56 PM, H. S. Teoh wrote:
 It's completely worth the effort though, IMO.
It will mean I can finally talk programming with my dad, who _only_ knows how to program in assembly (and still does so regularly:-)
Really? There's still a market for that stuff? I want in! ;-)

T

-- 
Try to keep an open mind, but not so open your brain falls out. -- theboz
Jul 31 2013
prev sibling parent reply "JS" <js.mdnq gmail.com> writes:
On Wednesday, 31 July 2013 at 11:15:31 UTC, Joseph Rushton 
Wakeling wrote:
 [...]

 Now, I noticed that if, instead of declaring the variables start and end, I
 manually write out these expressions in the code, I get a small but
 consistent speedup in the program.

 So, I'm curious: (i) why? I'd have assumed the compiler could optimize away
 unnecessary variables like this; and (ii) is there a way of declaring
 start/end such that at compile time the correct expression is substituted
 wherever it's needed?
There is no telling; as bear said, compare the disassemblies (you can try it
on dpaste and compare). I do not believe D's code generation is optimal from
what I have seen (at least for DMD).

It could be cache misses due to pipelining issues (the order of instructions
matters) or other weird stuff. It could be that when you alias start and
end, D ends up using a different algorithm that somehow changes the code
generation.

I don't think D's code generation is even close to being as mature as that
of many of the common C++ compilers.
Jul 31 2013
parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 31 July 2013 at 12:01:43 UTC, JS wrote:
 I don't think D's code generation is even close to being as mature as that
 of many of the common C++ compilers.
dmd's code generation isn't as good as e.g. gcc/clang/icc.

ldc and gdc use the LLVM and GCC backends respectively and therefore, by
definition, benefit from the same code generation.
Jul 31 2013
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jul 31, 2013 at 06:08:23PM +0200, John Colvin wrote:
 On Wednesday, 31 July 2013 at 12:01:43 UTC, JS wrote:
 I don't think D's code generation is even close to being as mature as that
 of many of the common C++ compilers.
 dmd's code generation isn't as good as e.g. gcc/clang/icc.

 ldc and gdc use the LLVM and GCC backends respectively and therefore, by
 definition, benefit from the same code generation.

Keep in mind that the LLVM and GCC backends have *huge* teams of developers
contributing their expertise to improve the respective optimizers, whereas
there are significantly fewer people working on dmd's optimizer. No matter
how skilled Walter is (and I don't doubt his skill -- I mean, just look at
D), he can only do so much as one person.

T

-- 
Political correctness: socially-sanctioned hypocrisy.
Jul 31 2013