digitalmars.D.learn - Alias/template for value
- Joseph Rushton Wakeling (36/36) Jul 31 2013 Hi all,
- bearophile (7/14) Jul 31 2013 Take a look at the asm. And try to compile with ldc2 too.
- Joseph Rushton Wakeling (3/7) Jul 31 2013 Good call, I'll try and do this.
- John Colvin (6/51) Jul 31 2013 compiler/version/flags?
- Joseph Rushton Wakeling (5/9) Jul 31 2013 Yes, I'll get onto that. I'm completely inexperienced with assembly, bu...
- bearophile (7/9) Jul 31 2013 Learning to write very efficient asm takes several months or a
- H. S. Teoh (15/24) Jul 31 2013 [...]
- Dicebot (4/10) Jul 31 2013 Well, while I do agree in general, there is a huge difference
- H. S. Teoh (26/35) Jul 31 2013 I still find the ability to read machine code invaluable when debugging,
- John Colvin (9/19) Jul 31 2013 I disagree. There is nothing in asm except how the machine works
- Joseph Rushton Wakeling (3/4) Jul 31 2013 It will mean I can finally talk programming with my dad, who _only_ know...
- H. S. Teoh (5/10) Jul 31 2013 Really? There's still a market for that stuff? I want in! ;-)
- JS (12/57) Jul 31 2013 There is no telling, as bear said, compare the disassemblies(you
- John Colvin (4/6) Jul 31 2013 dmd's code generation isn't as good as e.g. gcc/clang/icc
- H. S. Teoh (9/17) Jul 31 2013 Keep in mind, that the clang and gcc backends have *huge* teams of
Hi all,

When playing with the graph library code, I noticed something odd. Here's a function to calculate the neighbours of a vertex:

    auto neighbours(immutable size_t v)
    {
        immutable size_t start = _sumTail[v] + _sumHead[v];
        immutable size_t end = _sumTail[v + 1] + _sumHead[v + 1];
        if (!_cacheNeighbours[v])
        {
            size_t j = start;
            foreach (i; _sumTail[v] .. _sumTail[v + 1])
            {
                _neighbours[j] = _head[_indexTail[i]];
                ++j;
            }
            foreach (i; _sumHead[v] .. _sumHead[v + 1])
            {
                _neighbours[j] = _tail[_indexHead[i]];
                ++j;
            }
            assert(j == end);
            _cacheNeighbours[v] = true;
        }
        return _neighbours[start .. end];
    }

Now, I noticed that if instead of declaring the variables start and end I manually write out these expressions in the code, I get a small but consistent speedup in the program. So, I'm curious: (i) why? I'd have assumed the compiler could optimize away unnecessary variables like this; and (ii) is there a way of declaring start/end in the code such that at compile time the correct expression will be put in place where it's needed? I'm guessing some kind of template solution, but I didn't get it working (I probably need to study templates more :-)

(... knocks head against wall to try and dislodge current micro-optimization obsession ...)
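One possible answer to (ii) is a string mixin: the expression is spliced into each use site at compile time, so no runtime variable is ever declared. This is only a sketch under invented names -- the small arrays below are stand-ins for the real _sumTail/_sumHead fields, not the actual graph library code:

```d
// Hypothetical sketch (stand-in data, not the real graph library):
// an enum template builds the index expression as a string, and
// mixin() splices it in wherever it is needed at compile time.
enum sumBoth(string v) = "(_sumTail[" ~ v ~ "] + _sumHead[" ~ v ~ "])";

immutable size_t[] _sumTail = [0, 2, 5];
immutable size_t[] _sumHead = [0, 1, 3];

// Each call site gets the raw expression, not a load of a local.
size_t startOf(size_t v) { return mixin(sumBoth!"v"); }     // _sumTail[v] + _sumHead[v]
size_t endOf(size_t v)   { return mixin(sumBoth!"v + 1"); } // _sumTail[v+1] + _sumHead[v+1]
```

Whether this actually beats the local-variable version still depends on the optimizer, since it only changes what the front end hands to the back end.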
Jul 31 2013
Joseph Rushton Wakeling:
> Now, I noticed that if instead of declaring the variables start, end, I instead manually write out these expressions in the code, I get a small but consistent speedup in the program. So, I'm curious (i) Why? As I'd have assumed the compiler could optimize away unnecessary variables like this,

Take a look at the asm. And try to compile with ldc2 too. If gdc is missing such a small optimization then I suggest writing a small program that shows the problem, and filing a gdc bug report (that also contains the resulting asm).

Bye,
bearophile
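A minimal reduced program of the kind suggested here might look like this (all names and data invented for illustration). If the compiler fully eliminates the locals, both functions should produce identical assembly:

```d
// Hypothetical reduced test case: withLocals and writtenOut compute
// the same value; only the use of named locals differs. Diffing
// their generated assembly shows whether the locals cost anything.
immutable size_t[] a = [1, 2, 3, 4];
immutable size_t[] b = [10, 20, 30, 40];

size_t withLocals(size_t v)
{
    immutable size_t start = a[v] + b[v];
    immutable size_t end = a[v + 1] + b[v + 1];
    return end - start;
}

size_t writtenOut(size_t v)
{
    return (a[v + 1] + b[v + 1]) - (a[v] + b[v]);
}
```

Compiling with `gdc -O2 -S` (and, for ldc2, its assembly-output option) and diffing the two functions' output would make a concrete attachment for a bug report.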
Jul 31 2013
On 07/31/2013 01:53 PM, bearophile wrote:
> Take a look at the asm. And try to compile with ldc2 too.

In this case I observed it with both GDC and LDC.

> If gdc is missing such small optimization then I suggest to write a small program that shows the problem, and write a gdc bug report (that also contains the resulting asm).

Good call, I'll try and do this.
Jul 31 2013
On Wednesday, 31 July 2013 at 11:15:31 UTC, Joseph Rushton Wakeling wrote:
> Now, I noticed that if instead of declaring the variables start, end, I instead manually write out these expressions in the code, I get a small but consistent speedup in the program.
[...]

compiler/version/flags?

The answer to what's happening might be obvious from the assembly code for the function, if you posted it.

Also, have you tried it on a different cpu?
Jul 31 2013
On 07/31/2013 01:56 PM, John Colvin wrote:
> compiler/version/flags?

gdmd and ldmd2, both with -O -release -inline.

> The answer to what's happening might be obvious from the assembly code for the function, if you posted it.

Yes, I'll get onto that. I'm completely inexperienced with assembly, but I think it's clear I have to learn ...

> Also, have you tried it on a different cpu?

No, but can and will.
Jul 31 2013
Joseph Rushton Wakeling:
> I'm completely inexperienced with assembly, but I think it's clear I have to learn ...

Learning to write very efficient asm takes several months, or a couple of years or more. Learning to write working asm takes weeks or a few months. Learning to read a little asm takes a few hours or a couple of days.

Bye,
bearophile
Jul 31 2013
On Wed, Jul 31, 2013 at 04:54:40PM +0200, bearophile wrote:
> Joseph Rushton Wakeling:
>> I'm completely inexperienced with assembly, but I think it's clear I have to learn ...
>
> Learning to write very efficient asm takes several months or a couple years or more. Learning to write working asm takes weeks or few months. Learning to read a little some asm takes hours or two days.
[...]

It's completely worth the effort though, IMO. I know many people will disagree with me, but Knuth puts it best:

    By understanding a machine-oriented language, the programmer will tend to use a much more efficient method; it is much closer to reality. -- D. Knuth

    People who are more than casually interested in computers should have at least some idea of what the underlying hardware is like. Otherwise the programs they write will be pretty weird. -- D. Knuth

T

--
My program has no bugs! Only unintentional features...
Jul 31 2013
On Wednesday, 31 July 2013 at 14:58:24 UTC, H. S. Teoh wrote:
> People who are more than casually interested in computers should have at least some idea of what the underlying hardware is like. Otherwise the programs they write will be pretty weird. -- D. Knuth

Well, while I do agree in general, there is a huge difference between understanding how h/w executes machine code and casually reading assembly listings ;)
Jul 31 2013
On Wed, Jul 31, 2013 at 05:07:26PM +0200, Dicebot wrote:
> Well, while I do agree in general, there is a huge difference between understanding how h/w executes machine code and casually reading assembly listings ;)

I still find the ability to read machine code invaluable when debugging, esp. when trying to determine the cause of segfaults. At least at my day job, segfaults do log a stacktrace to a crash log, but there is no core dump, and on customer installations there are no debugging symbols either, so all you have is a list of register values and stack return addresses. Being able to map that to the code by reading the disassembly of the executable allows you to pinpoint the cause of the problem, which otherwise would take many days of building and rebuilding test images -- or worse, if the problem is only intermittent, it might take 5 months before QA is able to reproduce the problem reliably.

In the past month or so, I've fixed at least two bugs this way, one of which only happens in a very specific network congestion state that is pretty much impossible to reproduce without knowing what the problem was. Reading the disassembly allowed me to determine exactly which statement was dereferencing a null pointer, so that later, when I was tracing through the execution path, I immediately recognized a particular race condition that would trigger the precise symptoms I observed.

Plus, I've identified the cause of a handful of DMD wrong-code bugs by reading the disassembly of said wrong code. ;-) It's a pity I don't have the time to familiarize myself with DMD internals, otherwise I'd submit bug fixes too.

T

--
"You know, maybe we don't *need* enemies." "Yeah, best friends are about all I can take." -- Calvin & Hobbes
Jul 31 2013
On Wednesday, 31 July 2013 at 15:07:28 UTC, Dicebot wrote:
> Well, while I do agree in general, there is a huge difference between understanding how h/w executes machine code and casually reading assembly listings ;)

I disagree. There is nothing in asm except how the machine works (plus a few convenience features, e.g. not having to read the actual opcodes in octal/hex). If you understand how the h/w works then you understand assembly code. It would simply be a matter of getting used to the notation.

Plus, I would argue that learning assembly is a great path to understanding the machine itself. To be honest I can't imagine it any other way.
Jul 31 2013
On 07/31/2013 04:56 PM, H. S. Teoh wrote:
> It's completely worth the effort though, IMO.

It will mean I can finally talk programming with my dad, who _only_ knows how to program in assembly (and still does so regularly :-)
Jul 31 2013
On Wed, Jul 31, 2013 at 05:08:01PM +0200, Joseph Rushton Wakeling wrote:
> It will mean I can finally talk programming with my dad, who _only_ knows how to program in assembly (and still does so regularly :-)

Really? There's still a market for that stuff? I want in! ;-)

T

--
Try to keep an open mind, but not so open your brain falls out. -- theboz
Jul 31 2013
On Wednesday, 31 July 2013 at 11:15:31 UTC, Joseph Rushton Wakeling wrote:
> Now, I noticed that if instead of declaring the variables start, end, I instead manually write out these expressions in the code, I get a small but consistent speedup in the program.
[...]

There is no telling; as bearophile said, compare the disassemblies (you can try it on dpaste and compare). I do not believe D's code generation is optimal from what I have seen (at least for DMD). It could be cache misses, pipelining issues (the order of instructions matters), or other weird stuff. It could be that when you alias start and end, D ends up using a different algorithm that somehow changes the code generation. I don't think D's code generation is even close to being as mature as that of many of the common C++ compilers.
Jul 31 2013
On Wednesday, 31 July 2013 at 12:01:43 UTC, JS wrote:
> I don't think D's code generation is even close to being as mature as many of the common C++ compilers.

dmd's code generation isn't as good as e.g. gcc/clang/icc.

ldc and gdc use the LLVM and GCC backends respectively, and therefore by definition benefit from the same code generation.
Jul 31 2013
On Wed, Jul 31, 2013 at 06:08:23PM +0200, John Colvin wrote:
> dmd's code generation isn't as good as e.g. gcc/clang/icc
>
> ldc and gdc use the clang and gcc backends respectively and therefore, by definition, benefit from the same code generation.

Keep in mind that the clang and gcc backends have *huge* teams of developers contributing their expertise to improve the respective optimizers, whereas there are significantly fewer people working on dmd's optimizer. No matter how skilled Walter is (and I don't doubt his skill -- I mean, just look at D), he can only do so much as one person.

T

--
Political correctness: socially-sanctioned hypocrisy.
Jul 31 2013