digitalmars.D - Feature Request: nontrivial functions and vtable optimizations
- downs (11/11) Aug 12 2008 The overhead of vtable calls is well known.
- davidl (5/22) Aug 12 2008 It's better to show your benchmark examples, there people can observe an...
- Craig Black (12/28) Aug 12 2008 You are right. Function pointer invokation tends to slow down processin...
- davidl (4/42) Aug 12 2008 But it's so anti-cleanness hack.
- Craig Black (6/14) Aug 12 2008 Are you are refering to my suggestion that we should be able to specify
- Don (6/33) Aug 13 2008 I don't think this is true any more.
- Craig Black (20/55) Aug 14 2008 To be fair, modern processors are quite sophisticated and do better than...
- Jb (17/23) Aug 13 2008 Thats not true these days. I know cause I've profiled it on multiple cpu...
- Craig Black (8/23) Aug 14 2008 You may be right in some cases. Profile to be sure. But experience has...
- Jb (7/19) Aug 15 2008 I'm not saying there's no penalty, im saying it's a penalty you cant esc...
- Craig Black (7/27) Aug 15 2008 I thought you were wrong about this so I did a micro-benchmark and it
- Christopher Wright (6/11) Aug 13 2008 Or you could wait a bit[1] and have LLVM do that for you, assuming
- Craig Black (4/15) Aug 14 2008 The LLVM based D compiler seems promising. Hopefully it will mature soo...
- downs (44/44) Aug 14 2008 To clear this up, I've been running a benchmark.
- superdan (3/60) Aug 14 2008 you are looking with a binocular at a coin a mile away and tryin' to fig...
- downs (7/70) Aug 14 2008 I know. But if it doesn't matter in that case, it most likely won't matt...
- superdan (2/75) Aug 14 2008 the whole premise of speculation is it oils the common path. you have un...
- downs (7/82) Aug 14 2008 As you wish.
- Jb (24/34) Aug 14 2008 To speculate localy you still have to lookup the class info, compare it ...
- superdan (4/28) Aug 14 2008 you are right here tho for the wrong reasons :) yeah speculation is good...
- Steven Schveighoffer (3/47) Aug 14 2008 Who the **** are you, and what did you do with superdan? ;)
- Jb (22/65) Aug 14 2008 Sorry but you dont know what you are talking about. Indirect jumps get a...
- Yigal Chripun (7/86) Aug 14 2008 I think Superdan is just too old ;)
- superdan (4/85) Aug 14 2008 that's right eh. i stand corrected. what i said happens on itanium.
- Jb (4/6) Aug 14 2008 No!!!!! ;-)
- Jb (9/55) Aug 14 2008 The point is that you cant get rid of the branch prediction. You either ...
The overhead of vtable calls is well known. For this reason, it may be worthwhile to include an optimization that essentially checks if a certain class is only inherited from a few times, like, <= three, in the visible program, and if so, replaces vtable calls with a tree of classinfo pointer comparisons and direct function calls. When I did this manually in some hobbyist path tracing code, I gained significant speed-ups, even though I didn't collect any hard numbers. So, for instance, the equivalent D code would look like so:

class A { ... }
class B : A { ... }
class C : A { ... }

A a = genSomeObj();

// a.test();
if (a.classinfo == typeid(B)) (cast(B)cast(void*) a).test(); // call as if final
else if (a.classinfo == typeid(C)) (cast(C)cast(void*) a).test(); // ditto
else a.test(); // fallback

Ideas?
Aug 12 2008
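[Editor's note: for readers more comfortable with C++, here is a rough sketch of the transformation being proposed, using a hypothetical A/B/C hierarchy mirroring the example above. The guarded typeid comparisons devirtualize the call for the known leaf classes and fall back to ordinary virtual dispatch otherwise; this illustrates the idea, not the form a compiler would actually emit.]

```cpp
#include <typeinfo>

// Hypothetical hierarchy mirroring the A/B/C example above.
struct A { virtual int test() { return 0; } virtual ~A() {} };
struct B final : A { int test() override { return 1; } };
struct C final : A { int test() override { return 2; } };

// The proposed optimization: if only a couple of subclasses exist in the
// visible program, replace the vtable call with a tree of dynamic-type
// comparisons and direct (devirtualized) calls, keeping the virtual call
// as a fallback.
int speculativeTest(A* a) {
    if (typeid(*a) == typeid(B))
        return static_cast<B*>(a)->B::test();  // qualified call: no vtable lookup
    if (typeid(*a) == typeid(C))
        return static_cast<C*>(a)->C::test();  // ditto
    return a->test();                          // fallback: normal virtual dispatch
}
```

Whether this beats the plain virtual call depends entirely on how well the hardware predicts each form, which is exactly what the rest of the thread debates.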
On Tue, 12 Aug 2008 16:45:08 +0800, downs <default_357-line yahoo.de> wrote:The overhead of vtable calls is well known. For this reason, it may be worthwhile to include an optimization that essentially checks if a certain class is only inherited from a few times, like, <= three, in the visible program, and if so, replace vtable calls with a tree of classinfo pointer comparisons and direct function calls. When I did this manually in some hobbyist path tracing code, I gained significant speed-ups, even though I didn't collect any hard numbers. So, for instance, the equivalent D code would look like so: class A { ... } class B : A { ... } class C : A { ... } A a = genSomeObj(); // a.test(); if (a.classinfo == typeid(B)) (cast(B)cast(void*) a).test(); // call as if final else if (a.classinfo == typeid(C)) (cast(C)cast(void*) a).test(); // dito else a.test(); // fallback Ideas?It's better to show your benchmark examples, where people can observe and discuss -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
Aug 12 2008
"downs" <default_357-line yahoo.de> wrote in message news:g7ribh$1vii$1 digitalmars.com...The overhead of vtable calls is well known. For this reason, it may be worthwhile to include an optimization that essentially checks if a certain class is only inherited from a few times, like, <= three, in the visible program, and if so, replace vtable calls with a tree of classinfo pointer comparisons and direct function calls. When I did this manually in some hobbyist path tracing code, I gained significant speed-ups, even though I didn't collect any hard numbers. So, for instance, the equivalent D code would look like so: class A { ... } class B : A { ... } class C : A { ... } A a = genSomeObj(); // a.test(); if (a.classinfo == typeid(B)) (cast(B)cast(void*) a).test(); // call as if final else if (a.classinfo == typeid(C)) (cast(C)cast(void*) a).test(); // dito else a.test(); // fallback Ideas?You are right. Function pointer invocation tends to slow down processing because it doesn't cooperate well with pipelining and branch prediction. I don't understand why this is so hard for some programmers to accept. See Microsoft Visual C++ profile-guided optimizations (PGO), "Virtual Call Speculation". Using PGO, the most commonly called virtual functions are optimized so that they do not require a function pointer invocation. Rather than relying on PGO, I would prefer the ability to assign priorities to virtual functions in the code. This would indicate to the compiler which virtual functions will be called the most. -Craig
Aug 12 2008
On Tue, 12 Aug 2008 22:46:21 +0800, Craig Black <cblack ara.com> wrote:"downs" <default_357-line yahoo.de> wrote in message news:g7ribh$1vii$1 digitalmars.com...But it's such an anti-cleanness hack. -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/The overhead of vtable calls is well known. For this reason, it may be worthwhile to include an optimization that essentially checks if a certain class is only inherited from a few times, like, <= three, in the visible program, and if so, replace vtable calls with a tree of classinfo pointer comparisons and direct function calls. When I did this manually in some hobbyist path tracing code, I gained significant speed-ups, even though I didn't collect any hard numbers. So, for instance, the equivalent D code would look like so: class A { ... } class B : A { ... } class C : A { ... } A a = genSomeObj(); // a.test(); if (a.classinfo == typeid(B)) (cast(B)cast(void*) a).test(); // call as if final else if (a.classinfo == typeid(C)) (cast(C)cast(void*) a).test(); // dito else a.test(); // fallback Ideas?You are right. Function pointer invokation tends to slow down processing because it doesn't cooperate well with pipelining and branch prediction. I don't understand why this is so hard for some programmers to accept. See Microsoft Visual C++ profile guided optimizations (PGO) "Virtual Call Speculation". Using PGO, the most commonly called virtual functions are optimized so that they do not require a function pointer invokation. Rather than relying on PGO, I would prefer the ability to assign a priorities to virtual functions in the code. This would indicate to the compiler what virtual functions will be called the most. -Craig
Aug 12 2008
Are you referring to my suggestion that we should be able to specify virtual function priorities? This is much cleaner than writing your own virtual function dispatch routine whenever you want a virtual function to be optimized. Further, specifying a priority is completely optional. You won't be forced to specify a priority if you don't want to. -CraigRather than relying on PGO, I would prefer the ability to assign a priorities to virtual functions in the code. This would indicate to the compiler what virtual functions will be called the most. -CraigBut it's so anti-cleanness hack.
Aug 12 2008
Craig Black wrote:"downs" <default_357-line yahoo.de> wrote in message news:g7ribh$1vii$1 digitalmars.com...I don't think this is true any more. It was true in processors older than the Pentium M and AMD K10. On recent CPUs, indirect jumps and calls are predicted as well as conditional branches are. If you're seeing this kind of speed difference, it's something else (probably, more code is being executed).The overhead of vtable calls is well known. For this reason, it may be worthwhile to include an optimization that essentially checks if a certain class is only inherited from a few times, like, <= three, in the visible program, and if so, replace vtable calls with a tree of classinfo pointer comparisons and direct function calls. When I did this manually in some hobbyist path tracing code, I gained significant speed-ups, even though I didn't collect any hard numbers. So, for instance, the equivalent D code would look like so: class A { ... } class B : A { ... } class C : A { ... } A a = genSomeObj(); // a.test(); if (a.classinfo == typeid(B)) (cast(B)cast(void*) a).test(); // call as if final else if (a.classinfo == typeid(C)) (cast(C)cast(void*) a).test(); // dito else a.test(); // fallback Ideas?You are right. Function pointer invokation tends to slow down processing because it doesn't cooperate well with pipelining and branch prediction.
Aug 13 2008
"Don" <nospam nospam.com.au> wrote in message news:g7u5mr$261s$1 digitalmars.com...Craig Black wrote:To be fair, modern processors are quite sophisticated and do better than older ones with function pointers and vtables. I don't mean to overemphasize the penalty of function pointer invocations. They are usually relatively fast on modern processors. Even so, they can still be the source of bottlenecks. You should never think of a function pointer invocation as free. Never assume that your compiler and processor are doing magical optimizations. Modern processors can still have problems with branch prediction. I speak from experience, and there are many others in the field who agree. Obviously, you should always profile first, since your function pointer invocations may not be a bottleneck at all. It is going to depend on how often and how the function pointers are getting called. From experience, I have had numerous performance problems with function pointer invocations. For example, I found a very severe bottleneck in my rendering loop that was caused by a function pointer invocation. It was destroying my performance. In general, when I refactor my code to make fewer function pointer calls, I usually get significant measurable speedups. -Craig"downs" <default_357-line yahoo.de> wrote in message news:g7ribh$1vii$1 digitalmars.com...I don't think this is true any more. It was true in processors older than the Pentium M and AMD K10. On recent CPUs, indirect jumps and calls are predicted as well as conditional branches are. If you're seeing this kind of the speed difference, it's something else (probably, more code is being executed).The overhead of vtable calls is well known. For this reason, it may be worthwhile to include an optimization that essentially checks if a certain class is only inherited from a few times, like, <= three, in the visible program, and if so, replace vtable calls with a tree of classinfo pointer comparisons and direct function calls. 
When I did this manually in some hobbyist path tracing code, I gained significant speed-ups, even though I didn't collect any hard numbers. So, for instance, the equivalent D code would look like so: class A { ... } class B : A { ... } class C : A { ... } A a = genSomeObj(); // a.test(); if (a.classinfo == typeid(B)) (cast(B)cast(void*) a).test(); // call as if final else if (a.classinfo == typeid(C)) (cast(C)cast(void*) a).test(); // dito else a.test(); // fallback Ideas?You are right. Function pointer invokation tends to slow down processing because it doesn't cooperate well with pipelining and branch prediction.
Aug 14 2008
"Craig Black" <cblack ara.com> wrote in message news:g7s7o1$dpt$1 digitalmars.com...You are right. Function pointer invokation tends to slow down processing because it doesn't cooperate well with pipelining and branch prediction. I don't understand why this is so hard for some programmers to accept.That's not true these days. I know because I've profiled it on multiple CPUs: correctly predicted indirect jumps (through a pointer) cost no more than a normal call/ret pair. They get a slot in the BTB just like conditional jumps, and IIRC they are predicted to go to the same place they did last time (on x86 anyway). Virtual methods are in such high use that Intel and AMD have put a lot of effort into optimizing them. Replacing the function pointer with a conditional branch won't make it faster.See Microsoft Visual C++ profile guided optimizations (PGO) "Virtual Call Speculation". Using PGO, the most commonly called virtual functions are optimized so that they do not require a function pointer invokation.The speedup you get from this is the inlining of the function, *not* getting rid of the function pointer lookup. It still requires a lookup in the class info and a conditional branch to check that the "inlined function" is the correct match for the speculated object. That will still cost a whole pipeline of cycles if wrongly predicted.
Aug 13 2008
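[Editor's note: Jb's point that the win comes from inlining, not from skipping the vtable load, can be sketched in C++. The names below are invented for illustration; the guarded branch admits an inlined copy of the hot method's body, while a mispeculation pays the guard plus the full virtual dispatch.]

```cpp
#include <typeinfo>

struct Shape { virtual double area() const { return 0.0; } virtual ~Shape() {} };
struct Circle final : Shape {
    double r;
    explicit Circle(double radius) : r(radius) {}
    double area() const override { return 3.141592653589793 * r * r; }
};

// What PGO-style "virtual call speculation" effectively produces for a
// call site where profiling showed Circle is the common receiver: the
// speculated body is inlined behind the type guard, so the common path
// makes no call at all.
double speculatedArea(const Shape& s) {
    if (typeid(s) == typeid(Circle)) {
        const Circle& c = static_cast<const Circle&>(s);
        return 3.141592653589793 * c.r * c.r;  // inlined body, no call
    }
    return s.area();  // mispeculation: guard cost plus full virtual dispatch
}
```

Note that the class-info lookup and conditional branch remain; only the call/return and argument setup of the hot path disappear, which is Jb's point.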
"Jb" <jb nowhere.com> wrote in message news:g7ubhk$2hq5$1 digitalmars.com..."Craig Black" <cblack ara.com> wrote in message news:g7s7o1$dpt$1 digitalmars.com... (on x86 anyway). Virtual methods are in such high use that Intel and Amd have put a lot of effort into optimizing them. Replacing the function pointer with a conditional branch wont make it faster.It may or may not improve performance. Depends on the particular case.You may be right in some cases. Profile to be sure. But experience has taught me to be wary of function pointer invocations, even on modern CPUs. If you want to believe that there is never a penalty for function pointer invocation, go right ahead. After all, it's only two instructions, right? Must be real fast. ;) -CraigSee Microsoft Visual C++ profile guided optimizations (PGO) "Virtual Call Speculation". Using PGO, the most commonly called virtual functions are optimized so that they do not require a function pointer invokation.The speedup you get from this is the inlining of the function, *not* the getting rid of the function pointer lookup. It still requires a lookup in the class info and a conditional branch to check that the "inlined function" is the correct match for the speculated object. That will still cost a whole pipline of cycles if wrongly predicted.
Aug 14 2008
"Craig Black" <craigblack2 cox.net> wrote in message news:g82rj4$1hqn$1 digitalmars.com...I'm not saying there's no penalty, I'm saying it's a penalty you can't escape if you want virtual functions. Function pointer invocations == conditional branches as far as modern branch predictors are concerned. You don't gain anything by replacing the former with the latter.The speedup you get from this is the inlining of the function, *not* the getting rid of the function pointer lookup. It still requires a lookup in the class info and a conditional branch to check that the "inlined function" is the correct match for the speculated object. That will still cost a whole pipline of cycles if wrongly predicted.You may be right in some cases. Profile to be sure. But experience has taught me to be wary of function pointer invokations, even on modern CPU's. If you want to believe that there is never a penalty for function pointer invokation go right ahead. After all, it's only two instructions right? Must be real fast. ;)
Aug 15 2008
"Jb" <jb nowhere.com> wrote in message news:g843q6$104a$1 digitalmars.com..."Craig Black" <craigblack2 cox.net> wrote in message news:g82rj4$1hqn$1 digitalmars.com...I thought you were wrong about this, so I did a micro-benchmark, and it indicates that you are correct. With only one branch, the performance of an if statement was exactly the same as a function pointer invocation. Function pointer invocations ended up being roughly 50% faster than a sequence of if-elses. I guess I stand corrected. -CraigI'm not saying there's no penalty, im saying it's a penalty you cant escape if you want virtual functions. Function pointer invocations == conditional branches as far as modern branch predictors are concerned. You dont gain anything by replacing the former with the later.The speedup you get from this is the inlining of the function, *not* the getting rid of the function pointer lookup. It still requires a lookup in the class info and a conditional branch to check that the "inlined function" is the correct match for the speculated object. That will still cost a whole pipline of cycles if wrongly predicted.You may be right in some cases. Profile to be sure. But experience has taught me to be wary of function pointer invokations, even on modern CPU's. If you want to believe that there is never a penalty for function pointer invokation go right ahead. After all, it's only two instructions right? Must be real fast. ;)
Aug 15 2008
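[Editor's note: the shape of the micro-benchmark being discussed can be sketched as below; all names are invented for illustration. Both dispatchers compute the same result; to reproduce the comparison, wrap each loop in a timer on your own machine rather than trusting any one measurement.]

```cpp
#include <vector>

int addOne(int x) { return x + 1; }
int addTwo(int x) { return x + 2; }

// Dispatch via an indirect call through a function-pointer table.
int viaPointer(const std::vector<int>& sel) {
    int (*table[2])(int) = { addOne, addTwo };
    int acc = 0;
    for (int s : sel) acc = table[s](acc);  // indirect call
    return acc;
}

// The same work dispatched via a conditional branch and direct calls.
int viaBranches(const std::vector<int>& sel) {
    int acc = 0;
    for (int s : sel) {
        if (s == 0) acc = addOne(acc);      // conditional branch + direct call
        else        acc = addTwo(acc);
    }
    return acc;
}
```

With more than two targets the if-else version grows into a chain of compares, which matches Craig's observation that the function-pointer form pulls ahead of a sequence of if-elses.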
Craig Black wrote:Rather than relying on PGO, I would prefer the ability to assign a priorities to virtual functions in the code. This would indicate to the compiler what virtual functions will be called the most. -CraigOr you could wait a bit[1] and have LLVM do that for you, assuming you're willing to run LLVM bytecode. And considering it's LLVM, you could probably hack it to periodically dump the optimized code. So you'd optimize your application by fooling around with it for a couple hours. [1] "A bit" may be as long as a decade. I dunno.
Aug 13 2008
"Christopher Wright" <dhasenan gmail.com> wrote in message news:g7uhpk$308b$2 digitalmars.com...Craig Black wrote:The LLVM based D compiler seems promising. Hopefully it will mature soon. -CraigRather than relying on PGO, I would prefer the ability to assign a priorities to virtual functions in the code. This would indicate to the compiler what virtual functions will be called the most. -CraigOr you could wait a bit[1] and have LLVM do that for you, assuming you're willing to run LLVM bytecode. And considering it's LLVM, you could probably hack it to periodically dump the optimized code. So you'd optimize your application by fooling around with it for a couple hours. [1] "A bit" may be as long as a decade. I dunno.
Aug 14 2008
To clear this up, I've been running a benchmark.

module test91;

import tools.time, std.stdio, tools.base, tools.mersenne;

class A { void test() { } }
class B : A { final override void test() { } }
class C : A { final override void test() { } }

A a, b, c;
static this() { a = new A; b = new B; c = new C; }

A gen() {
  if (randf() < 1f/3f) return a;
  else if (randf() < 0.5) return b;
  else return c;
}

void main() {
  const count = 1024*1024;
  for (int z = 0; z < 4; ++z) {
    writefln("Naive: ", time({
      for (int i = 0; i < count; ++i)
        gen().test();
    }()));
    writefln("Speculative for B: ", time({
      for (int i = 0; i < count; ++i) {
        auto t = gen();
        if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test();
        else t.test();
      }
    }()));
    writefln("Speculative for B/C: ", time({
      for (int i = 0; i < count; ++i) {
        auto t = gen();
        if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test();
        else if (t.classinfo is typeid(C)) (cast(C)cast(void*)t).test();
        else t.test();
      }
    }()));
  }
}

And as it turns out, virtual method calls were at least fast enough to not make any sort of difference in my calls. Here's the output of my little proggy in the last iteration:

Naive: 560958
Speculative for B: 574602
Speculative for B/C: 572429

If anything, naive is often a little faster. This kind of completely confuses my established knowledge on the matter. Looks like recent CPUs' branch predictions really are as good as people claim. Sorry for the confusion.
Aug 14 2008
downs Wrote:To clear this up, I've been running a benchmark. module test91; import tools.time, std.stdio, tools.base, tools.mersenne; class A { void test() { } } class B : A { final override void test() { } } class C : A { final override void test() { } } A a, b, c; static this() { a = new A; b = new B; c = new C; } A gen() { if (randf() < 1f/3f) return a; else if (randf() < 0.5) return b; else return c; } void main() { const count = 1024*1024; for (int z = 0; z < 4; ++z) { writefln("Naive: ", time({ for (int i = 0; i < count; ++i) gen().test(); }())); writefln("Speculative for B: ", time({ for (int i = 0; i < count; ++i) { auto t = gen(); if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test(); else t.test(); } }())); writefln("Speculative for B/C: ", time({ for (int i = 0; i < count; ++i) { auto t = gen(); if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test(); else if (t.classinfo is typeid(C)) (cast(C)cast(void*)t).test(); else t.test(); } }())); } } And as it turns out, virtual method calls were at least fast enough to not make any sort of difference in my calls. Here's the output of my little proggy in the last iteration: Naive: 560958 Speculative for B: 574602 Speculative for B/C: 572429 If anything, naive is often a little faster. This kind of completely confuses my established knowledge on the matter. Looks like recent CPUs' branch predictions really are as good as people claim. Sorry for the confusion.you are looking with a binocular at a coin a mile away and tryin' to figure quarter or nickel. never gonna work. most likely ur benchmark is buried in randf timing. make iteration cost next to nothing. put objects in a medium size vector. then iterate many times over it.
Aug 14 2008
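[Editor's note: superdan's suggestion can be sketched like this in C++ (hypothetical names): build the objects into a medium-size vector up front so the loop body is nothing but the dispatch itself, then iterate over it many times. Only correctness is checked here; timing each loop separately is left to the reader.]

```cpp
#include <typeinfo>
#include <vector>

struct A { virtual int test() { return 0; } virtual ~A() {} };
struct B final : A { int test() override { return 1; } };
struct C final : A { int test() override { return 2; } };

// Plain virtual dispatch over a pre-filled vector: the per-iteration
// cost is just the vtable call, not the random object generation.
int runNaive(const std::vector<A*>& objs, int passes) {
    int acc = 0;
    for (int p = 0; p < passes; ++p)
        for (A* o : objs) acc += o->test();            // vtable call
    return acc;
}

// The speculative comparison tree from the original proposal, run over
// the same pre-filled workload.
int runSpeculative(const std::vector<A*>& objs, int passes) {
    int acc = 0;
    for (int p = 0; p < passes; ++p)
        for (A* o : objs) {
            if (typeid(*o) == typeid(B))      acc += static_cast<B*>(o)->B::test();
            else if (typeid(*o) == typeid(C)) acc += static_cast<C*>(o)->C::test();
            else                              acc += o->test();  // fallback
        }
    return acc;
}
```

Skewing the vector's contents (say, 90% B) is how you would test superdan's later point about speculation favouring the common path.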
superdan wrote:downs Wrote:I know. But if it doesn't matter in that case, it most likely won't matter in practical situations. Nonetheless, here are some better timings, including a faster (dirtier) randf() function, and a null pass that only generates the object. Null: 729908 Naive: 1615314 Speculative for B: 1692860 Speculative for B/C: 1664040To clear this up, I've been running a benchmark. module test91; import tools.time, std.stdio, tools.base, tools.mersenne; class A { void test() { } } class B : A { final override void test() { } } class C : A { final override void test() { } } A a, b, c; static this() { a = new A; b = new B; c = new C; } A gen() { if (randf() < 1f/3f) return a; else if (randf() < 0.5) return b; else return c; } void main() { const count = 1024*1024; for (int z = 0; z < 4; ++z) { writefln("Naive: ", time({ for (int i = 0; i < count; ++i) gen().test(); }())); writefln("Speculative for B: ", time({ for (int i = 0; i < count; ++i) { auto t = gen(); if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test(); else t.test(); } }())); writefln("Speculative for B/C: ", time({ for (int i = 0; i < count; ++i) { auto t = gen(); if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test(); else if (t.classinfo is typeid(C)) (cast(C)cast(void*)t).test(); else t.test(); } }())); } } And as it turns out, virtual method calls were at least fast enough to not make any sort of difference in my calls. Here's the output of my little proggy in the last iteration: Naive: 560958 Speculative for B: 574602 Speculative for B/C: 572429 If anything, naive is often a little faster. This kind of completely confuses my established knowledge on the matter. Looks like recent CPUs' branch predictions really are as good as people claim. Sorry for the confusion.you are looking with a binocular at a coin a mile away and tryin' to figure quarter or nickel. never gonna work. most likely ur benchmark is buried in randf timing. make iteration cost next to nothing. 
put objects in a medium size vector. then iterate many times over it.
Aug 14 2008
downs Wrote:superdan wrote:the whole premise of speculation is it oils the common path. you have uniform probabilities. what you gain on speculating for B you lose in the extra test when misspeculating on others. so to really test speculation make B like 90% of cases and retry.downs Wrote:I know. But if it doesn't matter in that case, it most likely won't matter in practial situations. Nonetheless, here are some better timings, including a faster (dirtier) randf() function, and a null pass that only generates the object. Null: 729908 Naive: 1615314 Speculative for B: 1692860 Speculative for B/C: 1664040To clear this up, I've been running a benchmark. module test91; import tools.time, std.stdio, tools.base, tools.mersenne; class A { void test() { } } class B : A { final override void test() { } } class C : A { final override void test() { } } A a, b, c; static this() { a = new A; b = new B; c = new C; } A gen() { if (randf() < 1f/3f) return a; else if (randf() < 0.5) return b; else return c; } void main() { const count = 1024*1024; for (int z = 0; z < 4; ++z) { writefln("Naive: ", time({ for (int i = 0; i < count; ++i) gen().test(); }())); writefln("Speculative for B: ", time({ for (int i = 0; i < count; ++i) { auto t = gen(); if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test(); else t.test(); } }())); writefln("Speculative for B/C: ", time({ for (int i = 0; i < count; ++i) { auto t = gen(); if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test(); else if (t.classinfo is typeid(C)) (cast(C)cast(void*)t).test(); else t.test(); } }())); } } And as it turns out, virtual method calls were at least fast enough to not make any sort of difference in my calls. Here's the output of my little proggy in the last iteration: Naive: 560958 Speculative for B: 574602 Speculative for B/C: 572429 If anything, naive is often a little faster. This kind of completely confuses my established knowledge on the matter. 
Looks like recent CPUs' branch predictions really are as good as people claim. Sorry for the confusion.you are looking with a binocular at a coin a mile away and tryin' to figure quarter or nickel. never gonna work. most likely ur benchmark is buried in randf timing. make iteration cost next to nothing. put objects in a medium size vector. then iterate many times over it.
Aug 14 2008
superdan wrote:downs Wrote:As you wish. Here's the timings for 10% A, then 81% for B, 9% for C. Null: 629760 Naive: 1383764 Speculative for B: 1397784 Speculative for B/C: 1418163superdan wrote:the whole premise of speculation is it oils the common path. you have uniform probabilities. what you gain on speculating for B you lose in the extra test when misspeculating on others. so to really test speculation make B like 90% of cases and retry.downs Wrote:I know. But if it doesn't matter in that case, it most likely won't matter in practial situations. Nonetheless, here are some better timings, including a faster (dirtier) randf() function, and a null pass that only generates the object. Null: 729908 Naive: 1615314 Speculative for B: 1692860 Speculative for B/C: 1664040To clear this up, I've been running a benchmark. module test91; import tools.time, std.stdio, tools.base, tools.mersenne; class A { void test() { } } class B : A { final override void test() { } } class C : A { final override void test() { } } A a, b, c; static this() { a = new A; b = new B; c = new C; } A gen() { if (randf() < 1f/3f) return a; else if (randf() < 0.5) return b; else return c; } void main() { const count = 1024*1024; for (int z = 0; z < 4; ++z) { writefln("Naive: ", time({ for (int i = 0; i < count; ++i) gen().test(); }())); writefln("Speculative for B: ", time({ for (int i = 0; i < count; ++i) { auto t = gen(); if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test(); else t.test(); } }())); writefln("Speculative for B/C: ", time({ for (int i = 0; i < count; ++i) { auto t = gen(); if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test(); else if (t.classinfo is typeid(C)) (cast(C)cast(void*)t).test(); else t.test(); } }())); } } And as it turns out, virtual method calls were at least fast enough to not make any sort of difference in my calls. 
Here's the output of my little proggy in the last iteration: Naive: 560958 Speculative for B: 574602 Speculative for B/C: 572429 If anything, naive is often a little faster. This kind of completely confuses my established knowledge on the matter. Looks like recent CPUs' branch predictions really are as good as people claim. Sorry for the confusion.you are looking with a binocular at a coin a mile away and tryin' to figure quarter or nickel. never gonna work. most likely ur benchmark is buried in randf timing. make iteration cost next to nothing. put objects in a medium size vector. then iterate many times over it.
Aug 14 2008
"superdan" <super dan.org> wrote in message news:g81e1l$18js$1 digitalmars.com...downs Wrote:To speculate locally you still have to look up the class info, compare it to the speculated object type, and then conditionally branch. You can't avoid a conditional branch, and as it will share the same pattern as the indirect jump, and (in most cases) the same prediction mechanism, neither will be better predicted than the other. So the point is that you can't make the common path any faster unless you actually *inline* the speculated method. In fact, if you don't inline, you are most probably making the situation worse, because you are replacing this... (ignoring parameter setup, which will be identical)

MOV EAX,[objptr.vtable+offset]
CALL EAX

With this...

MOV EAX,[objptr.classinfo]
CMP EAX,expectedclasstype
JNE wrongtype
CALL expectedtypemethod
JMP done
wrongtype:
; normal vtable code here
done:

And there's no way that will be faster imo. Unless perhaps you have a 486 or P1. ;-)Null: 729908 Naive: 1615314 Speculative for B: 1692860 Speculative for B/C: 1664040the whole premise of speculation is it oils the common path. you have uniform probabilities. what you gain on speculating for B you lose in the extra test when misspeculating on others. so to really test speculation make B like 90% of cases and retry.
Aug 14 2008
Jb Wrote:

> To speculate locally you still have to look up the class info, compare it
> to the speculated object type, and then conditionally branch. You can't
> avoid a conditional branch, and as it will share the same pattern as the
> indirect jump, and (in most cases) the same prediction mechanism, neither
> will be better predicted than the other.

no that's wrong. conditional branch is better speculated than indirect jump. they share no other hardware than the instruction fetching pipe. the mechanism for conditional branch speculation is that stuff is executed in parallel on both branches inside the pipeline. the branch that makes it is committed. the other is retired without getting to write to the register file or memory. indirect jump is a whole different business altogether.

> So the point is that you can't make the common path any faster unless you
> actually *inline* the speculated method. In fact if you don't inline you
> are most probably making the situation worse because you are replacing
> this...

you are right here tho for the wrong reasons :) yeah speculation is good if there are a few good chunky instructions to eat while speculating. if it's a function call that in turn is just a ret... the results are nothing to write home about. as it's been shown. processors today are so complicated there's no chance to have relevant synthetic tests. when checking for speed you always must use realistic loads. if method speculation as in here shows nothing on empty functions that's nothing. i've seen much weirder things. tests must be held on 5-6 real benchmarks to be any good.
Aug 14 2008
"superdan" wrote

> processors today are so complicated there's no chance to have relevant
> synthetic tests. when checking for speed you always must use realistic
> loads. [...] tests must be held on 5-6 real benchmarks to be any good.

Who the **** are you, and what did you do with superdan? ;)

-Steve
Aug 14 2008
"superdan" <super dan.org> wrote in message news:g81o61$21bs$1 digitalmars.com...

> no that's wrong. conditional branch is better speculated than indirect
> jump. they share no other hardware than the instruction fetching pipe.

Sorry but you don't know what you are talking about. Indirect jumps get a slot in the branch target buffer (BTB) the same as conditional branches. They both share the same target prediction hardware. It's not only a fact, it's also common sense that they would do so.

http://www.agner.org/optimize/microarchitecture.pdf
Read the chapter on branch prediction.

http://www.intel.com/design/processor/manuals/248966.pdf
Read chapter 3.4.1.6.

> the mechanism for conditional branch speculation is that stuff is
> executed in parallel on both branches inside the pipeline. the branch
> that makes it is committed. the other is retired without getting to
> write to the register file or memory.

Actually AFAIK no current x86 processor *executes* both branches. What does happen is the instruction cache will *prefetch* both branches, so that when the BTB decides which way to go it has the instruction data ready. But it will still cost you a whole pipeline flush if the prediction was wrong.

> you are right here tho for the wrong reasons :) yeah speculation is good
> if there are a few good chunky instructions to eat while speculating. if
> it's a function call that in turn is just a ret... the results are
> nothing to write home about.

No.. erm NO... and NOOooOOOO!!!!

;-)

Speculation is good if it allows you to inline the method and chop out a whole bunch of register shuffling, stack setup, and entry / exit code. It's got naff all to do with "having shit to do" while you're speculating. Just like it's got naff all to do with conditional branches being faster / better predicted than indirect calls.
Aug 14 2008
Jb wrote:

> Actually AFAIK no current x86 processor *executes* both branches. What
> does happen is the instruction cache will *prefetch* both branches, so
> that when the BTB decides which way to go it has the instruction data
> ready.

I think Superdan is just too old ;)
What he is talking about is probably the way things worked in the P4, and that design was canceled for many good reasons. As Jb says above, no current x86 processor *executes* both branches.
My Digital Systems professor works at Intel here in Israel where they design the chips ;) and he taught us the same thing Jb said above.
Aug 14 2008
Jb Wrote:

> Sorry but you don't know what you are talking about. Indirect jumps get a
> slot in the branch target buffer (BTB) the same as conditional branches.
> They both share the same target prediction hardware.

relax. i do know what i'm talking about eh. i didn't claim it was timely :) also it got real jumbled in my head. like i described predicated execution instead of straight branch prediction. itanium has the former and x86 has the latter. (bad decision in itanium if you ask me.) so i was talking outta my... i mean i was talking nonsense.

> Actually AFAIK no current x86 processor *executes* both branches. What
> does happen is the instruction cache will *prefetch* both branches, so
> that when the BTB decides which way to go it has the instruction data
> ready. But it will still cost you a whole pipeline flush if the
> prediction was wrong.

that's right eh. i stand corrected. what i said happens on itanium.

> Speculation is good if it allows you to inline the method and chop out a
> whole bunch of register shuffling, stack setup, and entry / exit code.
> It's got naff all to do with "having shit to do" while you're
> speculating.

hey hey... watch your language :) otherwise i agree.
Aug 14 2008
"superdan" <super dan.org> wrote in message news:g82fpu$t53$1 digitalmars.com...

> relax.

No!!!!!

;-)

Actually yeah, sorry if I came over a bit cranky.
Aug 14 2008
"downs" <default_357-line yahoo.de> wrote in message news:g8193l$t88$1 digitalmars.com...

> To clear this up, I've been running a benchmark.
>
> module test91;
> import tools.time, std.stdio, tools.base, tools.mersenne;
> class A { void test() { } }
> class B : A { final override void test() { } }
> class C : A { final override void test() { } }
> A a, b, c;
> static this() { a = new A; b = new B; c = new C; }
> A gen() {
>   if (randf() < 1f/3f) return a;
>   else if (randf() < 0.5) return b;
>   else return c;
> }
> void main() {
>   const count = 1024*1024;
>   for (int z = 0; z < 4; ++z) {
>     writefln("Naive: ", time({
>       for (int i = 0; i < count; ++i) gen().test();
>     }()));
>     writefln("Speculative for B: ", time({
>       for (int i = 0; i < count; ++i) {
>         auto t = gen();
>         if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test();
>         else t.test();
>       }
>     }()));
>     writefln("Speculative for B/C: ", time({
>       for (int i = 0; i < count; ++i) {
>         auto t = gen();
>         if (t.classinfo is typeid(B)) (cast(B)cast(void*)t).test();
>         else if (t.classinfo is typeid(C)) (cast(C)cast(void*)t).test();
>         else t.test();
>       }
>     }()));
>   }
> }
>
> And as it turns out, virtual method calls were at least fast enough to
> not make any sort of difference in my calls. Here's the output of my
> little proggy in the last iteration:
>
> Naive: 560958
> Speculative for B: 574602
> Speculative for B/C: 572429
>
> If anything, naive is often a little faster. This kind of completely
> confuses my established knowledge on the matter. Looks like recent CPUs'
> branch predictions really are as good as people claim.

The point is that you can't get rid of the branch prediction. You either have the indirect jump, or you have a local conditional branch; either way the CPU has to speculatively go in one direction or the other. And as both are dealt with in (near enough) exactly the same way, by the same prediction mechanism, you don't gain anything from replacing one with the other.

It's been that way since the Pentium 2 IIRC.
Aug 14 2008