digitalmars.D.bugs - int opEquals(Object), and other legacy ints
- Stewart Gordon (23/23) Jul 20 2006 There seem to be a number of leftovers from before we had a bool type,
- Walter Bright (4/6) Jul 21 2006 They are typed as returning int for efficiency reasons. These functions
- Bruno Medeiros (5/12) Jul 21 2006 But isn't bool an int internally? Why is it less efficient to use a bool...
- Jarrett Billingsley (3/4) Jul 21 2006 It's a byte internally.
- Walter Bright (2/12) Jul 21 2006 It's a byte internally, and is constrained to be one of the values 0 or ...
- Bruno Medeiros (11/24) Jul 27 2006 Duh, it's a byte of course, I should have checked that.
- xs0 (6/23) Jul 28 2006 but if the return type is bool, it becomes
- Stewart Gordon (18/34) Jul 28 2006 If it does this, then there's a serious bug in the compiler.
- Walter Bright (8/35) Jul 28 2006 The only difference between a CMP and a SUB instruction is where the
- kris (5/54) Jul 28 2006 So, why not treat false as 0, and true as not 0? That way, it works
- Frits van Bommel (3/19) Jul 28 2006 Then what would happen if a and b differ by, say, 256? Remember, an int
- kris (14/40) Jul 28 2006 Sure, but it's generally more efficient to do all logical and arithmetic...
- Frits van Bommel (5/47) Jul 29 2006 Actually, I'm pretty sure testing for zero is already how it's done
- Stewart Gordon (16/48) Jul 29 2006 If anything resembling the above, then
- Walter Bright (31/61) Jul 29 2006 ? Let's look at an example:
- Deewiant (4/12) Jul 30 2006 (a - b), if a and b are equal ints, evaluates to 0, which is generally
- Walter Bright (4/15) Jul 30 2006 Oh, I see what you mean.
- Stewart Gordon (19/37) Jul 30 2006 Exactly. But because what we have is opEquals and not opNotEquals, the
- Bruno Medeiros (22/92) Jul 30 2006 As per the other posts, Eq2 actually takes 2 instructions:
- kris (7/119) Jul 30 2006 Yes indeed. Well spotted! On anything supporting the 386 instruction set...
- Frits van Bommel (14/30) Jul 30 2006 Interesting instruction. Seems to have the exact semantics needed for
- Lionello Lunesu (5/6) Aug 07 2006 But is it faster? I've noticed that many of the higher-level assembly
- Frits van Bommel (5/11) Aug 07 2006 Heh... You may have noticed I didn't use any word related to speed :).
- kris (18/29) Aug 07 2006 If you'd looked at the setne instruction linked previously, you'd have
- Dave (7/40) Aug 07 2006 Yea, AFAIK setne is supported by 386 onward, plus a quick check of the G...
- Bruno Medeiros (19/28) Jul 30 2006 [PS: I've read Frits answer after writing this: ]
- Walter Bright (12/20) Jul 28 2006 Consider:
- Bruno Medeiros (19/40) Jul 30 2006 Well, let's think about the other way around then. Why should bool be
- Walter Bright (3/18) Jul 30 2006 I think most programmers would find this to be very surprising behavior....
- Bruno Medeiros (9/28) Aug 01 2006 Surprising behavior? What surprising behavior, those are all
- Dave (9/12) Jul 31 2006 I consider this kind of stuff the compilers job -- so if I write or
There seem to be a number of leftovers from before we had a bool type, and many people were using the int type to pass booleans around.

The most obvious is int opEquals(Object) defined in the Object class. Changing this'll break a considerable amount of existing code - but then again, the 0.163 change of making imports private by default has done this already.

But there are many functions in Phobos that can be cleaned up a bit without doing much harm. Just to name a few....

    std.string.iswhite
    std.string.inPattern
    std.ctype.isalnum (indeed, most of the functions in std.ctype)
    std.file.exists
    std.file.isfile
    std.file.isdir
    std.intrinsic.bt
    std.intrinsic.btc
    std.intrinsic.btr
    std.intrinsic.bts
    std.math.isnan (and other is* functions)
    std.math.signbit

Going through the other modules will probably reveal many more, but I haven't checked.

Stewart.
Jul 20 2006
Stewart Gordon wrote:
> There seem to be a number of leftovers from before we had a bool type, and many people were using the int type to pass booleans around.

They are typed as returning int for efficiency reasons. These functions often appear in performance critical loops, where an extra instruction or two makes a difference.
Jul 21 2006
Walter Bright wrote:
> Stewart Gordon wrote:
>> There seem to be a number of leftovers from before we had a bool type, and many people were using the int type to pass booleans around.
> They are typed as returning int for efficiency reasons. These functions often appear in performance critical loops, where an extra instruction or two makes a difference.

But isn't bool an int internally? Why is it less efficient to use a bool?

--
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 21 2006
"Bruno Medeiros" <brunodomedeirosATgmail SPAM.com> wrote in message news:e9qd21$2ueu$2 digitaldaemon.com...
> But isn't bool an int internally? Why is it less efficient to use a bool?

It's a byte internally.
Jul 21 2006
Bruno Medeiros wrote:
> Walter Bright wrote:
>> They are typed as returning int for efficiency reasons. These functions often appear in performance critical loops, where an extra instruction or two makes a difference.
> But isn't bool an int internally? Why is it less efficient to use a bool?

It's a byte internally, and is constrained to be one of the values 0 or 1.
Jul 21 2006
Walter Bright wrote:
> Bruno Medeiros wrote:
>> But isn't bool an int internally? Why is it less efficient to use a bool?
> It's a byte internally, and is constrained to be one of the values 0 or 1.

Duh, it's a byte of course, I should have checked that.

But the question remains: is it then less efficient to return a byte than an int? Why? And if so, isn't there a way for the compiler to somehow optimize it? I find it a bit hard to believe that nowadays there isn't sufficient compiler and/or CPU technology to somehow make a bool (byte) return value as efficient as an int one. :/

--
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 27 2006
> It's a byte internally, and is constrained to be one of the values 0 or 1.

> Duh, it's a byte of course, I should have checked that. But the question remains, is it then less efficient to return a byte than a int? Why? And if so isn't there a way for the compiler to somehow optimize it? I find it a bit hard to believe that nowadays there isn't sufficient compiler and/or CPU technology to somehow make a bool(byte) return value as efficient as a int one. :/

Well, I'm just guessing, but I think something like

    int opEquals(Foo foo)
    {
        return this.bar == foo.bar;
    }

is compiled to something like

    return this.bar-foo.bar; // 1 instruction

but if the return type is bool, it becomes

    return this.bar-foo.bar?1:0; // 3 instructions

It's the 1/0 constraint on bools that causes the slowness, not the size (stack is usually size_t-aligned anyway).

xs0
Jul 28 2006
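A minimal sketch of the 0-or-1 constraint xs0 refers to, written in present-day D rather than for the 2006 compiler under discussion. The helper name `normalize` is hypothetical; the only language behavior assumed is that casting an integer to bool gives true for any nonzero value, and that casting a bool to int gives 0 or 1.

```d
import std.stdio;

// Hypothetical helper (not from the thread): squeezes any int into 0 or 1
// by round-tripping it through bool.
int normalize(int x)
{
    return cast(int) cast(bool) x;
}

void main()
{
    writeln(normalize(42)); // a nonzero difference collapses to 1
    writeln(normalize(0));  // zero stays 0
}
```

This collapsing step is the extra work being debated: an int return value can carry the raw difference, while a bool return value has to be normalized first.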
xs0 wrote:
<snip>
> Well, I'm just guessing, but I think something like
>
>     int opEquals(Foo foo)
>     {
>         return this.bar == foo.bar;
>     }
>
> is compiled to something like
>
>     return this.bar-foo.bar; // 1 instruction

If it does this, then there's a serious bug in the compiler.

Moreover, what's your evidence that subtracting one number from another might be more efficient than comparing them for equality directly?

> but if the return type is bool, it becomes
>
>     return this.bar-foo.bar?1:0; // 3 instructions
>
> It's the 1/0 constraint on bools that causes the slowness, not the size (stack is usually size_t-aligned anyway)

But if the function only tries to return 0 or 1 anyway, then what difference does it make? At the moment, I can't think of an example of equality testing that can be made more efficient by being allowed to return a value other than 0 or 1.

Stewart.

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/M d- s:- C++ a->--- UB P+ L E W++ N+++ o K- w++ O? M V? PS- PE- Y? PGP- t- 5? X? R b DI? D G e++++ h-- r-- !y
------END GEEK CODE BLOCK------

My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
Jul 28 2006
Stewart Gordon wrote:
> xs0 wrote:
> <snip>
>> is compiled to something like
>>
>>     return this.bar-foo.bar; // 1 instruction
>>
>> but if the return type is bool, it becomes
>>
>>     return this.bar-foo.bar?1:0; // 3 instructions
>
> If it does this, then there's a serious bug in the compiler.

What instruction sequence do you expect to be generated for it?

> Moreover, what's your evidence that subtracting one number from another might be more efficient than comparing them for equality directly?

The only difference between a CMP and a SUB instruction is where the result ends up. But the CMP doesn't generate 0 or 1 as a result; it puts the result in the FLAGS register. Converting the FLAGS to a 0 or 1 in a register takes more instructions.

> But if the function only tries to return 0 or 1 anyway, then what difference does it make? At the moment, I can't think of an example of equality testing that can be made more efficient by being allowed to return a value other than 0 or 1.

I can. (a == b), where a and b are ints, can be implemented as (a - b), and the result is int 0 for equality, int !=0 for inequality.
Jul 28 2006
Walter Bright wrote:
> The only difference between a CMP and a SUB instruction is where the result ends up. But the CMP doesn't generate 0 or 1 as a result, it puts the result in the FLAGS register. Converting the FLAGS to a 0 or 1 in a register takes more instructions.
<snip>
> I can. (a == b), where a and b are ints, can be implemented as (a - b), and the result is int 0 for equality, int !=0 for inequality.

So, why not treat false as 0, and true as not 0? That way, it works just the same as the "int" version does (and comparing/testing against zero doesn't hit the address-bus).

Yes, I can see some potential for concern there; but is there anything insurmountable?
Jul 28 2006
kris wrote:
> Walter Bright wrote:
>> I can. (a == b), where a and b are ints, can be implemented as (a - b), and the result is int 0 for equality, int !=0 for inequality.
> So, why not treat false as 0, and true as not 0? That way, it works just the same as the "int" version does (and comparing/testing against zero doesn't hit the address-bus). Yes, I can see some potential for concern there; but is there anything insurmountable?

Then what would happen if a and b differ by, say, 256? Remember, an int is 4 bytes, a bool is only 1.
Jul 28 2006
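The truncation problem Frits raises can be sketched in a few lines of D. The helper names `diffAsInt` and `diffAsByte` are hypothetical; they stand in for "return the difference as an int" and "store the same difference in a one-byte value".

```d
import std.stdio;

// Hypothetical helper: the full 32-bit difference; nonzero means "different".
int diffAsInt(int a, int b)
{
    return a - b;
}

// Hypothetical helper: the same difference with only the low byte kept,
// as would happen if the raw result were stored in a 1-byte bool.
ubyte diffAsByte(int a, int b)
{
    return cast(ubyte)(a - b);
}

void main()
{
    int a = 256, b = 0;
    writeln(diffAsInt(a, b));  // nonzero: the values differ
    writeln(diffAsByte(a, b)); // 0: the difference is lost in the low byte
}
```

Any difference that is a multiple of 256 has a zero low byte, so a byte-sized "nonzero means true" value cannot simply hold the truncated subtraction result.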
Frits van Bommel wrote:
> kris wrote:
>> So, why not treat false as 0, and true as not 0? That way, it works just the same as the "int" version does (and comparing/testing against zero doesn't hit the address-bus). Yes, I can see some potential for concern there; but is there anything insurmountable?
> Then what would happen if a and b differ by, say, 256? Remember, an int is 4 bytes, a bool is only 1.

Sure, but it's generally more efficient to do all logical and arithmetic operations in the native width of the device anyway ~ generally 32 bits for current D compilers. If you're talking about issues related to actually storing a bool result, then that's part of the "concerns" noted above.

Bool values derived in certain ways may need to be folded for storage, but not for testing. The subtraction case above may be included in that group, but testing should still only require a compare against zero (for both true and false).

I'm suggesting only that zero values should *always* be used to test for 'truth' ~ never 1, or 255, or any value other than zero. Anywhere the keyword "true" is used (or implied) for comparative purposes, test against zero and invert the jmp-condition instead. If that's not done already, it would probably speed things up in many cases.
Jul 28 2006
kris wrote:
> I'm suggesting only that zero values should *always* be used to test for 'truth' ~ never 1, or 255, or any value other than zero. Anywhere the keyword "true" is used (or implied) for comparative purposes, test against zero and invert the jmp-condition instead. If that's not done already, it would probably speed things up in many cases.

Actually, I'm pretty sure testing for zero is already how it's done (just with 1-byte operands instead of 4-byte ones).

Something else: if there are multiple ways to represent true then equality testing just got a lot more complicated :).
Jul 29 2006
Walter Bright wrote:
> Stewart Gordon wrote:
>> If it does this, then there's a serious bug in the compiler.
> What instruction sequence do you expect to be generated for it?

If anything resembling the above, then

    return this.bar-foo.bar?0:1;

which cancels out the advantage you mention next:

> The only difference between a CMP and a SUB instruction is where the result ends up. But the CMP doesn't generate 0 or 1 as a result, it puts the result in the FLAGS register. Converting the FLAGS to a 0 or 1 in a register takes more instructions.
<snip>
> I can. (a == b), where a and b are ints, can be implemented as (a - b), and the result is int 0 for equality, int !=0 for inequality.

How is this (a == b) rather than (a != b)?

Stewart.
Jul 29 2006
Stewart Gordon wrote:
> Walter Bright wrote:
>> What instruction sequence do you expect to be generated for it?
> If anything resembling the above, then
>
>     return this.bar-foo.bar?0:1;

? Let's look at an example:

    class Foo
    {
        int foo, bar;

        int Eq1(Foo foo)
        {
            return this.bar-foo.bar?0:1;
        }

        int Eq2(Foo foo)
        {
            return this.bar-foo.bar;
        }
    }

which generates:

    Eq1:
        mov EDX,4[ESP]
        mov ECX,0Ch[EAX]
        sub ECX,0Ch[EDX]
        cmp ECX,1
        sbb EAX,EAX
        neg EAX
        ret 4

    Eq2:
        mov ECX,4[ESP]
        mov EAX,0Ch[EAX]
        sub EAX,0Ch[ECX]
        ret 4

So we have 4 instructions generated rather than 1. If there's a trick to generate only one instruction for Eq1, I'd like to know about it.

>> I can. (a == b), where a and b are ints, can be implemented as (a - b), and the result is int 0 for equality, int !=0 for inequality.
> How is this (a == b) rather than (a != b)?

I don't understand your question.
Jul 29 2006
Walter Bright wrote:
> Stewart Gordon wrote:
>> Walter Bright wrote:
>>> I can. (a == b), where a and b are ints, can be implemented as (a - b), and the result is int 0 for equality, int !=0 for inequality.
>> How is this (a == b) rather than (a != b)?
> I don't understand your question.

(a - b), if a and b are equal ints, evaluates to 0, which is generally considered to mean false. So isn't (a - b) actually a way of finding (a != b), instead of (a == b)?
Jul 30 2006
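Deewiant's point can be checked with a small D sketch. The function names `notEqualViaSub` and `equalViaSub` are hypothetical and only illustrate the two readings of the subtraction result.

```d
import std.stdio;

// Hypothetical helper: the raw difference. It is nonzero ("true")
// exactly when a and b differ, so it answers a != b.
int notEqualViaSub(int a, int b)
{
    return a - b;
}

// Hypothetical helper: to answer a == b, the zero test must be inverted,
// which is the extra step the thread is arguing about.
bool equalViaSub(int a, int b)
{
    return (a - b) == 0;
}

void main()
{
    writeln(notEqualViaSub(3, 3) != 0); // false: equal inputs give 0
    writeln(equalViaSub(3, 3));         // true
    writeln(equalViaSub(3, 4));         // false
}
```

D defines int subtraction as wrapping on overflow, so the difference is zero only when the two operands are equal.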
Deewiant wrote:
> Walter Bright wrote:
>> Stewart Gordon wrote:
>>> How is this (a == b) rather than (a != b)?
>> I don't understand your question.
> (a - b), if a and b are equal ints, evaluates to 0, which is generally considered to mean false. So isn't (a - b) actually a way of finding (a != b), instead of (a == b)?

Oh, I see what you mean. To invert the result would take another 2 instructions for a total of 3, still less than 4.
Jul 30 2006
Walter Bright wrote:
> Deewiant wrote:
>> (a - b), if a and b are equal ints, evaluates to 0, which is generally considered to mean false. So isn't (a - b) actually a way of finding (a != b), instead of (a == b)?
> Oh, I see what you mean. To invert the result would take another 2 instructions for a total of 3, still less than 4.

Exactly. But because what we have is opEquals and not opNotEquals, the benefit of fewer instructions is lost (except when opEquals is simple enough that the compiler can inline and optimise away the double negation).

Indeed, on this basis, if we had opNotEquals then it would just be equivalent to opCmp for many types. So I can see people thinking that opNotEquals should just call opCmp by default. However, there's a problem with this idea - for classes that have no ordering, even the current behaviour of comparing object references would have to be explicitly programmed in.

Stewart.
Jul 30 2006
Walter Bright wrote:
> ? Let's look at an example:
<snip>
> So we have 4 instructions generated rather than 1. If there's a trick to generate only one instruction for Eq1, I'd like to know about it.

As per the other posts, Eq2 actually takes 2 instructions:

    Eq2:
        ...
        sub EAX,0Ch[ECX]
        not EAX

And uuuh.., I've checked gcc's generated code for a C++ Eq1, and it was only 2 instructions too, CMP and SETE! :

    Eq1:
        ...
        cmp EAX,0Ch[ECX]
        sete EAX

(http://www.cs.tut.fi/~siponen/upros/intel/instr/sete_setz.html)

It seems to me perfectly valid; is there any problem here?

What does the original Eq1 even do? :

    sub ECX,0Ch[EDX]
    cmp ECX,1 // Huh?
    sbb EAX,EAX
    neg EAX

--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 30 2006
Bruno Medeiros wrote:
> And uuuh.., I've checked gcc's generated code for a C++'s Eq1, and it was only 2 instructions too, CMP and SETE ! :
>
>     Eq1:
>         ...
>         cmp EAX,0Ch[ECX]
>         sete EAX
>
> (http://www.cs.tut.fi/~siponen/upros/intel/instr/sete_setz.html)
> It seems to me perfectly valid, is there any problem here?

Yes indeed. Well spotted! On anything supporting the 386 instruction set (and D is targeted for 32-bit devices only), there's really no performance advantage in returning an int over returning a bool.

This should be addressed, so that some of the core APIs can be cleaned up appropriately?

> What does the original Eq1 even do? :
>
>     sub ECX,0Ch[EDX]
>     cmp ECX,1 // Huh?
>     sbb EAX,EAX
>     neg EAX

That's old-skool, pre-386 hacking :)
Jul 30 2006
Bruno Medeiros wrote:
> And uuuh.., I've checked gcc's generated code for a C++'s Eq1, and it was only 2 instructions too, CMP and SETE ! :
>
>     Eq1:
>         ...
>         cmp EAX,0Ch[ECX]
>         sete EAX
>
> (http://www.cs.tut.fi/~siponen/upros/intel/instr/sete_setz.html)
> It seems to me perfectly valid, is there any problem here?

Interesting instruction. Seems to have the exact semantics needed for these situations. You'd almost think CPU designers care about what people want to do with their products :P.

> What does the original Eq1 even do? :

Step by step:

>     mov ECX,0Ch[EAX]

(You skipped this one) Loads this.bar into ECX.

>     sub ECX,0Ch[EDX]

Subtracts foo.bar from ECX.

>     cmp ECX,1 // Huh?

Among other things, sets the borrow (aka carry) flag if ECX == 0 (i.e. if foo.bar == this.bar), clears it otherwise.

>     sbb EAX,EAX

Subtracts (EAX + borrow) from EAX, setting it to either -1 (if carry == 1) or 0 (if carry == 0).

>     neg EAX

Negates EAX.

A bit weird at first glance, but it works as advertised :).

But indeed, a cmp/sete combo seems to do the same in fewer instructions.
Jul 30 2006
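The walkthrough above can be mirrored in ordinary arithmetic. This is a rough D emulation of the `cmp` / `sbb` / `neg` sequence, assuming 32-bit unsigned wraparound; the function name `eqFromDiff` and the local variable names are hypothetical, and the code models the flag behavior rather than reproducing compiler output.

```d
import std.stdio;

// Hypothetical emulation of "cmp ECX,1 / sbb EAX,EAX / neg EAX",
// where diff plays the role of ECX after the sub instruction.
uint eqFromDiff(uint diff)
{
    // cmp diff,1: the carry (borrow) flag is set when diff < 1 as an
    // unsigned value, which only happens for diff == 0.
    uint carry = diff < 1 ? 1 : 0;

    // sbb EAX,EAX: EAX - EAX - carry, giving 0 or 0xFFFF_FFFF (-1).
    uint eax = 0u - carry;

    // neg EAX: two's complement negation turns -1 into 1 and leaves 0 alone.
    eax = 0u - eax;
    return eax;
}

void main()
{
    writeln(eqFromDiff(0)); // operands were equal
    writeln(eqFromDiff(5)); // operands differed
}
```

The result is 1 when the difference is zero and 0 otherwise, which is the same value a `sete` after a `cmp` of the two operands would leave in a byte register.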
> But indeed, a cmp/sete combo seems to do the same in fewer instructions.

But is it faster? I've noticed that many of the higher-level assembly instructions are actually slower than multiple lower-level ones. "loop" is the best example of this (dec ecx/jne is faster), or "rep" (again, dec/jne is faster).

L.
Aug 07 2006
Lionello Lunesu wrote:
>> But indeed, a cmp/sete combo seems to do the same in fewer instructions.
> But is it faster? I've noticed that many of the higher-level assembly instructions are actually slower than multiple lower-level ones. "loop" is the best example of this (dec ecx/jne is faster), or "rep" (again, dec/jne is faster).

Heh... You may have noticed I didn't use any word related to speed :). The reason for that is that I don't know much about optimization for speed, especially where pipelines etc. are involved... Hardware is weird.
Aug 07 2006
Lionello Lunesu wrote:
>> But indeed, a cmp/sete combo seems to do the same in fewer instructions.
> But is it faster? I've noticed that many of the higher-level assembly instructions are actually slower than multiple lower-level ones. "loop" is the best example of this (dec ecx/jne is faster), or "rep" (again, dec/jne is faster).
>
> L.

If you'd looked at the setne instruction linked previously, you'd have seen that it consumes 3 cycles. And no; there are no jumps, loops, or any other reason to cause pipeline bubbles. If you need a primer on what causes modern CPUs to stall (the silly P4 in particular) then you could do a lot worse than to read the articles by Jon Stokes at ArsTechnica.

Oh, and this is just daft. Why don't we all count the cycles for a call/return instead? Or, perhaps just exactly what it costs to compare the bytes of two strings until they start to look different? You'll find the cost of setne (and probably even the prior "extra" three instructions for boolean support) is relegated to background noise.

Let's face it: int is likely used instead of bool for historical reasons; probably just an artifact left over from pre-80386 days. Would be nice to get that codegen cleaned up ~ especially since it was W who claimed the reasons were performance related.

Hacking the high-level code with int vs boolean, just to reflect some archaic machine instruction, is one of those things that come under the umbrella of "premature optimization".
Aug 07 2006
kris wrote:
> Lionello Lunesu wrote:
>> But is it faster? I've noticed that many of the higher-level assembly instructions are actually slower than multiple lower-level ones. "loop" is the best example of this (dec ecx/jne is faster), or "rep" (again, dec/jne is faster).
> If you'd looked at the setne instruction linked previously, you'd have seen that it consumes 3 cycles. And no; there are no jump, loops, or any other reason to cause pipeline bubbles.
<snip>
> You'll find the cost of setne (and probably even the prior "extra" three instructions for boolean support) is relegated to background noise.

Yea, AFAIK setne is supported by 386 onward, plus a quick check of the GDC code that uses it seems to indicate it is faster (from the Eq1 and Eq2 samples earlier in the thread). But you're right - in many cases it will probably be background noise anyhow 'cause you only save a couple of cycles.

As an aside, I think the current DMD backend may be well suited to the new Dual Core CPU because it hasn't been chasing after optimum performance on the P4 with its 20 stage pipeline or whatever <g>
Aug 07 2006
Bruno Medeiros wrote:
> What does the original Eq1 even do? :
>
>     sub ECX,0Ch[EDX]
>     cmp ECX,1 // Huh?
>     sbb EAX,EAX
>     neg EAX

[PS: I've read Frits answer after writing this: ]

Ah I get it now... wasn't understanding what borrow (the mathematical notion) was, since I'm not a native english speaker. Nothing a wikipedia lookup didn't solve. So, correct me if I'm wrong: (when I say EDX I mean 0Ch[EDX] or whatever)

    // sets the carry flag if zero flag is on,
    // that is, if ECX == EDX (from previous instruction)
    cmp ECX,1

    // sets EAX as zero and also subtracts one if carry flag is set
    // that is, EAX = -1 if ECX == EDX and EAX = 0 if ECX != EDX
    sbb EAX,EAX

    // two's complement negation of EAX, 0 becomes 0, -1 becomes 1
    neg EAX

    // end result: EAX = 1 if ECX == EDX and EAX = 0 if ECX != EDX

So yeah, it seems these 3 instructions do the same as SETE ... ?

--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 30 2006
Bruno Medeiros wrote:
> But the question remains, is it then less efficient to return a byte than a int?

Yes. It's also less efficient to constrain the results to 0 or 1.

> Why?

Consider:

    a = 0x1000;
    b = 0x2000;

Now convert (a == b) into a bool. If the result is an int, I can just do (a - b), one instruction. Converting it to a byte, or to 1 or 0, takes more.

> And if so isn't there a way for the compiler to somehow optimize it?

The math is inevitable <g>.

> I find it a bit hard to believe that nowadays there isn't sufficient compiler and/or CPU technology to somehow make a bool(byte) return value as efficient as a int one. :/

I work with what the CPU makes available.

P.S. Inevitably, some will ask "who cares" about these small efficiencies. The trouble is, these kinds of things often appear in tight loops, where small inefficiencies get multiplied by millions.
Jul 28 2006
Walter Bright wrote:
> Bruno Medeiros wrote:
>> But the question remains, is it then less efficient to return a byte than an int?
>
> Yes. It's also less efficient to constrain the results to 0 or 1.
>
>> Why?
>
> Consider: a = 0x1000; b = 0x2000; Now convert (a == b) into a bool. If the result is an int, I can just do (a - b), one instruction. Converting it to a byte, or to 1 or 0, takes more.
>
>> And if so isn't there a way for the compiler to somehow optimize it?
>
> The math is inevitable <g>.

Well, let's think about the other way around then. Why should bool be constrained to 0 or 1? Why not, same as kris said, 0 would be false, and non zero would be true. Then we could have an opEquals or any function returning a bool instead of int, without a performance penalty.

The only shortcoming I see is that it would be slower to compare two bool /variables/: (b1 == b2). That expression is currently just 1 instruction, a CMP, but without the 0,1 restriction it would be more (3, I think, have to check that). However, is that significantly worse? I think not. I think comparison between two bool _variables_ is likely very rare, and when it happens it is also probably not performance critical. (statistical references?)

Note: this would not affect at all comparisons between a bool variable and a bool literal. Like (b == true) or (b == false).

Or is there another reason for the 0,1 restriction?

--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 30 2006
Bruno Medeiros wrote:
> Well, let's think about the other way around then. Why should bool be constrained to 0 or 1? Why not, same as kris said, 0 would be false, and non zero would be true. Then we could have an opEquals or any function returning a bool instead of int, without a performance penalty.
>
> The only shortcoming I see is that it would be slower to compare two bool /variables/: (b1 == b2). That expression is currently just 1 instruction, a CMP, but without the 0,1 restriction it would be more (3, I think, have to check that). However, is that significantly worse? I think not. I think comparison between two bool _variables_ is likely very rare, and when it happens it is also probably not performance critical. (statistical references?)
>
> Note: this would not affect at all comparisons between a bool variable and a bool literal. Like (b == true) or (b == false).

I think most programmers would find this to be very surprising behavior. I know I would.
Jul 30 2006
Walter Bright wrote:
> Bruno Medeiros wrote:
>> Well, let's think about the other way around then. Why should bool be constrained to 0 or 1? Why not, same as kris said, 0 would be false, and non zero would be true. Then we could have an opEquals or any function returning a bool instead of int, without a performance penalty.
>>
>> The only shortcoming I see is that it would be slower to compare two bool /variables/: (b1 == b2). That expression is currently just 1 instruction, a CMP, but without the 0,1 restriction it would be more (3, I think, have to check that). However, is that significantly worse? I think not. I think comparison between two bool _variables_ is likely very rare, and when it happens it is also probably not performance critical. (statistical references?)
>>
>> Note: this would not affect at all comparisons between a bool variable and a bool literal. Like (b == true) or (b == false).
>
> I think most programmers would find this to be very surprising behavior. I know I would.

Surprising behavior? What surprising behavior? Those are all implementation details; they have no bearing on language/program behavior.

And how about the alternative of using the SETE instruction for the bool restriction? You haven't commented on that yet...

--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Aug 01 2006
Walter Bright wrote:
> P.S. Inevitably, some will ask "who cares" about these small efficiencies. The trouble is, these kinds of things often appear in tight loops, where small inefficiencies get multiplied by millions.

I consider this kind of stuff the compiler's job -- so if I write or maintain code that is slow, I know there is probably something I can do about it w/o having to drop into assembly.

Personally I've spent a huge amount of time tuning code and I can't tell you the positive effect that has on end-users. IMHO bad performance is often the "forgotten bug" (that's not to say the budget should be busted on that "last 20%" either though).

- Dave
Jul 31 2006