D - Objection: bsf() return value
- Georg Wrede (15/20) May 15 2003 Both return the bit number of the first set bit. The return value is und...
- Georg Wrede (2/22) May 16 2003
- Walter (11/31) May 18 2003 bsr and bsf are intended as the simplest possible shell around the x86 b...
- Georg Wrede (38/82) May 19 2003 I've thought about this, and I've given up. The original thought went s...
- Walter (25/62) May 19 2003 In general, you are right. However, the bsr and bsf intrinsics came abou...
- Georg Wrede (40/45) May 20 2003 Ahah. Thanks, I seem to learn a lot here!
- Walter (11/23) May 20 2003 What I was doing was looking for any bits set in a very large bit vect...
- Serge K (17/32) May 21 2003 Actually, both BSR & BSF are conditional operations, in case of a zero...
The spec says:

int bsf(uint v)
    Scans the bits in v starting with bit 0, looking for the first set bit.

int bsr(uint v)
    Scans the bits in v from the most significant bit to the least significant bit, looking for the first set bit.

Both return the bit number of the first set bit. The return value is undefined if v is zero.

Am I the only one who finds this unnerving? Would it not be more appropriate for the return value to be -1 if v is zero? That way no zero check is needed in programmer code before bsr, and since a branch usually follows bsr anyhow, that branch is the logical place to handle the zero case too.

Maybe it is for performance reasons? Yes, bsr is faster this way, but the test itself has to be done somewhere anyhow. Besides, if the return value is undefined, there is every reason to fear that bsr could return a reasonable-looking value, and the programmer who forgot to check for zero in advance would be none the wiser.

Bonus: this change would not break existing code, and it works the same with any practical size of v.
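A minimal sketch of the proposed behaviour, written as an ordinary D wrapper rather than as a change to the intrinsic itself. The import location (std.intrinsic) and the wrapper name are assumptions, not anything from the spec:

    import std.intrinsic;        // assumed home of the bsr intrinsic

    // Hypothetical wrapper: like bsr, but defined to return -1 when v is zero
    // instead of leaving the result undefined.
    int bsrOrMinusOne(uint v)
    {
        return v ? bsr(v) : -1;
    }

Calling code could then branch on the single return value (checking for -1 last), instead of guarding every call site with a separate zero test.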
May 15 2003
It sounds like I'm talking about bsr only, but of course this goes for both.

In article <ba0fr5$1d2d$1 digitaldaemon.com>, Georg Wrede says...
[the original post, quoted in full]
May 16 2003
bsr and bsf are intended as the simplest possible shell around the x86 bsr and bsf opcodes. The compiler recognizes those functions and they're well integrated into the optimizer and code generator. This makes them great as building blocks for more advanced operations.

"Georg Wrede" <Georg_member pathlink.com> wrote in message news:ba0fr5$1d2d$1 digitaldaemon.com...
[the original post, quoted in full]
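One hedged illustration of the building-block idea (not taken from the thread itself): rounding an integer up to the next power of two falls out of bsr in a couple of lines. The import location and the helper name are assumptions:

    import std.intrinsic;        // assumed home of the bsr intrinsic

    // Smallest power of two >= v, for 0 < v <= 2^31.
    // bsr(v) is the index of the highest set bit, so if v is not already
    // a power of two the answer is one bit higher up.
    uint nextPow2(uint v)
    {
        uint p = 1u << bsr(v);   // v must be non-zero here
        return (p == v) ? p : (p << 1);
    }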
May 18 2003
I've thought about this, and I've given up. The original thought went something like:

If speed here is really, really important for the programmer, then he'd probably code those parts of his routine in assembler anyhow. On the other hand, bsr would usually be used when examining data where you are not sure which bits are set. In normal circumstances this might include the uint being entirely zero, for all he knows. In most cases where he has at least some idea of the bits, he'd probably test for individual bits instead.

Using bsr is then restricted to cases where you can guarantee that the uint is not entirely zero. This can be achieved either by testing for zero in D code, or by making the algorithm robust enough to never, ever encounter the zero case. The former is bound to be way slower than having bsr return -1 and testing in D for -1 only after testing for the "valid" cases. The latter _requires_ the D code to be solid, which might be perfectly acceptable in a language like C or C++ (offence intended against them). D, however, seems to avoid unnecessary "catches". (For which we all love Walter, right?)

Since I'm no Intel hacker (I did only 6502 in the early '80s), I have no strong opinion on whether the added clocks for testing for zero and returning -1 would be inappropriate here. Also, since these are "intrinsics", is the average user implicitly discouraged from using them?

Of course, if we leave bsr as-is, the compiler might be made to recognise and optimise idioms like

    if (v) {
        a = bsr(v);
        if (a < b) {
            // do something
        } else if (a > c) {
            // do something else
        }
    } else {
        throw new Error("Who stole all our bits?");
    }

In hindsight, it seems to me the above code already compiles to something just as fast as what could be achieved by changing the bsr function. So even making a non-intrinsic bsr that does the -1 thing would be of no value.

In article <ba9vj3$ee0$1 digitaldaemon.com>, Walter says...
[the previous reply, quoted in full]
May 19 2003
In general, you are right. However, the bsr and bsf intrinsics came about because I use bit vectors extensively in the compiler's optimizer implementation. The functions in it were large and complex, and so were not good candidates for recoding in assembler. Having direct access to bsr and bsf got about a 10% improvement in compiler optimization speed.

"Georg Wrede" <Georg_member pathlink.com> wrote in message news:baae7r$smk$1 digitaldaemon.com...
[the previous post, quoted in full]
May 19 2003
In article <bab492$1k6u$1 digitaldaemon.com>, Walter says...
[the previous reply, quoted in full]

Ahah. Thanks, I seem to learn a lot here!

This means that when one has a bit field representing boolean values, say for whether to execute some subroutines from a set, one gets to test for several falses in one go with bsf -- if several of them are zero you can skip testing those individually. Knowing (or rather, suspecting) that one day they might all be false, all one has to do is reserve one bit that always stays true, i.e. is never actually used. This lets one skip the zero test, which saves a lot of time in a tight loop.

Careful decisions about which bit represents which action seem key here. Say one uses the msb for always-true; then the most likely falses should be at the other end, near the lsb. But even this is not enough, since in real life the things the bits represent are not genuinely mutually independent. This means that if bit X is true, then the probabilities of the other bits being true are different than if X is false. That affects ordering decisions, especially if bsf is going to be used more than once on the bit set. This might be useful when a specific bit being true implies a lowered probability that several of the following bits are false.

A rigorous theoretical analysis would of course be required here, but in real life one seldom has that option. Then the answer is "profiling, profiling, profiling!" But before that, dumping the bit set to a log from several test runs could give invaluable statistical data about probabilities and correlations, which will make the ordering decisions taken between profiling runs more profitable.

At that point it is also possible to change the meaning of the bits. Sure, true should always mean yes, but here it would be worthwhile to let false mean yes for some bits, to further increase the number of bits skipped by bsf. The yes-no meaning and the ordering of the bits can even be changed at runtime. All one needs is a reshuffle (maybe with a lookup table) and an xor to flip yes-no for some bits. A matching inline copy of the branch logic is then used instead of the original.

I wonder if these optimizations are the very reason bsr and bsf exist in CPUs?

---

Why am I littering this important mailgroup with these ruminations? Well, I can assume some readers have not thought about this, maybe because they have a non-C(++) or non-asm background. D is especially attractive to those who have avoided C(++) but have still always wanted to use a Real programming language. Sharing is gold. D makes it easy and straightforward to use this technique to gain execution speed.
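A hedged sketch of the kind of dispatch loop being described, with invented names throughout. It assumes the intrinsics live in std.intrinsic and reserves bit 31 as the always-set sentinel, so the value handed to bsf is never zero:

    import std.intrinsic;                // assumed home of the bsf intrinsic

    const uint SENTINEL = 0x8000_0000;   // bit 31 always set, never dispatched on

    // Hypothetical dispatcher: run actions[i] for every set bit i in flags.
    // Bits 0 .. 30 index into actions; the sentinel guarantees bsf never
    // sees a zero argument, so no separate zero test is needed.
    void runPending(uint flags, void function()[] actions)
    {
        uint v = flags | SENTINEL;
        while (true)
        {
            int bit = bsf(v);            // index of the lowest set bit
            if (bit == 31)               // only the sentinel is left
                break;
            actions[bit]();              // do the work for that flag
            v &= ~(1u << bit);           // clear it and look for the next one
        }
    }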
May 20 2003
"Georg Wrede" <Georg_member pathlink.com> wrote in message news:badevg$2mb0$1 digitaldaemon.com...In article <bab492$1k6u$1 digitaldaemon.com>, Walter says...notIn general, you are right. However, the bsr and bsf intrinsics came about because I use bit vectors extensively in the compiler's optimizer implementation. The functions in it were large and complex, and so wereandgood candidates for recoding in assembler. Having direct access to bsrforbsf got about a 10% improvement in compiler optimization speed.Ahah. Thanks, I seem to learn a lot here! This means that when one has a bitfield representing boolean values, saywhether to execute some from a set of subroutines, one gets to test forseveralfalses at one go with bsf -- if several of them are zero you can skiptestingthose individually.What I was doing was looking for any bits set in a very large bit vector. There were two loops, the first did a scasd to find a non-zero 32 bit value, the inner was replaced with a bsf. So the bsf was only run on non-zero values.
May 20 2003
[quoting an earlier post from Georg about the if (v) { a = bsr(v); ... } idiom and a non-intrinsic bsr returning -1]

Actually, both BSR & BSF are conditional operations; on a zero input they do nothing. It's quite easy to make them return some predefined value in that case. Here is how I do that (in MSVC):

    inline int bsf( int x )
    {
        __asm mov eax, 32    // if (!x) return 32;
        __asm bsf eax, x
    }

    inline int bsr( int x )
    {
        __asm mov eax, -1    // if (!x) return -1;
        __asm bsr eax, x
    }
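For comparison, a hedged sketch of the same trick in D's own inline assembler. It assumes DMD's x86 asm syntax and the convention that a function ending in an asm block returns whatever is left in EAX; the function names are invented:

    // Hypothetical D counterparts: preload EAX with the fallback value,
    // then let bsf/bsr overwrite it only when x is non-zero.
    int bsfOr32(uint x)
    {
        asm
        {
            mov EAX, 32;     // result if x == 0
            bsf EAX, x;      // leaves EAX alone when x == 0
        }
    }

    int bsrOrMinus1(uint x)
    {
        asm
        {
            mov EAX, -1;     // result if x == 0
            bsr EAX, x;      // leaves EAX alone when x == 0
        }
    }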
May 21 2003