digitalmars.D - 0 is not a power of 2

Andrei Alexandrescu (19/19) May 18 2015 So there's this classic trick:

Brian Schott (5/6) May 18 2015 Unless I'm mistaken, any uint that's a power of 2 will only have

Andrei Alexandrescu (3/8) May 18 2015 There are complaints it's not that fast, e.g.

H. S. Teoh via Digitalmars-d (8/12) May 18 2015 [...]

Andrei Alexandrescu (21/30) May 18 2015 Excerpt from std.experimental.allocator.common:

Johan Engelen (7/39) May 19 2015 I wish we had something like clang's -Wparenthesis.

Steven Schveighoffer (3/12) May 19 2015 It's bitwise or, not logic or.

Atila Neves (4/25) May 19 2015 Aren't predictable branches cheap on current architectures?

Martin Nowak (7/8) May 19 2015 Yes they are, and it seems one would rarely if ever call isPowOf2

Andrei Alexandrescu (4/11) May 19 2015 Yah measuring on-line is clearly the way to go. A comment on the

John Colvin (33/54) May 19 2015 I tested with a few different (modern) backends to see what was

safety0ff (7/19) May 19 2015 I think you used:

John Colvin (5/26) May 19 2015 I used what andrei posted, but yes i do see the difference. How

Almighty Bob (8/16) May 19 2015 If you dont mind asm then after you do...

Almighty Bob (3/23) May 19 2015 Oops, should be either add the carry onto x, or subtract the

w0rp (7/7) May 19 2015 I believe you can also do x & -x == x. I'm not sure if that will

Almighty Bob (2/3) May 19 2015 I think that still returns true for x = 0.

w0rp (2/5) May 19 2015 You are right. Disregard that.

Nicholas Wilson (4/10) May 19 2015 I found
Steven Schveighoffer (9/25) May 19 2015 Is it just me, or is it odd that your first version generates xor and a

Steven Schveighoffer (7/34) May 19 2015 Nevermind, xor is just zeroing a register.

Matthias Bentrup (5/6) May 19 2015 When we're talking about modulo 2^n arithmetic, 0 is in fact a
Marco Leise (22/37) May 19 2015 Any faster ?! This is already minimal assembly code with

Zoadian (3/39) May 19 2015 +1 ;)
Andrei Alexandrescu (4/7) May 19 2015 Yah, there are plenty of those in

Timon Gehr (9/16) May 19 2015 In case the range of s is such that divideRoundUp is actually good

deadalnix (3/3) May 19 2015 Have you tried things like :

Steven Schveighoffer (7/10) May 19 2015 Hm.. I think this would always succeed. Perhaps you mean:

deadalnix (4/12) May 19 2015 Both work as long as you use a fully defined instruction, like

Steven Schveighoffer (6/19) May 19 2015 Hm... I messed up, (x >> bsr(x)) is always zero. I think you meant to wr...

deadalnix (7/30) May 19 2015 No.

Steven Schveighoffer (17/47) May 21 2015 Gah, I messed up, used output from old code that wasn't doing what I

deadalnix (8/25) May 21 2015 Ha yes. You'd want to use TZCNT.

Steven Schveighoffer (7/9) May 21 2015 Hm... bsf does work in your original code, I'm thinking you may have

Brian Schott (5/8) May 19 2015 https://issues.dlang.org/show_bug.cgi?id=14380

deadalnix (2/12) May 19 2015 Why ain't we generating a tzcnt ?

Matthias Bentrup (6/6) May 19 2015 I think you can make the over/underflow at zero work in your

John Colvin (2/8) May 20 2015 Very nice

Temtaime (4/4) May 20 2015 First version isn't any slow. It's clear and can be optimized

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (15/18) May 20 2015 Yes, and besides, if one cares about these minor performance

Andrei Alexandrescu (4/9) May 20 2015 Nice code with dmd and gdc. Thanks!
Dominikus Dittes Scherkl (5/11) May 21 2015 The cool thing is that also the over/underflow of "-" at

Jay Norwood (15/15) May 21 2015 This formula measures a little faster on dmd. Release build,

Jay Norwood (7/9) May 22 2015 00F81005 mov eax,edx

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

So there's this classic trick:

bool isPowerOf2(uint x)
{
     return (x & (x - 1)) == 0;
}

Pretty neat, but it wrongly returns true for x == 0. So the obvious fix is:

bool isPowerOf2(uint x)
{
     return x && (x & (x - 1)) == 0;
}

But that has branches in it. So I came up with:

bool isPowerOf2(uint x)
{
     return (x & (x - 1) | !x) == 0;
}

which has no branches at least with dmd, see http://goo.gl/TVkCwc.

Any ideas for faster code?


Thanks,

Andrei

May 18 2015

"Brian Schott" <briancschott gmail.com> writes:

On Tuesday, 19 May 2015 at 05:16:48 UTC, Andrei Alexandrescu 
wrote:
 Any ideas for faster code?

Unless I'm mistaken, any uint that's a power of 2 will only have 
a single set bit, so why not use the "popcnt" instruction?

May 18 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/18/15 10:40 PM, Brian Schott wrote:
 On Tuesday, 19 May 2015 at 05:16:48 UTC, Andrei Alexandrescu wrote:
 Any ideas for faster code?

 Unless I'm mistaken, any uint that's a power of 2 will only have a
 single set bit, so why not use the "popcnt" instruction?


There are complaints it's not that fast, e.g. 
http://danluu.com/assembly-intrinsics/. -- Andrei

May 18 2015

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Mon, May 18, 2015 at 10:16:47PM -0700, Andrei Alexandrescu via Digitalmars-d
wrote:
[...]
 bool isPowerOf2(uint x)
 {
     return (x & (x - 1) | !x) == 0;
 }

[...]

Are you sure that's correct? Doesn't that return true for all non-zero
numbers?


T

-- 
"Uhh, I'm still not here." -- KD, while "away" on ICQ.

May 18 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/18/15 10:37 PM, H. S. Teoh via Digitalmars-d wrote:
 On Mon, May 18, 2015 at 10:16:47PM -0700, Andrei Alexandrescu via
Digitalmars-d wrote:
 [...]
 bool isPowerOf2(uint x)
 {
      return (x & (x - 1) | !x) == 0;
 }

 [...]

 Are you sure that's correct? Doesn't that return true for all non-zero
 numbers?

Excerpt from std.experimental.allocator.common:

package bool isPowerOf2(uint x)
{
     return (x & (x - 1) | !x) == 0;
}

unittest
{
     assert(!isPowerOf2(0));
     assert(isPowerOf2(1));
     assert(isPowerOf2(2));
     assert(!isPowerOf2(3));
     assert(isPowerOf2(4));
     assert(!isPowerOf2(5));
     assert(!isPowerOf2(6));
     assert(!isPowerOf2(7));
     assert(isPowerOf2(8));
     assert(!isPowerOf2(9));
     assert(!isPowerOf2(10));
}


Andrei

May 18 2015

"Johan Engelen" <j j.nl> writes:

On Tuesday, 19 May 2015 at 05:51:27 UTC, Andrei Alexandrescu 
wrote:
 On 5/18/15 10:37 PM, H. S. Teoh via Digitalmars-d wrote:
 On Mon, May 18, 2015 at 10:16:47PM -0700, Andrei Alexandrescu 
 via Digitalmars-d wrote:
 [...]
 bool isPowerOf2(uint x)
 {
     return (x & (x - 1) | !x) == 0;
 }

 [...]

 Are you sure that's correct? Doesn't that return true for all 
 non-zero
 numbers?

 Excerpt from std.experimental.allocator.common:

 package bool isPowerOf2(uint x)
 {
     return (x & (x - 1) | !x) == 0;
 }

 unittest
 {
     assert(!isPowerOf2(0));
     assert(isPowerOf2(1));
     assert(isPowerOf2(2));
     assert(!isPowerOf2(3));
     assert(isPowerOf2(4));
     assert(!isPowerOf2(5));
     assert(!isPowerOf2(6));
     assert(!isPowerOf2(7));
     assert(isPowerOf2(8));
     assert(!isPowerOf2(9));
     assert(!isPowerOf2(10));
 }

I wish we had something like clang's -Wparenthesis.
I think
     return ( (x & (x-1)) | !x ) == 0;
is much clearer here.

  -Johan

May 19 2015

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 5/19/15 1:37 AM, H. S. Teoh via Digitalmars-d wrote:
 On Mon, May 18, 2015 at 10:16:47PM -0700, Andrei Alexandrescu via
Digitalmars-d wrote:
 [...]
 bool isPowerOf2(uint x)
 {
      return (x & (x - 1) | !x) == 0;
 }

 [...]

 Are you sure that's correct? Doesn't that return true for all non-zero
 numbers?

It's  bitwise or, not logic or.

-Steve

May 19 2015

"Atila Neves" <atila.neves gmail.com> writes:

Aren't predictable branches cheap on current architectures?

Atila

On Tuesday, 19 May 2015 at 05:16:48 UTC, Andrei Alexandrescu 
wrote:
 So there's this classic trick:

 bool isPowerOf2(uint x)
 {
     return (x & (x - 1)) == 0;
 }

 Pretty neat, but it wrongly returns true for x == 0. So the 
 obvious fix is:

 bool isPowerOf2(uint x)
 {
     return x && (x & (x - 1)) == 0;
 }

 But that has branches in it. So I came up with:

 bool isPowerOf2(uint x)
 {
     return (x & (x - 1) | !x) == 0;
 }

 which has no branches at least with dmd, see 
 http://goo.gl/TVkCwc.

 Any ideas for faster code?


 Thanks,

 Andrei

May 19 2015

"Martin Nowak" <code dawg.eu> writes:

On Tuesday, 19 May 2015 at 07:56:27 UTC, Atila Neves wrote:
 Aren't predictable branches cheap on current architectures?

Yes they are, and it seems one would rarely if ever call isPowOf2 
with 0 in std.allocator. A good thing to do, is to use a good 
hardware event profiler like perf, and record the branch misses. 
Perf is also the right tool to compare the branchless version (I 
doubt it's significantly better, especially since the felt of the 
0 branch is just a constant).

May 19 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/19/15 2:35 AM, Martin Nowak wrote:
 On Tuesday, 19 May 2015 at 07:56:27 UTC, Atila Neves wrote:
 Aren't predictable branches cheap on current architectures?

 Yes they are, and it seems one would rarely if ever call isPowOf2 with 0
 in std.allocator. A good thing to do, is to use a good hardware event
 profiler like perf, and record the branch misses. Perf is also the right
 tool to compare the branchless version (I doubt it's significantly
 better, especially since the felt of the 0 branch is just a constant).

Yah measuring on-line is clearly the way to go. A comment on the 
branches - the branch predictor has a limited capacity so branching here 
might take resources away from other places. -- Andrei

May 19 2015

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Tuesday, 19 May 2015 at 05:16:48 UTC, Andrei Alexandrescu 
wrote:
 So there's this classic trick:

 bool isPowerOf2(uint x)
 {
     return (x & (x - 1)) == 0;
 }

 Pretty neat, but it wrongly returns true for x == 0. So the 
 obvious fix is:

 bool isPowerOf2(uint x)
 {
     return x && (x & (x - 1)) == 0;
 }

 But that has branches in it. So I came up with:

 bool isPowerOf2(uint x)
 {
     return (x & (x - 1) | !x) == 0;
 }

 which has no branches at least with dmd, see 
 http://goo.gl/TVkCwc.

 Any ideas for faster code?


 Thanks,

 Andrei

I tested with a few different (modern) backends to see what was 
generated, they all essentially give you this (gcc 5.1.0 -O3 
-march=broadwell):

isPowerOf2:
	xorl	%eax, %eax
	testl	%edi, %edi
	je	.L5
	blsr	%edi, %edi
	testl	%edi, %edi
	sete	%al
.L5:
	ret
isPowerOf2b:
	blsr	%edi, %edx
	xorl	%eax, %eax
	testl	%edi, %edi
	sete	%al
	orl	%eax, %edx
	sete	%al
	ret

but if your not restricting the build to such recent processor, 
you can't emit BLSR, so you get this instead (gcc 5.1.0 -O3):

isPowerOf2b:
	leal	-1(%rdi), %eax
	xorl	%edx, %edx
	andl	%edi, %eax
	testl	%edi, %edi
	sete	%dl
	orl	%edx, %eax
	sete	%al
	ret

May 19 2015

"safety0ff" <safety0ff.dev gmail.com> writes:

On Tuesday, 19 May 2015 at 08:28:11 UTC, John Colvin wrote:
 I tested with a few different (modern) backends to see what was 
 generated, they all essentially give you this (gcc 5.1.0 -O3 
 -march=broadwell):

 isPowerOf2:
 	xorl	%eax, %eax
 	testl	%edi, %edi
 	je	.L5
 	blsr	%edi, %edi
 	testl	%edi, %edi
 	sete	%al
 .L5:
 	ret

I think you used:
     return x && (x & (x - 1)) == 0;
instead of
     return (x & (x - 1)) == 0 && x;

Which influences code generation (more weight on the x == 0 
test,) hence the branch.

May 19 2015

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Tuesday, 19 May 2015 at 15:39:16 UTC, safety0ff wrote:
 On Tuesday, 19 May 2015 at 08:28:11 UTC, John Colvin wrote:
 I tested with a few different (modern) backends to see what 
 was generated, they all essentially give you this (gcc 5.1.0 
 -O3 -march=broadwell):

 isPowerOf2:
 	xorl	%eax, %eax
 	testl	%edi, %edi
 	je	.L5
 	blsr	%edi, %edi
 	testl	%edi, %edi
 	sete	%al
 .L5:
 	ret

 I think you used:
     return x && (x & (x - 1)) == 0;
 instead of
     return (x & (x - 1)) == 0 && x;

 Which influences code generation (more weight on the x == 0 
 test,) hence the branch.

I used what andrei posted, but yes i do see the difference. How 
infuriating. A compiler that will entirely restructure your code 
leaving you with something unrecognisable in many cases, but in 
others is sensitive to the order of operands around an &&.

May 19 2015

"Almighty Bob" <bob almighty.com> writes:

On Tuesday, 19 May 2015 at 05:16:48 UTC, Andrei Alexandrescu 
wrote:
 So there's this classic trick:

 bool isPowerOf2(uint x)
 {
     return (x & (x - 1)) == 0;
 }

 which has no branches at least with dmd, see 
 http://goo.gl/TVkCwc.

 Any ideas for faster code?

If you dont mind asm then after you do...

tmp = x-1;

you could add the borrow/carry flag back onto the tmp, so it'd 
add back up to zero any time there's an underflow in the (x-1) op.

So two extra instructions, (you need a zero for the ADC) no 
branch.

May 19 2015

"Almighty Bob" <bob almighty.com> writes:

On Tuesday, 19 May 2015 at 09:34:04 UTC, Almighty Bob wrote:
 On Tuesday, 19 May 2015 at 05:16:48 UTC, Andrei Alexandrescu 
 wrote:
 So there's this classic trick:

 bool isPowerOf2(uint x)
 {
    return (x & (x - 1)) == 0;
 }

 which has no branches at least with dmd, see 
 http://goo.gl/TVkCwc.

 Any ideas for faster code?

 If you dont mind asm then after you do...

 tmp = x-1;

 you could add the borrow/carry flag back onto the tmp, so it'd 
 add back up to zero any time there's an underflow in the (x-1) 
 op.

 So two extra instructions, (you need a zero for the ADC) no 
 branch.

Oops, should be either add the carry onto x, or subtract the 
carry from tmp. Either way the result of the & is non zero.

May 19 2015

"w0rp" <devw0rp gmail.com> writes:

I believe you can also do x & -x == x. I'm not sure if that will 
be actually faster or slower. You could maybe cut the 
instructions down a little with an asm{} block. The compiler 
might not figure out that it can re-use a register for x on the 
left and x on the right there. You might use popcnt in a 
version() block too, so you can use the instruction when you've 
got it.

May 19 2015

"Almighty Bob" <bob almighty.com> writes:

On Tuesday, 19 May 2015 at 10:59:53 UTC, w0rp wrote:
 I believe you can also do x & -x == x.

I think that still returns true for x = 0.

May 19 2015

"w0rp" <devw0rp gmail.com> writes:

On Tuesday, 19 May 2015 at 12:00:30 UTC, Almighty Bob wrote:
 On Tuesday, 19 May 2015 at 10:59:53 UTC, w0rp wrote:
 I believe you can also do x & -x == x.

 I think that still returns true for x = 0.

You are right. Disregard that.

May 19 2015

"Nicholas Wilson" <iamthewilsonator hotmail.com> writes:

On Tuesday, 19 May 2015 at 05:16:48 UTC, Andrei Alexandrescu 
wrote:
 ...
 Any ideas for faster code?


 Thanks,

 Andrei

I found 
www.davespace.co.uk/blog/20150131-branchless-sequences.html
and its links to be useful and interesting.

Nic

May 19 2015

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 5/19/15 1:16 AM, Andrei Alexandrescu wrote:
 So there's this classic trick:

 bool isPowerOf2(uint x)
 {
      return (x & (x - 1)) == 0;
 }

 Pretty neat, but it wrongly returns true for x == 0. So the obvious fix is:

 bool isPowerOf2(uint x)
 {
      return x && (x & (x - 1)) == 0;
 }

 But that has branches in it. So I came up with:

 bool isPowerOf2(uint x)
 {
      return (x & (x - 1) | !x) == 0;
 }

 which has no branches at least with dmd, see http://goo.gl/TVkCwc.

Is it just me, or is it odd that your first version generates xor and a 
bunch of jumps?

I don't see xor anywhere in the "fast" version.

Another possibility is to avoid the zero check altogether. There may be 
cases where you already know before calling isPowerOf2 that it's not 
zero, and checking for zero again is wasteful.

Note that bsr is undefined for 0, so there is precedent for this.

-Steve

May 19 2015

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 5/19/15 8:46 AM, Steven Schveighoffer wrote:
 On 5/19/15 1:16 AM, Andrei Alexandrescu wrote:
 So there's this classic trick:

 bool isPowerOf2(uint x)
 {
      return (x & (x - 1)) == 0;
 }

 Pretty neat, but it wrongly returns true for x == 0. So the obvious
 fix is:

 bool isPowerOf2(uint x)
 {
      return x && (x & (x - 1)) == 0;
 }

 But that has branches in it. So I came up with:

 bool isPowerOf2(uint x)
 {
      return (x & (x - 1) | !x) == 0;
 }

 which has no branches at least with dmd, see http://goo.gl/TVkCwc.

 Is it just me, or is it odd that your first version generates xor and a
 bunch of jumps?

 I don't see xor anywhere in the "fast" version.

Nevermind, xor is just zeroing a register.

I will note though, changing the slow version to:

if(x) return (x & (x-1)) == 0;
return 0;

this reduces the jumps by 2.

-Steve

May 19 2015

"Matthias Bentrup" <matthias.bentrup googlemail.com> writes:

On Tuesday, 19 May 2015 at 05:16:48 UTC, Andrei Alexandrescu 
wrote:
 [...], but it wrongly returns true for x == 0.

When we're talking about modulo 2^n arithmetic, 0 is in fact a 
power of two.

Proof: 2^n mod 2^n == 0.

May 19 2015

Marco Leise <Marco.Leise gmx.de> writes:

Am Mon, 18 May 2015 22:16:47 -0700
schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

 But that has branches in it. So I came up with:
 
 bool isPowerOf2(uint x)
 {
      return (x & (x - 1) | !x) == 0;
 }
 
 which has no branches at least with dmd, see http://goo.gl/TVkCwc.
 
 Any ideas for faster code?
 
 
 Thanks,
 
 Andrei

Any faster ?! This is already minimal assembly code with
pretty looks!

While you are at it, you might also need "round pointer up to"
and "round pointer down to", which can be implemented with bit
ops for power-of-2 multiples.

Its the kind of stuff that's not really a std.algorithm
candidate, but still somewhat common in memory management code.
Often these and other little helpers end up in everyones
stdlib_extensions.d which I suppose don't look all that
different. Who has some of these in their code?:

- isPowerOf2, nextLargerPowerOf2
- roundUp/Down to multiple of X
- increment with wraparound at X
- clamp value (low, high or both ends)
- check if numerical value is in between two others
- compile time "iota"
- format an argument as it would appear in source code
  (i.e. add quotes around strings)

-- 
Marco

May 19 2015

"Zoadian" <no no.no> writes:

On Tuesday, 19 May 2015 at 18:04:49 UTC, Marco Leise wrote:
 Am Mon, 18 May 2015 22:16:47 -0700
 schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

 But that has branches in it. So I came up with:
 
 bool isPowerOf2(uint x)
 {
      return (x & (x - 1) | !x) == 0;
 }
 
 which has no branches at least with dmd, see 
 http://goo.gl/TVkCwc.
 
 Any ideas for faster code?
 
 
 Thanks,
 
 Andrei

 Any faster ?! This is already minimal assembly code with
 pretty looks!

 While you are at it, you might also need "round pointer up to"
 and "round pointer down to", which can be implemented with bit
 ops for power-of-2 multiples.

 Its the kind of stuff that's not really a std.algorithm
 candidate, but still somewhat common in memory management code.
 Often these and other little helpers end up in everyones
 stdlib_extensions.d which I suppose don't look all that
 different. Who has some of these in their code?:

 - isPowerOf2, nextLargerPowerOf2
 - roundUp/Down to multiple of X
 - increment with wraparound at X
 - clamp value (low, high or both ends)
 - check if numerical value is in between two others
 - compile time "iota"
 - format an argument as it would appear in source code
   (i.e. add quotes around strings)

+1 ;)
all except the last one

May 19 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/19/15 11:05 AM, Marco Leise wrote:
 While you are at it, you might also need "round pointer up to"
 and "round pointer down to", which can be implemented with bit
 ops for power-of-2 multiples.

Yah, there are plenty of those in 
https://github.com/andralex/phobos/blob/allocator/std/experimental
allocator/common.d. 
Improvements welcome. -- Andrei

May 19 2015

Timon Gehr <timon.gehr gmx.ch> writes:

On 05/19/2015 09:56 PM, Andrei Alexandrescu wrote:
 On 5/19/15 11:05 AM, Marco Leise wrote:
 While you are at it, you might also need "round pointer up to"
 and "round pointer down to", which can be implemented with bit
 ops for power-of-2 multiples.

 Yah, there are plenty of those in
 https://github.com/andralex/phobos/blob/allocator/std/experimental/allocator/common.d.
 Improvements welcome. -- Andrei

In case the range of s is such that divideRoundUp is actually good 
enough, the following avoids the conditional:

size_t roundUpToMultipleOf(size_t s,uint base){
     auto t=s+base-1;
     return t-t%base;
}

However, both divideRoundUp and this implementation of 
roundUpToMultipleOf do not work for s in [size_t.max-base+2, size_t.max].

May 19 2015

"deadalnix" <deadalnix gmail.com> writes:

Have you tried things like :

(x >> bsr(x)) == 1 ?

I have no idea if this is faster or not, but worth trying.

May 19 2015

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 5/19/15 4:01 PM, deadalnix wrote:
 Have you tried things like :

 (x >> bsr(x)) == 1 ?

 I have no idea if this is faster or not, but worth trying.

Hm.. I think this would always succeed. Perhaps you mean:

1 << bsr(x) == x;

And ironically, still doesn't work for 0 :D

IIRC, bsr is a binary search (even when it's an instruction), I doubt 
it's as fast as subtraction and logic-and.

-Steve

May 19 2015

"deadalnix" <deadalnix gmail.com> writes:

On Tuesday, 19 May 2015 at 20:09:23 UTC, Steven Schveighoffer 
wrote:
 On 5/19/15 4:01 PM, deadalnix wrote:
 Have you tried things like :

 (x >> bsr(x)) == 1 ?

 I have no idea if this is faster or not, but worth trying.

 Hm.. I think this would always succeed. Perhaps you mean:

 1 << bsr(x) == x;

Both work as long as you use a fully defined instruction, like 
tzcnt.

May 19 2015

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 5/19/15 5:32 PM, deadalnix wrote:
 On Tuesday, 19 May 2015 at 20:09:23 UTC, Steven Schveighoffer wrote:
 On 5/19/15 4:01 PM, deadalnix wrote:
 Have you tried things like :

 (x >> bsr(x)) == 1 ?

 I have no idea if this is faster or not, but worth trying.

 Hm.. I think this would always succeed. Perhaps you mean:

 1 << bsr(x) == x;

 Both work as long as you use a fully defined instruction, like tzcnt.

Hm... I messed up, (x >> bsr(x)) is always zero. I think you meant to write:

x >> (bsr(x) - 1)

which always is 1.

Either way, it doesn't work.

-Steve

May 19 2015

"deadalnix" <deadalnix gmail.com> writes:

On Tuesday, 19 May 2015 at 22:43:46 UTC, Steven Schveighoffer 
wrote:
 On 5/19/15 5:32 PM, deadalnix wrote:
 On Tuesday, 19 May 2015 at 20:09:23 UTC, Steven Schveighoffer 
 wrote:
 On 5/19/15 4:01 PM, deadalnix wrote:
 Have you tried things like :

 (x >> bsr(x)) == 1 ?

 I have no idea if this is faster or not, but worth trying.

 Hm.. I think this would always succeed. Perhaps you mean:

 1 << bsr(x) == x;

 Both work as long as you use a fully defined instruction, like 
 tzcnt.

 Hm... I messed up, (x >> bsr(x)) is always zero. I think you 
 meant to write:

 x >> (bsr(x) - 1)

 which always is 1.

 Either way, it doesn't work.

 -Steve

No.

bsr(1) is 0.
1 >> bsr(1) is 1.
0 >> anything is 0.

So it doesn't matter if bsr is defined or not for 0.

May 19 2015

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 5/19/15 7:41 PM, deadalnix wrote:
 On Tuesday, 19 May 2015 at 22:43:46 UTC, Steven Schveighoffer wrote:
 On 5/19/15 5:32 PM, deadalnix wrote:
 On Tuesday, 19 May 2015 at 20:09:23 UTC, Steven Schveighoffer wrote:
 On 5/19/15 4:01 PM, deadalnix wrote:
 Have you tried things like :

 (x >> bsr(x)) == 1 ?

 I have no idea if this is faster or not, but worth trying.

 Hm.. I think this would always succeed. Perhaps you mean:

 1 << bsr(x) == x;

 Both work as long as you use a fully defined instruction, like tzcnt.

 Hm... I messed up, (x >> bsr(x)) is always zero. I think you meant to
 write:

 x >> (bsr(x) - 1)

 which always is 1.

 Either way, it doesn't work.

 No.

 bsr(1) is 0.
 1 >> bsr(1) is 1.

Gah, I messed up, used output from old code that wasn't doing what I 
thought it was. You are right about that.

But my whole point there was that x >> bsr(x) is ALWAYS 1.

2 >> bsr(2) == 1
3 >> bsr(3) == 1
4 >> bsr(4) == 1
17 >> bsr(17) == 1

So really, your test checks to see if a value is zero or not, not 
whether it's a power of 2.

BUT, the opposite mechanism would work:

1 << bsr(x) == x;

 0 >> anything is 0.

Hah, good point :) Even if bsr(0) is undefined it doesn't matter. Didn't 
think of that.

But that means the opposite solution I mentioned above, doesn't work, 
still back to square one.

-Steve

May 21 2015

"deadalnix" <deadalnix gmail.com> writes:

On Thursday, 21 May 2015 at 13:24:57 UTC, Steven Schveighoffer 
wrote:
 Gah, I messed up, used output from old code that wasn't doing 
 what I thought it was. You are right about that.

 But my whole point there was that x >> bsr(x) is ALWAYS 1.

 2 >> bsr(2) == 1
 3 >> bsr(3) == 1
 4 >> bsr(4) == 1
 17 >> bsr(17) == 1

 So really, your test checks to see if a value is zero or not, 
 not whether it's a power of 2.

 BUT, the opposite mechanism would work:

 1 << bsr(x) == x;

Ha yes. You'd want to use TZCNT.

Alternatively, with bsf on could do:

x << bsf(x) == 1 << [32|64]

 0 >> anything is 0.

 Hah, good point :) Even if bsr(0) is undefined it doesn't 
 matter. Didn't think of that.

 But that means the opposite solution I mentioned above, doesn't 
 work, still back to square one.

 -Steve

Well no, there all needed to make it work :) Still no idea if 
this is actually faster, but there is one less operation to 
perform.

May 21 2015

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 5/21/15 4:46 PM, deadalnix wrote:

 Alternatively, with bsf on could do:

 x << bsf(x) == 1 << [32|64]

Hm... bsf does work in your original code, I'm thinking you may have 
messed up bsf and bsr :)

x >> bsf(x) == 1

Which makes a LOT more sense (I tested and it works).

Don't ask me why I knew bsr vs. bsf...

-Steve

May 21 2015

"Brian Schott" <briancschott gmail.com> writes:

On Tuesday, 19 May 2015 at 20:01:29 UTC, deadalnix wrote:
 Have you tried things like :

 (x >> bsr(x)) == 1 ?

 I have no idea if this is faster or not, but worth trying.

https://issues.dlang.org/show_bug.cgi?id=14380

"If the content source operand is 0, the content of the 
destination operand is undefined" - This applies to both the bsf 
and bsr instructions.

May 19 2015

"deadalnix" <deadalnix gmail.com> writes:

On Tuesday, 19 May 2015 at 20:41:58 UTC, Brian Schott wrote:
 On Tuesday, 19 May 2015 at 20:01:29 UTC, deadalnix wrote:
 Have you tried things like :

 (x >> bsr(x)) == 1 ?

 I have no idea if this is faster or not, but worth trying.

 https://issues.dlang.org/show_bug.cgi?id=14380

 "If the content source operand is 0, the content of the 
 destination operand is undefined" - This applies to both the 
 bsf and bsr instructions.

Why ain't we generating a tzcnt ?

May 19 2015

"Matthias Bentrup" <matthias.bentrup googlemail.com> writes:

I think you can make the over/underflow at zero work in your 
favor:

bool isPowerOf2(uint x)
{
   return (x & -x) > (x - 1);
}

May 19 2015

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Tuesday, 19 May 2015 at 20:46:09 UTC, Matthias Bentrup wrote:
 I think you can make the over/underflow at zero work in your 
 favor:

 bool isPowerOf2(uint x)
 {
   return (x & -x) > (x - 1);
 }

Very nice

May 20 2015

"Temtaime" <temtaime gmail.com> writes:

First version isn't any slow. It's clear and can be optimized 
with gdc:

http://goo.gl/Q7HKcU

And if you matter about dmd - it generates shit all the time.

May 20 2015

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:

On Wednesday, 20 May 2015 at 09:49:06 UTC, Temtaime wrote:
 First version isn't any slow. It's clear and can be optimized 
 with gdc:

 http://goo.gl/Q7HKcU

Yes, and besides, if one cares about these minor performance 
issues, that most likely will disappear in the pipeline if you 
are a little bit careful about how you arrange your code, then 
one one would be better off letting the compiler take advantage 
of statically available information about size:

 always_inline
... allocate(ulong size){
    if(size==0) return _my_alloc_zero();
    if(is_log2_or_zero(size)) return _my_alloc_log2size(size);
    return _my_alloc_nonlog2size(size);
  }

So basically a version that preserves the tests that the compiler 
most easily can get rid of can lead to faster code even if it in 
isolation has a few extra instructions.

May 20 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/19/15 1:46 PM, Matthias Bentrup wrote:
 I think you can make the over/underflow at zero work in your favor:

 bool isPowerOf2(uint x)
 {
    return (x & -x) > (x - 1);
 }

Nice code with dmd and gdc. Thanks! 
https://github.com/andralex/phobos/commit/ec197ecd203b0ea25201
cfeb4fbbb13b2fabb7f 
-- Andrei

May 20 2015

"Dominikus Dittes Scherkl" writes:

On Tuesday, 19 May 2015 at 20:46:09 UTC, Matthias Bentrup wrote:
 I think you can make the over/underflow at zero work in your 
 favor:

 bool isPowerOf2(uint x)
 {
   return (x & -x) > (x - 1);
 }

The cool thing is that also the over/underflow of "-" at 
0x8000_0000 works. But that is really a weird calculation! 
Wouldn't pass any Polyspace or other code checker tool and need 
some special comments on why it works...

May 21 2015

"Jay Norwood" <jayn prismnet.com> writes:

This formula measures  a little faster on dmd.  Release build, 
three tests, find all values for 0..uint.max.

first result uses
if (((x-1)&(x|0x80000000))==0)

second result uses
if ((x & (x - 1) | !x) == 0)


D:\pow2\pow2\pow2\Release>pow2
duration(msec)=10259
duration(msec)=10689

D:\pow2\pow2\pow2\Release>pow2
duration(msec)=10256
duration(msec)=10695

D:\pow2\pow2\pow2\Release>pow2
duration(msec)=10264
duration(msec)=10726

May 21 2015

"Jay Norwood" <jayn prismnet.com> writes:

On Friday, 22 May 2015 at 05:24:15 UTC, Jay Norwood wrote:

 first result uses
 if (((x-1)&(x|0x80000000))==0)

00F81005  mov         eax,edx
00F81007  lea         ecx,[edx-1]
00F8100A  or          eax,80000000h
00F8100F  test        ecx,eax

Above is what a Microsoft C++ compiler does with the first 
expression.  It gets inserted in-line in the test.

May 22 2015

D Programming

C/C++ Programming

Other

digitalmars.D - 0 is not a power of 2