digitalmars.D - Fast 2D matrix of bits
- bearophile (5/5) Sep 19 2011 In Phobos (beside fast 2D/3D/4D vectors) I'd like a fast 2D matrix of bi...
- Timon Gehr (13/22) Sep 19 2011 The implementation uses int and uint everywhere. I think it is not 64
- bearophile (11/20) Sep 19 2011 For me a huge unsigned value that I can't catch in the ctor precondition...
- Josh Simmons (20/40) Sep 19 2011 is worse than a negative value that I am able to catch. I don't need to ...
- bearophile (4/16) Sep 20 2011 My version with bsr is faster.
- Denis Shelomovskij (35/51) Sep 20 2011 Is it? Every conditional branch can break CPU command queue and slow
- bearophile (7/10) Sep 22 2011 Hours ago Don has written a comment in one of my enhancement requests th...
- Josh Simmons (5/8) Sep 20 2011 Is that science or guessing?
- Josh Simmons (7/18) Sep 20 2011 Ah, when the one I gave was slower it wasn't being unrolled by gcc,
- Josh Simmons (2/3) Sep 19 2011 Which one exactly? Sometimes I wonder...
- Paul D. Anderson (3/7) Sep 20 2011 There is another problem with bitmanip and 64-bit integers. See Bug #668...
In Phobos (beside fast 2D/3D/4D vectors) I'd like a fast 2D matrix of bits. I have added an implementation in attach in this enhancement request: http://d.puremagic.com/issues/show_bug.cgi?id=6697 It's a common enough data structure, and despite being simple it's not obvious, especially if you want it to be fast. A naive implementation that uses a BitArray is not fast enough. Bye, bearophile
Sep 19 2011
On 09/20/2011 12:19 AM, bearophile wrote:In Phobos (beside fast 2D/3D/4D vectors) I'd like a fast 2D matrix of bits. I have added an implementation in attach in this enhancement request: http://d.puremagic.com/issues/show_bug.cgi?id=6697 It's a common enough data structure, and despite being simple it's not obvious, especially if you want it to be fast. A naive implementation that uses a BitArray is not fast enough. Bye, bearophileThe implementation uses int and uint everywhere. I think it is not 64 bit aware?// sizex_, sizey_ are signed to catch negative arguments bugs.There would be no negative arguments if they were unsigned.// opIndex not used because it's slower on LDCWhy is it slower? The structure should imho certainly use opIndex(Assign). Since you want it to be really fast, this occurs multiple times:immutable int index = x + (y << this.pow2); // 1D bit index this._bits[index / nbits_item] |= 1 << (index % nbits_item);nbits_item is a power of two. So the divisions/modulo have the potential to be really fast. But the signedness of index makes some straightforward optimizations very hard to carry out for the compiler. (it will probably use bit shifts and logical and, but it has to emit something more to cope with (im)possible negative values) Try making index unsigned in those cases, it'll probably save some machine instructions. (size_t is always your best bet for indexes)
Sep 19 2011
Timon Gehr:The implementation uses int and uint everywhere. I think it is not 64 bit aware?I have not even tried it on a 64 bit system.> // sizex_, sizey_ are signed to catch negative arguments bugs. There would be no negative arguments if they were unsigned.For me a huge unsigned value that I can't catch in the ctor precondition is worse than a negative value that I am able to catch. I don't need to create a 2D matrix with sides bigger than int.max. I agree that all the other methods are probably better with size_t arguments. I'll fix it.Why is it slower?Ask it to LDC1 developers :-) Adding that method is easy, if you want it. And I agree it's handy, with syntax mat[x,y] = true/false. But note the preferred methods to use this matrix are set/reset, because they are faster than the assign. I'll think about it.But the signedness of index makes some straightforward optimizations very hard to carry out for the compiler. [...] but it has to emit something more to cope with (im)possible negative values)Right, index needs to be uint or size_t. I'll fix it. Thank you for your suggestions and notes, bye, bearophile
Sep 19 2011
On Tue, Sep 20, 2011 at 10:24 AM, bearophile <bearophileHUGS lycos.com> wro= te:Timon Gehr:is worse than a negative value that I am able to catch. I don't need to cre= ate a 2D matrix with sides bigger than int.max.The implementation uses int and uint everywhere. I think it is not 64 bit aware?I have not even tried it on a 64 bit system.=C2=A0> // sizex_, sizey_ are signed to catch negative arguments bugs. There would be no negative arguments if they were unsigned.For me a huge unsigned value that I can't catch in the ctor precondition =I agree that all the other methods are probably better with size_t argume=nts. I'll fix it.syntax mat[x,y] =3D true/false.Why is it slower?Ask it to LDC1 developers :-) Adding that method is easy, if you want it. And I agree it's handy, with =But note the preferred methods to use this matrix are set/reset, because =they are faster than the assign. I'll think about it.ve values)But the signedness of index makes some straightforward optimizations very hard to carry out for the compiler. [...] but it has to emit something more to cope with (im)possible negati=Right, index needs to be uint or size_t. I'll fix it. Thank you for your suggestions and notes, bye, bearophileZero is a power of two too :( You could also just... uint32_t next_pow_2(uint32_t n) { n -=3D 1; n |=3D n >> 1; n |=3D n >> 2; n |=3D n >> 4; n |=3D n >> 8; n |=3D n >> 16; return n + 1; }
Sep 19 2011
Josh Simmons:You could also just... uint32_t next_pow_2(uint32_t n) { n -= 1; n |= n >> 1; n |= n >> 2; n |= n >> 4; n |= n >> 8; n |= n >> 16; return n + 1; }My version with bsr is faster. Bye, bearophile
Sep 20 2011
20.09.2011 10:22, bearophile пишет:Josh Simmons:Is it? Every conditional branch can break CPU command queue and slow down execution. Anyway, I saw same problem not far-away (2^n sized OpenGL textures). IMHO (too lazy to make tests) it is the fastest: --- bool isPow2(int n) in { assert(n > 0); } body { return !((n - 1) & n); } unittest { assert(!isPow2(0)); assert(isPow2(1)); assert(isPow2(2)); assert(!isPow2(3)); assert(isPow2(4)); assert(!isPow2(5)); } int toPow2(int n) in { assert(n > 0); } body { return 1 << (bsr(n) + !isPow2(n)); } unittest { assert(toPow2(1) == 1); assert(toPow2(2) == 2); assert(toPow2(3) == 4); assert(toPow2(4) == 4); assert(toPow2(5) == 8); } --- A short version (AFAIK, !! is one lexem for dmd): --- int toPow2(int n) { return 1 << (bsr(n) + !!((n - 1) & n)); } ---You could also just... uint32_t next_pow_2(uint32_t n) { n -= 1; n |= n>> 1; n |= n>> 2; n |= n>> 4; n |= n>> 8; n |= n>> 16; return n + 1; }My version with bsr is faster. Bye, bearophile
Sep 20 2011
Denis Shelomovskij:Is it? Every conditional branch can break CPU command queue and slow down execution. Anyway, I saw same problem not far-away (2^n sized OpenGL textures).Hours ago Don has written a comment in one of my enhancement requests that is related to this topic: http://d.puremagic.com/issues/show_bug.cgi?id=4046 I will do better benchmarks, and if necessary I'll update that helper function in the bit matrix code in Bugzilla (I have already fixed most of the other suggestions given by Timon Gehr). If Don is right, this asks for some deprecation in core.bitops. Bye, bearophile
Sep 22 2011
On Tue, Sep 20, 2011 at 5:22 PM, bearophile <bearophileHUGS lycos.com> wrote:My version with bsr is faster. Bye, bearophileIs that science or guessing? My horribly unscientific test shows the opposite to be true, I'm looking over the assembly output to see if there's an extraneous factor.
Sep 20 2011
On Tue, Sep 20, 2011 at 6:36 PM, Josh Simmons <simmons.44 gmail.com> wrote:On Tue, Sep 20, 2011 at 5:22 PM, bearophile <bearophileHUGS lycos.com> wrote:Ah, when the one I gave was slower it wasn't being unrolled by gcc, when yours was slower my trivial loop was being vectorised. When they're both handled the same the results are the same not favoring either. Cool. I do like that the propagate-right version can be vectorised though.My version with bsr is faster. Bye, bearophileIs that science or guessing? My horribly unscientific test shows the opposite to be true, I'm looking over the assembly output to see if there's an extraneous factor.
Sep 20 2011
On Tue, Sep 20, 2011 at 12:03 PM, Josh Simmons <simmons.44 gmail.com> wrote:Zero is a power of two too :(Which one exactly? Sometimes I wonder...
Sep 19 2011
Timon Gehr Wrote:The implementation uses int and uint everywhere. I think it is not 64 bit aware?Paul
Sep 20 2011