digitalmars.D - Bits and Bobs and Bools
- =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= (189/223) Feb 10 2005 Back to our good old D friend, the "bit" data type...
- Walter (3/5) Feb 10 2005 Please do, and thanks. It's a good article.
- =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= (25/28) Feb 10 2005 Thank you, it's now at:
- Matthew (5/33) Feb 10 2005 Deaf ears prepare:
- Kris (3/50) Feb 10 2005 Hear, Hear! For that deaf ear!
- =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= (13/23) Feb 10 2005 I think we had that conversion already :-)
- Derek (18/29) Feb 10 2005 While this might work for bits, it doesn't for booleans. An array of
- =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= (7/14) Feb 11 2005 So use wbool instead ? Individual bit pointers just don't exist.
Back to our good old D friend, the "bit" data type... (with the two constant bits: true == 1 and false == 0) While D will not ever have a boolean type like e.g. Java, it doesn't hurt if the alternative "bit" is documented ? Here is my attempt... Feel free to comment or extend. The first thing that is good to know is that bit is a "unsigned" data type. That is, if you convert true to an integer, then you get a 1 and not a -1. (having -1 as true might sound a bit silly, but has its merits since that means all bits are set when converting to a larger int type. This convention is in use elsewhere) The only "universal" convention is that false is 0, and this is also the .init value for the bit type. (in boolean contexts, "null" has a value of 0 too) The second that is good to know is that single bits are stored in bytes, and in the least significant bit. That is, "true" is equal to 0x01 when read as a byte. There are a bunch of things about bits that are "implementation defined", and that's not all that good for a language that defines e.g. int sizes ? It would be much simpler if these were defined ? This would also allow external languages, such as C, to access any bit variables in D structures... http://www.digitalmars.com/d/interface.html:Data Type Compatibility D type C type bit no equivalentWhich is not true, it should be "signed char" in C/C++ ? (but C "bool" can't be used, since it might be 4 bytes) IMPLEMENTATION DETAIL: (D) assert(typeid(typeof(true)) == typeid(bit)); assert(typeid(typeof(false)) == typeid(bit)); assert(bit.sizeof == 1); assert(true == 1); assert(false == 0); Then, bit arrays. For efficiency reasons, these are stored in 32-bit (register sized) chunks, rounded up. Since D mandates a 32-bit minimum CPU, might as well make this implementation detail fixed in the D spec ? This means that bit[1].sizeof == 4, same as bit[32], while as bit[256] occupies 32 bytes - for the data (as usual there is some overhead for length and ptr) Dividing by 32 is the same as shifting right by 5. IMPLEMENTATION DETAIL: (C) unsigned int *data; void alloc(unsigned int bitdim) { unsigned int allocdim = (bitdim + 31) / 32; // rounding up data = (unsigned int *) calloc(allocdim, sizeof(unsigned int)); } void set(unsigned int bitnum) { data[bitnum >> 5] |= 1L << (bitnum & 31); } void clear(unsigned int bitnum) { data[bitnum >> 5] &= ~(1L << (bitnum & 31)); } int test(unsigned int bitnum) { return data[bitnum >> 5] & (1L << (bitnum & 31)); } Within the chunks, the bytes are stored in little-endian byte order but still using normal bit direction (LSB first) within each byte. That is, the first bit encountered is the least significant, and this differs from e.g. how the bit integer literals work...void main() { printf("%02x\n", 0b10000000); printf("%02x\n", 0b01000000); printf("%02x\n", 0b00100000); printf("%02x\n", 0b00010000); printf("%02x\n", 0b00001000); printf("%02x\n", 0b00000100); printf("%02x\n", 0b00000010); printf("%02x\n", 0b00000001); }80 40 20 10 08 04 02 01 That's the reverse bit order, compared to what bit[] uses: (note that the byte ordering is the same on all platforms, just interpreted differently when loaded as 32-bit integers, This byte order needs to be finalized, and written in spec)void main() { bit[32] ba; for (int i = 0; i < 32; i++) { ba[] = false; ba[i] = true; printf("%08lx\n", *(cast(uint*) ba.ptr)); } }version (LittleEndian) // e.g. X86 { 00000001 00000002 00000004 00000008 00000010 00000020 00000040 00000080 00000100 00000200 00000400 00000800 00001000 00002000 00004000 00008000 00010000 00020000 00040000 00080000 00100000 00200000 00400000 00800000 01000000 02000000 04000000 08000000 10000000 20000000 40000000 80000000 } version (BigEndian) // e.g. PPC { 01000000 02000000 04000000 08000000 10000000 20000000 40000000 80000000 00010000 00020000 00040000 00080000 00100000 00200000 00400000 00800000 00000100 00000200 00000400 00000800 00001000 00002000 00004000 00008000 00000001 00000002 00000004 00000008 00000010 00000020 00000040 00000080 } Bit pointers (bit*) are allowed, but can only address the first bit of each byte, directly. This means that you can *not* take the address of individual bits in an array, just to the start of the bit[] array itself... (this can easily be accessed by using the .ptr property, or take the address of an individual bit variable : &b) However, you can then use this bit pointer with the regular index, as in bp[4], to set individual bits... (just as you could do with a bit[] bit array ,as well) But incrementing is a little funny, it will be divided by eight in order to fit the next whole byte address.bit[32] ba; bit* bp = ba.ptr; printf("%x\n", bp); bp += 4; printf("%x\n", bp); bp += 4; printf("%x\n", bp); bp += 8; printf("%x\n", bp);// an example, actual pointer values might differ bffffaa8 bffffaa8 bffffaa8 bffffaa9 For similar reasons, slices of bit arrays are only allowed when the lower bound starts on an even byte (that is: bit number is a multiple of 8, "% 8 == 0") Misaligned bit array slices throws ArrayBoundsError, or silently does nothing for the -release builds. Appending to bit arrays might not be implemented. bool: (bit) Having bit as the default boolean type in D now means that there can be only *one* value for true: 1 (one), all other non-zero values are converted when casting. wbool: (byte) While the shift and bit operators involved in getting and setting individual bits are pretty fast, it might be yet faster to just use byte arrays instead of bit arrays. dbool: (int) And since individual bytes might need masking to fit into a register, it might be faster still to use the int type to eg. return bits from functions. IMPLEMENTATION DETAIL: (D) alias bit bool; // boolean (true/false) alias byte wbool; // wide boolean (like wchar) alias int dbool; // double boolean (like dchar) byte c = cast(byte) 256; // c is now equal to "0" bit b = cast(bit) 2; // b is now equal to "true" int i = cast(int) b; // i is now equal to "1" I will probably stick this whole long post on Wiki4D. (under the GNU Free Documentation License, that is) --anders
Feb 10 2005
"Anders F Björklund" <afb algonet.se> wrote in message news:cufvfa$2aet$1 digitaldaemon.com...I will probably stick this whole long post on Wiki4D. (under the GNU Free Documentation License, that is)Please do, and thanks. It's a good article.
Feb 10 2005
Walter wrote:Thank you, it's now at: http://www.prowiki.org/wiki4d/wiki.cgi?BitsAndBools Here was some of the questions raised: - Should the byte order of bit[] be "fixed", like it seems to be now ? (which could be just an accident...) Or should the uint's use native byte endianness instead ? - Taking the address of a bit in an array seems to a) not complain and b) give a bogus address ? (that is: bit[] ba; bit *bp = &ba[10];) My suggestion is that it should return: ba + (10 / 8) == ba + 1 (rounded to byte) ? Similar to using : bp = ba; bp += 10; - wbool and dbool could be added, as "official" aliases perhaps ? They could all go in std.stdbool... wbool[8191] flags; // from sieve.d dbool opEquals(Object o); // from object.d I've also taken back my previous "boolean" suggestions... The bit type is it for D, with the bool alias for shortness. This probably also nixes the "isnot" keyword, as in: assert(object isnot null); which becomes assert(object); After all, D is about performance and favorite C terseness... If you want cuddly syntax and slowness, you know where Java is. --andersI will probably stick this whole long post on Wiki4D.Please do, and thanks. It's a good article.
Feb 10 2005
Deaf ears prepare: bit is a useless wart and should be removed. There, I said it. Now ignore me. "Anders F Björklund" <afb algonet.se> wrote in message news:cugafi$2mrr$1 digitaldaemon.com...Walter wrote:Thank you, it's now at: http://www.prowiki.org/wiki4d/wiki.cgi?BitsAndBools Here was some of the questions raised: - Should the byte order of bit[] be "fixed", like it seems to be now ? (which could be just an accident...) Or should the uint's use native byte endianness instead ? - Taking the address of a bit in an array seems to a) not complain and b) give a bogus address ? (that is: bit[] ba; bit *bp = &ba[10];) My suggestion is that it should return: ba + (10 / 8) == ba + 1 (rounded to byte) ? Similar to using : bp = ba; bp += 10; - wbool and dbool could be added, as "official" aliases perhaps ? They could all go in std.stdbool... wbool[8191] flags; // from sieve.d dbool opEquals(Object o); // from object.d I've also taken back my previous "boolean" suggestions... The bit type is it for D, with the bool alias for shortness. This probably also nixes the "isnot" keyword, as in: assert(object isnot null); which becomes assert(object); After all, D is about performance and favorite C terseness... If you want cuddly syntax and slowness, you know where Java is. --andersI will probably stick this whole long post on Wiki4D.Please do, and thanks. It's a good article.
Feb 10 2005
Hear, Hear! For that deaf ear! :-) In article <cugdv5$2r8g$2 digitaldaemon.com>, Matthew says...Deaf ears prepare: bit is a useless wart and should be removed. There, I said it. Now ignore me. "Anders F Björklund" <afb algonet.se> wrote in message news:cugafi$2mrr$1 digitaldaemon.com...Walter wrote:Thank you, it's now at: http://www.prowiki.org/wiki4d/wiki.cgi?BitsAndBools Here was some of the questions raised: - Should the byte order of bit[] be "fixed", like it seems to be now ? (which could be just an accident...) Or should the uint's use native byte endianness instead ? - Taking the address of a bit in an array seems to a) not complain and b) give a bogus address ? (that is: bit[] ba; bit *bp = &ba[10];) My suggestion is that it should return: ba + (10 / 8) == ba + 1 (rounded to byte) ? Similar to using : bp = ba; bp += 10; - wbool and dbool could be added, as "official" aliases perhaps ? They could all go in std.stdbool... wbool[8191] flags; // from sieve.d dbool opEquals(Object o); // from object.d I've also taken back my previous "boolean" suggestions... The bit type is it for D, with the bool alias for shortness. This probably also nixes the "isnot" keyword, as in: assert(object isnot null); which becomes assert(object); After all, D is about performance and favorite C terseness... If you want cuddly syntax and slowness, you know where Java is. --andersI will probably stick this whole long post on Wiki4D.Please do, and thanks. It's a good article.
Feb 10 2005
Matthew wrote:Deaf ears prepare: bit is a useless wart and should be removed. There, I said it. Now ignore me.I think we had that conversion already :-) I would like a boolean type too, but I don't consider it important enough to drop D for. And it's pretty obvious that Walter won't change this lemon, so I made some limonade... Just consider it the _Bool of D, right ? --anders PS. However, this statement is nonsense and should go: http://www.digitalmars.com/d/overview.html:Bit type The fundamental data type is the bit, and D has a bit data type. This is most useful in creating arrays of bits: bit[] foo;Bit is *not* the fundamental data type, but byte is... (these days, one could even consider "int" fundamental)
Feb 10 2005
On Thu, 10 Feb 2005 16:43:38 +0100, Anders F Björklund wrote: [snip]Then, bit arrays. For efficiency reasons, these are stored in 32-bit (register sized) chunks, rounded up. Since D mandates a 32-bit minimum CPU, might as well make this implementation detail fixed in the D spec ? This means that bit[1].sizeof == 4, same as bit[32], while as bit[256] occupies 32 bytes - for the data (as usual there is some overhead for length and ptr) Dividing by 32 is the same as shifting right by 5.While this might work for bits, it doesn't for booleans. An array of individually addressable boolean values is a useful construct. bool[6] Flags; Flags[2] = true; Flags[4] = False; if (Flags[3] || Flags[5]) . . . foobar( Flags[1] ); void foobar(inout bool aSwitch) { . . . } etc... -- Derek Melbourne, Australia
Feb 10 2005
Derek wrote:So use wbool instead ? Individual bit pointers just don't exist. alias byte wbool; (well, I do know that Stewart coded an implementation, but IRL... http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/495) There is no spoon. --andersThis means that bit[1].sizeof == 4, same as bit[32], while as bit[256] occupies 32 bytes - for the data (as usual there is some overhead for length and ptr) Dividing by 32 is the same as shifting right by 5.While this might work for bits, it doesn't for booleans. An array of individually addressable boolean values is a useful construct.
Feb 11 2005