digitalmars.D - Back to basics: integer types
- Lionello Lunesu (25/25) Nov 23 2004 Hi..
- =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= (20/40) Nov 23 2004 The supported type name "spelling" is: int8_t, int16_t, int32_t...
- Georg Wrede (67/86) Nov 23 2004 I agree.
- Sean Kelly (11/24) Nov 23 2004 Technically, since a byte is defined as "the smallest addressable unit" ...
- Russ Lewis (5/10) Nov 23 2004 Well, to quibble with your quibble: on most 32bit machines, 8bit values
- Sean Kelly (3/13) Nov 23 2004 Understood. I was actually referring to Georg's imaginary 32 bit machin...
- =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= (35/60) Nov 23 2004 I don't think that it wasn't thought of being the issue...
- Lionello Lunesu (25/35) Nov 24 2004 really _the_ reason for a bit[8] (fixed-sized array) to be allocating
- =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= (34/90) Nov 24 2004 It is faster since you can load/store 32 bits at one time, when setting
- Lionello Lunesu (19/33) Nov 25 2004 Yeah, but if the array-size is fixed, as is the case with bit[8]? Why wo...
- =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= (19/30) Nov 25 2004 Well, that "sort of" worked for the bool alias ?
- Georg Wrede (17/23) Nov 25 2004 This "hypothetical" 32 bit machine I mentioned earlier, I was actually
Hi.. Another thought I had... I like the type size-suffix for fixed-sized integers (int8, int16, int32, etc) and this got me thinking about the most basic of types: bit. Wouldn't it be nice if "bit[8]" behaved the same as byte / int8, and bit[16] the same as the current int16, etc. ? I probably need more arguments than "it's nice" to get this one through, but I'll think about it some more while you guys fire back at me. And, am I correct in understanding that subsequent bits in a struct will always be packed? struct something { bit first; bit second; }; I.e. what's the size of this struct? If they will be packed, then I've got a (new?) argument for "bool": sizeof(bool) being 1 (mostly for compatibility with old code/structs as I don't think there's any speed gain. Well, passing "bit second" from the above struct to a method taking bool/bit would need a shift operation not needed if "second" was a real bool). -- Lio. -- Get the CACert root certificate (and a personal one) at http://cacert.org/
Nov 23 2004
Lionello Lunesu wrote:I like the type size-suffix for fixed-sized integers (int8, int16, int32, etc) and this got me thinking about the most basic of types: bit.The supported type name "spelling" is: int8_t, int16_t, int32_t... See the D module: import std.stdint;Wouldn't it be nice if "bit[8]" behaved the same as byte / int8, and bit[16] the same as the current int16, etc. ? I probably need more arguments than "it's nice" to get this one through, but I'll think about it some more while you guys fire back at me.Had it been optimized for storage, it would have worked like you wroteAnd, am I correct in understanding that subsequent bits in a struct will always be packed?No. You could have tested it, too ?struct something { bit first; bit second; }; I.e. what's the size of this struct?2 bytes.If they will be packed, then I've got a (new?) argument for "bool": sizeof(bool) being 1 (mostly for compatibility with old code/structs as I don't think there's any speed gain. Well, passing "bit second" from the above struct to a method taking bool/bit would need a shift operation not needed if "second" was a real bool).There is no "real" boolean type in D. (Like one that only takes true/false?) sizeof(bool) differs between 0.125, 1 or 4 depending on who you ask. The different implementations are fondly known as bit, wbit and dbit Otherwise you need sub-byte pointers to access &something.second, and that's a huge pain to implement and to pass around the block. Personally, I'm not sure that bit *is* the fundamental data type... Similar to how the triangle is the fundamental type in 3D graphics, I think the byte has become the fundamental data type. For speed ? --anders PS. Bit is not an integer data type in D. It's a pseudo-bool data type. See my previous rants and the endless discussions on this newsgroup.
Nov 23 2004
Anders F Björklund wrote:Lionello Lunesu wrote: There is no "real" boolean type in D. (Like one that only takes true/false?) sizeof(bool) differs between 0.125, 1 or 4 depending on who you ask. The different implementations are fondly known as bit, wbit and dbit Otherwise you need sub-byte pointers to access &something.second, and that's a huge pain to implement and to pass around the block. Personally, I'm not sure that bit *is* the fundamental data type... Similar to how the triangle is the fundamental type in 3D graphics, I think the byte has become the fundamental data type. For speed ?..PS. Bit is not an integer data type in D. It's a pseudo-bool data type. See my previous rants and the endless discussions on this newsgroup.I agree. While bit may be the fundamental unit of information, it is _not_, I repeat, _not_ the fundamental data type in computers. Contrary to public belief (IT professionals and laymen alike), these are not the same thing. Computer hardware manufacturers have known this for more than 50 years. If they hadn't, then the memory adress space would be a single bit string. Corollary (slightly off the issue) This would lead to two things: (a) memory pointers could point to any bit, and (b) the different cpu-sizes (today 16, 32 and 64 bits) could actually be _any_ number of bits. Thus, we could have a 17 bit CPU or a 67 bit cpu. Probably these computers would fetch several bits at once, say, those 17 bits on a "17-bit" CPU. But not necessarily. Opcodes could be of varying size without wasting memory. The CPU would know where (at which bit-adress) to fetch the next instruction. Opcodes (and entire applications) would exist to process bit- streams, a la audio CD reading. (Ever seen '1 bit processor' written on a CD player?) Most probably different parts of a CPU would be capable of handling different size "words", depending on the usage. For example, US-ASCII operations hardware would be on 7 bit circuits, and graphics operations hardware 24 bits. Oh, and an amazing thing with the memory: arbitrary size "DMA like" data transfers (say between main memory and graphics memory) would take only one clock cycle for the _entire_ transfer !!!! <diabolical laughter> This I leave as an excercise to the reader to figure out! </> Yes, very nice. But practical issues in hardware design and implementation have lead us to use a fixed word size and memory adressing by the byte. (Actually, the word size usually determines the fetched number of bits, so on some hardware architectures the _actual_ granularity of adressing is not the same as the _appearing_ granularity. E.g. we may have a CPU that is only capable of adressing memory "as 32 bit entities", but then, using various kludges, it pretends to be able to do it by the byte (=8bits). And this only because systems programmers' minds are stuck to byte as the "only" size.) Back to the issue at hand It is regrettable that this was not thought of in the early days of D. As a result we have this Bit data type that lingers haunting the entire D community. Would I become "the God in D world", on the First Day, I would throw away the Bit data type. On the second day I would create a Boolean data type, and write in the specification that its size would be architecture dependent, and suggest the fastest integral size would be used. (Eg, probably 32 bits on Pentiums.) Then I would seriously consider forbidding int casts to/from Boolean. You'd have to write myBoolean = ( myInt != 0 ); myInt2 = myBoolean ? 1 : 0; I would not disapprove of arrays of Booleans. I would not even disallow union wizardry with int on top of packed Boolean, but I would seriously discourage it. Of course the compiler writer would be free to use whatever intermediate values in Boolean calculations etc., but the programmer should percieve Booleans as distinct from 1 and 0. And of course, programmers would still be free to do bitwise logical ops between ints. But that is _really_ unrelated to this Boolean issue. Isn't it!
Nov 23 2004
In article <cnvusn$1hh3$1 digitaldaemon.com>, Georg Wrede says...(Actually, the word size usually determines the fetched number of bits, so on some hardware architectures the _actual_ granularity of adressing is not the same as the _appearing_ granularity. E.g. we may have a CPU that is only capable of adressing memory "as 32 bit entities", but then, using various kludges, it pretends to be able to do it by the byte (=8bits). And this only because systems programmers' minds are stuck to byte as the "only" size.)Technically, since a byte is defined as "the smallest addressable unit" in C/C++, I would expect a byte on such machines to be 32 bits. But then I suppose we've gotten so used to 8 bit bytes that perhaps the kludges would be expected? In any case, D imposes size constraints on all types, so in our world a byte would be 8 bits.On the second day I would create a Boolean data type, and write in the specification that its size would be architecture dependent, and suggest the fastest integral size would be used. (Eg, probably 32 bits on Pentiums.)..which would make the boolean type the only D data type with an indeterminate size. I would almost rather require that such a type always be 8 or 32 bits, unless all size restrictions were eliminated (and I don't expect that to happen). Sean
Nov 23 2004
Sean Kelly wrote:Technically, since a byte is defined as "the smallest addressable unit" in C/C++, I would expect a byte on such machines to be 32 bits. But then I suppose we've gotten so used to 8 bit bytes that perhaps the kludges would be expected? In any case, D imposes size constraints on all types, so in our world a byte would be 8 bits.Well, to quibble with your quibble: on most 32bit machines, 8bit values are individually addressable. Even on architectures that require aligned reads & writes, they often use byte-addressing, and byte-level primitives are sometimes available.
Nov 23 2004
In article <co03i7$1nmp$1 digitaldaemon.com>, Russ Lewis says...Sean Kelly wrote:Understood. I was actually referring to Georg's imaginary 32 bit machine. SeanTechnically, since a byte is defined as "the smallest addressable unit" in C/C++, I would expect a byte on such machines to be 32 bits. But then I suppose we've gotten so used to 8 bit bytes that perhaps the kludges would be expected? In any case, D imposes size constraints on all types, so in our world a byte would be 8 bits.Well, to quibble with your quibble: on most 32bit machines, 8bit values are individually addressable. Even on architectures that require aligned reads & writes, they often use byte-addressing, and byte-level primitives are sometimes available.
Nov 23 2004
Georg Wrede wrote:Back to the issue at handA somewhat dead and beaten issue, but I'll bite...It is regrettable that this was not thought of in the early days of D. As a result we have this Bit data type that lingers haunting the entire D community.I don't think that it wasn't thought of being the issue... Maybe bit array slices got a tad harder to implement, but.Would I become "the God in D world", on the First Day, I would throw away the Bit data type.The current Deity being the proud bit creator, of course :-)On the second day I would create a Boolean data type, and write in the specification that its size would be architecture dependent, and suggest the fastest integral size would be used. (Eg, probably 32 bits on Pentiums.)Going with the main "fixed-size" ideom of D, one could probably fix it as, say: 1 byte, too - without crying. Either the way, the main issue being there would be a boolean type that only accepts true and false values.Then I would seriously consider forbidding int casts to/from Boolean. You'd have to write myBoolean = ( myInt != 0 ); myInt2 = myBoolean ? 1 : 0;Just as in Java... (the language, not the VM or OS) Also required: myBoolean = ( myPointer != null ); Note: some people *hate* these "extra" stepsI would not disapprove of arrays of Booleans. I would not even disallow union wizardry with int on top of packed Boolean, but I would seriously discourage it.I'm assuming you mean packed bit arrays. Java does those with an Object class, but language support isn't all bad. If it hadn't been for the pain to get the pointers and slices implemented, that is. Probably not worth it, IMHO.Of course the compiler writer would be free to use whatever intermediate values in Boolean calculations etc., but the programmer should percieve Booleans as distinct from 1 and 0.Heck, I'd settle for true and false being distinct from 1 and 0. (currently the are hard-coded as bit literals in the D compiler) But the fact remains that D chose to side with C and C++ on logic. (or actually more with C, since there is no "bool" primitive type) And since zero is false and non-zero is true, who's really boolean?And of course, programmers would still be free to do bitwise logical ops between ints. But that is _really_ unrelated to this Boolean issue. Isn't it!It is. A related issue, however, is what type that conditionals etc expect. Had there been a boolean type, it's natural to make them require it. IMHO, Java got this right. (again, the language, not the religion...) See http://java.sun.com/docs/books/jls/ for that language's details. Anyway, D is a whole other language and has another such specification. And in *that* spec, there's three string types and three boolean types. string types: char[], wchar[], dchar[] boolean types: bit, wbit (byte), dbit (int) You get to pick one of each, to optimize for the task at hand. It's not a fatal flaw... More of a "missed opportunity", really? "alias bit bool; alias char[] string; goto on_with_our_lives;" See also http://www.prowiki.org/wiki4d/wiki.cgi?BooleanNotEquBit --anders
Nov 23 2004
Hi..From what I recall, accessing a byte is just as fast as acessing a dword (on=386). Accessing a word (short, 16-bit) will result in a penalty. Is speedreally _the_ reason for a bit[8] (fixed-sized array) to be allocating 32-bits (independent of the data that follows it) ? For bit[] (dynamically sized) I understand it. It will definately be faster, because accessing a dword is not slower and you do less growing.No. You could have tested it, too ?Yes yes, I should have tested it. But I'm one of those "patient lurkers": reading this newsgroup while VC++ is busy building. Anyway, I did not want to end-up discussing differences between dmd and gdc, and they might differ in this respect, so I though I'd keep it theoretical. (Also that I could have tested :-/ )There is no "real" boolean type in D. (Like one that only takes true/false?)Oh no, I've started the discussion again? I really didn't mean to.Otherwise you need sub-byte pointers to access &something.second, and that's a huge pain to implement and to pass around the block.Yes, agreed. That's one point where bool and bit would differ: bool* being valid, but bit* not.Personally, I'm not sure that bit *is* the fundamental data type...Theoretically it still is. I mean, it's the basic granularity, there's nothing smaller than bit. The quarks equivalent in information technology.Similar to how the triangle is the fundamental type in 3D graphics, I think the byte has become the fundamental data type. For speed ?Ah, practically yes, but triangles are not theoretically the fundamental type of 3D geometry. Recall ray-tracers, they hardly ever used triangles. But in practice (especially in OGL/D3D-type rendering) triangles are the basis. This shouldn't prevent anybody from discussing 'the basics' though. I also do think that 3D rendering based on triangles is just a efficient hack, a modern approximation to ray-tracing. Wouldn't be surprised if we get rid of them in the future (NURBS and other parameter-based geometries are gaining ground again). Lionello.
Nov 24 2004
Lionello Lunesu wrote:It is faster since you can load/store 32 bits at one time, when setting several. Otherwise it would load/store 8 bits at a time, by using bytes. Like this C code: (where "/ 32" converts to ">> 5")From what I recall, accessing a byte is just as fast as acessing a dword (on >=386). Accessing a word (short, 16-bit) will result in a penalty. Is speed really _the_ reason for a bit[8] (fixed-sized array) to be allocating 32-bits (independent of the data that follows it) ? For bit[] (dynamically sized) I understand it. It will definately be faster, because accessing a dword is not slower and you do less growing.unsigned *data;[...]void Bits::set(unsigned bitnum) { data[bitnum / 32] |= 1 << (bitnum & 31); } void Bits::clear(unsigned bitnum) { data[bitnum / 32] &= ~(1 << (bitnum & 31)); } int Bits::test(unsigned bitnum) { return data[bitnum / 32] & (1 << (bitnum & 31)); }bit[] is similar, just has a count parameter and then a pointer to data. You would have to ask Walter what the *real* reason is, though ? http://www.digitalmars.com/d/arrays.html just says:Bit vectors can be constructed: bit[10] x; // array of 10 bits The amount of storage used up is implementation dependent. Implementation Note: on Intel CPUs it would be rounded up to the next 32 bit size. x.length // 10, number of bits x.size // 4, bytes of storage So, the size per element is not (x.size / x.length).It's similar on PowerPC CPUs, by the way. At least with gdc today.Oh no, I've started the discussion again? I really didn't mean to.It goes back under the rug then... "Never mind the little bump" :-)bit* is valid too. It's just that it doesn't work properly with arrays? But you can take the address of a bit field, and use it as "inout" too. (the dirty secret of course being that single bits are stored as bytes)Otherwise you need sub-byte pointers to access &something.second, and that's a huge pain to implement and to pass around the block.Yes, agreed. That's one point where bool and bit would differ: bool* being valid, but bit* not.No argument there. But the difference between theory and practice is a lot greater in practice than in theory... ;-)Personally, I'm not sure that bit *is* the fundamental data type...Theoretically it still is. I mean, it's the basic granularity, there's nothing smaller than bit. The quarks equivalent in information technology.Of course not. Pixels are still the end result... Just that when drawing fast 3D animations, one does it a tri/quad at a time in hardware - instead of submitting each pixel and rendering in software. Similarly we can access bits using bytes or ints - plus the various bit operators like: and, or, shifts and have the hardware accumulate the bits in memory ?Similar to how the triangle is the fundamental type in 3D graphics, I think the byte has become the fundamental data type. For speed ?Ah, practically yes, but triangles are not theoretically the fundamental type of 3D geometry. Recall ray-tracers, they hardly ever used triangles. But in practice (especially in OGL/D3D-type rendering) triangles are the basis. This shouldn't prevent anybody from discussing 'the basics' though.I also do think that 3D rendering based on triangles is just a efficient hack, a modern approximation to ray-tracing. Wouldn't be surprised if we get rid of them in the future (NURBS and other parameter-based geometries are gaining ground again).There is some cool stuff possible with "Cg", http://developer.nvidia.com/page/cg_main.html (C for Graphics) Apple uses it in CoreImage : http://www.apple.com/macosx/tiger/core.html No triangles there, just plain old pixels... As a side note they are now using float pixels (i.e. RGB is now float values, and not integers) This works better, when doing several filters. But that's not really about "integer types" is it, --anders
Nov 24 2004
Hi.It is faster since you can load/store 32 bits at one time, when setting several. Otherwise it would load/store 8 bits at a time, by using bytes.Yeah, but if the array-size is fixed, as is the case with bit[8]? Why would you even want to fetch 32-bits, if you know you only need 8 and fetching a byte is just as fast?You would have to ask Walter what the *real* reason is, though ?Actually, I was hoping he'd provide an answer to my question :-S (Q: why does bit[8] allocate 32 bits?)It's amazing how the bool/string-issue pops up every other week. Walter would just have to hard-code the aliases, just to get it over with :-)Oh no, I've started the discussion again? I really didn't mean to.It goes back under the rug then... "Never mind the little bump" :-)You mean stored as ints? If not, that would be even stranger: bit allocating a byte, but bit[x] allocating at least 4 bytes. Meaning there's a difference between bit and bit[1]. Is bit[1] allowed at all? Is an array of 1 in general allowed? I will test it myself :-)Yes, agreed. That's one point where bool and bit would differ: bool* being valid, but bit* not.bit* is valid too. It's just that it doesn't work properly with arrays? But you can take the address of a bit field, and use it as "inout" too. (the dirty secret of course being that single bits are stored as bytes)But the difference between theory and practice is a lot greater in practice than in theory... ;-)How true. When designing anything, the "theory" should be checked once in a while though.There is some cool stuff possible with "Cg",Hmm, "D for Graphics" comes to mind. Actually, considering their syntax, they might as well rename it :-)But that's not really about "integer types" is it,Hmm, lets declare a floating-bit, fbit, and float as fbit[32] :-) Don't reply to this, it's a joke.. Lionello.
Nov 25 2004
Lionello Lunesu wrote:Well, that "sort of" worked for the bool alias ? (only problem being keywords vs. literals vs. types, http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/11757) The main problem with "alias char[] string;" is the existance of wchar[]... I suggested "ustring": http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/11821 It kinda fits in with the whole int/uint theme of D, even if "u" here would mean unicode and not unsigned. (both types are Unicode, but char[] is ascii-biased) Or maybe "str" and "ustr" would be more orthogonal ? (keeping with: integer -> int, character -> char, etc) Just as long as it isn't "String", since that's a class.It's amazing how the bool/string-issue pops up every other week. Walter would just have to hard-code the aliases, just to get it over with :-)Oh no, I've started the discussion again? I really didn't mean to.It goes back under the rug then... "Never mind the little bump" :-)You mean stored as ints? If not, that would be even stranger: bit allocating a byte, but bit[x] allocating at least 4 bytes. Meaning there's a difference between bit and bit[1]. Is bit[1] allowed at all? Is an array of 1 in general allowed? I will test it myself :-)It's allowed, and it's 4 bytes long. (bit[1]) bit[] is 8 bytes, since it has an int length. bit[0] is also allowed, and is 0 bytes... (but that could just be a quirk or bug ?)Hmm, lets declare a floating-bit, fbit, and float as fbit[32] :-)Like a fuzzy boolean. "mostly true", "kinda false" --anders
Nov 25 2004
Lionello Lunesu wrote:This "hypothetical" 32 bit machine I mentioned earlier, I was actually referring to Pentiums (among others). So, allocating 32 bits for bit[8] is natural, since allocating a byte would not generate faster execution: the Pentium fetches all of the 32 bytes anyhow because it just cannot adress memory in smaller chunks. There are machine code instructions to "load a byte from adress x", etc, but those really load 32 bits, and then throw away the 24 other bits. (The kludges I was talking about.) So C and other programming languages could go on believing they are on a byte-adressable machine. Since this is the case, the bit-array storage size is rouded up to the nearest 32 bits. Makes writing the compiler easier, generated code faster. And generating the code faster since you don't have to check for bit-array sizes of (less-than-or-equal-to) 8, 16, 24 bits. (Of course, things get interesting when we end up having 64 bit hardware. This is just one of the issues that has to be dealt with in the 64 bit version of D compiler.)It is faster since you can load/store 32 bits at a time, when setting several. Otherwise it would load/store 8 bits at a time, by using bytes.Yeah, but if the array-size is fixed, as is the case with bit[8]? Why would you even want to fetch 32-bits, if you know you only need 8 and fetching a byte is just as fast?
Nov 25 2004