digitalmars.D - 4-character literal
- Rick Mann (13/13) Jan 25 2007 Hi. I'm porting some Mac OS X (Carbon) code, and it relies heavily on a ...
- Rick Mann (4/8) Jan 25 2007 I realized I was misunderstanding something else I saw, and that "abcd"d...
- Gregor Richards (11/26) Jan 25 2007 Two solutions come to mind:
- torhu (11/24) Jan 25 2007 Try this.
- torhu (3/30) Jan 25 2007 Seems I was too quick. Replace 'const uint ID' with 'const uint
- Rick Mann (3/15) Jan 25 2007 Of the solutions proposed so far, this is probably the cleanest. Thanks!
- Don Clugston (3/22) Jan 25 2007 In general, I don't think it could work in a useful way. 4-character
- Gregor Richards (4/23) Jan 26 2007 If there is a benevolent force watching over us, there will never be
- Anders F Björklund (4/9) Jan 26 2007 Which is why it is much easier to write the GUI interface code
- Joel C. Salomon (7/8) Jan 26 2007 In the process of learning to scan C (for a compiler theory class), I
- Rick Mann (2/10) Jan 26 2007 I gotta say, I think they're very useful. Multibyte-character issues asi...
- Bill Baxter (10/22) Jan 27 2007 Probably it's just that most folks rarely ever have a need for such a
- Anders F Björklund (6/8) Jan 26 2007 Not really, it just gets flipped when _stored_ on a LittleEndian...
- Bill Baxter (7/19) Jan 27 2007 Yeh, ok. I'm thinking of the case where you read in a 4-byte uint
- Joel C. Salomon (8/12) Jan 26 2007 Generally, though, arbitrary four bytes with the high bit set will
- janderson (7/20) Jan 25 2007 To add to Gregor Richards suggestions, you may try an union (untested).
- Bill Baxter (20/33) Jan 25 2007 Interesting question. How's this?
- Bill Baxter (3/43) Jan 25 2007 Damn! Torhu beat me too it!
- Anders F Björklund (14/15) Jan 26 2007 Nope,
- David L. Davis (48/56) Jan 26 2007 Here's some sample code based off of Gregor Richards' and Joel's ideas. ...
Hi. I'm porting some Mac OS X (Carbon) code, and it relies heavily on a language feature that's been used on the Mac for decades: 4-byte character literals (like 'abcd'). In this case, I'm setting up enums that will be passed to OS APIs as 32-bit unsigned int parameters:

    enum : uint { kSomeConstantValue = 'abcd' }

I tried using a 4-character string literal, but I get the following errors when I do:

    src/d/macos/carbon/carbonevents.d:29: Error: cannot implicitly convert expression ("ptrg") of type char[4] to uint
    src/d/macos/carbon/carbonevents.d:29: Error: Integer constant expression expected instead of cast(uint)"ptrg"
    src/d/macos/carbon/carbonevents.d:34: Error: cannot implicitly convert expression ("etrg") of type char[4] to uint
    src/d/macos/carbon/carbonevents.d:34: Error: Integer constant expression expected instead of cast(uint)"etrg"

I tried basing the enum on dchar, and I tried appending "d" to the end of the string literals, but neither works. Any suggestions that don't involve significant rewriting of the 4-character literals? Thanks!

P.S. Grr. I don't like posting my unobfuscated email address to public sites.
Jan 25 2007
Rick Mann Wrote:
> enum : uint { kSomeConstantValue = 'abcd' }

I realized I was misunderstanding something else I saw, and that "abcd"d doesn't do what I thought (make a 4-byte character). So: how do I do the equivalent of 'abcd'? Thanks!
Jan 25 2007
Rick Mann wrote:
> Rick Mann Wrote:
>> enum : uint { kSomeConstantValue = 'abcd' }
> I realized I was misunderstanding something else I saw, and that "abcd"d doesn't do what I thought (make a 4-byte character). So: how do I do the equivalent of 'abcd'? Thanks!

Two solutions come to mind:

1) Will work, very ugly:
    (cast(uint) 'a' << 24) + (cast(uint) 'b' << 16) + (cast(uint) 'c' << 8) + cast(uint) 'd'
2) Probably won't work:
    *(cast(uint*) ("abcd".ptr))
3) Will work:
    0x61626364

Each pretty bad. IMHO the original solution is pretty bad too :)

- Gregor Richards
Jan 25 2007
Rick Mann wrote:
> Rick Mann Wrote:
>> enum : uint { kSomeConstantValue = 'abcd' }
> I realized I was misunderstanding something else I saw, and that "abcd"d doesn't do what I thought (make a 4-byte character). So: how do I do the equivalent of 'abcd'? Thanks!

Try this.

    template MAKE_ID(char[] s)
    {
        static assert(s.length == 4);
        const uint ID = (s[0] << 24) | (s[1] << 16) | (s[2] << 8) | s[3];
    }

    enum : uint { kSomeConstantValue = MAKE_ID!("abcd") }
Jan 25 2007
torhu wrote:
> Rick Mann wrote:
>> Rick Mann Wrote:
>>> enum : uint { kSomeConstantValue = 'abcd' }
>> I realized I was misunderstanding something else I saw, and that "abcd"d doesn't do what I thought (make a 4-byte character). So: how do I do the equivalent of 'abcd'? Thanks!
> Try this.
>     template MAKE_ID(char[] s)
>     {
>         static assert(s.length == 4);
>         const uint ID = (s[0] << 24) | (s[1] << 16) | (s[2] << 8) | s[3];
>     }
>     enum : uint { kSomeConstantValue = MAKE_ID!("abcd") }

Seems I was too quick. Replace 'const uint ID' with 'const uint MAKE_ID', and it will compile.
Jan 25 2007
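For readers on present-day D (D2), torhu's corrected template might look like the sketch below. It assumes a D2 compiler, where `string` replaces `char[]` as the template parameter type and `enum` declares the compile-time constant. The name `fourCC` is made up for this illustration and does not appear in the thread.

```d
import std.stdio : writefln;

// Hypothetical D2 rendition of the MAKE_ID idea; fourCC is an invented name.
template fourCC(string s)
{
    static assert(s.length == 4, "a four-character code needs exactly 4 bytes");

    // The member shares the template's name, so fourCC!("abcd") evaluates
    // directly to this value. First character lands in the high byte.
    enum uint fourCC = (cast(uint) s[0] << 24) | (cast(uint) s[1] << 16)
                     | (cast(uint) s[2] << 8)  |  cast(uint) s[3];
}

enum : uint
{
    kSomeConstantValue = fourCC!("abcd")
}

// Checked at compile time: same value as the hex constant from earlier in the thread.
static assert(kSomeConstantValue == 0x61626364);

void main()
{
    writefln("0x%08x", kSomeConstantValue);
}
```

Giving the member the same name as the template is the rename torhu describes: without it, the instantiation names the template's scope rather than the `uint` inside it.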
torhu Wrote:
> Try this.
>     template MAKE_ID(char[] s)
>     {
>         static assert(s.length == 4);
>         const uint ID = (s[0] << 24) | (s[1] << 16) | (s[2] << 8) | s[3];
>     }
>     enum : uint { kSomeConstantValue = MAKE_ID!("abcd") }

Of the solutions proposed so far, this is probably the cleanest. Thanks! Sadly, nothing's really as nice as just saying 'abcd'. What would it take to get multi-character literals added to the language?
Jan 25 2007
Rick Mann wrote:
> torhu Wrote:
>> Try this.
>>     template MAKE_ID(char[] s)
>>     {
>>         static assert(s.length == 4);
>>         const uint ID = (s[0] << 24) | (s[1] << 16) | (s[2] << 8) | s[3];
>>     }
>>     enum : uint { kSomeConstantValue = MAKE_ID!("abcd") }
> Of the solutions proposed so far, this is probably the cleanest. Thanks! Sadly, nothing's really as nice as just saying 'abcd'. What would it take to get multi-character literals added to the language?

In general, I don't think it could work in a useful way. 4-character literals do not fit in a uint. It only really works for ASCII.
Jan 25 2007
Rick Mann wrote:
> torhu Wrote:
>> Try this.
>>     template MAKE_ID(char[] s)
>>     {
>>         static assert(s.length == 4);
>>         const uint ID = (s[0] << 24) | (s[1] << 16) | (s[2] << 8) | s[3];
>>     }
>>     enum : uint { kSomeConstantValue = MAKE_ID!("abcd") }
> Of the solutions proposed so far, this is probably the cleanest. Thanks! Sadly, nothing's really as nice as just saying 'abcd'. What would it take to get multi-character literals added to the language?

If there is a benevolent force watching over us, there will never be four-character literals in D X_X

- Gregor Richards
Jan 26 2007
Gregor Richards wrote:
>> Sadly, nothing's really as nice as just saying 'abcd'. What would it take to get multi-character literals added to the language?
> If there is a benevolent force watching over us, there will never be four-character literals in D X_X

Which is why it is much easier to write the GUI interface code in C++ with an extern "C" interface, than porting it over to D.

--anders
Jan 26 2007
Rick Mann wrote:
> Sadly, nothing's really as nice as just saying 'abcd'. What would it take to get multi-character literals added to the language?

In the process of learning to scan C (for a compiler theory class), I first heard about those. Seems not-too-useful. If you want a number, input the number; if you want a Unicode character, enter L'那' (or whatever the D equivalent is). Entering numbers in base 256 is asking for trouble, especially with UTF-8 source.

--Joel
Jan 26 2007
Joel C. Salomon Wrote:
> Rick Mann wrote:
>> Sadly, nothing's really as nice as just saying 'abcd'. What would it take to get multi-character literals added to the language?
> In the process of learning to scan C (for a compiler theory class), I first heard about those. Seems not-too-useful. If you want a number, input the number; if you want a Unicode character, enter L'那' (or whatever the D equivalent is). Entering numbers in base 256 is asking for trouble, especially with UTF-8 source.

I gotta say, I think they're very useful. Multibyte-character issues aside, it can be a lot handier to see 'abcd' in a debugger than '61626364'. And handling multibyte characters isn't that big a deal... just include all the bytes. If the integer interpretation is 4 bytes, treat it as a uint. If it's more, treat it as a ulong, and issue appropriate warnings/errors when assigning. Pad the values out with zeros. I'm not sure I understand the resistance to the notion.
Jan 26 2007
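The readability point can be approximated without language support by unpacking a code back into its four characters when logging or inspecting it. The sketch below assumes a D2 compiler and the first-character-in-the-high-byte convention used elsewhere in the thread; `fourCCText` is a hypothetical helper, not something from the thread or from Phobos.

```d
import std.stdio : writefln;

// Hypothetical helper: unpack a four-character code, most significant byte first.
char[4] fourCCText(uint code)
{
    char[4] text;
    foreach (i; 0 .. 4)
        text[i] = cast(char) (code >> (24 - 8 * i)); // keeps only the low 8 bits
    return text;
}

void main()
{
    char[4] text = fourCCText(0x61626364);
    assert(text == "abcd");

    // Print the raw hex value next to its readable form.
    writefln("%08x = '%s'", 0x61626364, text[]);
}
```

This only produces readable text for codes made of printable ASCII, which is the case Rick describes.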
Rick Mann wrote:
> Joel C. Salomon Wrote:
>> Rick Mann wrote:
>>> Sadly, nothing's really as nice as just saying 'abcd'. What would it take to get multi-character literals added to the language?
>> In the process of learning to scan C (for a compiler theory class), I first heard about those. Seems not-too-useful. If you want a number, input the number; if you want a Unicode character, enter L'那' (or whatever the D equivalent is). Entering numbers in base 256 is asking for trouble, especially with UTF-8 source.
> I gotta say, I think they're very useful. Multibyte-character issues aside, it can be a lot handier to see 'abcd' in a debugger than '61626364'. And handling multibyte characters isn't that big a deal... just include all the bytes. If the integer interpretation is 4 bytes, treat it as a uint. If it's more, treat it as a ulong, and issue appropriate warnings/errors when assigning. Pad the values out with zeros. I'm not sure I understand the resistance to the notion.

Probably it's just that most folks rarely ever have a need for such a thing. And in the rare case that we do, the template solution doesn't seem so bad.

Besides, isn't the value of a multi-character literal going to be dependent on the endianness of the machine you're running on? So you're probably going to want to use it inside a version(LittleEndian) {} else {} construct anyway. Might as well tuck that away inside the MAKE_ID template.

--bb
Jan 27 2007
Bill Baxter wrote:
> Besides isn't the value of a multi-character literal going to be dependent on the endianness of the machine you're running on?

Not really, it just gets flipped when _stored_ on a LittleEndian machine, i.e. the char const 'ABCD' is the same as the hex const 0x41424344. In arch i386 this would read 44434241 but in arch ppc it's 41424344. That is, if you were to store it somewhere or look at the object file.

--anders
Jan 26 2007
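Anders' distinction between the value and its stored bytes can be checked with a short program. This is a sketch assuming a D2 compiler; it reinterprets the integer's storage through a pointer cast purely for illustration.

```d
void main()
{
    // Anders' example: the integer value of 'ABCD'.
    uint code = 0x41424344;

    // View the same four bytes in the order they sit in memory.
    ubyte[4] inMemory = *cast(ubyte[4]*) &code;

    ubyte[4] expected;
    version (LittleEndian)
        expected = [0x44, 0x43, 0x42, 0x41]; // e.g. i386: low byte stored first
    else
        expected = [0x41, 0x42, 0x43, 0x44]; // e.g. ppc: high byte stored first

    assert(inMemory == expected);

    // The numeric value is the same on either architecture.
    assert(code == 0x41424344);
}
```

Only code that looks at the raw storage, such as the pointer cast above or a union, sees the difference; arithmetic and comparisons on the `uint` do not.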
Anders F Björklund wrote:
> Bill Baxter wrote:
>> Besides isn't the value of a multi-character literal going to be dependent on the endianness of the machine you're running on?
> Not really, it just gets flipped when _stored_ on a LittleEndian... i.e. the char const 'ABCD' is the same as the hex const 0x41424344 In arch i386 this would read 44434241 but in arch ppc it's 41424344. That is, if you were to store it somewhere or look at the objectfile.
> --anders

Yeh, ok. I'm thinking of the case where you read in a 4-byte uint signature from a file. If you load it in as a uint, you have to watch out for the endianness of the file vs that of the platform you're running on. Or just compare as a sequence of chars rather than uint.

--bb
Jan 27 2007
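Both of Bill's options for a file signature can be sketched with Phobos' `std.bitmanip`. The example assumes a D2 compiler and a file format that stores the signature big-endian (first character first); the four bytes are produced in memory here rather than read from a real file.

```d
import std.bitmanip : bigEndianToNative, nativeToBigEndian;

void main()
{
    enum uint signature = 0x41424344; // the value Anders gives for 'ABCD'

    // Stand-in for four bytes read from a big-endian file.
    ubyte[4] onDisk = nativeToBigEndian(signature);
    ubyte[4] expected = [0x41, 0x42, 0x43, 0x44];
    assert(onDisk == expected);

    // Option 1: convert explicitly, then compare as a uint on any host.
    uint loaded = bigEndianToNative!uint(onDisk);
    assert(loaded == signature);

    // Option 2: skip the integer and compare the bytes as characters.
    assert(cast(const(char)[]) onDisk[] == "ABCD");
}
```

Option 2 avoids the endianness question entirely, at the cost of not having an integer to pass to an API that expects one.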
Rick Mann wrote:
> Joel C. Salomon Wrote:
>> Entering numbers in base 256 is asking for trouble, especially with UTF-8 source.
> I gotta say, I think they're very useful. Multibyte-character issues aside, it can be a lot handier to see 'abcd' in a debugger than '61626364'. And handling multibyte characters isn't that big a deal... just include all the bytes. If the integer interpretation is 4 bytes, treat it as a uint. If it's more, treat it as a ulong, and issue appropriate warnings/errors when assigning. Pad the values out with zeros. I'm not sure I understand the resistance to the notion.

Generally, though, arbitrary four bytes with the high bit set will constitute an invalid UTF-8 sequence.

Assuming you have the number 61626364 in an int32 somewhere, will 'abcd' really tell you something you wanted to know about the number? (Unless you’re dereferencing a char* cast to int*, in which case you deserve all the hassle the debugger can throw at you. ☺)

--Joel
Jan 26 2007
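Joel's UTF-8 remark can be illustrated with Phobos' `std.utf.validate`, which throws a `UTFException` on malformed input. This is a sketch assuming a D2 compiler; the byte values are picked for the example, and some high-bit combinations do happen to form valid UTF-8, which is why he says "generally".

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

void main()
{
    // Four ASCII bytes are also valid UTF-8, so this returns normally.
    validate("abcd");

    // Four bytes with the high bit set. 0x80 is a continuation byte with
    // no lead byte before it, so the sequence is malformed.
    char[] raw = [cast(char) 0x80, cast(char) 0x81, cast(char) 0x82, cast(char) 0x83];
    assertThrown!UTFException(validate(raw));
}
```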
Rick Mann wrote:
> Rick Mann Wrote:
>> enum : uint { kSomeConstantValue = 'abcd' }
> I realized I was misunderstanding something else I saw, and that "abcd"d doesn't do what I thought (make a 4-byte character). So: how do I do the equivalent of 'abcd'? Thanks!

To add to Gregor Richards' suggestions, you may try a union (untested).

    union Converter
    {
        uint asInt;
        char[4] chr;
    }

    const Converter kSomeConstantValue = { chr : "abcd" };

    // To get the value:
    kSomeConstantValue.asInt

-Joel
Jan 25 2007
Rick Mann wrote:
> Rick Mann Wrote:
>> enum : uint { kSomeConstantValue = 'abcd' }
> I realized I was misunderstanding something else I saw, and that "abcd"d doesn't do what I thought (make a 4-byte character). So: how do I do the equivalent of 'abcd'? Thanks!

Interesting question. How's this?

    import std.stdio;

    template touint(char[] T)
    {
        static assert(T.length==4, "Integer constants must be of length 4");
        const uint touint = (cast(char)T[0] << 24)|
                            (cast(char)T[1] << 16)|
                            (cast(char)T[2] << 8)|
                            (cast(char)T[3]);
    }

    enum { kSomeConstantValue = touint!("xyzz") }

    void main()
    {
        writefln("%x", kSomeConstantValue);
    }
Jan 25 2007
Bill Baxter wrote:
> Rick Mann wrote:
>> Rick Mann Wrote:
>>> enum : uint { kSomeConstantValue = 'abcd' }
>> I realized I was misunderstanding something else I saw, and that "abcd"d doesn't do what I thought (make a 4-byte character). So: how do I do the equivalent of 'abcd'? Thanks!
> Interesting question. How's this?
>     import std.stdio;
>     template touint(char[] T)
>     {
>         static assert(T.length==4, "Integer constants must be of length 4");
>         const uint touint = (cast(char)T[0] << 24)|
>                             (cast(char)T[1] << 16)|
>                             (cast(char)T[2] << 8)|
>                             (cast(char)T[3]);
>     }
>     enum { kSomeConstantValue = touint!("xyzz") }
>     void main()
>     {
>         writefln("%x", kSomeConstantValue);
>     }

Damn! Torhu beat me to it!

--bb
Jan 25 2007
Rick Mann wrote:
> Any suggestions that don't involve significant re-writing of the 4-character literals? Thanks!

Nope, I've rewritten mine as hex. At least it doesn't involve template voodoo. I.e. the filter that does the initial C to D translation also converts the 4-char constants to 32-bit constants.

Since D is not source code compatible anyway, one might as well convert it once and for all IMHO. Ditto goes for those pesky assert macros strewn out all over the place*. Yes I am talking to you, /usr/include/AssertMacros.h. "2002's finest" (see "Living In an Exceptional World" by Sean Parent for the details: http://developer.apple.com/dev/techsupport/develop/issue11toc.shtml)

Reinventing the preprocessor using templates is not my own favorite...

--anders

* If you see "require_noerr", that's the ONERRORGOTO I'm talking about. I just expanded it to the macro definition rather than leaving it in.
Jan 26 2007
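Anders does not show his translation filter, so the following is only a guess at the core step such a filter would need: turning the four characters of a literal into the text of a hex constant. It assumes a D2 compiler; `toHexConstant` is a hypothetical helper name, and finding the literals in the C source is left out.

```d
import std.format : format;

// Hypothetical helper for a one-off C-to-D conversion script.
string toHexConstant(string fourChars)
{
    assert(fourChars.length == 4, "expected exactly four bytes");

    uint value = 0;
    foreach (char c; fourChars)
        value = (value << 8) | c; // first character ends up in the high byte

    return format("0x%08X", value);
}

void main()
{
    assert(toHexConstant("abcd") == "0x61626364");
    assert(toHexConstant("ptrg") == "0x70747267"); // constant from the original error message
}
```

A filter built on this could keep the original literal in a trailing comment, which would preserve some of the readability Rick is after.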
Rick Mann Wrote:
> Hi. I'm porting some Mac OS X (Carbon) code, and it relies heavily on a language feature that's been used on the Mac for decades: 4-byte character literals (like 'abcd'). In this case, I'm setting up enums that will be passed to OS APIs as 32-bit unsigned int parameters:
> enum : uint { kSomeConstantValue = 'abcd' }

Here's some sample code based off of Gregor Richards' and Joel's ideas. I had to do a little reversing logic, which showed that they both gave the exact same results. (On Intel, numbers have their bytes stored in reverse order... so to me, Gregor's original way seemed more correct for creating a number from a string.)

    // char4.d
    // dmd v1.003 WinXP SP2
    union Converter
    {
        uint asInt;
        char[4] chr;
    }

    const Converter kSomeConstantValue = { chr : "abcd" };
    const Converter kSomeConstantValueRev = { chr : "dcba" };

    private import std.stdio;

    int main()
    {
        // Joel
        writefln("kSomeConstantValue.asInt=%d", kSomeConstantValue.asInt);

        // Gregor Richards idea (Reversed to equal Joel's output)
        writefln("a+b<<8+c<<16+d<<24=%d", (cast(uint)'a') + (cast(uint)'b' << 8) + (cast(uint)'c' << 16) + (cast(uint)'d' << 24));

        writefln();
        writefln("******");
        writefln();

        // Joel's reverse to equal Gregor's output
        writefln("kSomeConstantValueRev.asInt=%d", kSomeConstantValueRev.asInt);

        // Gregor Richards original idea (Intel's numerical reverse storage)
        writefln("a<<24+b<<16+c<<8+d=%d", (cast(uint)'a' << 24) + (cast(uint)'b' << 16) + (cast(uint)'c' << 8) + (cast(uint)'d'));

        return 0;
    }

Output:
------------
    C:\dmd>dmd char4.d
    C:\dmd\bin\..\..\dm\bin\link.exe char4,,,user32+kernel32/noi;

    C:\dmd>char4
    kSomeConstantValue.asInt=1684234849
    a+b<<8+c<<16+d<<24=1684234849

    ******

    kSomeConstantValueRev.asInt=1633837924
    a<<24+b<<16+c<<8+d=1633837924

    C:\dmd>

David L.
Jan 26 2007