
digitalmars.D - 4-character literal

reply Rick Mann <rmann-d-lang latencyzero.com> writes:
Hi. I'm porting some Mac OS X (Carbon) code, and it relies heavily on a
language feature that's been used on the Mac for decades: 4-byte character
literals (like 'abcd'). In this case, I'm setting up enums that will be passed
to OS APIs as 32-bit unsigned int parameters:

enum : uint
{
    kSomeConstantValue = 'abcd'
}

I tried using a 4-character string literal, but I get the following error when
I do:

src/d/macos/carbon/carbonevents.d:29: Error: cannot implicitly convert
expression ("ptrg") of type char[4] to uint
src/d/macos/carbon/carbonevents.d:29: Error: Integer constant expression
expected instead of cast(uint)"ptrg"
src/d/macos/carbon/carbonevents.d:34: Error: cannot implicitly convert
expression ("etrg") of type char[4] to uint
src/d/macos/carbon/carbonevents.d:34: Error: Integer constant expression
expected instead of cast(uint)"etrg"


I tried basing the enum on dchar, and I tried appending "d" to the end of the
string literals, but neither works.

Any suggestions that don't involve significant re-writing of the 4-character
literals? Thanks!


P.S. Grr. I don't like posting my unobfuscated email address to public sites.
Jan 25 2007
next sibling parent reply Rick Mann <rmann-d-lang latencyzero.com> writes:
Rick Mann Wrote:

 enum : uint
 {
     kSomeConstantValue = 'abcd'
 }
I realized I was misunderstanding something else I saw, and that "abcd"d doesn't do what I thought (make a 4-byte character). So: how do I do the equivalent of 'abcd'? Thanks!
Jan 25 2007
next sibling parent Gregor Richards <Richards codu.org> writes:
Rick Mann wrote:
 Rick Mann Wrote:
 
 
enum : uint
{
    kSomeConstantValue = 'abcd'
}
I realized I was misunderstanding something else I saw, and that "abcd"d doesn't do what I thought (make a 4-byte character). So: how do I do the equivalent of 'abcd'? Thanks!
Three solutions come to mind:

1) Will work, very ugly:
   (cast(uint) 'a' << 24) + (cast(uint) 'b' << 16) + (cast(uint) 'c' << 8) + cast(uint) 'd'

2) Probably won't work:
   *(cast(uint*) ("abcd".ptr))

3) Will work:
   0x61626364

Each pretty bad. IMHO the original solution is pretty bad too :)

 - Gregor Richards
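A minimal sketch of how the first and third options line up, assuming present-day D syntax rather than the D1 compiler used in this thread. The name kSameValueAsHex is made up for illustration; everything is checked at compile time.

```d
enum : uint
{
    // Option 1: pack the four characters by hand, most significant byte first.
    kSomeConstantValue = (cast(uint) 'a' << 24) + (cast(uint) 'b' << 16)
                       + (cast(uint) 'c' << 8) + cast(uint) 'd',

    // Option 3: the same value written directly as a hex constant.
    kSameValueAsHex = 0x61626364,
}

// Both spellings are ordinary integer constants and agree exactly.
static assert(kSomeConstantValue == kSameValueAsHex);
```

Option 2 is left out on purpose: as far as I can tell a pointer dereference is not accepted as an enum initializer, and at run time its result would depend on the machine's byte order.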
Jan 25 2007
prev sibling next sibling parent reply torhu <fake address.dude> writes:
Rick Mann wrote:
 Rick Mann Wrote:
 
 enum : uint
 {
     kSomeConstantValue = 'abcd'
 }
I realized I was misunderstanding something else I saw, and that "abcd"d doesn't do what I thought (make a 4-byte character). So: how do I do the equivalent of 'abcd'? Thanks!
Try this.

template MAKE_ID(char[] s)
{
     static assert(s.length == 4);
     const uint ID = (s[0] << 24) | (s[1] << 16) | (s[2] << 8) | s[3];
}

enum : uint
{
     kSomeConstantValue = MAKE_ID!("abcd")
}
Jan 25 2007
next sibling parent torhu <fake address.dude> writes:
torhu wrote:
 Rick Mann wrote:
 Rick Mann Wrote:
 
 enum : uint
 {
     kSomeConstantValue = 'abcd'
 }
I realized I was misunderstanding something else I saw, and that "abcd"d doesn't do what I thought (make a 4-byte character). So: how do I do the equivalent of 'abcd'? Thanks!
 Try this.

 template MAKE_ID(char[] s)
 {
      static assert(s.length == 4);
      const uint ID = (s[0] << 24) | (s[1] << 16) | (s[2] << 8) | s[3];
 }

 enum : uint
 {
      kSomeConstantValue = MAKE_ID!("abcd")
 }
Seems I was too quick. Replace 'const uint ID' with 'const uint MAKE_ID', and it will compile.
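With that correction applied, the template might look like the sketch below. It assumes present-day D, where the parameter is a `string` rather than `char[]` and the constant is declared with `enum` rather than `const`; the explicit casts to uint and the assertion message are additions for illustration.

```d
// The member has the same name as the template, so MAKE_ID!("abcd")
// evaluates directly to the packed value.
template MAKE_ID(string s)
{
    static assert(s.length == 4, "MAKE_ID needs exactly four bytes");
    enum uint MAKE_ID = (cast(uint) s[0] << 24) | (cast(uint) s[1] << 16)
                      | (cast(uint) s[2] << 8)  |  cast(uint) s[3];
}

enum : uint
{
    kSomeConstantValue = MAKE_ID!("abcd")
}

// 'a' = 0x61, 'b' = 0x62, 'c' = 0x63, 'd' = 0x64
static assert(kSomeConstantValue == 0x61626364);
```

Because the check is a static assert, a string of the wrong length is rejected when the code is compiled, not when it runs.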
Jan 25 2007
prev sibling parent reply Rick Mann <rmann-d-lang latencyzero.com> writes:
torhu Wrote:

 Try this.
 
 template MAKE_ID(char[] s)
 {
      static assert(s.length == 4);
      const uint ID = (s[0] << 24) | (s[1] << 16) | (s[2] << 8) | s[3];
 }
 
 enum : uint
 {
      kSomeConstantValue = MAKE_ID!("abcd")
 }
Of the solutions proposed so far, this is probably the cleanest. Thanks! Sadly, nothing's really as nice as just saying 'abcd'. What would it take to get multi-character literals added to the language?
Jan 25 2007
next sibling parent Don Clugston <dac nospam.com.au> writes:
Rick Mann wrote:
 torhu Wrote:
 
 Try this.

 template MAKE_ID(char[] s)
 {
      static assert(s.length == 4);
      const uint ID = (s[0] << 24) | (s[1] << 16) | (s[2] << 8) | s[3];
 }

 enum : uint
 {
      kSomeConstantValue = MAKE_ID!("abcd")
 }
Of the solutions proposed so far, this is probably the cleanest. Thanks! Sadly, nothing's really as nice as just saying 'abcd'. What would it take to get multi-character literals added to the language?
In general, I don't think it could work in a useful way. 4-character literals do not fit in a uint. It only really works for ASCII.
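A small sketch of why this only works for ASCII, assuming present-day D, where string literals are UTF-8 and `.length` counts bytes rather than characters. The helper isFourAsciiBytes is hypothetical.

```d
static assert("abcd".length == 4);       // four ASCII characters, four bytes
static assert("\u00E5bcd".length == 5);  // U+00E5 takes two bytes in UTF-8

// Hypothetical helper: true only when "four characters" and "four bytes"
// coincide, i.e. the string is exactly four 7-bit ASCII bytes.
bool isFourAsciiBytes(string s)
{
    if (s.length != 4)
        return false;
    foreach (char c; s)
        if (c >= 0x80)
            return false;
    return true;
}

static assert(isFourAsciiBytes("abcd"));
static assert(!isFourAsciiBytes("\u00E5bcd")); // five bytes
static assert(!isFourAsciiBytes("\u00E5bc"));  // four bytes, but not ASCII
```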
Jan 25 2007
prev sibling next sibling parent reply Gregor Richards <Richards codu.org> writes:
Rick Mann wrote:
 torhu Wrote:
 
 Try this.

 template MAKE_ID(char[] s)
 {
      static assert(s.length == 4);
      const uint ID = (s[0] << 24) | (s[1] << 16) | (s[2] << 8) | s[3];
 }

 enum : uint
 {
      kSomeConstantValue = MAKE_ID!("abcd")
 }
Of the solutions proposed so far, this is probably the cleanest. Thanks! Sadly, nothing's really as nice as just saying 'abcd'. What would it take to get multi-character literals added to the language?
If there is a benevolent force watching over us, there will never be four-character literals in D X_X - Gregor Richards
Jan 26 2007
parent Anders F Björklund <afb algonet.se> writes:
Gregor Richards wrote:

 Sadly, nothing's really as nice as just saying 'abcd'. What would it 
 take to get multi-character literals added to the language?
If there is a benevolent force watching over us, there will never be four-character literals in D X_X
Which is why it is much easier to write the GUI interface code in C++ with an extern "C" interface, than porting it over to D. --anders
Jan 26 2007
prev sibling parent reply "Joel C. Salomon" <JoelCSalomon Gmail.com> writes:
Rick Mann wrote:
 Sadly, nothing's really as nice as just saying 'abcd'. What would it take to
get multi-character literals added to the language?
In the process of learning to scan C (for a compiler theory class), I first heard about those. Seems not-too-useful. If you want a number, input the number; if you want a Unicode character, enter L'那' (or whatever the D equivalent is). Entering numbers in base 256 is asking for trouble, especially with UTF-8 source. --Joel
Jan 26 2007
parent reply Rick Mann <rmann-d-lang latencyzero.com> writes:
Joel C. Salomon Wrote:

 Rick Mann wrote:
 Sadly, nothing's really as nice as just saying 'abcd'. What would it take to
get multi-character literals added to the language?
In the process of learning to scan C (for a compiler theory class), I first heard about those. Seems not-too-useful. If you want a number, input the number; if you want a Unicode character, enter L'那' (or whatever the D equivalent is). Entering numbers in base 256 is asking for trouble, especially with UTF-8 source.
I gotta say, I think they're very useful. Multibyte-character issues aside, it can be a lot handier to see 'abcd' in a debugger than '61626364'. And handling multibyte characters isn't that big a deal...just include all the bytes. If the integer interpretation is 4 bytes, treat it as a uint. If it's more, treat it as a ulong, and issue appropriate warnings/errors when assigning. Pad the values out with zeros. I'm not sure I understand the resistance to the notion.
Jan 26 2007
next sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Rick Mann wrote:
 Joel C. Salomon Wrote:
 
 Rick Mann wrote:
 Sadly, nothing's really as nice as just saying 'abcd'. What would it take to
get multi-character literals added to the language?
In the process of learning to scan C (for a compiler theory class), I first heard about those. Seems not-too-useful. If you want a number, input the number; if you want a Unicode character, enter L'那' (or whatever the D equivalent is). Entering numbers in base 256 is asking for trouble, especially with UTF-8 source.
I gotta say, I think they're very useful. Multibyte-character issues aside, it can be a lot handier to see 'abcd' in a debugger than '61626364'. And handling multibyte characters isn't that big a deal...just include all the bytes. If the integer interpretation is 4 bytes, treat it as a uint. If it's more, treat it as a ulong, and issue appropriate warnings/errors when assigning. Pad the values out with zeros. I'm not sure I understand the resistance to the notion.
Probably it's just that most folks rarely ever have a need for such a thing. And in the rare case that we do, the template solution doesn't seem so bad. Besides isn't the value of a multi-character literal going to be dependent on the endianness of the machine you're running on? So you're probably going to want to use it inside a version(LittleEndian) {} else {} construct anyway. Might as well tuck that away inside the MAKE_ID template. --bb
Jan 27 2007
parent reply Anders F Björklund <afb algonet.se> writes:
Bill Baxter wrote:

 Besides isn't the value of a multi-character literal going to be 
 dependent on the endianness of the machine you're running on?
Not really, it just gets flipped when _stored_ on a LittleEndian...

i.e. the char const 'ABCD' is the same as the hex const 0x41424344. In arch i386 this would read 44434241, but in arch ppc it's 41424344. That is, if you were to store it somewhere or look at the object file.

--anders
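The distinction between the value and its stored bytes can be sketched as follows, assuming present-day D. The names kTag and bytesInMemory are hypothetical; the unittest block only runs when compiled with unit tests enabled.

```d
// The value of the constant is the same on every platform...
enum uint kTag = 0x41424344; // 'ABCD' in the example above

// ...only the order of its bytes in memory differs.
// Hypothetical helper: copies out the bytes of `v` as they sit in memory.
ubyte[4] bytesInMemory(uint v)
{
    ubyte[4] result = (cast(ubyte*) &v)[0 .. 4];
    return result;
}

unittest
{
    immutable b = bytesInMemory(kTag);
    version (LittleEndian)      // e.g. i386: memory reads 44 43 42 41
        assert(b[0] == 0x44 && b[3] == 0x41);
    else                        // e.g. ppc: memory reads 41 42 43 44
        assert(b[0] == 0x41 && b[3] == 0x44);
}
```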
Jan 26 2007
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Anders F Björklund wrote:
 Bill Baxter wrote:
 
 Besides isn't the value of a multi-character literal going to be 
 dependent on the endianness of the machine you're running on?
Not really, it just gets flipped when _stored_ on a LittleEndian... i.e. the char const 'ABCD' is the same as the hex const 0x41424344. In arch i386 this would read 44434241, but in arch ppc it's 41424344. That is, if you were to store it somewhere or look at the object file. --anders
Yeh, ok. I'm thinking of the case where you read in a 4-byte uint signature from a file. If you load it in as a uint, you have to watch out for the endianness of the file vs that of the platform you're running on. Or just compare as a sequence of chars rather than uint. --bb
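Both approaches to a file signature can be sketched like this, assuming present-day D and a file format that stores the tag most significant byte first. The helper names readTagBigEndian and hasSignature are hypothetical.

```d
// Hypothetical helper: interpret the first four bytes of `data` as a
// big-endian tag, regardless of the byte order of the running machine.
uint readTagBigEndian(const(ubyte)[] data)
{
    assert(data.length >= 4, "need at least four bytes");
    return (cast(uint) data[0] << 24) | (cast(uint) data[1] << 16)
         | (cast(uint) data[2] << 8)  |  cast(uint) data[3];
}

// The alternative: skip the uint and compare the bytes one by one as chars.
bool hasSignature(const(ubyte)[] data, string sig)
{
    if (data.length < sig.length)
        return false;
    foreach (i, c; sig)
        if (data[i] != c)
            return false;
    return true;
}

static assert(readTagBigEndian([0x61, 0x62, 0x63, 0x64]) == 0x61626364);
static assert(hasSignature([0x61, 0x62, 0x63, 0x64], "abcd"));
```

Building the value with shifts, instead of casting the buffer to a uint pointer, keeps the result independent of the platform's endianness.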
Jan 27 2007
prev sibling parent "Joel C. Salomon" <JoelCSalomon Gmail.com> writes:
Rick Mann wrote:
 Joel C. Salomon Wrote:
 Entering numbers in base 256 is asking for trouble, especially with UTF-8
source.
I gotta say, I think they're very useful. Multibyte-character issues aside, it can be a lot handier to see 'abcd' in a debugger than '61626364'. And handling multibyte characters isn't that big a deal...just include all the bytes. If the integer interpretation is 4 bytes, treat it as a uint. If it's more, treat it as a ulong, and issue appropriate warnings/errors when assigning. Pad the values out with zeros. I'm not sure I understand the resistance to the notion.
Generally, though, arbitrary four bytes with the high bit set will constitute an invalid UTF-8 sequence. Assuming you have the number 61626364 in an int32 somewhere, will 'abcd' really tell you something you wanted to know about the number? (Unless you’re dereferencing a char* cast to int*, in which case you deserve all the hassle the debugger can throw at you. ☺) --Joel
Jan 26 2007
prev sibling next sibling parent janderson <askme me.com> writes:
Rick Mann wrote:
 Rick Mann Wrote:
 
 enum : uint
 {
     kSomeConstantValue = 'abcd'
 }
I realized I was misunderstanding something else I saw, and that "abcd"d doesn't do what I thought (make a 4-byte character). So: how do I do the equivalent of 'abcd'? Thanks!
To add to Gregor Richards' suggestions, you may try a union (untested).

union Converter
{
    uint asInt;
    char[4] chr;
}

const Converter kSomeConstantValue = { chr : "abcd" };

//To get
kSomeConstantValue.asInt

-Joel
Jan 25 2007
prev sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Rick Mann wrote:
 Rick Mann Wrote:
 
 enum : uint
 {
     kSomeConstantValue = 'abcd'
 }
I realized I was misunderstanding something else I saw, and that "abcd"d doesn't do what I thought (make a 4-byte character). So: how do I do the equivalent of 'abcd'? Thanks!
Interesting question. How's this?

import std.stdio;

template touint(char[] T)
{
    static assert(T.length==4, "Integer constants must be of length 4");
    const uint touint =
        (cast(char)T[0] << 24)|
        (cast(char)T[1] << 16)|
        (cast(char)T[2] << 8)|
        (cast(char)T[3]);
}

enum {
    kSomeConstantValue = touint!("xyzz")
}

void main()
{
    writefln("%x", kSomeConstantValue);
}
Jan 25 2007
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Bill Baxter wrote:
 Rick Mann wrote:
 Rick Mann Wrote:

 enum : uint
 {
     kSomeConstantValue = 'abcd'
 }
I realized I was misunderstanding something else I saw, and that "abcd"d doesn't do what I thought (make a 4-byte character). So: how do I do the equivalent of 'abcd'? Thanks!
 Interesting question. How's this?

 import std.stdio;

 template touint(char[] T)
 {
     static assert(T.length==4, "Integer constants must be of length 4");
     const uint touint =
         (cast(char)T[0] << 24)|
         (cast(char)T[1] << 16)|
         (cast(char)T[2] << 8)|
         (cast(char)T[3]);
 }

 enum {
     kSomeConstantValue = touint!("xyzz")
 }

 void main()
 {
     writefln("%x", kSomeConstantValue);
 }
Damn! Torhu beat me to it! --bb
Jan 25 2007
prev sibling next sibling parent Anders F Björklund <afb algonet.se> writes:
Rick Mann wrote:

 Any suggestions that don't involve significant re-writing of the 4-character
literals? Thanks!
Nope, I've rewritten mine as hex. At least it doesn't involve template voodoo.

i.e. the filter that does the initial C to D translation also converts the 4-char constants to 32-bit constants. Since D is not source code compatible anyway, one might as well convert it once and for all IMHO.

Ditto goes for those pesky assert macros strewn out all over the place*. Yes I am talking to you, /usr/include/AssertMacros.h. "2002's finest" (see "Living In an Exceptional World by Sean Parent" for the details: http://developer.apple.com/dev/techsupport/develop/issue11toc.shtml)

Reinventing the preprocessor using templates is not my own favorite...

--anders

* If you see "require_noerr", that's the ONERRORGOTO I'm talking about. I just expanded it to the macro definition rather than leaving it in.
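The conversion step of such a filter could look roughly like the sketch below. This is not the filter described above, which is not shown in the thread; tagToHexLiteral is a hypothetical helper written in present-day D that turns the four bytes of a tag into the text of the matching hex constant.

```d
// Hypothetical helper for a one-off conversion pass: "abcd" -> "0x61626364".
string tagToHexLiteral(string tag)
{
    assert(tag.length == 4, "expected exactly four bytes");
    immutable digits = "0123456789ABCDEF";
    string result = "0x";
    foreach (char c; tag)
    {
        result ~= digits[c >> 4];   // high nibble
        result ~= digits[c & 0x0F]; // low nibble
    }
    return result;
}

static assert(tagToHexLiteral("abcd") == "0x61626364");
static assert(tagToHexLiteral("ptrg") == "0x70747267");
```

Keeping the original tag in a comment next to the generated constant would preserve some of the readability that the hex form loses.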
Jan 26 2007
prev sibling parent David L. Davis <SpottedTiger yahoo.com> writes:
Rick Mann Wrote:

 
 Hi. I'm porting some Mac OS X (Carbon) code, and it relies heavily on a
language feature that's been used on the Mac for decades: 4-byte character
literals (like 'abcd'). In this case, I'm setting up enums that will be passed
to OS APIs as 32-bit unsigned int parameters:
 
 enum : uint
 {
     kSomeConstantValue = 'abcd'
 }
 
Here's some sample code based off of Gregor Richards' and Joel's ideas. I had to do a little reversing logic, which showed that they both gave the exact same results. (On Intel, numbers have their bytes stored in the reverse order...so to me, Gregor's original way seemed more correct for creating a number from a string).

// char4.d
// dmd v1.003 WinXP SP2
union Converter
{
    uint asInt;
    char[4] chr;
}

const Converter kSomeConstantValue = { chr : "abcd" };
const Converter kSomeConstantValueRev = { chr : "dcba" };

private import std.stdio;

int main()
{
    // Joel
    writefln("kSomeConstantValue.asInt=%d", kSomeConstantValue.asInt);

    // Gregor Richards idea (Reversed to equal Joel's output)
    writefln("a+b<<8+c<<16+d<<24=%d",
             (cast(uint)'a') + (cast(uint)'b' << 8) +
             (cast(uint)'c' << 16) + (cast(uint)'d' << 24));

    writefln();
    writefln("******");
    writefln();

    // Joel's reverse to equal Gregor's output
    writefln("kSomeConstantValueRev.asInt=%d", kSomeConstantValueRev.asInt);

    // Gregor Richards original idea (Intel's numerical reverse storage)
    writefln("a<<24+b<<16+c<<8+d=%d",
             (cast(uint)'a' << 24) + (cast(uint)'b' << 16) +
             (cast(uint)'c' << 8) + (cast(uint)'d'));

    return 0;
}

Output:
------------
C:\dmd>dmd char4.d
C:\dmd\bin\..\..\dm\bin\link.exe char4,,,user32+kernel32/noi;

C:\dmd>char4
kSomeConstantValue.asInt=1684234849
a+b<<8+c<<16+d<<24=1684234849

******

kSomeConstantValueRev.asInt=1633837924
a<<24+b<<16+c<<8+d=1633837924

C:\dmd>

David L.
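The two decimal values in that output can be cross-checked at compile time. A minimal sketch in present-day D; swapBytes is a hypothetical helper, not part of the program above.

```d
// Hypothetical helper: reverse the order of the four bytes in a uint.
uint swapBytes(uint v)
{
    return (v << 24) | ((v << 8) & 0x00FF0000)
         | ((v >> 8) & 0x0000FF00) | (v >> 24);
}

// Shift order with 'a' in the top byte, printed above as 1633837924.
static assert(0x61626364 == 1633837924);
// The union read on a little-endian machine ('a' in the lowest byte).
static assert(0x64636261 == 1684234849);
// The two results are byte-swapped images of each other.
static assert(swapBytes(0x61626364) == 0x64636261);
```

The shift-based form gives 0x61626364 on any machine, while the union result depends on the byte order of the platform it runs on.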
Jan 26 2007