digitalmars.D.bugs - Universal character names not supported
- Sean Kelly (7/7) Oct 21 2005 C:\code\d>type test.d
- Thomas Kühne (13/21) Oct 22 2005 What OS do you use?
- Sean Kelly (7/9) Oct 22 2005 WinXP. I created the file with UltraEdit, so a UTF-8 BOM may exist in t...
- Thomas Kuehne (7/16) Oct 22 2005 That is no Unicode problem. The file contains the *character* sequence ...
- Unknown W. Brackets (11/19) Oct 22 2005 That's because your code is wrong.
- Sean Kelly (43/47) Oct 22 2005 Is it? From the D spec:
- Unknown W. Brackets (27/100) Oct 23 2005 I'm not sure I understand why you're quoting what you are.
- Sean Kelly (5/9) Oct 23 2005 If that's the case then it's fine with me. I read the D spec as that it...
- Thomas Kühne (14/25) Oct 23 2005 If it only said UTF for identifiers the following should compile:
- Walter Bright (17/27) Oct 23 2005 Yes, it's wrong. D does not support the \u or \U syntax for identifier
- Sean Kelly (7/25) Oct 23 2005 Thanks for clearing that up. The reference to Appendix D of the C99 spe...
- zwang (2/40) Oct 23 2005 I have come across them in obfuscated C code :-)
C:\code\d>type test.d
void \u00A0() {}
void main() {}

C:\code\d>dmd test
test.d(1): no identifier for declarator void
Oct 21 2005
Sean Kelly wrote:
> C:\code\d>type test.d
> void \u00A0() {}
> void main() {}
>
> C:\code\d>dmd test
> test.d(1): no identifier for declarator void

What OS do you use?

Could you please zip the file and send it to me? (The compression should ensure that no "magic" encoding conversions are triggered.)

Thomas
Oct 22 2005
In article <djct8n$uka$1 digitaldaemon.com>, Thomas Kühne says...
> What OS do you use?

WinXP. I created the file with UltraEdit, so a UTF-8 BOM may exist in the file as well.

> Could you please zip the file and send it to me?

Done. I've also put it online here: http://home.f4.ca/sean/d/ucn.zip

Sean
Oct 22 2005
Sean Kelly wrote on 2005-10-22:
> In article <djct8n$uka$1 digitaldaemon.com>, Thomas Kühne says...
>> What OS do you use?
> WinXP. I created the file with UltraEdit, so a UTF-8 BOM may exist in the file as well.
>> Could you please zip the file and send it to me?
> Done. I've also put it online here: http://home.f4.ca/sean/d/ucn.zip

That is not a Unicode problem. The file contains the *character* sequence "\u00A0", whereas it should contain the *byte* sequence 00 A0 (assuming UTF-16 BE), or the corresponding *byte* sequence for UTF-8, UTF-16 LE, UTF-32 BE or UTF-32 LE.

For a source sample that uses non-ASCII identifiers, try: http://dstress.kuehne.cn/run/unicode_03.d

Thomas
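
The difference between the character sequence and the byte sequence can be made concrete. What follows is a minimal sketch in present-day D, not the 2005 compiler discussed in the thread. The helper names utf8Bytes and utf16Units are made up for illustration; toUTF8, toUTF16 and representation are Phobos functions.

```d
import std.stdio : writefln;
import std.string : representation;
import std.utf : toUTF8, toUTF16;

/// UTF-8 bytes that encode a single code point (hypothetical helper).
immutable(ubyte)[] utf8Bytes(dchar c)
{
    return toUTF8([c]).representation;
}

/// UTF-16 code units that encode a single code point (hypothetical helper).
immutable(ushort)[] utf16Units(dchar c)
{
    return toUTF16([c]).representation;
}

void main()
{
    // What the compiler would see if the *character* U+00A0 were stored in the file.
    writefln("UTF-8 : %(%02X %)", utf8Bytes('\u00A0'));
    writefln("UTF-16: %(%04X %)", utf16Units('\u00A0'));

    // What test.d actually contained: the six ASCII characters \ u 0 0 A 0.
    // A backquoted (WYSIWYG) string performs no escape processing.
    writefln("typed : %(%02X %)", `\u00A0`.representation);
}
```

In a UTF-16 BE file the single code unit 00A0 is stored as the two bytes 00 A0, which is the case Thomas describes.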
Oct 22 2005
> C:\code\d>type test.d
> void \u00A0() {}
> void main() {}
>
> C:\code\d>dmd test
> test.d(1): no identifier for declarator void

That's because your code is wrong.

void \u00A0()

is like:

void 'a'()

which gives a similar message:

dummy.d(1): no identifier for declarator void
dummy.d(1): semicolon expected, not '97U'
dummy.d(1): Declaration expected, not '97U'

If I use Unicode, it works fine:

void を検索()

-[Unknown]
Oct 22 2005
In article <djedqc$h0e$1 digitaldaemon.com>, Unknown W. Brackets says...
> That's because your code is wrong.
> void \u00A0()
> Is like:
> void 'a'()

Is it? From the D spec:

Identifier:
    IdentifierStart
    IdentifierStart IdentifierChars

IdentifierChars:
    IdentifierChar
    IdentifierChar IdentifierChars

IdentifierStart:
    _
    Letter
    UniversalAlpha

Universal alphas are as defined in ISO/IEC 9899:1999(E) Appendix D. (This is the C99 Standard.)

And from the C standard:

identifier:
    identifier-nondigit
    identifier identifier-nondigit
    identifier digit

identifier-nondigit:
    nondigit
    universal-character-name

6.4.3 Universal character names

Syntax

universal-character-name:
    \u hex-quad
    \U hex-quad hex-quad

hex-quad:
    hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit

..

Universal character names may be used in identifiers, character constants, and string literals to designate characters that are not in the basic character set.

Semantics

The universal character name \Unnnnnnnn designates the character whose eight-digit short identifier (as specified by ISO/IEC 10646) is nnnnnnnn. Similarly, the universal character name \unnnn designates the character whose four-digit short identifier is nnnn (and whose eight-digit short identifier is 0000nnnn).

Do I have the declaration format wrong? I'll admit I've never used these before.

Sean
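
The C99 syntax and semantics quoted above amount to a small decoding rule: \u takes four hexadecimal digits, \U takes eight, and the digits are the code point. A rough sketch of that rule in present-day D follows. The function name decodeUcn is hypothetical, and the sketch skips the extra range restrictions the C standard places on which code points are allowed.

```d
import std.algorithm.searching : all;
import std.ascii : isHexDigit;
import std.conv : to;
import std.exception : enforce;

/// Decodes the text "\uXXXX" or "\UXXXXXXXX" into the code point it designates.
/// Hypothetical helper; it checks only the shape, not C99's range constraints.
dchar decodeUcn(string s)
{
    enforce(s.length >= 2 && s[0] == '\\', "not a universal character name");

    // \u is followed by one hex-quad, \U by two.
    immutable digits = s[1] == 'u' ? 4 : s[1] == 'U' ? 8 : 0;
    enforce(digits != 0 && s.length == 2 + digits,
            "expected \\u hex-quad or \\U hex-quad hex-quad");
    enforce(s[2 .. $].all!isHexDigit, "non-hexadecimal digit");

    // The hexadecimal digits are the short identifier of the character.
    return cast(dchar) s[2 .. $].to!uint(16);
}

void main()
{
    // Backquoted strings keep the backslash, so these are the spelled-out forms.
    assert(decodeUcn(`\u00A0`) == 0x00A0);
    assert(decodeUcn(`\U000000A0`) == decodeUcn(`\u00A0`));
}
```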
Oct 22 2005
I'm not sure I understand why you're quoting what you are.

In D, you can do this:

printf("hello" \n);

Yes, no typo. See that \n outside the quotes? That works fine. In D, \n is just another string literal. So is \0. In fact, so is \u00A0. They are all literals, like "hello".

This is not, in any way, the same as C, nor should it be construed to be. D is not C. D is not C with rockets strapped on... D is another language entirely, which is very similar to C.

D, unlike C (or at least, most implementations of C), supports Unicode source files. This means that, instead of resorting to tricks or pretending, you can actually type such characters directly. Things like à. Obviously, people from countries other than America do not type \uBLAH every time they want a character like that, nor should you.

Unicode characters can take more than one byte. For example, \u00A0 means the Unicode character 00A0; in UTF-16 (big-endian), that is exactly the bytes 00 and A0. This is in contrast to ANSI, which uses only one byte per character.

D supports actually having those codes in the file. Open it with a hex editor and put those two bytes right there for the function name (in a UTF-16 BE file). This is true Unicode.

This means that Japanese programmers can write identifiers in Japanese, not using \uBLAH. If I were Japanese (or fluent in the language), I would rather use English than that.

You're right that this is something C supports that D does not. However, since it is - imho - entirely and completely useless (especially compared to the much better feature D does have), I don't see why anyone's going to complain.

-[Unknown]
Oct 23 2005
In article <djfdin$1aom$1 digitaldaemon.com>, Unknown W. Brackets says...
> You're right that this is something C supported that D does not. However, since it is - imho - entirely and completely useless (especially compared to the much better feature D does have), I don't see why anyone's going to complain.

If that's the case then it's fine with me. I read the D spec as saying that it was intended to support this format, but perhaps it just meant that the characters are supported in UTF form? Why the bit about UniversalAlpha for identifiers then?

Sean
Oct 23 2005
Sean Kelly wrote:
> In article <djfdin$1aom$1 digitaldaemon.com>, Unknown W. Brackets says...
>> You're right that this is something C supported that D does not. However, since it is - imho - entirely and completely useless (especially compared to the much better feature D does have), I don't see why anyone's going to complain.
> If that's the case then it's fine with me. I read the D spec as that it was intended to support this format, but perhaps it just meant that the chars were supported in UTF format? Why the bit about UniversalAlpha for identifiers then?

If it only said UTF for identifiers, the following should compile:

void 6(){
    // do something
}

"6" is part of Unicode but is not a "UniversalAlpha".

Thomas
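
The point that a character can be in Unicode without being allowed to start an identifier can be sketched as a predicate. This is only an approximation in present-day D: it uses std.uni.isAlpha as a stand-in for the UniversalAlpha table of C99 Appendix D, which is an assumption and not the compiler's actual table. The function name canStartIdentifier is hypothetical.

```d
import std.stdio : writeln;
import std.uni : isAlpha;

/// Approximates IdentifierStart: '_' or an alphabetic character.
/// std.uni.isAlpha stands in for the C99 Appendix D table (an assumption).
bool canStartIdentifier(dchar c)
{
    return c == '_' || isAlpha(c);
}

void main()
{
    writeln(canStartIdentifier('6'));       // a digit: in Unicode, but not alphabetic
    writeln(canStartIdentifier('\u00A0'));  // NO-BREAK SPACE: not alphabetic either
    writeln(canStartIdentifier('を'));      // a hiragana letter
}
```

Under this approximation the character from the original report, U+00A0, is rejected for the same reason as "6": it is a space, not a letter.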
Oct 23 2005
"Sean Kelly" <sean f4.ca> wrote in message news:djehj1$jso$1 digitaldaemon.com...
> In article <djedqc$h0e$1 digitaldaemon.com>, Unknown W. Brackets says...
>> That's because your code is wrong.
>> void \u00A0()
>> Is like:
>> void 'a'()
> Is it?

Yes, it's wrong. D does not support the \u or \U syntax for identifier characters. D supports actual embedded Unicode alpha characters as identifier characters. Similarly, D does not support digraphs or trigraphs. D does support \u and \U as a way to specify Unicode characters within string literals.

The reason that D doesn't support identifiers like abc\u00A0xyz is that nobody in their right mind would actually write such an identifier. C is forced to because the C source character set is unspecified, so digraphs, trigraphs, and \u kludges are necessary to get 'portable' source code. One would presumably write C Unicode identifiers in Unicode, and then some translator would convert them to \u syntax so that the code would work with other C compilers.

The D source character set *is* specified to be Unicode, so there's no reason to translate it to \u notation.

I've also never, ever seen anyone use \u notation in C code outside of a test suite. Ditto for both trigraphs and digraphs.
Oct 23 2005
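
Walter's two rules (Unicode letters embedded directly in identifiers, \u and \U only inside literals) can be seen side by side in a short sketch. It assumes a present-day D compiler and a source file saved as UTF-8. The function name is borrowed from earlier in the thread; nameViaEscapes is made up for illustration.

```d
import std.stdio : writeln;

// Unicode letters typed directly into the source file form a valid identifier.
int を検索() { return 1; }

// Inside a string literal, \u escapes designate the same three characters.
enum nameViaEscapes = "\u3092\u691C\u7D22";

void main()
{
    // __traits(identifier, ...) yields the declared name as a string.
    assert(__traits(identifier, を検索) == nameViaEscapes);
    writeln(nameViaEscapes, " -> ", を検索());

    // The escape form is not accepted where an identifier is expected;
    // uncommenting the next line should produce a compile error:
    // int \u3092 = 0;
}
```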
In article <djfnnu$1i1f$1 digitaldaemon.com>, Walter Bright says...
> Yes, it's wrong. D does not support the \u or \U syntax for identifier characters. D supports actual embedded unicode alpha characters as identifier characters. Similarly, D does not support digraphs or trigraphs.

Thanks for clearing that up. The reference to Appendix D of the C99 spec threw me, as it referred to these characters. I suppose I should have realized that you meant the letters themselves rather than the formatting.

> I've also never, ever seen anyone use \u notation in C code outside of a test suite. Ditto for both trigraphs and digraphs.

Me either. I stumbled across that portion of the spec yesterday and tried it on a whim :-) Sorry for the confusion.

Sean
Oct 23 2005
Sean Kelly wrote:
> In article <djfnnu$1i1f$1 digitaldaemon.com>, Walter Bright says...
>> I've also never, ever seen anyone use \u notation in C code outside of a test suite. Ditto for both trigraphs and digraphs.
> Me either. I stumbled across that portion of the spec yesterday and tried it on a whim :-) Sorry for the confusion.

I have come across them in obfuscated C code :-)
Oct 23 2005