digitalmars.D.learn - Unicode symbols in the identifiers

reply "Andrey" <andr-sar yandex.ru> writes:
Should these variants serve as identifiers?

auto x²; //fails to compile: char 0x00b2 not allowed in 
identifier, unsupported char 0xb2 (why? is it not a digit?)

Same for ⅀, ∫, etc.

The official documentation says:
«
D source text can be in one of the following formats:
ASCII
UTF-8
UTF-16BE
UTF-16LE
UTF-32BE
UTF-32LE
»

Math symbols would be far more useful than plain characters 
from other languages (who codes in Greek or Chinese?). Still, 
this function name in Russian causes a compile error: 2.вквадрате 
(вквадрате(2))
Jan 10 2013
next sibling parent reply "evilrat" <evilrat666 gmail.com> writes:
On Friday, 11 January 2013 at 02:09:33 UTC, Andrey wrote:
 Should these variants serve as identifiers?

 auto x²; //fails to compile: char 0x00b2 not allowed in 
 identifier, unsupported char 0xb2 (why? is it not a digit?)

 Same for ⅀, ∫, etc.

 The official documentation says:
 «
 D source text can be in one of the following formats:
 ASCII
 UTF-8
 UTF-16BE
 UTF-16LE
 UTF-32BE
 UTF-32LE
 »

 Math symbols would be far more useful than plain characters 
 from other languages (who codes in Greek or Chinese?). 
 Still, this function name in Russian causes a compile error: 
 2.вквадрате (вквадрате(2))
Save the module as a UTF-8 file (or in any other encoding from the list above). From your error description, it looks like it is ASCII.
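One way to rule out an encoding problem is to check the file's bytes directly. The following is a minimal sketch, not something posted in this thread: the helper name isValidUtf8 is hypothetical, and it only tells you whether the bytes form well-formed UTF-8, not which encoding your editor intended.

```d
import std.file : read;
import std.stdio : writeln;
import std.utf : validate, UTFException;

// Hypothetical helper: true if `text` is well-formed UTF-8.
bool isValidUtf8(const(char)[] text)
{
    try
    {
        validate(text);   // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main(string[] args)
{
    // A lone 0xB2 byte is how Latin-1 stores the superscript two;
    // on its own it is not valid UTF-8.
    char[] latin1Bytes = [cast(char) 0xB2];
    assert(!isValidUtf8(latin1Bytes));

    // The same character written as a proper UTF-8 literal.
    assert(isValidUtf8("x\u00B2"));

    // Optionally check a source file given on the command line.
    if (args.length > 1)
    {
        auto bytes = cast(const(char)[]) read(args[1]);
        writeln(args[1], isValidUtf8(bytes) ? ": valid UTF-8" : ": NOT valid UTF-8");
    }
}
```

Note that a pure ASCII file is also valid UTF-8, so this check only catches files saved in some other 8-bit encoding.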
Jan 10 2013
parent reply "Andrey" <andr-sar yandex.ru> writes:
 Save the module as a UTF-8 file (or in any other encoding from 
 the list above). From your error description, it looks like it 
 is ASCII.
I'm pretty sure I'm saving it in Unicode. I can easily use any Unicode characters in string literals ("x²") and output them to the console, but using them in identifiers leads to a compiler error.

Apart from that, try this code:

int в_квадрате(int num) { return num*num; }

writeln(2.в_квадрате);

You get: Error: found 'в_квадрате' when expecting ','
Jan 10 2013
next sibling parent "Andrey" <andr-sar yandex.ru> writes:
Forgot to mention: Linux 64-bit, D version 2.060.
Jan 11 2013
prev sibling parent "evilrat" <evilrat666 gmail.com> writes:
On Friday, 11 January 2013 at 07:47:03 UTC, Andrey wrote:
 Apart from that, try this code:

 int в_квадрате(int num) { return num*num; }

 writeln(2.в_квадрате);

 You get: Error: found 'в_квадрате' when expecting ','
I don't get any errors with this code (dmd 2.061, Windows 8), but x² as an identifier really does give an error. I don't know; maybe it is a bug, maybe not.
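For comparison, here are three ways to write the same call. This is only a sketch: the parenthesized literal and the variable form are suggestions, and nobody in this thread has confirmed that they avoid the error reported with 2.060.

```d
import std.stdio : writeln;

// Same function as in the post above.
int в_квадрате(int num) { return num * num; }

void main()
{
    // Ordinary call syntax.
    assert(в_квадрате(2) == 4);

    // UFCS on a parenthesized literal, so the lexer never sees "2."
    // directly followed by a non-ASCII letter.
    assert((2).в_квадрате == 4);

    // UFCS on a variable instead of a literal.
    int two = 2;
    assert(two.в_квадрате == 4);

    writeln(two.в_квадрате);
}
```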
Jan 11 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Jan 11, 2013 at 03:09:29AM +0100, Andrey wrote:
 Should these variants serve as identifiers?
 
 auto x²; //fails to compile: char 0x00b2 not allowed in identifier,
 unsupported char 0xb2 (why? is it not a digit?)
Weird, identifiers like "Цвет" and "張" and even "ℝ" all work fine, but "⅀" doesn't work. Maybe it's a bug? [...]
 Still, this function name in russian cause compile error: 2.вквадрате
 (вквадрате(2))
This works for me:

import std.stdio;

real плюс(real a, real b) { return a+b; }

void main() {
	writeln(плюс(1.61803, 3.14159));
	writeln(1.61803.плюс(3.14159));
}

Both writeln's print 4.75962.

Are you sure you saved your source file in UTF-8 format?

T

-- 
"I'm not childish; I'm just in touch with the child within!" - RL
Jan 10 2013
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-01-11 03:09, Andrey wrote:
 Should these variants serve as identifiers?

 auto x²; //fails to compile: char 0x00b2 not allowed in identifier,
 unsupported char 0xb2 (why? is it not a digit?)

 Same for ⅀, ∫, etc.

 The official documentation says:
 «
 D source text can be in one of the following formats:
 ASCII
 UTF-8
 UTF-16BE
 UTF-16LE
 UTF-32BE
 UTF-32LE
 »

 Math symbols would be far more useful than plain characters from
 other languages (who codes in Greek or Chinese?). Still, this
 function name in Russian causes a compile error: 2.вквадрате
 (вквадрате(2))
According to the specification, D doesn't necessarily support arbitrary Unicode characters in identifiers:

"Identifiers start with a letter, _, or universal alpha, and are followed by any number of letters, _, digits, or universal alphas. Universal alphas are as defined in ISO/IEC 9899:1999(E) Appendix D. (This is the C99 Standard.) Identifiers can be arbitrarily long, and are case sensitive. Identifiers starting with __ (two underscores) are reserved."

http://dlang.org/lex.html#Identifier

-- 
/Jacob Carlborg
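To illustrate the rule quoted above, here is a small sketch built only from identifiers that posters in this thread report as compiling (Цвет, 張, ℝ). The assumption is that your compiler follows the same universal alpha table; the values are arbitrary.

```d
import std.stdio : writeln;

void main()
{
    // Letters that count as universal alphas are accepted.
    int Цвет = 1;     // Cyrillic
    int 張 = 2;       // CJK ideograph
    double ℝ = 3.0;   // U+211D, a letter-like symbol

    assert(Цвет + 張 == 3);
    writeln(Цвет + 張 + ℝ);

    // By contrast, the thread reports that these are rejected:
    // auto x² = 0;   // U+00B2 is a superscript digit, not a universal alpha
    // auto ⅀ = 0;    // U+2140 is a math operator
}
```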
Jan 11 2013
parent "Andrey" <andr-sar yandex.ru> writes:
 According to the specification, D doesn't necessarily support 
 arbitrary Unicode characters in identifiers:

 "Identifiers start with a letter, _, or universal alpha, and 
 are followed by any number of letters, _, digits, or universal 
 alphas. Universal alphas are as defined in ISO/IEC 9899:1999(E) 
 Appendix D. (This is the C99 Standard.) Identifiers can be 
 arbitrarily long, and are case sensitive. Identifiers starting 
 with __ (two underscores) are reserved."

 http://dlang.org/lex.html#Identifier
http://www.algonet.se/~afb/d/universalalphas/universalalphas.html

I can't understand the logic of why symbols such as these are allowed:

µ·ʰʱʲʳʴʵʶʷʸʻʽʾʿˀˁːˑˠˡˢˣˤͺՙऽଽι‿⁀ℂℇℊℋℌℍℎℏℐℑℒℓℕ℘ℙℚℛℜℝℤΩℨKÅℬℭ℮ℯℰℱℳℴℵℶℷℸⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩⅪⅫⅬⅭⅮⅯⅰⅱⅲⅳⅴⅵⅶⅷⅸⅹⅺⅻⅼⅽⅾⅿↀↁↂ々〆〇〡〢〣〤〥〦〧〨〩

while very useful math symbols are completely ignored.

The C++11 standard allows these: ²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉

Annex E (normative)
Universal character names for identifier characters [charname]

E.1 Ranges of characters allowed [charname.allowed]

00A8, 00AA, 00AD, 00AF, 00B2-00B5, 00B7-00BA, 00BC-00BE, 00C0-00D6, 00D8-00F6, 00F8-00FF
0100-167F, 1681-180D, 180F-1FFF
200B-200D, 202A-202E, 203F-2040, 2054, 2060-206F
2070-218F, 2460-24FF, 2776-2793, 2C00-2DFF, 2E80-2FFF
3004-3007, 3021-302F, 3031-303F
3040-D7FF
F900-FD3D, FD40-FDCF, FDF0-FE44, FE47-FFFD
10000-1FFFD, 20000-2FFFD, 30000-3FFFD, 40000-4FFFD, 50000-5FFFD, 60000-6FFFD, 70000-7FFFD, 80000-8FFFD, 90000-9FFFD, A0000-AFFFD, B0000-BFFFD, C0000-CFFFD, D0000-DFFFD, E0000-EFFFD
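The ranges quoted above can be checked mechanically. The sketch below is an illustration only: the names CodePointRange and inAnnexE are made up, and the table holds just the entries up to 2FFF from the list quoted in this post. It says nothing about what the D compiler itself accepts.

```d
import std.stdio : writefln;

// Hypothetical helper type: one inclusive range of code points.
struct CodePointRange { uint lo, hi; }

// Entries up to 2FFF, copied from the Annex E.1 list quoted above.
static immutable CodePointRange[] annexE = [
    CodePointRange(0x00A8, 0x00A8), CodePointRange(0x00AA, 0x00AA),
    CodePointRange(0x00AD, 0x00AD), CodePointRange(0x00AF, 0x00AF),
    CodePointRange(0x00B2, 0x00B5), CodePointRange(0x00B7, 0x00BA),
    CodePointRange(0x00BC, 0x00BE), CodePointRange(0x00C0, 0x00D6),
    CodePointRange(0x00D8, 0x00F6), CodePointRange(0x00F8, 0x00FF),
    CodePointRange(0x0100, 0x167F), CodePointRange(0x1681, 0x180D),
    CodePointRange(0x180F, 0x1FFF), CodePointRange(0x200B, 0x200D),
    CodePointRange(0x202A, 0x202E), CodePointRange(0x203F, 0x2040),
    CodePointRange(0x2054, 0x2054), CodePointRange(0x2060, 0x206F),
    CodePointRange(0x2070, 0x218F), CodePointRange(0x2460, 0x24FF),
    CodePointRange(0x2776, 0x2793), CodePointRange(0x2C00, 0x2DFF),
    CodePointRange(0x2E80, 0x2FFF),
];

// True if `c` falls inside one of the ranges in the table above.
bool inAnnexE(dchar c)
{
    foreach (r; annexE)
        if (c >= r.lo && c <= r.hi)
            return true;
    return false;
}

void main()
{
    assert(inAnnexE('\u00B2'));   // superscript two: inside 00B2-00B5
    assert(inAnnexE('\u2140'));   // double-struck summation: inside 2070-218F
    assert(!inAnnexE('\u222B'));  // integral sign: between 218F and 2460

    foreach (dchar c; "\u00B2\u2140\u222B"d)
        writefln("U+%04X allowed by the quoted ranges: %s", cast(uint) c, inAnnexE(c));
}
```

Going by the quoted list, the integral sign falls in the gap between 218F and 2460, and none of the later ranges start before 3004, so it would not be allowed under those ranges either.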
Jan 11 2013
prev sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Fri, 11 Jan 2013 02:09:29 -0000, Andrey <andr-sar yandex.ru> wrote:

 Should these variants serve as identifiers?
See: http://dlang.org/lex.html#Identifier

"Identifiers start with a letter, _, or universal alpha, and are followed by any number of letters, _, digits, or universal alphas. Universal alphas are as defined in ISO/IEC 9899:1999(E) Appendix D. (This is the C99 Standard.) Identifiers can be arbitrarily long, and are case sensitive. Identifiers starting with __ (two underscores) are reserved."

Regan

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
Jan 11 2013