digitalmars.D.learn - support for unicode in identifiers
- Vlad Levenfeld (9/9) Jun 01 2014 I was pretty happy to find that I could use mu and sigma when
- Chris Nicholson-Sauls (9/18) Jun 01 2014 The allowed characters are those defined as "universal" in
- Vlad Levenfeld (12/12) Jun 01 2014 With Unicode support (especially with UFCS) I can really code
- Vlad Levenfeld (1/1) Jun 01 2014 Ah!, found it in utf.h as ALPHA_TABLE
I was pretty happy to find that I could use mu and sigma when writing statistical routines, but I've found that for more obscure non-ascii characters the support is hit or miss. For example, none of the subscripts are valid characters, but I can use superscript n as well as dot-notation for derivatives. I'm using dmd 2.065. What's the story behind the scenes? Is there a rationale behind the supported/unsupported or is it happenstance? Is there anywhere I can find a list of supported characters?
Jun 01 2014
On Sunday, 1 June 2014 at 22:26:42 UTC, Vlad Levenfeld wrote:
> I was pretty happy to find that I could use mu and sigma when writing statistical routines, but I've found that for more obscure non-ascii characters the support is hit or miss. For example, none of the subscripts are valid characters, but I can use superscript n as well as dot-notation for derivatives. I'm using dmd 2.065. What's the story behind the scenes? Is there a rationale behind the supported/unsupported or is it happenstance? Is there anywhere I can find a list of supported characters?
The allowed characters are those defined as "universal" in ISO/IEC 9899 (the C standard). It's a pretty long list, but almost only "alphas;" I'm actually surprised you got superscripts and some other things to work. As I understand it, the intention was a) be like C99, and b) allow things like using "stærð" rather than "staerdh." I'm not sure usage like yours was even thought about, although I'd concede that it seems reasonable.
Jun 01 2014
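To illustrate the kind of usage discussed above, here is a minimal sketch of a statistical routine that uses μ and σ as identifiers, plus the "stærð" example from the reply. The function names mean and stdDev are hypothetical, chosen only for this illustration; the sketch assumes a compiler that accepts these letters in identifiers, as the original post reports for dmd 2.065.
```d
import std.math : sqrt;

// Arithmetic mean of a non-empty sample (hypothetical helper).
double mean(const double[] xs)
{
    double sum = 0.0;
    foreach (x; xs)
        sum += x;
    return sum / xs.length;
}

// Population standard deviation, written with the conventional symbols.
double stdDev(const double[] xs)
{
    immutable stærð = xs.length;   // non-ASCII Latin letters, as in the reply's example
    immutable μ = mean(xs);        // Greek small letter mu
    double acc = 0.0;
    foreach (x; xs)
        acc += (x - μ) * (x - μ);
    immutable σ = sqrt(acc / stærð); // Greek small letter sigma
    return σ;
}
```
A call such as stdDev([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]) exercises both helpers. Subscripted names (for example x with a subscript 1) are left out on purpose, since the original post reports that the compiler rejects them.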
With Unicode support (especially with UFCS) I can really code more in the way I think. I never gave it much thought until I worked with D, but now that I have, I feel it is a bit weird to work with epsilons and deltas on paper and "eps" and "del" or something on the screen. And what's a more descriptive variable name than the symbol used for it in the canonical representations? So, this may be a very naive question, but I wonder: since dmd is open source, is there somewhere that the list of supported symbols can be extended? (Hopefully something trivial to change, like a big array literal tucked away somewhere.) I'm currently looking through the files labeled 'lexer' and 'utf' and things like that on GitHub, but nothing's jumped out at me yet.
Jun 01 2014