digitalmars.D.learn - support for unicode in identifiers

"Vlad Levenfeld" <vlevenfeld gmail.com> writes:
I was pretty happy to find that I could use mu and sigma when 
writing statistical routines, but I've found that for more 
obscure non-ASCII characters the support is hit or miss. For 
example, none of the subscripts are valid characters, but I can 
use superscript n as well as dot notation for derivatives.
I'm using dmd 2.065. What's the story behind the scenes? Is there 
a rationale behind which characters are supported, or is it 
happenstance? Is there anywhere I can find a list of supported 
characters?
Jun 01 2014
"Chris Nicholson-Sauls" <ibisbasenji gmail.com> writes:
On Sunday, 1 June 2014 at 22:26:42 UTC, Vlad Levenfeld wrote:
 I was pretty happy to find that I could use mu and sigma when 
 writing statistical routines, but I've found that for more 
 obscure non-ASCII characters the support is hit or miss. For 
 example, none of the subscripts are valid characters, but I can 
 use superscript n as well as dot notation for derivatives.
 I'm using dmd 2.065. What's the story behind the scenes? Is 
 there a rationale behind which characters are supported, or is it 
 happenstance? Is there anywhere I can find a list of supported 
 characters?
The allowed characters are the "universal alphas" defined in ISO/IEC 9899:1999 (the C99 standard), Annex D. It's a pretty long list, but it consists almost entirely of alphabetic characters; I'm actually surprised you got superscripts and some other things to work. As I understand it, the intention was to a) be like C99, and b) allow things like using "stærð" rather than "staerdh." I'm not sure usage like yours was even thought about, although I'd concede that it seems reasonable.
Jun 01 2014
"Vlad Levenfeld" <vlevenfeld gmail.com> writes:
With Unicode support (especially with UFCS) I can really code 
more in the way I think. I never gave it much thought until I 
worked with D, but now that I have, I feel it is a bit weird to 
work with epsilons and deltas on paper and "eps" and "del" or 
something on the screen. And what's a more descriptive variable 
name than the symbol used for it in the canonical representations?

So, this may be a very naive question but I wonder, since dmd is 
open source, is there somewhere that the list of supported 
symbols can be extended? (hopefully something trivial to change, 
like a big array literal tucked away somewhere) I'm looking 
through the files labeled 'lexer' and 'utf' and things like that 
on GitHub currently, but nothing's jumped out at me yet.
Jun 01 2014
"Vlad Levenfeld" <vlevenfeld gmail.com> writes:
Ah, found it: it's in utf.h as ALPHA_TABLE.
Jun 01 2014
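
For anyone wondering what a table like ALPHA_TABLE is for: a lexer can keep a sorted list of inclusive code point ranges and binary-search it for each candidate identifier character. The C++ below is only a minimal sketch of that idea and is not dmd's code. The names kAlphaRanges and isAllowedAlpha are made up for illustration, and the four ranges are a hand-picked sample, not the real table or the full C99 list.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sample table: inclusive [low, high] code point ranges,
// sorted in ascending order. Illustrative only; these entries are not
// copied from dmd's ALPHA_TABLE or from the C standard.
static const std::uint32_t kAlphaRanges[][2] = {
    {0x00C0, 0x00D6},
    {0x00D8, 0x00F6},  // includes æ (U+00E6) and ð (U+00F0)
    {0x03A3, 0x03CE},  // includes μ (U+03BC) and σ (U+03C3)
    {0x207F, 0x207F},  // superscript n
};

static const std::size_t kAlphaRangeCount =
    sizeof(kAlphaRanges) / sizeof(kAlphaRanges[0]);

// Returns true if code point c falls inside one of the ranges above.
// Binary search over the half-open index interval [low, high).
bool isAllowedAlpha(std::uint32_t c)
{
    std::size_t low = 0;
    std::size_t high = kAlphaRangeCount;
    while (low < high) {
        std::size_t mid = low + (high - low) / 2;
        if (c < kAlphaRanges[mid][0])
            high = mid;          // c lies below this range
        else if (c > kAlphaRanges[mid][1])
            low = mid + 1;       // c lies above this range
        else
            return true;         // c is inside this range
    }
    return false;
}
```

With this sample table, σ (U+03C3) and superscript n (U+207F) are accepted while subscript two (U+2082) is rejected, which mirrors the behaviour described at the top of the thread. In a design like this, supporting more symbols would amount to adding ranges while keeping the table sorted.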