digitalmars.D - Why not extend do to allow unicode in ID's?
- Bert (80/80) Jun 29 2019 It would greatly expand the coverage.
- sarn (8/8) Jun 29 2019 D already allows non-latin characters in identifiers, just not
- Bert (3/11) Jun 30 2019 Yeah, I noticed some work but many do not and I'm not even sure
- Dennis (27/33) Jun 30 2019 Currently D allows "universal alphas" in identifiers, so Greek
- Bert (21/54) Jun 30 2019 Thanks. I guess I could create a small routine that hacks the
- Dennis (11/21) Jul 01 2019 I don't have much Nim experience myself, so maybe you should ask
- Martin Krejcirik (4/7) Jul 01 2019 I think a source code should be easily editable by anyone using a
- Bert (11/19) Jul 01 2019 It's time to grow up? How can progress be made if we don't
- Martin Krejcirik (4/7) Jul 02 2019 Editors maybe, but what about keyboards ? Can you easily write my
- Timon Gehr (4/11) Jul 02 2019 Of course. I can easily write this even on a US keyboard. My editor is
- XavierAP (4/8) Jul 03 2019 auto Krejčiřík = 0;
- Jonathan M Davis (19/22) Jul 01 2019 ...
- rikki cattermole (4/30) Jul 01 2019 No DIP is required. The lexer just needs updating to match to the
- Jonathan M Davis (9/41) Jul 01 2019 If a character is a Unicode alpha character, then yes. However if it's n...
It would greatly expand the coverage. It would be nice to use certain characters that are truly meaningful. In fact, it would be nice for ops too.

░▒▓│┤╡╢╖╕╣║╗╝╜╛┐└┴┬├─┼ ∩ε≡φ±≥≤⌠⌡÷≈∙·√ⁿ²■☺♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼

I realize the excuse is going to be "It makes the code look ugly or hard to read", not all editors will support it, etc... So? Those are lame excuses. People can abuse anything, you can't police the world. Stopping all legitimate uses because someone might use it illegitimately is ignorant and harmful (which is why it is ignorant). It is best to enable unicode support and then have standards and guidelines and let some people shoot themselves in the foot if they want... that is the best way to learn not to do it again.

Imagine being able to write proper mathematical formula ID's:

∞ δ Ω Θ Φ τ µ σ ε φ

or using valid mathematical operators:

∩ ≡ ± ≥ ≤ ÷ ≈ ∙ √ ⁿ ²

or when you write a card game:

♥ ♦ ♣ ♠

These are much better than the verbose words that we have to use now. I know some will say the opposite, but they can say it and be wrong. Trying to stop me from shooting myself in the foot when I don't own a gun is abusive to me and just like shooting me in the foot! I don't write any code that anyone else reads, so let me make the choice for myself rather than create arbitrary rules that limit expressiveness. There is a reason the first amendment exists in the US: the founders knew what limitations of expression would do. The same applies to all things.

Maybe one could use a switch to enable such a language, with a compiler warning about such use. Maybe we can have a special D code page for useful symbols so there is a standard code for each that one could properly map using their editor of choice?

For example, we could have each symbol map to a long name that one could use to replace the source:

    ♥ = Symbol_Heart_0x2665 // or even just __Symbol__0x2665
    ♦ = Symbol_Diamond_0x2666
    ♣ = Symbol_Club_0x2663
    ♠ = Symbol_Spade_0x2660

And one could then change any source code between the symbolic form and the verbose form using a command line utility. E.g.,

    int ♥ = 3;

can be converted to

    int Symbol_Heart_0x2665 = 3;

and back without issue (99.9999999999% of code). This would potentially cause issues with meta programming when comparing strings of the id names, but this is a minor issue. In fact, internally D could just use the long symbol names and require the programmer to use them. E.g.,

    static if (id == "♥") // invalid if id gets converted to long name internally;
                          // the symbol would have to be mapped to its long name

There are solutions to the problems... let's work on finding one to make D better.
Jun 29 2019
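The conversion between the symbolic and the verbose form described in the post above could be prototyped as a plain text substitution. The following is only a minimal sketch under stated assumptions: `Mapping`, `symbolTable`, `toVerbose` and `toSymbolic` are hypothetical names, the long names follow the post's `Symbol_<Name>_0x<codepoint>` scheme using the actual Unicode code points of the four suit characters, and the replacement is purely textual, so it would also rewrite symbols inside string literals and comments.

```d
import std.array : replace;
import std.stdio : writeln;

/// One symbol and its (hypothetical) ASCII long name.
struct Mapping
{
    string symbol;
    string longName;
}

// Illustrative table only; this is not part of D or dmd.
enum Mapping[] symbolTable = [
    Mapping("♥", "Symbol_Heart_0x2665"),
    Mapping("♦", "Symbol_Diamond_0x2666"),
    Mapping("♣", "Symbol_Club_0x2663"),
    Mapping("♠", "Symbol_Spade_0x2660"),
];

/// Symbolic form -> verbose form: replace every symbol with its long name.
string toVerbose(string source)
{
    foreach (m; symbolTable)
        source = source.replace(m.symbol, m.longName);
    return source;
}

/// Verbose form -> symbolic form: replace every long name with its symbol.
string toSymbolic(string source)
{
    foreach (m; symbolTable)
        source = source.replace(m.longName, m.symbol);
    return source;
}

void main()
{
    string code = "int ♥ = 3;";
    string verbose = toVerbose(code);
    writeln(verbose); // int Symbol_Heart_0x2665 = 3;
    assert(toSymbolic(verbose) == code); // the round trip restores the input
}
```

A real tool would need to tokenize the source first so that a comparison such as `id == "♥"` inside a string literal is left alone, which is the meta programming caveat the post mentions.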
D already allows non-latin characters in identifiers, just not arbitrary symbols:

    import std.stdio;

    void main()
    {
        double φ = 1.61803398874989484820;
        writeln(φ);
    }
Jun 29 2019
On Sunday, 30 June 2019 at 03:12:55 UTC, sarn wrote:
> D already allows non-latin characters in identifiers, just not arbitrary symbols:
>
>     import std.stdio;
>
>     void main()
>     {
>         double φ = 1.61803398874989484820;
>         writeln(φ);
>     }

Yeah, I noticed some work but many do not and I'm not even sure what does or doesn't ;/
Jun 30 2019
On Saturday, 29 June 2019 at 22:38:06 UTC, Bert wrote:
> Imagine being able to write proper mathematical formula ID's:

Currently D allows "universal alphas" in identifiers, so Greek letters are allowed already. See:
https://dlang.org/spec/lex.html#identifiers

> or using valid mathematical operators:

In D you can't add operators, but if you want math notation on existing ones, you might be interested in fonts with programming ligatures such as:
https://github.com/tonsky/FiraCode

> or when you write a card game:

Custom literals can be added with templates, for example:

    octal!377 (https://github.com/dlang/phobos/blob/d57be4690fc923a1974a4ef4d8b84a951131d219/std/conv.d#L4062)
    tok!"if" (https://github.com/dlang-community/libdparse/blob/5270739bcd1962418784c7760773e24d28b6009b/src/dparse/lexer.d#L115)

Since in strings any Unicode is allowed, you can do something similar:

    suit!"♥"

> I don't write any code that anyone else reads so let me make the choice for myself rather than create arbitrary rules that limit expressiveness.

If it's only for yourself, you can add a build step that substitutes your custom symbols with valid identifiers before compiling. Or use your own fork of the compiler, you probably only need to remove this line:
https://github.com/dlang/dmd/blob/2599559d624275bfcff298b3a8b31f9d82ae534f/src/dmd/lexer.d#L524

Finally, if you truly long for ultimate freedom in how you write code, then Nim might be the right language for you since it aligns more with your "putting full trust in the programmer" view than D. In Nim, any non-ascii character is valid for identifiers, so even invalid Unicode characters are allowed.
https://nim-lang.org/docs/manual.html#lexical-analysis-identifiers-amp-keywords
Jun 30 2019
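The `suit!"♥"` idea from the post above can be sketched with an eponymous template that takes the symbol as a compile-time string. The `Suit` enum and the body of `suit` below are assumptions made for illustration; the post only names the call syntax.

```d
import std.stdio : writeln;

/// Hypothetical card suit type, for illustration only.
enum Suit { hearts, diamonds, clubs, spades }

/// Maps a suit symbol, given as a compile-time string, to a Suit value.
/// Unknown symbols are rejected at compile time.
template suit(string symbol)
{
    static if (symbol == "♥")
        enum suit = Suit.hearts;
    else static if (symbol == "♦")
        enum suit = Suit.diamonds;
    else static if (symbol == "♣")
        enum suit = Suit.clubs;
    else static if (symbol == "♠")
        enum suit = Suit.spades;
    else
        static assert(false, "unknown suit symbol: " ~ symbol);
}

void main()
{
    Suit trump = suit!"♥";
    writeln(trump); // hearts
    static assert(suit!"♠" == Suit.spades); // resolved entirely at compile time
}
```

Because the symbol lives in a string literal, this compiles with the existing lexer, and a typo such as `suit!"x"` fails the build instead of producing a runtime error.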
On Sunday, 30 June 2019 at 10:10:41 UTC, Dennis wrote:
> On Saturday, 29 June 2019 at 22:38:06 UTC, Bert wrote:
>> Imagine being able to write proper mathematical formula ID's:
> Currently D allows "universal alphas" in identifiers, so Greek letters are allowed already. See:
> https://dlang.org/spec/lex.html#identifiers
>> or using valid mathematical operators:
> In D you can't add operators, but if you want math notation on existing ones, you might be interested in fonts with programming ligatures such as:
> https://github.com/tonsky/FiraCode
>> or when you write a card game:
> Custom literals can be added with templates, for example:
> octal!377 (https://github.com/dlang/phobos/blob/d57be4690fc923a1974a4ef4d8b84a951131d219/std/conv.d#L4062)
> tok!"if" (https://github.com/dlang-community/libdparse/blob/5270739bcd1962418784c7760773e24d28b6009b/src/dparse/lexer.d#L115)
> Since in strings any Unicode is allowed, you can do something similar: suit!"♥"
>> I don't write any code that anyone else reads so let me make the choice for myself rather than create arbitrary rules that limit expressiveness.
> If it's only for yourself, you can add a build step that substitutes your custom symbols with valid identifiers before compiling. Or use your own fork of the compiler, you probably only need to remove this line:
> https://github.com/dlang/dmd/blob/2599559d624275bfcff298b3a8b31f9d82ae534f/src/dmd/lexer.d#L524

Thanks. I guess I could create a small routine that patches the binary to reverse the if check. This would be easiest to maintain as I wouldn't have to recompile dmd every release, just install the new one and patch.

> Finally, if you truly long for ultimate freedom in how you write code, then Nim might be the right language for you since it aligns more with your "putting full trust in the programmer" view than D. In Nim, any non-ascii character is valid for identifiers, so even invalid Unicode characters are allowed.
> https://nim-lang.org/docs/manual.html#lexical-analysis-identifiers-amp-keywords

I've heard of Nim but never really looked into it much... but every time I hear about it I am more and more enticed. It seems well put together but the syntax is a little off-putting. I'm sure I could get used to it.

I have a few questions:

1. There doesn't seem to be good IDE support. I mainly use Visual Studio and I see a nim for VSC which I don't use ;/ Is there any really good IDE support?

2. How does meta programming of Nim compare to D's? The main reason I use D is its meta programming.

3. Nim seems to have somewhat of a strong categorical and functional foundation. Is it more like Haskell than D? (In the sense of catering to strongly structured programming (functors, natural transformations, etc.))

I'll try to read over the manual. Maybe my next program will be in Nim.
Jun 30 2019
On Sunday, 30 June 2019 at 23:27:56 UTC, Bert wrote:
> I have a few questions:
>
> 1. There doesn't seem to be good IDE support. I mainly use Visual Studio and I see a nim for VSC which I don't use ;/ Is there any really good IDE support?

I don't have much Nim experience myself, so maybe you should ask on the Nim forum.

> 2. How does meta programming of Nim compare to D's? The main reason I use D is its meta programming.

It also has static if ('when'), CTFE, type reflection ('typedesc') and templates. In addition, it has AST macros which D will not have. (You can find long past discussions why, or Google 'The Lisp Curse' for something related).

> 3. Nim seems to have somewhat of a strong categorical and functional foundation. Is it more like Haskell than D? (In the sense of catering to strongly structured programming (functors, natural transformations, etc.))

Both are system programming languages that support mutation, loops and pointers, so you can write C-style procedural code in either language. Whether Nim's higher level constructs are similar to Haskell is something I cannot judge.
Jul 01 2019
On Saturday, 29 June 2019 at 22:38:06 UTC, Bert wrote:
> It would greatly expand the coverage. It would be nice to use certain characters that are truly meaningful.

I think source code should be easily editable by anyone using a keyboard and a plain editor. Extended characters only complicate things.
Jul 01 2019
On Monday, 1 July 2019 at 17:14:08 UTC, Martin Krejcirik wrote:
> On Saturday, 29 June 2019 at 22:38:06 UTC, Bert wrote:
>> It would greatly expand the coverage. It would be nice to use certain characters that are truly meaningful.
> I think source code should be easily editable by anyone using a keyboard and a plain editor. Extended characters only complicate things.

It's time to grow up? How can progress be made if we don't progress? 99% of all modern text editors support UTF-8... with your logic we could say that ascii characters only complicate things, why not just force everyone to code in binary? That would be the simplest thing to do, right?

What you are telling me is that you want to force me to use your view but you don't want me to force you to use mine. What you are actually doing is assuming it would be a problem without actually knowing or having any evidence it would be. You should ponder that a little.
Jul 01 2019
On Monday, 1 July 2019 at 23:52:25 UTC, Bert wrote:
> It's time to grow up? How can progress be made if we don't progress. 99% of all modern text editors support UTF-8... with your logic we could say that ascii characters only complicate

Editors maybe, but what about keyboards? Can you easily write my name (Krejčiřík) without copy and paste or a character selector tool?
Jul 02 2019
On 02.07.19 11:10, Martin Krejcirik wrote:
> On Monday, 1 July 2019 at 23:52:25 UTC, Bert wrote:
>> It's time to grow up? How can progress be made if we don't progress. 99% of all modern text editors support UTF-8... with your logic we could say that ascii characters only complicate
> Editors maybe, but what about keyboards? Can you easily write my name (Krejčiřík) without copy and paste or a character selector tool?

Of course. I can easily write this even on a US keyboard. My editor is set up to translate Krej\vci\vr\'ik to Krejčiřík as I type. This is not a hard problem.
Jul 02 2019
On Tuesday, 2 July 2019 at 18:28:06 UTC, Timon Gehr wrote:
> Of course. I can easily write this even on a US keyboard. My editor is set up to translate Krej\vci\vr\'ik to Krejčiřík as I type. This is not a hard problem.

    auto Krejčiřík = 0;
    static assert(is(typeof(Krejčiřík)));

Already supported :)
Jul 03 2019
On Saturday, June 29, 2019 4:38:06 PM MDT Bert via Digitalmars-d wrote:
> It would greatly expand the coverage. It would be nice to use certain characters that are truly meaningful.
> ...

Like most major languages, D supports identifiers with alphanumeric characters plus underscore, with the first character not being allowed to be numeric. However, unlike most languages, it expands that to include Unicode alpha characters, meaning that quite a lot of Unicode is supported in identifiers. So, it already goes far beyond what most languages do.

That being said, I think that you'll find that most folks will not be in favor of using Unicode in identifiers outside of code intended for people of a specific language who actually use those characters normally (e.g. Japanese characters when all of the programmers involved read and write Japanese and have keyboards that support it). The fact that a character is not a key on a typical keyboard means that anyone using an identifier with that character in it will almost certainly have to copy-paste it, and that's really not going to go over well with most people.

If you really feel strongly about the matter, you can always create a DIP to propose a language change to allow more Unicode characters in identifiers, but I would not expect it to be accepted.

- Jonathan M Davis
Jul 01 2019
On 02/07/2019 2:17 PM, Jonathan M Davis wrote:
> On Saturday, June 29, 2019 4:38:06 PM MDT Bert via Digitalmars-d wrote:
>> It would greatly expand the coverage. It would be nice to use certain characters that are truly meaningful.
>> ...
> Like most major languages, D supports identifiers with alphanumeric characters plus underscore, with the first character not being allowed to be numeric. However, unlike most languages, it expands that to include Unicode alpha characters, meaning that quite a lot of Unicode is supported in identifiers. So, it already goes far beyond what most languages do.
>
> That being said, I think that you'll find that most folks will not be in favor of using Unicode in identifiers outside of code intended for people of a specific language who actually use those characters normally (e.g. Japanese characters when all of the programmers involved read and write Japanese and have keyboards that support it). The fact that a character is not a key on a typical keyboard means that anyone using an identifier with that character in it will almost certainly have to copy-paste it, and that's really not going to go over well with most people.
>
> If you really feel strongly about the matter, you can always create a DIP to propose a language change to allow more Unicode characters in identifiers, but I would not expect it to be accepted.
>
> - Jonathan M Davis

No DIP is required. The lexer just needs updating to match the (current) Unicode spec.
https://github.com/dlang/dmd/blob/master/src/dmd/lexer.d#L1082
Jul 01 2019
On Monday, July 1, 2019 8:56:55 PM MDT rikki cattermole via Digitalmars-d wrote:
> On 02/07/2019 2:17 PM, Jonathan M Davis wrote:
>> On Saturday, June 29, 2019 4:38:06 PM MDT Bert via Digitalmars-d wrote:
>>> It would greatly expand the coverage. It would be nice to use certain characters that are truly meaningful.
>>> ...
>> Like most major languages, D supports identifiers with alphanumeric characters plus underscore, with the first character not being allowed to be numeric. However, unlike most languages, it expands that to include Unicode alpha characters, meaning that quite a lot of Unicode is supported in identifiers. So, it already goes far beyond what most languages do.
>>
>> That being said, I think that you'll find that most folks will not be in favor of using Unicode in identifiers outside of code intended for people of a specific language who actually use those characters normally (e.g. Japanese characters when all of the programmers involved read and write Japanese and have keyboards that support it). The fact that a character is not a key on a typical keyboard means that anyone using an identifier with that character in it will almost certainly have to copy-paste it, and that's really not going to go over well with most people.
>>
>> If you really feel strongly about the matter, you can always create a DIP to propose a language change to allow more Unicode characters in identifiers, but I would not expect it to be accepted.
>>
>> - Jonathan M Davis
> No DIP is required. The lexer just needs updating to match the (current) Unicode spec.
> https://github.com/dlang/dmd/blob/master/src/dmd/lexer.d#L1082

If a character is a Unicode alpha character, then yes. However, if it's not, then that would definitely be a language change and would require a DIP. The spec is quite specific about it requiring Unicode alpha characters, and the code does the same. Without looking at the Unicode spec, I have no clue which characters are alpha characters, but I'd be extremely surprised if a character like ± or ♥ qualified.

- Jonathan M Davis
Jul 01 2019
On Tuesday, 2 July 2019 at 04:34:42 UTC, Jonathan M Davis wrote:
> a character like ±

A good example of a character that should not be allowed in identifiers, because it has the meaning of an operator (and in general, in theory, we may want to reserve it for such future use). ISO and Unicode define which characters (not all of them) are letters or alphanumeric:
https://dlang.org/spec/lex.html#identifiers
https://docs.microsoft.com/en-us/dotnet/api/system.char.isletter#remarks
Jul 03 2019
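Whether a given character counts as alphabetic, the question raised in the two posts above, can be checked from D itself with Phobos. The sketch below uses `std.uni.isAlpha` as a stand-in: it reports Unicode's notion of an alphabetic character, which is an assumption here and not necessarily the exact table the dmd lexer consults for "universal alphas", so treat the result as an approximation of what the compiler accepts.

```d
import std.stdio : writefln;
import std.uni : isAlpha;

void main()
{
    // Two letters, followed by a math operator and a card suit symbol.
    foreach (dchar c; "φÆ±♥"d)
        writefln("U+%04X %s isAlpha: %s", cast(uint) c, c, isAlpha(c));
}
```

Run as written, this should report φ and Æ as alphabetic and ± and ♥ as not, which matches the expectation voiced in the thread that operator-like and pictographic symbols do not qualify.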
On Wednesday, 3 July 2019 at 23:21:19 UTC, XavierAP wrote:
> On Tuesday, 2 July 2019 at 04:34:42 UTC, Jonathan M Davis wrote:
>> a character like ±
> A good example of a character that should not be allowed in identifiers, because it has the meaning of an operator (and in general, in theory, we may want to reserve it for such future use). ISO and Unicode define which characters (not all of them) are letters or alphanumeric:
> https://dlang.org/spec/lex.html#identifiers
> https://docs.microsoft.com/en-us/dotnet/api/system.char.isletter#remarks

Maybe, maybe not. It could be useful in some contexts... it probably could be more confusing, but -, +, ± can be very useful as sub or superscripts for special mathematical situations (I've seen it used many times, such as representing the even and odd sets of things, or for lowering and raising operations that are encoded in symbolic form (such as momentum operations that can be computed by multiplication)).

It may not be worth allowing because s_-*s_++3 would be very ambiguous... as would s±4+3. Especially if ± is also defined as an operator...

But ± should be allowed to be used as an operator, as that is the most useful case. 4 ± 3 could be a mathematical object containing two values. a ± b could be a mathematical object containing 2(m+n) values depending on how many values a and b contain. (4 ± 3)*(±6) contains 4 values = 42, -42, 6, -6.

So D could go through the unicode list and determine which symbols are best suited for operators and which for identifiers and then enable their usage. Many symbols that are not appropriate for id's would be appropriate for operators:

▌╚█

These are ugly in some sense but they could have good meaning in relation to operations. █ could mean boxing: █a means box a. But they could also be useful for id's... █ could mean rectangle.

Symbols are arbitrary. We know millions of symbols. Our brain has no issues decoding them after we learn the meaning. The only problem is that it's nice to have consistency so we don't have to learn many different purposes for the same symbol (but we already do, it's not a huge deal, it does slow us down a little but usually context is clear).

I think having it more open ended is better. It might require people exercising their neurons a little bit, but it is a good thing in the long run. Obviously people could make it very difficult by making code very terse, but I doubt that would happen much. People don't code in D to make their life more difficult, they do it to make it less. Virtually everyone will choose the symbols in a logical way that will make sense.

What could be done is that any unicode character in an id could have some ascii equivalent. someÆx is also some::432::x or whatever, if a good symbol could be found instead of ::. Then IDE's could learn to support the syntax and convert between them. A simple hotkey could switch between the two, and code pages could be flipped to change the keyboard. A pragma(codepage, 43) could inform the IDE to use a codepage.

These might have issues, but without trying different things the optimal solution can't be found.
Jul 04 2019
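The reading of ± as "an object containing two values" in the post above can be tried out today with ordinary functions, without any new operator. This is only a sketch of that one interpretation: `plusMinus` and `times` are hypothetical helper names, and representing the multi-valued result as an `int[]` is an assumption made for brevity.

```d
import std.stdio : writeln;

/// a ± b, read as the two values a + b and a - b.
int[] plusMinus(int a, int b)
{
    return [a + b, a - b];
}

/// Unary ±a, read as the two values +a and -a.
int[] plusMinus(int a)
{
    return [a, -a];
}

/// Product of two multi-valued operands: every pairwise product.
int[] times(int[] xs, int[] ys)
{
    int[] result;
    foreach (x; xs)
        foreach (y; ys)
            result ~= x * y;
    return result;
}

void main()
{
    // (4 ± 3) * (±6): [7, 1] times [6, -6]
    writeln(times(plusMinus(4, 3), plusMinus(6))); // [42, -42, 6, -6]
}
```

This reproduces the four values given in the post for (4 ± 3)*(±6). Wrapping the array in a struct with `opBinary!"*"` would allow writing the product with the existing `*` operator, while the ± itself would still have to be spelled as a function call.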