
digitalmars.D - Non-ASCII in the future in the lexer

reply Cecil Ward <cecil cecilward.com> writes:
What do you think? It occurred to me that as the language 
develops we are occasionally having discussions about new 
keywords, or even changing them, for example: s/body/do/ some 
while back.

Unicode has been around for 30 years now and yet it is not 
getting fully used in programming languages for example. We are 
still stuck in our minds with ASCII only. Should we in future 
start mining the riches of unicode when we make changes to the 
grammar of programming languages (and other grammars)?

Would it be worthwhile considering wider unicode alternatives for 
keywords that we already have? Examples: comparison operators and 
other operators. We have unicode symbols for

≤     less than or equal <=
≥    greater than or equal >=

a proper multiplication sign ‘×’, like an x, as well as the * 
that we have been stuck with since the beginning of time.

± 	plus or minus might come in useful someday, can’t think what 
for.

I have … as one character; would be nice to have that as an 
alternative to .. (two ASCII fullstops) maybe?

I realise that this issue is hardly about the cure for world 
peace, but there seems to be little reason to be confined to 
ASCII forever when there are better suited alternatives and 
things that might spark the imagination of designers.

One extreme case or two: Many editors now automatically employ 
‘ ’, supposed to be 6-9 quotes, instead of ASCII '', and so too 
with “ ” (the 6-9 matching pair). When Walter was designing the 
string-literal lexical items, many symbols needed to be found 
for all the alternatives. And we have « », which are familiar 
to French speakers. It would be very nice not to fall over on 
6-9 quotes anyway, and just to accept them as an alternative. 
The second case that comes to mind: I was thinking about regex 
grammars and XML’s grammar, and I think one or both can now 
handle all kinds of unicode whitespace. That’s the kind of 
thinking I’m interested in. It would be good to handle all 
kinds of whitespace, as we do all kinds of newline sequences. 
We probably already do both well. And no one complains saying 
‘we ought not bother with tab’, so handling U+0085 and the 
various whitespace types such as &nbsp; in the lexicon of our 
grammar is to me a no-brainer.
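
A minimal sketch of that idea, using Phobos' std.uni classification rather than anything in the actual DMD lexer (`isTokenSeparator` is a made-up name for illustration):

```d
// Sketch only: treat any Unicode whitespace as a token separator.
// std.uni.isWhite covers Zs/Zl/Zp plus NEL (U+0085), so NBSP and
// friends are already recognised by the library.
import std.uni : isWhite;

bool isTokenSeparator(dchar c)
{
    return isWhite(c);
}

unittest
{
    assert(isTokenSeparator('\u0085')); // NEL
    assert(isTokenSeparator('\u00A0')); // NBSP
    assert(!isTokenSeparator('x'));
}
```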

And what use might we find some day for § and ¶ ? Could be great 
for some new exotic grammatical structural pattern. Look at the 
mess that C++ got into with the syntax of templates. They needed 
something other than < >. Almost anything. They could have done 
no worse with « ».

Another point: These exotics are easy to find in your text editor 
because they won’t be overused.

As for usability, some of our tools now have or could have 
‘favourite characters’ or ‘snippet’ text strings in a place in 
the ui where they are readily accessible. I have a unicode 
character map app and also a file with my unicode favourite 
characters in it. So there are things that we can do ourselves. 
And having a favourites comment block in a starter template file 
might be another example.

Argument against: would complicate our regexes with a new need 
for multiple alternatives as in  [xyz] rather than just one 
possible character in a search or replace operation. But I think 
that some regex engines are unicode aware and can understand 
concepts like all x-characters where x is some property or 
defines a subset.

I have a concern. I love the betterC idea. Something inside my 
head tells me not to move too far from C. But we have already 
left the grammar of C behind, for good reason. C doesn’t have .. 
or … ( :-) ) nor does it have $. So that train has left. But I’m 
talking about things that C is never going to have.

One point of clarification: I am not talking about D runtime. I’m 
confining myself to D’s lexer and D’s grammar.
May 30 2023
next sibling parent reply Dom DiSc <dominikus scherkl.de> writes:
On Wednesday, 31 May 2023 at 06:23:43 UTC, Cecil Ward wrote:
 What do you think? It occurred to me that as the language 
 develops we are occasionally having discussions about new 
 keywords, or even changing them, for example: s/body/do/ some 
 while back.

 Unicode has been around for 30 years now and yet it is not 
 getting fully used in programming languages for example. We are 
 still stuck in our minds with ASCII only. Should we in future 
 start mining the riches of unicode when we make changes to the 
 grammar of programming languages (and other grammars)?
I'm fully with you, but the problem is not to have any Unicode symbols in the grammar as operators or delimiters or whatever. It's the input method. Most keyboards don't have them on the keys and any other method is awfully slow. Even a well-designed selector table is slow if it is needed often - and most editors are far from providing such.
May 31 2023
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/31/2023 1:22 AM, Dom DiSc wrote:
 I'm fully with you, but the problem is not to have any Unicode symbols in the
 grammar as operators or delimiters or whatever. It's the input method. Most
 keyboards don't have them on the keys and any other method is awfully slow.
 Even a well-designed selector table is slow if it is needed often - and most
 editors are far from providing such.
I use putty a lot to access computers remotely in text mode. With some experimentation, some Unicode characters are rendered, but some aren't, like the 69 quotes. Maybe the programming world isn't quite ready for them yet.
May 31 2023
next sibling parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 31/05/2023 8:47 PM, Walter Bright wrote:
 Maybe the programming world isn't quite ready for them yet.
s/programming/Windows/

Try the ConEmu terminal emulator; it supports Putty. I find ConEmu works very well with Cygwin for Unicode printing.

An extension of this is that we really need a proper console module in Phobos that uses the 16-bit (UTF-16) console API, as that makes it work out of the box.
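
A rough sketch of what "using the 16-bit stuff" means in practice, not a proposal for the actual Phobos module (`writeConsole` is a made-up name; assumes Windows, an interactive console handle, and omits error handling):

```d
// Sketch only: write a UTF-8 D string to the Windows console via the
// UTF-16 ("wide") console API, which avoids codepage mangling.
version (Windows)
{
    import core.sys.windows.windows;
    import std.utf : toUTF16;

    void writeConsole(string s)
    {
        HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
        auto utf16 = s.toUTF16;
        DWORD written;
        WriteConsoleW(h, utf16.ptr, cast(DWORD) utf16.length, &written, null);
    }
}
```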
May 31 2023
prev sibling parent Kagamin <spam here.lot> writes:
On Wednesday, 31 May 2023 at 08:47:04 UTC, Walter Bright wrote:
 I use putty a lot to access computers remotely in text mode. 
 With some experimentation, some Unicode characters are 
 rendered, but some aren't, like the 69 quotes. Maybe the 
 programming world isn't quite ready for them yet.
Do you have Consolas font set in configuration Window -> Appearance?
Jun 01 2023
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Some interesting food for thought. Thanks for taking the time to post this.
May 31 2023
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Wed, May 31, 2023 at 06:23:43AM +0000, Cecil Ward via Digitalmars-d wrote:
 What do you think? It occurred to me that as the language develops we
 are occasionally having discussions about new keywords, or even
 changing them, for example: s/body/do/ some while back.
 
 Unicode has been around for 30 years now and yet it is not getting
 fully used in programming languages for example. We are still stuck in
 our minds with ASCII only. Should we in future start mining the riches
 of unicode when we make changes to the grammar of programming
 languages (and other grammars)?
D already supports Unicode identifiers. For example, this is valid D today:

````d
int функция(int параметр)
{
    return (параметр > 0) ? 2*функция(параметр-1) + 1 : 2;
}
````

Of course, current language keywords are English- (and ASCII-) only.
 Would it be worthwhile considering wider unicode alternatives for
 keywords that we already have? Examples: comparison operators and
 other operators. We have unicode symbols for
 
 ≤     less than or equal <=
 ≥    greater than or equal >=
 
 a proper multiplication sign ‘×’, like an x, as well as the * that we
 have been stuck with since the beginning of time.
This is all great, but as someone else has already said, the input method could be a problem area.  On my PC, I've set up XKB input with a compose key such that many of these symbols are relatively easily accessible; for example, Compose + < + = produces ≤; and Compose + v + / produces √.  However, some symbols are more tricky to input, and some are not accessible this way.  While it's always possible to, e.g., use a character map widget to select a particular symbol, that significantly slows down how fast you can type code, which negatively affects productivity.

One dream I've always had is the so-called software-controlled keyboard: instead of a keyboard with physical keys, you'd have a keyboard that's actually a touchscreen, with keys that can be replaced from software. So for example, when writing D + Unicode symbols, you'd switch to "Unicode D" layout where symbols like ≤, ≥, ×, etc. are easily accessible. We already have this on our mobile devices, in fact, to various degrees of customizability. It just has to be taken to the next step of allowing easy remapping of keyboard layouts and switching between them. Each future programming language, for example, could come with its own layout having language-specific symbols easily accessible.
 ± 	plus or minus might come in useful someday, can’t think what for.
In one of my projects, there's a vector calculator program where ± produces an expression that returns a list of values produced by all possible combinations of signs where the ± operator appears. It's very useful for certain applications, like combinatorial polytopes where ± appears frequently. [...]
 Argument against: would complicate our regexes with a new need for
 multiple alternatives as in  [xyz] rather than just one possible
 character in a search or replace operation. But I think that some
 regex engines are unicode aware and can understand concepts like all
 x-characters where x is some property or defines a subset.
std.regex *is* unicode-aware, BTW. Check this out:

````d
import std;

string преобразовать(string текст)
{
    return текст.replaceAll(regex(`[а-я]`), "X");
}

void main()
{
    writefln("blah blah это не правда blah blah".преобразовать);
}
````

Output:

````
blah blah XXX XX XXXXXX blah blah
````

It correctly handles ranges of non-ASCII characters.


T

-- 
Real Programmers use "cat > a.out".
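
The property-class idea quoted above also appears to work; a small variation on the example, assuming std.regex's `\p{...}` Unicode-property syntax accepts the script name `Cyrillic`:

````d
import std;

void main()
{
    // Match any character whose Unicode script is Cyrillic, instead of
    // spelling out the range [а-я] by hand.
    writefln("blah blah это не правда blah blah"
             .replaceAll(regex(`\p{Cyrillic}`), "X"));
}
````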
May 31 2023
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/31/2023 8:13 AM, H. S. Teoh wrote:
 This is all great, but as someone else has already said, the input
 method could be a problem area.  On my PC, I've set up XKB input with a
 compose key such that many of these symbols are relatively easily
 accessible; for example, Compose + < + = produces ≤; and Compose + v + /
 produces √.  However, some symbols are more tricky to input, and some
 are not accessible this way.
I've struggled with that, too. On MicroEmacs, I fixed ^X-U to scroll through the various incarnations of a letter. So, placing the cursor on a, and hitting ^X-U, changes it to a with an umlaut, a with an accent, etc. On a -, it scrolls through the various - variations. On ", it scrolls through the quoting symbols. Of course, this is pretty limited.
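
A toy sketch of that cycling idea in D (illustrative only; `cycleVariant` and its variant groups are made up here, this is not the actual MicroEmacs/med code, which is linked further down):

```d
// Toy sketch of ^X-U-style cycling: given a character, return the next
// variant in its group, wrapping around at the end.
import std.algorithm.searching : countUntil;

dchar cycleVariant(dchar c)
{
    static immutable dstring[] groups = [
        "aäàáâã"d,   // letter variants
        "-–—"d,      // dash variants
        "\"“”„«»"d,  // quote variants
    ];
    foreach (g; groups)
    {
        immutable i = g.countUntil(c);
        if (i >= 0)
            return g[(i + 1) % $];
    }
    return c; // no variants known
}

unittest
{
    assert(cycleVariant('a') == 'ä');
    assert(cycleVariant('ã') == 'a'); // wraps around
}
```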
May 31 2023
next sibling parent Danni Coy <danni.coy gmail.com> writes:
On Thu, Jun 1, 2023 at 4:35 PM Walter Bright via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 5/31/2023 8:13 AM, H. S. Teoh wrote:
 This is all great, but as someone else has already said, the input
 method could be a problem area.  On my PC, I've set up XKB input with a
 compose key such that many of these symbols are relatively easily
 accessible; for example, Compose + < + = produces ≤; and Compose + v + /
 produces √.  However, some symbols are more tricky to input, and some
 are not accessible this way.
I've struggled with that, too. On MicroEmacs, I fixed ^X-U to scroll through the various incarnations of a letter. So, placing the cursor on a, and hitting ^X-U, changes it to a with an umlaut, a with an accent, etc. On a -, it scrolls through the various - variations. On ", it scrolls through the quoting symbols. Of course, this is pretty limited.
The compose key on X windows (Linux) is user configurable. You can use it to do basically whatever you want. There are extra bindings available online for the Greek alphabet and mathematical symbols. It's controlled from a configuration file for which the syntax looks something like this:

    <Multi_key> <asciitilde> <asciitilde> : "≈"

On Windows there is at least one addon that adds this functionality and is user configurable. I don't know what the situation is on Mac or on Wayland.

As low-hanging fruit I would like to see constants such as MATH_PI defined by the symbol (e.g. π). I think one of the most important qualities of code is readability, and getting the balance between verbosity and terseness right is important.

I would also like to see syntax like the following be possible:

    if ( 0 < x ≤ 8) {}

which lowers to

    if ( x > 0 && x <= 8) {}
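
That last form can't be spelled that way in D today, but as a hedged sketch, something close is already expressible with an ordinary helper (`within` is made up here, not a Phobos function):

```d
// Sketch: an explicit helper instead of chained-comparison syntax.
bool within(T)(T lo, T value, T hi)
{
    return lo < value && value <= hi;   // 0 < x ≤ 8  becomes  within(0, x, 8)
}

unittest
{
    int x = 5;
    assert(within(0, x, 8));
    assert(!within(0, 9, 8));
}
```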
Jun 01 2023
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 6/1/23 08:31, Walter Bright wrote:
 On 5/31/2023 8:13 AM, H. S. Teoh wrote:
 This is all great, but as someone else has already said, the input
 method could be a problem area.  On my PC, I've set up XKB input with a
 compose key such that many of these symbols are relatively easily
 accessible; for example, Compose + < + = produces ≤; and Compose + v + /
 produces √.  However, some symbols are more tricky to input, and some
 are not accessible this way.
I've struggled with that, too. On MicroEmacs, I fixed ^X-U to scroll through the various incarnations of a letter. So, placing the cursor on a, and hitting ^X-U, changes it to a with an umlaut, a with an accent, etc. On a -, it scrolls through the various - variations. On ", it scrolls through the quoting symbols. Of course, this is pretty limited.
I am just using the Agda input mode in emacs, so e.g., I just type "\to" and I get "→", "\'a" and I get "á", etc. Many editors have similar plugins. This also works perfectly over ssh.

In any case, the approach I have taken with my own lexers is that Unicode is supported, but never required. E.g., people can just write "->" instead of "→" and this is the case for all Unicode syntax elements (except if you have to match an identifier name I guess). After that, whether or not non-ASCII tokens are used at all becomes a question of code style and formatting.

In my experience, many programmers are too lazy (and/or ideologically against Unicode) to set up simple Unicode input and still prefer to write ASCII, but I much prefer reading Unicode. Further down the road, I plan to address this disconnect using an automatic code formatter.
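
A minimal sketch of the "supported but never required" approach (illustration only, not Timon Gehr's actual lexer; `nextToken` and `Tok` are made up):

```d
// Sketch: each token lists an ASCII spelling plus optional Unicode
// alternatives; the lexer accepts either and produces the same token.
import std.algorithm.searching : startsWith;

enum Tok { arrow, le, other }

Tok nextToken(ref string src)
{
    static immutable string[] arrows = ["->", "→"];
    static immutable string[] les    = ["<=", "≤"];

    foreach (alt; arrows)
        if (src.startsWith(alt)) { src = src[alt.length .. $]; return Tok.arrow; }
    foreach (alt; les)
        if (src.startsWith(alt)) { src = src[alt.length .. $]; return Tok.le; }

    src = src[1 .. $]; // placeholder for lexing everything else
    return Tok.other;
}

unittest
{
    string a = "->x", b = "→x";
    assert(nextToken(a) == Tok.arrow && nextToken(b) == Tok.arrow);
}
```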
Jun 01 2023
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/1/2023 6:49 AM, Timon Gehr wrote:
 I am just using the Agda input mode in emacs, so e.g., I just type "\to" and I 
 get "→", "\'a" and I get "á", etc.
https://agda.readthedocs.io/en/v2.6.3/tools/emacs-mode.html https://github.com/DigitalMars/med/blob/master/src/med/more.d#L350
Jun 01 2023
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 6/1/23 20:20, Walter Bright wrote:
 On 6/1/2023 6:49 AM, Timon Gehr wrote:
 I am just using the Agda input mode in emacs, so e.g., I just type 
 "\to" and I get "→", "\'a" and I get "á", etc.
https://agda.readthedocs.io/en/v2.6.3/tools/emacs-mode.html
Only this part is relevant: https://agda.readthedocs.io/en/v2.6.3/tools/emacs-mode.html#unicode-input (Agda has an emacs mode for the language and an input mode. I am using the input mode even for D code. There's also a TeX input mode, but the Agda input mode has more convenient bindings, so I am using that.)
 https://github.com/DigitalMars/med/blob/master/src/med/more.d#L350
Jun 01 2023
prev sibling next sibling parent reply Quirin Schroll <qs.il.paperinik gmail.com> writes:
TL;DR: What you want can be gained using smart fonts or other 
smart UI tools.

---

On Wednesday, 31 May 2023 at 06:23:43 UTC, Cecil Ward wrote:
 Unicode has been around for 30 years now and yet it is not 
 getting fully used in programming languages for example. We are 
 still stuck in our minds with ASCII only. Should we in future 
 start mining the riches of unicode when we make changes to the 
 grammar of programming languages (and other grammars)?
The gain is too little for the cost. In some circumstances the gain is even negative, and that will happen at exactly those places where it is particularly unfortunate.
 Would it be worthwhile considering wider unicode alternatives 
 for keywords that we already have? Examples: comparison 
 operators and other operators. We have unicode symbols for

 ≤     less than or equal <=
 ≥    greater than or equal >=

 a proper multiplication sign ‘×’, like an x, as well as the * 
 that we have been stuck with since the beginning of time.

 ± 	plus or minus might come in useful someday, can’t think what 
 for.
I can: `±` could be used for in-place negation. Let’s say you have:

```d
ref int f(); // is costly or has side-effects
```

To negate the result in-place, you have to do:

```d
int* p = &f();
*p = -*p;
```

or

```d
(ref int x) { x = -x; }(f());
```
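
For comparison, a hedged sketch of what can already be written today with a small helper (`negateInPlace` and this `f` are made up for illustration, not a proposal):

```d
int store = 5;
ref int f() { return store; }      // stands in for the costly function above

void negateInPlace(ref int x) { x = -x; }

unittest
{
    f().negateInPlace();           // UFCS; binds to the ref return of f()
    assert(store == -5);
}
```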
 I have … as one character; would be nice to have that as an 
 alternative to .. (two ASCII fullstops) maybe?

 I realise that this issue is hardly about the cure for world 
 peace, but there seems to be little reason to be confined to 
 ASCII forever when there are better suited alternatives and 
 things that might spark the imagination of designers.
The problem is fonts that don’t support certain characters and editors defaulting to legacy encodings. One can handle `Français`, but `a × b` mis-decoded (UTF-8 read as Windows-1252) is a problem because who knows what the character was. It’s not that the gain is rather little, it’s the potential for high cost. A lot of people will avoid those like the plague because of legacy issues.
 One extreme case or two: Many editors now automatically employ 
 ‘ ’ supposed to be 6-9 quotes, instead of ASCII '', so too with 
 “ ” (6-9 matching pair).
Many document processors do that. Whoever writes code in them, they’re wrong.
 When Walter was designing the literal strings lexical items 
 many items needed to be found for all the alternatives. And we 
 have « » which are familiar to French speakers? It would be 
 very nice to to fall over on 6-9 quotes anyway, and just accept 
 them as an alternative.
Accepting them is one possibility. Having an editor that replaces “” by "" and ‘’ by '' is another. Any regex-replace can easily be used for that: `‘([^’]*)’` by `'$1'`.
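
That replace is easy to script with std.regex itself; a minimal sketch:

```d
import std.regex, std.stdio;

void main()
{
    auto src = `writeln(‘hello’); x = “world”;`;
    // Curly single and double quotes back to their ASCII forms,
    // keeping whatever was between them.
    auto fixed = src.replaceAll(regex("‘([^’]*)’"), "'$1'")
                    .replaceAll(regex("“([^”]*)”"), `"$1"`);
    writeln(fixed); // writeln('hello'); x = "world";
}
```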
 The second case that comes to mind: I was thinking about regex 
 grammars and XML’s grammar, and I think one or both can now 
 handle all kinds of unicode whitespace.
Definitely not regex. It’s not standardized at all. XML is quite a non-problem because it directly supports specifying an encoding.
 That’s the kind of thinking I’m interested in. It would be good 
 to handle all kinds of whitespace, as we do all kinds of 
 newline sequences. We probably already do both well. And no one 
 complains saying ‘we ought not bother with tab’, so handling 
 U+0085 and the various whitespace types such as &nbsp in our 
 lexicon of our grammar is to me a no-brainer.

 And what use might we find some day for § and ¶ ? Could be 
 great for some new exotic grammatical structural pattern. Look 
 at the mess that C++ got into with the syntax of templates. 
 They needed something other than < >. Almost anything. They 
 could have done no worse with « ».
As a German, I find «» and ‹› a little irritating, because we’re using them like this: »« and ›‹. The Swiss use «content» and the French use « content » (with half-spaces). C++ was wrong on template syntax, but they were right on using ASCII. D has good template syntax, and it’s ASCII.
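
For instance, just to illustrate the point about D's template syntax (nothing new here):

```d
struct Foo(T) {}
struct Bar(T) {}

alias Nested = Foo!(Bar!int);   // cf. C++'s Foo<Bar<int>> and its >> parsing woes
```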
 Another point: These exotics are easy to find in your text 
 editor because they won’t be overused.
Citation needed.
 As for usability, some of our tools now have or could have 
 ‘favourite characters’ or ‘snippet’ text strings in a place in 
 the ui where they are readily accessible. I have a unicode 
 character map app and also a file with my unicode favourite 
 characters in it. So there are things that we can do ourselves. 
 And having a favourites comment block in a starter template 
 file might be another example.
If you employ tooling, the best option is to leave the source code as-is and use an OpenType font or other UI-oriented things.
 Argument against: would complicate our regexes with a new need 
 for multiple alternatives as in  [xyz] rather than just one 
 possible character in a search or replace operation. But I 
 think that some regex engines are unicode aware and can 
 understand concepts like all x-characters where x is some 
 property or defines a subset.
Making `grep` harder to use is definitely a deal-breaker.
 I have a concern. I love the betterC idea. Something inside my 
 head tells me not to move too far from C. But we have already 
 left the grammar of C behind, for good reason. C doesn’t have 
 .. or … ( :-) ) nor does it have $. So that train has left. But 
 I’m talking about things that C is never going to have.
Unicode has U+2025 ‥ for you as well. C is overly restrictive. It’s not based on ASCII, but a proper subset of ASCII that’s compatible with even older standards like EBCDIC. In today’s age, ASCII support is quite a safe bet. Unicode support isn’t.
 One point of clarification: I am not talking about D runtime. 
 I’m confining myself to D’s lexer and D’s grammar.
It sounds great in theory, but if any tool in your chain has no support for that, you’re out. I was running into that on Windows recently. Not D related.

I’m a Unicode fan. I created my own keyboard layout which puts a lot of nice stuff on AltGr and dead key sequences (e.g. proper quotation marks, currency symbols, math symbols, the complete Greek alphabet) while leaving anything that is printed on the keys where it was. Yet I fail to see the advantage of × over * and similar *in code.*

There are several fonts that visually replace <= by a wider ≤ sign, != by a wide ≠, etc. If you want alternatives, use a font. It’s non-intrusive to the source code. It’s a million times better than Unicode in source.

I don’t use those fonts because for some reason, they add a plethora of things that make sense in certain languages, e.g. replace `>>` by a ligature (think of `»`). That makes sense when it’s an operator, but it doesn’t when it’s two closing angle brackets (cf. Java or C++).
Jun 01 2023
next sibling parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 02/06/2023 3:47 AM, Quirin Schroll wrote:
     The second case that comes to mind: I was thinking about regex
     grammars and XML’s grammar, and I think one or both can now handle
     all kinds of unicode whitespace.
 
 Definitely not regex. It’s not standardized at all.
Not the point of the above but related:

https://www.unicode.org/reports/tr18/

Unicode for regex is in fact standardized :)
Jun 01 2023
prev sibling parent reply Cecil Ward <cecil cecilward.com> writes:
On Thursday, 1 June 2023 at 15:47:00 UTC, Quirin Schroll wrote:
[...]
About the search in your text editor: I had thought of ‘×’ for cross-product maybe. :-)

I don’t want you all to misunderstand me here. I’m not suggesting that I can defend all of these ideas; I’m just trying to free up our imagination. If we decide that we really want some perfect symbol for a new situation, maybe something already established, perhaps in maths or elsewhere, then I’m merely saying that we should perhaps remember that unicode exists and is not a new weird thing anymore.

The usability thing is not something that I’m too worried about, because solutions will rise to meet problems. I have my favourite little snippets of IPA characters in a document and I keep that handy. My iPad has installable keyboard handlers of all sorts, including polytonic ancient Greek.

What made me think about this topic though is looking at my iPad’s virtual keyboard. The character … is no less accessible than ‘a’, and é and ß are just a long press. The ± £ § ¥ € characters on my iPad are no less accessible than ASCII. Over time, maybe keyboards will evolve, seeing as it has already happened with the iPad. But we absolutely should not be ignoring usability here. We should ‘game’ how users will cope when affected by more adventurous proposals.

A thought: those of us who hate new keywords (I am not one - I even love Ada!) would be able to consider mining Unicode for single-character or few-character symbols instead of long English words that might cause breakage, apart from restricting the space remaining for user-defined identifiers.

Staying inside ASCII’s 95 characters forever, it’s a bit like being a caged animal that when freed doesn’t want to leave its small world. We’re so very used to ASCII.

The takeaway here is just ‘remember unicode exists’, and the usability situation for some users is now first class if you are either lucky, like iPad owners, or else you set yourself up with some simple aids in the right way for what works for you. Having unicode in the back of your mind might help us become beloved by mathematicians, because we’ve made the ‘perfect fit’ choice. But the usability thing has to always be kept in mind, and tips and links towards little apps ought to be readily handed out. We probably want to have long-winded ASCII fallbacks for users who really don’t like the keyboard situation though.
Jun 01 2023
parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Thu, Jun 01, 2023 at 08:54:19PM +0000, Cecil Ward via Digitalmars-d wrote:
[...]
 I don’t want you all to misunderstand me here, I’m not suggesting that
 I can defend all of these ideas, I’m just trying to free up our
 imagination. If we decide that we really want some perfect symbol for
 a new situation, maybe something already established, perhaps in maths
 or elsewhere, then I’m merely saying that we should perhaps remember
 that unicode exists and is not a new weird thing anymore.
 
 The usability thing is not something that I’m too worried about
 because solutions will rise to meet problems. I have my favourite
 little snippets of IPA characters in a document and I keep that handy.
 My iPad has installable keyboard handlers of all sorts, including poly
 tonic ancient greek.
Coincidentally, I recently wrote a program (in D, of course :-P) that translates ASCII transcriptions of IPA into Unicode. And many years ago, I also wrote a program (in C -- this was before I discovered D) that translated ASCII wrapped inside <grk>...</grk> or <rus>...</rus> tags into polytonic Greek or Cyrillic. In my text editor I could just type out the desired ASCII transcriptions, select the text, and pipe it through these programs to get the Unicode out.
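
A minimal sketch of that kind of editor filter (the transcription table below is made up; this is not H. S. Teoh's actual program or transcription scheme):

````d
// Sketch: read text on stdin, replace hypothetical ASCII letters with
// IPA symbols, write the result to stdout (usable as an editor filter).
import std.array : replace;
import std.stdio;

void main()
{
    auto table = ["S": "ʃ", "N": "ŋ", "T": "θ", "D": "ð"]; // made-up mapping

    foreach (line; stdin.byLineCopy)
    {
        foreach (from, to; table)
            line = line.replace(from, to);
        writeln(line);
    }
}
````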
 What made me think about this topic though is looking at my iPad’s
 virtual keyboard. The character … is no less accessible than ‘a’, and
 é and ß are just a long press. The ± £ § ¥ € characters on my iPad are
 no less accessible than ASCII. Over time, maybe keyboards will evolve
 seeing as it has already been with the iPad.
[...]

I believe that the next step is USB/WiFi touchscreen keyboards that can be reconfigured to any symbol set by software. All we need is a long, horizontal device with a touchscreen mounted on suitable support that makes it comfortable to type on, then a standard API for software to configure whatever symbols it wishes the user to use on it. Instantly switch to APL symbols and back, for example. Or, for that matter, have the layout completely software-driven: imagine instantly switching from a typewriter keyboard to a piano keyboard, for example, for easy music input. Or a guitar fret for instant MIDI improvisation.


T

-- 
When you breathe, you inspire. When you don't, you expire.
-- The Weekly Reader
Jun 01 2023
prev sibling next sibling parent reply Abdulhaq <alynch4048 gmail.com> writes:
On Thursday, 1 June 2023 at 22:04:11 UTC, H. S. Teoh wrote:
 [...]

 I believe that the next step is to USB/WiFi touchscreen 
 keyboards that can be reconfigured to any symbol set by 
 software.  All we need is a long, horizontal device with a 
 touchscreen mounted on suitable support that makes it 
 comfortable to type on, then have a standard API for software 
 to configure whatever symbols it wishes the user to use on it. 
 Instantly switch to APL symbols and back, for example.  Or, for 
 that matter, have the layout completely software-driven: 
 imagine instantly switching from a typewriter keyboard to a 
 piano keyboard, for example, for easy music input. Or a guitar 
 fret for instant MIDI improvisation.


 T
Of course it's subjective but I strongly dislike typing on touchscreens and am surprised to find a programmer who prefers them. Also when playing the guitar, the way the string is struck and the position and force of the finger on the fretboard allows great variation in the sound. The idea of somehow even attempting to simulate that on a touch screen makes me feel sad for the loss of virtuosity even just thinking about it :-) . Similarly for the loss of key weight and travel on a piano keyboard. It all reminds me of how the virtual world is generally supplanting the real world, with the massive loss that entails. I obviously got out of bed on the wrong side today :-)
Jun 02 2023
parent Meta <jared771 gmail.com> writes:
On Friday, 2 June 2023 at 12:11:26 UTC, Abdulhaq wrote:
[...]
Of course it's subjective but I strongly dislike typing on touchscreens and am surprised to find a programmer who prefers them. Also when playing the guitar, the way the string is struck and the position and force of the finger on the fretboard allows great variation in the sound. The idea of somehow even attempting to simulate that on a touch screen makes me feel sad for the loss of virtuosity even just thinking about it :-) . Similarly for the loss of key weight and travel on a piano keyboard. It all reminds me of how the virtual world is generally supplanting the real world, with the massive loss that entails. I obviously got out of bed on the wrong side today :-)
Not to mention that you can't have weighted keyboard (musical) keys with a touchscreen. I can't even stand physical unweighted keys.
Jun 02 2023
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/1/2023 3:04 PM, H. S. Teoh wrote:
 I believe that the next step is to USB/WiFi touchscreen keyboards that
 can be reconfigured to any symbol set by software.  All we need is a
 long, horizontal device with a touchscreen mounted on suitable support
 that makes it comfortable to type on, then have a standard API for
 software to configure whatever symbols it wishes the user to use on it.
 Instantly switch to APL symbols and back, for example.  Or, for that
 matter, have the layout completely software-driven: imagine instantly
 switching from a typewriter keyboard to a piano keyboard, for example,
 for easy music input. Or a guitar fret for instant MIDI improvisation.
I'd prefer a regular keyboard with conventional keys - but with a display on each keytop that is a graphic of what the key is bound to. For example, when you hit the shift key, the graphic switches to upper case. Making this software configurable really opens things up - anything is possible - all while preserving touch typing. No, I'm not going to give up tactile typing. It's so much faster than touchscreen keyboards. For example, when I'm transcribing text, I type while I read the text, and don't have to go back and forth. I'm surprised nobody makes a keyboard like I described.
Jun 03 2023
parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 04/06/2023 7:22 AM, Walter Bright wrote:
 I'm surprised nobody makes a keyboard like I described.
Sort of, it's pretty expensive:

https://www.elgato.com/us/en/p/stream-deck-xl

I suspect the hard part is the keycap. The controller shouldn't be too hard.
Jun 03 2023
prev sibling parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Sat, Jun 03, 2023 at 12:22:42PM -0700, Walter Bright via Digitalmars-d wrote:
 On 6/1/2023 3:04 PM, H. S. Teoh wrote:
 I believe that the next step is to USB/WiFi touchscreen keyboards
 that can be reconfigured to any symbol set by software.  All we need
 is a long, horizontal device with a touchscreen mounted on suitable
 support that makes it comfortable to type on, then have a standard
 API for software to configure whatever symbols it wishes the user to
 use on it.
[...]
 I'd prefer a regular keyboard with conventional keys - but with a
 display on each keytop that is a graphic of what the key is bound to.
 For example, when you hit the shift key, the graphic switches to upper
 case.
That works too.
 Making this software configurable really opens things up - anything is
 possible - all while preserving touch typing.
True -- I can't say I'm a big fan of the completely smooth and featureless touchscreen; makes typing harder 'cos you're not sure if your fingers are exactly on the right keys. But nobody says we can't use flexible touchscreens on a ridged surface that your fingers could feel... or maybe a fabric-based surface with plastic bubbles underneath that can be reconfigured? That does add a whole new layer of mechanical complexity though. So probably not worth it. But it's an interesting thought.

On Sun, Jun 04, 2023 at 07:35:02AM +1200, Richard (Rikki) Andrew Cattermole via Digitalmars-d wrote:
 On 04/06/2023 7:22 AM, Walter Bright wrote:
 I'm surprised nobody makes a keyboard like I described.
Sort of, its pretty expensive. https://www.elgato.com/us/en/p/stream-deck-xl I suspect the hard part is the keycap. The controller shouldn't be too hard.
The controller can be exactly the same as the conventional keyboard: it can continue sending exactly the same keycodes for each key; the software just has to translate the keycodes into different symbols based on what's currently displayed on the key. The keyboard itself doesn't need to know or care. All that's really needed is a tiny configurable pixel screen on each keycap that can be loaded with any arbitrary graphic. It could very well have a completely separate connection to the PC from the keyboard's primary output.


T

-- 
There are two ways to write error-free programs; only the third one works.
Jun 03 2023