digitalmars.dip.ideas - use «chevrons» to represent string literal

barbosso (7/7) Jan 16 They are in the ASCII table.

Richard (Rikki) Andrew Cattermole (7/7) Jan 16 They are not part of ASCII, they are part of an "extended ASCII" ISO/IEC...

barbosso (3/10) Jan 16 Extended ASCII has 8 bits, 256 distinct characters

Richard (Rikki) Andrew Cattermole (3/18) Jan 16 D files are encoded as UTF-8.

barbosso (3/21) Jan 16 Do you understand what you wrote?

Richard (Rikki) Andrew Cattermole (6/30) Jan 16 Yes.

barbosso (6/26) Jan 16 now I see.

barbosso (2/22) Jan 16 GCC and Clang can compile identifiers with Unicode symbols.

Richard (Rikki) Andrew Cattermole (6/31) Jan 16 I know, I implemented D's UAX31 identifiers.

monkyyy (3/5) Jan 16 Directional quotes are new... simpler behavior
IchorDev (11/14) Jan 19 I’m personally quite happy with the choice of strings, especially

monkyyy (2/9) Jan 16 https://forum.dlang.org/post/ipyynnyaszcypnzioyng@forum.dlang.org
Paul Backus (7/14) Jan 16 The obvious follow-on question is, if we allow chevrons, should
Atila Neves (2/9) Feb 12 What would this enable/make better that isn't currently possible?
Walter Bright (10/10) Feb 12 D has shied away from using grammar syntax based on characters that:

barbosso <barb your.io> writes:

They are in the ASCII table.
They are directional and balanced.
The D language should use them.

Just look:
```
writeln(«Hello, World!»);
```

Jan 16

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

They are not part of ASCII, they are part of an "extended ASCII" ISO/IEC 
8859-1 aka Latin-1.

They do not fit in a single byte.

C2 AB

https://symbl.cc/en/00AB/

For us to introduce a new string syntax, it would need to do something 
that the existing ones cannot reasonably do.

Jan 16
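
The byte pair `C2 AB` quoted above is the UTF-8 encoding of U+00AB. A minimal D sketch that prints the encoded bytes of both chevrons next to the ASCII double quote; the helper name `utf8Bytes` is made up for illustration and is not part of Phobos:

```d
import std.stdio : writefln;
import std.string : representation;

/// Hypothetical helper: the UTF-8 code units of a string as raw bytes.
immutable(ubyte)[] utf8Bytes(string s)
{
    return s.representation;
}

void main()
{
    // The chevrons are written as \u escapes so the intent is unambiguous.
    writefln("%(%02X %)", utf8Bytes("\u00AB")); // C2 AB
    writefln("%(%02X %)", utf8Bytes("\u00BB")); // C2 BB
    writefln("%(%02X %)", utf8Bytes("\""));     // 22
}
```

In a UTF-8 source file each chevron occupies two code units, whereas `"` occupies one.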

barbosso <barb your.io> writes:

On Thursday, 16 January 2025 at 21:26:29 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 They are not part of ASCII, they are part of an "extended 
 ASCII" ISO/IEC 8859-1 aka Latin-1.

 They do not fit in a single byte.

 C2 AB

 https://symbl.cc/en/00AB/

 For us to introduce a new string syntax, it would need to do 
 something that the existing ones cannot reasonably do.

Extended ASCII has 8 bits, 256 distinct characters.

Jan 16

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 17/01/2025 10:34 AM, barbosso wrote:
 On Thursday, 16 January 2025 at 21:26:29 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 They are not part of ASCII, they are part of an "extended ASCII" ISO/ 
 IEC 8859-1 aka Latin-1.

 They do not fit in a single byte.

 C2 AB

 https://symbl.cc/en/00AB/

 For us to introduce a new string syntax, it would need to do something 
 that the existing ones cannot reasonably do.

 
 Extended ASCII has 8 bits, 256 distinct characters.

D files are encoded as UTF-8.

Therefore it does not support extended ASCII.

Jan 16

barbosso <barb your.io> writes:

On Thursday, 16 January 2025 at 21:38:50 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 On 17/01/2025 10:34 AM, barbosso wrote:
 On Thursday, 16 January 2025 at 21:26:29 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 They are not part of ASCII, they are part of an "extended 
 ASCII" ISO/ IEC 8859-1 aka Latin-1.

 They do not fit in a single byte.

 C2 AB

 https://symbl.cc/en/00AB/

 For us to introduce a new string syntax, it would need to do 
 something that the existing ones cannot reasonably do.

 
 Extended ASCII has 8 bits, 256 distinct characters.

 D files are encoded as UTF-8.

 Therefore it does not support extended ASCII.

Do you understand what you wrote?

Jan 16

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 17/01/2025 10:43 AM, barbosso wrote:
 On Thursday, 16 January 2025 at 21:38:50 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 On 17/01/2025 10:34 AM, barbosso wrote:
 On Thursday, 16 January 2025 at 21:26:29 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 They are not part of ASCII, they are part of an "extended ASCII" 
 ISO/ IEC 8859-1 aka Latin-1.

 They do not fit in a single byte.

 C2 AB

 https://symbl.cc/en/00AB/

 For us to introduce a new string syntax, it would need to do 
 something that the existing ones cannot reasonably do.

 Extended ASCII has 8 bits, 256 distinct characters.

 D files are encoded as UTF-8.

 Therefore it does not support extended ASCII.

 
 Do you understand what you wrote?

Yes.

Extended ASCII is both a character set and an encoding.

The character set is supported as part of Unicode, the encoding is not 
supported as we use UTF-8 which conflicts on the 8th bit for the first 
byte in the code unit.

Jan 16
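
The distinction between character set and encoding can be checked directly. The sketch below assumes the Latin-1 byte for « is `0xAB`, as in ISO/IEC 8859-1; the helpers `isValidUtf8` and `latin1ToUtf8` are illustrative names, not library functions. It shows that the lone Latin-1 byte is rejected as UTF-8, while the same character re-encoded as UTF-8 is accepted:

```d
import std.stdio : writeln;
import std.string : representation;
import std.utf : UTFException, validate;

/// Hypothetical helper: true if the bytes are well-formed UTF-8.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

/// Hypothetical helper: every Latin-1 byte value equals its Unicode
/// code point, so appending it as a dchar re-encodes it as UTF-8.
string latin1ToUtf8(const(ubyte)[] latin1)
{
    string result;
    foreach (b; latin1)
        result ~= cast(dchar) b;
    return result;
}

void main()
{
    immutable(ubyte)[] latin1Chevron = [0xAB];
    writeln(isValidUtf8(latin1Chevron));       // false
    string utf8 = latin1ToUtf8(latin1Chevron);
    writeln(isValidUtf8(utf8.representation)); // true
    writeln(utf8.length);                      // 2
}
```

A byte with the high bit set is never a complete character in UTF-8, which is why a Latin-1 encoded source file cannot be read as UTF-8 unchanged.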

barbosso <barb your.io> writes:

On Thursday, 16 January 2025 at 21:45:39 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 On 17/01/2025 10:43 AM, barbosso wrote:
 On Thursday, 16 January 2025 at 21:38:50 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 On 17/01/2025 10:34 AM, barbosso wrote:
 On Thursday, 16 January 2025 at 21:26:29 UTC, Richard 
 (Rikki) Andrew Cattermole wrote:
 [...]

 Extended ASCII has 8 bits, 256 distinct characters.

 D files are encoded as UTF-8.

 Therefore it does not support extended ASCII.

 
 Do you understand what you wrote?

 Yes.

 Extended ASCII is both a character set and an encoding.

 The character set is supported as part of Unicode, the encoding 
 is not supported as we use UTF-8 which conflicts on the 8th bit 
 for the first byte in the code unit.


Now I see.
UTF-8 uses 1 byte to represent the 128 ASCII characters
and 2 bytes for other characters (including «chevrons»).
So, what's the problem?

Jan 16
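
For reference, UTF-8 sequences are 1 to 4 bytes long depending on the code point; the chevrons happen to fall in the 2-byte range. A small sketch using `std.utf.codeLength` (the wrapper name `utf8Length` is only for illustration):

```d
import std.stdio : writefln;
import std.utf : codeLength;

/// Hypothetical wrapper: number of UTF-8 code units needed for one code point.
size_t utf8Length(dchar c)
{
    return codeLength!char(c);
}

void main()
{
    // 'A', left chevron, left curly quote, left corner bracket, an emoji.
    foreach (dchar c; "A\u00AB\u201C\u300C\U0001F600"d)
        writefln("U+%04X: %s byte(s)", cast(uint) c, utf8Length(c));
    // U+0041: 1 byte(s)
    // U+00AB: 2 byte(s)
    // U+201C: 3 byte(s)
    // U+300C: 3 byte(s)
    // U+1F600: 4 byte(s)
}
```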

barbosso <barb your.io> writes:

On Thursday, 16 January 2025 at 22:03:25 UTC, barbosso wrote:
 On Thursday, 16 January 2025 at 21:45:39 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 On 17/01/2025 10:43 AM, barbosso wrote:
 On Thursday, 16 January 2025 at 21:38:50 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 [...]

 
 Do you understand what you wrote?

 Yes.

 Extended ASCII is both a character set and an encoding.

 The character set is supported as part of Unicode, the 
 encoding is not supported as we use UTF-8 which conflicts on 
 the 8th bit for the first byte in the code unit.


 Now I see.
 UTF-8 uses 1 byte to represent the 128 ASCII characters
 and 2 bytes for other characters (including «chevrons»).
 So, what's the problem?

GCC and Clang can compile identifiers with Unicode symbols.

Jan 16

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 17/01/2025 11:16 AM, barbosso wrote:
 On Thursday, 16 January 2025 at 22:03:25 UTC, barbosso wrote:
 On Thursday, 16 January 2025 at 21:45:39 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 On 17/01/2025 10:43 AM, barbosso wrote:
 On Thursday, 16 January 2025 at 21:38:50 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 [...]

 Do you understand what you wrote?

 Yes.

 Extended ASCII is both a character set and an encoding.

 The character set is supported as part of Unicode, the encoding is 
 not supported as we use UTF-8 which conflicts on the 8th bit for the 
 first byte in the code unit.


 Now I see.
 UTF-8 uses 1 byte to represent the 128 ASCII characters
 and 2 bytes for other characters (including «chevrons»).
 So, what's the problem?

 
 GCC and Clang can compile identifiers with Unicode symbols.

I know, I implemented D's UAX31 identifiers.

Better to have the right terminology for this.

However the current stance is that we have possibly too many string 
types. So far you have proposed new delimiters but not new behaviors 
(which would be required to add it).

Jan 16
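
For context on "too many string types", here is a sketch of the literal forms D already accepts, all spelling the same text. The function name `helloVariants` is made up for the example. Note that delimited strings already offer directional, balanced ASCII delimiters such as `<` and `>`:

```d
import std.stdio : writeln;

/// Hypothetical helper: one string, written with each existing literal form.
string[] helloVariants()
{
    return [
        "Hello, World!",     // double-quoted: escape sequences are processed
        `Hello, World!`,     // wysiwyg (backticks): no escape processing
        r"Hello, World!",    // wysiwyg (r prefix): no escape processing
        q"<Hello, World!>",  // delimited: < and > nest and must balance
        q"(Hello, World!)",  // delimited: ( and ) nest and must balance
        q{Hello, World!},    // token string: content must lex as D tokens
    ];
}

void main()
{
    foreach (s; helloVariants())
        writeln(s); // each line prints: Hello, World!

    // Delimited strings may contain double quotes and nested delimiters.
    writeln(q"<say "hi" <nested>>"); // say "hi" <nested>
}
```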

monkyyy <crazymonkyyy gmail.com> writes:

On Thursday, 16 January 2025 at 22:21:54 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
  So far you have proposed new delimiters but not new behaviors 
 (which would be required to add it).

Directional quotes are new... simpler behavior

Jan 16

IchorDev <zxinsworld gmail.com> writes:

On Thursday, 16 January 2025 at 22:21:54 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 However the current stance is that we have possibly too many 
 string types. So far you have proposed new delimiters but not 
 new behaviors (which would be required to add it).

I’m personally quite happy with the choice of strings, especially 
the token delimited strings, token strings, and so on. I do wish 
token strings were more conducive to concatenation because they 
usually still have syntax highlighting, making them work well 
with rudimentary mixins, but if you need to use `{` then you 
can’t easily cut them off to `}~insert~q{` other text. Thankfully 
IES partially solves this, but it’s maybe a little overkill for 
most of my use-cases (fast code generation), so I’ll just put up 
with having my code be all one colour.

Jan 19
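
The concatenation limitation described above comes from the rule that braces inside a token string must balance. A minimal sketch of the workaround, assuming a made-up generator `makeGetter`: the balanced part stays a token string, and the halves with an unmatched brace fall back to wysiwyg strings (and lose highlighting in most editors):

```d
/// Hypothetical code generator for a string mixin.
string makeGetter(string name, string value)
{
    // The function body cannot be split across two token strings,
    // because each half would contain an unmatched brace.
    return q{int} ~ " " ~ name ~ `() { return ` ~ value ~ `; }`;
}

// Expands to: int answer() { return 42; }
mixin(makeGetter("answer", "42"));

void main()
{
    assert(answer() == 42);
}
```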

monkyyy <crazymonkyyy gmail.com> writes:

On Thursday, 16 January 2025 at 21:19:34 UTC, barbosso wrote:
 They are in the ASCII table.
 They are directional and balanced.
 The D language should use them.

 Just look:
 ```
 writeln(«Hello, World!»);
 ```

https://forum.dlang.org/post/ipyynnyaszcypnzioyng@forum.dlang.org

Jan 16

Paul Backus <snarwin gmail.com> writes:

On Thursday, 16 January 2025 at 21:19:34 UTC, barbosso wrote:
 They are in the ASCII table.
 They are directional and balanced.
 The D language should use them.

 Just look:
 ```
 writeln(«Hello, World!»);
 ```

The obvious follow-on question is, if we allow chevrons, should 
we also allow strings to be enclosed in:

* “Non-ASCII double quotes”?
* 「Chinese corner brackets」?
* 《Tibetan angle brackets》?

If not, why not?

Jan 16

Atila Neves <atila.neves gmail.com> writes:

On Thursday, 16 January 2025 at 21:19:34 UTC, barbosso wrote:
 They are in the ASCII table.
 They are directional and balanced.
 The D language should use them.

 Just look:
 ```
 writeln(«Hello, World!»);
 ```

What would this enable/make better that isn't currently possible?

Feb 12

Walter Bright <newshound2 digitalmars.com> writes:

D has shied away from using grammar syntax based on characters that:

1. are not ASCII

2. are not on a standard keyboard

We also do not use things like “quotes” in syntax. They are accepted in string
literals, however.

D does support named character entities:

https://dlang.org/spec/entity.html

in string literals.

Any and all Unicode code points are supported in comments and string literals. 
Some are supported in identifiers, but I'm not convinced that is a good idea.

Feb 12
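
As a sketch of the named character entities mentioned above: `\&laquo;` and `\&raquo;` produce the chevrons inside an ordinary double-quoted literal, so the source can stay pure ASCII. The function name `chevronQuoted` is invented for the example:

```d
import std.stdio : writeln;

/// Hypothetical helper: wraps text in chevrons using named character entities.
string chevronQuoted(string text)
{
    return "\&laquo;" ~ text ~ "\&raquo;";
}

void main()
{
    writeln(chevronQuoted("Hello, World!")); // «Hello, World!»
}
```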
