digitalmars.dip.ideas - use «chevrons» to represent string literal

barbosso <barb your.io> writes:
They are in the ASCII table.
They are directional and balanced.
The D language should use them.

Just look:
```
writeln(«Hello, World!»);
```
Jan 16
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
They are not part of ASCII, they are part of an "extended ASCII" ISO/IEC 
8859-1 aka Latin-1.

They do not fit in a single byte.

C2 AB

https://symbl.cc/en/00AB/

For us to introduce a new string syntax, it would need to do something 
that the existing ones cannot reasonably do.
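
For context, here is a minimal sketch of string literal forms D already has, including delimited strings whose delimiters are directional and nest. The function names are made up for illustration; only the literal syntax comes from the language.

```d
import std.stdio : writeln;

// Hypothetical helpers: each returns the same text using a different
// existing literal form.
string doubleQuoted() { return "Hello, \"World\"!"; }   // escapes needed
string wysiwyg()      { return `Hello, "World"!`; }     // no escapes
string delimited()    { return q"(Hello, "World"!)"; }  // balanced ( )
string tokenString()  { return q{Hello, "World"!}; }    // must be valid D tokens

void main()
{
    assert(wysiwyg() == doubleQuoted());
    assert(delimited() == doubleQuoted());
    assert(tokenString() == doubleQuoted());

    // Nesting delimiters balance, so inner parentheses are fine.
    assert(q"(f(x))" == "f(x)");

    writeln(delimited());
}
```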
Jan 16
barbosso <barb your.io> writes:
On Thursday, 16 January 2025 at 21:26:29 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 They are not part of ASCII, they are part of an "extended 
 ASCII" ISO/IEC 8859-1 aka Latin-1.

 They do not fit in a single byte.

 C2 AB

 https://symbl.cc/en/00AB/

 For us to introduce a new string syntax, it would need to do 
 something that the existing ones cannot reasonably do.
Extended ASCII has 8 bits: 256 distinct characters.
Jan 16
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 17/01/2025 10:34 AM, barbosso wrote:
 On Thursday, 16 January 2025 at 21:26:29 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 They are not part of ASCII, they are part of an "extended ASCII" ISO/ 
 IEC 8859-1 aka Latin-1.

 They do not fit in a single byte.

 C2 AB

 https://symbl.cc/en/00AB/

 For us to introduce a new string syntax, it would need to do something 
 that the existing ones cannot reasonably do.
Extended ASCII has 8 bits: 256 distinct characters.
D files are encoded as UTF-8. Therefore it does not support extended ASCII.
Jan 16
barbosso <barb your.io> writes:
On Thursday, 16 January 2025 at 21:38:50 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 On 17/01/2025 10:34 AM, barbosso wrote:
 On Thursday, 16 January 2025 at 21:26:29 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 They are not part of ASCII, they are part of an "extended 
 ASCII" ISO/ IEC 8859-1 aka Latin-1.

 They do not fit in a single byte.

 C2 AB

 https://symbl.cc/en/00AB/

 For us to introduce a new string syntax, it would need to do 
 something that the existing ones cannot reasonably do.
Extended ASCII has 8 bits: 256 distinct characters.
D files are encoded as UTF-8. Therefore it does not support extended ASCII.
Do you understand what you wrote?
Jan 16
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 17/01/2025 10:43 AM, barbosso wrote:
 On Thursday, 16 January 2025 at 21:38:50 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 On 17/01/2025 10:34 AM, barbosso wrote:
 On Thursday, 16 January 2025 at 21:26:29 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 They are not part of ASCII, they are part of an "extended ASCII" 
 ISO/ IEC 8859-1 aka Latin-1.

 They do not fit in a single byte.

 C2 AB

 https://symbl.cc/en/00AB/

 For us to introduce a new string syntax, it would need to do 
 something that the existing ones cannot reasonably do.
Extended ASCII has 8 bits: 256 distinct characters.
D files are encoded as UTF-8. Therefore it does not support extended ASCII.
Do you understand what you wrote?
Yes. Extended ASCII is both a character set and an encoding. The character set is supported as part of Unicode. The encoding is not, because D source is UTF-8, where any byte with the 8th (high) bit set is part of a multi-byte sequence, so a lone Latin-1 byte such as AB is not a valid character.
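
The distinction between the character set and the encoding can be checked directly. This is a small sketch, not anything from the compiler itself: `utf8Bytes` and `isValidUtf8` are hypothetical helper names, built on `std.string.representation` and `std.utf.validate` from Phobos.

```d
import std.stdio : writefln;
import std.string : representation;
import std.utf : UTFException, validate;

// Hypothetical helper: the UTF-8 code units of a string, as raw bytes.
immutable(ubyte)[] utf8Bytes(string s)
{
    return s.representation;
}

// Hypothetical helper: true if the bytes form a valid UTF-8 sequence.
bool isValidUtf8(immutable(ubyte)[] bytes)
{
    try
    {
        validate(cast(string) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // U+00AB is in Unicode, and UTF-8 encodes it as two bytes.
    immutable(ubyte)[] asUtf8 = [0xC2, 0xAB];
    assert(utf8Bytes("«") == asUtf8);
    assert(isValidUtf8(asUtf8));

    // The Latin-1 encoding of the same character is the single byte AB,
    // which is not valid UTF-8 on its own.
    immutable(ubyte)[] asLatin1 = [0xAB];
    assert(!isValidUtf8(asLatin1));

    // Print the UTF-8 bytes in hexadecimal.
    writefln("%(%02X %)", utf8Bytes("«"));
}
```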
Jan 16
barbosso <barb your.io> writes:
On Thursday, 16 January 2025 at 21:45:39 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 On 17/01/2025 10:43 AM, barbosso wrote:
 On Thursday, 16 January 2025 at 21:38:50 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 On 17/01/2025 10:34 AM, barbosso wrote:
 On Thursday, 16 January 2025 at 21:26:29 UTC, Richard 
 (Rikki) Andrew Cattermole wrote:
 [...]
Extended ASCII has 8 bits: 256 distinct characters.
D files are encoded as UTF-8. Therefore it does not support extended ASCII.
Do you understand what you wrote?
Yes. Extended ASCII is both a character set and an encoding. The character set is supported as part of Unicode. The encoding is not, because D source is UTF-8, where any byte with the 8th (high) bit set is part of a multi-byte sequence, so a lone Latin-1 byte such as AB is not a valid character.
Now I see. UTF-8 uses 1 byte for each of the 128 ASCII characters and 2 or more bytes for other characters (2 for «chevrons»). So, what's the problem?
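
As a quick check of the byte counts, here is a sketch using `std.utf.codeLength` from Phobos. The wrapper name `utf8Length` is made up for illustration. Note that UTF-8 needs between 1 and 4 bytes per code point, with chevrons falling in the 2-byte range.

```d
import std.utf : codeLength;

// Hypothetical wrapper: number of UTF-8 code units (bytes) for one code point.
size_t utf8Length(dchar c)
{
    return codeLength!char(c);
}

void main()
{
    assert(utf8Length('A') == 1);          // ASCII
    assert(utf8Length('«') == 2);          // U+00AB, left chevron
    assert(utf8Length('\u300C') == 3);     // U+300C, left corner bracket
    assert(utf8Length('\U0001F600') == 4); // U+1F600, outside the BMP

    // .length of a D string counts bytes, not characters:
    // 13 ASCII characters plus two 2-byte chevrons.
    assert("«Hello, World!»".length == 17);
}
```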
Jan 16
barbosso <barb your.io> writes:
On Thursday, 16 January 2025 at 22:03:25 UTC, barbosso wrote:
 On Thursday, 16 January 2025 at 21:45:39 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 On 17/01/2025 10:43 AM, barbosso wrote:
 On Thursday, 16 January 2025 at 21:38:50 UTC, Richard (Rikki) 
 Andrew Cattermole wrote:
 [...]
Do you understand what you wrote?
Yes. Extended ASCII is both a character set and an encoding. The character set is supported as part of Unicode. The encoding is not, because D source is UTF-8, where any byte with the 8th (high) bit set is part of a multi-byte sequence, so a lone Latin-1 byte such as AB is not a valid character.
Now I see. UTF-8 uses 1 byte for each of the 128 ASCII characters and 2 or more bytes for other characters (2 for «chevrons»). So, what's the problem?
GCC and Clang can compile identifiers with Unicode symbols.
Jan 16
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 17/01/2025 11:16 AM, barbosso wrote:
 On Thursday, 16 January 2025 at 22:03:25 UTC, barbosso wrote:
 On Thursday, 16 January 2025 at 21:45:39 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 On 17/01/2025 10:43 AM, barbosso wrote:
 On Thursday, 16 January 2025 at 21:38:50 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 [...]
Do you understand what you wrote?
Yes. Extended ASCII is both a character set and an encoding. The character set is supported as part of Unicode. The encoding is not, because D source is UTF-8, where any byte with the 8th (high) bit set is part of a multi-byte sequence, so a lone Latin-1 byte such as AB is not a valid character.
Now I see. UTF-8 uses 1 byte for each of the 128 ASCII characters and 2 or more bytes for other characters (2 for «chevrons»). So, what's the problem?
GCC and Clang can compile identifiers with Unicode symbols.
I know; I implemented D's UAX31 identifiers, so it is better to use the right terminology for this. However, the current stance is that we possibly have too many string types. So far you have proposed new delimiters but no new behaviors (which would be required for it to be added).
Jan 16
monkyyy <crazymonkyyy gmail.com> writes:
On Thursday, 16 January 2025 at 22:21:54 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
  So far you have proposed new delimiters but not new behaviors 
 (which would be required to add it).
Directional quotes are new... simpler behavior
Jan 16
IchorDev <zxinsworld gmail.com> writes:
On Thursday, 16 January 2025 at 22:21:54 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 However the current stance is that we have possibly too many 
 string types. So far you have proposed new delimiters but not 
 new behaviors (which would be required to add it).
I’m personally quite happy with the choice of strings, especially the token delimited strings, token strings, and so on. I do wish token strings were more conducive to concatenation because they usually still have syntax highlighting, making them work well with rudimentary mixins, but if you need to use `{` then you can’t easily cut them off to `}~insert~q{` other text. Thankfully IES partially solves this, but it’s maybe a little overkill for most of my use-cases (fast code generation), so I’ll just put up with having my code be all one colour.
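
A minimal sketch of the concatenation issue with token strings, assuming simple compile-time code generation. The names `fieldName`, `funcName`, `answer`, and `twice` are invented for the example. A token string must contain balanced braces, so a fragment that opens a `{` without closing it has to fall back to an ordinary string.

```d
enum fieldName = "answer";

// Brace-free fragments can be cut and joined freely.
mixin(q{int } ~ fieldName ~ q{ = 42;});

enum funcName = "twice";

// A fragment such as q{int twice(int x) { } would not lex, because the
// braces inside a token string must balance. One workaround is to keep
// the lone braces in ordinary strings.
mixin(q{int } ~ funcName ~ q{(int x) } ~ "{" ~ q{ return 2 * x; } ~ "}");

void main()
{
    assert(answer == 42);
    assert(twice(21) == 42);
}
```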
Jan 19
monkyyy <crazymonkyyy gmail.com> writes:
On Thursday, 16 January 2025 at 21:19:34 UTC, barbosso wrote:
 They are in the ASCII table.
 They are directional and balanced.
 The D language should use them.

 Just look:
 ```
 writeln(«Hello, World!»);
 ```
https://forum.dlang.org/post/ipyynnyaszcypnzioyng@forum.dlang.org
Jan 16
Paul Backus <snarwin gmail.com> writes:
On Thursday, 16 January 2025 at 21:19:34 UTC, barbosso wrote:
 They are in the ASCII table.
 They are directional and balanced.
 The D language should use them.

 Just look:
 ```
 writeln(«Hello, World!»);
 ```
The obvious follow-on question is, if we allow chevrons, should we also allow strings to be enclosed in:

* “Non-ASCII double quotes”?
* 「Chinese corner brackets」?
* 《Tibetan angle brackets》?

If not, why not?
Jan 16
Atila Neves <atila.neves gmail.com> writes:
On Thursday, 16 January 2025 at 21:19:34 UTC, barbosso wrote:
 They are in the ASCII table.
 They are directional and balanced.
 The D language should use them.

 Just look:
 ```
 writeln(«Hello, World!»);
 ```
What would this enable/make better that isn't currently possible?
Feb 12
Walter Bright <newshound2 digitalmars.com> writes:
D has shied away from using grammar syntax based on characters that:

1. are not ASCII

2. are not on a standard keyboard

We also do not use things like “quotes” in syntax. They are accepted in
string literals, however.

D does support named character entities:

https://dlang.org/spec/entity.html

in string literals.
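
As an illustration of the entity syntax, here is a short sketch that writes the chevrons using only ASCII characters in the source. The function name `chevronGreeting` is made up for the example; `\&laquo;` and `\&raquo;` are the named character entities for U+00AB and U+00BB.

```d
import std.stdio : writeln;

// Hypothetical helper: builds the greeting with named character entities,
// so the source line itself stays pure ASCII.
string chevronGreeting()
{
    return "\&laquo;Hello, World!\&raquo;";
}

void main()
{
    // The entities produce the same code points as \u escapes
    // or as the literal characters typed directly.
    assert(chevronGreeting() == "\u00ABHello, World!\u00BB");
    assert(chevronGreeting() == "«Hello, World!»");

    writeln(chevronGreeting());
}
```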

Any and all Unicode code points are supported in comments and string literals. 
Some are supported in identifiers, but I'm not convinced that is a good idea.
Feb 12