digitalmars.dip.ideas - use «chevrons» to represent string literal
- barbosso (7/7) Jan 16 They are in the ASCII table.
- Richard (Rikki) Andrew Cattermole (7/7) Jan 16 They are not part of ASCII, they are part of an "extended ASCII" ISO/IEC...
- barbosso (3/10) Jan 16 The extended ASCII has 8 bits, 256 distinguish characters
- Richard (Rikki) Andrew Cattermole (3/18) Jan 16 D files are encoded as UTF-8.
- barbosso (3/21) Jan 16 Do you understand what you wrote?
- Richard (Rikki) Andrew Cattermole (6/30) Jan 16 Yes.
- barbosso (6/26) Jan 16 now I see.
- barbosso (2/22) Jan 16 GCC and Clang can compile identifiers with Unicode symbols.
- Richard (Rikki) Andrew Cattermole (6/31) Jan 16 I know, I implemented D's UAX31 identifiers.
- monkyyy (2/9) Jan 16 https://forum.dlang.org/post/ipyynnyaszcypnzioyng@forum.dlang.org
- Paul Backus (7/14) Jan 16 The obvious follow-on question is, if we allow chevrons, should
- Atila Neves (2/9) Feb 12 What would this enable/make better that isn't currently possible?
- Walter Bright (10/10) Feb 12 D has shied away from using grammar syntax based on characters that:
They are in the ASCII table. They are directional and balanced. The D language should use them. Just look:
```
writeln(«Hello, World!»);
```
Jan 16
They are not part of ASCII, they are part of an "extended ASCII" ISO/IEC 8859-1 aka Latin-1.

They do not fit in a single byte.

C2 AB
https://symbl.cc/en/00AB/

For us to introduce a new string syntax, it would need to do something that the existing ones cannot reasonably do.
Jan 16
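To make the byte count in the reply above concrete, here is a minimal D sketch. It relies on the fact that a D `string` is an array of UTF-8 code units, so `.length` counts bytes rather than characters. The helper name `utf8Bytes` is made up for this illustration.

```d
import std.stdio : writefln, writeln;
import std.string : representation;

/// Hypothetical helper: returns the UTF-8 code units (bytes) that make up `s`.
immutable(ubyte)[] utf8Bytes(string s)
{
    return s.representation;
}

void main()
{
    // .length on a D string counts UTF-8 code units, not characters.
    writeln("«".length);  // 2
    writeln("\"".length); // 1, the ASCII double quote is a single byte

    // Print the encoded bytes in hex.
    writefln("%(%02X %)", utf8Bytes("«")); // C2 AB
    writefln("%(%02X %)", utf8Bytes("»")); // C2 BB
}
```

In Latin-1 the same two characters are the single bytes AB and BB, which is where the "extended ASCII" view comes from.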
On Thursday, 16 January 2025 at 21:26:29 UTC, Richard (Rikki) Andrew Cattermole wrote:
> They are not part of ASCII, they are part of an "extended ASCII" ISO/IEC 8859-1 aka Latin-1. They do not fit in a single byte. C2 AB https://symbl.cc/en/00AB/ For us to introduce a new string syntax, it would need to do something that the existing ones cannot reasonably do.

The extended ASCII has 8 bits, 256 distinct characters
Jan 16
On 17/01/2025 10:34 AM, barbosso wrote:
> On Thursday, 16 January 2025 at 21:26:29 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> They are not part of ASCII, they are part of an "extended ASCII" ISO/IEC 8859-1 aka Latin-1. They do not fit in a single byte. C2 AB https://symbl.cc/en/00AB/ For us to introduce a new string syntax, it would need to do something that the existing ones cannot reasonably do.
> The extended ASCII has 8 bits, 256 distinct characters

D files are encoded as UTF-8. Therefore it does not support extended ASCII.
Jan 16
On Thursday, 16 January 2025 at 21:38:50 UTC, Richard (Rikki) Andrew Cattermole wrote:
> On 17/01/2025 10:34 AM, barbosso wrote:
>> On Thursday, 16 January 2025 at 21:26:29 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>> They are not part of ASCII, they are part of an "extended ASCII" ISO/IEC 8859-1 aka Latin-1. They do not fit in a single byte. C2 AB https://symbl.cc/en/00AB/ For us to introduce a new string syntax, it would need to do something that the existing ones cannot reasonably do.
>> The extended ASCII has 8 bits, 256 distinct characters
> D files are encoded as UTF-8. Therefore it does not support extended ASCII.

Do you understand what you wrote?
Jan 16
On 17/01/2025 10:43 AM, barbosso wrote:
> On Thursday, 16 January 2025 at 21:38:50 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> On 17/01/2025 10:34 AM, barbosso wrote:
>>> On Thursday, 16 January 2025 at 21:26:29 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>>> They are not part of ASCII, they are part of an "extended ASCII" ISO/IEC 8859-1 aka Latin-1. They do not fit in a single byte. C2 AB https://symbl.cc/en/00AB/ For us to introduce a new string syntax, it would need to do something that the existing ones cannot reasonably do.
>>> The extended ASCII has 8 bits, 256 distinct characters
>> D files are encoded as UTF-8. Therefore it does not support extended ASCII.
> Do you understand what you wrote?

Yes.

Extended ASCII is both a character set and an encoding. The character set is supported as part of Unicode, the encoding is not supported as we use UTF-8 which conflicts on the 8th bit for the first byte in the code unit.
Jan 16
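The character set versus encoding distinction in the reply above can be checked with a short D sketch. It assumes Phobos' `std.utf.validate`, which throws `UTFException` on malformed UTF-8. The wrapper `isValidUtf8` is a name introduced here for illustration only.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

/// Hypothetical wrapper: true if `bytes` form a well-formed UTF-8 sequence.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // « as the single Latin-1 byte AB versus its two-byte UTF-8 encoding.
    const(ubyte)[] latin1 = [0xAB];
    const(ubyte)[] utf8 = [0xC2, 0xAB];

    writeln(isValidUtf8(latin1)); // false: AB has the high bit set but is only a continuation byte
    writeln(isValidUtf8(utf8));   // true
}
```

So the character « exists in Unicode, but the Latin-1 byte that represents it is not something a UTF-8 source file can contain on its own.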
On Thursday, 16 January 2025 at 21:45:39 UTC, Richard (Rikki) Andrew Cattermole wrote:
> On 17/01/2025 10:43 AM, barbosso wrote:
>> On Thursday, 16 January 2025 at 21:38:50 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>> On 17/01/2025 10:34 AM, barbosso wrote:
>>>> On Thursday, 16 January 2025 at 21:26:29 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>>>> [...]
>>>> The extended ASCII has 8 bits, 256 distinct characters
>>> D files are encoded as UTF-8. Therefore it does not support extended ASCII.
>> Do you understand what you wrote?
> Yes. Extended ASCII is both a character set and an encoding. The character set is supported as part of Unicode, the encoding is not supported as we use UTF-8 which conflicts on the 8th bit for the first byte in the code unit.

Now I see.
UTF-8 uses 1 byte to represent the 128 ASCII characters and 2 bytes for other characters (including «chevrons»). So, what's the problem?
Jan 16
On Thursday, 16 January 2025 at 22:03:25 UTC, barbosso wrote:
> On Thursday, 16 January 2025 at 21:45:39 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> On 17/01/2025 10:43 AM, barbosso wrote:
>>> On Thursday, 16 January 2025 at 21:38:50 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>>> [...]
>>> Do you understand what you wrote?
>> Yes. Extended ASCII is both a character set and an encoding. The character set is supported as part of Unicode, the encoding is not supported as we use UTF-8 which conflicts on the 8th bit for the first byte in the code unit.
> Now I see. UTF-8 uses 1 byte to represent the 128 ASCII characters and 2 bytes for other characters (including «chevrons»). So, what's the problem?

GCC and Clang can compile identifiers with Unicode symbols.
Jan 16
On 17/01/2025 11:16 AM, barbosso wrote:
> On Thursday, 16 January 2025 at 22:03:25 UTC, barbosso wrote:
>> On Thursday, 16 January 2025 at 21:45:39 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>> On 17/01/2025 10:43 AM, barbosso wrote:
>>>> On Thursday, 16 January 2025 at 21:38:50 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>>>> [...]
>>>> Do you understand what you wrote?
>>> Yes. Extended ASCII is both a character set and an encoding. The character set is supported as part of Unicode, the encoding is not supported as we use UTF-8 which conflicts on the 8th bit for the first byte in the code unit.
>> Now I see. UTF-8 uses 1 byte to represent the 128 ASCII characters and 2 bytes for other characters (including «chevrons»). So, what's the problem?
> GCC and Clang can compile identifiers with Unicode symbols.

I know, I implemented D's UAX31 identifiers.

Better to have the right terminology for this.

However the current stance is that we have possibly too many string types. So far you have proposed new delimiters but not new behaviors (which would be required to add it).
Jan 16
On Thursday, 16 January 2025 at 22:21:54 UTC, Richard (Rikki) Andrew Cattermole wrote:
> So far you have proposed new delimiters but not new behaviors (which would be required to add it).

Directional quotes are new... simpler behavior
Jan 16
On Thursday, 16 January 2025 at 22:21:54 UTC, Richard (Rikki) Andrew Cattermole wrote:
> However the current stance is that we have possibly too many string types. So far you have proposed new delimiters but not new behaviors (which would be required to add it).

I’m personally quite happy with the choice of strings, especially the token delimited strings, token strings, and so on.

I do wish token strings were more conducive to concatenation because they usually still have syntax highlighting, making them work well with rudimentary mixins, but if you need to use `{` then you can’t easily cut them off to `}~insert~q{` other text. Thankfully IES partially solves this, but it’s maybe a little overkill for most of my use-cases (fast code generation), so I’ll just put up with having my code be all one colour.
Jan 19
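A rough sketch of the concatenation problem described above, using a made-up helper `makeGetter`. Token strings (`q{...}`) must contain valid D tokens with balanced braces, so a piece that opens a `{` without closing it cannot be written as a token string. This sketch falls back to ordinary quoted strings for the unbalanced braces.

```d
import std.stdio : writeln;

// Hypothetical helper: builds the source text of a getter function.
string makeGetter(string name, string value)
{
    // q{int f() { return } would not lex: its inner `{` is never closed
    // inside that token string. The lone braces are therefore written as
    // plain string literals, and only the balanced pieces stay in q{...}.
    return q{int } ~ name ~ q{() } ~ "{ " ~ q{return } ~ value ~ q{;} ~ " }";
}

// Evaluated at compile time and mixed in as a declaration.
mixin(makeGetter("answer", "42"));

void main()
{
    writeln(makeGetter("answer", "42")); // the generated source text
    writeln(answer());                   // 42
}
```

The pieces outside `q{...}` are the ones that typically lose syntax highlighting, which is the trade-off the post refers to.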
On Thursday, 16 January 2025 at 21:19:34 UTC, barbosso wrote:
> They are in the ASCII table. They are directional and balanced. The D language should use them. Just look:
> ```
> writeln(«Hello, World!»);
> ```

https://forum.dlang.org/post/ipyynnyaszcypnzioyng@forum.dlang.org
Jan 16
On Thursday, 16 January 2025 at 21:19:34 UTC, barbosso wrote:
> They are in the ASCII table. They are directional and balanced. The D language should use them. Just look:
> ```
> writeln(«Hello, World!»);
> ```

The obvious follow-on question is, if we allow chevrons, should we also allow strings to be enclosed in:

* “Non-ASCII double quotes”?
* 「Chinese corner brackets」?
* 《Tibetan angle brackets》?

If not, why not?
Jan 16
On Thursday, 16 January 2025 at 21:19:34 UTC, barbosso wrote:
> They are in the ASCII table. They are directional and balanced. The D language should use them. Just look:
> ```
> writeln(«Hello, World!»);
> ```

What would this enable/make better that isn't currently possible?
Feb 12
D has shied away from using grammar syntax based on characters that:

1. are not ASCII
2. are not on a standard keyboard

We also do not use things like “quotes” in syntax. They are accepted in string literals, however.

D does support named character entities in string literals:

https://dlang.org/spec/entity.html

Any and all Unicode code points are supported in comments and string literals. Some are supported in identifiers, but I'm not convinced that is a good idea.
Feb 12
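As a small illustration of the points above, the following D sketch writes the chevrons inside ordinary string literals three ways: with named character entities, with `\u` escapes, and as literal characters in a UTF-8 source file. It assumes `laquo` and `raquo` are in the entity table linked above, as they are standard HTML entity names.

```d
import std.stdio : writeln;

void main()
{
    // Named character entities use the \&name; form inside double quotes.
    string viaEntities = "\&laquo;Hello, World!\&raquo;";
    // Unicode escapes for U+00AB and U+00BB.
    string viaEscapes = "\u00ABHello, World!\u00BB";
    // The characters themselves, typed directly into the source file.
    string direct = "«Hello, World!»";

    // All three spellings produce the same UTF-8 string.
    assert(viaEntities == direct);
    assert(viaEscapes == direct);

    writeln(direct);
}
```

The chevrons are therefore already usable as string content. What the proposal would change is only their role as delimiters.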