digitalmars.dip.ideas - directional quotes

monkyyy <crazymonkyyy gmail.com> writes:

ASCII dropped several marks of English typography to fit into 7 
bits. One of them was directional quotes, so C had to delimit 
strings with the same quote character at both ends, plus rules 
about escaping. We are no longer C and it's no longer the '60s.

Imagine making a one-character typo in the escape characters 
while writing a deeply nested string for mixins.

I'd suggest the "heavy double comma" quotation marks, as they are 
visibly distinct in the 3 monospace fonts I checked:

  ❝ ❞

U+275D U+275E

I believe all directional quote schemes will require users to add 
a custom xmodmap or IDE plugins to type them, so I believe 
monospace font behavior should be the primary concern.

A directional quoted string should have the simplest parsing 
rule: it counts up on U+275D, counts down on U+275E, and returns 
when the count is 0; all other escapes and characters are ignored.
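
The counting rule above can be sketched in a few lines of D. This is only an illustration of the proposal, not compiler code: `scanDirectionalQuote` is a hypothetical helper, and returning `null` for malformed or unterminated input is an assumption of the sketch.

```d
import std.stdio;

/// Hypothetical helper, not part of any D compiler or library.
/// Expects `src` to begin with U+275D and returns the text between the
/// outermost pair of quotes, or null if the input is malformed.
string scanDirectionalQuote(string src)
{
    enum dchar open = '\u275D';      // opening quote
    enum dchar close = '\u275E';     // closing quote
    enum openLen = "\u275D".length;  // UTF-8 length of the opening quote

    size_t depth = 0;
    foreach (i, dchar c; src)        // i is the byte offset of c
    {
        if (i == 0 && c != open)
            return null;             // must start with an opening quote
        if (c == open)
            ++depth;
        else if (c == close)
        {
            --depth;
            if (depth == 0)
                return src[openLen .. i]; // everything else is taken verbatim
        }
    }
    return null;                     // input ended before the count reached 0
}

void main()
{
    // Backslashes and ASCII quotes inside the literal are not special.
    writeln(scanDirectionalQuote(`❝outer ❝inner❞ \n "x"❞ trailing`));
}
```

The call in `main` returns the text between the outermost pair, including the inner pair and the backslash, unchanged.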

Aug 12 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 13/08/2024 6:30 AM, monkyyy wrote:
 A directional quoted string should have the simplest parsing rule: it 
 counts up on U+275D, counts down on U+275E, and returns when the count 
 is 0; all other escapes and characters are ignored.

So basically backtick?

```d
import std.stdio;

void main() {
     writeln(`\"`); // \"
}
```

Aug 12 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 8/12/24 21:51, Richard (Rikki) Andrew Cattermole wrote:
 On 13/08/2024 6:30 AM, monkyyy wrote:
 A directional quoted string should have the simplest parsing rule: 
 it counts up on U+275D, counts down on U+275E, and returns when the 
 count is 0; all other escapes and characters are ignored.

 
 So basically backtick?

No, this nests. Backticks don't nest.

Aug 13 2024

Quirin Schroll <qs.il.paperinik gmail.com> writes:

On Monday, 12 August 2024 at 18:30:02 UTC, monkyyy wrote:
 ASCII dropped several marks of English typography to fit into 7 
 bits. One of them was directional quotes, so C had to delimit 
 strings with the same quote character at both ends, plus rules 
 about escaping. We are no longer C and it's no longer the '60s.

 Imagine making a one-character typo in the escape characters 
 while writing a deeply nested string for mixins.

In that case, use `iq{}` strings.
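
For reference, a minimal sketch of the token-string route: `q{}` literals hold D tokens, track nested braces, and need no extra layer of escaping, which is why they are commonly used for mixins. `iq{}` is the interpolated variant in recent compilers; the plain form is shown here, and the function name `greet` is made up for the example.

```d
import std.stdio;

// The generated code contains its own string literals and nested braces,
// yet nothing inside the token string has to be escaped.
mixin(q{
    string greet(string name)
    {
        if (name.length == 0) { return "nobody"; }
        return "hello, " ~ name;
    }
});

void main()
{
    writeln(greet("D"));
}
```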

 I'd suggest the "heavy double comma" quotation marks, as they are 
 visibly distinct in the 3 monospace fonts I checked:

  ❝ ❞

 U+275D U+275E

 I believe all directional quote schemes will require users to 
 add a custom xmodmap or IDE plugins to type them, so I believe 
 monospace font behavior should be the primary concern.

 A directional quoted string should have the simplest parsing 
 rule: it counts up on U+275D, counts down on U+275E, and returns 
 when the count is 0; all other escapes and characters are ignored.

I’m 80% sure this is trolling.

D already has delimited strings: `q"(abc(")adb")"`. It’s hard to 
believe you’ll ever run into a case where all of the four 
delimiters `()`, `[]`, `{}`, `<>` will be in the string in an 
unbalanced way.
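
To make the delimited-string behavior concrete, here is a small sketch. The nesting delimiters only have to balance inside the literal, and switching to another pair covers the unbalanced case. The variable names are made up for the example.

```d
import std.stdio;

void main()
{
    // Parentheses nest, and the embedded double quotes need no escape.
    string nested = q"(abc(")adb")";
    assert(nested == `abc(")adb"`);

    // A lone ')' would unbalance a q"(...)" literal, so use another pair.
    string unbalanced = q"[a ) b]";
    assert(unbalanced == "a ) b");

    writeln(nested);
    writeln(unbalanced);
}
```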

But that doesn’t even convey how bad this idea is, if you think 
it through.

Not all fonts have U+275D and U+275E, not even close. You’d be 
much better suited with chevrons (`«»`), as those are reasonably 
supported by fonts because chevrons are standard in French. 
Generally, you can’t expect fonts to have more than the basic 
ASCII characters. Even in those that do, the glyphs might not be 
visually distinct enough. There’s a reason D only has `10L` and 
not `10l` as literals, even if on most monospace fonts, `l`, `I`, 
and `1` are distinct enough. IMO, allowing anything non-ASCII in 
D code (except for comments) is an error and will trip people up. 
I have run into issues with C++ compilers making assumptions about 
what the input and output encodings are. I work for a German 
company and all our error messages are in German. You won’t find 
any literal Ää, Öö, Üü, ß in our codebase; those are all `\u00FC` 
for ü etc. and they’re in `u8""` literals.

Proponents’ best arguments are: “Why not” and “Some words look 
like slurs when using ASCII replacements”. Too bad. I’m 
confronted with BS and ASS daily (which stand for *balance sheet* 
and *assets,* to be clear), and it’s funny initially and then you 
just get used to it.

They never had to debug code because in some string literal, 
there was Unicode nonsense like a soft hyphen, which made it 
unequal to every string it was compared to. Best thing is, 
printing the string to Windows’s CMD removes the soft hyphen!
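
A minimal sketch of that failure mode: U+00AD (soft hyphen) is usually invisible when rendered, yet it is part of the string and breaks equality. `stripSoftHyphens` is a hypothetical helper for the example, not a library function.

```d
import std.array : replace;
import std.stdio;

/// Hypothetical helper: removes every U+00AD from the input.
string stripSoftHyphens(string s)
{
    return s.replace("\u00AD", "");
}

void main()
{
    string pasted = "co\u00ADoperate"; // typically displays like "cooperate"
    assert(pasted != "cooperate");     // the hidden code point breaks equality
    assert(stripSoftHyphens(pasted) == "cooperate");
    writeln(pasted.length, " vs ", "cooperate".length);
}
```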

With ASCII, what strings are equal and which aren’t is obvious. 
With Unicode, it’s some special circle of hell:
```d
// This compiles:
void main()
{
     int ä = 0;
     int ä = 1;
}
```

Maybe I’m overly conservative, but I can tell you, it’s not out 
of spite, it’s just from real, non-hypothetical experience. 
Probably, people who live and work in the US have little to no 
experience with those kinds of issues. UK folk basically only 
because £ (U+00A3) is non-ASCII.

Don’t get me wrong, I love typographically correct quotes. I have 
them on my keyboard and use them everywhere it makes sense. It 
makes sense for forum posts, but not for code.

Aug 13 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 14/08/2024 3:42 AM, Quirin Schroll wrote:
 With ASCII, what strings are equal and which aren’t is obvious. With 
 Unicode, it’s some special circle of hell:
 
 |// This compiles: void main() { int ä = 0; int ä = 1; } |
 
 Maybe I’m overly conservative, but I can tell you, it’s not out of 
 spite, it’s just from real, non-hypothetical experience. Probably, 
 people who live and work in the US have little to no experience with 
 those kinds of issues. UK folk basically only because £ (U+00A3) is 
 non-ASCII.

All D code is expected to be in Normalization Form C (NFC), eliminating 
this issue.

The compiler doesn't give you any help with this; Walter didn't want it.
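
Since the compiler does not check this, a manual check can be sketched with Phobos' `std.uni.normalize`. The helper name `sameAfterNFC` is made up for the example; the two literals are the precomposed and decomposed spellings of ä.

```d
import std.stdio;
import std.uni : normalize, NFC;

/// Hypothetical helper: compares two strings after normalizing both to NFC.
bool sameAfterNFC(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    string precomposed = "\u00E4";  // single code point
    string decomposed = "a\u0308";  // 'a' followed by a combining diaeresis
    assert(precomposed != decomposed);           // different code units
    assert(sameAfterNFC(precomposed, decomposed));
    writeln(sameAfterNFC(precomposed, decomposed));
}
```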

Aug 13 2024

Quirin Schroll <qs.il.paperinik gmail.com> writes:

On Tuesday, 13 August 2024 at 15:46:01 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 On 14/08/2024 3:42 AM, Quirin Schroll wrote:
 With ASCII, what strings are equal and which aren’t is 
 obvious. With Unicode, it’s some special circle of hell:
 
 |// This compiles: void main() { int ä = 0; int ä = 1; } |
 
 Maybe I’m overly conservative, but I can tell you, it’s not 
 out of spite, it’s just from real, non-hypothetical 
 experience. Probably, people who live and work in the US have 
 little to no experience with those kinds of issues. UK folk 
 basically only because £ (U+00A3) is non-ASCII.

 All D code is expected to be in normal form C, eliminating this 
 issue.

Again, compilers making assumptions.

My bet is most programmers don’t know what a Unicode normal form 
is, but use the keys on their keyboard and Ctrl+C, Ctrl+V stuff. It’s 
so niche, Wikipedia has an article about 
normalization/equivalence in < 10 languages, none of which is 
Spanish.

 Compiler doesn't give you any help on this, Walter didn't want 
 it.

The issue is Walter being too good at avoiding rookie mistakes.

Aug 13 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 14/08/2024 4:37 AM, Quirin Schroll wrote:
 On Tuesday, 13 August 2024 at 15:46:01 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 On 14/08/2024 3:42 AM, Quirin Schroll wrote:
 With ASCII, what strings are equal and which aren’t is obvious. With 
 Unicode, it’s some special circle of hell:

 |// This compiles: void main() { int ä = 0; int ä = 1; } |

 Maybe I’m overly conservative, but I can tell you, it’s not out of 
 spite, it’s just from real, non-hypothetical experience. Probably, 
 people who live and work in the US have little to no experience with 
 those kinds of issues. UK folk basically only because £ (U+00A3) is 
 non-ASCII.

 All D code is expected to be in normal form C, eliminating this issue.

 
 Again, compilers making assumptions.
 
 My bet is most programmers don’t know what a Unicode normal form is, but 
 use the keys on their keyboard and Ctrl+C, Ctrl+V stuff. It’s so niche, 
 Wikipedia has an article about normalization/equivalence in < 10 
 languages, none of which is Spanish.
 
 Compiler doesn't give you any help on this, Walter didn't want it.

 
 The issue is Walter being too good at avoiding rookie mistakes.

Well no, he was weighing it very low against the cost to performance or 
added complexity. He doesn't interact with Unicode all that much.

If you have experiences in other languages, especially with team members 
from other languages please feel free to ask Mike to arrange a meeting 
to talk to him about it.

I ended up dropping it, because in practice an IDE like IntelliJ already 
normalizes as you type, so I'm weighing it a lot further down than I'd like.

Aug 13 2024

monkyyy <crazymonkyyy gmail.com> writes:

On Tuesday, 13 August 2024 at 15:42:15 UTC, Quirin Schroll wrote:

 Not all fonts have U+275D and U+275E, not even close.

Can you name one?

 much better suited with chevrons (`«»`), French.

Directional quotes are officially part of the closest thing 
English has to an authority, the newspaper style guides. I think 
you're drastically underestimating the support for them among 
native English writers, because internet English of course 
changed to fit the keyboard.

 won’t find any literal Ää, Öö, Üü, ß in our codebase; those are 
 all `\u00FC` for ü etc. and they’re in `u8""` literals.

Diacritics are a separate debate, no? I'm unaware of any character 
sequences that combine into quotation marks.

 With ASCII, what strings are equal and which aren’t is obvious.

Windows line endings, terminal escapes, the extended character 
set, rare control characters.
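
As a small illustration of the first item: two ASCII-only strings that look the same when printed but compare unequal because of the line ending. `std.string.chomp` is one way to strip either form before comparing; `sameLine` is a made-up helper name.

```d
import std.stdio;
import std.string : chomp;

/// Hypothetical helper: compares two lines, ignoring one trailing line ending.
bool sameLine(string a, string b)
{
    return a.chomp == b.chomp;
}

void main()
{
    string unixLine = "yes\n";
    string windowsLine = "yes\r\n";
    assert(unixLine != windowsLine);         // pure ASCII, still unequal
    assert(sameLine(unixLine, windowsLine)); // equal once the ending is removed
    writeln(sameLine(unixLine, windowsLine));
}
```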

 I love typographically correct quotes.
 but
 I’m 80% sure this is trolling.

It's great to have your support.

Aug 13 2024
