digitalmars.D - Why not flag away the mistakes of the past?

Taylor Hillegeist <taylorh140 gmail.com> writes:

So I've seen on the forum over the years arguments about 
auto-decoding (mostly) and some other things: things that have 
been considered mistakes but cannot be corrected because of the 
breaking changes that would create. And I always wonder, why not 
make a solution to the tune of a flag that makes things work as 
they used to, and make the new behavior the default?

dmd --UseAutoDecoding

That way the breaking change was easily fixable, and the mistakes 
of the past not forever. Is it just the cost of maintenance?

Mar 06 2018

FeepingCreature <feepingcreature gmail.com> writes:

For what it's worth, I like autodecoding.

I worry we could be in a situation where a moderate number of 
people are strong opponents and a lot of people are weak fans, 
none of which individually care enough to post. Hopefully the D 
survey results will shed some light on this, though I don't 
remember if it was written to actually ask people's opinion of 
autodecoding or just list it as a possible issue to raise, which 
would fall into the same trap.

Mar 07 2018

jmh530 <john.michael.hall gmail.com> writes:

On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist 
wrote:
 So i've seen on the forum over the years arguments about 
 auto-decoding (mostly) and some other things. Things that have 
 been considered mistakes, and cannot be corrected because of 
 the breaking changes it would create. And I always wonder why 
 not make a solution to the tune of a flag that makes things 
 work as they used too, and make the new behavior default.

 dmd --UseAutoDecoding

 That way the breaking change was easily fixable, and the 
 mistakes of the past not forever. Is it just the cost of 
 maintenance?

That's the approach used for most things, but there's a lot of 
things that rely on auto-decoding, so it would be a big effort to 
actually implement that.

Mar 07 2018

Guillaume Piolat <notthat email.com> writes:

On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist 
wrote:
 That way the breaking change was easily fixable, and the 
 mistakes of the past not forever. Is it just the cost of 
 maintenance?

The auto-decoding problem was mostly that it couldn't be @nogc since 
it throws, but in future releases exception throwing will become 
@nogc. So it's getting fixed.

Mar 07 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Wednesday, March 07, 2018 12:53:16 Guillaume Piolat via Digitalmars-d 
wrote:
 On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist

 wrote:
 That way the breaking change was easily fixable, and the
 mistakes of the past not forever. Is it just the cost of
 maintenance?

 auto-decoding problem was mostly that it couldn't be @nogc since
 throwing, but with further releases exception throwing will get
 @nogc. So it's getting fixed.

I'd actually argue that that's the lesser of the problems with
auto-decoding. The big problem is that it's auto-decoding. Code points are
almost always the wrong level to be operating at. The programmer needs to be
in control of whether the code is operating on code units, code points, or
graphemes, and because of auto-decoding, we have to constantly avoid using
the range primitives for arrays on strings. Tons of range-based code has to
special case for strings in order to work around auto-decoding. We're
constantly fighting our own API in order to process strings sanely and
efficiently.
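
As a rough illustration of that special-casing (a sketch only; countSpaces
is a made-up helper, not a Phobos function), generic code typically ends up
with a branch like this so that narrow strings are not decoded needlessly:

```d
import std.range.primitives : isInputRange;
import std.traits : isNarrowString;
import std.utf : byCodeUnit;

/// Hypothetical helper: counts ASCII spaces in any input range.
size_t countSpaces(R)(R r)
if (isInputRange!R)
{
    static if (isNarrowString!R)
    {
        // Special case: sidestep auto-decoding by iterating code units.
        return countSpaces(r.byCodeUnit);
    }
    else
    {
        size_t n;
        foreach (e; r)
            if (e == ' ')
                ++n;
        return n;
    }
}

void main()
{
    assert(countSpaces("a b c") == 2);  // string takes the special-cased branch
    assert(countSpaces("a b c"d) == 2); // dstring needs no special case
}
```

Searching for an ASCII character is safe at the code unit level because
ASCII bytes never occur inside a multi-byte UTF-8 sequence.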

IMHO, @nogc and nothrow don't matter much in comparison. Yes, it would be
nice if range-based code operating on strings were @nogc and nothrow, but
most D code really doesn't care. It uses the GC anyway, and most of the
time, no exceptions are thrown, because the strings are valid Unicode. Yes,
the fact that the range primitives for strings throw UTFExceptions instead
of using the Unicode replacement character is a problem, but that problem is
small in comparison to the problems caused by the auto-decoding itself. Even
if front and popFront used the variant of decode that used the replacement
character, auto-decoding would still be a huge problem.
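
A small sketch of the two behaviours, assuming a single invalid byte as
input (the variable names are only for illustration): the auto-decoding
front throws, whereas std.utf.byDchar substitutes U+FFFD.

```d
import std.exception : assertThrown;
import std.range.primitives : front;
import std.utf : byDchar, UTFException;

void main()
{
    // A lone 0xFF byte is never valid UTF-8.
    immutable(ubyte)[] raw = [0xFF];
    string bad = cast(string) raw;

    // The auto-decoding range primitive throws on invalid input ...
    assertThrown!UTFException(bad.front);

    // ... whereas byDchar yields the replacement character instead.
    assert(bad.byDchar.front == '\uFFFD');
}
```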

- Jonathan M Davis

Mar 07 2018

Nick Treleaven <nick geany.org> writes:

On Wednesday, 7 March 2018 at 13:24:25 UTC, Jonathan M Davis 
wrote:
 I'd actually argue that that's the lesser of the problems with 
 auto-decoding. The big problem is that it's auto-decoding. Code 
 points are almost always the wrong level to be operating at.

For me the fundamental problem is having char[] in the language 
at all, meaning a Unicode string. Arbitrary slicing and indexing 
are not Unicode compatible, if we revisit this we need a String 
type that doesn't support those operations. Plus the issue of 
string validation - a Unicode string type should be assumed to 
have valid contents - unsafe data should only be checked at 
string construction time, so iterating should always be nothrow.
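
To make the slicing point concrete, here is a minimal sketch (not a
proposal for the String type itself): slicing a string in the middle of a
code point compiles and runs, and the damage only shows up when something
validates or decodes the result.

```d
import std.exception : assertThrown;
import std.utf : validate, UTFException;

void main()
{
    string s = "\u00E9";      // U+00E9 is two UTF-8 code units: C3 A9
    assert(s.length == 2);

    string half = s[0 .. 1];  // legal slice, but it splits the code point
    assertThrown!UTFException(validate(half));

    validate(s);              // the unsliced string passes validation
}
```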

Mar 07 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Wednesday, March 07, 2018 13:40:20 Nick Treleaven via Digitalmars-d 
wrote:
 On Wednesday, 7 March 2018 at 13:24:25 UTC, Jonathan M Davis

 wrote:
 I'd actually argue that that's the lesser of the problems with
 auto-decoding. The big problem is that it's auto-decoding. Code
 points are almost always the wrong level to be operating at.

 For me the fundamental problem is having char[] in the language
 at all, meaning a Unicode string. Arbitrary slicing and indexing
 are not Unicode compatible, if we revisit this we need a String
 type that doesn't support those operations. Plus the issue of
 string validation - a Unicode string type should be assumed to
 have valid contents - unsafe data should only be checked at
 string construction time, so iterating should always be nothrow.

In principle, char is supposed to be a UTF-8 code unit, and strings are
supposed to be validated up front rather than constantly validated, but it's
never been that way in practice.

Regardless, having char[] be sliceable is actually perfectly fine and
desirable. That's exactly what you want whenever you operate on code units,
and it's frequently the case that you want to be operating at the code unit
level. But the programmer needs to be able to reasonably control when code
units, code points, or graphemes are used, because each has their time and
place. If we had a string type, it would need to provide access to each of
those levels and likely would not be directly sliceable at all, because
slicing a string is kind of meaningless, because in principle, a string is
just an opaque piece of character data. It's when you're dealing at the code
unit, code point, or grapheme level that you actually start operating on
pieces of a string, and that means that the level that you're operating at
needs to be defined.

Having char[] be an array of code units works quite well, because then you
have efficiency by default. You then need to wrap it in another range type
when appropriate to get a range of code points or graphemes, or you need to
explicitly decode when appropriate. Whereas right now, what we have is
Phobos being "helpful" and constantly decoding for us such that we get
needlessly inefficient code, and it's at the code point level, which is
usually not the level you want to operate at. So, you don't have efficiency
or correctness.
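
For illustration, a sketch of picking the level explicitly with the
wrappers that exist today (std.utf.byCodeUnit, std.utf.byDchar and
std.uni.byGrapheme); the sample string is just an assumption chosen so that
all three counts differ:

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

void main()
{
    string s = "e\u0301"; // 'e' followed by U+0301 COMBINING ACUTE ACCENT

    assert(s.byCodeUnit.walkLength == 3); // UTF-8 code units
    assert(s.byDchar.walkLength == 2);    // code points
    assert(s.byGrapheme.walkLength == 1); // one user-perceived character

    // What Phobos does when nothing is requested: decode to code points.
    assert(s.walkLength == 2);
}
```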

Ultimately, it really doesn't work to hide the details of Unicode and not
have the programmer worry about code units, code points, and graphemes
unless you don't care about efficiency. As such, what we really need is to
cleanly give the programmer the tools to manage Unicode without the language
or library assuming what the programmer wants - especially assuming an
inefficient default. The language itself actually does a decent job of that.
It's Phobos that dropped the ball on that one, because Andrei didn't know
about graphemes and tried to make Phobos Unicode-correct by default.
Instead, we get inefficient and incorrect by default.

- Jonathan M Davis

Mar 07 2018

Guillaume Piolat <notthat email.com> writes:

On Wednesday, 7 March 2018 at 13:24:25 UTC, Jonathan M Davis 
wrote:
 On Wednesday, March 07, 2018 12:53:16 Guillaume Piolat via 
 Digitalmars-d wrote:
 On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist

 wrote:
 That way the breaking change was easily fixable, and the 
 mistakes of the past not forever. Is it just the cost of 
 maintenance?

 auto-decoding problem was mostly that it couldn't be @nogc 
 since throwing, but with further releases exception throwing 
 will get @nogc. So it's getting fixed.

 I'd actually argue that that's the lesser of the problems with 
 auto-decoding. The big problem is that it's auto-decoding. Code 
 points are almost always the wrong level to be operating at. 
 The programmer needs to be in control of whether the code is 
 operating on code units, code points, or graphemes, and because 
 of auto-decoding, we have to constantly avoid using the range 
 primitives for arrays on strings. Tons of range-based code has 
 to special case for strings in order to work around 
 auto-decoding. We're constantly fighting our own API in order 
 to process strings sanely and efficiently.

I'd agree with you; I hate the special casing. However, it seems to 
me this has been debated to death already, and that auto-decoding 
was successfully advocated by Alexandrescu et al., surviving the 
controversy years ago.

Mar 08 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Thursday, March 08, 2018 16:34:11 Guillaume Piolat via Digitalmars-d 
wrote:
 On Wednesday, 7 March 2018 at 13:24:25 UTC, Jonathan M Davis

 wrote:
 On Wednesday, March 07, 2018 12:53:16 Guillaume Piolat via

 Digitalmars-d wrote:
 On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist

 wrote:
 That way the breaking change was easily fixable, and the
 mistakes of the past not forever. Is it just the cost of
 maintenance?

 auto-decoding problem was mostly that it couldn't be @nogc
 since throwing, but with further releases exception throwing
 will get @nogc. So it's getting fixed.

 I'd actually argue that that's the lesser of the problems with
 auto-decoding. The big problem is that it's auto-decoding. Code
 points are almost always the wrong level to be operating at.
 The programmer needs to be in control of whether the code is
 operating on code units, code points, or graphemes, and because
 of auto-decoding, we have to constantly avoid using the range
 primitives for arrays on strings. Tons of range-based code has
 to special case for strings in order to work around
 auto-decoding. We're constantly fighting our own API in order
 to process strings sanely and efficiently.

 I'd agree with you, hate the special casing. However it seems to
 me this has been debated to death already, and that auto-decoding
 was successfully advocated by Alexandrescu and al; surviving the
 controversy years ago.

Most everyone who debated in favor of it early on is very much against it
now (and I'm one of them). Experience and a better understanding of Unicode
has shown it to be a terrible idea. I question that you will find any
significant contributor to Phobos who would choose to have it if we were
starting from scratch, and most of the folks who post in the newsgroup agree
with that. The problem is what to do given that we don't want it and that no
one has come up with a way to remove it without breaking tons of code in the
process or even providing a clean migration path. So, given how difficult it
is to remove at this point, you'll find disagreement about how that should
be handled ranging from deciding that we're just stuck with it to wanting to
remove it regardless of the cost. But there seems to be almost universal
agreement now (certainly among the folks who might make such a decision)
that auto-decoding was a mistake. So, there's agreement that it would
ideally go, but there isn't agreement on what we should actually do given
the situation that we're in.

- Jonathan M Davis

Mar 08 2018

Taylor Hillegeist <taylorh140 gmail.com> writes:

On Thursday, 8 March 2018 at 17:14:16 UTC, Jonathan M Davis wrote:
 On Thursday, March 08, 2018 16:34:11 Guillaume Piolat via 
 Digitalmars-d wrote:
 On Wednesday, 7 March 2018 at 13:24:25 UTC, Jonathan M Davis

 wrote:
 On Wednesday, March 07, 2018 12:53:16 Guillaume Piolat via

 Digitalmars-d wrote:
 On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor 
 Hillegeist

 wrote:
 That way the breaking change was easily fixable, and the 
 mistakes of the past not forever. Is it just the cost of 
 maintenance?

 auto-decoding problem was mostly that it couldn't be @nogc 
 since throwing, but with further releases exception 
 throwing will get @nogc. So it's getting fixed.

 I'd actually argue that that's the lesser of the problems 
 with auto-decoding. The big problem is that it's 
 auto-decoding. Code points are almost always the wrong level 
 to be operating at. The programmer needs to be in control of 
 whether the code is operating on code units, code points, or 
 graphemes, and because of auto-decoding, we have to 
 constantly avoid using the range primitives for arrays on 
 strings. Tons of range-based code has to special case for 
 strings in order to work around auto-decoding. We're 
 constantly fighting our own API in order to process strings 
 sanely and efficiently.

 I'd agree with you, hate the special casing. However it seems 
 to me this has been debated to death already, and that 
 auto-decoding was successfully advocated by Alexandrescu and 
 al; surviving the controversy years ago.

 Most everyone who debated in favor of it early on is very much 
 against it now (and I'm one of them). Experience and a better

I wasn't so much asking about auto-decoding in particular as 
about the mentality and methods of breaking changes.

In a way any change to the compiler is a breaking change when it 
comes to the configuration.

I for one never expect code to compile on the latest compiler. It 
has to be the same compiler and the same version for the code base 
to work as expected.

At one point I envisioned every file with a header that states 
the version of the compiler required for that module. A 
sophisticated configuration tool could take and compile each 
module with its respective version and then one could link. (this 
could very well be the worst idea ever)

I'm not saying we should be quick to change... oh no, that would 
be very bad. But after you have sat in the filth of your decisions 
long and hard and are certain that it is indeed bad, there should 
be a plan for action and change. And when it comes to change, it 
should be an evolution, not a revolution.

It is good avoiding the so easily accepted mentality of legacy... 
Why do you do it that way? "It's because we've always done it 
that way."

The reason I like D is often that driven by its community it 
innovates and renovates into a language that is honestly really 
fun to use. (most of the time)

Mar 08 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Friday, March 09, 2018 03:16:03 Taylor Hillegeist via Digitalmars-d 
wrote:

 I wasn't so much asking about auto-decoding in particular more
 about the mentality and methods of breaking changes.

 In a way any change to the compiler is a breaking change when it
 comes to the configuration.

 I for one never expect code to compile on the latest compiler, It
 has to be the same compiler same version for the code base to
 work as expected.

 At one point I envisioned every file with a header that states
 the version of the compiler required for that module. A
 sophisticated configuration tool could take and compile each
 module with its respective version and then one could link. (this
 could very well be the worst idea ever)

 I'm not saying we should be quick to change... oh noo that would
 be very bad. But after you set in the filth of your decisions
 long and hard and are certian that it is indeed bad there should
 be a plan for action and change. And when it comes to change it
 should be an evolution not a revolution.

 It is good avoiding the so easily accepted mentality of legacy...
 Why do you do it that way? "It's because we've always done it
 that way."

 The reason I like D is often that driven by its community it
 innovates and renovates into a language that is honestly really
 fun to use. (most of the time)

Any and all changes need to be weighed for their pros and cons. No one likes
it when their code breaks, and ideally, programs would work pretty much
forever without modification, but there are changes that are worth dealing
with code breakage. Part of the problem is deciding which changes are worth
it, and some of that depends on what the migration path would be. Some stuff
can be changed with minimal pain, and other stuff can't really be changed
without breaking everything. And the more D code that exists, the higher the
cost for any change. The drive to make D perfect and the need to be able to
use and rely on D code working in production without having to keep changing
it are always in conflict.

As Walter likes to say, some folks don't want you to break anything, whereas
some folks want breaking changes, and they're frequently the same people.

Ideally, any D code that you write would work permanently as-is. Also
ideally, any and all problems or pain points with D and its standard library
would be fixed. Those two things are in complete contradiction of one
another, and it's not always easy to judge how to deal with that. Sometimes,
it means that we're stuck with legacy decisions, because fixing them is too
costly, whereas other times, it means that we deprecate something, and some
of the D code out there has to be updated, or it won't compile anymore in a
release somewhere in the future. Either way, outright breaking code
immediately, with no migration process is pretty much always unacceptable.

We'll make breaking changes if we judge the gain to be worth the pain, but
we don't want to be constantly breaking people's code, and some changes are
large enough that there's arguably no justification for them, because they
would simply be too disruptive. Because of how common string processing is
and how integrated auto-decoding is into D's string processing, it is very
difficult to come up with a way to change it which isn't simply too
disruptive to be justified, even though we want to change it. So, this is a
particularly difficult case, and how we're going to end up handling it
remains to be seen. Thus far, we've mainly worked on providing better ways
to get around it, because we can do that without breaking code, whereas
actually removing it is extremely difficult.

- Jonathan M Davis

Mar 08 2018

Chris <wendlec tcd.ie> writes:

On Friday, 9 March 2018 at 06:14:05 UTC, Jonathan M Davis wrote:

 We'll make breaking changes if we judge the gain to be worth 
 the pain, but we don't want to be constantly breaking people's 
 code, and some changes are large enough that there's arguably 
 no justification for them, because they would simply be too 
 disruptive. Because of how common string processing is and how 
 integrated auto-decoding is into D's string processing, it is 
 very difficult to come up with a way to change it which isn't 
 simply too disruptive to be justified, even though we want to 
 change it. So, this is a particularly difficult case, and how 
 we're going to end up handling it remains to be seen. Thus far, 
 we've mainly worked on providing better ways to get around it, 
 because we can do that without breaking code, whereas actually 
 removing it is extremely difficult.

 - Jonathan M Davis

It's already been said (by myself and others) that we should 
actually try to remove it (with a compiler switch) and then see 
what happens, how much code actually breaks, and based on that 
experience we can come up with a strategy. I've already said that 
I'm willing to try it on my code (that is almost 100% string 
processing). Why not _try_ it? We can still philosophize later.

Mar 09 2018

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Mar 08, 2018 at 10:14:16AM -0700, Jonathan M Davis via Digitalmars-d
wrote:
 On Thursday, March 08, 2018 16:34:11 Guillaume Piolat via Digitalmars-d 
 wrote:

[...]
 I'd agree with you, hate the special casing. However it seems to
 me this has been debated to death already, and that auto-decoding
 was successfully advocated by Alexandrescu and al; surviving the
 controversy years ago.

 
 Most everyone who debated in favor of it early on is very much against
 it now (and I'm one of them). Experience and a better understanding of
 Unicode has shown it to be a terrible idea. I question that you will
 find any significant contributor to Phobos who would choose to have it
 if we were starting from scratch, and most of the folks who post in
 the newsgroup agree with that.

[...]

Yeah, the only reason autodecoding survived in the beginning was because
Andrei (wrongly) thought that a Unicode code point was equivalent to a
grapheme.  If that had been the case, the cost associated with
auto-decoding may have been justifiable.  Unfortunately, that is not the
case, which greatly diminishes most of the advantages that autodecoding
was meant to have.  So it ended up being something that incurred a
significant performance hit, yet did not offer the advantages it was
supposed to.  To fully live up to Andrei's original vision, it would
have to include grapheme segmentation as well.  Unfortunately, graphemes
are of arbitrary length and cannot in general fit in a single dchar (or
any fixed-size type), and grapheme segmentation is extremely costly to
compute, so doing it by default would kill D's string manipulation
performance.

In hindsight, it was obviously a failure and a wrong design decision.
Walter is clearly against it after he learned that it comes with a hefty
performance cost, and even Andrei himself would admit today that it was
a mistake.  It's only that he, understandably, does not agree with any
change that would disrupt existing code. And that's what we're faced
with right now.


T

-- 
Frank disagreement binds closer than feigned agreement.

Mar 08 2018

Henrik <henrik nothing.com> writes:

On Thursday, 8 March 2018 at 17:35:11 UTC, H. S. Teoh wrote:
 On Thu, Mar 08, 2018 at 10:14:16AM -0700, Jonathan M Davis via 
 Digitalmars-d wrote:
 [...]

 [...]
 [...]

 [...]

 Yeah, the only reason autodecoding survived in the beginning 
 was because Andrei (wrongly) thought that a Unicode code point 
 was equivalent to a grapheme.  If that had been the case, the 
 cost associated with auto-decoding may have been justifiable.  
 Unfortunately, that is not the case, which greatly diminishes 
 most of the advantages that autodecoding was meant to have.  So 
 it ended up being something that incurred a significant 
 performance hit, yet did not offer the advantages it was 
 supposed to.  To fully live up to Andrei's original vision, it 
 would have to include grapheme segmentation as well.  
 Unfortunately, graphemes are of arbitrary length and cannot in 
 general fit in a single dchar (or any fixed-size type), and 
 grapheme segmentation is extremely costly to compute, so doing 
 it by default would kill D's string manipulation performance.

 [...]

Which companies are against changing this? They must be powerful 
indeed if their convenience is important enough to protect such 
destructive features. Even C++ managed to give up trigraphs 
against the will of IBM. Surely D can give up something that is 
even more destructive?

Mar 08 2018

Guillaume Piolat <notthat email.com> writes:

On Thursday, 8 March 2018 at 17:35:11 UTC, H. S. Teoh wrote:
 Yeah, the only reason autodecoding survived in the beginning 
 was because Andrei (wrongly) thought that a Unicode code point 
 was equivalent to a grapheme.  If that had been the case, the 
 cost associated with auto-decoding may have been justifiable.  
 Unfortunately, that is not the case, which greatly diminishes 
 most of the advantages that autodecoding was meant to have.  So 
 it ended up being something that incurred a significant 
 performance hit, yet did not offer the advantages it was 
 supposed to.  To fully live up to Andrei's original vision, it 
 would have to include grapheme segmentation as well.  
 Unfortunately, graphemes are of arbitrary length and cannot in 
 general fit in a single dchar (or any fixed-size type), and 
 grapheme segmentation is extremely costly to compute, so doing 
 it by default would kill D's string manipulation performance.


I remember something a bit different from the last time it was discussed:

  - removing auto-decoding would break a lot of code, since it's used 
in lots of places
  - the performance loss can be mitigated by using .byCodeUnit every time
  - Andrei correctly advocated against breakage

Personally I do use auto-decoding, often iterating by code point, 
and use it for fonts and parsers. It's correct for a large 
subset of languages. You gave us a feature and now we are using 
it ;)

Mar 09 2018

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 3/7/18 1:00 AM, Taylor Hillegeist wrote:
 So i've seen on the forum over the years arguments about auto-decoding 
 (mostly) and some other things. Things that have been considered 
 mistakes, and cannot be corrected because of the breaking changes it 
 would create. And I always wonder why not make a solution to the tune of 
 a flag that makes things work as they used too, and make the new 
 behavior default.
 
 dmd --UseAutoDecoding
 
 That way the breaking change was easily fixable, and the mistakes of the 
 past not forever. Is it just the cost of maintenance?

Note, autodecoding is NOT a feature of the language, but rather a 
feature of Phobos.

It would be quite interesting I think to create a modified phobos where 
autodecoding was optional and see what happens (could be switched with a 
-version=autodecoding). It wouldn't take much effort -- just take out 
the specializations for strings in std.array.
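
A purely hypothetical sketch of what such a switch could look like;
firstElement is an invented stand-in for the string specialization of
front and is not how Phobos is actually written:

```d
import std.utf : decode;

version (autodecoding)
{
    // Legacy behaviour: the first element of a string is a decoded code point.
    dchar firstElement(const(char)[] s)
    {
        size_t i = 0;
        return decode(s, i);
    }
}
else
{
    // Without the specialization a string behaves like any other array.
    char firstElement(const(char)[] s)
    {
        return s[0];
    }
}

void main()
{
    auto e = firstElement("\u00E9");
    version (autodecoding)
        assert(e == '\u00E9'); // built with: dmd -version=autodecoding
    else
        assert(e == 0xC3);     // first UTF-8 code unit of U+00E9
}
```

Building the same file with and without -version=autodecoding would then
show which call sites depend on the decoded element type.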

-Steve

Mar 07 2018

Seb <seb wilzba.ch> writes:

On Wednesday, 7 March 2018 at 14:59:35 UTC, Steven Schveighoffer 
wrote:
 On 3/7/18 1:00 AM, Taylor Hillegeist wrote:
 [...]

 Note, autodecoding is NOT a feature of the language, but rather 
 a feature of Phobos.

 It would be quite interesting I think to create a modified 
 phobos where autodecoding was optional and see what happens 
 (could be switched with a -version=autodecoding). It wouldn't 
 take much effort -- just take out the specializations for 
 strings in std.array.

 -Steve

Well, I tried that already:

https://github.com/dlang/phobos/pull/5513

In short: very easy to do, but not much interest at the time.

Mar 07 2018

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Mar 07, 2018 at 04:29:33PM +0000, Seb via Digitalmars-d wrote:
 On Wednesday, 7 March 2018 at 14:59:35 UTC, Steven Schveighoffer wrote:
 On 3/7/18 1:00 AM, Taylor Hillegeist wrote:
 [...]

 
 Note, autodecoding is NOT a feature of the language, but rather a
 feature of Phobos.
 
 It would be quite interesting I think to create a modified phobos
 where autodecoding was optional and see what happens (could be
 switched with a -version=autodecoding). It wouldn't take much effort
 -- just take out the specializations for strings in std.array.
 
 -Steve

 
 Well, I tried that already:
 
 https://github.com/dlang/phobos/pull/5513
 
 In short: very easy to do, but not much interest at the time.

Argh... this really struck a nerve.  "Not much interest"?!  I think a
more accurate description is every passerby going "that looks dangerous
and I don't have enough time to spare to look into it right now, so
better just leave it up to somebody else to stick their neck out and get
beheaded by Andrei later", resulting in nobody taking apparent interest
in the PR, even though many of us *really* want to see autodecoding go
the way of the dodo.


T

-- 
EMACS = Extremely Massive And Cumbersome System

Mar 07 2018

Dukc <ajieskola gmail.com> writes:

On Wednesday, 7 March 2018 at 16:29:33 UTC, Seb wrote:
 Well, I tried that already:

 https://github.com/dlang/phobos/pull/5513

 In short: very easy to do, but not much interest at the time.

No. The main problem with that (and the idea of using a compiler 
flag in general) is that it affects the whole compilation. That 
means that every single third-party library, not only Phobos, has 
to work BOTH with and without the switch.

IMO, if we find a way to enable or disable autodecoding per 
module, not per compilation, that will make deprecating it more 
than worthwhile.

Mar 08 2018

Jon Degenhardt <jond noreply.com> writes:

On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist 
wrote:
 So i've seen on the forum over the years arguments about 
 auto-decoding (mostly) and some other things. Things that have 
 been considered mistakes, and cannot be corrected because of 
 the breaking changes it would create. And I always wonder why 
 not make a solution to the tune of a flag that makes things 
 work as they used too, and make the new behavior default.

 dmd --UseAutoDecoding

 That way the breaking change was easily fixable, and the 
 mistakes of the past not forever. Is it just the cost of 
 maintenance?

Auto-decoding is a significant issue for the applications I work 
on (search engines). There is a lot of string manipulation in 
these environments, and performance matters. Auto-decoding is a 
meaningful performance hit. Otherwise, Phobos has a very nice 
collection of algorithms for string manipulation. It would be 
great to have a way to turn auto-decoding off in Phobos.

--Jon

Mar 07 2018

Seb <seb wilzba.ch> writes:

On Wednesday, 7 March 2018 at 15:26:40 UTC, Jon Degenhardt wrote:
 On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist 
 wrote:
 [...]

 Auto-decoding is a significant issue for the applications I 
 work on (search engines). There is a lot of string manipulation 
 in these environments, and performance matters. Auto-decoding 
 is a meaningful performance hit. Otherwise, Phobos has a very 
 nice collection of algorithms for string manipulation. It would 
 be great to have a way to turn auto-decoding off in Phobos.

 --Jon

Well, you can use byCodeUnit, which disables auto-decoding.

Though it's not well-known and rather annoying to explicitly add 
it almost everywhere.
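
For example (a minimal sketch; the string and the search are arbitrary):

```d
import std.algorithm.searching : count;
import std.range : ElementType;
import std.utf : byCodeUnit;

void main()
{
    string s = "banana";

    // With auto-decoding, Phobos treats a string as a range of dchar.
    static assert(is(ElementType!string == dchar));

    // byCodeUnit presents the same data as a range of char-sized elements,
    // so the algorithm never decodes.
    static assert(is(ElementType!(typeof(s.byCodeUnit)) : char));

    assert(s.byCodeUnit.count('a') == 3);
}
```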

Mar 07 2018

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Mar 07, 2018 at 04:33:25PM +0000, Seb via Digitalmars-d wrote:
 On Wednesday, 7 March 2018 at 15:26:40 UTC, Jon Degenhardt wrote:

[...]
 Auto-decoding is a significant issue for the applications I work on
 (search engines). There is a lot of string manipulation in these
 environments, and performance matters. Auto-decoding is a meaningful
 performance hit. Otherwise, Phobos has a very nice collection of
 algorithms for string manipulation. It would be great to have a way
 to turn auto-decoding off in Phobos.


[...]
 Well you can use byCodeUnit, which disables auto-decoding
 
 Though it's not well-known and rather annoying to explicitly add it
 almost everywhere.

And therein lies the rub: because it's *auto* decoding, rather than just
decoding, it's implicit everywhere, adding to the performance hit
without the coder being necessarily aware of it. You have to put in the
effort to add .byCodeUnit everywhere.

Worse yet, it gives the false sense of security that you're doing
Unicode "right", when actually that is *not* true at all, because a code
point is not equal to a grapheme (what people normally know as a
"character"). But because operating at the code point level *appears* to
be correct 80% of the time, bugs in string handling often go unnoticed,
unlike operating at the code unit level, where any Unicode handling bugs
are immediately obvious as soon as your string contains non-ASCII
characters.
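
A quick sketch of the kind of bug I mean, using a string I picked for the
purpose: reversing by code point looks right for most input, but silently
moves a combining accent onto the wrong letter.

```d
import std.array : array;
import std.range : retro, walkLength;
import std.uni : byGrapheme;

void main()
{
    string s = "e\u0301x"; // 'e' + combining acute accent, then 'x'

    // Code-point reversal detaches the accent from 'e':
    // the combining mark now follows 'x'.
    assert(s.retro.array == "x\u0301e"d);

    // Counting shows the mismatch: 3 code points, but only 2 graphemes.
    assert(s.walkLength == 3);
    assert(s.byGrapheme.walkLength == 2);
}
```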

So you're essentially paying the price of a significant performance hit
for the dubious benefit of non-100%-correct code, but with bugs
conveniently obscured so that it's harder to notice them.

Kill autodecoding, I say. Kill it with fire!!


T

-- 
MACINTOSH: Most Applications Crash, If Not, The Operating System Hangs

Mar 07 2018

Gary Willoughby <dev nomad.so> writes:

On Wednesday, 7 March 2018 at 17:11:55 UTC, H. S. Teoh wrote:
 Kill autodecoding, I say. Kill it with fire!!


 T

Please!!!

Mar 09 2018

Jon Degenhardt <jond noreply.com> writes:

On Wednesday, 7 March 2018 at 16:33:25 UTC, Seb wrote:
 On Wednesday, 7 March 2018 at 15:26:40 UTC, Jon Degenhardt 
 wrote:
 On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist 
 wrote:
 [...]

 Auto-decoding is a significant issue for the applications I 
 work on (search engines). There is a lot of string 
 manipulation in these environments, and performance matters. 
 Auto-decoding is a meaningful performance hit. Otherwise, 
 Phobos has a very nice collection of algorithms for string 
 manipulation. It would be great to have a way to turn 
 auto-decoding off in Phobos.

 Well you can use byCodeUnit, which disables auto-decoding

 Though it's not well-known and rather annoying to explicitly 
 add it almost everywhere.

I looked at this once. It didn't appear to be a viable solution, 
though I forget the details. I can probably resurrect them if 
that would be helpful.

Mar 07 2018
