digitalmars.D - Dealing with Autodecode

Walter Bright (13/13) May 31 2016 It is not practical to just delete or deprecate autodecode - it is too e...

Stefan Koch (13/21) May 31 2016 So does this mean we intend to carry the auto-decoding wart with

Walter Bright (4/5) May 31 2016 Removing it from Phobos and adjusting the documentation as I suggested i...
H. S. Teoh via Digitalmars-d (20/35) May 31 2016 If we can pull off what Walter proposed, it will put us one step closer

Steven Schveighoffer (5/20) May 31 2016 I gotta be honest, if the end of this tunnel doesn't have a char[] array...
Adam D. Ruppe (33/35) May 31 2016 Yes, it is.

Walter Bright (2/4) May 31 2016 PRs please!

Jack Stouffer (2/7) Jun 01 2016 https://github.com/dlang/phobos/pull/4322
Adam D. Ruppe (6/7) Jun 01 2016 https://github.com/dlang/phobos/pull/4384

Walter Bright (9/15) Jun 01 2016 Andrei is in favor of fixing Phobos so it does not depend on autodecode....

Jonathan M Davis via Digitalmars-d (16/35) Jun 02 2016 Just pulling the trigger is far too big a breaking change, and I think t...
Adam D. Ruppe (29/35) Jun 02 2016 Putting the autodecode functions on a compiler switch (with

Andrei Alexandrescu (6/16) Jun 02 2016 The real ticket out of this is RCStr. It solves a major problem in the

deadalnix (4/12) Jun 02 2016 You start to sound like a car salesman. I know nothing about

Andrei Alexandrescu (3/13) Jun 02 2016 I assume that means overselling or false advertising. Where do either of...

deadalnix (10/13) Jun 02 2016 For SDC for instance, autodecode is a problem (in fact, it is the

Andrei Alexandrescu (5/17) Jun 02 2016 What is the issue?

H. S. Teoh via Digitalmars-d (13/25) Jun 02 2016 Same here. It's starting to sound like some unproven newfangled

Andrei Alexandrescu (8/32) Jun 02 2016 I'm sorry, this is completely ridiculous. What is unproven? Reference

Jonathan M Davis via Digitalmars-d (9/12) Jun 02 2016 Unless we're outright getting rid of string, char[], wstring, etc., RCSt...

Andrei Alexandrescu (2/4) Jun 02 2016 It does if you use it. If you don't, it doesn't. -- Andrei

Chris (20/44) Jun 02 2016 I would love to have a compiler switch and finally be able to rid

Nick Sabalausky (5/17) May 31 2016 Yes. This. If I wanted an endless bucket of baggage, I'd have stuck with...

Seb (5/30) Jun 01 2016 How about a poll?

Seb (11/47) Jun 01 2016 Just FYI after a short period of ten hours we got the following

poliklosio (8/18) Jun 01 2016 You failed to mention that there were additional answers:

Joakim (11/30) Jun 02 2016 It has been noted many times that forum users are a small part of

poliklosio (16/28) Jun 02 2016 Its not representative but there is going to be at least some

Kirill Kryukov (57/59) May 31 2016 This.

poliklosio (14/21) Jun 01 2016 +1

default0 (41/43) May 31 2016 Agree with that very much.
Guillaume Chatelet (2/11) Jun 01 2016 +1

Andrea Fontana (3/18) Jun 01 2016 +1

Guillaume Piolat (5/13) Jun 01 2016 Please, just remove auto-decoding, any way you want. I only ever
Kagamin (13/21) Jun 01 2016 version(autodecode_migration)

Jonathan M Davis via Digitalmars-d (16/30) May 31 2016 The other critical thing is to make sure that Phobos in general works wi...

Walter Bright (4/8) May 31 2016 That was posted 3 months ago. No PR to fix it (though it likely is an ea...

Brad Roberts via Digitalmars-d (4/14) May 31 2016 I hope that wasn't a serious question. The answer is trivial. The rate...

tsbockman (4/5) May 31 2016 Speaking of which:

Jack Stouffer (2/7) Jun 01 2016 https://github.com/dlang/phobos/pull/4390

H. S. Teoh via Digitalmars-d (11/16) May 31 2016 This is an example of current Phobos code assuming (sometimes
Jacob Carlborg (4/10) May 31 2016 Don't you get the same behavior using byDchar as with autodecode?

Walter Bright (3/6) Jun 01 2016 Yes (except that byDchar returns the replacement char on invalid Unicode...

Andrei Alexandrescu (14/29) Jun 01 2016 (Shouldn't those be by!dchar, by!wchar, by!char? byCodeUnit and

Chris (4/19) Jun 01 2016 How would the transition look like? How would it affect existing
Jonathan M Davis via Digitalmars-d (16/26) Jun 01 2016 Since when has it been the goal to get rid of GC-allocated strings? We

Andrei Alexandrescu (3/5) Jun 01 2016 You'll always want to use it. The small string optimization will make it...

Timon Gehr (4/9) Jun 01 2016 - Why is it dependent on the allocation strategy or on the type of the d...

Andrei Alexandrescu (7/18) Jun 01 2016 I agree cases can be created in which straight arrays do sometimes

Timon Gehr (5/19) Jun 01 2016 The small string optimization also works for GC-allocated strings. Why

Andrei Alexandrescu (3/7) Jun 01 2016 Point taken, thanks. Mine was that you can't (reasonably) use the SSO if...

Jonathan M Davis via Digitalmars-d (8/13) Jun 02 2016 Well, ref-counting vs GC aside, optimizations like that are actually

Jon Degenhardt (39/43) Jun 07 2016 Hopefully my perspective on auto-decoding topic is useful rather

FilippoR (6/6) Jun 07 2016 It's possible to add a new alias bstring immutable(ubyte)[]

Walter Bright <newshound2 digitalmars.com> writes:

It is not practical to just delete or deprecate autodecode - it is too embedded 
into things. What we can do, however, is stop using it ourselves and stop 
relying on it in the documentation, much like [] is eschewed in favor of 
std::vector in C++.

The way to deal with it is to replace reliance on autodecode with .byDchar 
(.byDchar has a bonus of not throwing an exception on invalid UTF, but using 
the replacement dchar instead.)

To that end, and this will be an incremental process:

1. Temporarily break autodecode such that using it will cause a compile error. 
Then, see what breaks in Phobos and fix those to use .byDchar

2. Change examples in the documentation and the Phobos examples to use .byDchar

3. Best practices should use .byDchar, .byWchar, .byChar, .byCodeUnit when 
dealing with ranges/arrays of characters to make it clear what is happening.
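
As a rough illustration of the difference described above (a sketch only, assuming a Phobos in which autodecoding is still present; the helper names countUnits, countPoints, and hasReplacement are hypothetical and not part of any library):

```d
import std.algorithm.searching : canFind;
import std.exception : assertThrown;
import std.range.primitives : front, walkLength;
import std.utf : byCodeUnit, byDchar, UTFException;

// Hypothetical helpers, named here only for illustration.
size_t countUnits(string s)  { return s.byCodeUnit.walkLength; } // UTF-8 code units
size_t countPoints(string s) { return s.byDchar.walkLength; }    // decoded code points

// True if decoding with byDchar produced the replacement character U+FFFD.
bool hasReplacement(string s)
{
    dchar repl = 0xFFFD;
    return s.byDchar.canFind(repl);
}

void main()
{
    string s = "h\u00E9llo";       // U+00E9 takes two UTF-8 code units
    assert(countUnits(s) == 6);
    assert(countPoints(s) == 5);
    assert(s.walkLength == 5);     // implicit autodecoding: same count as byDchar

    string bad = ['a', 'b', cast(char) 0xFF]; // ends in an invalid UTF-8 byte
    assert(hasReplacement(bad));   // byDchar substitutes U+FFFD, no exception
    string tail = bad[2 .. $];
    assertThrown!UTFException(tail.front); // the autodecoding front throws instead
}
```

The explicit calls say at the call site which unit is being iterated, which is the point of step 3.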

May 31 2016

Stefan Koch <uplink.coder googlemail.com> writes:

On Wednesday, 1 June 2016 at 00:46:04 UTC, Walter Bright wrote:
 It is not practical to just delete or deprecate autodecode - it 
 is too embedded into things.

Which things?
 The way to deal with it is to replace reliance on autodecode 
 with .byDchar (.byDchar has a bonus of not throwing an 
 exception on invalid UTF, but using the replacement dchar 
 instead.)

 To that end, and this will be an incremental process:
 ....

So does this mean we intend to carry the auto-decoding wart with 
us into the future, telling everyone:
"The obvious way is broken; we just have it for backwards 
compatibility"?

To come back to C++ [] vs. std::vector:

They actually have valid reasons, mainly C compatibility, 
to keep [] as a pointer.

I believe that, as of now, D is still flexible enough to make a radical 
change.
We cannot keep putting this off!

It is only going to get harder to remove it.

May 31 2016

Walter Bright <newshound2 digitalmars.com> writes:

On 5/31/2016 5:56 PM, Stefan Koch wrote:
 It is only going to get harder to remove it.

Removing it from Phobos and adjusting the documentation as I suggested is the 
way forward regardless. If we can't get that done, how can we tell our users 
they have to do the same to their code?

May 31 2016

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Wed, Jun 01, 2016 at 12:56:03AM +0000, Stefan Koch via Digitalmars-d wrote:
 On Wednesday, 1 June 2016 at 00:46:04 UTC, Walter Bright wrote:
 It is not practical to just delete or deprecate autodecode - it is
 too embedded into things.

 Which Things ?

 The way to deal with it is to replace reliance on autodecode with
 .byDchar (.byDchar has a bonus of not throwing an exception on invalid
 UTF, but using the replacement dchar instead.)

 
 To that end, and this will be an incremental process:
 ....

 
 So does this mean we intend to carry the auto-decoding wart with us
 into the future, telling everyone:
 "The obvious way is broken; we just have it for backwards
 compatibility"?

If we can pull off what Walter proposed, it will put us one step closer
to killing autodecode for good. Killing autodecode today is very drastic
and unwise to do in one fell swoop. I see .byDchar as a first step.

First it's introduced as an optional feature so that people can start
using it. We promote its usage everywhere.

Then we make it a deprecation to *not* use it, perhaps with a migration
compiler switch so that people are not forced to migrate immediately,
but they are warned beforehand.

After enough time elapses, the compiler switch becomes the default, with
an option to disable it if the user so chooses.

Then after another while the switch is removed and using .byDchar
becomes required.

Finally, autodecoding is relegated to the dustbin of history and there
will be much rejoicing. :-P  I will personally savor every moment of
pressing the delete-line command in my editor while making the PR to
finally kill off the last of the autodecoding code.


T

-- 
Famous last words: I *think* this will work...

May 31 2016

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 5/31/16 8:46 PM, Walter Bright wrote:
 It is not practical to just delete or deprecate autodecode - it is too
 embedded into things. What we can do, however, is stop using it
 ourselves and stop relying on it in the documentation, much like [] is
 eschewed in favor of std::vector in C++.

 The way to deal with it is to replace reliance on autodecode with
 .byDchar (.byDchar has a bonus of not throwing an exception on invalid
 UTF, but using the replacement dchar instead.)

 To that end, and this will be an incremental process:

 1. Temporarily break autodecode such that using it will cause a compile
 error. Then, see what breaks in Phobos and fix those to use .byDchar

 2. Change examples in the documentation and the Phobos examples to use
 .byDchar

 3. Best practices should use .byDchar, .byWchar, .byChar, .byCodeUnit
 when dealing with ranges/arrays of characters to make it clear what is
 happening.

I gotta be honest, if the end of this tunnel doesn't have a char[] array 
which acts like an array in all circumstances, I see little point in 
changing anything.
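
To make concrete what "acts like an array" means here, the following sketch checks the range traits at compile time. It assumes a Phobos that still autodecodes; actsLikeArray is a hypothetical helper introduced only for this example.

```d
import std.range.primitives : ElementType, hasLength, isRandomAccessRange;
import std.utf : byCodeUnit;

// Hypothetical helper: does R expose the basic array-like range interface?
enum actsLikeArray(R) = hasLength!R && isRandomAccessRange!R;

// Through the range interface, char[] reports dchar elements and
// withholds length and random access.
static assert(is(ElementType!(char[]) == dchar));
static assert(!actsLikeArray!(char[]));

// Non-character arrays, and strings wrapped with byCodeUnit, behave like arrays.
static assert(actsLikeArray!(int[]));
alias Units = typeof("".byCodeUnit);
static assert(actsLikeArray!Units);
static assert(is(ElementType!Units == immutable char));

void main()
{
    // Plain indexing still yields char, unlike the range view above.
    char[] a = "x".dup;
    static assert(is(typeof(a[0]) == char));
}
```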

-Steve

May 31 2016

Adam D. Ruppe <destructionator gmail.com> writes:

On Wednesday, 1 June 2016 at 00:46:04 UTC, Walter Bright wrote:
 It is not practical to just delete or deprecate autodecode

Yes, it is.

We need to stop holding on to the mistakes of the past. 9 of 10 
dentists agree that autodecoding is a mistake. Not just WAS a 
mistake, IS a mistake. It has ongoing cost. If we don't fix our 
attitude about these problems, we are going to turn into that 
very demon we despise, yea, even the next C++!

And that's not a good thing.

 To that end, and this will be an incremental process:

I have a better one, that we discussed on IRC last night:

1) put the string overloads for front and popFront on a version 
switch:

version(string_migration)
deprecated void popFront(T)(ref T t) if(isSomeString!T) {
   static assert(0, "this is crap, fix your code.");
}
else
deprecated("use -version=string_migration to fix your buggy code, would you like to know more?")
/* existing popFront here */


At the same time, make sure the various byWhatever functions and 
structs are easily available.

Our preliminary investigation found about 130 places in Phobos 
that need to be changed. That's not hard to fix! The static 
assert(0) version tells you the top-level call that triggered it. 
You go there, you add .byDchar or whatever, and recompile, it 
just works, migration achieved. Or better yet, you think about 
your code and fix it properly, boom, code quality improved.

D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
QUALITY WITH A SIMPLE MIGRATION PATH!!!!!!!!!!!!!!!!!!!!


2) After a while, we swap the version conditions, so opting into 
it preserves the old behavior for a while.

3) A wee bit longer, we exterminate all this autodecoding crap 
and enjoy Phobos being a smaller, more efficient library.
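
The version-switch idea in step 1 could be prototyped outside Phobos roughly as below. This is a sketch under stated assumptions: the local popFront is a hypothetical stand-in (the real overload lives in std.range.primitives and is not touched), and countCodePoints is an invented example of already-migrated code.

```d
static import std.range.primitives;
import std.traits : isSomeString;
import std.utf : byDchar;

// Hypothetical stand-in for the string overload of popFront.
version (string_migration)
{
    // Opted in: any implicit decoding of a string fails to compile, and the
    // error points at the call that instantiated the template.
    void popFront(T)(ref T t) if (isSomeString!T)
    {
        static assert(0, "implicit autodecoding; use .byDchar or .byCodeUnit");
    }
}
else
{
    // Default: old behaviour, with a deprecation message as a nudge.
    deprecated("build with -version=string_migration to find implicit decoding")
    void popFront(T)(ref T t) if (isSomeString!T)
    {
        std.range.primitives.popFront(t);
    }
}

// Migrated code: decoding is explicit, so it builds under either setting.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s.byDchar)
        ++n;
    return n;
}

void main()
{
    assert(countCodePoints("a\u00F1b") == 3);
}
```

Because the static assert sits inside a template, it only fires where a string is actually passed to popFront, which is what makes the error message usable as a to-do list.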

May 31 2016

Walter Bright <newshound2 digitalmars.com> writes:

On 5/31/2016 6:36 PM, Adam D. Ruppe wrote:
 Our preliminary investigation found about 130 places in Phobos that need to be
 changed. That's not hard to fix!

PRs please!

May 31 2016

Jack Stouffer <jack jackstouffer.com> writes:

On Wednesday, 1 June 2016 at 02:36:15 UTC, Walter Bright wrote:
 On 5/31/2016 6:36 PM, Adam D. Ruppe wrote:
 Our preliminary investigation found about 130 places in Phobos 
 that need to be
 changed. That's not hard to fix!

 PRs please!

https://github.com/dlang/phobos/pull/4322

Jun 01 2016

Adam D. Ruppe <destructionator gmail.com> writes:

On Wednesday, 1 June 2016 at 02:36:15 UTC, Walter Bright wrote:
 PRs please!

https://github.com/dlang/phobos/pull/4384

You'll notice it is closed.

Now, that one wasn't meant to be merged anyway, but Andrei seems 
to have zero interest in actually accepting the change. That 
doesn't encourage further work.

Jun 01 2016

Walter Bright <newshound2 digitalmars.com> writes:

On 6/1/2016 8:51 PM, Adam D. Ruppe wrote:
 On Wednesday, 1 June 2016 at 02:36:15 UTC, Walter Bright wrote:
 PRs please!

 https://github.com/dlang/phobos/pull/4384

 You'll notice it is closed.

 Now, that one wasn't meant to be merged anyway, but Andrei seems to have zero
 interest in actually accepting the change. That doesn't encourage further work.

Andrei is in favor of fixing Phobos so it does not depend on autodecode. He is, 
however, rightfully concerned about the extent of breakage that would happen if 
autodecode were removed. So am I.

Interestingly, when I tried to remove autodecoding from path/file code a couple 
years ago, I received quite a bit of resistance. It seems there's been a 
tectonic shift in opinion on autodecode.

What I'd like to see, that we all agree on, is progress in removing autodecode 
reliance from Phobos. Let's see what it takes.

Jun 01 2016

Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Wednesday, June 01, 2016 21:42:49 Walter Bright via Digitalmars-d wrote:
 On 6/1/2016 8:51 PM, Adam D. Ruppe wrote:
 On Wednesday, 1 June 2016 at 02:36:15 UTC, Walter Bright wrote:
 PRs please!

 https://github.com/dlang/phobos/pull/4384

 You'll notice it is closed.

 Now, that one wasn't meant to be merged anyway, but Andrei seems to have
 zero interest in actually accepting the change. That doesn't encourage
 further work.

 Andrei is in favor of fixing Phobos so it does not depend on autodecode. He
 is, however, rightfully concerned about the extent of breakage that would
 happen if autodecode were removed. So am I.

Just pulling the trigger is far too big a breaking change, and I think that
a number of us who are strongly in favor of getting rid of auto-decoding
agree with you on that. At some point, we may very well need to decide
between breaking code and permanently carrying this technical debt, but the
first thing to do is to work towards mitigating the breaking changes as much
as possible. Then at least, if we do break code, the impact is much lower.

 Interestingly, when I tried to remove autodecoding from path/file code a
 couple years ago, I received quite a bit of resistance. It seems there's
 been a tectonic shift in opinion on autodecode.

A number of us very much liked the idea at first, but part of the problem
was that a lot of us didn't understand Unicode well enough at the time. And
as we've come to better understand it, we've seen how poor a design decision
it is to auto-decode. Also, the number of questions and complaints that
we've had to field over time with regards to auto-decoding has helped
highlight how problematic it is from a usability standpoint. So, most of us
who didn't understand well enough up front have learned better.

 What I'd like to see, that we all agree on, is progress in removing
 autodecode reliance from Phobos. Let's see what it takes.

Agreed.

- Jonathan M Davis

Jun 02 2016

Adam D. Ruppe <destructionator gmail.com> writes:

On Thursday, 2 June 2016 at 04:42:49 UTC, Walter Bright wrote:
 Andrei is in favor of fixing Phobos so it does not depend on 
 autodecode.

Putting the autodecode functions on a compiler switch (with 
-version) is the most straightforward way to achieve that.

We'd have a transition period where people can keep the existing 
behavior or throw the switch and get compile errors - with a 
dead-simple "just add .byCodePoint on this line" fix - to migrate 
their code.

Phobos would be fixed in a day. Everyone else would have up to a 
couple years to fix their code (which, again, is as simple as 
throwing a compiler switch and mechanically adding .byCodePoint* 
where the static asserts tell you to) as we work through the slow 
deprecation cycle.

But then, we'd have light at the end of the tunnel: after this 
deprecation cycle completes, we can kill hundreds of lines of 
confusing, worthless functions. Existing functions that don't 
work with ranges of chars will be able to without trouble. 
Newbies will never again ask "wtf" when they see string.whatever 
yielding dchar[].

* Or byGrapheme or .byCodeUnit or whatever if you want to take 
the time to actually fix the fundamental question of the code, 
but just slapping .byCodePoint in there reverts to the same 
behavior of autodecode.
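
The three levels mentioned in the footnote give different answers for the same string. The sketch below uses std.utf.byDchar for the code-point level (an assumption made for this example, standing in for the ".byCodePoint" spelling used above); the three helper functions are hypothetical names.

```d
import std.range.primitives : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

// Hypothetical helpers: count the same text at three different levels.
size_t codeUnits(string s)  { return s.byCodeUnit.walkLength; }
size_t codePoints(string s) { return s.byDchar.walkLength; }
size_t graphemes(string s)  { return s.byGrapheme.walkLength; }

void main()
{
    // 'e' followed by U+0301 (combining acute accent): one visible character.
    string s = "e\u0301";
    assert(codeUnits(s) == 3);   // 1 byte for 'e' + 2 bytes for U+0301
    assert(codePoints(s) == 2);  // what autodecoding iterates
    assert(graphemes(s) == 1);   // what a user would call one character
}
```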

 Interestingly, when I tried to remove autodecoding from 
 path/file code a couple years ago, I received quite a bit of 
 resistance. It seems there's been a tectonic shift in opinion 
 on autodecode.

Quite a few of us were incompetent ignoramuses on the topic of 
Unicode years ago. That's where the autodecoding mistake came 
from: people smart enough to know UTF-8 from UTF-32, but not 
smart enough to know the real world application of Unicode.

It's one thing to make a mistake. Everyone does that sometimes, 
and nobody is born knowing complex issues. What matters is if 
you're willing to learn new information and correct your errors.

Jun 02 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 06/02/2016 09:34 AM, Adam D. Ruppe wrote:
 On Thursday, 2 June 2016 at 04:42:49 UTC, Walter Bright wrote:
 Andrei is in favor of fixing Phobos so it does not depend on autodecode.

 Putting the autodecode functions on a compiler switch (with -version) is
 the most straightforward way to achieve that.

 We'd have a transition period where people can keep the existing
 behavior or throw the switch and get compile errors - with a dead-simple
 "just add .byCodePoint on this line" fix - to migrate their code.

That is not going to happen.

 It's one thing to make a mistake. Everyone does that sometimes, and
 nobody is born knowing complex issues. What matters is if you're willing
 to learn new information and correct your errors.

The real ticket out of this is RCStr. It solves a major problem in the 
language (compulsive GC) and also a minor occasional annoyance 
(autodecoding).


Andrei

Jun 02 2016

deadalnix <deadalnix gmail.com> writes:

On Thursday, 2 June 2016 at 14:29:28 UTC, Andrei Alexandrescu 
wrote:
 It's one thing to make a mistake. Everyone does that 
 sometimes, and
 nobody is born knowing complex issues. What matters is if 
 you're willing
 to learn new information and correct your errors.

 The real ticket out of this is RCStr. It solves a major problem 
 in the language (compulsive GC) and also a minor occasional 
 annoyance (autodecoding).

You start to sound like a car salesman. I know nothing about 
RCStr, but I'm already starting to resent it.

Jun 02 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 06/02/2016 10:44 AM, deadalnix wrote:
 On Thursday, 2 June 2016 at 14:29:28 UTC, Andrei Alexandrescu wrote:
 It's one thing to make a mistake. Everyone does that sometimes, and
 nobody is born knowing complex issues. What matters is if you're willing
 to learn new information and correct your errors.

 The real ticket out of this is RCStr. It solves a major problem in the
 language (compulsive GC) and also a minor occasional annoyance
 (autodecoding).

 You start to sound like a car salesman.

I assume that means overselling or false advertising. Where do either of 
these happen? -- Andrei

Jun 02 2016

deadalnix <deadalnix gmail.com> writes:

On Thursday, 2 June 2016 at 15:03:34 UTC, Andrei Alexandrescu 
wrote:
 You start to sound like a car salesman.

 I assume that means overselling or false advertising. Where do 
 either of these happen? -- Andrei

For SDC for instance, autodecode is a problem (in fact, it is the 
very reason I abandoned making the lexer usable as a standalone) 
while RCStr would not help one bit as strings are pretty much 
never manipulated directly anywhere.

More generally, using RCStr is at best sidestepping the issue 
rather than solving it.

On the GC side of the issue, I think there are also 
overstatements.

Jun 02 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 06/02/2016 11:14 AM, deadalnix wrote:
 On Thursday, 2 June 2016 at 15:03:34 UTC, Andrei Alexandrescu wrote:
 You start to sound like a car salesman.

 I assume that means overselling or false advertising. Where do either
 of these happen? -- Andrei

 For SDC for instance, autodecode is a problem (in fact, it is the very
 reason I abandoned making the lexer usable as a standalone) while RCStr
 would not help one bit as strings are pretty much never manipulated
 directly anywhere.

Well I'm not sure how SDC works.

 More generally, using RCStr is at best sidestepping the issue rather
 than solving it.

What is the issue?

 On the GC side of the issue, I think there are also overstatements.

What are those?


Andrei

Jun 02 2016

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Thu, Jun 02, 2016 at 02:44:21PM +0000, deadalnix via Digitalmars-d wrote:
 On Thursday, 2 June 2016 at 14:29:28 UTC, Andrei Alexandrescu wrote:
 It's one thing to make a mistake. Everyone does that sometimes,
 and nobody is born knowing complex issues. What matters is if
 you're willing to learn new information and correct your errors.

 
 The real ticket out of this is RCStr. It solves a major problem in
 the language (compulsive GC) and also a minor occasional annoyance
 (autodecoding).
 

 
 You start to sound like a car salesman. I know nothing about RCStr,
 but I'm already starting to resent it.

Same here. It's starting to sound like some unproven newfangled
contraption designed to please the GC-phobic crowd who believe that RC
is the answer to life, the universe, and everything, and who may not
actually adopt D even after we've broken our backs bending over
backwards for them.  (And with a subject like "our sister", this RCStr
business does not sound very appealing at all.)  Whatever happened to
improving *current* string handling for *current* users?

It's making forking Phobos look like a less distant possibility than I
had anticipated. :-(


T

-- 
People say I'm arrogant, and I'm proud of it.

Jun 02 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 06/02/2016 11:03 AM, H. S. Teoh via Digitalmars-d wrote:
 On Thu, Jun 02, 2016 at 02:44:21PM +0000, deadalnix via Digitalmars-d wrote:
 On Thursday, 2 June 2016 at 14:29:28 UTC, Andrei Alexandrescu wrote:
 It's one thing to make a mistake. Everyone does that sometimes,
 and nobody is born knowing complex issues. What matters is if
 you're willing to learn new information and correct your errors.

 The real ticket out of this is RCStr. It solves a major problem in
 the language (compulsive GC) and also a minor occasional annoyance
 (autodecoding).

 You start to sound like a car salesman. I know nothing about RCStr,
 but I'm already starting to resent it.

 Same here. It's starting to sound like some unproven newfangled
 contraption designed to please the GC-phobic crowd who believe that RC
 is the answer to life, the universe, and everything, and who may not
 actually adopt D even after we've broken our backs bending over
 backwards for them.

I'm sorry, this is completely ridiculous. What is unproven? Reference 
counting is a long-standing success story for string handling. I'm using 
it because it's good, not to woo users.

 (And with a subject like "our sister", this RCStr
 business does not sound very appealing at all.)

I'm glad this is mentioned as one of the issues with RCStr.

 Whatever happened to
 improving *current* string handling for *current* users?

RCStr will improve string handling for current users.

 It's making forking Phobos look like a less distant possibility than I
 had anticipated. :-(

So you'd fork Phobos because... it adds a good string type?


Andrei

Jun 02 2016

Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Thursday, June 02, 2016 10:29:28 Andrei Alexandrescu via Digitalmars-d 
wrote:
 The real ticket out of this is RCStr. It solves a major problem in the
 language (compulsive GC) and also a minor occasional annoyance
 (autodecoding).

Unless we're outright getting rid of string, char[], wstring, etc., RCStr
clearly doesn't solve the auto-decoding problem. It will allow a lot of code
to sidestep it, but the existing types will continue to exist and be used
and have to deal with auto-decoding. And every function that works on
strings that cares about efficiency is going to have to continue to special
case strings to avoid auto-decoding.
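
A small sketch of the kind of special-casing meant here, assuming a Phobos that still autodecodes; countSpaces is a hypothetical function written only for this example.

```d
import std.range.primitives : isInputRange;
import std.traits : isNarrowString;
import std.utf : byCodeUnit;

// Hypothetical generic function: count ASCII spaces in any input range.
size_t countSpaces(R)(R r) if (isInputRange!R)
{
    static if (isNarrowString!R)
    {
        // Special case: searching for an ASCII character is safe on raw
        // code units, so skip decoding entirely.
        return countSpaces(r.byCodeUnit);
    }
    else
    {
        size_t n;
        foreach (c; r)
            if (c == ' ')
                ++n;
        return n;
    }
}

void main()
{
    assert(countSpaces("a b \u00E9 c") == 3);   // string: takes the special case
    assert(countSpaces("a b c"d) == 2);         // dstring: generic path
    assert(countSpaces([' ', 'x', ' ']) == 2);  // char[]: special case again
}
```

Without the static if branch the function would still be correct for strings, but every element would be decoded to dchar first.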

- Jonathan M Davis

Jun 02 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 06/02/2016 11:26 AM, Jonathan M Davis via Digitalmars-d wrote:
 Unless we're outright getting rid of string, char[], wstring, etc., RCStr
 clearly doesn't solve the auto-decoding problem.

It does if you use it. If you don't, it doesn't. -- Andrei

Jun 02 2016

Chris <wendlec tcd.ie> writes:

On Thursday, 2 June 2016 at 13:34:18 UTC, Adam D. Ruppe wrote:
 On Thursday, 2 June 2016 at 04:42:49 UTC, Walter Bright wrote:
 Andrei is in favor of fixing Phobos so it does not depend on 
 autodecode.

 Putting the autodecode functions on a compiler switch (with 
 -version) is the most straightforward way to achieve that.

 We'd have a transition period where people can keep the 
 existing behavior or throw the switch and get compile errors - 
 with a dead-simple "just add .byCodePoint on this line" fix - 
 to migrate their code.

 Phobos would be fixed in a day. Everyone else would have up to 
 a couple years to fix their code (which, again, is as simple as 
 throwing a compiler switch and mechanically adding 
 .byCodePoint* where the static asserts tell you to) as we work 
 through the slow deprecation cycle.

 But then, we'd have light at the end of the tunnel: after this 
 deprecation cycle completes, we can kill hundreds of lines of 
 confusing, worthless functions. Existing functions that don't 
 work with ranges of chars will be able to without trouble. 
 Newbies will never again ask "wtf" when they see 
 string.whatever yielding dchar[].

 * Or byGrapheme or .byCodeUnit or whatever if you want to take 
 the time to actually fix the fundamental question of the code, 
 but just slapping .byCodePoint in there reverts to the same 
 behavior of autodecode.

I would love to have a compiler switch and finally be able to rid 
my code of auto decoding [1], once and for all - and get a free 
performance boost. There's so much talk about code that _might_ 
break, when we don't even know how much code would actually 
break. It's absurd: we remain inert out of fear of the unknown, 
while it would be pretty easy to just test it and find out 
(std.path is actually a precedent). And it wouldn't even be a 
breaking change in the sense that you cannot go on developing 
with D's latest version because you're stuck with a stone age 
version of dmd forever.

Much in the same vein, I don't know if we should make the 
question of auto decode dependent on RCString. This will take at 
least another few months of bikeshedding, while what we need to 
do is get rid (or start to get rid) of auto decode right now - 
and maybe this process will teach us something that will later be 
useful when implementing RCString.

[1] As I already mentioned here
http://forum.dlang.org/post/yzeiqvphrqdcmaxaspvx forum.dlang.org

[snip]

Jun 02 2016

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On 05/31/2016 09:36 PM, Adam D. Ruppe wrote:
 version(string_migration)
 deprecated void popFront(T)(ref T t) if(isSomeString!T) {
    static assert(0, "this is crap, fix your code.");
 }
 else
 deprecated("use -version=string_migration to fix your buggy code, would you like to know more?")
 /* existing popFront here */

I vote we use Adam's exact verbiage, too! :)

 D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE QUALITY
 WITH A SIMPLE MIGRATION PATH!!!!!!!!!!!!!!!!!!!!

Yes. This. If I wanted an endless bucket of baggage, I'd have stuck with 
C++.

 3) A wee bit longer, we exterminate all this autodecoding crap and enjoy
 Phobos being a smaller, more efficient library.

Yay! Profit!

May 31 2016

Seb <seb wilzba.ch> writes:

On Wednesday, 1 June 2016 at 02:39:55 UTC, Nick Sabalausky wrote:
 On 05/31/2016 09:36 PM, Adam D. Ruppe wrote:
 version(string_migration)
 deprecated void popFront(T)(ref T t) if(isSomeString!T) {
    static assert(0, "this is crap, fix your code.");
 }
 else
 deprecated("use -version=string_migration to fix your buggy code, would you like to know more?")
 /* existing popFront here */

 I vote we use Adam's exact verbiage, too! :)

 D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
 QUALITY
 WITH A SIMPLE MIGRATION PATH!!!!!!!!!!!!!!!!!!!!

 Yes. This. If I wanted an endless bucket of baggage, I'd have 
 stuck with C++.

 3) A wee bit longer, we exterminate all this autodecoding crap 
 and enjoy
 Phobos being a smaller, more efficient library.

 Yay! Profit!


How about a poll?

http://www.polljunkie.com/poll/ftmibx/remove-auto-decoding-in-d

Results are shown after casting a vote or here:
http://www.polljunkie.com/poll/aqzbwg/remove-auto-decoding-in-d/view

Jun 01 2016

Seb <seb wilzba.ch> writes:

On Wednesday, 1 June 2016 at 11:42:06 UTC, Seb wrote:
 On Wednesday, 1 June 2016 at 02:39:55 UTC, Nick Sabalausky 
 wrote:
 On 05/31/2016 09:36 PM, Adam D. Ruppe wrote:
 version(string_migration)
 deprecated void popFront(T)(ref T t) if(isSomeString!T) {
    static assert(0, "this is crap, fix your code.");
 }
 else
 deprecated("use -version=string_migration to fix your buggy code, would you like to know more?")
 /* existing popFront here */

 I vote we use Adam's exact verbiage, too! :)

 D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
 QUALITY
 WITH A SIMPLE MIGRATION PATH!!!!!!!!!!!!!!!!!!!!

 Yes. This. If I wanted an endless bucket of baggage, I'd have 
 stuck with C++.

 3) A wee bit longer, we exterminate all this autodecoding 
 crap and enjoy
 Phobos being a smaller, more efficient library.

 Yay! Profit!


 How about a poll?

 http://www.polljunkie.com/poll/ftmibx/remove-auto-decoding-in-d

 Results are shown after casting a vote or here:
 http://www.polljunkie.com/poll/aqzbwg/remove-auto-decoding-in-d/view

Just FYI after a short period of ten hours we got the following 
45 responses:

Yes, with fire! (hobby user)                                          77% (35)
Yeah remove that special behavior (professional user)                 35% (16)
Wait that is what auto decoding is? wah ugh...                         8% (4)
I don't always decode codeunits, but when I do I use byDChar already   6% (3)

Jun 01 2016

poliklosio <poliklosio happypizza.com> writes:

On Thursday, 2 June 2016 at 00:14:30 UTC, Seb wrote:

 Just FYI after a short period of ten hours we got the following 
 45 responses:

 Yes, with fire! (hobby user)                                          77% (35)
 Yeah remove that special behavior (professional user)                 35% (16)
 Wait that is what auto decoding is? wah ugh...                         8% (4)
 I don't always decode codeunits, but when I do I use byDChar already   6% (3)

You failed to mention that there were additional answers:

Auto-decoding is great!
0% (0)
No, please don't break my code.
0% (0)

I think those zeroes are actually the most important part of the 
results. :)

Jun 01 2016

Joakim <dlang joakim.fea.st> writes:

On Thursday, 2 June 2016 at 06:53:49 UTC, poliklosio wrote:
 On Thursday, 2 June 2016 at 00:14:30 UTC, Seb wrote:

 Just FYI after a short period of ten hours we got the 
 following 45 responses:

 Yes, with fire! (hobby user)                                          77% (35)
 Yeah remove that special behavior (professional user)                 35% (16)
 Wait that is what auto decoding is? wah ugh...                         8%  (4)
 I don't always decode codeunits, but when I do I use byDChar already   6%  (3)

 You failed to mention that there were additional answers:

 Auto-decoding is great!
 0% (0)
 No, please don't break my code.
 0% (0)

 I think those zeroes are actually the most important part of 
 the results. :)

It has been noted many times that forum users are a small part of 
the D userbase, likely the ones who are the most interested in 
evolving the language and thus biased towards changes.  As a 
forum user myself, I'm in that group too and agree with Walter 
that D programmers should be guided by Phobos to explicitly 
declare what level of decoding they want, but this poll may not 
be representative of the wider userbase.

We'll likely only find out what they think once we're a couple 
dmd releases into these changes, as Walter found when he 
submitted PRs for file/path code sometime back.

Jun 02 2016

poliklosio <poliklosio happypizza.com> writes:

On Thursday, 2 June 2016 at 07:21:28 UTC, Joakim wrote:
 On Thursday, 2 June 2016 at 06:53:49 UTC, poliklosio wrote:
 (...)

 It has been noted many times that forum users are a small part 
 of the D userbase, likely the ones who are the most interested 
 in evolving the language and thus biased towards changes.  As a 
 forum user myself, I'm in that group too and agree with Walter 
 that D programmers should be guided by Phobos to explicitly 
 declare what level of decoding they want, but this poll may not 
 be representative of the wider userbase.

 We'll likely only find out what they think once we're a couple 
 dmd releases into these changes, as Walter found when he 
 submitted PRs for file/path code sometime back.

It's not representative, but there is going to be at least some 
weak correlation between the forum and the professional world. We are 
developers after all. Out of 16 professional users, none selected the 
"Please, don't break my code" option, which suggests that there is 
some hope that a change wouldn't be that damaging. Of course 
further investigation would be needed to confirm that hypothesis. 
But at least we didn't prove that such an investigation is a waste 
of time.

Also, on the issue of wanting/not wanting autodecoding as a 
feature (ignoring the code breakage issue), 0 out of 55 people 
actually want autodecoding. I think it's improbable that most 
users outside the forum would have the opposite view. You would 
have at least some of this reflected in the poll.

So the poll does tell something, you just have to know how not to 
overinterpret the results. :)

Jun 02 2016

Kirill Kryukov <kkryukov gmail.com> writes:

On Wednesday, 1 June 2016 at 01:36:43 UTC, Adam D. Ruppe wrote:
 D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
 QUALITY WITH A SIMPLE MIGRATION PATH!!!!!!!!!!!!!!!!!!!!

This.

I only recently started full scale use of D, but I lurked here 
for years. D has a few quirks here and there, but overall it's a 
fantastic language. However, the biggest off-putting factor for me 
is the attitude of the leadership towards fixing the issues and 
completing the language.

The idea of autodecoding comes naturally to someone who has only 
recently discovered Unicode. Whoa, instead of code pages 
we now have "unicode code points". Great. Only much later does the 
person realize that working with code points isn't always 
correct. So I don't blame anyone for designing/implementing 
autodecoding years ago. But not acknowledging now that autodecoding 
is seriously wrong looks like complete brain damage.

The entire community seems united in the view that autodecoding 
is both slow and usually wrong. The users are begging for this 
breaking change. There are a number of approaches to handling 
the deprecation. Even the code that for some reason really needs 
to work with code points will benefit from explicitly stating 
that it needs code points. But no, we must endure this madness 
forever.

I realize that the priorities of a language user might be different 
from those of the language leadership. With autodecoding fixed 
(removed), the user gets a cleaner language. Their programs 
work faster and are easier to reason about. The user's brain cycles 
are not wasted on useless crap like working around autodecoding.

On the other hand, the language/stdlib designer now has to admit 
their initial design was sub-optimal. Their books and articles 
are now obsolete. And they will be the ones who receive 
complaints from the inevitable few upset with the change.

However keeping the current situation means for me personally: 1. 
Not switching to D wholesale, but just toying with it. 2. Even 
when using D for work I don't want to talk about it to others. I 
was seriously thinking about starting a D-learning seminar at 
work, and I still might, but the thought that autodecoding is 
going to stay is cooling my enthusiasm.

I just did a numerical app in D, where it shines, I think. 
However, much of my work code deals with huge texts. I don't 
want to fight with autodecode at every step. I'd like arrays of 
chars to be arrays of chars without any magic crap auto-inserted 
behind my back. I don't want to become an expert in avoiding 
language pitfalls (the reason I abandoned C++ years ago). I also 
don't want to re-implement the staple string processing routines 
(though I might, if at least the language constructs work without 
autodecode, which does not seem to be the case here).

Think about it. 99% of code working with code points is _broken_ 
anyway. (In the sense that the usual assumption is that a code 
point represents a character, while in fact it does not.) When 
working with code units, the developer will notice the problem 
right away. When working with code points, the problem is not 
apparent until years later (essentially what happened to D 
itself).

Feel free to ignore my non-D-core-dev comment. Even though I 
suspect many D users may agree with me. An even larger number of 
potential D users does not want autodecoding either.

Thanks,
Kirill

May 31 2016

poliklosio <poliklosio happypizza.com> writes:

On Wednesday, 1 June 2016 at 05:46:29 UTC, Kirill Kryukov wrote:
 On Wednesday, 1 June 2016 at 01:36:43 UTC, Adam D. Ruppe wrote:
 D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
 QUALITY WITH A SIMPLE MIGRATION PATH!!!!!!!!!!!!!!!!!!!!

 This.
 (...)
 I don't want to become an expert in avoiding language pitfalls 
 (The reason I abandoned C++ years ago).

+1
If you have too many pitfalls in the language, it's not easier to 
learn than C++, just different (regardless of the maximum 
productivity you have when using the language, that's another 
issue).
The worst case is you just want to use ASCII text and suddenly 
you have to spend weeks reading a ton of confusing stuff about 
Unicode, D and autodecoding, just to know how to use char[] 
correctly in D.
Compare that to how trivial it is to process ASCII text in, say, 
C++.
And processing just plain ASCII is a very common case, e.g. 
processing textual logs from tools.
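
A minimal sketch of that ASCII-only case, assuming a current Phobos. It sidesteps autodecoding by viewing the string as raw bytes through `std.string.representation`; the helper name `countByte` is hypothetical and only serves the illustration.

```d
import std.algorithm.searching : count;
import std.string : representation;

// Hypothetical helper: counts occurrences of one ASCII byte.
// representation() reinterprets the string as immutable(ubyte)[],
// so the range algorithm sees plain bytes and never decodes UTF-8.
size_t countByte(string text, char needle)
{
    return text.representation.count(needle);
}

unittest
{
    // Three newline-terminated log lines.
    assert(countByte("WARN a\nERROR b\nERROR c\n", '\n') == 3);
}
```

The same idea works with `std.utf.byCodeUnit` when the element type should stay `char` rather than `ubyte`.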

Jun 01 2016

default0 <Kevin.Labschek gmx.de> writes:

On Wednesday, 1 June 2016 at 01:36:43 UTC, Adam D. Ruppe wrote:
 D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
 QUALITY WITH A SIMPLE MIGRATION PATH!!!!!!!!!!!!!!!!!!!!

Agree with that very much.
Yes, you still have to think about cost/benefit for breaking 
changes, but in general when I sign up for D I expect it to throw 
out mistakes of the past so long as the correction of them is 
worth the cost of breakage.

So the cost of breakage for autodecoding is that the behaviour of 
roughly all string handling code changes. Now most of this string 
handling code was broken to begin with since VERY VERY VERY 
little string handling code ever cares about code points.
This means the code that is actually broken in terms of being 
buggy after the change when it wasn't buggy before is probably 
not a lot.
The other cost of breakage is to force a user to go through 
potentially thousands of LoC and update their string handling 
code. Personally, I find that cost dramatically reduced if there 
are two prerequisites met: Compiler Errors everywhere we have 
relied on the feature before (we can apparently do that, so 
check) and error/deprecation messages detailed enough to go into 
further reading so I can make meaningful decisions about it (we 
can also do that, I am sure, so check). If I just have to hop 
from one compiler error to the next and fix my broken code with 
confidence after having read about the context for 30-60 minutes, 
even going through vast amounts of code is not actually that big 
of a deal since you really only have to inspect a fraction of it 
(the fraction the compiler tells you about).
Another cost is if we have unmaintained 3rd party libraries, when 
we actually make the change the default in the future, they will 
stop compiling on recent compiler versions. I suppose a tool 
could be made tracking the specific compiler errors and simply 
using .byDchar to make the code "just work" exactly the way it 
used to work (ie unreliably, slowly and with bugs in string 
handling) before the change.

The cost of backwards-compatibility is also two-fold from what I 
can see:
-We will continue to be inefficient and waste time autodecoding 
by default (mobile users are going to be especially happy about 
that).
-By default, string handling code is still broken, just more 
subtly, meaning more string handling bugs in D code make it to 
production

May 31 2016

Guillaume Chatelet <chatelet.guillaume gmail.com> writes:

On Wednesday, 1 June 2016 at 01:36:43 UTC, Adam D. Ruppe wrote:
 I have a better one, that we discussed on IRC last night:

 1) put the string overloads for front and popFront on a version 
 switch:

 D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
 QUALITY WITH A SIMPLE MIGRATION PATH!!!!!!!!!!!!!!!!!!!!

 2) After a while, we swap the version conditions, so opting 
 into it preserves the old behavior for a while.

 3) A wee bit longer, we exterminate all this autodecoding crap 
 and enjoy Phobos being a smaller, more efficient library.

+1

Jun 01 2016

Andrea Fontana <nospam example.com> writes:

On Wednesday, 1 June 2016 at 08:21:36 UTC, Guillaume Chatelet 
wrote:
 On Wednesday, 1 June 2016 at 01:36:43 UTC, Adam D. Ruppe wrote:
 I have a better one, that we discussed on IRC last night:

 1) put the string overloads for front and popFront on a 
 version switch:

 D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
 QUALITY WITH A SIMPLE MIGRATION PATH!!!!!!!!!!!!!!!!!!!!

 2) After a while, we swap the version conditions, so opting 
 into it preserves the old behavior for a while.

 3) A wee bit longer, we exterminate all this autodecoding crap 
 and enjoy Phobos being a smaller, more efficient library.

 +1

+1

Jun 01 2016

Guillaume Piolat <first.last gmail.com> writes:

On Wednesday, 1 June 2016 at 01:36:43 UTC, Adam D. Ruppe wrote:
 On Wednesday, 1 June 2016 at 00:46:04 UTC, Walter Bright wrote:
 It is not practical to just delete or deprecate autodecode

 Yes, it is.

 We need to stop holding on to the mistakes of the past. 9 of 10 
 dentists agree that autodecoding is a mistake. Not just WAS a 
 mistake, IS a mistake. It has ongoing cost. If we don't fix our 
 attitude about these problems, we are going to turn into that 
 very demon we despise, yea, even the next C++!

Please, just remove auto-decoding, any way you want. I only ever 
used it once or twice voluntarily. It's a special case that must 
go.
Maybe with a flag like for -vtls.

Jun 01 2016

Kagamin <spam here.lot> writes:

On Wednesday, 1 June 2016 at 01:36:43 UTC, Adam D. Ruppe wrote:
 version(string_migration)
 deprecated void popFront(T)(ref T t) if(isSomeString!T) {
   static assert(0, "this is crap, fix your code.");
 }
 else
 deprecated("use -version=string_migration to fix your buggy code, would you like to know more?")
 /* existing popFront here */

version(autodecode_migration)
    deprecated("autodecode attempted, use byDchar instead")
    alias popFront=_d_popFront;
else
    alias popFront=_d_popFront;

void _d_popFront(T)(ref T t) if(isSomeString!T) {
    /* existing popFront here */
}

The migration branch should compile and work, or template 
constraints will silently fail. Then deprecation messages can be 
grepped. That said, does the compiler print deprecation messages 
triggered inside template constraints?
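
A self-contained toy version of the alias switch above, as a sketch only. The names `popFrontImpl` and `legacyPopFront` are hypothetical stand-ins for the real Phobos symbols, and the body is a simplified stand-in for "existing popFront here". It does not answer the question about template constraints.

```d
import std.utf : stride;

// Stand-in for the existing implementation: drops one code point.
void popFrontImpl(ref string s)
{
    s = s[stride(s, 0) .. $];
}

// With -version=autodecode_migration every use of legacyPopFront
// emits the deprecation message; without it the alias is silent.
version (autodecode_migration)
{
    deprecated("autodecode attempted, use byDchar instead")
    alias legacyPopFront = popFrontImpl;
}
else
{
    alias legacyPopFront = popFrontImpl;
}

unittest
{
    string s = "h\u00E9llo";   // U+00E9 takes two UTF-8 code units
    legacyPopFront(s);         // drops 'h'
    legacyPopFront(s);         // drops both bytes of U+00E9
    assert(s == "llo");
}
```

Both branches resolve to the same function, so code keeps compiling and behaving identically in either mode; only the diagnostics differ.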

Jun 01 2016

Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Tuesday, May 31, 2016 17:46:04 Walter Bright via Digitalmars-d wrote:
 It is not practical to just delete or deprecate autodecode - it is too
 embedded into things. What we can do, however, is stop using it ourselves
 and stop relying on it in the documentation, much like [] is eschewed in
 favor of std::vector in C++.

 The way to deal with it is to replace reliance on autodecode with .byDchar
 (.byDchar has a bonus of not throwing an exception on invalid UTF, but using
 the replacement dchar instead.)

 To that end, and this will be an incremental process:

 1. Temporarily break autodecode such that using it will cause a compile
 error. Then, see what breaks in Phobos and fix those to use .byDchar

 2. Change examples in the documentation and the Phobos examples to use
 .byDchar

 3. Best practices should use .byDchar, .byWchar, .byChar, .byCodeUnit when
 dealing with ranges/arrays of characters to make it clear what is happening.

The other critical thing is to make sure that Phobos in general works with
byDChar, byCodeUnit, etc. For instance, pretty much as soon as I started
trying to use byCodeUnit instead of naked strings, I ran into this:

https://issues.dlang.org/show_bug.cgi?id=15800

But once Phobos no longer relies on autodecoding except maybe in places
where we can't actually excise it completely without breaking code (and
hopefully there are none of those), then we can look at how feasible the
full removal of auto-decoding really is. IMHO, leaving it in is a _huge_
piece of technical debt that we don't want and probably can't afford, so I
really don't think that we should just assume that we can't remove it due to
the breakage that it would cause. But we definitely have work to do before
we can have Phobos in a state where it's reasonable to even make an attempt.
byCodeUnit and friends were a good start, but we need to make it so that
they're treated as first-class citizens, and they're not right now.

- Jonathan M Davis
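
For readers following along, a small sketch of what using byCodeUnit and byDchar instead of a naked string looks like, assuming a current Phobos. The helper names are hypothetical.

```d
import std.range.primitives : ElementType, walkLength;
import std.utf : byCodeUnit, byDchar;

// Hypothetical helpers: the caller states which unit is counted.
size_t codeUnitCount(string s)  { return s.byCodeUnit.length; }
size_t codePointCount(string s) { return s.byDchar.walkLength; }

unittest
{
    // A naked string is a range of dchar because of autodecoding,
    // while the byCodeUnit view is a range of char.
    static assert(is(ElementType!string == dchar));
    static assert(is(ElementType!(typeof("".byCodeUnit())) : char));

    string s = "h\u00E9llo";           // U+00E9 is two UTF-8 code units
    assert(codeUnitCount(s) == 6);
    assert(codePointCount(s) == 5);
}
```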

May 31 2016

Walter Bright <newshound2 digitalmars.com> writes:

On 5/31/2016 7:28 PM, Jonathan M Davis via Digitalmars-d wrote:
 The other critical thing is to make sure that Phobos in general works with
 byDChar, byCodeUnit, etc. For instance, pretty much as soon as I started
 trying to use byCodeUnit instead of naked strings, I ran into this:

 https://issues.dlang.org/show_bug.cgi?id=15800

That was posted 3 months ago. No PR to fix it (though it likely is an easy
fix). If we can't get these things fixed in Phobos, how can we tell everyone
else to fix their code?

May 31 2016

Brad Roberts via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 5/31/2016 7:40 PM, Walter Bright via Digitalmars-d wrote:
 On 5/31/2016 7:28 PM, Jonathan M Davis via Digitalmars-d wrote:
 The other critical thing is to make sure that Phobos in general works
 with
 byDChar, byCodeUnit, etc. For instance, pretty much as soon as I started
 trying to use byCodeUnit instead of naked strings, I ran into this:

 https://issues.dlang.org/show_bug.cgi?id=15800

 That was posted 3 months ago. No PR to fix it (though it likely is an
 easy fix). If we can't get these things fixed in Phobos, how can we tell
 everyone else to fix their code?

I hope that wasn't a serious question.  The answer is trivial.  The rate 
of incoming bug reports exceeds the rate of bug fixing which exceeds the 
rate of fix pulling.  Has since about the dawn of time.

May 31 2016

tsbockman <thomas.bockman gmail.com> writes:

On Wednesday, 1 June 2016 at 02:58:36 UTC, Brad Roberts wrote:
 ...the rate of bug fixing which exceeds the rate of fix pulling.

Speaking of which:
     https://github.com/dlang/phobos/pull/4345
     https://github.com/dlang/phobos/pull/3973

May 31 2016

Jack Stouffer <jack jackstouffer.com> writes:

On Wednesday, 1 June 2016 at 02:28:04 UTC, Jonathan M Davis wrote:
 The other critical thing is to make sure that Phobos in general 
 works with byDChar, byCodeUnit, etc. For instance, pretty much 
 as soon as I started trying to use byCodeUnit instead of naked 
 strings, I ran into this:

 https://issues.dlang.org/show_bug.cgi?id=15800

https://github.com/dlang/phobos/pull/4390

Jun 01 2016

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Tue, May 31, 2016 at 07:28:04PM -0700, Jonathan M Davis via Digitalmars-d
wrote:
[...]
 The other critical thing is to make sure that Phobos in general works with
 byDChar, byCodeUnit, etc. For instance, pretty much as soon as I started
 trying to use byCodeUnit instead of naked strings, I ran into this:
 
 https://issues.dlang.org/show_bug.cgi?id=15800

This is an example of current Phobos code assuming (sometimes
implicitly) that strings are ranges of dchar, which leads to subtle
breakage like this one:

	https://issues.dlang.org/show_bug.cgi?id=15972


T

-- 
"640K ought to be enough" -- Bill G. (allegedly), 1984.
"The Internet is not a primary goal for PC usage" -- Bill G., 1995.
"Linux has no impact on Microsoft's strategy" -- Bill G., 1999.

May 31 2016

Jacob Carlborg <doob me.com> writes:

On 2016-06-01 02:46, Walter Bright wrote:
 It is not practical to just delete or deprecate autodecode - it is too
 embedded into things. What we can do, however, is stop using it
 ourselves and stop relying on it in the documentation, much like [] is
 eschewed in favor of std::vector in C++.

 The way to deal with it is to replace reliance on autodecode with
 .byDchar

Don't you get the same behavior using byDchar as with autodecode?

-- 
/Jacob Carlborg

May 31 2016

Walter Bright <newshound2 digitalmars.com> writes:

On 5/31/2016 11:57 PM, Jacob Carlborg wrote:
 The way to deal with it is to replace reliance on autodecode with
 .byDchar

 Don't you get the same behavior using byDchar as with autodecode?


Yes (except that byDchar returns the replacement char on invalid Unicode, while 
autodecode throws an exception). But the point is that byDchar is opt-in.
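
A minimal sketch of that difference, assuming a current Phobos. The helper names are hypothetical, and the invalid byte is placed at the end of the input so the result does not depend on how many code units the decoder skips after an error.

```d
import std.array : array;
import std.range.primitives : front;
import std.utf : byDchar, UTFException;

// Hypothetical helper: explicit, opt-in decoding via byDchar.
dchar[] decodeExplicit(string s)
{
    return s.byDchar.array;
}

// Hypothetical helper: reports whether the autodecoding front throws.
bool autodecodeThrows(string s)
{
    try
    {
        auto c = s.front;   // decodes the first code point
        return false;
    }
    catch (UTFException)
    {
        return true;
    }
}

unittest
{
    // 'a', 'b', then a lone continuation byte (invalid UTF-8).
    immutable(ubyte)[] raw = [0x61, 0x62, 0x80];
    string bad = cast(string) raw;

    assert(decodeExplicit(bad) == "ab\uFFFD"d);  // replacement char
    assert(autodecodeThrows(bad[2 .. $]));       // exception
    assert(!autodecodeThrows(bad));              // 'a' decodes fine
}
```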

Jun 01 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 05/31/2016 08:46 PM, Walter Bright wrote:
 It is not practical to just delete or deprecate autodecode - it is too
 embedded into things. What we can do, however, is stop using it
 ourselves and stop relying on it in the documentation, much like [] is
 eschewed in favor of std::vector in C++.

 The way to deal with it is to replace reliance on autodecode with
 .byDchar (.byDchar has a bonus of not throwing an exception on invalid
 UTF, but using the replacement dchar instead.)

 To that end, and this will be an incremental process:

 1. Temporarily break autodecode such that using it will cause a compile
 error. Then, see what breaks in Phobos and fix those to use .byDchar

 2. Change examples in the documentation and the Phobos examples to use
 .byDchar

 3. Best practices should use .byDchar, .byWchar, .byChar, .byCodeUnit
 when dealing with ranges/arrays of characters to make it clear what is
 happening.

(Shouldn't those be by!dchar, by!wchar, by!char? byCodeUnit and 
byCodePoint stay as they are.)

4. Rally behind RCStr as the preferred string type of the D language. 
RCStr manages its own memory, is fast, and has the right interface (i.e. 
offers several views for iteration without an implicit one, doesn't 
throw on invalid code points, etc).

This is the key component. We get rid of GC-backed strings, which is 
part of the crucial goal for D we need to achieve, and reap the benefit 
of a better design as a perk. Breaking existing code does not have the 
right benefit for the cost.

Let's keep the eyes on the ball, folks. We want to rid D of the GC. 
That's the prize.


Andrei

Jun 01 2016

Chris <wendlec tcd.ie> writes:

On Wednesday, 1 June 2016 at 12:14:06 UTC, Andrei Alexandrescu 
wrote:
 On 05/31/2016 08:46 PM, Walter Bright wrote:

 (Shouldn't those be by!dchar, by!wchar, by!char? byCodeUnit and 
 byCodePoint stay as they are.)

 4. Rally behind RCStr as the preferred string type of the D 
 language. RCStr manages its own memory, is fast, and has the 
 right interface (i.e. offers several views for iteration 
 without an implicit one, doesn't throw on invalid code points, 
 etc).

 This is the key component. We get rid of GC-backed strings, 
 which is part of the crucial goal for D we need to achieve, and 
 reap the benefit of a better design as a perk. Breaking 
 existing code does not have the right benefit for the cost.

 Let's keep the eyes on the ball, folks. We want to rid D of the 
 GC. That's the prize.


 Andrei

What would the transition look like? How would it affect existing 
code, e.g. `countUntil`, `.length` etc.?

Jun 01 2016

Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Wednesday, June 01, 2016 08:14:06 Andrei Alexandrescu via Digitalmars-d 
wrote:
 4. Rally behind RCStr as the preferred string type of the D language.
 RCStr manages its own memory, is fast, and has the right interface (i.e.
 offers several views for iteration without an implicit one, doesn't
 throw on invalid code points, etc).

 This is the key component. We get rid of GC-backed strings, which is
 part of the crucial goal for D we need to achieve, and reap the benefit
 of a better design as a perk. Breaking existing code does not have the
 right benefit for the cost.

 Let's keep the eyes on the ball, folks. We want to rid D of the GC.
 That's the prize.

Since when has it been the goal to get rid of GC-allocated strings? We
definitely want an alternative to GC-allocated strings for code that can't
afford to use the GC, but auto-decoding issues aside, why would I want to
use RCString instead of string if the GC isn't a problem for my program?
Walter pointed out at dconf that using a GC is often faster than reference
counting; it's just that it can incur a large cost at once when a collection
is run, whereas the cost of ref-counting is amortized across the time that
the program is running.

I expect that RCString will be very important for us going forward, but I
don't see much reason to use it as the default string type in code over just
using string except for the fact that we have the auto-decoding mess to deal
with. It seems more like RCString is an optimization for certain types of
programs than what you'd want to use by default.

- Jonathan M Davis

Jun 01 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 06/01/2016 11:24 AM, Jonathan M Davis via Digitalmars-d wrote:
 It seems more like RCString is an optimization for certain types of
 programs than what you'd want to use by default.

You'll always want to use it. The small string optimization will make it 
compelling for all applications. -- Andrei

Jun 01 2016

Timon Gehr <timon.gehr gmx.ch> writes:

On 01.06.2016 17:30, Andrei Alexandrescu wrote:
 On 06/01/2016 11:24 AM, Jonathan M Davis via Digitalmars-d wrote:
 It seems more like RCString is an optimization for certain types of
 programs than what you'd want to use by default.

 You'll always want to use it. The small string optimization will make it
 compelling for all applications. -- Andrei


- Why is it dependent on the allocation strategy or on the type of the data?

- It seems to be a pessimization if I'm taking a lot of small slices.

- It is undesirable if I later want to reference-compare those slices.

Jun 01 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 06/01/2016 04:28 PM, Timon Gehr wrote:
 On 01.06.2016 17:30, Andrei Alexandrescu wrote:
 On 06/01/2016 11:24 AM, Jonathan M Davis via Digitalmars-d wrote:
 It seems more like RCString is an optimization for certain types of
 programs than what you'd want to use by default.

 You'll always want to use it. The small string optimization will make it
 compelling for all applications. -- Andrei


 - Why is it dependent on the allocation strategy or on the type of the
 data?

Not getting this.

 - It seems to be a pessimization if I'm taking a lot of small slices.

I agree cases can be created in which straight arrays do sometimes 
better. They are rare and far between - for strings, the small string 
optimization is to live by.

 - It is undesirable if I later want to reference-compare those slices.

Arrays will still be usable.


Andrei

Jun 01 2016

Timon Gehr <timon.gehr gmx.ch> writes:

On 01.06.2016 22:43, Andrei Alexandrescu wrote:
 On 06/01/2016 04:28 PM, Timon Gehr wrote:
 On 01.06.2016 17:30, Andrei Alexandrescu wrote:
 On 06/01/2016 11:24 AM, Jonathan M Davis via Digitalmars-d wrote:
 It seems more like RCString is an optimization for certain types of
 programs than what you'd want to use by default.

 You'll always want to use it. The small string optimization will make it
 compelling for all applications. -- Andrei


 - Why is it dependent on the allocation strategy or on the type of the
 data?

 Not getting this.
 ...

The small string optimization also works for GC-allocated strings. Why 
do I always want to use RCString instead of the corresponding GCString?
(Also, the same approach can be applied to other arrays with value 
semantics.)

Jun 01 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 06/01/2016 05:03 PM, Timon Gehr wrote:
 The small string optimization also works for GC-allocated strings. Why
 do I always want to use RCString instead of the corresponding GCString?
 (Also, the same approach can be applied to other arrays with value
 semantics.)

Point taken, thanks. Mine was that you can't (reasonably) use the SSO if 
you commit to represent strings as bare slices. -- Andrei

Jun 01 2016

Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Wednesday, June 01, 2016 11:30:02 Andrei Alexandrescu via Digitalmars-d 
wrote:
 On 06/01/2016 11:24 AM, Jonathan M Davis via Digitalmars-d wrote:
 It seems more like RCString is an optimization for certain types of
 programs than what you'd want to use by default.

 You'll always want to use it. The small string optimization will make it
 compelling for all applications. -- Andrei

Well, ref-counting vs GC aside, optimizations like that are actually
something that can clearly make a user-defined type for strings worth using
over naked arrays of code units, whereas it's far less clear that having a
user-defined type for strings because of Unicode-related issues actually
buys us much.

- Jonathan M Davis

Jun 02 2016

Jon Degenhardt <jond noreply.com> writes:

On Wednesday, 1 June 2016 at 00:46:04 UTC, Walter Bright wrote:
 It is not practical to just delete or deprecate autodecode - it 
 is too embedded into things. What we can do, however, is stop 
 using it ourselves and stop relying on it in the documentation, 
 much like [] is eschewed in favor of std::vector in C++.

Hopefully my perspective on the auto-decoding topic is useful rather 
than disruptive. I work on search applications, both run-time 
engines and data science. Processing multi-lingual text is an 
important aspect of these applications. There are a couple issues 
with D's current auto-decoding implementation for these 
applications.

One is lack of control over error handling when encountering 
corrupt utf-8 text. Real world data contains corrupt utf-8 
sequences, robust applications need to handle them. Proper 
handling is generally application specific. Both replacement 
character and throwing exceptions are useful behaviors, but the 
ability to choose between them is often necessary. At present, 
this behavior is built into the low-level primitives, without 
application control. Notably, 'front' and 'popFront' have 
different behaviors. This is also a consideration for explicitly 
invoked decoding facilities like 'byUTF'.

Another is performance. Iteration triggering auto-decoding is 
apparently an order of magnitude more costly than iteration 
without decoding. This is too large a delta when the algorithm 
doesn't require decoding. (Such algorithms are common.) Frankly, 
I'm surprised the cost is so large. It wouldn't surprise me to 
find out it's partly a compiler artifact, but it doesn't matter.

As to what to do about it - if changing currently built-in auto 
decoding is not an option, then perhaps providing parallel 
facilities that don't auto-decode would do the trick. RCStr would 
seem a real opportunity. Perhaps a raw array of utf-8 code units 
ala ubyte[] that doesn't get auto-decoded? With either, explicit 
decoding would be needed to invoke standard library routines 
operating on unicode code points or graphemes. (Sounds like 
interaction with character literals could still be an issue, as 
the actual representation is not obvious.) Having a consistent 
set of error handling options for explicit decoding facilities 
would be helpful as well.

Another possibility would be support for detecting inadvertent 
auto-decoding. D has very nice support for ensuring or detecting 
code properties (e.g. '@nogc', '-vgc' compiler option). If there 
was a way to identify code triggering auto-decoding, that would 
be useful.
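
A sketch of application-selected error handling for explicit decoding, assuming the `useReplacementDchar` flag that `std.utf.decode` accepts in current Phobos. The function name `decodeAll` is hypothetical.

```d
import std.exception : assertThrown;
import std.typecons : Flag, No, Yes;
import std.utf : decode, UTFException;

// Hypothetical helper: the caller picks the policy at compile time.
// Yes.useReplacementDchar substitutes U+FFFD for invalid input,
// No.useReplacementDchar throws UTFException instead.
dchar[] decodeAll(Flag!"useReplacementDchar" policy)(string text)
{
    dchar[] result;
    size_t i = 0;
    while (i < text.length)
        result ~= decode!policy(text, i);   // decode advances i
    return result;
}

unittest
{
    // 'a', 'b', then a lone continuation byte (invalid UTF-8).
    immutable(ubyte)[] raw = [0x61, 0x62, 0x80];
    string bad = cast(string) raw;

    assert(decodeAll!(Yes.useReplacementDchar)(bad) == "ab\uFFFD"d);
    assertThrown!UTFException(decodeAll!(No.useReplacementDchar)(bad));
}
```

Making the policy a template parameter keeps the choice visible at each call site, which is the kind of control asked for above.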

Jun 07 2016

FilippoR <ics_fight hotmail.com> writes:

It's possible to add a new      alias bstring = immutable(ubyte)[];

a new literal postfix (ustring s = "test string"b) or UFCS  
(ustring s = "test string".b)

add UFCS byCodePoint and byGrapheme

and add overloaded functions in Phobos where necessary

so we can have an autodecode-free string
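
A rough sketch of the UFCS variant of this idea, assuming `std.string.representation` does the conversion. The sketch uses the name `bstring` throughout, and the function `b` is a hypothetical stand-in for the proposed suffix, not an existing Phobos symbol.

```d
import std.range.primitives : ElementType;
import std.string : representation;

alias bstring = immutable(ubyte)[];

// Hypothetical UFCS helper standing in for the proposed "b" suffix.
bstring b(string s)
{
    return s.representation;
}

unittest
{
    bstring s = "test string".b;

    // Range primitives see bytes here, so nothing is autodecoded.
    static assert(is(ElementType!bstring == immutable(ubyte)));
    assert(s.length == 11);
    assert(s[0] == 't');
}
```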

Jun 07 2016
