digitalmars.D - Dealing with Autodecode

reply Walter Bright <newshound2 digitalmars.com> writes:
It is not practical to just delete or deprecate autodecode - it is too embedded 
into things. What we can do, however, is stop using it ourselves and stop 
relying on it in the documentation, much like [] is eschewed in favor of 
std::vector in C++.

The way to deal with it is to replace reliance on autodecode with .byDchar 
(.byDchar has a bonus of not throwing an exception on invalid UTF, but using 
the replacement dchar instead.)

To that end, and this will be an incremental process:

1. Temporarily break autodecode such that using it will cause a compile error. 
Then, see what breaks in Phobos and fix those to use .byDchar

2. Change examples in the documentation and the Phobos examples to use .byDchar

3. Best practices should use .byDchar, .byWchar, .byChar, .byCodeUnit when 
dealing with ranges/arrays of characters to make it clear what is happening.
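
To illustrate the explicit adaptors named in the list above, here is a minimal sketch. It assumes a Phobos whose std.utf provides byCodeUnit, byWchar, byDchar and replacementDchar; the helper decodeLossy is a hypothetical name used only for this illustration.

```d
import std.algorithm.searching : canFind;
import std.array : array;
import std.range : walkLength;
import std.utf : byCodeUnit, byDchar, byWchar, replacementDchar;

/// Hypothetical helper: decodes UTF-8 on request. Invalid input yields
/// U+FFFD (replacementDchar) instead of throwing an exception.
dchar[] decodeLossy(string s)
{
    return s.byDchar.array;
}

void main()
{
    string s = "h\u00E9llo"; // U+00E9 occupies two UTF-8 code units

    assert(s.byCodeUnit.walkLength == 6); // UTF-8 code units, no decoding
    assert(s.byWchar.walkLength == 5);    // transcoded to UTF-16 code units
    assert(s.byDchar.walkLength == 5);    // decoded to code points

    // 0xFF can never appear in valid UTF-8.
    immutable(ubyte)[] raw = [0x61, 0xFF];
    auto decoded = decodeLossy(cast(string) raw);
    assert(decoded[0] == 'a');
    assert(decoded.canFind(replacementDchar)); // replaced, not thrown
}
```

Each call site states which unit it iterates over, which is the point of item 3.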
May 31 2016
next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 1 June 2016 at 00:46:04 UTC, Walter Bright wrote:
 It is not practical to just delete or deprecate autodecode - it 
 is too embedded into things.
Which Things ?
 The way to deal with it is to replace reliance on autodecode 
 with .byDchar (.byDchar has a bonus of not throwing an 
 exception on invalid UTF, but using the replacement dchar 
 instead.)
 To that end, and this will be an incremental process:
 ....
So does this mean we intend to carry the auto-decoding wart with us into the future, telling everyone: "The obvious way is broken, we just keep it for backwards compatibility"?

To come back to C++ [] vs. std::vector: they actually have valid reasons, mainly C compatibility, to keep [] as a pointer.

I believe that, as of now, D is still flexible enough to make a radical change. We cannot keep putting this off! It is only going to get harder to remove it.
May 31 2016
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/31/2016 5:56 PM, Stefan Koch wrote:
 It is only going to get harder to remove it.
Removing it from Phobos and adjusting the documentation as I suggested is the way forward regardless. If we can't get that done, how can we tell our users they have to do the same to their code?
May 31 2016
prev sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Wed, Jun 01, 2016 at 12:56:03AM +0000, Stefan Koch via Digitalmars-d wrote:
 On Wednesday, 1 June 2016 at 00:46:04 UTC, Walter Bright wrote:
 It is not practical to just delete or deprecate autodecode - it is
 too embedded into things.
Which Things ?
 The way to deal with it is to replace reliance on autodecode with
 .byDchar (.byDchar has a bonus of not throwing an exception on invalid
 UTF, but using the replacement dchar instead.)
 To that end, and this will be an incremental process:
 ....
So does this mean we intend to carry the auto-decoding wart with us into the future, telling everyone: "The obvious way is broken, we just keep it for backwards compatibility"?
If we can pull off what Walter proposed, it will put us one step closer to killing autodecode for good.

Killing autodecode today is very drastic and unwise to do in one fell swoop. I see .byDchar as a first step. First it's introduced as an optional feature so that people can start using it. We promote its usage everywhere. Then we make it a deprecation to *not* use it, perhaps with a migration compiler switch so that people are not forced to migrate immediately, but they are warned beforehand. After enough time elapses, the compiler switch becomes the default, with an option to disable it if the user so chooses. Then after another while the switch is removed and using .byDchar becomes required.

Finally, autodecoding is relegated to the dustbin of history and there will be much rejoicing. :-P I will personally savor every moment of pressing the delete-line command in my editor while making the PR to finally kill off the last of the autodecoding code.

T

-- 
Famous last words: I *think* this will work...
May 31 2016
prev sibling next sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 5/31/16 8:46 PM, Walter Bright wrote:
 It is not practical to just delete or deprecate autodecode - it is too
 embedded into things. What we can do, however, is stop using it
 ourselves and stop relying on it in the documentation, much like [] is
 eschewed in favor of std::vector in C++.

 The way to deal with it is to replace reliance on autodecode with
 .byDchar (.byDchar has a bonus of not throwing an exception on invalid
 UTF, but using the replacement dchar instead.)

 To that end, and this will be an incremental process:

 1. Temporarily break autodecode such that using it will cause a compile
 error. Then, see what breaks in Phobos and fix those to use .byDchar

 2. Change examples in the documentation and the Phobos examples to use
 .byDchar

 3. Best practices should use .byDchar, .byWchar, .byChar, .byCodeUnit
 when dealing with ranges/arrays of characters to make it clear what is
 happening.
I gotta be honest, if the end of this tunnel doesn't have a char[] array which acts like an array in all circumstances, I see little point in changing anything. -Steve
May 31 2016
prev sibling next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Wednesday, 1 June 2016 at 00:46:04 UTC, Walter Bright wrote:
 It is not practical to just delete or deprecate autodecode
Yes, it is. We need to stop holding on to the mistakes of the past. 9 of 10 dentists agree that autodecoding is a mistake. Not just WAS a mistake, IS a mistake. It has ongoing cost. If we don't fix our attitude about these problems, we are going to turn into that very demon we despise, yea, even the next C++! And that's not a good thing.
 To that end, and this will be an incremental process:
I have a better one, that we discussed on IRC last night:

1) Put the string overloads for front and popFront on a version switch:

version(string_migration)
deprecated void popFront(T)(ref T t) if(isSomeString!T) {
   static assert(0, "this is crap, fix your code.");
}
else
deprecated("use -version=string_migration to fix your buggy code, would you like to know more?")
/* existing popFront here */

At the same time, make sure the various byWhatever functions and structs are easily available.

Our preliminary investigation found about 130 places in Phobos that need to be changed. That's not hard to fix! The static assert(0) version tells you the top-level call that triggered it. You go there, you add .byDchar or whatever, and recompile, it just works, migration achieved.

Or better yet, you think about your code and fix it properly, boom, code quality improved.

D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE QUALITY WITH A SIMPLE MIGRATION PATH!!!!!!!!!!!!!!!!!!!!

2) After a while, we swap the version conditions, so opting into it preserves the old behavior for a while.

3) A wee bit longer, we exterminate all this autodecoding crap and enjoy Phobos being a smaller, more efficient library.
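
The version-switch idea can be tried outside Phobos. The following is a minimal sketch, not the actual Phobos change: popFrontDecoded is a hypothetical stand-in for the string overload of popFront, and the messages are placeholders. It relies only on the dmd -version=ident switch and on std.utf.stride.

```d
import std.traits : isSomeString;
import std.utf : byCodeUnit, stride;

// Hypothetical stand-in for the autodecoding popFront; not Phobos code.
version (string_migration)
{
    void popFrontDecoded(T)(ref T s) if (isSomeString!T)
    {
        // Fires only when some call site instantiates the template.
        static assert(0, "autodecoding popFront used here; pick .byDchar or .byCodeUnit");
    }
}
else
{
    deprecated("build with -version=string_migration to locate autodecoding call sites")
    void popFrontDecoded(T)(ref T s) if (isSomeString!T)
    {
        s = s[stride(s, 0) .. $]; // legacy behaviour: drop one whole code point
    }
}

void main()
{
    string s = "\u00E9!"; // two code units for U+00E9, then '!'

    version (string_migration)
    {
        // Migrated call site: the unit of iteration is explicit.
        auto r = s.byCodeUnit;
        r.popFront();
        assert(r.length == 2);
    }
    else
    {
        popFrontDecoded(s); // compiles, but emits the deprecation message
        assert(s == "!");
    }
}
```

Built normally, the call compiles with a deprecation message (assuming deprecations are not promoted to errors); built with -version=string_migration, any remaining call to popFrontDecoded stops the build at the offending instantiation.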
May 31 2016
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/31/2016 6:36 PM, Adam D. Ruppe wrote:
 Our preliminary investigation found about 130 places in Phobos that need to be
 changed. That's not hard to fix!
PRs please!
May 31 2016
next sibling parent Jack Stouffer <jack jackstouffer.com> writes:
On Wednesday, 1 June 2016 at 02:36:15 UTC, Walter Bright wrote:
 On 5/31/2016 6:36 PM, Adam D. Ruppe wrote:
 Our preliminary investigation found about 130 places in Phobos 
 that need to be
 changed. That's not hard to fix!
PRs please!
https://github.com/dlang/phobos/pull/4322
Jun 01 2016
prev sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Wednesday, 1 June 2016 at 02:36:15 UTC, Walter Bright wrote:
 PRs please!
https://github.com/dlang/phobos/pull/4384 You'll notice it is closed. Now, that one wasn't meant to be merged anyway, but Andrei seems to have zero interest in actually accepting the change. That doesn't encourage further work.
Jun 01 2016
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/1/2016 8:51 PM, Adam D. Ruppe wrote:
 On Wednesday, 1 June 2016 at 02:36:15 UTC, Walter Bright wrote:
 PRs please!
https://github.com/dlang/phobos/pull/4384 You'll notice it is closed. Now, that one wasn't meant to be merged anyway, but Andrei seems to have zero interest in actually accepting the change. That doesn't encourage further work.
Andrei is in favor of fixing Phobos so it does not depend on autodecode. He is, however, rightfully concerned about the extent of breakage that would happen if autodecode were removed. So am I.

Interestingly, when I tried to remove autodecoding from path/file code a couple years ago, I received quite a bit of resistance. It seems there's been a tectonic shift in opinion on autodecode.

What I'd like to see, that we all agree on, is progress in removing autodecode reliance from Phobos. Let's see what it takes.
Jun 01 2016
next sibling parent Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wednesday, June 01, 2016 21:42:49 Walter Bright via Digitalmars-d wrote:
 On 6/1/2016 8:51 PM, Adam D. Ruppe wrote:
 On Wednesday, 1 June 2016 at 02:36:15 UTC, Walter Bright wrote:
 PRs please!
https://github.com/dlang/phobos/pull/4384 You'll notice it is closed. Now, that one wasn't meant to be merged anyway, but Andrei seems to have zero interest in actually accepting the change. That doesn't encourage further work.
Andrei is in favor of fixing Phobos so it does not depend on autodecode. He is, however, rightfully concerned about the extent of breakage that would happen if autodecode were removed. So am I.
Just pulling the trigger is far too big a breaking change, and I think that a number of us who are strongly in favor of getting rid of auto-decoding agree with you on that. At some point, we may very well need to decide between breaking code and permanently carrying this technical debt, but the first thing to do is to work towards mitigating the breaking changes as much as possible. Then at least, if we do break code, the impact is much lower.
 Interestingly, when I tried to remove autodecoding from path/file code a
 couple years ago, I received quite a bit of resistance. It seems there's
 been a tectonic shift in opinion on autodecode.
A number of us very much liked the idea at first, but part of the problem was that a lot of us didn't understand Unicode well enough at the time. And as we've come to better understand it, we've seen how poor a design decision it is to auto-decode. Also, the number of questions and complaints that we've had to field over time with regards to auto-decoding has helped highlight how problematic it is from a usability standpoint. So, most of us who didn't understand well enough up front have learned better.
 What I'd like to see, that we all agree on, is progress in removing
 autodecode reliance from Phobos. Let's see what it takes.
Agreed. - Jonathan M Davis
Jun 02 2016
prev sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 2 June 2016 at 04:42:49 UTC, Walter Bright wrote:
 Andrei is in favor of fixing Phobos so it does not depend on 
 autodecode.
Putting the autodecode functions on a compiler switch (with -version) is the most straightforward way to achieve that. We'd have a transition period where people can keep the existing behavior or throw the switch and get compile errors - with a dead-simple "just add .byCodePoint on this line" fix - to migrate their code.

Phobos would be fixed in a day. Everyone else would have up to a couple years to fix their code (which, again, is as simple as throwing a compiler switch and mechanically adding .byCodePoint* where the static asserts tell you to) as we work through the slow deprecation cycle.

But then, we'd have light at the end of the tunnel: after this deprecation cycle completes, we can kill hundreds of lines of confusing, worthless functions. Existing functions that don't work with ranges of chars will be able to without trouble. Newbies will never again ask "wtf" when they see string.whatever yielding dchar[].

* Or byGrapheme or .byCodeUnit or whatever if you want to take the time to actually fix the fundamental question of the code, but just slapping .byCodePoint in there reverts to the same behavior of autodecode.
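
The "string.whatever yielding dchar[]" surprise is easy to reproduce. A minimal sketch, assuming a Phobos that still autodecodes narrow strings; digitsOnly is a hypothetical helper for the illustration.

```d
import std.algorithm.iteration : filter;
import std.array : array;
import std.utf : byCodeUnit;

/// Hypothetical helper: keeps only ASCII digits, iterating code units so
/// the result stays char-based.
auto digitsOnly(string s)
{
    return s.byCodeUnit.filter!(c => c >= '0' && c <= '9').array;
}

void main()
{
    string s = "id=42;";

    // Autodecoding: a generic algorithm sees the string as a range of dchar,
    // so collecting the result gives dchar[], not a string.
    auto decoded = s.filter!(c => c != ';').array;
    static assert(is(typeof(decoded) == dchar[]));

    // Explicit code units: the result is still an array of char.
    auto units = digitsOnly(s);
    static assert(is(typeof(units) : const(char)[]));
    assert(units == "42");
}
```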
 Interestingly, when I tried to remove autodecoding from 
 path/file code a couple years ago, I received quite a bit of 
 resistance. It seems there's been a tectonic shift in opinion 
 on autodecode.
Quite a few of us were incompetent ignoramuses on the topic of Unicode years ago. That's where the autodecoding mistake came from: people smart enough to know UTF-8 from UTF-32, but not smart enough to know the real world application of Unicode. It's one thing to make a mistake. Everyone does that sometimes, and nobody is born knowing complex issues. What matters is if you're willing to learn new information and correct your errors.
Jun 02 2016
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/02/2016 09:34 AM, Adam D. Ruppe wrote:
 On Thursday, 2 June 2016 at 04:42:49 UTC, Walter Bright wrote:
 Andrei is in favor of fixing Phobos so it does not depend on autodecode.
Putting the autodecode functions on a compiler switch (with -version) is the most straightforward way to achieve that. We'd have a transition period where people can keep the existing behavior or throw the switch and get compile errors - with a dead-simple "just add .byCodePoint on this line" fix - to migrate their code.
That is not going to happen.
 It's one thing to make a mistake. Everyone does that sometimes, and
 nobody is born knowing complex issues. What matters is if you're willing
 to learn new information and correct your errors.
The real ticket out of this is RCStr. It solves a major problem in the language (compulsive GC) and also a minor occasional annoyance (autodecoding). Andrei
Jun 02 2016
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Thursday, 2 June 2016 at 14:29:28 UTC, Andrei Alexandrescu 
wrote:
 It's one thing to make a mistake. Everyone does that 
 sometimes, and
 nobody is born knowing complex issues. What matters is if 
 you're willing
 to learn new information and correct your errors.
The real ticket out of this is RCStr. It solves a major problem in the language (compulsive GC) and also a minor occasional annoyance (autodecoding).
You start to sound like a car salesman. I know nothing about RCStr, but I'm already starting to resent it.
Jun 02 2016
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/02/2016 10:44 AM, deadalnix wrote:
 On Thursday, 2 June 2016 at 14:29:28 UTC, Andrei Alexandrescu wrote:
 It's one thing to make a mistake. Everyone does that sometimes, and
 nobody is born knowing complex issues. What matters is if you're willing
 to learn new information and correct your errors.
The real ticket out of this is RCStr. It solves a major problem in the language (compulsive GC) and also a minor occasional annoyance (autodecoding).
You start to sound like a car salesman.
I assume that means overselling or false advertising. Where do either of these happen? -- Andrei
Jun 02 2016
parent reply deadalnix <deadalnix gmail.com> writes:
On Thursday, 2 June 2016 at 15:03:34 UTC, Andrei Alexandrescu 
wrote:
 You start to sound like a car salesman.
I assume that means overselling or false advertising. Where do either of these happen? -- Andrei
For SDC for instance, autodecode is a problem (in fact, it is the very reason I abandoned making the lexer usable as a standalone) while RCStr would not help one bit, as strings are pretty much never manipulated directly anywhere.

More generally, using RCStr is at best sidestepping the issue rather than solving it.

On the GC side of the issue, I think there are also overstatements.
Jun 02 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/02/2016 11:14 AM, deadalnix wrote:
 On Thursday, 2 June 2016 at 15:03:34 UTC, Andrei Alexandrescu wrote:
 You start to sound like a car salesman.
I assume that means overselling or false advertising. Where do either of these happen? -- Andrei
For SDC for instance, autodecode is a problem (in fact, it is the very reason I abandoned making the lexer usable as a standalone) while RCStr would not help one bit, as strings are pretty much never manipulated directly anywhere.
Well I'm not sure how SDC works.
 More generally, using RCStr is at best sidestepping the issue rather
 than solving it.
What is the issue?
 On the GC side of the issue, I think there are also overstatements.
What are those? Andrei
Jun 02 2016
prev sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Thu, Jun 02, 2016 at 02:44:21PM +0000, deadalnix via Digitalmars-d wrote:
 On Thursday, 2 June 2016 at 14:29:28 UTC, Andrei Alexandrescu wrote:
 It's one thing to make a mistake. Everyone does that sometimes,
 and nobody is born knowing complex issues. What matters is if
 you're willing to learn new information and correct your errors.
The real ticket out of this is RCStr. It solves a major problem in the language (compulsive GC) and also a minor occasional annoyance (autodecoding).
You start to sound like a car salesman. I know nothing about RCStr, but I'm already starting to resent it.
Same here. It's starting to sound like some unproven newfangled contraption designed to please the GC-phobic crowd who believe that RC is the answer to life, the universe, and everything, and who may not actually adopt D even after we've broken our backs bending over backwards for them. (And with a subject like "our sister", this RCStr business does not sound very appealing at all.)

Whatever happened to improving *current* string handling for *current* users?

It's making forking Phobos look like a less distant possibility than I had anticipated. :-(

T

-- 
People say I'm arrogant, and I'm proud of it.
Jun 02 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/02/2016 11:03 AM, H. S. Teoh via Digitalmars-d wrote:
 On Thu, Jun 02, 2016 at 02:44:21PM +0000, deadalnix via Digitalmars-d wrote:
 On Thursday, 2 June 2016 at 14:29:28 UTC, Andrei Alexandrescu wrote:
 It's one thing to make a mistake. Everyone does that sometimes,
 and nobody is born knowing complex issues. What matters is if
 you're willing to learn new information and correct your errors.
The real ticket out of this is RCStr. It solves a major problem in the language (compulsive GC) and also a minor occasional annoyance (autodecoding).
You start to sound like a car salesman. I know nothing about RCStr, but I'm already starting to resent it.
Same here. It's starting to sound like some unproven newfangled contraption designed to please the GC-phobic crowd who believe that RC is the answer to life, the universe, and everything, and who may not actually adopt D even after we've broken our backs bending over backwards for them.
I'm sorry, this is completely ridiculous. What is unproven? Reference counting is a long-standing success story for string handling. I'm using it because it's good, not to woo users.
 (And with a subject like "our sister", this RCStr
 business does not sound very appealing at all.)
I'm glad this is mentioned as one of the issues with RCStr.
 Whatever happened to
 improving *current* string handling for *current* users?
RCStr will improve string handling for current users.
 It's making forking Phobos look like a less distant possibility than I
 had anticipated. :-(
So you'd fork Phobos because... it adds a good string type? Andrei
Jun 02 2016
prev sibling parent reply Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Thursday, June 02, 2016 10:29:28 Andrei Alexandrescu via Digitalmars-d 
wrote:
 The real ticket out of this is RCStr. It solves a major problem in the
 language (compulsive GC) and also a minor occasional annoyance
 (autodecoding).
Unless we're outright getting rid of string, char[], wstring, etc., RCStr clearly doesn't solve the auto-decoding problem. It will allow a lot of code to sidestep it, but the existing types will continue to exist and be used and have to deal with auto-decoding. And every function that works on strings that cares about efficiency is going to have to continue to special case strings to avoid auto-decoding. - Jonathan M Davis
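
The special-casing described here typically looks like the sketch below. It is a minimal illustration, not Phobos code: countAscii is a hypothetical function, and it is only correct for ASCII needles, since a code unit below 0x80 can never be part of a multi-byte sequence.

```d
import std.range.primitives : empty, front, isInputRange, popFront;
import std.traits : isNarrowString;
import std.utf : byCodeUnit;

/// Hypothetical example: counts occurrences of an ASCII needle in any
/// character range.
size_t countAscii(R)(R r, char needle) if (isInputRange!R)
{
    assert(needle < 0x80, "needle must be ASCII");

    static if (isNarrowString!R)
    {
        // Special case: without this branch, front/popFront would decode
        // every code point of a char[] or wchar[] for no benefit.
        return countAscii(r.byCodeUnit, needle);
    }
    else
    {
        size_t n;
        for (; !r.empty; r.popFront())
            if (r.front == needle)
                ++n;
        return n;
    }
}

void main()
{
    assert(countAscii("a,b,\u00E7,d", ',') == 3); // narrow string path
    assert(countAscii("a,b"d, ',') == 1);         // generic path
}
```

Every string function that wants the fast path needs a branch like this, which is the ongoing cost being described.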
Jun 02 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/02/2016 11:26 AM, Jonathan M Davis via Digitalmars-d wrote:
 Unless we're outright getting rid of string, char[], wstring, etc., RCStr
 clearly doesn't solve the auto-decoding problem.
It does if you use it. If you don't, it doesn't. -- Andrei
Jun 02 2016
prev sibling parent Chris <wendlec tcd.ie> writes:
On Thursday, 2 June 2016 at 13:34:18 UTC, Adam D. Ruppe wrote:
 On Thursday, 2 June 2016 at 04:42:49 UTC, Walter Bright wrote:
 Andrei is in favor of fixing Phobos so it does not depend on 
 autodecode.
Putting the autodecode functions on a compiler switch (with -version) is the most straightforward way to achieve that. We'd have a transition period where people can keep the existing behavior or throw the switch and get compile errors - with a dead-simple "just add .byCodePoint on this line" fix - to migrate their code. Phobos would be fixed in a day. Everyone else would have up to a couple years to fix their code (which, again, is as simple as throwing a compiler switch and mechanically adding .byCodePoint* where the static asserts tell you to) as we work through the slow deprecation cycle. But then, we'd have light at the end of the tunnel: after this deprecation cycle completes, we can kill hundreds of lines of confusing, worthless functions. Existing functions that don't work with ranges of chars will be able to without trouble. Newbies will never again ask "wtf" when they see string.whatever yielding dchar[]. * Or byGrapheme or .byCodeUnit or whatever if you want to take the time to actually fix the fundamental question of the code, but just slapping .byCodePoint in there reverts to the same behavior of autodecode.
I would love to have a compiler switch and finally be able to rid my code of auto decoding [1], once and for all - and get a free performance boost. There's so much talk about code that _might_ break, when we don't even know how much code would actually break. It's absurd: we remain inert out of fear of the unknown, while it would be pretty easy to just test it and find out (std.path is actually a precedent). And it wouldn't even be a breaking change in the sense that you cannot go on developing with D's latest version because you're stuck with a stone age version of dmd forever.

Much in the same vein, I don't know if we should make the question of auto decode dependent on RCString. This will take at least another few months of bikeshedding, while what we need to do is get rid (or start to get rid) of auto decode right now - and maybe this process will teach us something that will later be useful when implementing RCString.

[1] As I already mentioned here http://forum.dlang.org/post/yzeiqvphrqdcmaxaspvx forum.dlang.org [snip]
Jun 02 2016
prev sibling next sibling parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 05/31/2016 09:36 PM, Adam D. Ruppe wrote:
 version(string_migration)
 deprecated void popFront(T)(ref T t) if(isSomeString!T) {
    static assert(0, "this is crap, fix your code.");
 }
 else
 deprecated("use -version=string_migration to fix your buggy code, would
 you like to know more?")
 /* existing popFront here */
I vote we use Adam's exact verbiage, too! :)
 D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE QUALITY
 WITH A SIMPLE MIGRATION PATH!!!!!!!!!!!!!!!!!!!!
Yes. This. If I wanted an endless bucket of baggage, I'd have stuck with C++.
 3) A wee bit longer, we exterminate all this autodecoding crap and enjoy
 Phobos being a smaller, more efficient library.
Yay! Profit!
May 31 2016
parent reply Seb <seb wilzba.ch> writes:
On Wednesday, 1 June 2016 at 02:39:55 UTC, Nick Sabalausky wrote:
 On 05/31/2016 09:36 PM, Adam D. Ruppe wrote:
 version(string_migration)
 deprecated void popFront(T)(ref T t) if(isSomeString!T) {
    static assert(0, "this is crap, fix your code.");
 }
 else
 deprecated("use -version=string_migration to fix your buggy 
 code, would
 you like to know more?")
 /* existing popFront here */
I vote we use Adam's exact verbiage, too! :)
 D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
 QUALITY
 WITH A SIMPLE MIGRATION PATH!!!!!!!!!!!!!!!!!!!!
Yes. This. If I wanted an endless bucket of baggage, I'd have stuck with C++.
 3) A wee bit longer, we exterminate all this autodecoding crap 
 and enjoy
 Phobos being a smaller, more efficient library.
Yay! Profit!
How about a poll? http://www.polljunkie.com/poll/ftmibx/remove-auto-decoding-in-d Results are shown after casting a vote or here: http://www.polljunkie.com/poll/aqzbwg/remove-auto-decoding-in-d/view
Jun 01 2016
parent reply Seb <seb wilzba.ch> writes:
On Wednesday, 1 June 2016 at 11:42:06 UTC, Seb wrote:
 On Wednesday, 1 June 2016 at 02:39:55 UTC, Nick Sabalausky 
 wrote:
 On 05/31/2016 09:36 PM, Adam D. Ruppe wrote:
 version(string_migration)
 deprecated void popFront(T)(ref T t) if(isSomeString!T) {
    static assert(0, "this is crap, fix your code.");
 }
 else
 deprecated("use -version=string_migration to fix your buggy 
 code, would
 you like to know more?")
 /* existing popFront here */
I vote we use Adam's exact verbiage, too! :)
 D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
 QUALITY
 WITH A SIMPLE MIGRATION PATH!!!!!!!!!!!!!!!!!!!!
Yes. This. If I wanted an endless bucket of baggage, I'd have stuck with C++.
 3) A wee bit longer, we exterminate all this autodecoding 
 crap and enjoy
 Phobos being a smaller, more efficient library.
Yay! Profit!
How about a poll? http://www.polljunkie.com/poll/ftmibx/remove-auto-decoding-in-d Results are shown after casting a vote or here: http://www.polljunkie.com/poll/aqzbwg/remove-auto-decoding-in-d/view
Just FYI, after a short period of ten hours we got the following 45 responses:

Yes, with fire! (hobby user)
    77% (35)
Yeah remove that special behavior (professional user)
    35% (16)
Wait that is what auto decoding is? wah ugh...
    8%  (4)
I don't always decode codeunits, but when I do I use byDChar already
    6%  (3)
Jun 01 2016
parent reply poliklosio <poliklosio happypizza.com> writes:
On Thursday, 2 June 2016 at 00:14:30 UTC, Seb wrote:

 Just FYI after a short period of ten hours we got the following 
 45 responses:

 Yes, with fire! (hobby user)
     77% (35)
 Yeah remove that special behavior (professional user)
     35% (16)
 Wait that is what auto decoding is? wah ugh...
     8%  (4)
 I don't always decode codeunits, but when I do I use byDChar 
 already  6%  (3)
You failed to mention that there were additional answers: Auto-decoding is great! 0% (0) No, please don't break my code. 0% (0) I think those zeroes are actually the most important part of the results. :)
Jun 01 2016
parent reply Joakim <dlang joakim.fea.st> writes:
On Thursday, 2 June 2016 at 06:53:49 UTC, poliklosio wrote:
 On Thursday, 2 June 2016 at 00:14:30 UTC, Seb wrote:

 Just FYI after a short period of ten hours we got the 
 following 45 responses:

 Yes, with fire! (hobby user)
     77% (35)
 Yeah remove that special behavior (professional user)
     35% (16)
 Wait that is what auto decoding is? wah ugh...
     8%  (4)
 I don't always decode codeunits, but when I do I use byDChar 
 already  6%  (3)
You failed to mention that there were additional answers: Auto-decoding is great! 0% (0) No, please don't break my code. 0% (0) I think those zeroes are actually the most important part of the results. :)
It has been noted many times that forum users are a small part of the D userbase, likely the ones who are the most interested in evolving the language and thus biased towards changes. As a forum user myself, I'm in that group too and agree with Walter that D programmers should be guided by Phobos to explicitly declare what level of decoding they want, but this poll may not be representative of the wider userbase. We'll likely only find out what they think once we're a couple dmd releases into these changes, as Walter found when he submitted PRs for file/path code sometime back.
Jun 02 2016
parent poliklosio <poliklosio happypizza.com> writes:
On Thursday, 2 June 2016 at 07:21:28 UTC, Joakim wrote:
 On Thursday, 2 June 2016 at 06:53:49 UTC, poliklosio wrote:
 (...)
It has been noted many times that forum users are a small part of the D userbase, likely the ones who are the most interested in evolving the language and thus biased towards changes. As a forum user myself, I'm in that group too and agree with Walter that D programmers should be guided by Phobos to explicitly declare what level of decoding they want, but this poll may not be representative of the wider userbase. We'll likely only find out what they think once we're a couple dmd releases into these changes, as Walter found when he submitted PRs for file/path code sometime back.
It's not representative, but there is going to be at least some weak correlation between the forum and the professional world. We are developers after all. Out of 16 professional users none selected the "Please, don't break my code" option, which suggests there is some hope that a change wouldn't be that damaging. Of course further investigation would be needed to confirm that hypothesis. But at least we didn't prove that such an investigation is a waste of time.

Also, on the issue of wanting/not wanting autodecoding as a feature (ignoring the code breakage issue), 0 out of 55 people actually want autodecoding. I think it's improbable that most users outside the forum would have the opposite view. You would have at least some of this reflected in the poll.

So the poll does tell something, you just have to know how not to overinterpret the results. :)
Jun 02 2016
prev sibling next sibling parent reply Kirill Kryukov <kkryukov gmail.com> writes:
On Wednesday, 1 June 2016 at 01:36:43 UTC, Adam D. Ruppe wrote:
 D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
 QUALITY WITH A SIMPLE MIGRATION PATH!!!!!!!!!!!!!!!!!!!!
This. I only recently started full scale use of D, but I lurked here for years. D has a few quirks here and there, but overall it's a fantastic language. However the biggest putting off factor for me is the attitude of the leadership towards fixing the issues and completing the language.

The idea of autodecoding is very natural to appear for someone who only recently discovered Unicode. Whoa, instead of code pages we now have "unicode code points". Great. Only much later the person realizes that working with code points isn't always correct. So I don't blame anyone for designing/implementing autodecoding years ago.

But. Not acknowledging that autodecoding is seriously wrong now, looks like a complete brain damage. The entire community seems united in the view that autodecoding is both slow and usually wrong. The users are begging for this breaking change. There's a number of approaches about handling the deprecation. Even the code that for some reason really needs to work with code points will benefit from explicitly stating that it needs code points. But no we must endure this madness forever.

I realize that priorities of a language user might be different from those of a language leadership. With fixed (removed) autodecoding the user gets a cleaner language. Their program will work faster and is easier to reason about. User's brain cycles are not wasted for useless crap like working around autodecoding. On the other hand, the language/stdlib designer now has to admit their initial design was sub-optimal. Their books and articles are now obsolete. And they will be the ones who receive complaints from the inevitable few upset with the change.

However keeping the current situation means for me personally:

1. Not switching to D wholesale, but just toying with it.
2. Even when using D for work I don't want to talk about it to others.

I was seriously thinking about starting a D-learning seminar at work, and I still might, but the thought that autodecoding is going to stay is cooling my enthusiasm. I just did a numerical app in D, where it shines, I think. However much of my work code is dealing with huge texts. I don't want to fight with autodecode at every step. I'd like arrays of chars be arrays of chars without any magic crap auto-inserted behind my back. I don't want to become an expert in avoiding language pitfalls (The reason I abandoned C++ years ago). I also don't want to re-implement the staple string processing routines (though I might, if at least the language constructs work without autodecode, which seems not the case here).

Think about it. 99% of code working with code points is _broken_ anyway. (In the sense, that the usual assumption is that code point represents a character, while in fact it does not). When working with code units, the developer will notice the problem right away. When working with code points, the problem is not apparent until years later (essentially what happened to D itself).

Feel free to ignore my non-D-core-dev comment. Even though I suspect many D users may agree with me. An even larger number of potential D users does not want autodecoding either.

Thanks,
Kirill
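
To make the point about code points not being characters concrete, here is a minimal sketch in D. The helper names `codeUnits`, `codePoints` and `graphemes` are made up for this illustration; `walkLength` and `byGrapheme` are existing Phobos functions. It counts the same text at three levels, and only the last one matches what a reader would call one character.

```d
import std.range.primitives : walkLength;
import std.uni : byGrapheme;

// Hypothetical helpers, named for this illustration only.
size_t codeUnits(string s)  { return s.length; }                // UTF-8 code units (bytes)
size_t codePoints(string s) { return s.walkLength; }            // relies on autodecoding
size_t graphemes(string s)  { return s.byGrapheme.walkLength; } // user-perceived characters

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: displayed as a single "é".
    string s = "e\u0301";

    assert(codeUnits(s) == 3);   // 1 byte for 'e', 2 bytes for U+0301
    assert(codePoints(s) == 2);  // two code points, still not "one character"
    assert(graphemes(s) == 1);   // one grapheme cluster
}
```

Code that treats the code point count as a character count gets 2 here, which is the kind of latent bug described above.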
May 31 2016
parent poliklosio <poliklosio happypizza.com> writes:
On Wednesday, 1 June 2016 at 05:46:29 UTC, Kirill Kryukov wrote:
 On Wednesday, 1 June 2016 at 01:36:43 UTC, Adam D. Ruppe wrote:
 D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
 QUALITY WITH A SIMPLE MIGRATION PATH!!!!!!!!!!!!!!!!!!!!
This. (...) I don't want to become an expert in avoiding language pitfalls (The reason I abandoned C++ years ago).
+1 If you have too many pitfalls in the language, it's not easier to learn than C++, just different (regardless of the maximum productivity you have when using the language, that's another issue). The worst case is you just want to use ASCII text and suddenly you have to spend weeks reading a ton of confusing stuff about Unicode, D and autodecoding, just to know how to use char[] correctly in D. Compare that to how trivial it is to process ASCII text in, say, C++. And processing just plain ASCII is a very common case, e.g. processing textual logs from tools.
Jun 01 2016
prev sibling next sibling parent default0 <Kevin.Labschek gmx.de> writes:
On Wednesday, 1 June 2016 at 01:36:43 UTC, Adam D. Ruppe wrote:
 D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
 QUALITY WITH A SIMPLE MIGRATION PATH!!!!!!!!!!!!!!!!!!!!
Agree with that very much. Yes, you still have to think about cost/benefit for breaking changes, but in general when I sign up for D I expect it to throw out mistakes of the past so long as the correction of them is worth the cost of breakage.

So the cost of breakage for autodecoding is that the behaviour of roughly all string handling code changes. Now most of this string handling code was broken to begin with since VERY VERY VERY little string handling code ever cares about code points. This means the code that is actually broken in terms of being buggy after the change when it wasn't buggy before is probably not a lot.

The other cost of breakage is to force a user to go through potentially thousands of LoC and update their string handling code. Personally, I find that cost dramatically reduced if there are two prerequisites met: Compiler Errors everywhere we have relied on the feature before (we can apparently do that, so check) and error/deprecation messages detailed enough to go into further reading so I can make meaningful decisions about it (we can also do that, I am sure, so check). If I just have to hop from one compiler error to the next and fix my broken code with confidence after having read about the context for 30-60 minutes, even going through vast amounts of code is not actually that big of a deal since you really only have to inspect a fraction of it (the fraction the compiler tells you about).

Another cost is if we have unmaintained 3rd party libraries, when we actually make the change the default in the future, they will stop compiling on recent compiler versions. I suppose a tool could be made tracking the specific compiler errors and simply using .byDchar to make the code "just work" exactly the way it used to work (ie unreliably, slowly and with bugs in string handling) before the change.

The cost of backwards-compatibility is also two-fold from what I can see:

- We will continue to be inefficient and waste time autodecoding by default (mobile users are going to be especially happy about that).
- By default, string handling code is still broken, just more subtly, meaning more string handling bugs in D code make it to production
May 31 2016
prev sibling next sibling parent reply Guillaume Chatelet <chatelet.guillaume gmail.com> writes:
On Wednesday, 1 June 2016 at 01:36:43 UTC, Adam D. Ruppe wrote:
 I have a better one, that we discussed on IRC last night:

 1) put the string overloads for front and popFront on a version 
 switch:

 D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
 QUALITY WITH A SIMPLE MIGRATION PATH!!!!!!!!!!!!!!!!!!!!

 2) After a while, we swap the version conditions, so opting 
 into it preserves the old behavior for a while.

 3) A wee bit longer, we exterminate all this autodecoding crap 
 and enjoy Phobos being a smaller, more efficient library.
+1
Jun 01 2016
parent Andrea Fontana <nospam example.com> writes:
On Wednesday, 1 June 2016 at 08:21:36 UTC, Guillaume Chatelet 
wrote:
 On Wednesday, 1 June 2016 at 01:36:43 UTC, Adam D. Ruppe wrote:
 I have a better one, that we discussed on IRC last night:

 1) put the string overloads for front and popFront on a 
 version switch:

 D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
 QUALITY WITH A SIMPLE MIGRATION PATH!!!!!!!!!!!!!!!!!!!!

 2) After a while, we swap the version conditions, so opting 
 into it preserves the old behavior for a while.

 3) A wee bit longer, we exterminate all this autodecoding crap 
 and enjoy Phobos being a smaller, more efficient library.
+1
+1
Jun 01 2016
prev sibling next sibling parent Guillaume Piolat <first.last gmail.com> writes:
On Wednesday, 1 June 2016 at 01:36:43 UTC, Adam D. Ruppe wrote:
 On Wednesday, 1 June 2016 at 00:46:04 UTC, Walter Bright wrote:
 It is not practical to just delete or deprecate autodecode
Yes, it is. We need to stop holding on to the mistakes of the past. 9 of 10 dentists agree that autodecoding is a mistake. Not just WAS a mistake, IS a mistake. It has ongoing cost. If we don't fix our attitude about these problems, we are going to turn into that very demon we despise, yea, even the next C++!
Please, just remove auto-decoding, any way you want. I only ever used it once or twice voluntarily. It's a special case that must go. Maybe with a flag like for -vtls.
Jun 01 2016
prev sibling parent Kagamin <spam here.lot> writes:
On Wednesday, 1 June 2016 at 01:36:43 UTC, Adam D. Ruppe wrote:
 version(string_migration)
 deprecated void popFront(T)(ref T t) if(isSomeString!T) {
   static assert(0, "this is crap, fix your code.");
 }
 else
 deprecated("use -versionstring_migration to fix your buggy 
 code, would you like to know more?")
 /* existing popFront here */
    version(autodecode_migration)
        deprecated("autodecode attempted, use byDchar instead")
        alias popFront = _d_popFront;
    else
        alias popFront = _d_popFront;

    void _d_popFront(T)(ref T t) if (isSomeString!T)
    {
        /* existing popFront here */
    }

The migration branch should compile and work, or template constraints will silently fail. Then deprecation messages can be grepped. That said, does the compiler print deprecation messages triggered inside template constraints?
Jun 01 2016
prev sibling next sibling parent reply Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Tuesday, May 31, 2016 17:46:04 Walter Bright via Digitalmars-d wrote:
 It is not practical to just delete or deprecate autodecode - it is too
 embedded into things. What we can do, however, is stop using it ourselves
 and stop relying on it in the documentation, much like [] is eschewed in
 favor of std::vector in C++.

 The way to deal with it is to replace reliance on autodecode with .byDchar
 (.byDchar has a bonus of not throwing an exception on invalid UTF, but using
 the replacement dchar instead.)

 To that end, and this will be an incremental process:

 1. Temporarily break autodecode such that using it will cause a compile
 error. Then, see what breaks in Phobos and fix those to use .byDchar

 2. Change examples in the documentation and the Phobos examples to use
 .byDchar

 3. Best practices should use .byDchar, .byWchar, .byChar, .byCodeUnit when
 dealing with ranges/arrays of characters to make it clear what is happening.
The other critical thing is to make sure that Phobos in general works with byDChar, byCodeUnit, etc. For instance, pretty much as soon as I started trying to use byCodeUnit instead of naked strings, I ran into this:

https://issues.dlang.org/show_bug.cgi?id=15800

But once Phobos no longer relies on autodecoding except maybe in places where we can't actually excise it completely without breaking code (and hopefully there are none of those), then we can look at how feasible the full removal of auto-decoding really is. IMHO, leaving it in is a _huge_ piece of technical debt that we don't want and probably can't afford, so I really don't think that we should just assume that we can't remove it due to the breakage that it would cause.

But we definitely have work to do before we can have Phobos in a state where it's reasonable to even make an attempt. byCodeUnit and friends were a good start, but we need to make it so that they're treated as first-class citizens, and they're not right now.

- Jonathan M Davis
May 31 2016
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/31/2016 7:28 PM, Jonathan M Davis via Digitalmars-d wrote:
 The other critical thing is to make sure that Phobos in general works with
 byDChar, byCodeUnit, etc. For instance, pretty much as soon as I started
 trying to use byCodeUnit instead of naked strings, I ran into this:

 https://issues.dlang.org/show_bug.cgi?id=15800
That was posted 3 months ago. No PR to fix it (though it likely is an easy fix). If we can't get these things fixed in Phobos, how can we tell everyone else to fix their code?
May 31 2016
parent reply Brad Roberts via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 5/31/2016 7:40 PM, Walter Bright via Digitalmars-d wrote:
 On 5/31/2016 7:28 PM, Jonathan M Davis via Digitalmars-d wrote:
 The other critical thing is to make sure that Phobos in general works
 with
 byDChar, byCodeUnit, etc. For instance, pretty much as soon as I started
 trying to use byCodeUnit instead of naked strings, I ran into this:

 https://issues.dlang.org/show_bug.cgi?id=15800
That was posted 3 months ago. No PR to fix it (though it likely is an easy fix). If we can't get these things fixed in Phobos, how can we tell everyone else to fix their code?
I hope that wasn't a serious question. The answer is trivial. The rate of incoming bug reports exceeds the rate of bug fixing which exceeds the rate of fix pulling. Has since about the dawn of time.
May 31 2016
parent tsbockman <thomas.bockman gmail.com> writes:
On Wednesday, 1 June 2016 at 02:58:36 UTC, Brad Roberts wrote:
 ...the rate of bug fixing which exceeds the rate of fix pulling.
Speaking of which: https://github.com/dlang/phobos/pull/4345 https://github.com/dlang/phobos/pull/3973
May 31 2016
prev sibling parent Jack Stouffer <jack jackstouffer.com> writes:
On Wednesday, 1 June 2016 at 02:28:04 UTC, Jonathan M Davis wrote:
 The other critical thing is to make sure that Phobos in general 
 works with byDChar, byCodeUnit, etc. For instance, pretty much 
 as soon as I started trying to use byCodeUnit instead of naked 
 strings, I ran into this:

 https://issues.dlang.org/show_bug.cgi?id=15800
https://github.com/dlang/phobos/pull/4390
Jun 01 2016
prev sibling next sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Tue, May 31, 2016 at 07:28:04PM -0700, Jonathan M Davis via Digitalmars-d
wrote:
[...]
 The other critical thing is to make sure that Phobos in general works with
 byDChar, byCodeUnit, etc. For instance, pretty much as soon as I started
 trying to use byCodeUnit instead of naked strings, I ran into this:
 
 https://issues.dlang.org/show_bug.cgi?id=15800
This is an example of current Phobos code assuming (sometimes implicitly) that strings are ranges of dchar, which leads to subtle breakage like this one:

https://issues.dlang.org/show_bug.cgi?id=15972

T

--
"640K ought to be enough" -- Bill G. (allegedly), 1984.
"The Internet is not a primary goal for PC usage" -- Bill G., 1995.
"Linux has no impact on Microsoft's strategy" -- Bill G., 1999.
May 31 2016
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2016-06-01 02:46, Walter Bright wrote:
 It is not practical to just delete or deprecate autodecode - it is too
 embedded into things. What we can do, however, is stop using it
 ourselves and stop relying on it in the documentation, much like [] is
 eschewed in favor of std::vector in C++.

 The way to deal with it is to replace reliance on autodecode with
 .byDchar
Don't you get the same behavior using byDchar as with autodecode?

--
/Jacob Carlborg
May 31 2016
parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/31/2016 11:57 PM, Jacob Carlborg wrote:
 The way to deal with it is to replace reliance on autodecode with
 .byDchar
Don't you get the same behavior using byDchar as with autodecode?
Yes (except that byDchar returns the replacement char on invalid Unicode, while autodecode throws an exception). But the point is that byDchar is opt-in.
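
The difference in error handling can be sketched as follows. This is a minimal illustration, not code from Phobos: `decodeLossy` and `decodeStrict` are hypothetical helper names, while `byDchar`, `replacementDchar`, `UTFException`, `front` and `popFront` are existing Phobos symbols. The input ends in a byte (0xFF) that can never occur in well-formed UTF-8.

```d
import std.array : array;
import std.exception : assertThrown;
import std.range.primitives : empty, front, popFront;
import std.utf : byDchar, replacementDchar, UTFException;

// Opt-in decoding: invalid input is replaced by U+FFFD, nothing is thrown.
dchar[] decodeLossy(const(char)[] s)
{
    return s.byDchar.array;
}

// Autodecoding through the range primitives for narrow strings:
// `front` decodes implicitly and throws on invalid UTF-8.
dchar[] decodeStrict(const(char)[] s)
{
    dchar[] result;
    while (!s.empty)
    {
        result ~= s.front;
        s.popFront();
    }
    return result;
}

void main()
{
    const(char)[] bad = ['a', 'b', cast(char) 0xFF];

    assert(decodeLossy(bad) == "ab\uFFFD"d);
    assert(decodeLossy(bad)[$ - 1] == replacementDchar);
    assertThrown!UTFException(decodeStrict(bad));

    // On valid input both paths agree.
    assert(decodeLossy("abc") == decodeStrict("abc"));
}
```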
Jun 01 2016
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/31/2016 08:46 PM, Walter Bright wrote:
 It is not practical to just delete or deprecate autodecode - it is too
 embedded into things. What we can do, however, is stop using it
 ourselves and stop relying on it in the documentation, much like [] is
 eschewed in favor of std::vector in C++.

 The way to deal with it is to replace reliance on autodecode with
 .byDchar (.byDchar has a bonus of not throwing an exception on invalid
 UTF, but using the replacement dchar instead.)

 To that end, and this will be an incremental process:

 1. Temporarily break autodecode such that using it will cause a compile
 error. Then, see what breaks in Phobos and fix those to use .byDchar

 2. Change examples in the documentation and the Phobos examples to use
 .byDchar

 3. Best practices should use .byDchar, .byWchar, .byChar, .byCodeUnit
 when dealing with ranges/arrays of characters to make it clear what is
 happening.
(Shouldn't those be by!dchar, by!wchar, by!char? byCodeUnit and byCodePoint stay as they are.)

4. Rally behind RCStr as the preferred string type of the D language. RCStr manages its own memory, is fast, and has the right interface (i.e. offers several views for iteration without an implicit one, doesn't throw on invalid code points, etc).

This is the key component. We get rid of GC-backed strings, which is part of the crucial goal for D we need to achieve, and reap the benefit of a better design as a perk. Breaking existing code does not have the right benefit for the cost.

Let's keep the eyes on the ball, folks. We want to rid D of the GC. That's the prize.

Andrei
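
The phrase "several views for iteration without an implicit one" can be illustrated with a small sketch. To be clear, this is NOT RCStr and says nothing about its actual API or memory management: `Text` and its member names are invented here purely to show the interface idea, on top of existing Phobos functions (`byCodeUnit`, `byDchar`, `byGrapheme`).

```d
import std.range.primitives : isInputRange, walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

// Hypothetical type for illustration only; not RCStr.
struct Text
{
    private string data;

    this(string s) { data = s; }

    // Every way of iterating has to be requested by name.
    auto codeUnits()  { return data.byCodeUnit; }
    auto codePoints() { return data.byDchar; }
    auto graphemes()  { return data.byGrapheme; }
}

// The type itself is not a range, so no algorithm can pick a decoding
// level implicitly on the caller's behalf.
static assert(!isInputRange!Text);

void main()
{
    auto t = Text("e\u0301");
    assert(t.codeUnits.walkLength == 3);
    assert(t.codePoints.walkLength == 2);
    assert(t.graphemes.walkLength == 1);
}
```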
Jun 01 2016
next sibling parent Chris <wendlec tcd.ie> writes:
On Wednesday, 1 June 2016 at 12:14:06 UTC, Andrei Alexandrescu 
wrote:
 On 05/31/2016 08:46 PM, Walter Bright wrote:

 (Shouldn't those be by!dchar, by!wchar, by!char? byCodeUnit and 
 byCodePoint stay as they are.)

 4. Rally behind RCStr as the preferred string type of the D 
 language. RCStr manages its own memory, is fast, and has the 
 right interface (i.e. offers several views for iteration 
 without an implicit one, doesn't throw on invalid code points, 
 etc).

 This is the key component. We get rid of GC-backed strings, 
 which is part of the crucial goal for D we need to achieve, and 
 reap the benefit of a better design as a perk. Breaking 
 existing code does not have the right benefit for the cost.

 Let's keep the eyes on the ball, folks. We want to rid D of the 
 GC. That's the prize.


 Andrei
What would the transition look like? How would it affect existing code, like e.g. `countUntil`, `.length` etc.?
Jun 01 2016
prev sibling parent reply Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wednesday, June 01, 2016 08:14:06 Andrei Alexandrescu via Digitalmars-d 
wrote:
 4. Rally behind RCStr as the preferred string type of the D language.
 RCStr manages its own memory, is fast, and has the right interface (i.e.
 offers several views for iteration without an implicit one, doesn't
 throw on invalid code points, etc).

 This is the key component. We get rid of GC-backed strings, which is
 part of the crucial goal for D we need to achieve, and reap the benefit
 of a better design as a perk. Breaking existing code does not have the
 right benefit for the cost.

 Let's keep the eyes on the ball, folks. We want to rid D of the GC.
 That's the prize.
Since when has it been the goal to get rid of GC-allocated strings? We definitely want an alternative to GC-allocated strings for code that can't afford to use the GC, but auto-decoding issues aside, why would I want to use RCString instead of string if the GC isn't a problem for my program? Walter pointed out at dconf that using a GC is often faster than reference counting; it's just that it can incur a large cost at once when a collection is run, whereas the cost of ref-counting is amortized across the time that the program is running.

I expect that RCString will be very important for us going forward, but I don't see much reason to use it as the default string type in code over just using string except for the fact that we have the auto-decoding mess to deal with. It seems more like RCString is an optimization for certain types of programs than what you'd want to use by default.

- Jonathan M Davis
Jun 01 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/01/2016 11:24 AM, Jonathan M Davis via Digitalmars-d wrote:
 It seems more like RCString is an optimization for certain types of
 programs than what you'd want to use by default.
You'll always want to use it. The small string optimization will make it compelling for all applications. -- Andrei
Jun 01 2016
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 01.06.2016 17:30, Andrei Alexandrescu wrote:
 On 06/01/2016 11:24 AM, Jonathan M Davis via Digitalmars-d wrote:
 It seems more like RCString is an optimization for certain types of
 programs than what you'd want to use by default.
You'll always want to use it. The small string optimization will make it compelling for all applications. -- Andrei
- Why is it dependent on the allocation strategy or on the type of the data?
- It seems to be a pessimization if I'm taking a lot of small slices.
- It is undesirable if I later want to reference-compare those slices.
Jun 01 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/01/2016 04:28 PM, Timon Gehr wrote:
 On 01.06.2016 17:30, Andrei Alexandrescu wrote:
 On 06/01/2016 11:24 AM, Jonathan M Davis via Digitalmars-d wrote:
 It seems more like RCString is an optimization for certain types of
 programs than what you'd want to use by default.
You'll always want to use it. The small string optimization will make it compelling for all applications. -- Andrei
- Why is it dependent on the allocation strategy or on the type of the data?
Not getting this.
 - It seems to be a pessimization if I'm taking a lot of small slices.
I agree cases can be created in which straight arrays sometimes do better. They are few and far between; for strings, the small string optimization is something to live by.
 - It is undesirable if I later want to reference-compare those slices.
Arrays will still be usable. Andrei
Jun 01 2016
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 01.06.2016 22:43, Andrei Alexandrescu wrote:
 On 06/01/2016 04:28 PM, Timon Gehr wrote:
 On 01.06.2016 17:30, Andrei Alexandrescu wrote:
 On 06/01/2016 11:24 AM, Jonathan M Davis via Digitalmars-d wrote:
 It seems more like RCString is an optimization for certain types of
 programs than what you'd want to use by default.
You'll always want to use it. The small string optimization will make it compelling for all applications. -- Andrei
- Why is it dependent on the allocation strategy or on the type of the data?
Not getting this. ...
The small string optimization also works for GC-allocated strings. Why do I always want to use RCString instead of the corresponding GCString? (Also, the same approach can be applied to other arrays with value semantics.)
Jun 01 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/01/2016 05:03 PM, Timon Gehr wrote:
 The small string optimization also works for GC-allocated strings. Why
 do I always want to use RCString instead of the corresponding GCString?
 (Also, the same approach can be applied to other arrays with value
 semantics.)
Point taken, thanks. Mine was that you can't (reasonably) use the SSO if you commit to represent strings as bare slices. -- Andrei
Jun 01 2016
prev sibling parent Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wednesday, June 01, 2016 11:30:02 Andrei Alexandrescu via Digitalmars-d 
wrote:
 On 06/01/2016 11:24 AM, Jonathan M Davis via Digitalmars-d wrote:
 It seems more like RCString is an optimization for certain types of
 programs than what you'd want to use by default.
You'll always want to use it. The small string optimization will make it compelling for all applications. -- Andrei
Well, ref-counting vs GC aside, optimizations like that are actually something that can clearly make a user-defined type for strings worth using over naked arrays of code units, whereas it's far less clear that having a user-defined type for strings because of Unicode-related issues actually buys us much. - Jonathan M Davis
Jun 02 2016
prev sibling parent reply Jon Degenhardt <jond noreply.com> writes:
On Wednesday, 1 June 2016 at 00:46:04 UTC, Walter Bright wrote:
 It is not practical to just delete or deprecate autodecode - it 
 is too embedded into things. What we can do, however, is stop 
 using it ourselves and stop relying on it in the documentation, 
 much like [] is eschewed in favor of std::vector in C++.
Hopefully my perspective on the auto-decoding topic is useful rather than disruptive. I work on search applications, both run-time engines and data science. Processing multi-lingual text is an important aspect of these applications.

There are a couple of issues with D's current auto-decoding implementation for these applications. One is lack of control over error handling when encountering corrupt UTF-8 text. Real world data contains corrupt UTF-8 sequences, and robust applications need to handle them. Proper handling is generally application specific. Both replacement character and throwing exceptions are useful behaviors, but the ability to choose between them is often necessary. At present, this behavior is built into the low-level primitives, without application control. Notably, 'front' and 'popFront' have different behaviors. This is also a consideration for explicitly invoked decoding facilities like 'byUTF'.

Another is performance. Iteration triggering auto-decoding is apparently an order of magnitude more costly than iteration without decoding. This is too large a delta when the algorithm doesn't require decoding. (Such algorithms are common.) Frankly, I'm surprised the cost is so large. It wouldn't surprise me to find out it's partly a compiler artifact, but it doesn't matter.

As to what to do about it - if changing currently built-in auto decoding is not an option, then perhaps providing parallel facilities that don't auto-decode would do the trick. RCStr would seem a real opportunity. Perhaps a raw array of UTF-8 code units ala ubyte[] that doesn't get auto-decoded? With either, explicit decoding would be needed to invoke standard library routines operating on Unicode code points or graphemes. (Sounds like interaction with character literals could still be an issue, as the actual representation is not obvious.) Having a consistent set of error handling options for explicit decoding facilities would be helpful as well.

Another possibility would be support for detecting inadvertent auto-decoding. D has very nice support for ensuring or detecting code properties (eg. '@nogc', '-vgc' compiler option). If there was a way to identify code triggering auto-decoding, that would be useful.
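
As a rough sketch of both points (algorithms that need no decoding, and spotting where decoding happens), consider counting tab separators in a line. `countTabs` is a hypothetical name for this example; `byCodeUnit`, `count`, `ElementType` and `Unqual` are existing Phobos symbols. Searching for an ASCII byte is safe at the code unit level because ASCII values never occur inside a multi-byte UTF-8 sequence.

```d
import std.algorithm.searching : count;
import std.range.primitives : ElementType;
import std.traits : Unqual;
import std.utf : byCodeUnit;

// A bare string presents itself to range algorithms as a range of dchar:
// this compile-time fact is what "autodecoding" means in practice.
static assert(is(ElementType!string == dchar));

// The explicit code-unit view hands the algorithm the stored chars instead.
static assert(is(Unqual!(ElementType!(typeof("".byCodeUnit))) == char));

// Hypothetical helper: counts tab separators without decoding anything.
size_t countTabs(const(char)[] line)
{
    return line.byCodeUnit.count('\t');
}

void main()
{
    assert(countTabs("a\tb\tc") == 2);
    assert(countTabs("\u00E9\t\u00FC") == 1); // non-ASCII text is handled fine
    assert(countTabs("") == 0);
}
```

A `static assert` on `ElementType` like the ones above is one way a library author could, under these assumptions, check at compile time which view a generic function was actually given.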
Jun 07 2016
parent FilippoR <ics_fight hotmail.com> writes:
It's possible to add:

- a new alias: alias bstring = immutable(ubyte)[];
- a new literal postfix (bstring s = "test string"b) or a UFCS helper (bstring s = "test string".b)
- UFCS byCodePoint and byGrapheme for it
- overloaded functions in Phobos where necessary

so that we can have an autodecode-free string.
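
Part of this can already be approximated in library code. The following is a minimal sketch under the post's assumptions: `bstring` and the helper `b` are the hypothetical names from the proposal (neither exists in Phobos, and the literal postfix would need a language change), while `std.string.representation` is an existing function that reinterprets a string as its underlying bytes.

```d
import std.range.primitives : ElementType, walkLength;
import std.string : representation;
import std.utf : byDchar;

// Hypothetical alias from the proposal; not part of Phobos.
alias bstring = immutable(ubyte)[];

// Hypothetical UFCS helper standing in for the proposed "..."b postfix.
bstring b(string s)
{
    return s.representation;
}

// Range algorithms see plain bytes, so nothing is decoded implicitly.
static assert(is(ElementType!bstring == immutable(ubyte)));

void main()
{
    bstring s = "test string".b;
    assert(s.length == 11);
    assert(s[0] == 't');

    // Decoding has to be requested explicitly, here by viewing the bytes
    // as chars again and asking for code points.
    bstring accented = "e\u0301".b;
    assert(accented.length == 3);
    assert((cast(string) accented).byDchar.walkLength == 2);
}
```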
Jun 07 2016