digitalmars.D - Why not flag away the mistakes of the past?
- Taylor Hillegeist (9/9) Mar 06 2018 So I've seen on the forum over the years arguments about
- FeepingCreature (8/8) Mar 07 2018 For what it's worth, I like autodecoding.
- jmh530 (5/15) Mar 07 2018 That's the approach used for most things, but there's a lot of
- Guillaume Piolat (5/8) Mar 07 2018 auto-decoding problem was mostly that it couldn't be @nogc since
- Jonathan M Davis (21/29) Mar 07 2018 I'd actually argue that that's the lesser of the problems with
- Nick Treleaven (9/12) Mar 07 2018 For me the fundamental problem is having char[] in the language
- Jonathan M Davis (35/47) Mar 07 2018 In principle, char is supposed to be a UTF-8 code unit, and strings are
- Guillaume Piolat (6/28) Mar 08 2018 I'd agree with you, hate the special casing. However it seems to
- Jonathan M Davis (18/48) Mar 08 2018 Most everyone who debated in favor of it early on is very much against i...
- Taylor Hillegeist (24/62) Mar 08 2018 I wasn't so much asking about auto-decoding in particular more
- Jonathan M Davis (35/58) Mar 08 2018 Any and all changes need to be weighed for their pros and cons. No one l...
- Chris (7/21) Mar 09 2018 It's already been said (by myself and others) that we should
- H. S. Teoh (25/38) Mar 08 2018 [...]
- Henrik (6/27) Mar 08 2018 Which companies are against changing this? They must be powerful
- Guillaume Piolat (10/24) Mar 09 2018 I remember something a bit different last time it was discussed:
- Steven Schveighoffer (8/19) Mar 07 2018 Note, autodecoding is NOT a feature of the language, but rather a
- Seb (5/15) Mar 07 2018 Well, I tried that already:
- H. S. Teoh (11/30) Mar 07 2018 Argh... this really struck a nerve. "Not much interest"?! I think a
- Dukc (8/11) Mar 08 2018 No. The main problem with that (and the idea of using a compiler
- Jon Degenhardt (9/19) Mar 07 2018 Auto-decoding is a significant issue for the applications I work
- Seb (4/14) Mar 07 2018 Well you can use byCodeUnit, which disables auto-decoding
- H. S. Teoh (22/33) Mar 07 2018 [...]
- Gary Willoughby (2/4) Mar 09 2018 Please!!!
- Jon Degenhardt (4/20) Mar 07 2018 I looked at this once. It didn't appear to be a viable solution,
So I've seen arguments on the forum over the years about auto-decoding (mostly) and some other things: things that are considered mistakes but cannot be corrected because of the breaking changes that would result. I always wonder: why not add a flag that makes things work as they used to, and make the new behavior the default?

dmd --UseAutoDecoding

That way the breaking change would be easy to fix, and the mistakes of the past would not be forever. Is it just the cost of maintenance?
Mar 06 2018
For what it's worth, I like autodecoding. I worry we could be in a situation where a moderate number of people are strong opponents and a lot of people are weak fans, none of whom individually cares enough to post. Hopefully the D survey results will shed some light on this, though I don't remember whether it was written to actually ask people's opinion of autodecoding or just to list it as a possible issue to raise, which would fall into the same trap.
Mar 07 2018
On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist wrote:
> So I've seen on the forum over the years arguments about auto-decoding (mostly) and some other things. Things that have been considered mistakes, and cannot be corrected because of the breaking changes it would create. And I always wonder why not make a solution to the tune of a flag that makes things work as they used to, and make the new behavior default.
> dmd --UseAutoDecoding
> That way the breaking change was easily fixable, and the mistakes of the past not forever. Is it just the cost of maintenance?

That's the approach used for most things, but there are a lot of things that rely on auto-decoding, so it would be a big effort to actually implement that.
Mar 07 2018
On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist wrote:
> That way the breaking change was easily fixable, and the mistakes of the past not forever. Is it just the cost of maintenance?

The problem with auto-decoding was mostly that it couldn't be @nogc, since it throws, but in future releases exception throwing will become @nogc. So it's getting fixed.
Mar 07 2018
On Wednesday, March 07, 2018 12:53:16 Guillaume Piolat via Digitalmars-d wrote:
> On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist wrote:
>> That way the breaking change was easily fixable, and the mistakes of the past not forever. Is it just the cost of maintenance?
> auto-decoding problem was mostly that it couldn't be @nogc since throwing, but with further releases exception throwing will get @nogc. So it's getting fixed.

I'd actually argue that that's the lesser of the problems with auto-decoding. The big problem is that it's auto-decoding. Code points are almost always the wrong level to be operating at. The programmer needs to be in control of whether the code is operating on code units, code points, or graphemes, and because of auto-decoding, we have to constantly avoid using the range primitives for arrays on strings. Tons of range-based code has to special case for strings in order to work around auto-decoding. We're constantly fighting our own API in order to process strings sanely and efficiently.

IMHO, @nogc and nothrow don't matter much in comparison. Yes, it would be nice if range-based code operating on strings were @nogc and nothrow, but most D code really doesn't care. It uses the GC anyway, and most of the time, no exceptions are thrown, because the strings are valid Unicode. Yes, the fact that the range primitives for strings throw UTFExceptions instead of using the Unicode replacement character is a problem, but that problem is small in comparison to the problems caused by the auto-decoding itself. Even if front and popFront used the variant of decode that used the replacement character, auto-decoding would still be a huge problem.

- Jonathan M Davis
Mar 07 2018
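To make the point about UTFException versus the replacement character concrete, here is a minimal D sketch. The helper names frontThrows and lenientFront are hypothetical and exist only for this illustration; the sketch assumes the Phobos behavior described in the post, namely that the auto-decoding front throws on invalid UTF-8, while std.utf.byDchar substitutes U+FFFD instead.

```d
import std.range.primitives : front;
import std.utf : byDchar, UTFException;

/// Hypothetical helper: true if the auto-decoding `front` throws on `s`.
bool frontThrows(string s)
{
    try
    {
        cast(void) s.front; // auto-decoding front: decodes one code point
        return false;
    }
    catch (UTFException)
    {
        return true;
    }
}

/// Hypothetical helper: first code point of a non-empty `s`,
/// with an invalid sequence replaced by U+FFFD instead of throwing.
dchar lenientFront(string s)
{
    return s.byDchar.front;
}

unittest
{
    // A lone 0xFF byte is never valid UTF-8.
    immutable(ubyte)[] raw = [0xFF];
    string bad = cast(string) raw;

    assert(frontThrows(bad));
    assert(lenientFront(bad) == '\uFFFD');
    assert(!frontThrows("abc"));
}
```

Either way the string is still decoded on every access, which is the cost the post argues is the larger problem.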
On Wednesday, 7 March 2018 at 13:24:25 UTC, Jonathan M Davis wrote:
> I'd actually argue that that's the lesser of the problems with auto-decoding. The big problem is that it's auto-decoding. Code points are almost always the wrong level to be operating at.

For me the fundamental problem is having char[] in the language at all, meaning a Unicode string. Arbitrary slicing and indexing are not Unicode compatible; if we revisit this, we need a String type that doesn't support those operations. Plus there is the issue of string validation: a Unicode string type should be assumed to have valid contents. Unsafe data should only be checked at string construction time, so iterating should always be nothrow.
Mar 07 2018
On Wednesday, March 07, 2018 13:40:20 Nick Treleaven via Digitalmars-d wrote:
> On Wednesday, 7 March 2018 at 13:24:25 UTC, Jonathan M Davis wrote:
>> I'd actually argue that that's the lesser of the problems with auto-decoding. The big problem is that it's auto-decoding. Code points are almost always the wrong level to be operating at.
> For me the fundamental problem is having char[] in the language at all, meaning a Unicode string. Arbitrary slicing and indexing are not Unicode compatible, if we revisit this we need a String type that doesn't support those operations. Plus the issue of string validation - a Unicode string type should be assumed to have valid contents - unsafe data should only be checked at string construction time, so iterating should always be nothrow.

In principle, char is supposed to be a UTF-8 code unit, and strings are supposed to be validated up front rather than constantly validated, but it's never been that way in practice. Regardless, having char[] be sliceable is actually perfectly fine and desirable. That's exactly what you want whenever you operate on code units, and it's frequently the case that you want to be operating at the code unit level. But the programmer needs to be able to reasonably control when code units, code points, or graphemes are used, because each has its time and place. If we had a string type, it would need to provide access to each of those levels and likely would not be directly sliceable at all, because slicing a string is kind of meaningless: in principle, a string is just an opaque piece of character data. It's when you're dealing at the code unit, code point, or grapheme level that you actually start operating on pieces of a string, and that means that the level that you're operating at needs to be defined.

Having char[] be an array of code units works quite well, because then you have efficiency by default. You then need to wrap it in another range type when appropriate to get a range of code points or graphemes, or you need to explicitly decode when appropriate. Whereas right now, what we have is Phobos being "helpful" and constantly decoding for us such that we get needlessly inefficient code, and it's at the code point level, which is usually not the level you want to operate at. So, you don't have efficiency or correctness.

Ultimately, it really doesn't work to hide the details of Unicode and not have the programmer worry about code units, code points, and graphemes unless you don't care about efficiency. As such, what we really need is to cleanly give the programmer the tools to manage Unicode without the language or library assuming what the programmer wants - especially assuming an inefficient default. The language itself actually does a decent job of that. It's Phobos that dropped the ball on that one, because Andrei didn't know about graphemes and tried to make Phobos Unicode-correct by default. Instead, we get inefficient and incorrect by default.

- Jonathan M Davis
Mar 07 2018
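The three levels discussed above (code units, code points, graphemes) can be seen side by side in a small D sketch. The function names codeUnits, codePoints and graphemes are hypothetical wrappers introduced for this illustration; byCodeUnit, walkLength and byGrapheme are the Phobos functions they rely on. The middle function assumes auto-decoding is in effect, as it was at the time of this thread.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

/// Number of UTF-8 code units (same as s.length for a string).
size_t codeUnits(string s)
{
    return s.byCodeUnit.walkLength;
}

/// Number of code points: a plain string is auto-decoded when
/// treated as a range, so walkLength counts dchars here.
size_t codePoints(string s)
{
    return s.walkLength;
}

/// Number of graphemes, i.e. user-perceived characters.
size_t graphemes(string s)
{
    return s.byGrapheme.walkLength;
}

unittest
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: displayed as one character.
    string s = "e\u0301";

    assert(codeUnits(s) == 3);  // 1 byte for 'e' + 2 bytes for U+0301
    assert(codePoints(s) == 2); // 'e' and the combining accent
    assert(graphemes(s) == 1);  // one user-perceived character
}
```

The same input yields three different lengths, which is why the post insists that the level of operation has to be chosen by the programmer rather than by the library.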
On Wednesday, 7 March 2018 at 13:24:25 UTC, Jonathan M Davis wrote:
> I'd actually argue that that's the lesser of the problems with auto-decoding. The big problem is that it's auto-decoding. Code points are almost always the wrong level to be operating at. The programmer needs to be in control of whether the code is operating on code units, code points, or graphemes, and because of auto-decoding, we have to constantly avoid using the range primitives for arrays on strings. Tons of range-based code has to special case for strings in order to work around auto-decoding. We're constantly fighting our own API in order to process strings sanely and efficiently.

I'd agree with you, hate the special casing. However, it seems to me this has been debated to death already, and that auto-decoding was successfully advocated by Alexandrescu et al., surviving the controversy years ago.
Mar 08 2018
On Thursday, March 08, 2018 16:34:11 Guillaume Piolat via Digitalmars-d wrote:
> I'd agree with you, hate the special casing. However it seems to me this has been debated to death already, and that auto-decoding was successfully advocated by Alexandrescu and al; surviving the controversy years ago.

Most everyone who debated in favor of it early on is very much against it now (and I'm one of them). Experience and a better understanding of Unicode have shown it to be a terrible idea. I question that you will find any significant contributor to Phobos who would choose to have it if we were starting from scratch, and most of the folks who post in the newsgroup agree with that.

The problem is what to do given that we don't want it and that no one has come up with a way to remove it without breaking tons of code in the process, or even a way to provide a clean migration path. So, given how difficult it is to remove at this point, you'll find disagreement about how that should be handled, ranging from deciding that we're just stuck with it to wanting to remove it regardless of the cost. But there seems to be almost universal agreement now (certainly among the folks who might make such a decision) that auto-decoding was a mistake. So, there's agreement that it would ideally go, but there isn't agreement on what we should actually do given the situation that we're in.

- Jonathan M Davis
Mar 08 2018
On Thursday, 8 March 2018 at 17:14:16 UTC, Jonathan M Davis wrote:
> Most everyone who debated in favor of it early on is very much against it now (and I'm one of them). Experience and a better [...]

I wasn't so much asking about auto-decoding in particular as about the mentality and methods of breaking changes. In a way, any change to the compiler is a breaking change when it comes to the configuration. I for one never expect code to compile on the latest compiler; it has to be the same compiler, same version, for the code base to work as expected. At one point I envisioned every file with a header that states the version of the compiler required for that module. A sophisticated configuration tool could compile each module with its respective version, and then one could link. (This could very well be the worst idea ever.)

I'm not saying we should be quick to change... oh no, that would be very bad. But after you have sat in the filth of your decisions long and hard and are certain that it is indeed bad, there should be a plan for action and change. And when it comes to change, it should be an evolution, not a revolution. It is good to avoid the so easily accepted mentality of legacy: Why do you do it that way? "It's because we've always done it that way."

The reason I like D is often that, driven by its community, it innovates and renovates into a language that is honestly really fun to use (most of the time).
Mar 08 2018
On Friday, March 09, 2018 03:16:03 Taylor Hillegeist via Digitalmars-d wrote:
> I wasn't so much asking about auto-decoding in particular more about the mentality and methods of breaking changes. In a way any change to the compiler is a breaking change when it comes to the configuration. I for one never expect code to compile on the latest compiler, It has to be the same compiler same version for the code base to work as expected. At one point I envisioned every file with a header that states the version of the compiler required for that module. A sophisticated configuration tool could take and compile each module with its respective version and then one could link. (this could very well be the worst idea ever)
> I'm not saying we should be quick to change... oh noo that would be very bad. But after you set in the filth of your decisions long and hard and are certain that it is indeed bad there should be a plan for action and change. And when it comes to change it should be an evolution not a revolution. It is good avoiding the so easily accepted mentality of legacy... Why do you do it that way? "It's because we've always done it that way."
> The reason I like D is often that driven by its community it innovates and renovates into a language that is honestly really fun to use. (most of the time)

Any and all changes need to be weighed for their pros and cons. No one likes it when their code breaks, and ideally, programs would work pretty much forever without modification, but there are changes that are worth dealing with code breakage. Part of the problem is deciding which changes are worth it, and some of that depends on what the migration path would be. Some stuff can be changed with minimal pain, and other stuff can't really be changed without breaking everything. And the more D code that exists, the higher the cost for any change.

The drive to make D perfect and the need to be able to use and rely on D code working in production without having to keep changing it are always in conflict. As Walter likes to say, some folks don't want you to break anything, whereas some folks want breaking changes, and they're frequently the same people.

Ideally, any D code that you write would work permanently as-is. Also ideally, any and all problems or pain points with D and its standard library would be fixed. Those two things are in complete contradiction of one another, and it's not always easy to judge how to deal with that. Sometimes, it means that we're stuck with legacy decisions, because fixing them is too costly, whereas other times, it means that we deprecate something, and some of the D code out there has to be updated, or it won't compile anymore in a release somewhere in the future. Either way, outright breaking code immediately, with no migration process, is pretty much always unacceptable.

We'll make breaking changes if we judge the gain to be worth the pain, but we don't want to be constantly breaking people's code, and some changes are large enough that there's arguably no justification for them, because they would simply be too disruptive. Because of how common string processing is and how integrated auto-decoding is into D's string processing, it is very difficult to come up with a way to change it which isn't simply too disruptive to be justified, even though we want to change it. So, this is a particularly difficult case, and how we're going to end up handling it remains to be seen. Thus far, we've mainly worked on providing better ways to get around it, because we can do that without breaking code, whereas actually removing it is extremely difficult.

- Jonathan M Davis
Mar 08 2018
On Friday, 9 March 2018 at 06:14:05 UTC, Jonathan M Davis wrote:
> We'll make breaking changes if we judge the gain to be worth the pain, but we don't want to be constantly breaking people's code, and some changes are large enough that there's arguably no justification for them, because they would simply be too disruptive. Because of how common string processing is and how integrated auto-decoding is into D's string processing, it is very difficult to come up with a way to change it which isn't simply too disruptive to be justified, even though we want to change it. So, this is a particularly difficult case, and how we're going to end up handling it remains to be seen. Thus far, we've mainly worked on providing better ways to get around it, because we can do that without breaking code, whereas actually removing it is extremely difficult.
> - Jonathan M Davis

It's already been said (by myself and others) that we should actually try to remove it (with a compiler switch) and then see what happens and how much code actually breaks; based on that experience we can come up with a strategy. I've already said that I'm willing to try it on my code (which is almost 100% string processing). Why not _try_ it? We can still philosophize later.
Mar 09 2018
On Thu, Mar 08, 2018 at 10:14:16AM -0700, Jonathan M Davis via Digitalmars-d wrote:
> On Thursday, March 08, 2018 16:34:11 Guillaume Piolat via Digitalmars-d wrote:
[...]
>> I'd agree with you, hate the special casing. However it seems to me this has been debated to death already, and that auto-decoding was successfully advocated by Alexandrescu and al; surviving the controversy years ago.
> Most everyone who debated in favor of it early on is very much against it now (and I'm one of them). Experience and a better understanding of Unicode has shown it to be a terrible idea. I question that you will find any significant contributor to Phobos who would choose to have it if we were starting from scratch, and most of the folks who post in the newsgroup agree with that.
[...]

Yeah, the only reason autodecoding survived in the beginning was because Andrei (wrongly) thought that a Unicode code point was equivalent to a grapheme. If that had been the case, the cost associated with auto-decoding might have been justifiable. Unfortunately, that is not the case, which greatly diminishes most of the advantages that autodecoding was meant to have. So it ended up being something that incurred a significant performance hit, yet did not offer the advantages it was supposed to.

To fully live up to Andrei's original vision, it would have to include grapheme segmentation as well. Unfortunately, graphemes are of arbitrary length and cannot in general fit in a single dchar (or any fixed-size type), and grapheme segmentation is extremely costly to compute, so doing it by default would kill D's string manipulation performance.

In hindsight, it was obviously a failure and a wrong design decision. Walter is clearly against it after he learned that it comes with a hefty performance cost, and even Andrei himself would admit today that it was a mistake. It's only that he, understandably, does not agree with any change that would disrupt existing code. And that's what we're faced with right now.

T

-- 
Frank disagreement binds closer than feigned agreement.
Mar 08 2018
On Thursday, 8 March 2018 at 17:35:11 UTC, H. S. Teoh wrote:
> Yeah, the only reason autodecoding survived in the beginning was because Andrei (wrongly) thought that a Unicode code point was equivalent to a grapheme. If that had been the case, the cost associated with auto-decoding may have been justifiable. Unfortunately, that is not the case, which greatly diminishes most of the advantages that autodecoding was meant to have. So it ended up being something that incurred a significant performance hit, yet did not offer the advantages it was supposed to. To fully live up to Andrei's original vision, it would have to include grapheme segmentation as well. Unfortunately, graphemes are of arbitrary length and cannot in general fit in a single dchar (or any fixed-size type), and grapheme segmentation is extremely costly to compute, so doing it by default would kill D's string manipulation performance.
[...]

Which companies are against changing this? They must be powerful indeed if their convenience is important enough to protect such destructive features. Even C++ managed to give up trigraphs against the will of IBM. Surely D can give up something that is even more destructive?
Mar 08 2018
On Thursday, 8 March 2018 at 17:35:11 UTC, H. S. Teoh wrote:
> Yeah, the only reason autodecoding survived in the beginning was because Andrei (wrongly) thought that a Unicode code point was equivalent to a grapheme. If that had been the case, the cost associated with auto-decoding may have been justifiable. Unfortunately, that is not the case, which greatly diminishes most of the advantages that autodecoding was meant to have. So it ended up being something that incurred a significant performance hit, yet did not offer the advantages it was supposed to. To fully live up to Andrei's original vision, it would have to include grapheme segmentation as well. Unfortunately, graphemes are of arbitrary length and cannot in general fit in a single dchar (or any fixed-size type), and grapheme segmentation is extremely costly to compute, so doing it by default would kill D's string manipulation performance.

I remember something a bit different last time it was discussed:
- removing auto-decoding was breaking a lot of code, since it's used in lots of places
- the performance loss could be mitigated with .byCodeUnit every time
- Andrei was correctly advocating against breakage

Personally I do use auto-decoding, often iterating by code point, and use it for fonts and parsers. It's correct for a large subset of languages. You gave us a feature and now we are using it ;)
Mar 09 2018
On 3/7/18 1:00 AM, Taylor Hillegeist wrote:
> So I've seen on the forum over the years arguments about auto-decoding (mostly) and some other things. Things that have been considered mistakes, and cannot be corrected because of the breaking changes it would create. And I always wonder why not make a solution to the tune of a flag that makes things work as they used to, and make the new behavior default.
> dmd --UseAutoDecoding
> That way the breaking change was easily fixable, and the mistakes of the past not forever. Is it just the cost of maintenance?

Note, autodecoding is NOT a feature of the language, but rather a feature of Phobos.

It would be quite interesting I think to create a modified phobos where autodecoding was optional and see what happens (could be switched with a -version=autodecoding). It wouldn't take much effort -- just take out the specializations for strings in std.array.

-Steve
Mar 07 2018
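The idea of switching the string specializations with -version=autodecoding could look roughly like the following D sketch. This is not the actual Phobos source: strFront, strPopFront and StrElem are hypothetical stand-ins for the string overloads of the range primitives, written only to show how a version identifier can select between decoding and non-decoding behavior at compile time.

```d
import std.utf : decode, stride;

// Hypothetical stand-ins for the string range primitives.
// Compile with -version=autodecoding to get the legacy behavior.
version (autodecoding)
{
    alias StrElem = dchar;

    /// Legacy behavior: decode a whole code point.
    StrElem strFront(const(char)[] s)
    {
        size_t index = 0;
        return decode(s, index);
    }

    /// Legacy behavior: skip all code units of the first code point.
    void strPopFront(ref const(char)[] s)
    {
        s = s[stride(s, 0) .. $];
    }
}
else
{
    alias StrElem = char;

    /// New default: a string is just an array of code units.
    StrElem strFront(const(char)[] s)
    {
        return s[0];
    }

    void strPopFront(ref const(char)[] s)
    {
        s = s[1 .. $];
    }
}

unittest
{
    // U+00E9 is encoded in UTF-8 as the two bytes 0xC3 0xA9.
    const(char)[] s = "\u00E9!";

    version (autodecoding)
    {
        assert(strFront(s) == '\u00E9');
        strPopFront(s);
        assert(s == "!");
    }
    else
    {
        assert(strFront(s) == 0xC3);
        strPopFront(s);
        assert(s.length == 2);
    }
}
```

Note that a version identifier applies to the whole compilation, so every module built together sees the same choice.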
On Wednesday, 7 March 2018 at 14:59:35 UTC, Steven Schveighoffer wrote:
> On 3/7/18 1:00 AM, Taylor Hillegeist wrote:
>> [...]
> Note, autodecoding is NOT a feature of the language, but rather a feature of Phobos. It would be quite interesting I think to create a modified phobos where autodecoding was optional and see what happens (could be switched with a -version=autodecoding). It wouldn't take much effort -- just take out the specializations for strings in std.array.
> -Steve

Well, I tried that already:

https://github.com/dlang/phobos/pull/5513

In short: very easy to do, but not much interest at the time.
Mar 07 2018
On Wed, Mar 07, 2018 at 04:29:33PM +0000, Seb via Digitalmars-d wrote:
> On Wednesday, 7 March 2018 at 14:59:35 UTC, Steven Schveighoffer wrote:
>> Note, autodecoding is NOT a feature of the language, but rather a feature of Phobos. It would be quite interesting I think to create a modified phobos where autodecoding was optional and see what happens (could be switched with a -version=autodecoding). It wouldn't take much effort -- just take out the specializations for strings in std.array.
> Well, I tried that already: https://github.com/dlang/phobos/pull/5513
> In short: very easy to do, but not much interest at the time.

Argh... this really struck a nerve. "Not much interest"?! I think a more accurate description is every passerby going "that looks dangerous and I don't have enough time to spare to look into it right now, so better just leave it up to somebody else to stick their neck out and get beheaded by Andrei later", resulting in nobody taking apparent interest in the PR, even though many of us *really* want to see autodecoding go the way of the dodo.

T

-- 
EMACS = Extremely Massive And Cumbersome System
Mar 07 2018
On Wednesday, 7 March 2018 at 16:29:33 UTC, Seb wrote:
> Well, I tried that already: https://github.com/dlang/phobos/pull/5513
> In short: very easy to do, but not much interest at the time.

No. The main problem with that (and with the idea of using a compiler flag in general) is that it affects the whole compilation. That means that every single third-party library, not only Phobos, has to work BOTH with and without the switch.

IMO, if we find a way to enable or disable autodecoding per module, not per compilation, that will make deprecating it more than worthwhile.
Mar 08 2018
On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist wrote:
> So I've seen on the forum over the years arguments about auto-decoding (mostly) and some other things. Things that have been considered mistakes, and cannot be corrected because of the breaking changes it would create. And I always wonder why not make a solution to the tune of a flag that makes things work as they used to, and make the new behavior default.
> dmd --UseAutoDecoding
> That way the breaking change was easily fixable, and the mistakes of the past not forever. Is it just the cost of maintenance?

Auto-decoding is a significant issue for the applications I work on (search engines). There is a lot of string manipulation in these environments, and performance matters. Auto-decoding is a meaningful performance hit. Otherwise, Phobos has a very nice collection of algorithms for string manipulation. It would be great to have a way to turn auto-decoding off in Phobos.

--Jon
Mar 07 2018
On Wednesday, 7 March 2018 at 15:26:40 UTC, Jon Degenhardt wrote:
> On Wednesday, 7 March 2018 at 06:00:30 UTC, Taylor Hillegeist wrote:
>> [...]
> Auto-decoding is a significant issue for the applications I work on (search engines). There is a lot of string manipulation in these environments, and performance matters. Auto-decoding is a meaningful performance hit. Otherwise, Phobos has a very nice collection of algorithms for string manipulation. It would be great to have a way to turn auto-decoding off in Phobos.
> --Jon

Well, you can use byCodeUnit, which disables auto-decoding. Though it's not well-known, and it's rather annoying to explicitly add it almost everywhere.
Mar 07 2018
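For readers who have not used it, here is a minimal D sketch of the byCodeUnit workaround mentioned above. The function countSpaces is a hypothetical example; the static asserts record the assumption that a plain string is seen by range code as a range of dchar, while the byCodeUnit wrapper exposes char elements.

```d
import std.algorithm.searching : count;
import std.range.primitives : ElementType;
import std.traits : Unqual;
import std.utf : byCodeUnit;

// With auto-decoding, range code sees a string as a range of dchar...
static assert(is(ElementType!string == dchar));
// ...while the byCodeUnit wrapper yields plain UTF-8 code units.
static assert(is(Unqual!(ElementType!(typeof("".byCodeUnit))) == char));

/// Hypothetical example: count ASCII spaces without decoding.
/// This is safe at the code unit level because an ASCII byte
/// never occurs inside a multi-byte UTF-8 sequence.
size_t countSpaces(string s)
{
    return s.byCodeUnit.count(' ');
}

unittest
{
    assert(countSpaces("na\u00EFve caf\u00E9 menu") == 2);
    assert(countSpaces("") == 0);
}
```

The wrapper has to be added at each call site, which is the annoyance the post refers to.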
On Wed, Mar 07, 2018 at 04:33:25PM +0000, Seb via Digitalmars-d wrote:
> On Wednesday, 7 March 2018 at 15:26:40 UTC, Jon Degenhardt wrote:
[...]
>> Auto-decoding is a significant issue for the applications I work on (search engines). There is a lot of string manipulation in these environments, and performance matters. Auto-decoding is a meaningful performance hit. Otherwise, Phobos has a very nice collection of algorithms for string manipulation. It would be great to have a way to turn auto-decoding off in Phobos.
[...]
> Well you can use byCodeUnit, which disables auto-decoding
> Though it's not well-known and rather annoying to explicitly add it almost everywhere.

And therein lies the rub: because it's *auto* decoding, rather than just decoding, it's implicit everywhere, adding to the performance hit without the coder being necessarily aware of it. You have to put in the effort to add .byCodeUnit everywhere.

Worse yet, it gives the false sense of security that you're doing Unicode "right", when actually that is *not* true at all, because a code point is not equal to a grapheme (what people normally know as a "character"). But because operating at the code point level *appears* to be correct 80% of the time, bugs in string handling often go unnoticed, unlike operating at the code unit level, where any Unicode handling bugs are immediately obvious as soon as your string contains non-ASCII characters.

So you're essentially paying the price of a significant performance hit for the dubious benefit of non-100%-correct code, but with bugs conveniently obscured so that it's harder to notice them.

Kill autodecoding, I say. Kill it with fire!!

T

-- 
MACINTOSH: Most Applications Crash, If Not, The Operating System Hangs
Mar 07 2018
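One way to see how code point level processing can look right while being wrong is string reversal. The following D sketch is only an illustration: reverseByGrapheme is a hypothetical helper built from std.uni.byGrapheme and std.uni.byCodePoint, and the naive variant relies on retro over an auto-decoded string.

```d
import std.algorithm.comparison : equal;
import std.array : array;
import std.conv : text;
import std.range : retro;
import std.uni : byCodePoint, byGrapheme;

/// Hypothetical helper: reverse a string one grapheme at a time,
/// so combining marks stay attached to their base character.
string reverseByGrapheme(string s)
{
    return s.byGrapheme.array.retro.byCodePoint.text;
}

unittest
{
    // "noel" with a combining diaeresis (U+0308) on the 'e'.
    string s = "noe\u0308l";

    // Reversing code points (what auto-decoding gives you) moves
    // the diaeresis onto the 'l', silently corrupting the text.
    assert(equal(s.retro, "l\u0308eon"));

    // Reversing graphemes keeps the diaeresis on the 'e'.
    assert(reverseByGrapheme(s) == "le\u0308on");

    // On ASCII input the naive approach is right, which is how
    // bugs like this stay hidden.
    assert(equal("abc".retro, "cba"));
}
```

The code point version passes any test written with precomposed or ASCII text and only fails once combining characters show up in real input.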
On Wednesday, 7 March 2018 at 17:11:55 UTC, H. S. Teoh wrote:
> Kill autodecoding, I say. Kill it with fire!!
> T

Please!!!
Mar 09 2018
On Wednesday, 7 March 2018 at 16:33:25 UTC, Seb wrote:
> On Wednesday, 7 March 2018 at 15:26:40 UTC, Jon Degenhardt wrote:
>> Auto-decoding is a significant issue for the applications I work on (search engines). There is a lot of string manipulation in these environments, and performance matters. Auto-decoding is a meaningful performance hit. Otherwise, Phobos has a very nice collection of algorithms for string manipulation. It would be great to have a way to turn auto-decoding off in Phobos.
> Well you can use byCodeUnit, which disables auto-decoding
> Though it's not well-known and rather annoying to explicitly add it almost everywhere.

I looked at this once. It didn't appear to be a viable solution, though I forget the details. I can probably resurrect them if that would be helpful.
Mar 07 2018