digitalmars.D - More complexity creep in Phobos

Andrei Alexandrescu (21/21) Mar 27 2019 I started an alphabetical search through std modules, and as luck would

Seb (8/33) Mar 27 2019 See https://github.com/dlang/phobos/pull/3770 for the historical

Andrei Alexandrescu (4/11) Mar 27 2019 "More than half" would be an exaggeration and as such of limited

Meta (38/63) Mar 27 2019 Maybe the implementation of the fix is not ideal, but the reason

Andrei Alexandrescu (23/30) Mar 27 2019 "Mistakes made by people" is missing from that list. Which include mine

Meta (29/51) Mar 27 2019 I agree with your first principles arguments, but not your

Andrei Alexandrescu (2/3) Mar 28 2019 No, please refer to the "Generality creep" thread.

Walter Bright (2/3) Mar 28 2019 https://digitalmars.com/d/archives/digitalmars/D/Generality_creep_324867...
Kagamin (8/11) Mar 28 2019 DirEntry is implicitly convertible to string because it needs to
Meta (5/8) Mar 28 2019 Last time I checked, curt dismissal is not an argument. I

Andrei Alexandrescu (16/107) Mar 28 2019 Sorry. It was a pertinent response dressed as a curt dismissal. The

Meta (4/24) Mar 28 2019 Thanks, stopped reading your post here. I'll leave the discussion

Andrei Alexandrescu (5/32) Mar 28 2019 No need to convert this into an exchange of broadsides. The appropriate
Andrei Alexandrescu (11/38) Mar 30 2019 The author of this post wrote me a very nice private note in lieu of an

Meta (4/15) Apr 01 2019 I appreciate the gesture Andrei, and have the utmost respect for

Andrei Alexandrescu (2/2) Mar 28 2019 BTW the intended title was "More generality creep in Phobos". Turns out

David Gileadi (2/4) Mar 28 2019 Creepy!
touchaa (4/6) May 02 2019 DirEntry is implicitly convertible to string because it needs to

Jonathan M Davis (60/81) Mar 30 2019 This particular case is one that's a bit tricky. The correct solution wi...

Andrei Alexandrescu (7/8) Mar 30 2019 Word.

H. S. Teoh (6/16) Apr 01 2019 +1. That's the correct approach. If std.v2 ever happens, this is
Olivier FAURE (17/23) Apr 01 2019 Why?

Andrei Alexandrescu (19/20) Apr 01 2019 You can't do that. New strings are created frequently by concatenation,
FeepingCreature (9/14) May 03 2019 Well, you see, strings have the disadvantage of being constructs

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

I started an alphabetical search through std modules, and as luck would 
have it I got as far as the first module: std/algorithm/comparison.d. In 
there, there's these overloads:

size_t levenshteinDistance(alias equals = (a,b) => a == b, Range1, Range2)
     (Range1 s, Range2 t)
if (isForwardRange!(Range1) && isForwardRange!(Range2));

size_t levenshteinDistance(alias equals = (a,b) => a == b, Range1, Range2)
     (auto ref Range1 s, auto ref Range2 t)
if (isConvertibleToString!Range1 || isConvertibleToString!Range2)

(similar for levenshteinDistanceAndPath)

What's with the second overload nonsense? The Levenshtein algorithm 
works on forward ranges. And that's about it, in a profound sense: 
Platonically what the algorithm needs is two forward ranges to operate. 
(We ought to be pretty proud of it, too: in all other languages I looked 
at, it's misimplemented to require random access.)

The second overload comes from a warped sense of DWIM: well if this type 
converts to a string, we commit to support that too. Really. How about a 
struct that converts to an array? Should we put in a pull request to 
that, too? Where do we even stop?

I hope there's not much more of this nonsense, because it all should be 
deprecated with fire.

Mar 27 2019

Seb <seb wilzba.ch> writes:

On Thursday, 28 March 2019 at 02:17:35 UTC, Andrei Alexandrescu 
wrote:
 I started an alphabetical search through std modules, and as 
 luck would have it I got as far as the first module: 
 std/algorithm/comparison.d. In there, there's these overloads:

 size_t levenshteinDistance(alias equals = (a,b) => a == b, 
 Range1, Range2)
     (Range1 s, Range2 t)
 if (isForwardRange!(Range1) && isForwardRange!(Range2));

 size_t levenshteinDistance(alias equals = (a,b) => a == b, 
 Range1, Range2)
     (auto ref Range1 s, auto ref Range2 t)
 if (isConvertibleToString!Range1 || 
 isConvertibleToString!Range2)

 (similar for levenshteinDistanceAndPath)

 What's with the second overload nonsense? The Levenshtein 
 algorithm works on forward ranges. And that's about it, in a 
 profound sense: Platonically what the algorithm needs is two 
 forward ranges to operate. (We ought to be pretty proud of it, 
 too: in all other languages I looked at, it's misimplemented to 
 require random access.)

 The second overload comes from a warped sense of DWIM: well if 
 this type converts to a string, we commit to support that too. 
 Really. How about a struct that converts to an array? Should we 
 put in a pull request to that, too? Where do we even stop?

See https://github.com/dlang/phobos/pull/3770 for the historical 
reason.

 I hope there's not much more of this nonsense, because it all 
 should be deprecated with fire.

Then we would have to deprecate more than half of Phobos, because 
a lot of similar cruft got aggregated over the last ten years. 
Many of these crufts can't be as easily deprecated as this 
example...

Mar 27 2019

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/27/19 10:45 PM, Seb wrote:
 See https://github.com/dlang/phobos/pull/3770 for the historical reason.

Thanks. Yep, the proverbial good intentions paving the road to hell.

 I hope there's not much more of this nonsense, because it all should 
 be deprecated with fire.

 
 Then we would have to deprecate more than half of Phobos, because a lot 
 of similar cruft got aggregated over the last ten years. Many of these 
 crufts can't be as easily deprecated as this example...

"More than half" would be an exaggeration and as such of limited 
usefulness. I looked at a few more modules and they're in better shape.

Mar 27 2019

Meta <jared771 gmail.com> writes:

On Thursday, 28 March 2019 at 02:17:35 UTC, Andrei Alexandrescu 
wrote:
 I started an alphabetical search through std modules, and as 
 luck would have it I got as far as the first module: 
 std/algorithm/comparison.d. In there, there's these overloads:

 size_t levenshteinDistance(alias equals = (a,b) => a == b, 
 Range1, Range2)
     (Range1 s, Range2 t)
 if (isForwardRange!(Range1) && isForwardRange!(Range2));

 size_t levenshteinDistance(alias equals = (a,b) => a == b, 
 Range1, Range2)
     (auto ref Range1 s, auto ref Range2 t)
 if (isConvertibleToString!Range1 || 
 isConvertibleToString!Range2)

 (similar for levenshteinDistanceAndPath)

 What's with the second overload nonsense? The Levenshtein 
 algorithm works on forward ranges. And that's about it, in a 
 profound sense: Platonically what the algorithm needs is two 
 forward ranges to operate. (We ought to be pretty proud of it, 
 too: in all other languages I looked at, it's misimplemented to 
 require random access.)

 The second overload comes from a warped sense of DWIM: well if 
 this type converts to a string, we commit to support that too. 
 Really. How about a struct that converts to an array? Should we 
 put in a pull request to that, too? Where do we even stop?

 I hope there's not much more of this nonsense, because it all 
 should be deprecated with fire.

Maybe the implementation of the fix is not ideal, but the reason 
it was added in the first place is valid, IMO. Looking at the 
original issue[1], the minimized example is as follows:

void popFront(T)(ref T[] a) { a = a[1..$]; }

enum bool isInputRange(R) = is(typeof(
{
     R r;
     r.popFront();
}));

struct DirEntry
{
      property string name() { return ""; }
     alias name this;
}
pragma(msg, isInputRange!DirEntry); // prints 'false'
pragma(msg, isInputRange!(typeof(DirEntry.init.name))); // prints 
'true'

bool isDir(R)(R r) if (isInputRange!R) { return true; }

void main()
{
     DirEntry de;
     bool c = isDir(de); // Error: isDir cannot deduce function 
from argument types !()(DirEntry)
}

And trying this code out on dmd-nightly[2], it still fails today. 
Personally, I think it would be a bad thing if code that uses a 
DirEntry like above stopped compiling, but I agree that the fix 
could be implemented differently.

It seems like if you see some weird code in Phobos but you don't 
understand why it was written that way, there are 3 main reasons 
that account for 99% of these cases:

- string (autodecoding)
- alias this
- enums

1. https://issues.dlang.org/show_bug.cgi?id=15027
2. https://run.dlang.io/is/zMtHs0

Mar 27 2019

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/27/19 11:22 PM, Meta wrote:
 It seems like if you see some weird code in Phobos but you don't 
 understand why it was written that way, there are 3 main reasons that 
 account for 99% of these cases:
 
 - string (autodecoding)
 - alias this
 - enums

"Mistakes made by people" is missing from that list. Which include mine 
of course.

There's this nice notion of reasoning by first principles vs. reasoning 
by analogy:

https://fs.blog/2018/04/first-principles/

A very nice essay - recommended outside this discussion's context, too. 
The levenshteinDistance issue is a direct application of the two kinds 
of reasoning. This is reasoning by analogy:

"DirEntry converts to string and is used by some people as such. They 
pass it to functions, and some accept it but some don't. It follows by 
analogy that the Levenshtein distance algorithm should be worked out to 
accept it, too."

In contrast, the reasoning from first principles should powerfully 
override that:

"Levenshtein distance operates on forward ranges. All that stuff that 
changes the signature of levenshteinDistance to accept other artifacts 
is nonsense. Whoever wants to use it should carry the conversion to 
forward range themselves."

It follows that the correct answer to this generalization frenzy should 
be: "We work with ranges and character types. If you've got enums and 
alias this and whatnot, more power to you but to use the standard 
library you must convert those to the stuff we support."

Mar 27 2019

Meta <jared771 gmail.com> writes:

On Thursday, 28 March 2019 at 04:20:55 UTC, Andrei Alexandrescu 
wrote:
 There's this nice notion of reasoning by first principles vs. 
 reasoning by analogy:

 https://fs.blog/2018/04/first-principles/

 A very nice essay - recommended outside this discussion's 
 context, too. The levenshteinDistance issue is a direct 
 application of the two kinds of reasoning. This is reasoning by 
 analogy:

 "DirEntry converts to string and is used by some people as 
 such. They pass it to functions, and some accept it but some 
 don't. It follows by analogy that the Levenshtein distance 
 algorithm should be worked out to accept it, too."
 In contrast, the reasoning from first principles should 
 powerfully override that:

 "Levenshtein distance operates on forward ranges. All that 
 stuff that changes the signature of levenshteinDistance to 
 accept other artifacts is nonsense. Whoever wants to use it 
 should carry the conversion to forward range themselves."

 It follows that the correct answer to this generalization 
 frenzy should be: "We work with ranges and character types. If 
 you've got enums and alias this and whatnot, more power to you 
 but to use the standard library you must convert those to the 
 stuff we support."

I agree with your first principles arguments, but not your 
conclusion. I believe you are starting with the faulty base 
assumption that DirEntry is not a range. DirEntry IS a range, by 
the rules of the current language.

DirEntry states that it is a subtype of `string` by declaring 
`alias name this`. It follows that a DirEntry may be 
transparently substituted wherever a string is accepted. As a 
string is a range, and DirEntry is a subtype of string, then 
DirEntry is also a range.

`levenshteinDistance` accepts two ranges as its arguments, either 
of which may be strings; therefore, either of its arguments may 
also be a DirEntry substituted in the place of a string. By writ, 
we must support passing a DirEntry to levenshteinDistance.



The real problem here is with `alias this`.

`alias this` claims to allow one type to become a subtype of 
another, but that's not true; this language feature violates the 
Liskov Substitution Principle (a fact that I have mentioned 
before, and likely others). `alias this` fails the 
substitutability test - this thread and the defect linked are 
proof of that.

Because `alias this` claims to allow one type A to subtype 
another type B, but does not actually make good on that promise 
and implements this subtyping improperly, it follows that any 
code working with type B that wants to also support type A has to 
use these ugly workarounds.

The code in question is working around a defect in the language, 
and would not be necessary if the language itself were fixed.

Mar 27 2019

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/28/19 1:36 AM, Meta wrote:
 DirEntry states that it is a subtype of `string`

No, please refer to the "Generality creep" thread.

Mar 28 2019

Walter Bright <newshound2 digitalmars.com> writes:

On 3/28/2019 12:32 AM, Andrei Alexandrescu wrote:
 No, please refer to the "Generality creep" thread.

https://digitalmars.com/d/archives/digitalmars/D/Generality_creep_324867.html

Mar 28 2019

Kagamin <spam here.lot> writes:

On Thursday, 28 March 2019 at 07:32:33 UTC, Andrei Alexandrescu 
wrote:
 On 3/28/19 1:36 AM, Meta wrote:
 DirEntry states that it is a subtype of `string`

 No, please refer to the "Generality creep" thread.

DirEntry is implicitly convertible to string because it needs to 
be usable with std.file, which is a big untyped ball of strings.
Compare with scriptlike 
https://github.com/abscissa/scriptlike#filepaths that has typed 
wrappers for paths (though I prefer separate wrappers for files 
and folders).

Mar 28 2019

Meta <jared771 gmail.com> writes:

On Thursday, 28 March 2019 at 07:32:33 UTC, Andrei Alexandrescu 
wrote:
 On 3/28/19 1:36 AM, Meta wrote:
 DirEntry states that it is a subtype of `string`

 No, please refer to the "Generality creep" thread.

Last time I checked, curt dismissal is not an argument. I 
maintain that this is not "generality creep", but working around 
a broken language feature.

Mar 28 2019

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/28/19 10:49 AM, Meta wrote:
 On Thursday, 28 March 2019 at 07:32:33 UTC, Andrei Alexandrescu wrote:
 On 3/28/19 1:36 AM, Meta wrote:
 DirEntry states that it is a subtype of `string`

 No, please refer to the "Generality creep" thread.

 
 Last time I checked, curt dismissal is not an argument.

Sorry. It was a pertinent response dressed as a curt dismissal. The 
opening post of the "Generality creep" is trivial to find and contains 
an explanation of exactly this matter. Walter posted a link to it, too 
(thanks). Here it is:

https://digitalmars.com/d/archives/digitalmars/D/Generality_creep_324867.html

What remains is that I copy and paste it for you, which I'll do at the 
end of this message.

I actually stopped reading your post following your assertion, because I 
inferred it would be increasingly wrong because it is based on a wrong 
assumption. Just now I went and read it, and indeed it is so.

 I maintain that 
 this is not "generality creep", but working around a broken language 
 feature.

You are wrong in a sense (factually that is not subtyping, it's 
coercion) and right in another (it's questionable to allow odd forms of 
alias this in the language in the first place). I wouldn't disagree 
either way. Depends on the angle.

Here's the text explaining why alias this to an rvalue is not subtyping:

 Walter and I were looking at indexOf in the standard library. It has several
overloads, of which let's look at the first two:
 
 https://dlang.org/library/std/string/index_of.html
 
 Now, why couldn't they be merged into a single function with this constraint
(their bodies are identical):
 
 ptrdiff_t indexOf(Range) (
   Range s,
   dchar c,
   CaseSensitive cs = Yes.caseSensitive
 )
 if (isInputRange!Range && isSomeChar!(ElementType!Range));
 
 It makes sense to look for a character in any range of characters.
 
 Attempting to build that fails because of this type fails to work:
 
     struct TestAliasedString
     {
         string get()  safe  nogc pure nothrow { return _s; }
         alias get this;
          disable this(this);
         string _s;
     }
 
 The intuition is that the function should be general enough to figure out
that, hey, TestAliasedString is kinda sorta a subtype of string. So the call
should work.
 
 So let's see why it doesn't work - i.e. why is TestAliasedString an input
range? The definition of isInputRange (in std.range.primitives) is:
 
 enum bool isInputRange(R) =
     is(typeof(R.init) == R)
     && is(ReturnType!((R r) => r.empty) == bool)
     && is(typeof((return ref R r) => r.front))
     && !is(ReturnType!((R r) => r.front) == void)
     && is(typeof((R r) => r.popFront));
 
 Turns out the second clause fails. That takes us to the definition of empty in
the same module:
 
  property bool empty(T)(auto ref scope const(T) a)
 if (is(typeof(a.length) : size_t))
 {
     return !a.length;
 }
 
 The intent is fairly clear - if a range defines empty as a size_t (somewhat
oddly relaxed to "convertible to size_t"), then empty can be nicely defined in
terms of length. Cool. But empty doesn't work with TestAliasedString due to an
overlooked matter: the "const". A mutable TestAliasedString converts to a
string, but a const or immutable TestAliasedString does NOT convert to a const
string! So this fixes that matter:
 
     struct TestAliasedString
     {
         string get()  safe  nogc pure nothrow { return _s; }
         const(string) get()  safe  nogc pure nothrow const { return _s; }
         alias get this;
          disable this(this);
         string _s;
     }
 
 That makes empty() work, but also raises a nagging question: what was the
relationship of TestAliasedString to string before this change? Surely that
wasn't subtyping. (My response would be: "Odd.") And why was Phobos under the
obligation to cater for such a type and its tenuous relationship to a range?
 
 But wait, there's more. Things still don't work because of popFront. Looking
at its definition:
 
 void popFront(C)(scope ref inout(C)[] str)  trusted pure nothrow
 if (isNarrowString!(C[]))
 { ... }
 
 So, reasonably this function takes the range by reference so it can modify its
internals. HOWEVER! The implementation of TestAliasedString.get() returns an
rvalue, i.e. it's the equivalent of a conversion involving a temporary. Surely
that's not to match, whether in the current language or the one after the
rvalue DIP.
 
 The change that does make the code work is:
 
     struct TestAliasedString
     {
         ref string get()  safe  nogc pure nothrow { return _s; }
         ref const(string) get()  safe  nogc pure nothrow const { return _s; }
         alias get this;
          disable this(this);
         string _s;
     }
 
 This indeed does implement a subtyping relationship, and passes the
isInputRange test.
 
 What's the moral of the story here? Generality is good, but it seems in
several places in phobos (of which this is just one example), a combination of
vague specification and aiming for a nice ideal of "work with anything remotely
reasonable" has backfired into a morass of inconsistently defined and supported
corner cases.
 
 For this case in particular - I don't think we should support all types that
support some half-hearted form of subtyping, at the cost of reducing generality
and deprecating working code.

Mar 28 2019

Meta <jared771 gmail.com> writes:

On Thursday, 28 March 2019 at 16:12:34 UTC, Andrei Alexandrescu 
wrote:
 On 3/28/19 10:49 AM, Meta wrote:
 On Thursday, 28 March 2019 at 07:32:33 UTC, Andrei 
 Alexandrescu wrote:
 On 3/28/19 1:36 AM, Meta wrote:
 DirEntry states that it is a subtype of `string`

 No, please refer to the "Generality creep" thread.

 
 Last time I checked, curt dismissal is not an argument.

 Sorry. It was a pertinent response dressed as a curt dismissal. 
 The opening post of the "Generality creep" is trivial to find 
 and contains an explanation of exactly this matter. Walter 
 posted a link to it, too (thanks). Here it is:

 https://digitalmars.com/d/archives/digitalmars/D/Generality_creep_324867.html

 What remains is that I copy and paste it for you, which I'll do 
 at the end of this message.

 I actually stopped reading your post following your assertion, 
 because I inferred it would be increasingly wrong because it is 
 based on a wrong assumption. Just now I went and read it, and 
 indeed it is so.

Thanks, stopped reading your post here. I'll leave the discussion 
to those more qualified.

Mar 28 2019

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/28/19 12:23 PM, Meta wrote:
 On Thursday, 28 March 2019 at 16:12:34 UTC, Andrei Alexandrescu wrote:
 On 3/28/19 10:49 AM, Meta wrote:
 On Thursday, 28 March 2019 at 07:32:33 UTC, Andrei Alexandrescu wrote:
 On 3/28/19 1:36 AM, Meta wrote:
 DirEntry states that it is a subtype of `string`

 No, please refer to the "Generality creep" thread.

 Last time I checked, curt dismissal is not an argument.

 Sorry. It was a pertinent response dressed as a curt dismissal. The 
 opening post of the "Generality creep" is trivial to find and contains 
 an explanation of exactly this matter. Walter posted a link to it, too 
 (thanks). Here it is:

 https://digitalmars.com/d/archives/digitalmars/D/Generality_creep_324867.html 


 What remains is that I copy and paste it for you, which I'll do at the 
 end of this message.

 I actually stopped reading your post following your assertion, because 
 I inferred it would be increasingly wrong because it is based on a 
 wrong assumption. Just now I went and read it, and indeed it is so.

 
 Thanks, stopped reading your post here. I'll leave the discussion to 
 those more qualified.

No need to convert this into an exchange of broadsides. The appropriate 
response is as simple as "oh ok so that's not subtyping, interesting" - 
which was my reaction as well. It's a subtle matter but once seen it is 
clear.

Mar 28 2019

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/28/19 12:23 PM, Meta wrote:
 On Thursday, 28 March 2019 at 16:12:34 UTC, Andrei Alexandrescu wrote:
 On 3/28/19 10:49 AM, Meta wrote:
 On Thursday, 28 March 2019 at 07:32:33 UTC, Andrei Alexandrescu wrote:
 On 3/28/19 1:36 AM, Meta wrote:
 DirEntry states that it is a subtype of `string`

 No, please refer to the "Generality creep" thread.

 Last time I checked, curt dismissal is not an argument.

 Sorry. It was a pertinent response dressed as a curt dismissal. The 
 opening post of the "Generality creep" is trivial to find and contains 
 an explanation of exactly this matter. Walter posted a link to it, too 
 (thanks). Here it is:

 https://digitalmars.com/d/archives/digitalmars/D/Generality_creep_324867.html 


 What remains is that I copy and paste it for you, which I'll do at the 
 end of this message.

 I actually stopped reading your post following your assertion, because 
 I inferred it would be increasingly wrong because it is based on a 
 wrong assumption. Just now I went and read it, and indeed it is so.

 
 Thanks, stopped reading your post here. I'll leave the discussion to 
 those more qualified.

The author of this post wrote me a very nice private note in lieu of an 
unpleasant public response to my nasty way of handling this exchange. He 
refrained from posting in the spirit of "write letters to people you 
hate, then burn them" (as the joke goes, "okay, then what do I do to the 
letters?")

Anyhow, I wanted to apologize to Meta in the same place where the 
offense take place. I am sorry for the unkind tone I took to.

An explanation (but in no way a justification) of this came from a 
friend: "The more sure you are of being right, the more insufferable you 
become."

Mar 30 2019

Meta <jared771 gmail.com> writes:

On Saturday, 30 March 2019 at 22:28:32 UTC, Andrei Alexandrescu 
wrote:
 The author of this post wrote me a very nice private note in 
 lieu of an unpleasant public response to my nasty way of 
 handling this exchange. He refrained from posting in the spirit 
 of "write letters to people you hate, then burn them" (as the 
 joke goes, "okay, then what do I do to the letters?")

 Anyhow, I wanted to apologize to Meta in the same place where 
 the offense take place. I am sorry for the unkind tone I took 
 to.

 An explanation (but in no way a justification) of this came 
 from a friend: "The more sure you are of being right, the more 
 insufferable you become."

I appreciate the gesture Andrei, and have the utmost respect for 
people who are willing to apologize.

Apr 01 2019

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

BTW the intended title was "More generality creep in Phobos". Turns out 
generality creep begets complexity creep, too...

Mar 28 2019

David Gileadi <gileadisNOSPM gmail.com> writes:

On 3/28/19 5:38 AM, Andrei Alexandrescu wrote:
 BTW the intended title was "More generality creep in Phobos". Turns out 
 generality creep begets complexity creep, too...

Creepy!

Mar 28 2019

touchaa <yasiin.jakorey bullbeer.net> writes:

On Thursday, 28 March 2019 at 12:38:58 UTC, Andrei Alexandrescu 
wrote:
 BTW the intended title was "More generality creep in Phobos". 
 Turns out generality creep begets complexity creep, too...

DirEntry is implicitly convertible to string because it needs to 
be usable with std.file, which is a big untyped ball of strings.

May 02 2019

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Wednesday, March 27, 2019 8:17:35 PM MDT Andrei Alexandrescu via 
Digitalmars-d wrote:
 I started an alphabetical search through std modules, and as luck would
 have it I got as far as the first module: std/algorithm/comparison.d. In
 there, there's these overloads:

 size_t levenshteinDistance(alias equals = (a,b) => a == b, Range1, Range2)
 (Range1 s, Range2 t)
 if (isForwardRange!(Range1) && isForwardRange!(Range2));

 size_t levenshteinDistance(alias equals = (a,b) => a == b, Range1, Range2)
 (auto ref Range1 s, auto ref Range2 t)
 if (isConvertibleToString!Range1 || isConvertibleToString!Range2)

 (similar for levenshteinDistanceAndPath)

 What's with the second overload nonsense? The Levenshtein algorithm
 works on forward ranges. And that's about it, in a profound sense:
 Platonically what the algorithm needs is two forward ranges to operate.
 (We ought to be pretty proud of it, too: in all other languages I looked
 at, it's misimplemented to require random access.)

 The second overload comes from a warped sense of DWIM: well if this type
 converts to a string, we commit to support that too. Really. How about a
 struct that converts to an array? Should we put in a pull request to
 that, too? Where do we even stop?

 I hope there's not much more of this nonsense, because it all should be
 deprecated with fire.

This particular case is one that's a bit tricky. The correct solution with
stuff like this IMHO is to require that you pass actual forward ranges and
not types that convert to types that are forward ranges (like strings). The
problem is that a lot of code like this was originally written to take
strings and then generalized later, and when a function takes something like
string or const(char)[], it's going to implicitly convert varous types to
the target type at the call site. In contrast, you get no such implicit
conversion with a template constraint.

The result is that a bunch of functions in Phobos now use
isConvertibleToString on a separate overload to then convert the argument to
a string and pass it to the primary function. This is very, very wrong even
if it's not obvious. The problem is that the conversion then happens inside
the function instead of at the call site, which means that you get fun stuff
like when a static array is passed in, it's sliced, and if the function
returns any portion of the range it's given, then you're returning a slice
of the stack that is then invalid. The only way to make this work is to have
two overloads - the primary one, and one that accepts arrays specifically.
e.g.

auto foo(R)(R range)
    if(isForwardRange!R && ...)
{
    return fooImpl(range);
}

auto foo(C)(C[] str)
{
    return fooImpl(range);
}

private auto fooImpl(R)(R range)
{
}

That way, the conversion happens at the call site like it did when the
function accepted string. Similar problems exist in general with accepting
implicit conversions with a template constraint and is generally not
something that we should be doing unless it's for something really simple
like bool.

This is not at all a nice way to have to write this code, but it is required
to avoid code breakage when making the function work on ranges rather than
just strings. isConvertibleToString _could_ be used internally with static
if to avoid the additional overload, but again, it runs into the problem
that the implicit conversion is done internally. So, to support the implicit
conversion, a second overload is required.

The way that these functions _should_ have been written was to have them
operate on ranges in the first place, and then they wouldn't accept any
implict conversions from stuff like enums, but many of these functions
weren't. I don't know if that's quite the problem with levenshteinDistance,
but it's definitely causing a lot of the ugly overloads in Phobos with
anything string-related.

Unfortunately, I don't think that it's actually possible to just deprecate
the overload that accepts implicit conversions, because that's also the
overload that accepts strings. So, for many of these functions, I think that
we're stuck with having two overloads unless we do something like std.v2 and
have more of a hard breakage.

Either way, isConvertibleToString must go. Using it in a template constraint
is just begging for bugs, because the implicit conversion won't happen at
the call site like it needs to. I've done some work towards removing it from
Phobos, but due to health issues, I haven't had enough time to finish the
job.

- Jonathan M Davis

Mar 30 2019

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/30/19 3:22 PM, Jonathan M Davis wrote:
 Either way, isConvertibleToString must go.

Word.

We should have one rcstring type that is reference counted, uses UTF8, 
is not a range, offers ranges (bytes, codepoints, graphemes), and 
consolidate around it. Algorithms should only use ranges, and whenever 
someone wants to use an algorithm with a string, they choose the 
iteration mode that fits the application.

Mar 30 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Sat, Mar 30, 2019 at 06:31:05PM -0400, Andrei Alexandrescu via Digitalmars-d
wrote:
 On 3/30/19 3:22 PM, Jonathan M Davis wrote:
 Either way, isConvertibleToString must go.

 
 Word.
 
 We should have one rcstring type that is reference counted, uses UTF8,
 is not a range, offers ranges (bytes, codepoints, graphemes), and
 consolidate around it. Algorithms should only use ranges, and whenever
 someone wants to use an algorithm with a string, they choose the
 iteration mode that fits the application.

+1.  That's the correct approach.  If std.v2 ever happens, this is
definitely the way to go.


T

-- 
Insanity is doing the same thing over and over again and expecting different
results.

Apr 01 2019

Olivier FAURE <couteaubleu gmail.com> writes:

On Saturday, 30 March 2019 at 22:31:05 UTC, Andrei Alexandrescu 
wrote:
 We should have one rcstring type that is reference counted, 
 uses UTF8, is not a range, offers ranges (bytes, codepoints, 
 graphemes), and consolidate around it. Algorithms should only 
 use ranges, and whenever someone wants to use an algorithm with 
 a string, they choose the iteration mode that fits the 
 application.

Why?

Strings have the advantage of being extremely simple constructs 
that represent exactly the right abstraction: they're a slice of 
chars, period. They can be scoped, sliced and concatenated just 
like any other range.

Getting rid of auto-decoding would be good, but adding a whole 
new layer of abstraction and special cases on top of it, with a 
dedicated allocation strategy C++ style, seems superfluous at 
best.

Honestly, it sounds like the kind of thing that will make 
reference-counting the new class, where 10 years from now people 
will be told "Yeah, it would be more convenient to do things this 
way, but back when this standard library was written we were 
using reference-counting everywhere, and now we're stuck with it".

tl;dr Keep strings and reference-counting separate.

Apr 01 2019

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/1/19 2:13 PM, Olivier FAURE wrote:
 tl;dr Keep strings and reference-counting separate.

You can't do that. New strings are created frequently by concatenation, 
they're best thought of as units - like int. They need to handle their 
own allocation, there's no two ways about it.

Autodecoding hurts "half" of the users - those who don't need it for the 
task. De-encapsulating string as an unstructured slice hurts the other 
"half" - those who need things like Unicode semantics, simple string 
manipulation, and good use of memory.

The view that strings must be char[] and that that's the "simple" choice 
has been a disaster for D. (I believe I have started to convince Walter 
of that.) It's simple like Go is simple - forcing complexity realities 
on users. Incomparably larger than autodecoding. It has forced virtually 
all string-processing code in D to use the garbage collector. Immutable 
was a short of adrenaline that prolonged the lifetime of strings by ten 
years. Before that, having unstructured strings that were also weirdly 
mutable was a double disaster.

We must have a UTF8 reference counted string that assists built-in 
literals of type immutable(char)[] and rally the entire language 
ecosystem around those.

Apr 01 2019

FeepingCreature <feepingcreature gmail.com> writes:

On Monday, 1 April 2019 at 18:13:55 UTC, Olivier FAURE wrote:
 Why?

 Strings have the advantage of being extremely simple constructs 
 that represent exactly the right abstraction: they're a slice 
 of chars, period. They can be scoped, sliced and concatenated 
 just like any other range.

Well, you see, strings have the disadvantage of being constructs 
that represent exactly the wrong abstraction: they're a slice of 
chars, period.

To see the problem with this, consider:

writefln(
   "äöü are %s letters; the second one is %s",
   "äöü".length,
   cast(ubyte[]) "äöü"[1..2]);

May 03 2019

D Programming

C/C++ Programming

Other

digitalmars.D - More complexity creep in Phobos