digitalmars.D - More fun with autodecoding

reply Steven Schveighoffer <schveiguy gmail.com> writes:
I wanted to share a story where I actually tried to add a new type with 
autodecoding and failed.

I want to create a wrapper type that forwards an underlying range type 
but adds one feature -- tracking in the original range where you were. 
This is in a new library I'm writing for parsing.

So my first idea was I will just forward all methods from a given range 
manually -- I need to override certain ones which affect the offset into 
the original range.

However, typically parsing is done from text.

I realized, strings are a range of dchar, but I need the length and 
other things forwarded so they can be drop-in replacements for strings 
(I treat strings and wstrings as character buffers in iopipe). However, 
phobos will then assume length() as the number of dchar elements, and 
assume it has indexing, etc.! Here is a case where I can't repeat the 
mistakes of phobos of auto-decoding for my own type! I never thought I'd 
have that problem...
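
A minimal sketch of that mismatch, assuming nothing beyond the traits in std.range.primitives (the string literal is made up for illustration): the array has a length and slicing, but Phobos reports neither and iterates by code point.

```d
import std.range.primitives : ElementType, hasLength, hasSlicing, walkLength;

// Phobos treats a string as a range of dchar without length or slicing,
// even though the underlying array supports both.
static assert(is(ElementType!string == dchar));
static assert(!hasLength!string);
static assert(!hasSlicing!string);

void main()
{
    string s = "h\u00E9llo";      // U+00E9 takes two UTF-8 code units
    assert(s.length == 6);        // array length counts code units
    assert(s.walkLength == 5);    // range iteration counts code points
}
```

A wrapper that forwards `length` and indexing from the string therefore advertises something a plain string never does as far as the range traits are concerned.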

So I thought, maybe I'll just alias this the underlying range and only 
override the parts that are needed. I end up with a nice tiny 
definition, and things are looking pretty good:

     static struct Result
     {
         private size_t pos;
         B _buffer;
         alias _buffer this;

         // implement the slice operations
         size_t[2] opSlice(size_t dim)(int start, int end) if (dim == 0)
         in
         { assert(start >= 0 && end <= _buffer.length); }
         do
         {
             return [start, end];
         }

         Result opIndex(size_t[2] dims)
         {
             return Result(pos + dims[0], _buffer[dims[0] .. dims[1]]);
         }

         void popFront()
         {
             import std.traits : isNarrowString;
             static if(isNarrowString!B)
             {
                 auto prevLen = _buffer.length;
                 _buffer.popFront;
                 pos += prevLen - _buffer.length;
             }
             else
             {
                 _buffer.popFront;
                 ++pos;
             }
         }

         // the specialized buffer reference accessor.
         @property auto bufRef()
         {
             return BufRef(pos, _buffer.length);
         }
     }

Note already the sucky part in popFront.

But then I got a surprise when I went to use it:

     import std.algorithm : splitter;
     auto buf = "hi there this is a sentence";
     auto split1 = buf.bwin.splitter; // specialized split range
     auto split2 = buf.splitter; // normal split range
     while(!split1.empty)
     {
         assert(split1.front == split2.front);
         assert(split1.front.bufRef.concrete(buf) == split2.front); // FAILS!
         split1.popFront;
         split2.popFront;
     }

What happened? It turns out, the splitter looks for length and indexing 
*OR* that it is a narrow string. Splitter is trying to ignore the fact 
that Phobos forces autodecoding on char arrays to achieve performance. 
With this taken into account, I think my type does not pass any of the 
constraints for any of the overloads (not 100% sure on that), so it 
devolves to just using the alias this'd element directly, completely 
circumventing the point of my wrapper. The error I get is "no member 
`bufRef` for type `string`".

My next attempt will be to use byCodeUnit when I detect a narrow string, 
which hopefully will work OK. But I'm not sure if the performance is 
going to be the same, since now it will likely FORCE autodecoding on the 
algorithms that have specialized versions to AVOID autodecoding (I think).

I'm very tempted to start writing my own parsing utilities and avoid 
using Phobos algorithms...

-Steve
Aug 06 2018
next sibling parent bauss <jj_1337 live.dk> writes:
On Monday, 6 August 2018 at 13:57:10 UTC, Steven Schveighoffer 
wrote:
 I'm very tempted to start writing my own parsing utilities and 
 avoid using Phobos algorithms...

 -Steve
Oh yes; the good old autodecoding.
Aug 08 2018
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/6/2018 6:57 AM, Steven Schveighoffer wrote:
 But I'm not sure if the performance is going to be the 
 same, since now it will likely FORCE autodecoding on the algorithms that have 
 specialized versions to AVOID autodecoding (I think).
Autodecoding is expensive which is why the algorithms defeat it. Nearly none actually need it. You can get decoding if needed by using .byDchar or .by!dchar (forgot which it was).
Aug 08 2018
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/8/18 4:13 PM, Walter Bright wrote:
 On 8/6/2018 6:57 AM, Steven Schveighoffer wrote:
 But I'm not sure if the performance is going to be the same, since now 
 it will likely FORCE autodecoding on the algorithms that have 
 specialized versions to AVOID autodecoding (I think).
Autodecoding is expensive which is why the algorithms defeat it. Nearly none actually need it. You can get decoding if needed by using .byDchar or .by!dchar (forgot which it was).
There is byCodePoint and byCodeUnit, whereas byCodePoint forces auto decoding.

The problem is, I want to use this wrapper just like it was a string in all respects (including the performance gains had by ignoring auto-decoding).

Not trying to give too much away about the library I'm writing, but the problem I'm trying to solve is parsing out tokens from a buffer. I want to delineate the whole, as well as the parts, but it's difficult to get back to the original buffer once you split and slice up the buffer using phobos functions.

Consider that you are searching for something in a buffer. Phobos provides all you need to narrow down your range to the thing you are looking for. But it doesn't give you a way to figure out where you are in the whole buffer.

Up till now, I've done it by weird length math, but it gets tiring (see for instance: https://github.com/schveiguy/fastaq/blob/master/source/fasta/fasta.d#L125). I just want to know where the darned thing I've narrowed down is in the original range!

So this wrapper I thought would be a way to use things like you always do, but at any point, you just extract a piece of information (a buffer reference) that shows where it is in the original buffer. It's quite easy to do that part, the problem is getting it to be a drop-in replacement for the original type.

Here's where I'm struggling -- because a string provides indexing, slicing, length, etc. but Phobos ignores that. I can't make a new type that does the same thing. Not only that, but I'm finding the specializations of algorithms only work on the type "string", and nothing else.

I'll try using byCodeUnit and see how it fares.

-Steve
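
For reference, a small sketch of what byCodeUnit changes at the trait level, assuming the std.utf versions of `byCodeUnit` and `byDchar` (the literal is illustrative only): the code-unit view keeps length and slicing, while the dchar view decodes.

```d
import std.range.primitives : ElementType, hasLength, hasSlicing;
import std.utf : byCodeUnit, byDchar;

void main()
{
    // Code-unit view: elements are chars, with length and slicing intact.
    auto units = "h\u00E9llo".byCodeUnit;
    static assert(is(ElementType!(typeof(units)) : char));
    static assert(hasLength!(typeof(units)) && hasSlicing!(typeof(units)));
    assert(units.length == 6);

    // Decoding view: elements are dchar, produced lazily.
    auto points = "h\u00E9llo".byDchar;
    static assert(is(ElementType!(typeof(points)) == dchar));
}
```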
Aug 08 2018
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/8/2018 2:01 PM, Steven Schveighoffer wrote:
 Here's where I'm struggling -- because a string provides indexing, slicing, 
 length, etc. but Phobos ignores that. I can't make a new type that does the
same 
 thing. Not only that, but I'm finding the specializations of algorithms only 
 work on the type "string", and nothing else.
One of the worst things about autodecoding is it is special, it *only* steps in for strings. Fortunately, however, that specialness enabled us to save things with byCodePoint and byCodeUnit.
Aug 08 2018
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/9/18 2:44 AM, Walter Bright wrote:
 On 8/8/2018 2:01 PM, Steven Schveighoffer wrote:
 Here's where I'm struggling -- because a string provides indexing, 
 slicing, length, etc. but Phobos ignores that. I can't make a new type 
 that does the same thing. Not only that, but I'm finding the 
 specializations of algorithms only work on the type "string", and 
 nothing else.
One of the worst things about autodecoding is it is special, it *only* steps in for strings. Fortunately, however, that specialness enabled us to save things with byCodePoint and byCodeUnit.
So it turns out that technically the problem here, even though it seemed like an autodecoding problem, is a problem with splitter.

splitter doesn't deal with encodings of character ranges at all.

For instance, when you have this:

     "abc 123".byCodeUnit.splitter;

What happens is splitter only has one overload that takes one parameter, and that requires a character *array*, not a range.

So the byCodeUnit result is aliased-this to its original, and surprise! the elements from that splitter are string.

Next, I tried to use a parameter:

     "abc 123".byCodeUnit.splitter(" ");

Nope, still devolves to string. It turns out it can't figure out how to split character ranges using a character array as input.

The only thing that does seem to work is this:

     "abc 123".byCodeUnit.splitter(" ".byCodeUnit);

But this goes against most algorithms in Phobos that deal with character ranges -- generally you can use any width character range, and it just works. Having a drop-in replacement for string would require splitter to handle these transcodings (and I think in general, algorithms should be able to handle them as well).

Not only that, but the specialized splitter that takes no separator can split on multiple spaces, a feature I want to have for my drop-in replacement.

I'll work on adding some issues to the tracker, and potentially doing some PRs so they can be fixed.

-Steve
Sep 08 2018
next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/8/18 8:36 AM, Steven Schveighoffer wrote:


Sent this when I was on a plane, and for some reason it posted with the 
timestamp when I hit "send later", not when I connected just now. So 
this is to bring the previous message back to the forefront.

-Steve
Sep 09 2018
prev sibling next sibling parent Jon Degenhardt <jond noreply.com> writes:
On Saturday, 8 September 2018 at 15:36:25 UTC, Steven 
Schveighoffer wrote:
 On 8/9/18 2:44 AM, Walter Bright wrote:
 On 8/8/2018 2:01 PM, Steven Schveighoffer wrote:
 Here's where I'm struggling -- because a string provides 
 indexing, slicing, length, etc. but Phobos ignores that. I 
 can't make a new type that does the same thing. Not only 
 that, but I'm finding the specializations of algorithms only 
 work on the type "string", and nothing else.
One of the worst things about autodecoding is it is special, it *only* steps in for strings. Fortunately, however, that specialness enabled us to save things with byCodePoint and byCodeUnit.
So it turns out that technically the problem here, even though it seemed like an autodecoding problem, is a problem with splitter. splitter doesn't deal with encodings of character ranges at all.
This could partially explain why, when I tried byCodeUnit and friends a while ago, I concluded it wasn't a reasonable approach: splitter is in the middle of much of what I've written.

Even if splitter is changed I'll still be very doubtful about the byCodeUnit approach as a work-around. An automated way to validate that it is engaged only when necessary would be very helpful (@noautodecode perhaps? :))

--Jon
Sep 09 2018
prev sibling next sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Saturday, September 8, 2018 9:36:25 AM MDT Steven Schveighoffer via 
Digitalmars-d wrote:
 On 8/9/18 2:44 AM, Walter Bright wrote:
 On 8/8/2018 2:01 PM, Steven Schveighoffer wrote:
 Here's where I'm struggling -- because a string provides indexing,
 slicing, length, etc. but Phobos ignores that. I can't make a new type
 that does the same thing. Not only that, but I'm finding the
 specializations of algorithms only work on the type "string", and
 nothing else.
One of the worst things about autodecoding is it is special, it *only* steps in for strings. Fortunately, however, that specialness enabled us to save things with byCodePoint and byCodeUnit.
So it turns out that technically the problem here, even though it seemed like an autodecoding problem, is a problem with splitter. splitter doesn't deal with encodings of character ranges at all. For instance, when you have this: "abc 123".byCodeUnit.splitter; What happens is splitter only has one overload that takes one parameter, and that requires a character *array*, not a range. So the byCodeUnit result is aliased-this to its original, and surprise! the elements from that splitter are string. Next, I tried to use a parameter: "abc 123".byCodeUnit.splitter(" "); Nope, still devolves to string. It turns out it can't figure out how to split character ranges using a character array as input. The only thing that does seem to work is this: "abc 123".byCodeUnit.splitter(" ".byCodeUnit); But this goes against most algorithms in Phobos that deal with character ranges -- generally you can use any width character range, and it just works. Having a drop-in replacement for string would require splitter to handle these transcodings (and I think in general, algorithms should be able to handle them as well). Not only that, but the specialized splitter that takes no separator can split on multiple spaces, a feature I want to have for my drop-in replacement. I'll work on adding some issues to the tracker, and potentially doing some PRs so they can be fixed.
Well, plenty of algorithms don't care one whit about strings specifically and thus their behavior is really dependent on what the element type of the range is (e.g. for byCodeUnit, filter would filter code units, and sort would sort code units, and arguably, that's what they should do). However, a big problem with a number of the functions in Phobos that specifically operate on ranges of characters is that they tend to assume that a range of characters means a range of dchar.

Some of the functions in Phobos have been fixed to be more flexible and operate on arbitrary ranges of char, wchar, or dchar, but it's mostly happened because of a bug report about a particular function not working with something like byCodeUnit, whereas what we really need to happen is to have tests added for all of the functions in Phobos which specifically operate on ranges of characters to ensure that they do the correct thing when given a range of char, wchar, dchar - or graphemes (much as we talk about graphemes being the correct level for some types of string processing, nothing in Phobos outside of std.uni currently does anything with byGrapheme, even in tests). And of course, with those tests, we'll inevitably find that a number of those functions won't work correctly and will need to be fixed.

But as annoying as all of that is, it's work that needs to be done regardless of the situation with auto-decoding, since these functions need to work with arbitrary ranges of characters and not just ranges of dchar. And for those functions that don't need to try to avoid auto-decoding, they should then not even care whether strings are ranges of code units or code points, which should then reduce the impact of auto-decoding. And actually, a lot of the code that specializes on narrow strings to avoid auto-decoding would probably work whether auto-decoding was there or not.

So, once we've actually managed to ensure that Phobos in general works with arbitrary ranges of characters, the main breakage that would be caused by removing auto-decoding (in Phobos at least) would be any code that used strings with functions that weren't specifically written to do something special for strings, and while I'm not at all convinced that we then have a path towards removing auto-decoding, it would minimize auto-decoding's impact, and with auto-decoding's impact minimized as much as possible, maybe at some point, we'll actually manage to figure out how to remove it.

But in any case, the issues that you're running into with splitter are a symptom of a larger problem with how Phobos currently handles ranges of characters. And when this sort of thing comes up, I'm reminded that I should take the time to start adding the appropriate tests to Phobos, and then I never get around to it - as with too many things. I really should fix that. :|

- Jonathan M Davis
Sep 10 2018
prev sibling next sibling parent reply Chris <wendlec tcd.ie> writes:
On Saturday, 8 September 2018 at 15:36:25 UTC, Steven 
Schveighoffer wrote:
 On 8/9/18 2:44 AM, Walter Bright wrote:
 So it turns out that technically the problem here, even though 
 it seemed like an autodecoding problem, is a problem with 
 splitter.

 splitter doesn't deal with encodings of character ranges at all.

 For instance, when you have this:

 "abc 123".byCodeUnit.splitter;

 What happens is splitter only has one overload that takes one 
 parameter, and that requires a character *array*, not a range.

 So the byCodeUnit result is aliased-this to its original, and 
 surprise! the elements from that splitter are string.

 Next, I tried to use a parameter:

 "abc 123".byCodeUnit.splitter(" ");

 Nope, still devolves to string. It turns out it can't figure 
 out how to split character ranges using a character array as 
 input.

 The only thing that does seem to work is this:

 "abc 123".byCodeUnit.splitter(" ".byCodeUnit);
After a while your code will be cluttered with absurd stuff like this. `.byCodeUnit`, `.byGrapheme`, `.array` etc. Due to my experience with `splitter` et al. I tried to create my own parser to have better control over every step. After a few *minutes* of testing things I ran into this bug [1] that didn't get fixed till early 2018. I never started to write my own step-by-step parser. I'm glad I didn't.

I wish people began to realize that string handling is a basic necessity and that the correct handling of strings is of utmost importance. Please keep us updated on how things work out (or not) for you.

[Please, nobody answer my post pointing out that a) we don't understand Unicode and b) that it's an insult to the Universe to draw attention to flaws that keep pestering us on an almost daily basis - without trying to fix them ourselves stante pede. As is clear from Steve's efforts, the Universe doesn't seem to care.]

[1] https://issues.dlang.org/show_bug.cgi?id=16739

[snip]
Sep 10 2018
next sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Monday, September 10, 2018 2:45:27 AM MDT Chris via Digitalmars-d wrote:

 After a while your code will be cluttered with absurd stuff like
 this. `.byCodeUnit`, `.byGrapheme`, `.array` etc. Due to my
 experience with `splitter` et. al. I tried to create my own
 parser to have better control over every step. After a few
 *minutes* of testing things I ran into this bug [1] that didn't
 get fixed till early 2018. I never started to write my own
 step-by-step parser. I'm glad I didn't.

 [1] https://issues.dlang.org/show_bug.cgi?id=16739

 [snip]
I suspect that that didn't get found sooner simply because using Unicode in a switch statement is rare. Usually, Unicode characters are found in program input and not in the program itself. And grammars typically only involve ASCII characters (even D, which supports Unicode characters in identifiers, doesn't have any Unicode in any of its symbols). So, while I completely agree that using Unicode in switch statements should work, it doesn't really surprise me that it was broken.

That's really a large part of the Unicode problem. Regardless of how a particular language or library attempts to make using Unicode sane, a large percentage of programmers don't ever do anything with Unicode characters (even if their programs are often used in environments where they will end up processing Unicode characters), and even when a programmer's native tongue requires Unicode characters, their programs frequently do not. So, it becomes very easy to write code that doesn't work properly with Unicode and have no clue that it doesn't.

Fortunately, D does provide better tools than many languages for handling Unicode, but the auto-decoding mess has made it considerably worse. Still, even if we'd gotten it right, some portion of the code out there would have to have something like byCodeUnit, byCodePoint, or byGrapheme, because efficient Unicode processing requires that you deal with all of that mess. The code that doesn't have to do any of that is generally code that treats strings as opaque data. Once you actually have to do string processing, you're pretty much screwed.

Doing everything at the grapheme level would eliminate most of the problems with regards to user-friendliness, but it would kill efficiency. So, as far as I can tell, there really isn't a great solution to be had. Unicode is simply too complicated and messy by its very nature.

Now, we've definitely made mistakes with Phobos that make it worse, but the only programs that are going to avoid this whole mess either do so by not dealing with Unicode, handling it incorrectly, or by handling it inefficiently. I think that it's pretty much a pipe dream to be able to have completely sane and efficient string handling using Unicode as it's currently defined. Regardless, we need to do a better job of it in D than we have been.

- Jonathan M Davis
Sep 10 2018
prev sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/10/18 1:45 AM, Chris wrote:

 After a while your code will be cluttered with absurd stuff like this. 
 `.byCodeUnit`, `.byGrapheme`, `.array` etc. Due to my experience with 
 `splitter` et. al. I tried to create my own parser to have better 
 control over every step.
I considered that, but I'm still trying to make this buffer reference thing work. Phobos just needs to be fixed. This is actually not as hopeless as I once thought. But what needs to happen is that all of Phobos's algorithms need to be tested with byCodeUnit et al.
 After a few *minutes* of testing things I ran 
 into this bug [1] that didn't get fixed till early 2018. I never started 
 to write my own step-by-step parser. I'm glad I didn't.
It actually was fixed accidentally in 2017 in this PR: https://github.com/dlang/druntime/pull/1952. The bug was closed in 2018 when someone noticed the code no longer failed. Essentially, the whole string switch algorithm was replaced with a completely rewritten better approach. This is a great example of why we should be moving more of the compiler magic into the library -- it's just easier to write and understand there.
 I wish people began to realize that string handling is a basic necessity 
 and that the correct handling of strings is of utmost importance. Please 
 keep us updated on how things work out (or not) for you.
Absolutely, D needs to have great support for string parsing and manipulation. The potential is awesome. I will keep it up. What I'm trying to fix is that using std.algorithm to extract pieces from a buffer, and then using their positions in that buffer to determine things (i.e. parsing), is really difficult without some stupid requirements like pointer math.
 [Please, nobody answer my post pointing out that a) we don't understand 
 Unicode and b) that it's an insult to the Universe to draw attention to 
 flaws that keep pestering us on an almost daily basis - without trying 
 to fix them ourselves stante pede. As is clear from Steve's efforts, the 
 Universe doesn't seem to care.)
I don't characterize it as the universe not caring. Phobos has a legacy problem with string handling, and it needs to somehow be addressed -- either by painfully extracting the problem, or painfully working around it. I don't think anyone here thinks there isn't a problem or that it's insulting to bring it up. But anything that needs to be done is painful either way, which is why it's not happening very fast. -Steve
Sep 10 2018
prev sibling next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/8/18 8:36 AM, Steven Schveighoffer wrote:
 I'll work on adding some issues to the tracker, and potentially doing 
 some PRs so they can be fixed.
https://issues.dlang.org/show_bug.cgi?id=19238 https://github.com/dlang/phobos/pull/6700 -Steve
Sep 10 2018
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/8/18 8:36 AM, Steven Schveighoffer wrote:
 On 8/9/18 2:44 AM, Walter Bright wrote:
 On 8/8/2018 2:01 PM, Steven Schveighoffer wrote:
 Here's where I'm struggling -- because a string provides indexing, 
 slicing, length, etc. but Phobos ignores that. I can't make a new 
 type that does the same thing. Not only that, but I'm finding the 
 specializations of algorithms only work on the type "string", and 
 nothing else.
One of the worst things about autodecoding is it is special, it *only* steps in for strings. Fortunately, however, that specialness enabled us to save things with byCodePoint and byCodeUnit.
So it turns out that technically the problem here, even though it seemed like an autodecoding problem, is a problem with splitter. splitter doesn't deal with encodings of character ranges at all. For instance, when you have this: "abc 123".byCodeUnit.splitter; What happens is splitter only has one overload that takes one parameter, and that requires a character *array*, not a range. So the byCodeUnit result is aliased-this to its original, and surprise! the elements from that splitter are string. Next, I tried to use a parameter: "abc 123".byCodeUnit.splitter(" "); Nope, still devolves to string. It turns out it can't figure out how to split character ranges using a character array as input.
Hm... I made some erroneous assumptions in determining these problems.

1. There is no alias this for the source in ByCodeUnitImpl. I'm not sure how it was working when I tested before, but byCodeUnit definitely doesn't have it, and doesn't compile with the no-arg splitter call.
2. The .splitter(" ") does actually work and return a range of ByCodeUnitImpl elements.

So some of my analysis must have been based on bad testing.

However, the issue with the no-arg splitter is still there, and I still think it should be fixed.

I'll have to figure out why my specialized range doesn't allow splitting based on " ".

-Steve
Sep 10 2018
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/10/18 8:58 AM, Steven Schveighoffer wrote:
 I'll have to figure out why my specialized range doesn't allow splitting 
 based on " ".
And the answer is: I'm an idiot. Forgot to define empty :) Also my slicing operator accepted ints and not size_t. -Steve
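
A rough sketch of the shape those two fixes lead to. `Tracked` is a hypothetical stand-in, not the actual `Result` type from the library: it defines `empty`, slices with `size_t`, and so satisfies the `hasSlicing && hasLength` constraint on splitter while each piece keeps its offset into the original buffer.

```d
import std.algorithm.iteration : splitter;
import std.range.primitives : hasLength, hasSlicing, isForwardRange;

// Hypothetical position-tracking range over code units (illustration only).
struct Tracked
{
    size_t pos;            // offset of front within the original buffer
    const(char)[] data;    // remaining code units

    @property bool empty() const { return data.length == 0; }
    @property char front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; ++pos; }
    @property Tracked save() { return this; }
    @property size_t length() const { return data.length; }
    alias opDollar = length;

    // Slicing takes size_t and carries the offset along.
    Tracked opSlice(size_t start, size_t end)
    {
        return Tracked(pos + start, data[start .. end]);
    }
}

static assert(isForwardRange!Tracked);
static assert(hasLength!Tracked && hasSlicing!Tracked);

void main()
{
    auto parts = Tracked(0, "abc 123").splitter(' ');
    assert(parts.front.data == "abc" && parts.front.pos == 0);
    parts.popFront();
    assert(parts.front.data == "123" && parts.front.pos == 4);
}
```

The static asserts are the useful part when debugging this kind of thing: they fail at the definition instead of surfacing later as a "cannot deduce function" error at the call site.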
Sep 10 2018
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/10/18 12:46 PM, Steven Schveighoffer wrote:
 On 9/10/18 8:58 AM, Steven Schveighoffer wrote:
 I'll have to figure out why my specialized range doesn't allow 
 splitting based on " ".
And the answer is: I'm an idiot. Forgot to define empty :) Also my slicing operator accepted ints and not size_t.
I guess a better error message would be in order.
Sep 10 2018
next sibling parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Monday, 10 September 2018 at 20:44:46 UTC, Andrei Alexandrescu 
wrote:
 On 9/10/18 12:46 PM, Steven Schveighoffer wrote:
 On 9/10/18 8:58 AM, Steven Schveighoffer wrote:
 I'll have to figure out why my specialized range doesn't 
 allow splitting based on " ".
And the answer is: I'm an idiot. Forgot to define empty :) Also my slicing operator accepted ints and not size_t.
I guess a better error message would be in order.
https://github.com/dlang/DIPs/pull/131 will help narrow down the cause.
Sep 10 2018
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/10/18 7:00 PM, Nicholas Wilson wrote:
 On Monday, 10 September 2018 at 20:44:46 UTC, Andrei Alexandrescu wrote:
 On 9/10/18 12:46 PM, Steven Schveighoffer wrote:
 On 9/10/18 8:58 AM, Steven Schveighoffer wrote:
 I'll have to figure out why my specialized range doesn't allow 
 splitting based on " ".
And the answer is: I'm an idiot. Forgot to define empty :) Also my slicing operator accepted ints and not size_t.
I guess a better error message would be in order.
https://github.com/dlang/DIPs/pull/131 will help narrow down the cause.
While this would help eventually, I'd prefer something that just transforms all the existing code into useful error messages. See my response to Andrei. -Steve
Sep 11 2018
parent Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Tuesday, 11 September 2018 at 13:08:46 UTC, Steven 
Schveighoffer wrote:
 On 9/10/18 7:00 PM, Nicholas Wilson wrote:
 On Monday, 10 September 2018 at 20:44:46 UTC, Andrei 
 Alexandrescu wrote:
 On 9/10/18 12:46 PM, Steven Schveighoffer wrote:
 On 9/10/18 8:58 AM, Steven Schveighoffer wrote:
 I'll have to figure out why my specialized range doesn't 
 allow splitting based on " ".
And the answer is: I'm an idiot. Forgot to define empty :) Also my slicing operator accepted ints and not size_t.
I guess a better error message would be in order.
https://github.com/dlang/DIPs/pull/131 will help narrow down the cause.
While this would help eventually, I'd prefer something that just transforms all the existing code into useful error messages. See my response to Andrei. -Steve
Please tell me where to get one of those! But yeah, that DIP will tell you that hasSlicing is your problem straight away.

Extracting useful information to present to the user on why hasSlicing!R is false is much trickier for the same reason that providing useful information in the current template constraint format is hard: it is a bunch of potentially unstructured logic that has already been const-folded in order to evaluate it in the first place, so you can't re-evaluate it without flushing the template cache. That's not to say that the situation can't be improved beyond what the DIP specifies, but I haven't had any brilliant ideas (and the idea for that DIP was stolen from someone else anyway).
Sep 11 2018
prev sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Tuesday, 11 September 2018 at 02:00:29 UTC, Nicholas Wilson 
wrote:
 [snip]

 https://github.com/dlang/DIPs/pull/131 will help narrow down 
 the cause.
I like it, but I worry people would find multiple ifs confusing.

The first line of the comment is about using static asserts and `in` contracts, but it looks like static asserts are allowed in `in` contracts for functions [1]. You can do the same thing in structs/classes with invariant blocks (but `in` contracts are not allowed). So basically, the same behavior for if can be reduced to `in` contracts with static asserts already. Multiple ifs would just be a slightly less verbose way to accomplish the same thing.

I suppose one issue might be that contracts are not compiled in during release mode, but I think release only impacts normal asserts, not static asserts.

Is there any reason why this is not sufficient?

[1] https://run.dlang.io/is/lu6nQ0
Sep 11 2018
next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/11/18 7:58 AM, jmh530 wrote:

 Is there any reason why this is not sufficient?
 
 [1] https://run.dlang.io/is/lu6nQ0
That's OK if you are the only one defining S. But what if float is handled elsewhere? -Steve
Sep 12 2018
prev sibling parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Tuesday, 11 September 2018 at 14:58:21 UTC, jmh530 wrote:
 Is there any reason why this is not sufficient?

 [1] https://run.dlang.io/is/lu6nQ0
Overloads: https://run.dlang.io/is/m5HGOh

The static asserts being in the constraint affects the template candidacy viability. Being in the function body/runtime contract does not, so you'll end up with

     onlineapp.d(17): Error: onlineapp.foo called with argument types (float) matches both:
     onlineapp.d(1):     onlineapp.foo!float.foo(float x)
     and:
     onlineapp.d(7):     onlineapp.foo!float.foo(float x)

despite the fact only one of them is viable, whereas bar is fine.
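
A reduced sketch of that difference. This is a reconstruction for illustration, not the code behind the link; only the names `foo` and `bar` are taken from the description above.

```d
// Constraints take part in overload resolution, so exactly one bar matches.
void bar(T)(T x) if (is(T == int)) {}
void bar(T)(T x) if (is(T == float)) {}

// A static assert in the body does not: both foo templates accept any T,
// so the call fails before either body gets a chance to reject it cleanly.
void foo(T)(T x) { static assert(is(T == int)); }
void foo(T)(T x) { static assert(is(T == float)); }

static assert( __traits(compiles, bar(1.0f)));
static assert(!__traits(compiles, foo(1.0f)));

void main()
{
    bar(1.0f);  // picks the float overload
}
```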
Sep 12 2018
parent jmh530 <john.michael.hall gmail.com> writes:
On Wednesday, 12 September 2018 at 12:45:15 UTC, Nicholas Wilson 
wrote:
 

 Overloads:

 [snip]
Good point.
Sep 12 2018
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/10/18 1:44 PM, Andrei Alexandrescu wrote:
 On 9/10/18 12:46 PM, Steven Schveighoffer wrote:
 On 9/10/18 8:58 AM, Steven Schveighoffer wrote:
 I'll have to figure out why my specialized range doesn't allow 
 splitting based on " ".
And the answer is: I'm an idiot. Forgot to define empty :) Also my slicing operator accepted ints and not size_t.
I guess a better error message would be in order.
A better error message would help prevent the painful diagnosis that I had to do to actually find the issue.

So the error I got was this:

     source/bufref.d(346,36): Error: template std.algorithm.iteration.splitter cannot deduce function from argument types !()(Result, string), candidates are:
     /Users/steves/.dvm/compilers/dmd-2.081.0/osx/bin/../../src/phobos/std/algorithm/ teration.d(3792,6):
         std.algorithm.iteration.splitter(alias pred = "a == b", Range, Separator)(Range r, Separator s) if (is(typeof(binaryFun!pred(r.front, s)) : bool) && (hasSlicing!Range && hasLength!Range || isNarrowString!Range))
     /Users/steves/.dvm/compilers/dmd-2.081.0/osx/bin/../../src/phobos/std/algorithm/ teration.d(4163,6):
         std.algorithm.iteration.splitter(alias pred = "a == b", Range, Separator)(Range r, Separator s) if (is(typeof(binaryFun!pred(r.front, s.front)) : bool) && (hasSlicing!Range || isNarrowString!Range) && isForwardRange!Separator && (hasLength!Separator || isNarrowString!Separator))
     /Users/steves/.dvm/compilers/dmd-2.081.0/osx/bin/../../src/phobos/std/algorithm/ teration.d(4350,6):
         std.algorithm.iteration.splitter(alias isTerminator, Range)(Range r) if (isForwardRange!Range && is(typeof(unaryFun!isTerminator(r.front))))
     /Users/steves/.dvm/compilers/dmd-2.081.0/osx/bin/../../src/phobos/std/algorithm/ teration.d(4573,6):
         std.algorithm.iteration.splitter(C)(C[] s) if (isSomeChar!C)

This means I had to look at each line, figure out which overload I'm calling, and then copy all the constraints locally, seeing which ones were true and which ones false.

But it didn't stop there. The problem was hasSlicing!Range. If you look at hasSlicing, it looks like this:

     enum bool hasSlicing(R) = isForwardRange!R
         && !isNarrowString!R
         && is(ReturnType!((R r) => r[1 .. 1].length) == size_t)
         && (is(typeof(lvalueOf!R[1 .. 1]) == R) || isInfinite!R)
         && (!is(typeof(lvalueOf!R[0 .. $])) || is(typeof(lvalueOf!R[0 .. $]) == R))
         && (!is(typeof(lvalueOf!R[0 .. $])) || isInfinite!R
             || is(typeof(lvalueOf!R[0 .. $ - 1]) == R))
         && is(typeof((ref R r)
         {
             static assert(isForwardRange!(typeof(r[1 .. 2])));
         }));

Now I had to instrument a whole slew of items. I pasted this whole thing into my code, added an alias to my range type for R, and then changed the big boolean expression to a bunch of static asserts.

Then I found the true culprit was isForwardRange!R. This led me to requestion my sanity, and finally realized I forgot the empty function.

A fabulous fantastic mechanism that would have saved me some time is simply coloring the clauses of the template constraint that failed red, the ones that passed green, and the ones that weren't evaluated grey.

Furthermore, it would be good to recursively continue this for red clauses like `hasSlicing` which have so much underneath. Either that or a way to trigger the colored evaluation on demand.

If I were a dmd guru, I'd look at doing this myself. I may still try and hack it in just to see if I can do it.

------

Finally, there is a possible bug in the definition of hasSlicing: it doesn't require the slice parameters be size_t, but there are places (e.g. inside std.algorithm.searching.find) that pass in range.length .. range.length for slicing the range. In my implementation I had used ints as the parameters for opSlice. So I started seeing errors deep inside std.algorithm saying there was no overload for slicing. Again the sanity was questioned, and I figured out the error and now it's actually working.

-Steve
Sep 11 2018
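The manual diagnosis described above can be sketched roughly as follows. This is only an illustration under assumptions: the `Result` struct is a hypothetical cut-down range (not the type from the post) that deliberately omits `empty`, and only a few simplified clauses are checked rather than the full `hasSlicing` definition.

```d
import std.range.primitives : hasLength, isForwardRange, isInputRange;

// Hypothetical range: everything a sliceable forward range needs except `empty`.
struct Result
{
    int[] data;
    @property int front() { return data[0]; }
    void popFront() { data = data[1 .. $]; }
    @property Result save() { return this; }
    @property size_t length() { return data.length; }
    Result opSlice(size_t a, size_t b) { return Result(data[a .. b]); }
    // @property bool empty() { return data.length == 0; }  // forgotten
}

alias R = Result;

// Step 1: evaluate each clause of the constraint on its own and print it
// at compile time instead of staring at one big boolean expression.
pragma(msg, "isForwardRange:   ", isForwardRange!R);
pragma(msg, "hasLength:        ", hasLength!R);
pragma(msg, "slice has type R: ", is(typeof(R.init[1 .. 1]) == R));

// Step 2: drill into the clause that came out false.
pragma(msg, "isInputRange:     ", isInputRange!R);
pragma(msg, "has empty:        ", is(typeof(R.init.empty)));

// The same findings as checks: the slicing-related clauses are fine,
// the range-ness clause is not, and the root cause is the missing `empty`.
static assert(hasLength!R);
static assert(is(typeof(R.init[1 .. 1]) == R));
static assert(!isForwardRange!R);
static assert(!is(typeof(R.init.empty)));

void main() {}
```

Uncommenting the `empty` property should flip the failing clauses, at which point the negated static asserts would need to be inverted.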
parent reply "Nick Sabalausky (Abscissa)" <SeeWebsiteToContactMe semitwist.com> writes:
On 09/11/2018 09:06 AM, Steven Schveighoffer wrote:
 
 Then I found the true culprit was isForwardRange!R. This led me to 
 requestion my sanity, and finally realized I forgot the empty function.
This is one reason template-based interfaces like ranges should be required to declare themselves as deliberately implementing said interface. Sure, we can tell people they should always `static assert(isForwardRange!MyType)`, but that's coding by convention and clearly isn't always going to happen.
Sep 13 2018
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Sep 13, 2018 at 06:32:54PM -0400, Nick Sabalausky (Abscissa) via
Digitalmars-d wrote:
 On 09/11/2018 09:06 AM, Steven Schveighoffer wrote:
 
 Then I found the true culprit was isForwardRange!R. This led me to
 requestion my sanity, and finally realized I forgot the empty
 function.
This is one reason template-based interfaces like ranges should be required to declare themselves as deliberately implementing said interface. Sure, we can tell people they should always `static assert(isForwardRange!MyType)`, but that's coding by convention and clearly isn't always going to happen.
Yeah, I find myself writing `static assert(isInputRange!MyType)` all the time these days, because you just never can be too sure you didn't screw up and cause things to mysteriously fail, even though they shouldn't.

Although I used to be a supporter of free-form sig constraints (and still am to some extent) and a hater of Concepts like in C++, more and more I'm beginning to realize the wisdom of Concepts rather than free-for-all ducktyping. It's one of those things that work well in small programs and fast, one-shot projects, but don't generalize so well as you scale up to larger and larger projects.

T

-- 
A program should be written to model the concepts of the task it performs rather than the physical world or a process because this maximizes the potential for it to be applied to tasks that are conceptually similar and, more important, to tasks that have not yet been conceived. -- Michael B. Allen
Sep 13 2018
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/13/18 3:53 PM, H. S. Teoh wrote:
 On Thu, Sep 13, 2018 at 06:32:54PM -0400, Nick Sabalausky (Abscissa) via
Digitalmars-d wrote:
 On 09/11/2018 09:06 AM, Steven Schveighoffer wrote:
 Then I found the true culprit was isForwardRange!R. This led me to
 requestion my sanity, and finally realized I forgot the empty
 function.
This is one reason template-based interfaces like ranges should be required to declare themselves as deliberately implementing said interface. Sure, we can tell people they should always `static assert(isForwardRange!MyType)`, but that's coding by convention and clearly isn't always going to happen.
duck typing.
 Yeah, I find myself writing `static assert(isInputRange!MyType)` all the
 time these days, because you just never can be too sure you didn't screw
 up and cause things to mysteriously fail, even though they shouldn't.
 
 Although I used to be a supporter of free-form sig constraints (and
 still am to some extent) and a hater of Concepts like in C++, more and
 more I'm beginning to realize the wisdom of Concepts rather than
 free-for-all ducktyping.  It's one of those things that work well in
 small programs and fast, one-shot projects, but don't generalize so well
 as you scale up to larger and larger projects.
The problem I had was that it wasn't clear to me which constraint was failing. My bias brought me to "it must be autodecoding again!". But objectively, I should have examined all the constraints to see what was wrong. All C++ concepts seem to do (haven't used them) is help identify easier which requirements are failing. We can fix all these problems by simply identifying the constraint clauses that fail. By color coding the error message identifying which ones are true and which are false, we can pinpoint the error without changing the language. Once you fix the issue, it doesn't error any more, so the idea of duck typing and constraints is sound, it's just difficult to diagnose. -Steve
Sep 15 2018
next sibling parent reply Neia Neutuladh <neia ikeran.org> writes:
On Saturday, 15 September 2018 at 15:31:00 UTC, Steven 
Schveighoffer wrote:
 The problem I had was that it wasn't clear to me which 
 constraint was failing. My bias brought me to "it must be 
 autodecoding again!". But objectively, I should have examined 
 all the constraints to see what was wrong. All C++ concepts 
 seem to do (haven't used them) is help identify easier which 
 requirements are failing.
They also make it so your automated documentation can post a link to something that describes the type in more cases. std.algorithm would still be relatively horked, but a lot of functions could be declared as yielding, for instance, ForwardRange!(ElementType!(TRange)).
 We can fix all these problems by simply identifying the 
 constraint clauses that fail. By color coding the error message 
 identifying which ones are true and which are false, we can 
 pinpoint the error without changing the language.
I wish. I had a look at std.algorithm.searching.canFind as the first thing I thought to check. Its constraints are of the form:

     bool canFind(Range)(Range haystack)
     if (is(typeof(find!pred(haystack))))

The compiler can helpfully point out that the specific constraint that failed was is(...), which does absolutely no good in trying to track down the problem.
Sep 15 2018
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/15/18 12:04 PM, Neia Neutuladh wrote:
 On Saturday, 15 September 2018 at 15:31:00 UTC, Steven Schveighoffer wrote:
 The problem I had was that it wasn't clear to me which constraint was 
 failing. My bias brought me to "it must be autodecoding again!". But 
 objectively, I should have examined all the constraints to see what 
 was wrong. All C++ concepts seem to do (haven't used them) is help 
 identify easier which requirements are failing.
They also make it so your automated documentation can post a link to something that describes the type in more cases. std.algorithm would still be relatively horked, but a lot of functions could be declared as yielding, for instance, ForwardRange!(ElementType!(TRange)).
True, we currently rely on convention there. But this really is simply documentation at a different (admittedly more verified) level.
 
 We can fix all these problems by simply identifying the constraint 
 clauses that fail. By color coding the error message identifying which 
 ones are true and which are false, we can pinpoint the error without 
 changing the language.
I wish. I had a look at std.algorithm.searching.canFind as the first thing I thought to check. Its constraints are of the form:

     bool canFind(Range)(Range haystack)
     if (is(typeof(find!pred(haystack))))

The compiler can helpfully point out that the specific constraint that failed was is(...), which does absolutely no good in trying to track down the problem.
is(typeof(...)) constraints might be useless here, but we have started to move away from such things in general (see for instance isInputRange and friends). But there could actually be a solution -- just recursively play out the items at compile time (probably with the verbose switch) to see what underlying cause there is. Other than that, you can then write find(myrange) and see what comes up. In my case even, the problem was hasSlicing, which itself is a complicated template, and wouldn't have helped me diagnose the real problem. A recursive display of what things failed would help, but even if I could trigger a way to diagnose hasSlicing, instead of copying all the constraints locally, it's still a much better situation. I'm really thinking of exploring how this could play out, just toying with the compiler to do this would give me experience in how the thing works. -Steve
Sep 15 2018
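A small sketch of why an `is(typeof(...))` constraint hides the underlying cause, and of the workaround mentioned above (calling the inner function directly). All names here are hypothetical: `inner` stands in for a function like `find`, and `outer` for a wrapper like `canFind`.

```d
// Hypothetical types: one with a `front` member, one without.
struct HasFront { int front; }
struct NotARange {}

// Stand-in for the wrapped function, with its own meaningful constraint.
void inner(R)(R r) if (is(typeof(r.front))) {}

// Stand-in for the wrapper: its constraint only asks
// "does the inner call compile?", so a failure reports nothing more.
bool outer(R)(R r) if (is(typeof(inner(r))))
{
    inner(r);
    return true;
}

// The wrapper accepts whatever the inner function accepts...
static assert(__traits(compiles, outer(HasFront(1))));
// ...and rejects everything else, but only via the opaque is(typeof(...)).
static assert(!__traits(compiles, outer(NotARange())));
// Calling the inner function directly fails on inner's own constraint,
// which is the clause that actually names the missing piece.
static assert(!__traits(compiles, inner(NotARange())));

void main()
{
    assert(outer(HasFront(1)));
}
```

Writing `inner(NotARange());` as a plain statement, outside `__traits(compiles)`, is what would surface the compiler's message about `inner`'s constraint.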
prev sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Saturday, September 15, 2018 9:31:00 AM MDT Steven Schveighoffer via 
Digitalmars-d wrote:
 On 9/13/18 3:53 PM, H. S. Teoh wrote:
 On Thu, Sep 13, 2018 at 06:32:54PM -0400, Nick Sabalausky (Abscissa) via 
Digitalmars-d wrote:
 On 09/11/2018 09:06 AM, Steven Schveighoffer wrote:
 Then I found the true culprit was isForwardRange!R. This led me to
 requestion my sanity, and finally realized I forgot the empty
 function.
This is one reason template-based interfaces like ranges should be required to declare themselves as deliberately implementing said interface. Sure, we can tell people they should always `static assert(isForwardRange!MyType)`, but that's coding by convention and clearly isn't always going to happen.
duck typing.
 Yeah, I find myself writing `static assert(isInputRange!MyType)` all the
 time these days, because you just never can be too sure you didn't screw
 up and cause things to mysteriously fail, even though they shouldn't.

 Although I used to be a supporter of free-form sig constraints (and
 still am to some extent) and a hater of Concepts like in C++, more and
 more I'm beginning to realize the wisdom of Concepts rather than
 free-for-all ducktyping.  It's one of those things that work well in
 small programs and fast, one-shot projects, but don't generalize so well
 as you scale up to larger and larger projects.
The problem I had was that it wasn't clear to me which constraint was failing. My bias brought me to "it must be autodecoding again!". But objectively, I should have examined all the constraints to see what was wrong. All C++ concepts seem to do (haven't used them) is help identify easier which requirements are failing. We can fix all these problems by simply identifying the constraint clauses that fail. By color coding the error message identifying which ones are true and which are false, we can pinpoint the error without changing the language. Once you fix the issue, it doesn't error any more, so the idea of duck typing and constraints is sound, it's just difficult to diagnose.
The other two things that come to mind are that

1. Design by Introspection is pretty much the opposite of Concepts, and while I'm not convinced that DbI is a great idea in general, there clearly are cases where it makes a lot of sense (e.g. allocators), and it's something that Andrei wants to push (whereas unless something has changed, he's very much against Concepts). Adding any sort of Concepts feature to D would be very much at odds with DbI. And honestly, in general, I don't think that it's at all necessary. As you point out, it's really the error reporting that's the problem. Aside from that, template constraints tend to work quite well.

2. Improving the error reporting for constraints improves templates in general and not just those that use traits like isInputRange. While we do create traits for the really common stuff, there's plenty of code that is going to do stuff like is(typeof(...)), because it's a one-off thing, and it would be overkill to create a trait for it. So, improving the error reporting would ultimately be very useful in general, whereas trying to do something with Concepts would only help with part of the problem.

And of course, there's always going with Atila's approach of providing a separate template that goes with the trait and tells you which piece fails for a particular template argument (though that obviously doesn't scale).

Overall though, I don't think that there's really any disagreement that it would be very desirable to get the compiler to provide better information about which parts of a template constraint are true and which are false. The problem is really that someone needs to come up with a scheme to do so that will work reasonably well and then implement it, and no one has done that yet.

- Jonathan M Davis
Sep 15 2018
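The "separate template that goes with the trait" idea mentioned above might look roughly like the sketch below. This is an assumption about the general shape only, not Atila's actual code; `checkInputRangeLike`, `isInputRangeLike`, `Good` and `Bad` are hypothetical names.

```d
// Hypothetical checker: its body simply uses every operation the trait needs.
// Instantiating it directly on a bad type makes the compiler point at the
// exact line whose operation is missing.
void checkInputRangeLike(R)()
{
    R r = R.init;
    if (r.empty) {}      // needs `empty`
    auto e = r.front;    // needs `front`
    r.popFront();        // needs `popFront`
}

// Hypothetical trait defined in terms of the checker.
enum isInputRangeLike(R) = __traits(compiles, checkInputRangeLike!R());

struct Good
{
    int[] data;
    @property bool empty() { return data.length == 0; }
    @property int front() { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

struct Bad   // `empty` is missing
{
    int[] data;
    @property int front() { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

static assert(isInputRangeLike!Good);
static assert(!isInputRangeLike!Bad);

void main()
{
    checkInputRangeLike!Good();
    // checkInputRangeLike!Bad();  // would fail to compile at the `r.empty` line
}
```

As noted in the post, every trait needs its own hand-written checker under this scheme, which is why it does not scale.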
parent "Nick Sabalausky (Abscissa)" <SeeWebsiteToContactMe semitwist.com> writes:
On 09/15/2018 04:29 PM, Jonathan M Davis wrote:
 
 Adding any sort of Concepts feature to D
 would be very much at odds with DbI.
I'm not very familiar with C++'s attempted approaches to concepts, so maybe we're thinking of two different things by "concepts", but I don't see why it would be at odds with DbI. If anything, they would seem to complement each other well, each one filling in where the other hits its limit. Even just a simple example:

     void foo(ForwardRange r1, InputRange r2) if(hasLength!r1) {...}

Andrei is right that a no-DbI version of that would suck: Hierarchies are no good for a series of orthogonal options. But at the same time, the equivalent current-D code would comparatively be a mess, too. Although...and maybe I'm just typing out of my &% here, maybe some kind of templated concept:

     void foo(ForwardRange!WithLength r1, InputRange r2) {...}
 Overall though, I don't think that there's really any disagreement that it
 would be very desirable to get the compiler to provide better information
 about which parts of a template constraint are true and which are false. The
 problem is really that someone needs to come up with a scheme to do so that
 will work reasonably well and then implement it, and no one has done that
 yet.
Agreed, but let's be realistic, this *is* D: How many years has it been since assertPred was rejected in favor of the improved-assert-messages vaporware? It's hard to have much faith in such a thing happening here either. (Not that I'm under any illusion that concept-like stuff would be any more likely.) Though, I'd be glad to be proven wrong either way. -- Danny Downer
Sep 16 2018
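For comparison, the "equivalent current-D code" referred to above would presumably look something like this sketch. The `ForwardRange r1, InputRange r2` syntax in the post is hypothetical; the version below uses only existing Phobos traits, applied to the parameter types rather than to the parameters themselves.

```d
import std.range.primitives : hasLength, isForwardRange, isInputRange;

// Everything the hypothetical concept syntax expresses in the parameter list
// has to be spelled out as type parameters plus a constraint.
void foo(R1, R2)(R1 r1, R2 r2)
if (isForwardRange!R1 && hasLength!R1 && isInputRange!R2)
{
}

// Arrays of int satisfy all three traits.
static assert(__traits(compiles, foo([1, 2, 3], [4, 5])));
// A plain int is not a range, so the constraint rejects it.
static assert(!__traits(compiles, foo(42, [4, 5])));

void main()
{
    foo([1, 2, 3], [4, 5]);
}
```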
prev sibling parent Jon Degenhardt <jond noreply.com> writes:
On Wednesday, 8 August 2018 at 21:01:18 UTC, Steven Schveighoffer 
wrote:
 Not trying to give too much away about the library I'm writing, 
 but the problem I'm trying to solve is parsing out tokens from 
 a buffer. I want to delineate the whole, as well as the parts, 
 but it's difficult to get back to the original buffer once you 
 split and slice up the buffer using phobos functions.
I wonder if there are some parallels in the tsv utilities I wrote. The tsv parser is extremely simple, byLine and splitter on a char buffer. Most of the tools just iterate the split result in order, but a couple do things like operate on a subset of fields, potentially reordered. For these a separate structure is created that maps back to the original buffer to avoid copying. Likely quite simple compared to what you are doing.

The csv2tsv tool may be more interesting. Parsing is relatively simple, mostly identifying field values in the context of CSV escape syntax. It's modeled as reading an infinite stream of utf-8 characters, byte-by-byte. Occasionally the bytes forming the value need to be modified due to the escape syntax, but most of the time the characters in the original buffer remain untouched and parsing is identifying the start and end positions.

The infinite stream is constructed by reading fixed size blocks from the input stream and concatenating them with joiner. This eliminates the need to worry about utf-8 characters spanning block boundaries, but it comes at a cost: either write byte-at-a-time, or make an extra copy (also byte-at-a-time). Making an extra copy is faster, and that's what the code does. But, as a practical matter, most of the time large blocks could often be written directly from the original input buffer.

If I wanted to make it faster than it currently is, I'd do this. But I don't see an easy way to do this with phobos ranges. At minimum I'd have to be able to run code when the joiner operation hits block boundaries. And it'd also be necessary to create a mapping back to the original input buffer.

Autodecoding comes into play of course. Basically, splitter on char arrays is fine, but in a number of cases it's necessary to work using ubyte to avoid the performance penalty.

--Jon
Aug 09 2018
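A minimal sketch of the two techniques described above: splitting on `ubyte` rather than `char` data to sidestep autodecoding, and presenting fixed-size blocks as one byte stream with `joiner`. This is not code from the tsv utilities; the input data and block contents are made up for illustration.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : joiner, splitter;
import std.string : representation;

void main()
{
    // Viewing the string as immutable(ubyte)[] means range primitives
    // operate on bytes and never decode to dchar.
    string line = "a\tbc\td";
    string[] fields;
    foreach (f; line.representation.splitter(cast(ubyte) '\t'))
        fields ~= cast(string) f;   // each piece is a slice of the original buffer
    assert(fields == ["a", "bc", "d"]);

    // Blocks read separately can be iterated as one continuous byte stream.
    // joiner yields elements one at a time, which is the byte-by-byte cost
    // mentioned in the post.
    ubyte[][] blocks = [[1, 2], [3], [4, 5]];
    assert(blocks.joiner.equal([1, 2, 3, 4, 5]));
}
```

Because `joiner` hides the block boundaries, there is no hook for writing a whole block at once, which matches the limitation described in the post.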