digitalmars.D.learn - splitter trouble

Ali Çehreli <acehreli yahoo.com> writes:
While working on a solution for Alfred Newman's thread, I came up with 
the following interim solution, which compiled but failed:

auto parse(R, S)(R range, S separators) {
     import std.algorithm : splitter, filter, canFind;
     import std.range : empty;

     static bool pred(E, S)(E e, S s) {
         return s.canFind(e);
     }

     return range.splitter!pred(separators).filter!(token => !token.empty);
}

unittest {
     import std.algorithm : equal;
     import std.string : format;
     auto parsed = parse("_My   input.string", " _,.");
     assert(parsed.equal([ "My", "input", "string" ]), format("%s", parsed));
}

void main() {
}

The unit test fails and prints

["put", "ing"]

not the expected

["My", "input", "string"].

How is that happening? Am I unintentionally hitting a weird overload of 
splitter?

Ali
Oct 30 2016
John Colvin <john.loughran.colvin gmail.com> writes:
On Sunday, 30 October 2016 at 23:57:11 UTC, Ali Çehreli wrote:
 While working on a solution for Alfred Newman's thread, I came 
 up with the following interim solution, which compiled but 
 failed:

 auto parse(R, S)(R range, S separators) {
     import std.algorithm : splitter, filter, canFind;
     import std.range : empty;

     static bool pred(E, S)(E e, S s) {
         return s.canFind(e);
     }

     return range.splitter!pred(separators).filter!(token => 
 !token.empty);
 }

 unittest {
     import std.algorithm : equal;
     import std.string : format;
     auto parsed = parse("_My   input.string", " _,.");
     assert(parsed.equal([ "My", "input", "string" ]), 
 format("%s", parsed));
 }

 void main() {
 }

 The unit test fails and prints

 ["put", "ing"]

 not the expected

 ["My", "input", "string"].

 How is that happening? Am I unintentionally hitting a weird 
 overload of splitter?

 Ali
As usual, auto-decoding has plumbed the sewage line straight into the drinking water...

Splitter needs to know how far to skip when it hits a match. Normally, for the pred(r.front, s) overload that you're using here, the answer to that question is always 1. The exception is narrow strings, where it's the encoded length of the separator in the encoding of the source range (in this case UTF-8), so that it can skip e.g. a big dchar.* In your case the separator is more than one character long, but you only want to skip forward by one, because your separator isn't really a separator.

* see https://github.com/dlang/phobos/blob/d6572c2a44d69f449bfe2b07461b2f0a1d6503f9/std/algorithm/iteration.d#L3710

Basically, what you're doing isn't going to work. A separator is considered to be a separator, i.e. something to be skipped over, and twisting the definition causes problems.

This will work, but I can't see any way to make it nogc:

auto parse(R, S)(R range, S separators) {
     import std.algorithm : splitter, filter, canFind;
     import std.range : save, empty;

     return range
         .splitter!(e => separators.save.canFind(e))
         .filter!(token => !token.empty);
}
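
To make the skipping arithmetic concrete, here is a rough sketch that reproduces both outputs by hand. It is not the Phobos implementation: splitSkipping and its skip parameter are hypothetical names invented for this illustration, and it assumes ASCII-only input so that one element is one UTF-8 code unit. With skip set to the encoded length of " _,." (4 code units) it yields the surprising ["put", "ing"]; with skip set to 1 it yields the expected tokens.

```d
import std.algorithm : canFind;

// Hypothetical helper, NOT Phobos code: on every element found in
// `separators`, end the current token and jump `skip` code units ahead.
// Assumes ASCII-only input. Empty tokens are dropped, like the filter
// in the original parse().
string[] splitSkipping(string input, string separators, size_t skip) {
    string[] tokens;
    size_t start = 0;   // where the current token begins
    size_t i = 0;       // current position in input
    while (i < input.length) {
        if (separators.canFind(input[i])) {
            if (i > start)
                tokens ~= input[start .. i];
            i += skip;  // skip == 4 also swallows "My ", "in" and "str"
            start = i;
        } else {
            ++i;
        }
    }
    if (start < input.length)
        tokens ~= input[start .. $];
    return tokens;
}

unittest {
    enum input = "_My   input.string";
    enum seps = " _,.";

    // Skipping the separator string's full encoded length (4 code units)
    assert(splitSkipping(input, seps, seps.length) == [ "put", "ing" ]);

    // Skipping exactly one element per match
    assert(splitSkipping(input, seps, 1) == [ "My", "input", "string" ]);
}
```

The block has no main of its own; compiling with something like dmd -main -unittest -run should exercise the unittest.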
Nov 01 2016