digitalmars.D.learn - Splitting Ranges using Lambda Predicates
- =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (8/8) Jun 10 2014 I'm missing a version of splitter that can be used to split
- =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (3/4) Jun 10 2014 My example is dumb and incorrect.
- monarch_dodra (18/22) Jun 10 2014 The current splitter works on the notion of splitter "tokens",
- monarch_dodra (11/14) Jun 10 2014 Thinking about this more: Do you *actually* have two different
- =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (4/6) Jun 10 2014 What do you mean by basing it off splitter!pred - should I start
- monarch_dodra (126/132) Jun 11 2014 I meant mostly copy pasting it, and modifying it to your needs.
- =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (18/26) Oct 09 2014 Your solution copied here
- monarch_dodra (5/31) Oct 09 2014 My quick guess is you are missing the *global* imports for the
- =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (6/11) Oct 09 2014 I don't understand. The restraints are commented out at
- monarch_dodra (6/18) Oct 10 2014 Sorry, I read your compile error wrong, and was on my phone.
- Artur Skawina via Digitalmars-d-learn (34/37) Jun 11 2014 Starting from scratch is actually not a bad idea, at least for this kind
- monarch_dodra (14/29) Jun 11 2014 I don't know about "starting from scratch" entirely. Maybe not
- Artur Skawina via Digitalmars-d-learn (17/21) Jun 11 2014 That's a problem with D's string handling. For both examples and rarely
- monarch_dodra (7/9) Jun 11 2014 Well, (IMO) it's a problem with no real solution. But for what
- Artur Skawina via Digitalmars-d-learn (13/14) Jun 11 2014 If the recommended approach for writing a really trivial string transfor...
- Artur Skawina via Digitalmars-d-learn (7/15) Jun 11 2014 Well, for safety, you'd most likely want to pass around the range too
I'm missing a version of splitter that can be used to split ranges based on arbitrary predicates. I need to for conversion between different symbol casings, typically: 1. someName => SomeName In this case the lambda should take two arguments (a,b) where in 1. a should be lowercase and b should be uppercase Have i missed a reusable component that can be applied here?
Jun 10 2014
1. someName => SomeNameMy example is dumb and incorrect. I actually want this to do the following 2. "_someGreatVariableName" => "Some Great Varible Name"
Jun 10 2014
On Tuesday, 10 June 2014 at 21:11:17 UTC, Nordlöw wrote:The current splitter works on the notion of splitter "tokens", eg, it splits when it find an element or range that corresponds to the passed value/pred. What exactly are you requesting though? - Split on the "edge" lowercase to uppercase? - Split on uppercase but keep the uppercase element? Either way, your example should look like this: "SomeGreatVariableName" => ["Some", "Great", "Variable", "Name"] Since you are splitting up your range into subranges. And also either way, AFAIK, yeah, we don't have any splitter that does that. We are also missing the version that takes both a range an predicate, which would allow things like: "This Cool thing is COOL!!!".split((a, b)=>a == b.toLower())("cool") => ["This ", " thing is ", "!!!"] Looks like you'll have to roll your own :/1. someName => SomeNameMy example is dumb and incorrect. I actually want this to do the following 2. "_someGreatVariableName" => "Some Great Varible Name"
Jun 10 2014
On Tuesday, 10 June 2014 at 21:26:50 UTC, monarch_dodrWhat exactly are you requesting though? - Split on the "edge" lowercase to uppercase? - Split on uppercase but keep the uppercase element?Thinking about this more: Do you *actually* have two different predicates, or are they mutually exclusive? EG: predicate "isUpper", and split on a false->true? Either way, it shouldn't be too hard to implement. Base it off "splitter!pred", which is actually quite trivial. AFAIK, your requirements could even make it simpler, since you don't need to "skip" the splitter element (which, for strings, can actually be tricky business...). You'd just have to use 2 calls to find though, for "both your predicates" or for "not predicate then predicate".
Jun 10 2014
Either way, it shouldn't be too hard to implement. Base it off "splitter!pred", which is actually quite trivial. AFAIK, yourWhat do you mean by basing it off splitter!pred - should I start with some existing splitter algorithm in Phobos or start from scratch? Thx.
Jun 10 2014
On Tuesday, 10 June 2014 at 22:31:37 UTC, Nordlöw wrote:I meant mostly copy pasting it, and modifying it to your needs. For example, I adapted it into this. For simplicity, I stripped infinite and forward only range support. The only functions I actually modified were "findTerminator", to actually find according to what I want, and popFront. //---- auto slicer(alias isTerminator, Range)(Range input) if (((isRandomAccessRange!Range && hasSlicing!Range) || isSomeString!Range) && is(typeof(unaryFun!isTerminator(input.front)))) { return SlicerResult!(unaryFun!isTerminator, Range)(input); } private struct SlicerResult(alias isTerminator, Range) { alias notTerminator = not!isTerminator; private Range _input; private size_t _end = 0; private void findTerminator() { auto r = _input.save.find!(not!isTerminator).find!isTerminator(); _end = _input.length - r.length; } this(Range input) { _input = input; if (!_input.empty) findTerminator(); else _end = size_t.max; } static if (isInfinite!Range) enum bool empty = false; // Propagate infiniteness. else property bool empty() { return _end == size_t.max; } property auto front() { return _input[0 .. _end]; } void popFront() { _input = _input[_end .. _input.length]; if (_input.empty) { _end = size_t.max; return; } findTerminator(); } property typeof(this) save() { auto ret = this; ret._input = _input.save; return ret; } } //---- This will split on before the first element where pred is true, provided there are previous elements where pred is false: //---- void main() { "SomeGreatVariableName" .slicer!isUpper.writeln(); "someGGGreatVariableName".slicer!isUpper.writeln(); "".slicer!isUpper.writeln(); "a".slicer!isUpper.writeln(); "A".slicer!isUpper.writeln(); } //---- ["Some", "Great", "Variable", "Name"] ["some", "GGGreat", "Variable", "Name"] [] ["a"] ["A"] //---- This may or may not be what you wanted, depending on how you want to split "GGGreat". If you wanted it to simply split *ON* the left of every capital letter, then you can modify the the find terminator into: private void findTerminator() { auto r = _input.save.dropOne.find!isTerminator; _end = _input.length - r.length; } And you'd get: ["some", "G", "G", "Great", "Variable", "Name"] ******************************************* ******************************************* ******************************************* In any case, yeah, it shouldn't be too hard to shape it into what you want. A more involved solution to this problem could be to simply pass a "searchFun" predicate, in which case you'd be able to split not just according to any "unitary predicate", but according to an entire "range search strategy: //---- auto slicer(alias searchFun, Range)(Range input) if (((isRandomAccessRange!Range && hasSlicing!Range) || isSomeString!Range) && is(typeof(searchFun(input)))) { return SlicerResult!(searchFun, Range)(input); } private struct SlicerResult(alias searchFun, Range) { private Range _input; private size_t _end = 0; private void findTerminator() { auto r = searchFun(_input.save); _end = _input.length - r.length; } ... //---- And then: "SomeGGGGreatVariableName".slicer!((s)=>s.find!isLower.find!isUpper).writeln(); "someGGGreatVariableName" .slicer!((s)=>s.dropOne.find!isUpper).writeln(); ["Some", "GGGGreat", "Variable", "Name"] ["some", "G", "G", "Great", "Variable", "Name"] Just ideas.Either way, it shouldn't be too hard to implement. Base it off "splitter!pred", which is actually quite trivial. AFAIK, yourWhat do you mean by basing it off splitter!pred - should I start with some existing splitter algorithm in Phobos or start from scratch? Thx.
Jun 11 2014
On Wednesday, 11 June 2014 at 08:58:58 UTC, monarch_dodra wrote:auto slicer(alias isTerminator, Range)(Range input) if (((isRandomAccessRange!Range && hasSlicing!Range) || isSomeString!Range) && is(typeof(unaryFun!isTerminator(input.front)))) { return SlicerResult!(unaryFun!isTerminator, Range)(input); } ...Your solution copied here https://github.com/nordlow/justd/blob/master/slicer.d errors as /home/per/opt/x86_64-unknown-linux-gnu/dmd/linux/bin64/src/phobos/std/a gorithm.d(5752,24): Error: template std.functional.not!(isUpper).not cannot deduce function from argument types !()(dchar), candidates are: /home/per/opt/x86_64-unknown-linux-gnu/dmd/linux/bin64/src/phobos/std/f nctional.d(393,10): std.functional.not!(isUpper).not(T...)(T args) if (is(typeof(!unaryFun!pred(args))) || is(typeof(!binaryFun!pred(args)))) slicer.d(29,31): Error: template instance std.algorithm.find!(not, string) error instantiating slicer.d(16,12): instantiated from here: Slicer!(isUpper, string) slicer.d(85,30): instantiated from here: slicer!(isUpper, string) What's wrong?
Oct 09 2014
On Thursday, 9 October 2014 at 21:55:03 UTC, Nordlöw wrote:On Wednesday, 11 June 2014 at 08:58:58 UTC, monarch_dodra wrote:My quick guess is you are missing the *global* imports for the restraints. The compiler doesn't complain because the constraint is in a "is(typeof(...))" test. The reason the typeof fails is simply cause the compiler has no idea what unaryFun is.auto slicer(alias isTerminator, Range)(Range input) if (((isRandomAccessRange!Range && hasSlicing!Range) || isSomeString!Range) && is(typeof(unaryFun!isTerminator(input.front)))) { return SlicerResult!(unaryFun!isTerminator, Range)(input); } ...Your solution copied here https://github.com/nordlow/justd/blob/master/slicer.d errors as /home/per/opt/x86_64-unknown-linux-gnu/dmd/linux/bin64/src/phobos/std/a gorithm.d(5752,24): Error: template std.functional.not!(isUpper).not cannot deduce function from argument types !()(dchar), candidates are: /home/per/opt/x86_64-unknown-linux-gnu/dmd/linux/bin64/src/phobos/std/functional.d(393,10): std.functional.not!(isUpper).not(T...)(T args) if (is(typeof(!unaryFun!pred(args))) || is(typeof(!binaryFun!pred(args)))) slicer.d(29,31): Error: template instance std.algorithm.find!(not, string) error instantiating slicer.d(16,12): instantiated from here: Slicer!(isUpper, string) slicer.d(85,30): instantiated from here: slicer!(isUpper, string) What's wrong?
Oct 09 2014
On Thursday, 9 October 2014 at 22:01:31 UTC, monarch_dodra wrote:My quick guess is you are missing the *global* imports for the restraints. The compiler doesn't complain because the constraint is in a "is(typeof(...))" test. The reason the typeof fails is simply cause the compiler has no idea what unaryFun is.I don't understand. The restraints are commented out at https://github.com/nordlow/justd/blob/master/slicer.d I made a couple of changes and now it works but I don't quite understand why... Thanks anyway.
Oct 09 2014
On Friday, 10 October 2014 at 05:55:00 UTC, Nordlöw wrote:On Thursday, 9 October 2014 at 22:01:31 UTC, monarch_dodra wrote:Sorry, I read your compile error wrong, and was on my phone. Anyways, I'm a bit worried, because it looks like an internal phobos compile error. May I request you attempt to re-create and potentially reduce the issue? I want to make sure you didn't uncover a bug...My quick guess is you are missing the *global* imports for the restraints. The compiler doesn't complain because the constraint is in a "is(typeof(...))" test. The reason the typeof fails is simply cause the compiler has no idea what unaryFun is.I don't understand. The restraints are commented out at https://github.com/nordlow/justd/blob/master/slicer.d I made a couple of changes and now it works but I don't quite understand why... Thanks anyway.
Oct 10 2014
On 06/11/14 00:31, "Nordlöw" via Digitalmars-d-learn wrote:Starting from scratch is actually not a bad idea, at least for this kind of trivial functionality. A working version can be written in less time than copying, analyzing and modifying another implementation... For example (using monarch_dodra's test inputs): auto slicer(alias PRED, R)(R r) { import std.algorithm, std.array, std.range; struct Slicer { R r; size_t c; bool empty() property const { return r.empty; } auto front() property { c = r.dropExactly(1).countUntil!PRED()+1; if (c==0) c = r.length; return r.takeExactly(c); } void popFront() { r.popFrontN(c); } } return Slicer(r); } auto slicer(R)(R r) { import std.uni; return slicer!(a=>isUpper(a))(r); } void main() { import std.stdio, std.range; "SomeGreatVariableName" .slicer().writeln(); "someGGGreatVariableName".slicer().join(" ").writeln(); "".slicer().writeln(); "a".slicer().writeln(); "A".slicer().writeln(); } arturEither way, it shouldn't be too hard to implement. Base it off "splitter!pred", which is actually quite trivial. AFAIK, yourWhat do you mean by basing it off splitter!pred - should I start with some existing splitter algorithm in Phobos or start from scratch?
Jun 11 2014
On Wednesday, 11 June 2014 at 11:42:42 UTC, Artur Skawina via Digitalmars-d-learn wrote:On 06/11/14 00:31, "Nordlöw" via Digitalmars-d-learn wrote:I don't know about "starting from scratch" entirely. Maybe not copy paste, but it always helps to have a reference implementation. For example, you should avoid "countUntil" and "takeExactly" when dealing with strings, since these are not O(1) operations, and don't actually return string slices. EG: string s = "someGGGreatVariableName".slicer().front; Error: cannot implicitly convert expression (slicer("someGGGreatVariableName").front()) of type Result to string That's why the splitter code uses the more verbose "r.length - r.find!pred".Starting from scratch is actually not a bad idea, at least for this kind of trivial functionality. A working version can be written in less time than copying, analyzing and modifying another implementation... ... arturEither way, it shouldn't be too hard to implement. Base it off "splitter!pred", which is actually quite trivial. AFAIK, yourWhat do you mean by basing it off splitter!pred - should I start with some existing splitter algorithm in Phobos or start from scratch?
Jun 11 2014
On 06/11/14 14:40, monarch_dodra via Digitalmars-d-learn wrote:For example, you should avoid "countUntil" and "takeExactly" when dealing with strings, since these are not O(1) operations, and don't actually return string slices. EG: string s = "someGGGreatVariableName".slicer().front; Error: cannot implicitly convert expression (slicer("someGGGreatVariableName").front()) of type Result to stringThat's a problem with D's string handling. For both examples and rarely called code, i always prefer correctness over performance. Of course if that code ends up being perf significant it could be further optimized by adding a specialization for D strings...That's why the splitter code uses the more verbose "r.length - r.find!pred".... but that's the wrong approach. You end up writing a string-specific (or -aware) special version for practically every algorithm... If, instead, you create a string-specific 'countUntil' that returns a type that holds both the byte and code-point counts and implicitly converts to the latter, then you can have a 'takeExactly' overload that uses the extra info to avoid the unnecessary decoding and is able to directly return slices. The overhead is minimal; one extra integer that's passed around, and often completely eliminated by function inlining. But now you don't need to write a string-specific version of every algorithm. (Of course this description is slightly oversimplified) There is a reason why I never use D's std lib. artur
Jun 11 2014
On Wednesday, 11 June 2014 at 13:44:25 UTC, Artur Skawina via Digitalmars-d-learn wrote:There is a reason why I never use D's std lib. arturWell, (IMO) it's a problem with no real solution. But for what it's worth, most (if not all) of the algorithms in the standard lib know how to handle strings efficiently and correctly (split, find, etc...). Things only start getting funny once you start mixing indexing and element counts.
Jun 11 2014
On 06/11/14 16:05, monarch_dodra via Digitalmars-d-learn wrote:Well, (IMO) it's a problem with no real solution. But for what it's worth, most (if not all) of the algorithms in the standard lib know how to handle strings efficiently and correctly (split, find, etc...). Things only start getting funny once you start mixing indexing and element counts.If the recommended approach for writing a really trivial string transformation is to copy and modify ~60 lines of std lib code then something is very wrong. The consequence is that such code will often end up being eager, as it's much easier to write it that way... That is why, after seeing your solution, I wrote the most compact lazy version I could think of - let's not scare people away from using ranges. AFAIUI the OP basically wanted a version of splitter that does not eat the separators and a predicate that keeps around previous state. The latter can already be achieved via a lambda, so the missing bit is an optional splitter template parameter (eg consume=false). So at least in this case the problem could be easily handled. artur
Jun 11 2014
On 06/11/14 15:44, Artur Skawina wrote:If, instead, you create a string-specific 'countUntil' that returns a type that holds both the byte and code-point counts and implicitly converts to the latter, then you can have a 'takeExactly' overload that uses the extra info to avoid the unnecessary decoding and is able to directly return slices. The overhead is minimal; one extra integer that's passed around, and often completely eliminated by function inlining. But now you don't need to write a string-specific version of every algorithm. (Of course this description is slightly oversimplified)Well, for safety, you'd most likely want to pass around the range too (ie string slice, or just the pointer as the min-length is implied); so it's two size_ts, not one. Still, the overhead is tiny and the advantage of not having to special-case for strings everywhere is worth it. artur
Jun 11 2014