digitalmars.D.learn - std.string inPattern() and UTF symbols
- Fra (8/8) Dec 09 2013
Various (UTF) symbols seem to be ignored by inPattern; see http://dpaste.dzfl.pl/e8ff9002 for a quick example (munch() uses inPattern() internally). Am I doing something improperly, or does the documentation fail to state a specific limitation of the function? All I can find is "In the future, the pattern syntax may be improved to be more like regular expression character classes". That doesn't sound like "non-ASCII symbols are not supported".
Dec 09 2013
On Monday, 9 December 2013 at 14:44:23 UTC, Fra wrote:
> Various (UTF) symbols seem to be ignored by inPattern; see http://dpaste.dzfl.pl/e8ff9002 for a quick example (munch() uses inPattern() internally). Am I doing something improperly, or does the documentation fail to state a specific limitation of the function? All I can find is "In the future, the pattern syntax may be improved to be more like regular expression character classes". That doesn't sound like "non-ASCII symbols are not supported".

Looking at the implementation of inPattern [0], I'd say it is restricted to ASCII. The unit tests only cover ASCII, for example. I also smell a Unicode bug, due to the combination of foreach and length.

[0] https://github.com/D-Programming-Language/phobos/blob/master/std/string.d#L2595
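To illustrate why mixing foreach and length on a string can be suspicious, here is a minimal sketch. It is not the Phobos code; the helper name lastIndexSeen is made up for this example. It shows that .length counts UTF-8 code units, and that the index in a decoding foreach is a code-unit offset rather than a character position.

```d
import std.range : walkLength;

// Hypothetical helper (not part of Phobos): returns the index that foreach
// reports for the last character when iterating by code point.
size_t lastIndexSeen(string s)
{
    size_t last = 0;
    foreach (i, dchar c; s) // i is the code-unit offset where c starts
        last = i;
    return last;
}

void main()
{
    string s = "\u00E4\u00F6\u00FC"; // "äöü", two UTF-8 code units each

    assert(s.length == 6);           // length counts code units
    assert(s.walkLength == 3);       // walkLength decodes and counts code points
    assert(lastIndexSeen(s) == 4);   // offsets seen are 0, 2, 4, not 0, 1, 2
}
```

Any code that compares such an index against a character count, or a character count against .length, only works as intended for pure ASCII input.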
Dec 09 2013
On Monday, 9 December 2013 at 15:58:53 UTC, qznc wrote:
> I also smell a Unicode bug, due to the combination of foreach and length.

Bug reported. :) https://d.puremagic.com/issues/show_bug.cgi?id=11712

That is probably not the root of Fra's problem, though.
Dec 09 2013
On Monday, 9 December 2013 at 16:10:34 UTC, qznc wrote:
> That is probably not the root of Fra's problem, though.

You are right, that was not the root, even though the mistake is extremely simple: foreach (c; s) is used to iterate over the string. I just realized that foreach can mess things up when used on strings. I can't shake the feeling that this is a pitfall of the language: the code

foreach (immutable dchar c; s) writeln("token: ", c);

produces very different results from

foreach (c; s) writeln("token: ", c);

See http://dpaste.dzfl.pl/302291fd. I understand why foreach produces such a result, but I guess newcomers will get burnt by this. I will open a bug report for the munch function in the meantime.
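A minimal sketch of the difference between the two loops, assuming a UTF-8 string (the function names are made up for this example). Without an explicit type, the loop variable is a char, so the loop visits each UTF-8 code unit. Declaring it as dchar makes foreach decode whole code points.

```d
// Counts iterations when the loop variable type is inferred.
// For a string, c is a char, so this counts UTF-8 code units.
size_t countCodeUnits(string s)
{
    size_t n = 0;
    foreach (c; s)
        ++n;
    return n;
}

// Counts iterations when the loop variable is declared as dchar.
// foreach then decodes the string and yields one code point per step.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string s = "\u00E4\u00F6\u00FC"; // "äöü"

    assert(countCodeUnits(s) == 6);  // each character is two code units
    assert(countCodePoints(s) == 3); // three actual characters
}
```

For ASCII-only input both functions return the same number, which is why unit tests that only cover ASCII would not catch the difference.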
Dec 09 2013
On Monday, 9 December 2013 at 16:10:34 UTC, qznc wrote:
> On Monday, 9 December 2013 at 15:58:53 UTC, qznc wrote:
>> I also smell a Unicode bug, due to the combination of foreach and length.
>
> Bug reported. :) https://d.puremagic.com/issues/show_bug.cgi?id=11712
>
> That is probably not the root of Fra's problem, though.

Your ticket: "The following assert fails, but should not. assert(!inPattern('a', "äöüa-z"));"

Actually, 'a' IS in the given pattern, so inPattern should return true; you then negate it, and therefore the assert fails. Long story short, it should fail, and it does. So your bug report is actually incorrect.
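The reasoning can be checked with a small code-point-aware matcher. This is only a sketch: inSimplePattern is a hypothetical helper, not the Phobos inPattern. It handles single characters and lo-hi ranges only, with no negation or other pattern syntax.

```d
import std.conv : to;

// Hypothetical helper, NOT std.string.inPattern: returns true if c equals
// a single character in pattern or falls inside a "lo-hi" range.
bool inSimplePattern(dchar c, string pattern)
{
    // Decode once so that indexing is per code point, not per code unit.
    dstring p = pattern.to!dstring;

    for (size_t i = 0; i < p.length; ++i)
    {
        // A range needs a character on both sides of the '-'.
        if (i + 2 < p.length && p[i + 1] == '-')
        {
            if (p[i] <= c && c <= p[i + 2])
                return true;
            i += 2; // skip the '-' and the upper bound
        }
        else if (p[i] == c)
            return true;
    }
    return false;
}

void main()
{
    string pattern = "\u00E4\u00F6\u00FCa-z"; // "äöüa-z"

    assert(inSimplePattern('a', pattern));      // 'a' lies in the a-z range
    assert(inSimplePattern('\u00F6', pattern)); // 'ö' is listed explicitly
    assert(!inSimplePattern('A', pattern));     // upper case is not covered
}
```

Under these assumptions, a check for 'a' against "äöüa-z" is true, so negating it in an assert must fail, as the reply says.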
Dec 09 2013