digitalmars.D.learn - std.string inPattern() and UTF symbols

"Fra" <Fra b.it> writes:
Various (UTF) symbols seem to be ignored by inPattern; see 
http://dpaste.dzfl.pl/e8ff9002 for a quick example (munch() uses 
inPattern() internally).

Am I doing something improperly, or is the documentation missing 
a more specific statement of the function's limitations? 
All I can read is "In the future, the pattern syntax may be 
improved to be more like regular expression character classes". 
This doesn't sound like "non-ASCII symbols are not supported".
Dec 09 2013
"qznc" <qznc web.de> writes:
On Monday, 9 December 2013 at 14:44:23 UTC, Fra wrote:
 various (UTF) symbols seems to be ignored by inPattern, see 
 http://dpaste.dzfl.pl/e8ff9002 for a quick example (munch() 
 uses inPattern() internally)

 Is it me doing something in an improper way, or is the 
 documentation lacking more specific limitation of the function? 
 All I can read is "In the future, the pattern syntax may be 
 improved to be more like regular expression character classes". 
 This doesn't sound like "non-ascii symbols are not supported"
Looking at the implementation of inPattern [0], I'd say it is restricted to ASCII. The unittests only cover ASCII, for example. I also smell a Unicode bug, due to the combination of foreach and length.

[0] https://github.com/D-Programming-Language/phobos/blob/master/std/string.d#L2595
Dec 09 2013
"qznc" <qznc web.de> writes:
On Monday, 9 December 2013 at 15:58:53 UTC, qznc wrote:
 I also smell a unicode bug, due to the combination of foreach 
 and length.
Bug reported. :)
https://d.puremagic.com/issues/show_bug.cgi?id=11712
That is probably not the root of Fra's problem, though.
Dec 09 2013
"Fra" <Fra b.it> writes:
On Monday, 9 December 2013 at 16:10:34 UTC, qznc wrote:
 That is probably not the root of Fras problem, though.
You are right, that was not the root, even though the mistake is extremely simple: foreach (c; s) is used to walk the string. I just realized that foreach can mess things up when used on strings. I can't shake the feeling that this is a pitfall of the language: the code

    foreach (immutable dchar c; s) writeln("token: ", c);

produces very different results than

    foreach (c; s) writeln("token: ", c);

See http://dpaste.dzfl.pl/302291fd

I understand why foreach would produce such a result, but I guess newcomers will get burnt by this. I will open a bug report for the munch function in the meantime.
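To make the difference concrete, here is a minimal sketch (not the Phobos code; the helper names countCodeUnits and countCodePoints are made up for this illustration). With no explicit type, foreach over a string infers the element type as char and steps through UTF-8 code units; asking for dchar makes foreach decode whole code points.

```d
import std.stdio : writeln;

// Hypothetical helper: counts foreach iterations with the inferred element type.
size_t countCodeUnits(string s)
{
    size_t n = 0;
    foreach (c; s)        // c is inferred as immutable(char): one UTF-8 code unit per step
        ++n;
    return n;
}

// Hypothetical helper: counts foreach iterations when dchar is requested.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)  // requesting dchar makes foreach decode UTF-8 on the fly
        ++n;
    return n;
}

void main()
{
    // "äöü" written with escapes so the example does not depend on file encoding.
    string s = "\u00E4\u00F6\u00FC";
    writeln(countCodeUnits(s));   // 6: each of the three letters takes two UTF-8 bytes
    writeln(countCodePoints(s));  // 3
}
```

Any code that compares a single char from the first loop against a non-ASCII character will never match, which is consistent with the symptom described in this thread.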
Dec 09 2013
"Fra" <Fra b.it> writes:
On Monday, 9 December 2013 at 16:10:34 UTC, qznc wrote:
 On Monday, 9 December 2013 at 15:58:53 UTC, qznc wrote:
 I also smell a unicode bug, due to the combination of foreach 
 and length.
 Bug reported. :) https://d.puremagic.com/issues/show_bug.cgi?id=11712
 That is probably not the root of Fras problem, though.
Your ticket says: "The following assert fails, but should not. assert(!inPattern('a', "äöüa-z"));"
Actually, 'a' IS in the given pattern, so inPattern should return true; the negation then makes the assert fail. Long story short, the assert should fail, and it does. So your bug report is actually incorrect.
Dec 09 2013