digitalmars.D.bugs - [Issue 18844] New: std.utf.decode skips valid character on invalid multibyte sequence
- d-bugmail puremagic.com (21/21) May 09 2018 https://issues.dlang.org/show_bug.cgi?id=18844
https://issues.dlang.org/show_bug.cgi?id=18844

          Issue ID: 18844
           Summary: std.utf.decode skips valid character on invalid
                    multibyte sequence
           Product: D
           Version: D2
          Hardware: x86_64
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P1
         Component: phobos
          Assignee: nobody puremagic.com
          Reporter: default_357-line yahoo.de

When decoding an invalid UTF-8 string, like cast(string) [cast(ubyte) 'ä', 't'], with Yes.useReplacementDchar, std.utf.decode will advance the cursor past the letter where the multibyte sequence hit an error, even if that letter is in itself a valid start of a new byte sequence.

As a result, decode will advance the index to 2, leading the string to decode as "�" when it should decode as "�t".
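A minimal reproduction sketch of the case described above, not taken from the report itself. It assumes that cast(ubyte) 'ä' yields the byte 0xE4, a UTF-8 lead byte that must be followed by continuation bytes, and spells the input out as raw byte values. The helper name decodeFirst is hypothetical. The index printed depends on the Phobos version: the report says it ends up at 2, while the expected value is 1.

```d
import std.stdio : writefln;
import std.typecons : Yes;
import std.utf : decode, replacementDchar;

/// Hypothetical helper: decodes the first code point of `s` with
/// Yes.useReplacementDchar and reports where decoding would resume.
dchar decodeFirst(string s, out size_t next)
{
    size_t index = 0;
    immutable dchar c = decode!(Yes.useReplacementDchar)(s, index);
    next = index;
    return c;
}

void main()
{
    // Assumption: 0xE4 is the byte produced by cast(ubyte) 'ä'.
    // 0x74 is ASCII 't', which is not a continuation byte, so the
    // sequence starting at 0xE4 is invalid.
    immutable(ubyte)[] bytes = [0xE4, 0x74];
    string s = cast(string) bytes;

    size_t next;
    immutable dchar c = decodeFirst(s, next);

    writefln("replacement returned: %s", c == replacementDchar);
    // Per the report, affected versions leave the index at 2, so a
    // decoding loop never sees the 't'.
    writefln("index after decode: %s", next);
}
```

If the index is 1 after the call, a loop that keeps calling decode would go on to yield 't' as the second character, which is the behaviour the report asks for.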
May 09 2018