digitalmars.D.bugs - [Issue 19428] New: std.string.indexOf wrong result with bad unicode
- d-bugmail puremagic.com (34/34) Nov 23 2018 https://issues.dlang.org/show_bug.cgi?id=19428
https://issues.dlang.org/show_bug.cgi?id=19428

          Issue ID: 19428
           Summary: std.string.indexOf wrong result with bad unicode
           Product: D
           Version: D2
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P3
         Component: phobos
          Assignee: nobody puremagic.com
          Reporter: dlang-bugzilla thecybershadow.net

//////////////////// test.d ///////////////////
import std.algorithm.comparison;
import std.range;
import std.string;

void main()
{
    assert(indexOf(
        only('\uFFFD', '\uFFFD', '\uFFFD'),
        "\x83\x84\x85",
        CaseSensitive.yes) == -1);
}
///////////////////////////////////////////////

Looks like it's replacing bad Unicode with replacement characters under the hood.

This becomes worse when something causes the same thing to happen to the haystack, as in this unit test:

https://github.com/dlang/phobos/blob/9bfc82130c0e4af4d1dc95bb261570c6e4f6f5d8/std/string.d#L887-L903

Note that this unittest is incorrectly annotated as nothrow/@nogc.

We can't use the kind of decoding that substitutes errors with replacement characters, as that will introduce bugs like these.
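For illustration only, here is a minimal sketch of why -1 is the expected result: at the code-unit level the needle and the haystack share no bytes, and strict validation rejects the needle rather than mapping it to U+FFFD. `hasInvalidUtf` is a hypothetical helper written for this note, not part of Phobos. `std.utf.validate` and `std.string.representation` are the only library calls assumed.

//////////////////// sketch.d /////////////////
```d
import std.string : representation;
import std.utf : validate, UTFException;

/// Hypothetical helper (not in Phobos): true if `s` is not well-formed UTF-8.
bool hasInvalidUtf(const(char)[] s)
{
    try
    {
        validate(s); // throws UTFException on malformed input
        return false;
    }
    catch (UTFException)
    {
        return true;
    }
}

void main()
{
    // The needle from the report: three stray continuation bytes.
    assert(hasInvalidUtf("\x83\x84\x85"));

    // Three real replacement characters are well-formed UTF-8.
    assert(!hasInvalidUtf("\uFFFD\uFFFD\uFFFD"));

    // Compared as raw bytes, the two strings differ (3 bytes vs. 9 bytes),
    // so a search that does not substitute errors should not match.
    assert("\x83\x84\x85".representation != "\uFFFD\uFFFD\uFFFD".representation);
}
```
///////////////////////////////////////////////

This does not say how indexOf should be fixed. It only shows that the two inputs are distinguishable as long as decoding errors are reported or the code units are compared directly.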
Nov 23 2018