digitalmars.D.bugs - [Issue 19428] New: std.string.indexOf wrong result with bad unicode

https://issues.dlang.org/show_bug.cgi?id=19428

          Issue ID: 19428
           Summary: std.string.indexOf wrong result with bad unicode
           Product: D
           Version: D2
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P3
         Component: phobos
          Assignee: nobody puremagic.com
          Reporter: dlang-bugzilla thecybershadow.net

//////////////////// test.d ///////////////////
import std.algorithm.comparison;
import std.range;
import std.string;

void main()
{
    assert(indexOf(
            only('\uFFFD', '\uFFFD', '\uFFFD'),
            "\x83\x84\x85",
            CaseSensitive.yes) == -1);
}
///////////////////////////////////////////////

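It looks like indexOf is replacing invalid UTF-8 with replacement characters
(U+FFFD) under the hood, so the invalid needle above matches a haystack of
literal replacement characters instead of yielding -1.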
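
To illustrate the suspected mechanism, here is a minimal sketch of how a
replacing decoder and a strict check treat the same bytes. It uses
std.utf.byDchar and std.utf.validate only as stand-ins; whether indexOf goes
through either of them internally is an assumption, not something confirmed
here.

```d
import std.exception : assertThrown;
import std.utf : byDchar, replacementDchar, UTFException, validate;

void main()
{
    // Continuation bytes with no lead byte: not valid UTF-8.
    string bad = "\x83\x84\x85";

    // A replacing decoder yields U+FFFD for the invalid input, which
    // makes it indistinguishable from a genuine U+FFFD in other text.
    assert(bad.byDchar.front == replacementDchar);

    // A strict check rejects the same input by throwing instead.
    assertThrown!UTFException(validate(bad));
}
```

Once both sides have been decoded this way, a comparison has no means of
telling an invalid byte apart from a real replacement character.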

This becomes worse when something causes the same thing to happen to the
haystack, as in this unit test:

https://github.com/dlang/phobos/blob/9bfc82130c0e4af4d1dc95bb261570c6e4f6f5d8/std/string.d#L887-L903

Note that this unittest is incorrectly annotated as nothrow/@nogc. We can't use
the kind of decoding that substitutes replacement characters for errors, as
that will introduce bugs like these.
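
For comparison, a search that never decodes cannot confuse invalid bytes with
U+FFFD. The sketch below is only one possible approach and not a proposed fix:
indexOfBytes is a hypothetical helper, not a Phobos function, and it returns
an index in code units rather than code points.

```d
import std.algorithm.searching : countUntil;
import std.string : representation;

/// Hypothetical helper (not part of Phobos): searches on raw UTF-8 code
/// units, so nothing is ever substituted. Returns -1 if needle is absent.
ptrdiff_t indexOfBytes(string haystack, string needle)
{
    return haystack.representation.countUntil(needle.representation);
}

void main()
{
    // Genuine replacement characters do not match the invalid bytes.
    assert(indexOfBytes("\uFFFD\uFFFD\uFFFD", "\x83\x84\x85") == -1);

    // The invalid bytes are still found where they really occur.
    assert(indexOfBytes("ab\x83\x84\x85", "\x83\x84\x85") == 2);
}
```

This only covers the case-sensitive comparison of two strings; a haystack that
is a range of dchar, as in test.d above, would need a different treatment.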

--
Nov 23 2018