
digitalmars.D.bugs - [Issue 7689] New: splitter() on invalid UTF-8 sequences

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7689

           Summary: splitter() on invalid UTF-8 sequences
           Product: D
           Version: D2
          Platform: x86
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: bearophile_hugs eml.cc



Is this inconsistency between split() and splitter() intended and
desirable?


import std.string, std.array, std.algorithm, std.range;
void main() {
    char[] s = cast(char[])[131, 64, 32, 251, 22];
    assert(std.string.split(s).length == 2); // no error
    assert(walkLength(std.array.splitter(s)) == 2); // Invalid UTF-8 sequence
    assert(walkLength(std.algorithm.splitter(s)) == 2); // Invalid UTF-8 sequence
}


Output, DMD 2.059head:

std.utf.UTFException std\utf.d(645): Invalid UTF-8 sequence (at index 1)
----------------
...\dmd2\src\phobos\std\array.d(469): dchar std.array.front!(char[]).front(char[])
...\dmd2\src\phobos\std\algorithm.d(2110): D3std9algorithm47__T8splitterS28...
...\dmd2\src\phobos\std\range.d(971): D3std5range97__...
----------------

Mar 11 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7689


monarchdodra gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |monarchdodra gmail.com
         AssignedTo|nobody puremagic.com        |monarchdodra gmail.com




This is a bug in string.split (which is actually a public import of
array.split). Currently array.split only recognizes ASCII whitespace and is
oblivious to the longer UTF whitespace characters (although it does work on
Unicode input).
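
To illustrate why an ASCII-only split would not report the invalid sequence,
here is a minimal sketch. Note that splitAsciiOnly is a hypothetical helper
written for this example, not the actual Phobos implementation; it assumes
the split simply walks the code units and compares each one against ASCII
whitespace, so nothing is ever decoded and no UTFException can be raised.

```d
import std.ascii : isWhite;

// Hypothetical helper (an assumption for illustration, not Phobos code):
// splits on ASCII whitespace by looking at raw code units only.
char[][] splitAsciiOnly(char[] s)
{
    char[][] result;
    size_t start;
    bool inWord = false;

    // Iterating as `char` visits code units; no UTF-8 decoding happens here.
    foreach (i, char c; s)
    {
        if (isWhite(c))
        {
            if (inWord)
            {
                result ~= s[start .. i];
                inWord = false;
            }
        }
        else if (!inWord)
        {
            start = i;
            inWord = true;
        }
    }
    if (inWord)
        result ~= s[start .. $];
    return result;
}

void main()
{
    // Same input as in the report: not valid UTF-8.
    char[] s = cast(char[])[131, 64, 32, 251, 22];
    auto parts = splitAsciiOnly(s);
    assert(parts.length == 2);      // split at the single space (32)
    assert(parts[0].length == 2);   // code units 131, 64
    assert(parts[1].length == 2);   // code units 251, 22
}
```

splitter(), by contrast, goes through front, which decodes to dchar (see the
stack trace in the report), and that is where the exception comes from.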
Oct 22 2012