
digitalmars.D.bugs - [Issue 7085] New: std.algorithm.reverse() problem with Unicode dchar[]

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7085

           Summary: std.algorithm.reverse() problem with Unicode dchar[]
           Product: D
           Version: D2
          Platform: x86
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: bearophile_hugs eml.cc



This code compiles and runs without raising an assert error, so reverse() is
giving a wrong result on a dchar[]:


import std.algorithm: reverse;
void main() {
    dchar[] txt = "\U00000041\U00000308\U00000042"d.dup;
    txt.reverse();
    assert(txt == "\U00000042\U00000308\U00000041"d);
}


txt contains LATIN CAPITAL LETTER A, COMBINING DIAERESIS, LATIN CAPITAL LETTER
B. See bug 7084 for more details.

A more correct output for reversing txt is (LATIN CAPITAL LETTER B, LATIN
CAPITAL LETTER A, COMBINING DIAERESIS):

"\U00000042\U00000041\U00000308"d

or even (LATIN CAPITAL LETTER B, LATIN CAPITAL LETTER A WITH DIAERESIS) (but
this changes the array size and it's not necessary):

"\U00000042\U000000C4"d
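
The relation between the two suggested outputs is Unicode normalization: the
second is the composed (NFC) form of the first. As a small sketch, assuming a
Phobos version whose std.uni provides normalize and the NFC constant (this is an
assumption about the toolchain, not something stated in this report), the
composition step could be checked like this:

```d
import std.uni : normalize, NFC;

void main() {
    // B, A, COMBINING DIAERESIS: three code points
    dstring decomposed = "\U00000042\U00000041\U00000308"d;

    // NFC composes A + COMBINING DIAERESIS into the single
    // code point LATIN CAPITAL LETTER A WITH DIAERESIS
    dstring composed = normalize!NFC(decomposed);

    assert(composed == "\U00000042\U000000C4"d);
    assert(composed.length == 2); // the array size changes, as noted above
}
```

Composition is a separate step from reversal, which is why it changes the array
length while the first suggested output does not.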

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Dec 09 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7085


Jonathan M Davis <jmdavisProg gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |jmdavisProg gmx.com
         Resolution|                            |INVALID



No, this behavior is as designed. You're misunderstanding dchars. A dchar is a
UTF-32 code unit, which is therefore guaranteed to be a code point. When you
reverse a range of dchar - be it a dchar[] or some other data structure - the
code points are reversed. It doesn't take graphemes into account _at all_. If
you want to reverse a string based on graphemes, you need a range of graphemes,
not a range of dchar. Phobos does not currently support ranges of graphemes,
which makes that quite a bit harder to do. Until then, all ranges of characters
are ranges of dchar, and any function which operates on a range is going to
treat them as ranges of dchar, not graphemes. So reverse is going to reverse
code points, even if that's not what the programmer really wanted.
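
To make the code point versus grapheme distinction concrete, here is a minimal
sketch of a grapheme-aware reversal. It assumes a later Phobos release whose
std.uni provides byGrapheme and byCodePoint (not available when this comment was
written). reverseByGrapheme is a hypothetical helper name used only for
illustration, not a Phobos function.

```d
import std.array : array;
import std.range : retro;
import std.uni : byGrapheme, byCodePoint;

// Hypothetical helper (not part of Phobos): returns a new array holding
// the grapheme clusters of txt in reverse order.
dchar[] reverseByGrapheme(const(dchar)[] txt) {
    return txt.byGrapheme   // split into grapheme clusters
              .array        // materialize so the clusters can be walked backwards
              .retro        // iterate clusters from last to first
              .byCodePoint  // flatten the clusters back into code points
              .array;       // collect into a dchar[]
}

void main() {
    // A, COMBINING DIAERESIS, B (the reporter's input)
    dchar[] txt = "\U00000041\U00000308\U00000042"d.dup;

    // The diaeresis stays attached to the A: B, A, COMBINING DIAERESIS
    assert(reverseByGrapheme(txt) == "\U00000042\U00000041\U00000308"d);
}
```

Unlike std.algorithm.reverse, which swaps elements in place, this sketch
allocates a new array. In-place reversal by grapheme is awkward because
clusters can differ in length.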

Dec 09 2011