digitalmars.D.bugs - [Issue 5257] New: std.algorithm.count works incorrectly with UTF8 and UTF16 strings

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5257

           Summary: std.algorithm.count works incorrectly with UTF8 and
                    UTF16 strings
           Product: D
           Version: D2
          Platform: Other
        OS/Version: Mac OS X
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: andrei metalanguage.com



10:54:01 PST ---
import std.stdio;
import std.algorithm;

void main() {
  writeln(count!("true")("日本語")); // Three characters.
}

The code prints 9 but should print 3.
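
The number 9 is consistent with counting UTF-8 code units rather than code points, since each of the three characters takes three bytes in UTF-8. A minimal sketch of the difference, assuming a current Phobos in which walkLength decodes narrow strings; the helper names codeUnitCount and codePointCount are hypothetical and only for illustration:

```d
import std.range : walkLength;
import std.stdio : writeln;

// Hypothetical helpers, not part of Phobos.
size_t codeUnitCount(string s)
{
    return s.length;      // number of UTF-8 code units (bytes)
}

size_t codePointCount(string s)
{
    return s.walkLength;  // walks the string one decoded code point at a time
}

void main()
{
    writeln(codeUnitCount("日本語"));   // prints 9
    writeln(codePointCount("日本語"));  // prints 3
}
```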

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Nov 22 2010
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5257


Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|nobody puremagic.com        |andrei metalanguage.com



10:54:48 PST ---
Submitted on behalf of Rainer Deyke.

Nov 22 2010
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5257


jakobovrum gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakobovrum gmail.com



This is almost entirely off-topic, but I don't think such a tiny change
deserves its own issue... sorry if I should have :(

When this gets fixed, count() will be useful as a generic way to count the
number of code points in a UTF-encoded string. But I don't think the interface
is very pretty for this simple use case.

As a completely non-breaking change, how about changing:
size_t count(alias pred, Range)(Range r) if (isInputRange!(Range))

to:
size_t count(alias pred = "true", Range)(Range r) if (isInputRange!(Range))

So one could simply do count("日本語")?
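
To illustrate how a defaulted alias parameter would behave, here is a rough sketch. countIf is a hypothetical stand-in written for this example, not the Phobos source, and it assumes the range primitives for strings decode code points (as they do in current Phobos):

```d
import std.functional : unaryFun;
import std.range;
import std.stdio : writeln;

// Hypothetical stand-in for the proposed signature; not the Phobos source.
size_t countIf(alias pred = "true", Range)(Range r)
    if (isInputRange!Range)
{
    size_t n = 0;
    // Use the range primitives rather than foreach: for narrow strings,
    // front and popFront work on one decoded code point at a time,
    // whereas foreach over an array would visit individual code units.
    for (; !r.empty; r.popFront())
    {
        if (unaryFun!pred(r.front))
            ++n;
    }
    return n;
}

void main()
{
    writeln(countIf("日本語"));             // default predicate: prints 3
    writeln(countIf!"a == 'l'"("hello"));  // explicit predicate: prints 2
}
```

Existing callers that pass a predicate explicitly keep compiling unchanged, which is why the change would be non-breaking.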

Nov 22 2010
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5257


Masahiro Nakagawa <repeatedly gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|andrei metalanguage.com     |repeatedly gmail.com



07:18:51 PST ---
Created an attachment (id=831)
Patch for this issue.

I wrote a simple patch. This patch decodes each char type to dchar and passes
it to the predicate.
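
The attachment itself is not reproduced in this thread, so the following is only a guess at the general approach, not the actual patch. It assumes std.utf.decode, which reads one code point at a given index and advances that index; countDecoded is a hypothetical name:

```d
import std.functional : unaryFun;
import std.stdio : writeln;
import std.traits : isSomeString;
import std.utf : decode;

// Hypothetical helper; a sketch of "decode, then apply the predicate".
size_t countDecoded(alias pred = "true", S)(S s)
    if (isSomeString!S)
{
    size_t n = 0;
    size_t i = 0;
    while (i < s.length)
    {
        // decode reads one code point starting at index i and advances i
        // past all of that code point's code units.
        immutable dchar c = decode(s, i);
        if (unaryFun!pred(c))
            ++n;
    }
    return n;
}

void main()
{
    writeln(countDecoded("日本語"));   // UTF-8 input: prints 3
    writeln(countDecoded("日本語"w));  // UTF-16 input: prints 3
}
```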

Nov 24 2010
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5257


Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED



14:51:45 PST ---
Thanks, Masahiro. I fixed it with simpler means that don't need special casing.

Nov 25 2010
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5257




21:48:06 PST ---

> Thanks, Masahiro. I fixed it with simpler means that don't need special casing.

Good!
Are you going to deprecate std.utf.count?
std.algorithm.count (now that the default pred is "true") and std.utf.count
seem to be duplicates.
Nov 25 2010