digitalmars.D.bugs - [Issue 11017] New: std.string/uni.toLower is very slow
http://d.puremagic.com/issues/show_bug.cgi?id=11017

           Summary: std.string/uni.toLower is very slow
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: peter.alexander.au gmail.com

--- 10:52:33 PDT ---

    char[] s = new char[10_000_000];
    s[] = 'A';
    auto s2 = s.toLower;

This takes 4.3 seconds on my machine.

    char[] s = new char[10_000_000];
    s[] = 'A';
    auto s2 = s.map!toLower.to!string;

This only takes 1.1 seconds.

Looking at the code for std.uni.toLower, it appears the string is constructed using repeated ~=. It should use an Appender of some sort.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Sep 12 2013
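A minimal sketch of the Appender approach suggested in the report, combined with the optimistic reserve(original.length) idea raised in the reply that follows. The function name toLowerAppender is hypothetical and this is not the actual Phobos implementation; it assumes the simple one-to-one mapping of the dchar overload of std.uni.toLower is acceptable, so it shares the sharp-S caveat discussed in the thread.

```d
import std.array : appender;
import std.uni : toLower;

/// Hypothetical helper (not the Phobos implementation): lowercases `s`
/// into one pre-sized Appender instead of growing the result with `~=`.
/// Uses the simple one-to-one mapping of std.uni.toLower(dchar).
string toLowerAppender(const(char)[] s)
{
    auto app = appender!string();
    // Optimistic guess: the lowered text is usually as long as the input.
    app.reserve(s.length);
    foreach (dchar c; s)         // decode UTF-8 into code points
        app.put(toLower(c));     // re-encode each lowered code point as UTF-8
    return app.data;
}

unittest
{
    char[] s = new char[1_000_000];
    s[] = 'A';
    auto s2 = toLowerAppender(s);
    assert(s2.length == s.length && s2[0] == 'a' && s2[$ - 1] == 'a');
    assert(toLowerAppender("ÀÉÎ") == "àéî");
}
```

Reserving up front means the common case (output as long as the input) needs a single allocation; the Appender still grows on its own when a lowered code point has a longer UTF-8 encoding than the original.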
Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh gmail.com

--- 11:59:08 PDT ---
> char[] s = new char[10_000_000];
> s[] = 'A';
> auto s2 = s.toLower;
>
> This takes 4.3 seconds on my machine.
>
> char[] s = new char[10_000_000];
> s[] = 'A';
> auto s2 = s.map!toLower.to!string;
>
> This only takes 1.1 seconds.

There are 2 things to consider here. First, the second version is not correct in general: 1 code point can map to many (e.g. the German sharp S).

> Looking at the code for std.uni.toLower, it appears the string is constructed using repeated ~=. It should use an Appender of some sort.

This could indeed be fixed. I suspect that an optimistic reserve(original.length) there would work even better.

See also issue 10864: http://d.puremagic.com/issues/show_bug.cgi?id=10864
Sep 12 2013
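To illustrate the correctness point above: a per-code-point pipeline like s.map!toLower.to!string calls the dchar overload, which always returns exactly one code point. The sketch uses toUpper rather than toLower because U+00DF (ß) has no single-code-point uppercase; its full uppercase mapping in Unicode's SpecialCasing data is "SS". The helper name perCodePointUpper is hypothetical.

```d
import std.algorithm : map;
import std.conv : to;
import std.uni : toUpper;

/// Hypothetical helper mirroring the `s.map!toLower.to!string` idiom:
/// each code point goes through the dchar -> dchar overload, so one
/// input code point can never become two output code points.
string perCodePointUpper(string s)
{
    return s.map!(c => toUpper(c)).to!string;
}

unittest
{
    assert(toUpper('ß') == 'ß');                      // no simple 1:1 uppercase exists
    assert(perCodePointUpper("straße") == "STRAßE");  // full mapping would give "STRASSE"
}
```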
--- 12:45:45 PDT ---
> First, the second version is not correct in general: 1 code point can map to many (e.g. the German sharp S).

Good point, although std.uni.toUpper doesn't handle it either :-)

    assert("ß".toUpper == "ß"); // passes
Sep 12 2013
--- 12:50:37 PDT ---
> Good point, although std.uni.toUpper doesn't handle it either :-)
>
> assert("ß".toUpper == "ß"); // passes

toLower will do. Sharp S is capital ;)
Sep 12 2013
--- 12:52:31 PDT ---
> toLower will do. Sharp S is capital ;)

    assert("ß".toLower == "ß");
    assert("ß".toUpper == "ß");

Both pass.
Sep 12 2013
--- 14:01:05 PDT ---
> assert("ß".toLower == "ß");
> assert("ß".toUpper == "ß");
>
> Both pass.

Something wicked has happened. I see that I messed up toUpper in the table generator while introducing toTitleCase (which isn't even exposed yet!). toLower is fine; toUpper is apparently broken in half of the cases. How I missed that I have no idea... gotta expand the test coverage around toLower/toUpper.
Sep 12 2013
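A sketch of the kind of regression test that might catch a half-broken toUpper table, assuming current std.uni behavior for simple (one-to-one) mappings. These are illustrative tests, not the ones Phobos actually added. They check invariants that hold in Unicode: std.uni agrees with std.ascii on ASCII, and the Latin-1 capitals U+00C0..U+00DE (except U+00D7 ×) lowercase to the code point 0x20 higher and round-trip back through toUpper.

```d
import std.ascii : asciiLower = toLower, asciiUpper = toUpper;
import std.uni : toLower, toUpper;

// Hypothetical regression tests for the simple case-mapping tables.
unittest
{
    // ASCII: std.uni must agree with std.ascii.
    foreach (i; 0 .. 128)
    {
        immutable dchar c = cast(dchar) i;
        assert(toLower(c) == asciiLower(c));
        assert(toUpper(c) == asciiUpper(c));
    }

    // Latin-1 capitals U+00C0..U+00DE (skipping U+00D7 '×') map to the
    // code point 0x20 higher, and toUpper must map them back.
    foreach (i; 0xC0 .. 0xDF)
    {
        immutable dchar upper = cast(dchar) i;
        if (upper == '\u00D7')
            continue;
        immutable dchar lower = cast(dchar)(i + 0x20);
        assert(toLower(upper) == lower);
        assert(toUpper(lower) == upper);
    }
}
```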
--- 14:07:17 PDT ---
P.S. And there are both kinds of sharp s: \u1E9E (ẞ, capital) and \u00DF (ß, small).
Sep 12 2013
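For reference, a small sketch of how the two sharp S code points behave under std.uni's simple (one-to-one) case mapping, assuming a Phobos release in which the toUpper table problem described above is fixed. The capital U+1E9E lowercases to U+00DF, while U+00DF has no single-code-point uppercase, so the dchar toUpper leaves it unchanged.

```d
import std.uni : isLower, isUpper, toLower, toUpper;

unittest
{
    enum dchar smallSharpS = '\u00DF';   // ß, LATIN SMALL LETTER SHARP S
    enum dchar capitalSharpS = '\u1E9E'; // ẞ, LATIN CAPITAL LETTER SHARP S

    assert(isLower(smallSharpS));
    assert(isUpper(capitalSharpS));

    // Simple mappings: capital -> small works, but small -> capital is
    // not defined one-to-one, so toUpper returns U+00DF unchanged.
    assert(toLower(capitalSharpS) == smallSharpS);
    assert(toUpper(smallSharpS) == smallSharpS);
}
```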