digitalmars.D.bugs - [Issue 14919] New: utf error
- via Digitalmars-d-bugs (28/28) Aug 13 2015 https://issues.dlang.org/show_bug.cgi?id=14919
https://issues.dlang.org/show_bug.cgi?id=14919

Issue ID: 14919
Summary: utf error
Product: D
Version: D2
Hardware: x86_64
OS: Linux
Status: NEW
Severity: enhancement
Priority: P1
Component: dmd
Assignee: nobody puremagic.com
Reporter: code dawg.eu

Related/Alternative to issue 14519 (see https://issues.dlang.org/show_bug.cgi?id=14519#c24).

When I `readText` a file, a lot of time is already spent on UTF validation. But we don't take advantage of that, and we revalidate UTF in almost every algorithm.

The idea from issue 14519 to replace invalid chars with a replacement character makes the validation a little cheaper (because of the cost of dmd's EH, see issue 12442), but it still incurs a high overhead.

I suggest that we make a clean distinction between unvalidated ubyte[] data and validated strings, and treat all char/wchar/dchar[] strings as valid. The compiler already checks string literals, and a few of the string-reading functions do so as well. Unfortunately, byLine and readln currently don't validate UTF.

This could be a much more performant approach to correct UTF handling.
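To illustrate the proposed split, here is a minimal sketch, assuming validation happens exactly once at the boundary where raw bytes become a string. The helpers `validatedText` and `countLines` are hypothetical names invented for this example and are not part of Phobos; `std.utf.validate` and `std.utf.byCodeUnit` are existing Phobos functions. Nothing here is what the issue itself specifies, it only shows the general shape of the idea.

```d
import std.algorithm.searching : count;
import std.utf : UTFException, byCodeUnit, validate;

/// Hypothetical boundary function (not in Phobos): the only place
/// where unvalidated ubyte[] data is turned into a string.
string validatedText(immutable(ubyte)[] raw)
{
    auto text = cast(string) raw; // reinterpret the bytes, no copy
    validate(text);               // throws UTFException on invalid UTF-8
    return text;
}

/// Hypothetical algorithm that relies on the "strings are valid"
/// invariant: it walks code units and never decodes or revalidates.
size_t countLines(string text)
{
    return text.byCodeUnit.count('\n');
}
```

With this split, data that fails validation never becomes a `string`, so downstream code such as `countLines` would not need to pay for decoding or exception handling again.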