digitalmars.D - Reading dchar from UTF-8 stdin
- Ali Çehreli (28/28) Mar 15 2011 Given that the input stream is UTF-8, it is understandable that the
- spir (50/77) Mar 16 2011 Well, when I try to run that bit of code, I get an error in std.format.
- Ali Çehreli (8/28) Mar 16 2011 I use dmd 2.052 on an Ubuntu 10.10 console and compiles fine for me. I
Given that the input stream is UTF-8, it is understandable that the following program pulls just one code unit from the standard input (I think the console encoding is UTF-8 on my Ubuntu 10.10):

    import std.stdio;

    void main()
    {
        char code;
        readf(" %s", &code);
        writeln(code);    // <-- may write an incomplete character
    }

ö is represented by two bytes in the UTF-8 encoding. When ö is fed to the input of the program, writeln expression does not produce a complete character on the output. That's understandable with char.

Would you expect all of the bytes to be consumed when a dchar was used instead?

    import std.stdio;

    void main()
    {
        dchar code;       // <-- now a dchar
        readf(" %s", &code);
        writeln(code);    // <-- BUG: uses a code unit as a code point!
    }

When the input is ö, now the output becomes Ã. What would you expect to happen?

Ali

P.S. As what is written is not the same as what is read above, I am reminded of another issue: would you expect the strings "false" and "true" to be accepted as correct inputs when readf'ed to bool variables?
Mar 15 2011
On 03/15/2011 11:33 PM, Ali Çehreli wrote:
> Given that the input stream is UTF-8, it is understandable that the following program pulls just one code unit from the standard input (I think the console encoding is UTF-8 on my Ubuntu 10.10):
>
>     import std.stdio;
>
>     void main()
>     {
>         char code;
>         readf(" %s", &code);
>         writeln(code);    // <-- may write an incomplete character
>     }
>
> ö is represented by two bytes in the UTF-8 encoding. When ö is fed to the input of the program, writeln expression does not produce a complete character on the output. That's understandable with char.
>
> Would you expect all of the bytes to be consumed when a dchar was used instead?
>
>     import std.stdio;
>
>     void main()
>     {
>         dchar code;       // <-- now a dchar
>         readf(" %s", &code);
>         writeln(code);    // <-- BUG: uses a code unit as a code point!
>     }

Well, when I try to run that bit of code, I get an error in std.format.formattedRead (line near the end, marked with "***" below).

    void formattedRead(R, Char, S...)(ref R r, const(Char)[] fmt, S args)
    {
        auto spec = FormatSpec!Char(fmt);
        static if (!S.length)
        {
            spec.readUpToNextSpec(r);
            enforce(spec.trailing.empty);
        }
        else
        {
            // The function below accounts for '*' == fields meant to be
            // read and skipped
            void skipUnstoredFields()
            {
                for (;;)
                {
                    spec.readUpToNextSpec(r);
                    if (spec.width != spec.DYNAMIC) break;
                    // must skip this field
                    skipData(r, spec);
                }
            }
            skipUnstoredFields();
            alias typeof(*args[0]) A;
            static if (isTuple!A)
            {
                foreach (i, T; A.Types)
                {
                    //writeln("Parsing ", r, " with format ", fmt);
                    (*args[0])[i] = unformatValue!(T)(r, spec);
                    skipUnstoredFields();
                }
            }
            else
            {
                *args[0] = unformatValue!(A)(r, spec);    // ***
            }
            return formattedRead(r, spec.trailing, args[1 .. $]);
        }
    }

> When the input is ö, now the output becomes Ã. What would you expect to happen?

I would expect a whole code representing 'ö'.

> Ali
>
> P.S. As what is written is not the same as what is read above, I am reminded of another issue: would you expect the strings "false" and "true" to be accepted as correct inputs when readf'ed to bool variables?

Yep!

Denis
--
_________________
vita es estrany
spir.wikidot.com
Mar 16 2011
On 03/16/2011 02:52 AM, spir wrote:
> On 03/15/2011 11:33 PM, Ali Çehreli wrote:
>> Given that the input stream is UTF-8
[...]
>> Would you expect all of the bytes to be consumed when a dchar was used instead?
>>
>>     import std.stdio;
>>
>>     void main()
>>     {
>>         dchar code;       // <-- now a dchar
>>         readf(" %s", &code);
>>         writeln(code);    // <-- BUG: uses a code unit as a code point!
>>     }
>
> Well, when I try to run that bit of code, I get an error in std.format.formattedRead (line near the end, marked with "***" below).
>
>     *args[0] = unformatValue!(A)(r, spec);    // ***

I use dmd 2.052 on an Ubuntu 10.10 console and it compiles fine for me. I know that there have been changes in formatted input and output lately. Perhaps you use an earlier version?

>> When the input is ö, now the output becomes Ã. What would you expect to happen?
>
> I would expect a whole code representing 'ö'.

I agree; just opened a bug report:

http://d.puremagic.com/issues/show_bug.cgi?id=5743

Ali
Mar 16 2011
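For illustration, here is a minimal sketch of why ö turns into Ã when a single UTF-8 code unit is used as a code point. It does not reproduce readf or the internals of std.format; the two helper functions, firstCodeUnitAsCodePoint and firstCodePoint, are hypothetical names introduced only for this example. The assumption is that the reported behaviour amounts to widening the first byte (0xC3) to a dchar, which yields U+00C3, whereas decoding both bytes with std.utf.decode yields U+00F6.

```d
import std.stdio : writeln;
import std.utf : decode;

// Hypothetical helper: widens the first UTF-8 code unit to a dchar
// without decoding, mimicking the behaviour described in the thread.
dchar firstCodeUnitAsCodePoint(string s)
{
    return s[0];
}

// Hypothetical helper: decodes the first complete code point,
// consuming as many code units as the UTF-8 sequence requires.
dchar firstCodePoint(string s)
{
    size_t index = 0;
    return decode(s, index);
}

void main()
{
    string input = "\u00F6";   // 'ö', stored as two UTF-8 code units

    // The two code units of 'ö' in UTF-8.
    assert(input.length == 2);
    assert(input[0] == 0xC3 && input[1] == 0xB6);

    // Using the first code unit as a code point gives U+00C3, 'Ã'.
    assert(firstCodeUnitAsCodePoint(input) == '\u00C3');

    // Decoding the whole sequence gives U+00F6, 'ö'.
    assert(firstCodePoint(input) == '\u00F6');

    writeln(firstCodeUnitAsCodePoint(input));
    writeln(firstCodePoint(input));
}
```

If this reading is right, a dchar target needs the reader to pull one complete UTF-8 sequence from the input rather than one code unit, which matches the expectation voiced in the replies.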