digitalmars.D.bugs - stream.readLine
- bobef (1/1) Jan 23 2007 The implementation of stream.readLine() treats char.init as EOF, which ...
- Frits van Bommel (5/6) Jan 23 2007 No, char.init is 255 which is an invalid byte in UTF-8 data.
- bobef (2/2) Jan 23 2007 Then it is impossible to use the readLine() function to read non-utf8 st...
- Frits van Bommel (19/21) Jan 23 2007 InputStream.readLine (which I presume is the one you mean) returns an
The implementation of stream.readLine() treats char.init as EOF, which is not right because char.init is 255 (which is ÿ in Cyrillic). I believe EOF should be 0.
Jan 23 2007
bobef wrote:
> The implementation of stream.readLine() treats char.init as EOF, which is not right because char.init is 255 (which is ÿ in Cyrillic). I believe EOF should be 0.

No, char.init is 255, which is an invalid byte in UTF-8 data. Codepoint 255 *is* ÿ, IIRC, but char doesn't store codepoints. It stores UTF-8 bytes (code units?). Forgive me if I got the terminology wrong.
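
To make the code unit vs. codepoint distinction concrete, here is a minimal sketch in D. The helper name isValidUtf8 is hypothetical (it is not part of Phobos); it only wraps std.utf.validate. The sketch assumes a D2 compiler, which is newer than the one this thread was written against.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper: returns true if `data` is well-formed UTF-8.
bool isValidUtf8(const(char)[] data)
{
    try
    {
        validate(data);   // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

unittest
{
    // char.init is the code unit 0xFF, not the codepoint U+00FF.
    static assert(char.init == 0xFF);

    // A lone 0xFF byte can never appear in well-formed UTF-8.
    char[] lone = [char.init];
    assert(!isValidUtf8(lone));

    // The codepoint U+00FF (ÿ) is stored as the two code units 0xC3 0xBF.
    string y = "\u00FF";
    assert(y.length == 2);
    assert(cast(ubyte) y[0] == 0xC3 && cast(ubyte) y[1] == 0xBF);
    assert(isValidUtf8(y));
}
```

Compiling with `dmd -unittest -main` should run the checks. A raw 0xFF byte read from a non-UTF-8 file only looks like ÿ (or any other letter) once it is interpreted through that file's own encoding.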
Jan 23 2007
Then it is impossible to use the readLine() function to read non-UTF-8 streams? If so, this sucks ass, because I have to read the stream in order to convert it to UTF-8; obviously I can't force every stream out there to be UTF-8 just because D likes it :)
Jan 23 2007
bobef wrote:
> Then it is impossible to use the readLine() function to read non-utf8 streams?

InputStream.readLine (which I presume is the one you mean) returns a UTF-8 string. It doesn't mention in what format the data is read. If someone wants to implement it to read a non-UTF string from somewhere, convert it to UTF-8, and return it, that's a perfectly valid implementation.

> If it is so this sucks ass, because I have to read the stream to convert it to utf8, because obviously I can't force any stream out there to be utf8 just because D likes it :)

A conversion stream may not be so hard to implement. Just create an object implementing InputStream and pass another InputStream to its constructor. Or you can even inherit directly from std.stream.File, forward the constructors, and only override the readLine* functions.

Then, if you're reading a file in some "ASCII + extended codepage" format, you just need a lookup table (or conversion function) to convert the upper 128 byte values to the corresponding Unicode codepoints, and then use std.utf.encode. For Latin-1 data it's even simpler: just pass each byte straight to std.utf.encode. You'll probably want to use the read(inout ubyte) method to read such a file.

The process for other text formats is probably similar, perhaps using other read() overloads (for multi-byte encodings).

(Warning: I've never actually implemented a Stream, so the above may well be riddled with errors and misinformation :) )
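
The lookup-table idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: it works on a ubyte slice that has already been read, and deliberately leaves out the InputStream wrapper. The names latin1ToUtf8, codepageToUtf8, and partialCp1251Table are hypothetical, and the table fills in only the 0xC0..0xFF range of Windows-1251 (the basic Cyrillic letters), with every other upper byte mapped to U+FFFD as a placeholder. Only std.utf.encode comes from Phobos.

```d
import std.utf : encode;

/// Latin-1: every byte value equals its Unicode codepoint,
/// so each byte can go straight to std.utf.encode.
string latin1ToUtf8(const(ubyte)[] raw)
{
    char[] result;
    foreach (b; raw)
        encode(result, cast(dchar) b);   // appends 1 or 2 code units
    return result.idup;
}

/// Single-byte codepage: bytes below 0x80 are ASCII, the rest are
/// looked up in a 128-entry table of codepoints.
string codepageToUtf8(const(ubyte)[] raw, const dchar[128] upper)
{
    char[] result;
    foreach (b; raw)
    {
        dchar c = b < 0x80 ? cast(dchar) b : upper[b - 0x80];
        encode(result, c);
    }
    return result.idup;
}

/// Illustrative, incomplete table: only 0xC0..0xFF is filled in
/// (Windows-1251 maps that range to U+0410..U+044F).
/// All other entries are the U+FFFD placeholder, NOT the real mapping.
dchar[128] partialCp1251Table()
{
    dchar[128] table = '\uFFFD';
    foreach (i; 0xC0 .. 0x100)
        table[i - 0x80] = cast(dchar)(0x0410 + (i - 0xC0));
    return table;
}
```

With this table, the byte 0xFF (the value readLine treats as EOF) comes out as the codepoint U+044F instead of ending the line. A real converter would need the full table for its codepage, and a stream subclass would apply the same per-byte step inside its readLine override.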
Jan 23 2007