digitalmars.D.learn - Reading UTF32 files
- Tim Locke (3/3) Aug 03 2006 How do I read an UTF32 file? Stream only seems to support UTF8 with
- Hasan Aljudy (28/32) Aug 03 2006 I use mango to convert any file into UTF32
- kris (13/56) Aug 03 2006 It's perhaps easier to use UnicodeFile instead:
- Hasan Aljudy (13/30) Aug 03 2006 Ah nice! I didn't know about that.
- kris (10/41) Aug 03 2006 No, but there should be :)
- Markus Dangl (6/8) Aug 10 2006 Just a note:
- Oskar Linde (18/28) Aug 10 2006 In this case, due to a bug or an unfortunate side effect,
- Derek Parnell (9/13) Aug 03 2006 Read them in 4-byte chunks and, depending on endian-ness, convert to a
- Tim Locke (3/10) Aug 04 2006 Thanks. I will try that.
How do I read a UTF32 file? Stream only seems to support UTF8 with readLine and UTF16 with readLineW. Thanks.
Aug 03 2006
Tim Locke wrote:
> How do I read a UTF32 file? Stream only seems to support UTF8 with readLine and UTF16 with readLineW. Thanks.

I use mango to convert any file into UTF32. I haven't actually tested it very much, but I think it should work:

---------
static import std.file;
import mango = mango.convert.UnicodeBom;

version( build ) //TEMP until build learns renamed import syntax
{
    pragma( include, mango.convert.UnicodeBom );
}

dchar[] readFile( char[] fileName )
{
    if( std.file.exists( fileName ) )
        return toUtf32( std.file.read( fileName ) );
    else
        throw new Exception("File: " ~ fileName ~ " doesn't exist");
}

private
{
    alias mango.UnicodeBomTemplate!(dchar) Utf32Decoder;

    ///read BOM and decode/convert to utf-32
    dchar[] toUtf32(void[] content)
    {
        auto decoder = new Utf32Decoder(mango.Unicode.Unknown);
        return decoder.decode(content);
    }
}
---------
Aug 03 2006
It's perhaps easier to use UnicodeFile instead:

Please note that Mango leverages a different IO model than Phobos, so you'll have to compile this along with a few other mango.io modules.

Mango typically requires the use of Build to pull in relevant modules, because the combination of D, libraries, and templates just doesn't work reliably at this time. If the compiler front-end were to handle recursive imports natively (like a very simple Build), it would be great! The changes to do so (for DMD) are minimal ;)

Hasan Aljudy wrote:
> Tim Locke wrote:
>> How do I read a UTF32 file? Stream only seems to support UTF8 with readLine and UTF16 with readLineW. Thanks.
> I use mango to convert any file into UTF32. I haven't actually tested it very much, but I think it should work:
> [snip]
Aug 03 2006
kris wrote:
> It's perhaps easier to use UnicodeFile instead:

Ah nice! I didn't know about that. I wish someone had told me about it earlier.

Are there any tutorials for mango that explain where everything is? I don't mean the documentation. I mean something that tells you, for example: "if you want to read/decode files, see the documentation for mango.io.UnicodeFile".

> Please note that Mango leverages a different IO model than Phobos, so you'll have to compile this along with a few other mango.io modules.

I use build, so I don't really care.

> Mango typically requires the use of Build to pull in relevant modules, because the combination of D, libraries, and templates just doesn't work reliably at this time. If the compiler front-end were to handle recursive imports natively (like a very simple Build), it would be great! The changes to do so (for DMD) are minimal ;)

Yes, that would be great. Just let dmd recursively compile all imported modules; because dmd is so fast, it doesn't matter even if it recompiles modules that have already been compiled. I always use the -full -clean switches on build anyway.
Aug 03 2006
Hasan Aljudy wrote:
> kris wrote:
>> It's perhaps easier to use UnicodeFile instead:
> Ah nice! I didn't know about that. I wish someone had told me about it earlier.
> Are there any tutorials for mango that explain where everything is? I don't mean the documentation. I mean something that tells you: "if you want to read/decode files, see the documentation for mango.io.UnicodeFile" for example...

No, but there should be :)

BTW: that should probably read "auto content = file.read();" with parens, since otherwise the 'auto' will try to take the function reference.

>> Mango typically requires the use of Build to pull in relevant modules, because the combination of D, libraries, and templates just doesn't work reliably at this time. If the compiler front-end were to handle recursive imports natively (like a very simple Build), it would be great! The changes to do so (for DMD) are minimal ;)
> Yes, that would be great. Just let dmd recursively compile all imported modules; because dmd is so fast, it doesn't matter even if it recompiles modules that have already been compiled. I always use the -full -clean switches on build anyway.

Me too. Note that DMD *already* pulls in all imported modules during a compilation, and runs one or two stages on each of them. It just doesn't propagate those modules through the later stages of compilation and linking, choosing to discard them instead. A flag to include them in the compilation and linking stages would be just awesome.
Aug 03 2006
kris wrote:
> BTW: that should probably read "auto content = file.read();" with parens, since otherwise the 'auto' will try to take the function reference.

Just a note: I think all methods that don't take parameters can be called without parens, just like you normally use properties, but it's a bit clearer to use parens here (because "read" should actually be used as a method). To take the reference you'd have to use something like "auto pointer = &file.read" ...
Aug 10 2006
Markus Dangl wrote:
>> BTW: that should probably read "auto content = file.read();" with parens, since otherwise the 'auto' will try to take the function reference.
> Just a note: I think all methods that don't take parameters can be called without parens, just like you normally use properties, but it's a bit clearer to use parens here (because "read" should actually be used as a method). To take the reference you'd have to use something like "auto pointer = &file.read" ...

In this case, due to a bug or an unfortunate side effect,

    auto content = file.read;

will neither call file.read() nor make content a reference to the function. It will try to make content a function type (as opposed to a reference to a function), which will fail to compile.

There is also still at least one case where an empty pair of parentheses is needed at a function call: array extension methods.

    void func(int[] t) {}

cannot be called as:

    arr.func;

Though I'm not sure there is any fundamental reason it has to be that way.

All function (reference) and delegate types will also require the parentheses, which is more or less necessary to avoid ambiguities:

    int delegate() func() { return { return 1; }; }
    ...
    func;

/Oskar
Aug 10 2006
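The call forms discussed in this sub-thread can be put side by side. What follows is only a minimal sketch, assuming a current D compiler rather than the 2006 one; the rules for omitting parentheses have changed between compiler versions, so every call below writes them out. Source, total, and the body of the delegate are hypothetical stand-ins, not Mango or Phobos names.

```d
struct Source
{
    // Hypothetical stand-in for a file object with a parameterless read().
    int[] read() { return [1, 2, 3]; }
}

// Hypothetical free function taking an array as its first parameter,
// so it can be called with method syntax on an array.
int total(int[] t)
{
    int sum = 0;
    foreach (x; t)
        sum += x;
    return sum;
}

// Returns a delegate, mirroring the func() example in the post.
int delegate() func()
{
    return delegate int() { return 1; };
}

void main()
{
    Source file;

    auto content = file.read();   // explicit call: content is an int[]
    static assert(is(typeof(content) == int[]));
    assert(content == [1, 2, 3]);

    auto pointer = &file.read;    // address-of: a delegate bound to `file`
    assert(pointer() == content);

    int[] arr = [1, 2, 3];
    assert(arr.total() == 6);     // array "extension method" with parens

    auto dg = func();             // first pair of parens calls func
    assert(dg() == 1);            // second pair invokes the returned delegate
    assert(func()() == 1);
}
```

Writing both pairs of parentheses in func()() keeps it unambiguous whether the function or the delegate it returns is being called.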
On Fri, 04 Aug 2006 00:38:18 -0300, Tim Locke wrote:
> How do I read a UTF32 file? Stream only seems to support UTF8 with readLine and UTF16 with readLineW. Thanks.

Read them in 4-byte chunks and, depending on endian-ness, convert to a ulong then cast to a dchar then append to a dchar[] ... simple!

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocrity!"
4/08/2006 2:15:37 PM
Aug 03 2006
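The chunked approach described above might look roughly like this. It is a sketch under assumptions, not a tested library routine: it targets a current D compiler with Phobos (std.bitmanip, std.utf), and decodeUtf32 and defaultBigEndian are made-up names. It uses a 32-bit uint rather than a ulong, since ulong is 64 bits in D, and it also honours a leading byte order mark, which the post does not mention.

```d
import std.bitmanip : bigEndianToNative, littleEndianToNative;
import std.exception : enforce;
import std.utf : isValidDchar;

/// Decodes raw UTF-32 bytes into a dchar[] (hypothetical helper).
/// A leading BOM selects the byte order; without one, defaultBigEndian decides.
dchar[] decodeUtf32(const(ubyte)[] bytes, bool defaultBigEndian = false)
{
    static immutable ubyte[4] bomBE = [0x00, 0x00, 0xFE, 0xFF];
    static immutable ubyte[4] bomLE = [0xFF, 0xFE, 0x00, 0x00];

    enforce(bytes.length % 4 == 0, "UTF-32 data must be a multiple of 4 bytes");

    bool bigEndian = defaultBigEndian;
    if (bytes.length >= 4)
    {
        if (bytes[0 .. 4] == bomBE[])
        {
            bigEndian = true;
            bytes = bytes[4 .. $];   // skip the BOM
        }
        else if (bytes[0 .. 4] == bomLE[])
        {
            bigEndian = false;
            bytes = bytes[4 .. $];   // skip the BOM
        }
    }

    auto result = new dchar[](bytes.length / 4);
    foreach (i, ref c; result)
    {
        // Copy one 4-byte chunk and reassemble it in the machine's byte order.
        ubyte[4] chunk = bytes[i * 4 .. i * 4 + 4];
        immutable uint value = bigEndian
            ? bigEndianToNative!uint(chunk)
            : littleEndianToNative!uint(chunk);
        enforce(isValidDchar(cast(dchar) value), "invalid Unicode code point");
        c = cast(dchar) value;
    }
    return result;
}

void main()
{
    // "Hi" encoded as little-endian UTF-32 with a BOM.
    immutable ubyte[] le = [0xFF, 0xFE, 0x00, 0x00,
                            0x48, 0x00, 0x00, 0x00,
                            0x69, 0x00, 0x00, 0x00];
    assert(decodeUtf32(le) == "Hi"d);

    // The same text as big-endian UTF-32 without a BOM.
    immutable ubyte[] be = [0x00, 0x00, 0x00, 0x48,
                            0x00, 0x00, 0x00, 0x69];
    assert(decodeUtf32(be, true) == "Hi"d);
}
```

To decode a file, the bytes returned by std.file.read can be passed in, for example decodeUtf32(cast(const(ubyte)[]) std.file.read("input.txt")), where the file name is only a placeholder. Allocating the result once from the byte count avoids repeated appends to the dchar[].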
On Fri, 4 Aug 2006 14:16:49 +1000, Derek Parnell <derek nomail.afraid.org> wrote:
> On Fri, 04 Aug 2006 00:38:18 -0300, Tim Locke wrote:
>> How do I read a UTF32 file? Stream only seems to support UTF8 with readLine and UTF16 with readLineW. Thanks.
> Read them in 4-byte chunks and, depending on endian-ness, convert to a ulong then cast to a dchar then append to a dchar[] ... simple!

Thanks. I will try that.
Aug 04 2006