digitalmars.D.learn - Using BOM to auto-detect file encoding

"Kai Meyer" <kai unixlords.com> writes:
I would like to know if there exists a 'stream' or 'file' class 
that is able to take a text file with a correct BOM, and an 
'output' utf encoding. I want it to be capable of detecting the 
'input' stream utf encoding by using the BOM, and do the encoding 
for me on the way out in the specified 'output' utf encoding.

Right now I am using std.stream.File (which I know is going the 
way of all the earth soon) and manually parsing the BOM myself to 
then choose whether I call 'readLine' or 'readLineW', and then 
subsequently calling 'toUTF8' after that.

It just seems like something like this would be nice to have in 
phobos if it's not already there.
Apr 09 2013
Jacob Carlborg <doob me.com> writes:
On 2013-04-09 18:25, Kai Meyer wrote:
 I would like to know if there exists a 'stream' or 'file' class that is
 able to take a text file with a correct BOM, and an 'output' utf
 encoding. I want it to be capable of detecting the 'input' stream utf
 encoding by using the BOM, and do the encoding for me on the way out in
 the specified 'output' utf encoding.

 Right now I am using std.stream.File (which I know is going the way of
 all the earth soon) and manually parsing the BOM myself to then choose
 whether I call 'readLine' or 'readLineW', and then subsequently calling
 'toUTF8' after that.

 It just seems like something like this would be nice to have in phobos
 if it's not already there.
There is a module in Tango for this, tango.io.UnicodeFile:

http://dsource.org/projects/tango/docs/current/
https://github.com/SiegeLord/Tango-D2

-- 
/Jacob Carlborg
Apr 09 2013
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 09 Apr 2013 12:25:17 -0400, Kai Meyer <kai unixlords.com> wrote:

 I would like to know if there exists a 'stream' or 'file' class that is  
 able to take a text file with a correct BOM, and an 'output' utf
 encoding. I want it to be capable of detecting the 'input' stream utf
 encoding by using the BOM, and do the encoding for me on the way out in  
 the specified 'output' utf encoding.

 Right now I am using std.stream.File (which I know is going the way of  
 all the earth soon) and manually parsing the BOM myself to then choose  
 whether I call 'readLine' or 'readLineW', and then subsequently calling  
 'toUTF8' after that.

 It just seems like something like this would be nice to have in phobos  
 if it's not already there.
The new stream replacement code is capable of doing this, all without much effort. It auto-detects the byte order, and allows you to specify it if you wish.

I really need to complete this code. It's long overdue.

-Steve
Apr 09 2013
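
For readers landing on this thread, the manual approach described in the original post (inspect the BOM, pick the matching decoder, convert to UTF-8) can be sketched without std.stream. This is only a minimal illustration, not the stream replacement mentioned above and not Tango's UnicodeFile. The names Bom, BomInfo, detectBom and decodeWithBom are hypothetical helpers made up for this example. It assumes the whole file is already in memory as bytes, and it treats input without a BOM as UTF-8.

```d
import std.utf : toUTF8, validate;

// Hypothetical helper types for this sketch; none of these are in Phobos.
enum Bom { none, utf8, utf16le, utf16be, utf32le, utf32be }

struct BomInfo
{
    Bom kind;      // which BOM was found
    size_t length; // number of bytes the BOM occupies
}

static immutable ubyte[] bomUtf8    = [0xEF, 0xBB, 0xBF];
static immutable ubyte[] bomUtf16le = [0xFF, 0xFE];
static immutable ubyte[] bomUtf16be = [0xFE, 0xFF];
static immutable ubyte[] bomUtf32le = [0xFF, 0xFE, 0x00, 0x00];
static immutable ubyte[] bomUtf32be = [0x00, 0x00, 0xFE, 0xFF];

bool hasPrefix(const(ubyte)[] data, const(ubyte)[] prefix)
{
    return data.length >= prefix.length && data[0 .. prefix.length] == prefix;
}

BomInfo detectBom(const(ubyte)[] data)
{
    // UTF-32 LE is tested before UTF-16 LE because both start with FF FE.
    if (hasPrefix(data, bomUtf32le)) return BomInfo(Bom.utf32le, 4);
    if (hasPrefix(data, bomUtf32be)) return BomInfo(Bom.utf32be, 4);
    if (hasPrefix(data, bomUtf8))    return BomInfo(Bom.utf8, 3);
    if (hasPrefix(data, bomUtf16le)) return BomInfo(Bom.utf16le, 2);
    if (hasPrefix(data, bomUtf16be)) return BomInfo(Bom.utf16be, 2);
    return BomInfo(Bom.none, 0);
}

// Strips the BOM and returns the text as UTF-8.
// Assumption: input without a BOM is taken to be UTF-8.
string decodeWithBom(const(ubyte)[] data)
{
    immutable info = detectBom(data);
    auto payload = data[info.length .. $];

    if (info.kind == Bom.utf16le || info.kind == Bom.utf16be)
    {
        immutable le = info.kind == Bom.utf16le;
        // A trailing odd byte, if any, is ignored in this sketch.
        auto units = new wchar[payload.length / 2];
        foreach (i, ref u; units)
        {
            immutable uint a = payload[2 * i];
            immutable uint b = payload[2 * i + 1];
            u = cast(wchar)(le ? (a | (b << 8)) : ((a << 8) | b));
        }
        return toUTF8(units);
    }

    if (info.kind == Bom.utf32le || info.kind == Bom.utf32be)
    {
        immutable le = info.kind == Bom.utf32le;
        // Trailing bytes that do not fill a code unit are ignored.
        auto units = new dchar[payload.length / 4];
        foreach (i, ref u; units)
        {
            immutable uint a = payload[4 * i],     b = payload[4 * i + 1];
            immutable uint c = payload[4 * i + 2], d = payload[4 * i + 3];
            u = cast(dchar)(le ? (a | (b << 8) | (c << 16) | (d << 24))
                               : ((a << 24) | (b << 16) | (c << 8) | d));
        }
        return toUTF8(units);
    }

    // UTF-8 with or without BOM: copy the bytes and check they are valid.
    auto text = cast(string) payload.idup;
    validate(text); // throws UTFException on malformed input
    return text;
}
```

To use it on a file, the bytes could be loaded with something like `cast(const(ubyte)[]) std.file.read(path)` and passed to decodeWithBom. Producing a different output encoding would then be a matter of calling std.utf.toUTF16 or std.utf.toUTF32 on the result.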