digitalmars.D.learn - Unicode BOM and endianness
- Tim Locke (3/3) Aug 03 2006 How do I acquire and determine the BOM and endianness of a file I am
- Derek Parnell (8/12) Aug 03 2006 You might check out http://en.wikipedia.org/wiki/Byte_Order_Mark
- Hasan Aljudy (16/27) Aug 03 2006 Are GNU tools really as ignorant of Unicode as that page implies?
- Thomas Kuehne (15/36) Aug 04 2006 -----BEGIN PGP SIGNED MESSAGE-----
- Tim Locke (8/14) Aug 04 2006 I'm sorry but I wasn't clear in what I am looking for.
- Derek (8/27) Aug 04 2006 The phobos library supplied by Walter does not have this functionality. ...
How do I acquire and determine the BOM and endianness of a file I am reading? Thanks
Aug 03 2006
On Fri, 04 Aug 2006 00:36:21 -0300, Tim Locke wrote:
> How do I acquire and determine the BOM and endianness of a file I am reading? Thanks

You might check out http://en.wikipedia.org/wiki/Byte_Order_Mark

--
Derek (skype: derek.j.parnell) Melbourne, Australia "Down with mediocrity!" 4/08/2006 2:14:46 PM
Aug 03 2006
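For readers who want to see what the check on that page amounts to, here is a minimal sketch. It is written in Python rather than D, and `detect_bom` and `detect_file_bom` are hypothetical names for illustration, not functions from Phobos or any other library. It assumes the file begins with a BOM if it has one at all, and it only compares the first few bytes against the known signatures.

```python
import codecs

# Longest signatures first: the UTF-32LE BOM (FF FE 00 00) begins with the
# same two bytes as the UTF-16LE BOM (FF FE), so order matters.
BOM_TABLE = [
    (codecs.BOM_UTF32_LE, "UTF-32", "little"),
    (codecs.BOM_UTF32_BE, "UTF-32", "big"),
    (codecs.BOM_UTF8, "UTF-8", None),  # UTF-8 has no byte order
    (codecs.BOM_UTF16_LE, "UTF-16", "little"),
    (codecs.BOM_UTF16_BE, "UTF-16", "big"),
]


def detect_bom(head: bytes):
    """Return (encoding, endianness, bom_length), or (None, None, 0) if no BOM."""
    for bom, encoding, endianness in BOM_TABLE:
        if head.startswith(bom):
            return encoding, endianness, len(bom)
    return None, None, 0


def detect_file_bom(path):
    """Read only the first four bytes of the file and inspect them."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

One known ambiguity: a UTF-16LE file whose first character after the BOM is U+0000 has the same first four bytes as a UTF-32LE BOM, so this sketch would report it as UTF-32. A file without a BOM yields `(None, None, 0)` and its encoding has to be decided some other way.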
Derek Parnell wrote:
> On Fri, 04 Aug 2006 00:36:21 -0300, Tim Locke wrote:
>> How do I acquire and determine the BOM and endianness of a file I am reading? Thanks
>
> You might check out http://en.wikipedia.org/wiki/Byte_Order_Mark

Are GNU tools really as ignorant of Unicode as that page implies?

[quote]
While UTF-8 does not have byte order issues, a BOM encoded in UTF-8 may be used to mark text as UTF-8. Quite a lot of Windows software (including Windows Notepad) adds one to UTF-8 files. However in Unix-like systems (which make heavy use of text files for configuration) this practice is not recommended, as it will interfere with correct processing of important codes such as the hash-bang at the start of an interpreted script. It may also interfere with source for programming languages that don't recognise it. For example, gcc reports stray characters at the beginning of a source file, and in PHP, if output buffering is disabled, it has the subtle effect of causing the page to start being sent to the browser, preventing custom headers from being specified by the PHP script
[/quote]
Aug 03 2006
Hasan Aljudy wrote on 2006-08-04:
> Derek Parnell wrote:
>> On Fri, 04 Aug 2006 00:36:21 -0300, Tim Locke wrote:
>>> How do I acquire and determine the BOM and endianness of a file I am reading? Thanks
>>
>> You might check out http://en.wikipedia.org/wiki/Byte_Order_Mark
>
> Are GNU tools really as ignorant of Unicode as that page implies?
>
> [quote]
> While UTF-8 does not have byte order issues, a BOM encoded in UTF-8 may be used to mark text as UTF-8. Quite a lot of Windows software (including Windows Notepad) adds one to UTF-8 files. However in Unix-like systems (which make heavy use of text files for configuration) this practice is not recommended, as it will interfere with correct processing of important codes such as the hash-bang at the start of an interpreted script.

Let's have 2 UTF-8 files with BOMs: A and B

cat A B > C

A's BOM will remain a BOM, but B's BOM is going to be interpreted as a "zero-width no-break space". Thus using BOMs in combination with streaming, concatenating etc. will always cause problems.

In contrast to Windows, Linux - home to the GNU tools - treats both "text" and "binary" files as "binary" files.

Thomas
Aug 04 2006
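The concatenation problem Thomas describes can be reproduced in a few lines. This is only an illustrative sketch in Python, not D: the byte strings `a` and `b` stand in for the files A and B, and joining them stands in for `cat A B > C`. It relies on Python's `utf-8-sig` codec, which strips a single leading BOM and nothing else.

```python
import codecs

# Two UTF-8 "files", each starting with a BOM (stand-ins for A and B).
a = codecs.BOM_UTF8 + "first\n".encode("utf-8")
b = codecs.BOM_UTF8 + "second\n".encode("utf-8")

# Byte-wise concatenation, which is all `cat A B > C` does.
c = a + b

# "utf-8-sig" removes one BOM at the very start of the data only.
text = c.decode("utf-8-sig")

# B's BOM survives in the middle of the text as U+FEFF.
bom_positions = [i for i, ch in enumerate(text) if ch == "\ufeff"]
```

After decoding, `text` still contains one U+FEFF character right before "second", which is the stray "zero-width no-break space" mentioned above. A reader that wants clean text has to strip it explicitly.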
On Fri, 4 Aug 2006 14:15:00 +1000, Derek Parnell <derek nomail.afraid.org> wrote:
> On Fri, 04 Aug 2006 00:36:21 -0300, Tim Locke wrote:
>> How do I acquire and determine the BOM and endianness of a file I am reading? Thanks
>
> You might check out http://en.wikipedia.org/wiki/Byte_Order_Mark

I'm sorry, but I wasn't clear about what I am looking for. I want to be able to open a file and have D automatically tell me which format it is (UTF-8, UTF-16LE, UTF-16BE, and so on) without my having to code it. Ideally I would like to be able to read any Unicode or ASCII file and have D automatically detect its type and allow me to read it into whatever format I want, such as char, wchar, dchar.
Aug 04 2006
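The behaviour Tim asks for (detect the encoding, then hand back text in whatever form the caller wants) might look roughly like the sketch below. It is in Python, not D, and `decode_with_bom` and `read_text` are hypothetical helper names, not part of any library. It assumes that a file without a BOM is UTF-8 (which also covers plain ASCII); that fallback is a guess, not real detection.

```python
import codecs

# BOM signature -> codec used for the bytes that follow it.
# Longest signatures first, because FF FE 00 00 (UTF-32LE) starts with FF FE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_bom(data: bytes, default: str = "utf-8"):
    """Return (text, codec_name); strip the BOM if one is present."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(codec), codec
    # No BOM: fall back to the assumed default encoding.
    return data.decode(default), default


def read_text(path, default: str = "utf-8"):
    """Read a whole file as bytes and decode it according to its BOM."""
    with open(path, "rb") as f:
        return decode_with_bom(f.read(), default)
```

Once the text is decoded, converting it to another representation is a separate step, for example `text.encode("utf-16-le")` for 16-bit code units. In D terms that corresponds to choosing between char, wchar and dchar arrays after the source encoding is known.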
On Fri, 04 Aug 2006 08:44:21 -0300, Tim Locke wrote:
> On Fri, 4 Aug 2006 14:15:00 +1000, Derek Parnell <derek nomail.afraid.org> wrote:
>> On Fri, 04 Aug 2006 00:36:21 -0300, Tim Locke wrote:
>>> How do I acquire and determine the BOM and endianness of a file I am reading? Thanks
>>
>> You might check out http://en.wikipedia.org/wiki/Byte_Order_Mark
>
> I'm sorry but I wasn't clear in what I am looking for. I'm looking to be able to open a file and have D automatically tell me which format it is, e.g. UTF-8, UTF-16LE, UTF-16BE, etc. without my having to code it. Ideally I would like to be able to read any Unicode or ASCII file and have D automatically detect its type and allow me to read it into whatever format I want, such as char, wchar, dchar.

The Phobos library supplied by Walter does not have this functionality. The Mango library, and maybe others, do. I know that I had to code this myself when I needed it.

--
Derek Parnell Melbourne, Australia "Down with mediocrity!"
Aug 04 2006