digitalmars.D.learn - recognizing asciiz, utf ...
- newbee, Mar 13 2009
- Jarrett Billingsley, Mar 13 2009
- newbee, Mar 13 2009
- Daniel Keep, Mar 13 2009
- Sergey Gromov, Mar 14 2009
- newbee, Mar 15 2009
- Sergey Gromov, Mar 15 2009
Hi all,

How does one check for ASCIIZ, UTF, ...? I get a buffer of characters as a parameter to a function, but I don't know whether it is ASCIIZ, UTF or wchar. Is it possible to find this out in dmd1 and dmd2?

Any help is appreciated.
Mar 13 2009
On Fri, Mar 13, 2009 at 3:04 PM, newbee <newbee newbee.com> wrote:
> Hi all, How does one check for asciiz, utf ...? I do get a buffer with characters as parameter in a function, but i don’t know if it is asciiz or utf or wchar. Is it possible to find out in dmd1 and dmd2? Any help is appreciated.

How are you getting this buffer? What type is it, char[]?

D strings are supposed to be Unicode, always. If you read the data in from a file, there's little to no guarantee as to what encoding it is (unless it started with a Unicode BOM). If you have a zero-terminated char* that a C function gives you, you can turn it into a D string with std.string.toString (Phobos) or tango.stdc.stringz.fromStringz (Tango).
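A minimal sketch of both points, written against present-day D2 Phobos rather than the 2009 dmd1/dmd2 libraries discussed here. sniffBom, Bom and cToD are hypothetical names chosen for illustration. sniffBom only looks for a leading byte order mark, checking the 4-byte UTF-32 marks first because the UTF-32LE mark (FF FE 00 00) begins with the UTF-16LE mark (FF FE); cToD copies a zero-terminated C string using std.string.fromStringz.

```d
import std.algorithm.searching : startsWith;
import std.string : fromStringz;

/// Hypothetical result of sniffing a buffer for a byte order mark.
enum Bom { none, utf8, utf16le, utf16be, utf32le, utf32be }

/// Look for a Unicode BOM at the start of a raw buffer.
Bom sniffBom(const(ubyte)[] data)
{
    static immutable ubyte[] bomUtf32le = [0xFF, 0xFE, 0x00, 0x00];
    static immutable ubyte[] bomUtf32be = [0x00, 0x00, 0xFE, 0xFF];
    static immutable ubyte[] bomUtf8 = [0xEF, 0xBB, 0xBF];
    static immutable ubyte[] bomUtf16le = [0xFF, 0xFE];
    static immutable ubyte[] bomUtf16be = [0xFE, 0xFF];

    // UTF-32 first: the UTF-32LE mark starts with the UTF-16LE mark.
    if (data.startsWith(bomUtf32le)) return Bom.utf32le;
    if (data.startsWith(bomUtf32be)) return Bom.utf32be;
    if (data.startsWith(bomUtf8)) return Bom.utf8;
    if (data.startsWith(bomUtf16le)) return Bom.utf16le;
    if (data.startsWith(bomUtf16be)) return Bom.utf16be;
    return Bom.none; // no BOM: the encoding is still unknown
}

/// Copy a zero-terminated C string (ASCIIZ) into a D string.
string cToD(const(char)* p)
{
    return p is null ? null : fromStringz(p).idup;
}
```

A missing BOM proves nothing on its own: plenty of UTF-8 producers never write one, so Bom.none only means "unknown".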
Mar 13 2009
Jarrett Billingsley wrote:
> How are you getting this buffer? What type is it, char[]?

I get it from a TCP buffer and do not know in advance whether it is char[], ASCIIZ or wchar. Is it possible to check for that?
Mar 13 2009
newbee wrote:
> I get it from a TCP buffer and do not know in advance whether it is char[], ASCIIZ or wchar. Is it possible to check for that?

If you're getting data from a network connection and you have no idea what it is, then the language certainly isn't going to help you with that. Perhaps reading the documentation for the network protocol is in order? :P

-- 
Daniel
Mar 13 2009
Fri, 13 Mar 2009 15:04:12 -0400, newbee wrote:
> How does one check for asciiz, utf ...? I do get a buffer with characters as parameter in a function, but i don’t know if it is asciiz or utf or wchar. Is it possible to find out in dmd1 and dmd2?

There is some redundancy in the UTF-8 format, so you can test whether your string is valid UTF-8; std.utf.validate() does that for you. Any ASCII string will also pass, since ASCII is a special case of UTF-8.

Not every 16- or 32-bit value is a valid Unicode code point. This means you can cast your string to wchar[] and then test every char using the std.utf.isValidDchar() function. If it fails, then you are most likely not dealing with a valid wchar[] string (a surrogate pair is valid UTF-16 but fails this per-unit test), so test dchar[] similarly. Be prepared, though, that these tests will sometimes give you false positives.
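A minimal sketch of the two checks above, written against current D2 Phobos, where std.utf.validate throws UTFException on malformed input. isValidUtf8 and looksLikeUtf32 are hypothetical helper names. The per-unit isValidDchar test is shown on 32-bit units only, because applied to individual wchar units it fails on every surrogate, including valid pairs. The cast reads the bytes in the host's byte order, which is an assumption about the sender.

```d
import std.algorithm.searching : all;
import std.utf : UTFException, isValidDchar, validate;

/// True if the raw bytes are well-formed UTF-8 (plain ASCII always passes).
bool isValidUtf8(const(ubyte)[] data)
{
    try
    {
        validate(cast(const(char)[]) data); // throws UTFException if malformed
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

/// Per-unit test: every 32-bit unit (host byte order) must be a valid
/// code point, i.e. below 0x110000 and not a surrogate.
bool looksLikeUtf32(const(ubyte)[] data)
{
    if (data.length % dchar.sizeof != 0)
        return false; // a partial unit cannot be UTF-32
    auto units = cast(const(dchar)[]) data;
    return units.all!isValidDchar;
}
```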
Mar 14 2009
Sergey Gromov wrote:
> There is some redundancy in UTF-8 format so you can test if your string is a valid UTF-8 string. There is std.utf.validate() for you.

Thank you kindly. This explanation really helped me. I will try that.
Mar 15 2009
Sun, 15 Mar 2009 05:20:08 -0400, newbee wrote:
> thank you kindly. this explanation really helped me. i will try that.

You're welcome.

I just realized that there are wchar[] and dchar[] versions of std.utf.validate(). This should make your test really straightforward.
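With validate() available for all three widths, the whole test can be sketched roughly as follows (current D2 Phobos; Candidates, validAs and classify are hypothetical names, and wchar/dchar views use the host's byte order). The tests show the false positives warned about above: a short ASCII string is also well-formed UTF-16, and a UTF-32 buffer is also well-formed UTF-8 and UTF-16, so the result is a list of candidates rather than a definite answer.

```d
import std.utf : UTFException, validate;

/// Hypothetical summary: which reinterpretations of the bytes are well-formed.
struct Candidates
{
    bool utf8;
    bool utf16;
    bool utf32;
}

/// True if std.utf.validate accepts the bytes viewed as an array of Char.
bool validAs(Char)(const(ubyte)[] data)
{
    if (data.length % Char.sizeof != 0)
        return false; // not a whole number of code units
    try
    {
        validate(cast(const(Char)[]) data);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

/// Several candidates can be true at once: these are hints, not proof.
Candidates classify(const(ubyte)[] data)
{
    return Candidates(validAs!char(data), validAs!wchar(data), validAs!dchar(data));
}
```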
Mar 15 2009