D - UTF8/16 always 8/16 bits ?
- Achilleas Margaritis (3/3) Apr 22 2004 The unicode standard says that UTF8 and UTF16 characters vary in size. H...
- Ben Hinkle (10/13) Apr 22 2004 In std.utf
- Scott Egan (10/13) Apr 22 2004 It doesn't although they are called UTF-8 and UTF-16 they are just array...
The Unicode standard says that UTF-8 and UTF-16 characters vary in size. How does D handle this? Is it assumed that UTF-8 chars are always 8 bits and UTF-16 chars are always 16 bits?
Apr 22 2004
On Thu, 22 Apr 2004 11:57:31 +0000 (UTC), Achilleas Margaritis <Achilleas_member pathlink.com> wrote:
> The Unicode standard says that UTF-8 and UTF-16 characters vary in size. How does D handle this? Is it assumed that UTF-8 chars are always 8 bits and UTF-16 chars are always 16 bits?

In std.utf http://www.digitalmars.com/d/phobos.html#utf there are functions like
dchar decode(char[] s, inout uint idx)
that take a UTF-8 char[] and an index, return the UTF-32 code point, and advance the index by one or more bytes. The regular array indexing [] doesn't know about multi-slot characters; it addresses individual code units.
-Ben
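To make the "advances the index by one or more" part concrete, here is a minimal sketch. It assumes a current D compiler, where std.utf.decode takes the index as ref size_t rather than the inout uint shown above; the helper name codeUnitCounts is made up for illustration and is not part of Phobos.

```d
import std.utf : decode;

/// Hypothetical helper (not in Phobos): for each code point in s,
/// record how many code units (char or wchar slots) it occupied.
size_t[] codeUnitCounts(S)(S s)
{
    size_t[] counts;
    size_t idx = 0;
    while (idx < s.length)
    {
        immutable start = idx;
        decode(s, idx);          // returns the dchar and advances idx
        counts ~= idx - start;   // 1..4 for UTF-8, 1..2 for UTF-16
    }
    return counts;
}
```

Called on "a\u00E9\u20AC" (a, e-acute, euro sign) this should give one, two and three code units, while .length on the same string reports 6, since .length counts array slots and not characters. The same helper works on a wstring, where a code point outside the Basic Multilingual Plane takes two 16-bit slots.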
Apr 22 2004
It doesn't. Although they are called UTF-8 and UTF-16, they are just arrays of chars of the appropriate width. The O/S is what really has to deal with them as Unicode. This means, of course, that by indexing into the char[] and mucking around with the data you may end up with invalid Unicode.
Such is life.

"Achilleas Margaritis" <Achilleas_member pathlink.com> wrote in message news:c68bvb$1vgk$1 digitaldaemon.com...
> The Unicode standard says that UTF-8 and UTF-16 characters vary in size. How does D handle this? Is it assumed that UTF-8 chars are always 8 bits and UTF-16 chars are always 16 bits?
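A small sketch of how indexing can leave you with invalid Unicode, again assuming a current D compiler and Phobos. std.utf.validate and UTFException are real Phobos names; the wrapper isValidUtf8 is only an illustrative helper.

```d
import std.utf : validate, UTFException;

/// Hypothetical wrapper: true if s is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);   // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}
```

With string s = "h\u00E9llo", the e-acute occupies slots 1 and 2. The slice s[0 .. 3] ends on a character boundary and validates, but s[0 .. 2] cuts the two-byte sequence in half and does not. The slice itself is not checked by the language, so nothing complains until something tries to decode the data.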
Apr 22 2004