digitalmars.D.learn - auto-decoding

reply auto <auto dlang.dec> writes:
What is auto-decoding and why is it a problem?
Mar 31 2018
parent reply Uknown <sireeshkodali1 gmail.com> writes:
On Sunday, 1 April 2018 at 01:19:08 UTC, auto wrote:
 What is auto-decoding and why is it a problem?
Auto-decoding is tied to the UTF representation of Unicode strings. In D, `char[]` and `string` represent UTF-8 strings, `wchar[]` and `wstring` represent UTF-16 strings, and `dchar[]` and `dstring` represent UTF-32 strings. You need to know how UTF works in order to understand auto-decoding. Since in practice most code deals with UTF-8, I'll explain with respect to that.

Essentially, the problem comes down to the fact that not all Unicode characters are representable by a single 8-bit `char`. Only the ASCII characters are stored the "normal" way, one per `char`. ASCII never sets the highest bit of a byte, so UTF-8 uses the leading bits of a `char` to signal how many `char`s the character is encoded in (up to four). You can watch this video for a better understanding[0].

By default though, if one were to traverse a `char[]` one `char` at a time looking for characters, they would get unexpected results with non-ASCII data. Auto-decoding tries to solve this: when a string is used as a range, the decoding algorithm is applied automatically and you get Unicode code points (`dchar`) instead of `char`s. This is where my knowledge ends though. I'll give you the pros and cons of auto-decoding.

Pros:
* It makes Unicode string handling much easier for beginners.
* Much less effort in general; it seems to "just work™".

Cons:
* It makes string handling slow by default.
* It may be the wrong thing, since you may not want Unicode code points, but graphemes instead.
* Auto-decoding throws exceptions on reaching invalid UTF sequences, so string handling code in general can throw exceptions.

If you want to stop auto-decoding, you can use std.string.representation like this:

    import std.string : representation;
    auto no_decode = some_string.representation;

Now no_decode won't be auto-decoded, and you can use it in place of some_string. You can also use std.uni.byGrapheme to iterate by graphemes instead.
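To make the difference between code units, code points, and graphemes concrete, here is a minimal sketch. It assumes a current Phobos where auto-decoding is still the default for strings; the sample string and the variable names are made up for illustration.

```d
import std.exception : assertThrown;
import std.range : ElementType, front, walkLength;
import std.string : representation;
import std.uni : byGrapheme;
import std.utf : UTFException, byCodeUnit;

void main()
{
    // Illustrative input: "café" spelled as 'e' + U+0301 (combining acute accent).
    string s = "cafe\u0301";

    // .length counts UTF-8 code units: 4 ASCII chars + 2 chars for U+0301.
    assert(s.length == 6);

    // As a range, a string auto-decodes: its element type is dchar.
    static assert(is(ElementType!string == dchar));
    assert(s.walkLength == 5);            // code points

    // Opting out: both of these look at code units without decoding.
    assert(s.byCodeUnit.walkLength == 6);
    assert(s.representation.length == 6);

    // Graphemes give a third answer: 'e' + U+0301 is one visible character.
    assert(s.byGrapheme.walkLength == 4);

    // Decoding an invalid sequence throws: 0x80 is a lone continuation byte.
    immutable(ubyte)[] raw = [0x80];
    auto bad = cast(string) raw;
    assertThrown!UTFException(bad.front);
}
```

The same string gives three different "lengths" (6, 5, and 4) depending on the level you iterate at, which is why picking the level explicitly tends to be safer than relying on the default.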
You should also read this blog post: https://jackstouffer.com/blog/d_auto_decoding_and_you.html
And this forum post: https://forum.dlang.org/post/eozguhavggchzzruzkwk@forum.dlang.org

[0]: https://www.youtube.com/watch?v=MijmeoH9LT4
Mar 31 2018
parent Seb <seb wilzba.ch> writes:
On Sunday, 1 April 2018 at 02:44:32 UTC, Uknown wrote:
 If you want to stop auto-decoding, you can use 
 std.string.representation like this:

 import std.string : representation;
 auto no_decode = some_string.representation;

 Now no_decode won't be auto-decoded, and you can use it in place 
 of some_string. You can also use std.uni.byGrapheme to iterate by 
 graphemes instead.
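.representation gives you an array of `ubyte` (`immutable(ubyte)[]` for a `string`, `const(ubyte)[]` for a `const(char)[]`). What you typically want is to keep `char` as the element type; for this you can use std.utf.byCodeUnit, which iterates over the code units without decoding: https://dlang.org/phobos/std_utf.html#byCodeUnit

There's also this good article: https://tour.dlang.org/tour/en/gems/unicode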
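A small sketch of the type difference between the two approaches, in case it helps. The input string and variable names are only illustrative; the static asserts reflect what I understand the Phobos functions to return.

```d
import std.range : ElementType;
import std.string : representation;
import std.utf : byCodeUnit;

void main()
{
    // Illustrative input; 'ö' and 'ß' each take two UTF-8 code units.
    string s = "größe";

    // representation views the same memory as unsigned bytes,
    // keeping the qualifier of the original array.
    auto bytes = s.representation;
    static assert(is(typeof(bytes) == immutable(ubyte)[]));

    const(char)[] view = s;
    static assert(is(typeof(view.representation) == const(ubyte)[]));

    // byCodeUnit wraps the string in a range whose elements are still chars.
    auto units = s.byCodeUnit;
    static assert(is(ElementType!(typeof(units)) == immutable(char)));

    // Neither decodes, so both see 7 code units for 5 code points.
    assert(bytes.length == 7);
    assert(units.length == 7);
    assert(units[0] == 'g' && bytes[0] == 0x67);
}
```

Keeping `char` elements means the result still reads as text to other string-aware code, whereas a `ubyte` array is treated as plain binary data.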
Apr 01 2018