digitalmars.D - Encodings
- Nathan M. Swan, Apr 08 2012
- Jonathan M Davis, Apr 08 2012
For most of the string processing I do, I read/write text in UTF-8 and convert it to UTF-32 for processing (with std.utf), so I don't have to worry about encoding. Is this a good or bad paradigm? Is there a better way to do this? What method do all of you use?

Just curious,
NMS
Apr 08 2012
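A minimal sketch of the decode, process, re-encode approach described in the question, assuming a reasonably recent Phobos where `std.utf.toUTF32` and `std.utf.toUTF8` accept any string type. The helper `lastChars` is hypothetical; it is only there to show an operation (slicing by code point) that is trivial on a `dstring` but awkward on UTF-8.

```d
import std.stdio : writeln;
import std.utf : toUTF32, toUTF8;

/// Hypothetical helper: returns the last `n` code points of a UTF-8 string.
/// The input is decoded to UTF-32 so it can be sliced by code point,
/// then the slice is encoded back to UTF-8 for output.
string lastChars(string utf8Input, size_t n)
{
    dstring utf32 = toUTF32(utf8Input);   // one dchar per code point
    immutable start = utf32.length > n ? utf32.length - n : 0;
    return toUTF8(utf32[start .. $]);     // re-encode the slice as UTF-8
}

void main()
{
    // In a real program this would come from a file or stdin as UTF-8.
    string text = "na\u00EFve caf\u00E9";
    string tail = lastChars(text, 4);
    assert(tail == "caf\u00E9");
    writeln(tail);
}
```

Note that a `dchar` is a code point, which is not always the same as a user-perceived character (combining sequences span several code points), so UTF-32 removes the variable-width problem of UTF-8 but not every encoding concern.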
On Sunday, April 08, 2012 23:36:23 Nathan M. Swan wrote:
> For most of the string processing I do, I read/write text in UTF-8 and convert it to UTF-32 for processing (with std.utf), so I don't have to worry about encoding. Is this a good or bad paradigm? Is there a better way to do this? What method do all of you use?
>
> Just curious,
> NMS

It depends on what you're doing. Depending on the functions that you use and your memory requirements, UTF-8 may be faster or UTF-32 may be faster. UTF-32 has the advantage of being a random-access range, so it works with a number of functions that UTF-8 won't work with. But UTF-32 also takes considerably more memory (up to four times as much if most of your characters are ASCII), which can be a problem.

I think that the most common thing is to just operate on UTF-8 unless another encoding is needed (e.g. UTF-32 because random access is required). And in plenty of cases, you end up operating on generic ranges anyway if you use range-based functions on strings and don't call std.array.array on them.

You're going to have to profile your code to see whether using UTF-8 or UTF-32 primarily in your string processing is more efficient.

- Jonathan M Davis
Apr 08 2012
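The trade-off described in the reply can be checked directly. This is a sketch assuming a Phobos that has `std.range.primitives` (older releases kept the same traits in `std.range`); the helper `bytesUsed` is hypothetical. Because Phobos iterates narrow strings by code point (auto-decoding), a `string` is only a bidirectional range, while a `dstring` is random-access at the cost of 4 bytes per code point.

```d
import std.range.primitives : ElementEncodingType, isBidirectionalRange,
    isRandomAccessRange, walkLength;
import std.utf : toUTF32;

/// Hypothetical helper: bytes occupied by the code units of a string.
size_t bytesUsed(S)(S s)
{
    alias E = ElementEncodingType!S;   // char, wchar, or dchar (qualified)
    return s.length * E.sizeof;
}

void main()
{
    string s8 = "na\u00EFve caf\u00E9";  // 10 code points, 2 of them non-ASCII
    dstring s32 = toUTF32(s8);

    // Auto-decoding makes string a bidirectional range of dchar,
    // but not a random-access one; dstring is random-access.
    static assert(isBidirectionalRange!string);
    static assert(!isRandomAccessRange!string);
    static assert(isRandomAccessRange!dstring);

    // .length counts code units; walkLength counts decoded code points.
    assert(s8.length == 12);      // U+00EF and U+00E9 take 2 bytes each
    assert(s8.walkLength == 10);
    assert(s32.length == 10);

    // Memory cost: UTF-32 spends 4 bytes on every code point.
    assert(bytesUsed(s8) == 12);
    assert(bytesUsed(s32) == 40);

    // Indexing a dstring yields the i-th code point directly.
    assert(s32[2] == '\u00EF');
}
```

For mostly-ASCII text the memory ratio approaches 4:1, which is why staying in UTF-8 and converting only when random access is actually needed is often the better default.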