digitalmars.D.learn - Internationalization vs. Unicode

"Tyro[17]" <nospam home.com> writes:
There are myriad encoding schemes. D natively supports Unicode and 
provides functionality for it via Phobos. A byproduct of this is that, 
since ASCII is a subset of Unicode, it also natively supports ASCII. This 
is a plus for the language, but what of the other encoding schemes? What 
library functionality is provided to manipulate or convert between those 
encoding schemes and Unicode?

I need to convert from CJK encodings (presently EUC-JP and 
Shift-JIS) to Unicode. How do I accomplish this using D/Phobos? Is there 
a standalone library that does this? If so, can someone point me to it? 
If not, is there planned functionality for inclusion in Phobos, or am I 
doomed to resort to Java or some other language to accomplish this 
task (at least until I'm educated enough to do it myself)?

Thanks,
Andrew
Apr 26 2013
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Apr 26, 2013 at 06:09:48PM -0400, Tyro[17] wrote:
[...]

If you're on a Posix system, you could look into the 'recode' utility to convert from those legacy encodings to Unicode before running your program on the files. You may be able to figure out how to do the conversion yourself by looking at recode's source code.

But AFAIK there is currently no way to do it in D. Maybe someone should write std.recode and submit it for inclusion in Phobos. ;-)

T

-- 
People tell me that I'm paranoid, but they're just out to get me.
Apr 26 2013
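
As a possible middle ground between shelling out to 'recode' and waiting for a std.recode, the C library's iconv() can be called from D on Posix systems. What follows is only a minimal sketch under a few assumptions: that druntime ships the core.sys.posix.iconv bindings, and that the system's iconv recognizes the charset names "EUC-JP" and "SHIFT_JIS". The helper name toUtf8 is made up for illustration; it is not a Phobos function.

```d
import core.stdc.errno : errno, E2BIG;
import core.sys.posix.iconv : iconv, iconv_close, iconv_open, iconv_t;
import std.exception : enforce;
import std.string : toStringz;

/// Hypothetical helper (not part of Phobos): converts `input`, encoded
/// as `fromCharset`, to UTF-8 using the system's iconv.
string toUtf8(const(ubyte)[] input, string fromCharset)
{
    // iconv_open returns (iconv_t) -1 on failure, e.g. unknown charset.
    iconv_t cd = iconv_open("UTF-8", fromCharset.toStringz);
    enforce(cd != cast(iconv_t) size_t.max,
            "iconv_open failed for " ~ fromCharset);
    scope (exit) iconv_close(cd);

    char[] result;
    char[256] chunk;                        // scratch output buffer
    auto src = cast(const(char)*) input.ptr;
    size_t srcLeft = input.length;

    while (srcLeft > 0)
    {
        char* dst = chunk.ptr;
        size_t dstLeft = chunk.length;
        immutable rc = iconv(cd, &src, &srcLeft, &dst, &dstLeft);
        immutable err = errno;              // save before anything resets it
        result ~= chunk[0 .. chunk.length - dstLeft];
        // E2BIG only means the chunk filled up, so loop again.
        // Any other failure is invalid or truncated input.
        enforce(rc != size_t.max || err == E2BIG,
                "input is not valid " ~ fromCharset);
    }
    // Stateful encodings (e.g. ISO-2022-JP) would also need a final
    // flush call; EUC-JP and Shift-JIS are stateless, so it is skipped.
    return result.idup;
}

void main()
{
    import std.stdio : writeln;

    // The bytes of "日本語" in EUC-JP and in Shift-JIS.
    immutable ubyte[] eucJp = [0xC6, 0xFC, 0xCB, 0xDC, 0xB8, 0xEC];
    immutable ubyte[] sjis  = [0x93, 0xFA, 0x96, 0x7B, 0x8C, 0xEA];

    writeln(toUtf8(eucJp, "EUC-JP"));
    writeln(toUtf8(sjis, "SHIFT_JIS"));
}
```

Once the bytes are UTF-8, the result is an ordinary D string and the usual Phobos string functions apply. On systems where iconv lives in a separate library rather than in libc, an extra linker flag such as -L-liconv may be needed.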
Jacob Carlborg <doob me.com> writes:
On 2013-04-27 00:09, Tyro[17] wrote:
Would ICU do the work? If so, you can take a look at this:

https://github.com/d-widget-toolkit/com.ibm.icu

It will most likely not compile with the latest version of DMD. Also, I don't know how complete it is.

-- 
/Jacob Carlborg
Apr 27 2013
"Tyro[17]" <nospam home.com> writes:
On 4/27/13 6:37 AM, Jacob Carlborg wrote:
This might work. Not sure yet. The first thing that caught my eye was

    import java.lang.all;
    import java.math.BigInteger;
    import java.text.CharacterIterator;
    import java.text.ParsePosition;
    import java.util.Comparator;
    import java.util.Date;

and I was immediately confused. What? We can directly import and use Java in D? Let me try this... Oh! No! Not really! We can't.

Well, since D uses the file system to organize its modules, I should be able to find a java folder with these class signatures, or the D equivalent, somewhere in the project folder. No... I don't see one anywhere.

Looks like I will have to file ICU on my list of things to get educated about. For now I will continue to use the Java implementation I've got.

Thanks.
Apr 29 2013
"Jesse Phillips" <Jessekphillips+D gmail.com> writes:
On Monday, 29 April 2013 at 18:36:32 UTC, Tyro[17] wrote:

 This might work. Not sure yet. The first thing that caught my 
 eyes is
You'll find the ported Java source here: https://github.com/d-widget-toolkit/base/tree/master/src
Apr 29 2013