digitalmars.D - Unicode, graphemes and D
- bearophile (7/7) Apr 05 2012 For people interested in a better Unicode handling in D, I have seen tha...
- Dmitry Olshansky (5/12) Apr 05 2012 FYI
- stephan (7/9) Apr 05 2012 Maybe helpful for your GSOC project: as part of a larger code
- Dmitry Olshansky (6/17) Apr 05 2012 Nice.
- stephan (18/19) Apr 05 2012 Ah, the licensing question. I am not a lawyer and I don't know
For people interested in better Unicode handling in D: I have seen that Perl has some support for graphemes. /\X/ matches an extended grapheme cluster:
http://perldoc.perl.org/perl5120delta.html#Unicode-overhaul
http://perldoc.perl.org/perluniprops.html

Perl seems to be one of the best languages for managing Unicode (D and Go are good too):
http://rosettacode.org/wiki/String_length#Grapheme_Length_2

Bye,
bearophile
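To make the distinction behind the "string length" comparison concrete, here is a minimal sketch of the three lengths involved: code units, code points, and user-perceived characters. Python is used only for illustration, and the function names are hypothetical. The last function is a deliberate simplification that folds combining marks into the preceding base character. It is not the extended grapheme cluster algorithm from UAX #29 that Perl's /\X/ matches, so it will miscount cases such as Hangul jamo sequences or emoji joined with ZWJ.

```python
import unicodedata


def count_utf8_code_units(s: str) -> int:
    """Number of bytes needed to encode the string as UTF-8."""
    return len(s.encode("utf-8"))


def count_code_points(s: str) -> int:
    """Number of Unicode code points (what len() reports for a Python str)."""
    return len(s)


def count_combining_sequences(s: str) -> int:
    """Approximate grapheme count (assumption: a cluster is a base code point
    plus any following code points with a non-zero canonical combining class).
    """
    count = 0
    for i, ch in enumerate(s):
        # A leading combining mark has no base to attach to, so it is
        # counted as its own cluster.
        if i == 0 or unicodedata.combining(ch) == 0:
            count += 1
    return count


# 'e' followed by U+0301 COMBINING ACUTE ACCENT: one perceived character.
sample = "e\u0301"
print(count_utf8_code_units(sample))      # 3
print(count_code_points(sample))          # 2
print(count_combining_sequences(sample))  # 1
```

The point of the sketch is only that the three counts differ for the same string; a real implementation would follow the boundary rules and property data published by the Unicode Consortium.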
Apr 05 2012
On 05.04.2012 15:53, bearophile wrote:
> For people interested in a better Unicode handling in D, I have seen that Perl has some support for graphemes, /\X/ matches an extended grapheme cluster:
> http://perldoc.perl.org/perl5120delta.html#Unicode-overhaul
> http://perldoc.perl.org/perluniprops.html
> Perl seems one of the best languages to manage Unicode (D and Go too are good):
> http://rosettacode.org/wiki/String_length#Grapheme_Length_2
> Bye, bearophile

FYI

--
Dmitry Olshansky
Apr 05 2012
> FYI

Maybe helpful for your GSoC project: as part of a larger code base, we have implemented many standard Unicode algorithms (normalization; case folding; graphemes; info like general category, Bidi class, joining type, etc.; ...). The docs and source can be found at http://stephan.bitbucket.org/. As this was just a helper, it is not fully polished (but it works and is reasonably fast).
Apr 05 2012
On 05.04.2012 18:56, stephan wrote:
>> FYI
> Maybe helpful for your GSOC project: as part of a larger code base, we have implemented many standard Unicode algorithms (normalization; casefolding; graphemes; info like general category, Bidi class, joining type, etc.; ...). The doc and source can be found at http://stephan.bitbucket.org/. As this was just a helper, it is not fully polished (but it works and is reasonably fast).

Nice. I'll add a link to my proposal. Though I can use it iff the license is Boost compatible.

--
Dmitry Olshansky
Apr 05 2012
On Thursday, 5 April 2012 at 16:17:46 UTC, Dmitry Olshansky wrote:
> Though I can use it iff the license is Boost compatible.

Ah, the licensing question. I am not a lawyer and I don't know much about copyright law, so you will have to do your own research. But here is my view of the unicodedata.d license situation.

Our code is Boost licensed. It is, however, not a clean-room implementation. Although almost all algorithms and data structures are different and there is minimal (and clearly marked) direct copying, we have looked quite a bit at the ICU implementation (and its predecessors) for inspiration. The ICU license is very permissive, hence you should be ok here.

Furthermore, data files from the Unicode Consortium are part of the distribution. They are used in the "script mode" (version SCRIPT_DATA) to generate the relevant Unicode data in an appropriate format. They are also used in the extensive unit tests (version ALL_UNIT_TESTS) for testing correctness against various test files and derived property files. Again, the data files have a very permissive license.

Let me know if I can be of any help.
Apr 05 2012