digitalmars.D - ICU D Wrapper
- Jake (10/10) Dec 12 2014 I'm not sure if anyone has noticed, but the D wrapper for C's ICU
- Trent Forkert (27/38) Dec 12 2014 I've looked into writing a binding for ICU recently, but
- Sean Kelly (6/9) Dec 13 2014 Wow... really? You're actually going to write transcoders for
- Trent Forkert (52/63) Dec 13 2014 Running down the icu4c API listing:
- Dmitry Olshansky (5/15) Dec 12 2014 Well I collect ideas/enhancements for std.uni so feel free to list
- Jacob Carlborg (6/8) Dec 12 2014 That library is very old, for the days of D1, and not maintained anymore...
I'm not sure if anyone has noticed, but the D wrapper for C's ICU library is far from working it seems. mango.icu is its technical name. I've read articles on the forum about how excited people were to get ICU usable in D, but whoever made mango.icu hasn't made any updates on it or even documented it online. I'm just letting you all know about this. D seems to already have a bunch of ICU's functionality though, so maybe the wrapper died on purpose. I really just wanted to use its time zone and date features since std.datetime doesn't make it very easy to use the TZ database on Windows.
Dec 12 2014
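The time zone difficulty described above can be illustrated with a short sketch. `zoneFor` is a hypothetical helper, not part of Phobos; it assumes the caller knows both the TZ database name and the Windows name of the zone, since std.datetime reads the TZ database by default only on POSIX systems and needs an explicit directory of compiled TZ files to do so on Windows.

```d
import std.datetime;

/// Hypothetical helper (not a Phobos function): picks a time zone source
/// depending on the platform and on whether TZ database files are available.
immutable(TimeZone) zoneFor(string tzdbName, string windowsName,
                            string tzdbDir = null)
{
    // An explicit directory of compiled TZ database files works on any OS,
    // but on Windows the caller has to ship those files.
    if (tzdbDir.length)
        return PosixTimeZone.getTimeZone(tzdbName, tzdbDir);

    version (Windows)
        // Without TZ files, fall back to the zones Windows itself knows,
        // which use different names (e.g. "Eastern Standard Time").
        return WindowsTimeZone.getTimeZone(windowsName);
    else
        // POSIX systems normally provide the TZ database in a default location.
        return PosixTimeZone.getTimeZone(tzdbName);
}
```

A call such as `zoneFor("America/New_York", "Eastern Standard Time")` would then return a zone that can be passed to `SysTime` or `Clock.currTime`, provided the named zone exists on the machine.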
On Friday, 12 December 2014 at 16:51:43 UTC, Jake wrote:
> I'm not sure if anyone has noticed, but the D wrapper for C's ICU library is far from working it seems. mango.icu is its technical name. I've read articles on the forum about how excited people were to get ICU usable in D, but whoever made mango.icu hasn't made any updates on it or even documented it online. I'm just letting you all know about this. D seems to already have a bunch of ICU's functionality though, so maybe the wrapper died on purpose. I really just wanted to use its time zone and date features since std.datetime doesn't make it very easy to use the TZ database on Windows.

I've looked into writing a binding for ICU recently, but ultimately decided to abandon that idea in favor of writing a replacement for it in D. The reasons for this are:

* ICU breaks its ABI with every release, meaning a D binding would only work for one version of ICU, and would need pragma(mangle) to have a hope of easy updating. Alternatively, what mango.icu seems to have done is load ICU at runtime in order to figure out what library to bind.
* ICU's data and APIs use UTF-16. I'd rather everything be UTF-8.
* ICU's API is incredibly inconvenient for (if not impossible to access from) D. For example, some of the functionality requires binding C++ classes that use multiple inheritance.
* A decent chunk (though not all) of ICU is actually generated from CLDR, meaning I can do the same.

It looks to me like mango.icu hasn't been updated since ICU v38 (it is up to v54 now), and it made extensive use of wrappers in order to hide the C-API nastiness. It also doesn't support any of the functionality that requires C++.

Binding ICU would be very nice, but this is one of the few cases where I actually think we'd be better off rolling our own. I'm still a little ways off from having my work ready for public release, but I've been making good progress recently. If you can point out what ICU API you need, I'll make sure to include an equivalent API in my library.
- Trent
Dec 12 2014
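The first two reasons in the list above can be sketched in a few lines of D. This is only an illustration under stated assumptions: it assumes a default ICU build, in which exported C symbols carry the major version as a suffix (for example `u_strlen_54`), and the names `icuMajor` and `icuSymbol` are hypothetical helpers, not part of any existing binding.

```d
import std.conv : to;
import std.utf : toUTF16, toUTF16z;

/// ICU major version this sketch targets (hypothetical constant; 54 is the
/// release mentioned in the thread).
enum icuMajor = 54;

/// Builds a versioned symbol name such as "u_strlen_54".
/// Hypothetical helper, evaluated at compile time when used in pragma(mangle).
string icuSymbol(string base, int major = icuMajor)
{
    return base ~ "_" ~ major.to!string;
}

extern (C) nothrow @nogc
{
    // Corresponds to `int32_t u_strlen(const UChar* s)` from unicode/ustring.h.
    // The D name stays stable; only the linker symbol tracks the ICU version.
    pragma(mangle, icuSymbol("u_strlen"))
    int u_strlen(const(wchar)* s);
}

/// Converts a UTF-8 D string to the UTF-16 code units ICU expects.
/// Every call across the binding boundary pays for a conversion like this.
wstring forIcu(string utf8)
{
    return utf8.toUTF16;
}
```

With such a declaration, moving to another ICU release means changing `icuMajor` and rebuilding, and a call would look like `u_strlen("text".toUTF16z)` once the program is linked against the matching ICU library. The sketch does nothing about signatures or types that change between releases.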
On Friday, 12 December 2014 at 17:57:41 UTC, Trent Forkert wrote:
> I've looked into writing a binding for ICU recently, but ultimately decided to abandon that idea in favor of writing a replacement for it in D.

Wow... really? You're actually going to write transcoders for all available encodings? Plus the conversion and parsing tools, plus expand our calendar functionality to handle the things it doesn't do now, plus... I mean I'd love it, but the scope of the project can be measured in tens of man-years.
Dec 13 2014
On Saturday, 13 December 2014 at 15:44:59 UTC, Sean Kelly wrote:
> On Friday, 12 December 2014 at 17:57:41 UTC, Trent Forkert wrote:
>> I've looked into writing a binding for ICU recently, but ultimately decided to abandon that idea in favor of writing a replacement for it in D.
>
> Wow... really? You're actually going to write transcoders for all available encodings? Plus the conversion and parsing tools, plus expand our calendar functionality to handle the things it doesn't do now, plus... I mean I'd love it, but the scope of the project can be measured in tens of man-years.

Running down the icu4c API listing:

* Basic Types and Constants - only as needed
* Strings and character iteration - Just use D strings, std.string
* Unicode character properties and names - I think std.uni handles this
* Sets of Unicode Code Points and Strings - ditto
* Codepage conversion - ignoring, at least for now. See below.
* Unicode text compression - again, I think std.uni handles this
* Locales - yes
* Resource Bundles - will offer equivalent functionality, just not identical
* Normalization - std.uni
* Calendars - see below
* Date and time formatting - yes
* Message formatting - yes
* Number formatting / spell-out - yes
* Transliteration - yes, but may be delayed until after initial release
* Bidirectional Algorithm - not at first, is this in std.uni?
* Arabic shaping - not at first, is this in std.uni?
* Collation - I'm delaying this until after the initial release to get it out faster
* String searching - depends on Collation
* Index characters - depends on Collation
* Text Boundary analysis - depends on Collation
* Regular Expression - use std.regex
* StringPrep - not initially, is this in std.uni?
* IDNA - not initially, is this in Phobos?
* Identifier spoofing and confusability - not initially
* Layout engine - delayed, looks like ICU is removing this and pointing to another library
* Universal Time Scale - see below
* ICU I/O - use Phobos

There are very few things above that are not possible to generate from CLDR data. Of those, most are RFC-defined algorithms, several of which I believe are already part of Phobos.

If I add codepage conversion, it will likely be in terms of iconv on POSIX and MultiByteToWideChar and friends on Windows. Alternatively, I could "borrow" the IBM CDRA/UCM data the way I'm getting almost everything else from CLDR data.

Support of other calendar systems is up in the air at the moment. I had thought CLDR contained what I needed, but it looks like it might not. It has locale-specific formatting and display info for calendars, and mappings to when other calendars' eras begin in terms of the Gregorian calendar, but I don't see further breakdown of information. So, initially it looks like I'll only be supporting the Gregorian calendar, but I may add the others in the future.

It is a lot of work, yes, but the Unicode Consortium already does a significant chunk of it with CLDR.

- Trent
Dec 13 2014
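Several entries in the listing above defer to Phobos rather than to new code. As a small illustration of what that looks like for normalization and character iteration, here is a sketch using std.uni; the function names `sameText` and `graphemeCount` are made up for this example.

```d
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFC;

/// Hypothetical helper: compares two strings after bringing both to
/// Normalization Form C, so precomposed and decomposed spellings match.
bool sameText(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

/// Hypothetical helper: counts user-perceived characters (grapheme
/// clusters) instead of UTF-8 code units.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}
```

For instance, `sameText("e\u0301", "\u00E9")` treats "e" followed by a combining acute accent as equal to the precomposed "é", and `graphemeCount("e\u0301")` counts that sequence as one character even though it occupies three UTF-8 code units. Locale-sensitive features such as collation are a separate matter and are not covered by these calls.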
12-Dec-2014 19:51, Jake wrote:
> I'm not sure if anyone has noticed, but the D wrapper for C's ICU library is far from working it seems. mango.icu is its technical name. I've read articles on the forum about how excited people were to get ICU usable in D, but whoever made mango.icu hasn't made any updates on it or even documented it online. I'm just letting you all know about this. D seems to already have a bunch of ICU's functionality though, so maybe the wrapper died on purpose. I really just wanted to use its time zone and date features since std.datetime doesn't make it very easy to use the TZ database on Windows.

Well, I collect ideas/enhancements for std.uni, so feel free to list what's missing and what primitives you need for the TZ database.

-- Dmitry Olshansky
Dec 12 2014
On 2014-12-12 17:51, Jake wrote:
> I'm not sure if anyone has noticed, but the D wrapper for C's ICU library is far from working it seems. mango.icu is its technical name.

That library is very old, from the days of D1, and not maintained anymore. There's also this version, which might be more up to date [1].

[1] https://github.com/d-widget-toolkit/com.ibm.icu

-- /Jacob Carlborg
Dec 12 2014