digitalmars.D - 8-bit character encodings
- Anders F Björklund (22/47) Nov 23 2004 I've written some test code for encodings...
- Walter (2/2) Nov 23 2004 There's a Microsoft API function to do this, I think it's
- Anders F Björklund (4/6) Nov 23 2004 But that's only on Windows, right ?
- Walter (3/8) Nov 23 2004 Right. I don't know what the corresponding linux API is.
- Kris (44/44) Nov 23 2004 | "Anders F Björklund" wrote in message
- Anders F Björklund (5/14) Nov 23 2004 OK, will check it out. Only difference being: 12 MB versus 32 KB :-)
- Kris (17/17) Nov 24 2004 "Anders F Björklund" wrote in message news:co1du1
- Anders F Björklund (6/15) Nov 24 2004 I copied the linux makefile to darwin.make, and tried it.
- Kris (31/31) Nov 24 2004 "Anders F Björklund" wrote in message
- Anders F Björklund (9/18) Nov 24 2004 Also, the Makefile seems a little broken since it recompiles everything?
- Kris (23/23) Nov 24 2004 "Anders F Björklund" wrote in message | Also, the Makef...
- Walter (10/44) Nov 23 2004 hooks.
- Stewart Gordon (5/18) Nov 24 2004
- Anders F Björklund (7/14) Nov 24 2004 Because it was a quick and dirty hack, with the
I've written some test code for encodings...
They take a mapping (wchar[256]) from ubyte,
which defines the 8-bit charset / encoding.
Then it can convert to and from Unicode.
(such as the default char[] strings in D)

The unoptimized D code looks like this:

    import std.utf;

    /// converts an 8-bit charset encoding string into unicode
    char[] decode_string(ubyte[] string, wchar[256] mapping)
    {
        wchar[] result;

        foreach (ubyte c; string)
        {
            // 0xFFFF marks a byte with no Unicode mapping; skip it
            if (mapping[c] != 0xFFFF)
                result ~= mapping[c];
        }

        return std.utf.toUTF8(result);
    }

    /// converts a unicode string into 8-bit charset encoding
    ubyte[] encode_string(char[] string, wchar[256] mapping)
    {
        ubyte[] result;

        foreach (wchar c; string)   // foreach decodes the UTF-8 input
        {
            // linear search of the table for the matching entry
            foreach (int i, wchar m; mapping)
            {
                if (c == m)
                {
                    result ~= cast(ubyte) i;
                    break;  // stop at the first match
                }
            }
        }

        return result;
    }

I added four mappings, just to have something to test with:
iso88591, cp1252, cp437, macroman
(each lookup table is 512 bytes, so that's 2K)

The ubyte[] can then be used as C (char *),
by nul-terminating as usual, for e.g. printf("%s")

It works just fine for both input and output, e.g. as Latin-1.

It should probably throw an exception or something like that,
when it encounters unmapped characters ?
(for instance: Win CP-1252 has 5 bytes with no Unicode mapping)

Surely someone must have written this before ?
Just that I couldn't find it in the libraries...

--anders

PS. The real code builds reverse lookup tables too.
    (one with chars < 0x0100, and one with the rest)

PPS. wchar[] versions left as exercise for the reader.
     They would avoid all the UTF-8 conversions above.
Nov 23 2004
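The two ideas floated in the post above (throwing on unmapped characters, and a reverse lookup table instead of a linear search) can be sketched as follows. This is only a sketch, assuming a current D compiler rather than the 2004 one. The names makeLatin1, buildReverse, decodeStrict and encodeStrict are hypothetical and do not come from the original code, and the identity Latin-1 table merely stands in for the real lookup tables taken from unicode.org.

```d
import std.utf : toUTF8;

/// Hypothetical stand-in table: ISO 8859-1 maps byte N to U+00NN.
wchar[256] makeLatin1()
{
    wchar[256] table;
    foreach (i, ref entry; table)
        entry = cast(wchar) i;
    return table;
}

/// Builds the reverse table (code unit -> byte), skipping 0xFFFF entries.
ubyte[wchar] buildReverse(const wchar[256] mapping)
{
    ubyte[wchar] reverse;
    foreach (i, m; mapping)
    {
        if (m != 0xFFFF)
            reverse[m] = cast(ubyte) i;
    }
    return reverse;
}

/// Decodes 8-bit input to UTF-8; throws instead of silently dropping bytes.
string decodeStrict(const(ubyte)[] input, const wchar[256] mapping)
{
    wchar[] result;
    foreach (b; input)
    {
        if (mapping[b] == 0xFFFF)
            throw new Exception("byte has no Unicode mapping");
        result ~= mapping[b];
    }
    return toUTF8(result);
}

/// Encodes UTF-8 input to the 8-bit charset; throws on unmappable characters.
ubyte[] encodeStrict(string input, ubyte[wchar] reverse)
{
    ubyte[] result;
    foreach (dchar c; input)    // foreach decodes UTF-8 to code points
    {
        if (c > 0xFFFF)
            throw new Exception("character is outside the table's range");
        const key = cast(wchar) c;
        auto slot = key in reverse;
        if (slot is null)
            throw new Exception("character has no 8-bit mapping");
        result ~= *slot;
    }
    return result;
}
```

With the Latin-1 stand-in table, the bytes 42 6A F6 72 6B decode to "Björk" and encode back to the same bytes, while a character such as U+20AC raises an exception. The associative array trades some memory for one lookup per character; the two-table layout mentioned in the PS would be a more compact alternative.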
There's a Microsoft API function to do this, I think it's WideCharToMultiByte() and MultiByteToWideChar().
Nov 23 2004
Walter wrote:

> There's a Microsoft API function to do this, I think it's
> WideCharToMultiByte() and MultiByteToWideChar().

But that's only on Windows, right ?

(got the lookups from unicode.org)

--anders
Nov 23 2004
"Anders F Björklund" <afb algonet.se> wrote in message
news:co0f4b$28ip$1 digitaldaemon.com...
> Walter wrote:
> > There's a Microsoft API function to do this, I think it's
> > WideCharToMultiByte() and MultiByteToWideChar().
>
> But that's only on Windows, right ?
>
> (got the lookups from unicode.org)

Right. I don't know what the corresponding linux API is.
Nov 23 2004
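On Linux and other POSIX systems, the usual counterpart to those two Win32 calls is iconv(3). A minimal sketch of calling it from D follows, assuming a current D compiler and glibc, where iconv is part of libc (on other platforms, Darwin included, linking against a separate libiconv may be needed). The extern (C) declarations are hand-written bindings for the standard POSIX functions, and toUtf8ViaIconv is a hypothetical helper name.

```d
// Hand-written bindings for the POSIX iconv functions.
extern (C) nothrow @nogc
{
    alias iconv_t = void*;
    iconv_t iconv_open(const(char)* tocode, const(char)* fromcode);
    size_t iconv(iconv_t cd, char** inbuf, size_t* inbytesleft,
                 char** outbuf, size_t* outbytesleft);
    int iconv_close(iconv_t cd);
}

import std.string : toStringz;

/// Converts `input` from the named charset to UTF-8 using iconv.
string toUtf8ViaIconv(const(ubyte)[] input, string fromCharset)
{
    auto cd = iconv_open("UTF-8", toStringz(fromCharset));
    if (cd == cast(iconv_t) size_t.max)     // (iconv_t) -1 signals failure
        throw new Exception("iconv_open failed for " ~ fromCharset);
    scope (exit) iconv_close(cd);

    // A code point needs at most 4 bytes in UTF-8, so this is enough
    // room for any 8-bit charset.
    auto output = new char[](input.length * 4 + 4);

    char* inPtr = cast(char*) input.ptr;    // iconv does not modify the input
    size_t inLeft = input.length;
    char* outPtr = output.ptr;
    size_t outLeft = output.length;

    if (iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft) == size_t.max)
        throw new Exception("iconv conversion failed");

    // Keep only the bytes that iconv actually wrote.
    return output[0 .. output.length - outLeft].idup;
}
```

Which charset names are accepted (for example "ISO-8859-1" or "CP1252") depends on the iconv implementation installed, so the name passed in should be treated as an assumption to check on the target system.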
| "Anders F Björklund" <afb algonet.se> wrote in message
| > Walter wrote:
| > > There's a Microsoft API function to do this, I think it's
| > > WideCharToMultiByte() and MultiByteToWideChar().
| >
| > But that's only on Windows, right ?
| >
| > (got the lookups from unicode.org)
|
| Right. I don't know what the corresponding linux API is.

Mango.io has optional bindings to any/all of the extensive ICU converters.
Stdio is covered there also, so it should probably handle the above case
without issue. I'd like to encourage folks to consider Mango.io and Mango.icu
as part of any Unicode oriented project. Naturally, I'm somewhat biased :-)

For those not familiar with Mango, it comprises a set of related packages
(the Mango Tree) including:

- Cohesive, type-safe, and highly extensible IO package. Now with ICU hooks.
  Supports all the D types along with all their array variants, and makes it
  trivial to bind your own classes directly to the IO layer. Provides both the
  put/get & <</>> syntactical flavors.

- Configurable runtime logging, a la Log4J, with a bonus HTML-based manager
  to dynamically adjust the settings of a running executable. Also hooks into
  Chainsaw for remote monitoring.

- Servlet engine. Supports the best parts of what the Java servlet spec
  provides, and has better IO.

- A customizable and extensible HTTP server (used by the servlet engine).
  Perhaps the fastest HTTP server available, since it can happily process
  requests without making a single memory allocation. Just goes to show what
  thread-locals and D array-slicing can do for performance! Also has a
  separate HttpClient.

- High performance clustering. Based loosely around a Linda design, with
  aspects of pub/sub and queuing mixed in. Uses D class-serialization to send
  objects around a cluster, and is easy to use.

- Wrappers around the extensive ICU (unicode) project. This currently covers
  around 85% of the ICU functionality, and includes a very usable
  unicode-enabled UString class.

These packages are available as separate libraries. That is, Mango.icu and
Mango.log can be used in complete isolation. Mango.io can also be used
standalone. Mango.cluster, Mango.http, Mango.servlet, and Mango.cache
leverage the IO package to one degree or another.

Beta 9.6 will be released before the week is out, and v1.0 of some packages
will occur shortly thereafter. You can find out more about Mango over here:
http://www.dsource.org/forums/
Nov 23 2004
Kris wrote:

> Mango.io has optional bindings to any/all of the extensive ICU converters.
> Stdio is covered there also, so it should probably handle the above case
> without issue. I'd like to encourage folks to consider Mango.io and Mango.icu
> as part of any Unicode oriented project. Naturally, I'm somewhat biased :-)

OK, will check it out. Only difference being: 12 MB versus 32 KB :-)
But there's probably other neat stuff in there, and had ICU already.

> These packages are available as separate libraries. That is, Mango.icu and
> Mango.log can be used in complete isolation. Mango.io can also be used
> standalone. Mango.cluster, Mango.http, Mango.servlet, and Mango.cache
> leverage the IO package to one degree or another.

Looks extensive! Wonder if it compiles on Darwin ? Hmm, no makefile...

--anders
Nov 23 2004
"Anders F Björklund" <afb algonet.se> wrote in message news:co1du1
|
| Looks extensive! Wonder if it compiles on Darwin ? Hmm, no makefile...
|
| --anders

I'm not sure that anyone has tried it on Darwin as yet. Perhaps the linux
makefile will work? This one is compatible with the Beta 9.5 download
(accessible via the dsource download section), and I'll update it tomorrow
with the Beta 9.6 equivalent (to match the current checkins)

http://svn.dsource.org/svn/projects/mango/trunk/

Given that the ICU stuff is so recent, it has not been linked to the *nix
libs. The effort to get there is a known (and limited) quantity, but hasn't
happened yet. Everything else compiles and links just fine on linux, and the
vast majority of it runs without issue (there is one known problem regarding
Mango.cluster on that platform).

If you'd perhaps be willing to lend a hand regarding Darwin (or with the ICU
bindings, or whatever else), that would be great! :-)
Nov 24 2004
Kris wrote:

> I'm not sure that anyone has tried it on Darwin as yet. Perhaps the linux
> makefile will work? This one is compatible with the Beta 9.5 download
> (accessible via the dsource download section), and I'll update it tomorrow
> with the Beta 9.6 equivalent (to match the current checkins)

I copied the linux makefile to darwin.make, and tried it.
Threw some errors and then gdc hung on FileConduit.d...

I think it was; will post the actual errors on the Mango forum.

> Given that the ICU stuff is so recent, it has not been linked to the *nix
> libs. The effort to get there is a known (and limited) quantity, but hasn't
> happened yet. Everything else compiles and links just fine on linux, and the
> vast majority of it runs without issue (there is one known problem regarding
> Mango.cluster on that platform).

Looks like most of it is POSIX-ish, should be compilable ?

--anders
Nov 24 2004
"Anders F Björklund" <afb algonet.se> wrote in message
news:co23ph$1k5f$1 digitaldaemon.com...
| Kris wrote:
|
| > I'm not sure that anyone has tried it on Darwin as yet. Perhaps the linux
| > makefile will work? This one is compatible with the Beta 9.5 download
| > (accessible via the dsource download section), and I'll update it tomorrow
| > with the Beta 9.6 equivalent (to match the current checkins)
|
| I copied the linux makefile to darwin.make, and tried it.
| Threw some errors and then gdc hung on FileConduit.d...
|
| I think it was; will post the actual errors on the Mango forum.

Thanks; I'll check it out ...

|
| > Given that the ICU stuff is so recent, it has not been linked to the *nix
| > libs. The effort to get there is a known (and limited) quantity, but hasn't
| > happened yet. Everything else compiles and links just fine on linux, and the
| > vast majority of it runs without issue (there is one known problem regarding
| > Mango.cluster on that platform).
|
| Looks like most of it is POSIX-ish, should be compilable ?

Yep. We have to provide a little bit of linker glue, in place of the Win32
DLL binding-mechanism. The file ULocale has an example of how this should
work. It's not much effort, but it just hasn't been done.
Nov 24 2004
Kris wrote:

> I'm not sure that anyone has tried it on Darwin as yet. Perhaps the linux
> makefile will work? This one is compatible with the Beta 9.5 download
> (accessible via the dsource download section), and I'll update it tomorrow
> with the Beta 9.6 equivalent (to match the current checkins)

Also, the Makefile seems a little broken since it recompiles everything?
It should reference the object files, and not the source code directly.

Something like:

    %.o : %.d
    	$(DMD) -c $(DFLAGS) -o $@ $<

    libmango.a : $(OBJECTS)
    	$(AR) -r $@ $(OBJECTS)

Perhaps adapted to use the $(OBJ) dir?

"all", "clean" and "install" targets seem to be missing, by the way.
They are phony targets that just reference the others or run shell commands.

One could also add a "check" target, that would run the unit-tests...

--anders
Nov 24 2004
"Anders F Björklund" <afb algonet.se> wrote in message
| Also, the Makefile seems a little broken since it recompiles everything?
| It should reference the object files, and not the source code directly.
|
| Something like:
|
| > %.o : %.d
| > 	$(DMD) -c $(DFLAGS) -o $@ $<
| >
| > libmango.a : $(OBJECTS)
| > 	$(AR) -r $@ $(OBJECTS)
|
| Perhaps adapted to use the $(OBJ) dir?

That's because it's often faster to recompile everything than doing it
piecemeal :-)

One of the benefits of D is the speed at which it ploughs through source,
leaving tools like make in its wake (so to speak). The latest Win32 make file
does things somewhat differently, and is more along the lines of which you
speak (builds things a package at a time, rather than the whole enchilada),
and the linux makefile is expected to migrate to a similar strategy.

There again, I have limited experience with make; and would be more than
happy if someone were to do it properly.
Nov 24 2004
That's good work!

"Kris" <fu bar.com> wrote in message
news:co0pv7$2oh2$1 digitaldaemon.com...
> Mango.io has optional bindings to any/all of the extensive ICU converters.
> Stdio is covered there also, so it should probably handle the above case
> without issue. I'd like to encourage folks to consider Mango.io and Mango.icu
> as part of any Unicode oriented project. Naturally, I'm somewhat biased :-)
>
> For those not familiar with Mango, it comprises a set of related packages
> (the Mango Tree) including:
>
> - Cohesive, type-safe, and highly extensible IO package. Now with ICU hooks.
>   Supports all the D types along with all their array variants, and makes it
>   trivial to bind your own classes directly to the IO layer. Provides both the
>   put/get & <</>> syntactical flavors.
>
> - Configurable runtime logging, a la Log4J, with a bonus HTML-based manager
>   to dynamically adjust the settings of a running executable. Also hooks into
>   Chainsaw for remote monitoring.
>
> - Servlet engine. Supports the best parts of what the Java servlet spec
>   provides, and has better IO.
>
> - A customizable and extensible HTTP server (used by the servlet engine).
>   Perhaps the fastest HTTP server available, since it can happily process
>   requests without making a single memory allocation. Just goes to show what
>   thread-locals and D array-slicing can do for performance! Also has a
>   separate HttpClient.
>
> - High performance clustering. Based loosely around a Linda design, with
>   aspects of pub/sub and queuing mixed in. Uses D class-serialization to send
>   objects around a cluster, and is easy to use.
>
> - Wrappers around the extensive ICU (unicode) project. This currently covers
>   around 85% of the ICU functionality, and includes a very usable
>   unicode-enabled UString class.
>
> These packages are available as separate libraries. That is, Mango.icu and
> Mango.log can be used in complete isolation. Mango.io can also be used
> standalone. Mango.cluster, Mango.http, Mango.servlet, and Mango.cache
> leverage the IO package to one degree or another.
>
> Beta 9.6 will be released before the week is out, and v1.0 of some packages
> will occur shortly thereafter. You can find out more about Mango over here:
> http://www.dsource.org/forums/
Nov 23 2004
Anders F Björklund wrote:

> I've written some test code for encodings...
> They take a mapping (wchar[256]) from ubyte,
> which defines the 8-bit charset / encoding.
> Then it can convert to and from Unicode.
> (such as the default char[] strings in D)
>
> The unoptimized D code looks like this:
>
> /// converts a 8-bit charset encoding string into unicode
> char[] decode_string(ubyte[] string, wchar[256] mapping)
<snip>

Why restrict yourself to 8-bit character sets that don't include U+10000 or
above?

Stewart.
Nov 24 2004
Stewart Gordon wrote:

>> I've written some test code for encodings...
>> They take a mapping (wchar[256]) from ubyte,
>> which defines the 8-bit charset / encoding.
>
> Why restrict yourself to 8-bit character sets that don't include U+10000 or
> above?

Because it was a quick and dirty hack, with the
sole purpose of being able to provide input and
output with consoles that don't talk Unicode...

ICU has a better "full" implementation of this ?
(as used by the Mango library posted here earlier)

--anders
Nov 24 2004
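One way to lift the restriction Stewart asks about would be to widen the table entries from wchar (one UTF-16 code unit) to dchar (a full code point). The sketch below assumes a current D compiler; decodeWide is a hypothetical name, and the table layout is an illustration rather than anything from the original code. Each table grows from 512 bytes to 1 KB.

```d
import std.utf : toUTF8;

/// Decodes 8-bit input through a table of full code points (dchar),
/// so entries at U+10000 and above are representable.
string decodeWide(const(ubyte)[] input, const dchar[256] mapping)
{
    dchar[] result;
    foreach (b; input)
    {
        // 0xFFFF still marks "no mapping", as in the original post
        if (mapping[b] != 0xFFFF)
            result ~= mapping[b];
    }
    return toUTF8(result);
}
```

A table entry such as U+1D11E then comes out as a four-byte UTF-8 sequence, which a single wchar entry could not express without a surrogate pair.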