digitalmars.D - Unichar optimization: performance vs RAM usage. Your opinion?
- Hauke Duden (19/19) Jun 08 2004 I had an idea last night on how to optimize the unichar module to only
- Arcane Jill (9/10) Jun 08 2004 Huh? There are over a million codepoints in Unicode (1,114,112, in fact)...
- Hauke Duden (9/22) Jun 08 2004 Not quite ;). If you look at the data then you'll see that only the
- Kevin Bealer (7/29) Jun 09 2004 I've seen the idea of a two-level table mentioned on one of the Unicode ...
- Arcane Jill (8/17) Jun 08 2004 No bitshifting. You just need tables for blocks of 256 characters, then ...
- Arcane Jill (4/10) Jun 08 2004 That's assuming little-endian architecture, of course. You'd need two ve...
- Hauke Duden (9/35) Jun 08 2004 Splitting the data into blocks and using the same memory for equal
- Walter (4/4) Jun 08 2004 It's such a huge reduction in memory usage, I'd go for it. Reducing memo...
- DemmeGod (4/8) Jun 08 2004 I think what Walter means to say is that it'll never make it into Phobos
I had an idea last night on how to optimize the unichar module to use only about 50 KB instead of the current 2 MB of RAM at runtime. It may also be possible to reduce the impact on the executable size (12 KB currently), but I haven't worked that out fully yet.

Unfortunately the change would make the lookup code slower. For example, translating "chr" to lower case would have to be changed from

    return lowerData[chr];

to

    return chr + lowerData[chr>>9][chr & 511];

In other words, instead of one memory lookup it will do two memory lookups, a shift, an AND and an addition. It is not THAT expensive, but in an inner loop it might add up. And people have lots of RAM nowadays - 2 MB is a small amount, and since only small parts of it will actually be used (depending on the language the program is working with), most of it will be paged to disk by the OS.

So what do you think: is the lower RAM usage worth the performance decrease? I could also implement both versions and select one at compile time, but then the question is which one should be the default.

Hauke
Jun 08 2004
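As a rough sketch of the two-level layout Hauke describes (the names lowerDelta and toLowerSketch, the 512-entry block size, and the short element type are assumptions for illustration, not the actual unichar code):

    // Sketch only.  lowerDelta has one slice per block of 512 codepoints;
    // blocks with identical contents share the same slice, which is where
    // the memory saving comes from.  Each entry is the signed offset to add
    // to a codepoint to reach its lower-case form (0 where no mapping exists).
    immutable(short)[][] lowerDelta;

    dchar toLowerSketch(dchar chr)
    {
        // two lookups, a shift, an AND and an addition, as described above
        return cast(dchar)(chr + lowerDelta[chr >> 9][chr & 511]);
    }

The flat variant is just the single read shown in the post; the whole trade-off is that one extra indirection per character.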
In article <ca3vvv$gpf$1 digitaldaemon.com>, Hauke Duden says...

> return lowerData[chr]

Huh? There are over a million codepoints in Unicode (1,114,112, in fact). That must be a ----ing HUGE lookup table! Even if the data were only one byte per codepoint, that's a meg already.

Of course it's reasonable to compress that. There are huge swathes of codepoints that are just totally unused (yet), and many, many blocks in which all characters have identical properties. One humungous lookup table is definitely not how I would have done it.

Maybe I'm just not enough of a speed freak.

Jill
Jun 08 2004
Arcane Jill wrote:
> In article <ca3vvv$gpf$1 digitaldaemon.com>, Hauke Duden says...
>> return lowerData[chr]
> Huh? There are over a million codepoints in Unicode (1,114,112, in fact).
> That must be a ----ing HUGE lookup table! Even if the data were only one
> byte per codepoint, that's a meg already.

Not quite ;). If you look at the data then you'll see that only the first ~200,000 actually have case mappings. That translates to about 2 MB of needed RAM (for all tables together).

> Of course it's reasonable to compress that. There are huge swathes of
> codepoints that are just totally unused (yet), and many, many blocks in
> which all characters have identical properties. One humungous lookup table
> is definitely not how I would have done it. Maybe I'm just not enough of a
> speed freak.

Dividing the Unicode range into pages and using the same memory for identical ones is exactly the optimization I am thinking about. The question is whether the modest (by today's standards) reduction in needed RAM is worth the decrease in performance.

Hauke
Jun 08 2004
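A sketch of how the page sharing Hauke mentions could be done when the tables are generated, assuming the raw per-codepoint deltas are already available in a flat array (all names here are invented for illustration):

    // Build-time sketch: fold blocks of 512 deltas into a table that reuses
    // identical blocks.  Assumes rawDeltas.length is a multiple of 512; a
    // real generator would also handle the remainder.
    enum BLOCK = 512;

    immutable(short)[][] buildSharedBlocks(const(short)[] rawDeltas)
    {
        immutable(short)[][] table;
        size_t[immutable(short)[]] seen;   // block contents -> index of first copy

        for (size_t base = 0; base < rawDeltas.length; base += BLOCK)
        {
            auto block = rawDeltas[base .. base + BLOCK].idup;
            if (auto p = block in seen)
                table ~= table[*p];        // identical page: reuse the same memory
            else
            {
                seen[block] = table.length;
                table ~= block;
            }
        }
        return table;
    }

Since most of the 2,176 blocks covering the 0..0x10FFFF range end up identical (all-zero deltas for unassigned or case-less codepoints), only a handful of distinct blocks remain, which is presumably where the ~50 KB figure in the original post comes from.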
In article <ca4aap$11jk$1 digitaldaemon.com>, Hauke Duden says...
> Arcane Jill wrote:
>> Huh? There are over a million codepoints in Unicode (1,114,112, in fact).
>> That must be a ----ing HUGE lookup table! Even if the data were only one
>> byte per codepoint, that's a meg already.
> Not quite ;). If you look at the data then you'll see that only the first
> ~200,000 actually have case mappings. That translates to about 2 MB of
> needed RAM (for all tables together).
>> Of course it's reasonable to compress that. There are huge swathes of
>> codepoints that are just totally unused (yet), and many, many blocks in
>> which all characters have identical properties. One humungous lookup
>> table is definitely not how I would have done it. Maybe I'm just not
>> enough of a speed freak.
> Dividing the Unicode range into pages and using the same memory for
> identical ones is exactly the optimization I am thinking about. The
> question is whether the modest (by today's standards) reduction in needed
> RAM is worth the decrease in performance.

I've seen the idea of a two-level table mentioned on one of the Unicode sites; it was one of the recommended implementations for a lookup table. I used a version of this at IBM to handle ASCII/EBCDIC/Unicode translation.

I think doing an extra table lookup per character will not be a big issue, since the first table (the pointers to the other tables) will probably stay in cache.

Kevin
Jun 09 2004
In article <ca3vvv$gpf$1 digitaldaemon.com>, Hauke Duden says...
> return chr + lowerData[chr>>9][chr & 511];

You can get faster than that.

    union U { dchar c; ubyte[3] n; }
    U u;
    u.c = chr;
    return lowerData[u.n[2]][u.n[1]][u.n[0]];

No bitshifting. You just need tables for blocks of 256 characters, then tables of 256 pointers to these, then a table of 17 pointers to those. Duplicate blocks (in particular, those for unassigned codepoints) can simply be the same block.

Well, that's how I would have done it.

Jill
Jun 08 2004
In article <ca46cu$rfj$1 digitaldaemon.com>, Arcane Jill says...
> union U { dchar c; ubyte[3] n; }

That's assuming little-endian architecture, of course. You'd need two versions to cover both possibilities.

Jill
Jun 08 2004
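For what it's worth, one way to cover both byte orders in D is the predefined LittleEndian/BigEndian version identifiers. The sketch below assumes a three-level table like the one Jill describes (lowerData3 is an invented name) and widens the union to four bytes so the right index bytes exist on big-endian machines as well:

    // Assumed layout (invented name): plane (0..16), then middle byte,
    // then low byte; duplicate sub-tables are shared, as Jill describes.
    dchar[][][] lowerData3;

    dchar toLowerThreeLevel(dchar chr)
    {
        union U { dchar c; ubyte[4] n; }
        U u;
        u.c = chr;
        version (LittleEndian)
            return lowerData3[u.n[2]][u.n[1]][u.n[0]];
        else
            return lowerData3[u.n[1]][u.n[2]][u.n[3]];
    }

Extracting the indices with shifts instead (chr >> 16, (chr >> 8) & 0xFF, chr & 0xFF) would sidestep the byte-order question entirely; whether the shifts or the memory accesses dominate is exactly what the next post gets into.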
Arcane Jill wrote:
> In article <ca3vvv$gpf$1 digitaldaemon.com>, Hauke Duden says...
>> return chr + lowerData[chr>>9][chr & 511];
>
> You can get faster than that.
>
>     union U { dchar c; ubyte[3] n; }
>     U u;
>     u.c = chr;
>     return lowerData[u.n[2]][u.n[1]][u.n[0]];
>
> No bitshifting. You just need tables for blocks of 256 characters, then
> tables of 256 pointers to these, then a table of 17 pointers to those.
> Duplicate blocks (in particular, those for unassigned codepoints) can
> simply be the same block. Well, that's how I would have done it.

Splitting the data into blocks and using the same memory for equal blocks is indeed the main idea behind the optimization. But doing 3 memory lookups with byte indices is probably slower than doing the bit shift and 2 lookups. Memory is usually a bottleneck and shifts have been 1-cycle operations for ages (even on the 386 IIRC). Pipelining is also possible to some degree. Also, byte-based operations are slower than int-based ones on many CPUs.

Hauke
Jun 08 2004
It's such a huge reduction in memory usage, I'd go for it. Reducing memory consumption has its own associated performance speedups - less swapping, less memory for the gc to scan, etc. Keep the faster version there with a version {} statement, though, in case experience shows otherwise.
Jun 08 2004
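A sketch of what the version {} arrangement Walter suggests might look like, with invented names throughout (the identifier UnicharFastTables would be selected by building with dmd -version=UnicharFastTables):

    version (UnicharFastTables)
    {
        immutable(dchar)[] lowerDataFlat;   // one entry per codepoint (flat, large)

        dchar toLower(dchar chr)
        {
            return lowerDataFlat[chr];      // one lookup
        }
    }
    else
    {
        immutable(short)[][] lowerDelta;    // shared 512-entry delta blocks (compact)

        dchar toLower(dchar chr)
        {
            return cast(dchar)(chr + lowerDelta[chr >> 9][chr & 511]);   // two lookups
        }
    }

With the compact tables in the else branch they become the default, which answers the open question from the original post: whoever wants the flat tables back has to ask for them explicitly.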
On Tue, 08 Jun 2004 09:23:08 -0700, Walter wrote:
> It's such a huge reduction in memory usage, I'd go for it. Reducing memory
> consumption has its own associated performance speedups - less swapping,
> less memory for the gc to scan, etc. Keep the faster version there with a
> version {} statement, though, in case experience shows otherwise.

I think what Walter means to say is that it'll never make it into Phobos using 2 MB of RAM. Yes?

John
Jun 08 2004