digitalmars.D - Short strings optimization
- bearophile (8/8) Feb 18 2011 I've found this simple blog post:
I've found this simple blog post: http://journal.thobe.org/2011/02/better-support-for-short-strings-in.html It suggests the creation of a ShortArray struct, that like arrays is size_t.sizeof*2, that's usable for short arrays and short strings. In 64 bit mode that's 16 bytes long, just like a dynamic array. The higher bit of the struct may denote a "short array". If it's 0, then the the second nibble of the first byte represents the length (even if it's an array of bytes its length can't be more than 15), otherwise the struct contains a length/ptr pair, like a normal dynamic array. This is able to remove many heap allocations (without any compression or tricks), because in programs a good percentage of strings are short. Every time an item of the array is accessed, the struct opIndex/opIndexAssign has to test the first bit, this lowers the performance, but there are no pointers to follow in other parts of the memory, so the overall performance is probably not much different, and if the array is in RAM instead of caches, it may be even faster. The ptr is not always a pointer, so the GC needs to be aware of this and do not follow the pointer if the first bit of the struct is 0. To manage this cleanly, it's useful a standard method like onGCScan() I have suggested time ago, to specify the runtime semantics for the garbage collector. Bye, bearophile
Feb 18 2011