digitalmars.D - BitArray implementation issue
- safety0ff (35/35) Jul 22 2014 Currently the implementation of BitArray's length setter is badly
- H. S. Teoh via Digitalmars-d (23/60) Jul 22 2014 You might want to consider implementing a way of tracking how many bits
- safety0ff (6/18) Jul 22 2014 In brief, you're suggesting to always realloc on extension, no
- safety0ff (2/3) Jul 22 2014 Nevermind this. :o)
- safety0ff (3/6) Jul 22 2014 Actually, you'd need to allocate and copy on each extension, not
- safety0ff (6/15) Jul 22 2014 Sorry, being too eager to reply.
- H. S. Teoh via Digitalmars-d (9/16) Jul 22 2014 Hmm. It looks like the code is already counting bits rather than words.
Currently the implementation of BitArray's length setter is badly broken. Its implementation contains this code segment: if (newdim != olddim) { auto b = ptr[0 .. olddim]; b.length = newdim; // realloc ptr = b.ptr; if (newdim & (bitsPerSizeT-1)) // Set any pad bits to 0 ptr[newdim - 1] &= ~(~0 << (newdim & (bitsPerSizeT-1))); } There are many bugs in the last 2 lines: 1) If we're increasing the length, ptr[newdim-1] is necessarily zero, so we're doing nothing. 2) If we're reducing the length, we're essentially clearing arbitrary bits whenever the branch is taken, furthermore: 3) On 64 bit, we're always clearing the upper 32 bits of the last word. 4) On 64 bit, we have undefined behaviour from the left shift. While trying to fix this the question was raised: should we even clear any bits? The underlying array can be shared between BitArray's (assignment behaves like dynamic array assignment.) This means that clearing the bits might affect other BitArray's if the extension does not cross a size_t boundary. So it is difficult to mimic D dynamic array semantics at a bit level without a reference counting mechanism. Furthermore, a BitArray can function as an arbitrary data reinterpreter via the function: void init(void[] v, size_t numbits) Help is needed for deciding whether setting length should zero the new bits (even if it means stomping other BitArrays.) Currently I've implemented zero'ing bits upon extension in my PR [1], but it is stalled on this design issue. [1] https://github.com/D-Programming-Language/phobos/pull/2249
Jul 22 2014
On Tue, Jul 22, 2014 at 09:16:35PM +0000, safety0ff via Digitalmars-d wrote:Currently the implementation of BitArray's length setter is badly broken. Its implementation contains this code segment: if (newdim != olddim) { auto b = ptr[0 .. olddim]; b.length = newdim; // realloc ptr = b.ptr; if (newdim & (bitsPerSizeT-1)) // Set any pad bits to 0 ptr[newdim - 1] &= ~(~0 << (newdim & (bitsPerSizeT-1))); } There are many bugs in the last 2 lines: 1) If we're increasing the length, ptr[newdim-1] is necessarily zero, so we're doing nothing. 2) If we're reducing the length, we're essentially clearing arbitrary bits whenever the branch is taken, furthermore: 3) On 64 bit, we're always clearing the upper 32 bits of the last word. 4) On 64 bit, we have undefined behaviour from the left shift.Wow. Looks like this is in dire need of some TLC.While trying to fix this the question was raised: should we even clear any bits?You might want to consider implementing a way of tracking how many bits in the final word are valid. That way, you can correctly trigger reallocation if the user tries to write to a bit beyond the current end of the array (instead of stomping over other BitArray's data), and you don't have to clear any bits except newly-allocated words. Assuming we want to keep dynamic-array-like behaviour for BitArray, that is. I think it's a good idea.The underlying array can be shared between BitArray's (assignment behaves like dynamic array assignment.) This means that clearing the bits might affect other BitArray's if the extension does not cross a size_t boundary. So it is difficult to mimic D dynamic array semantics at a bit level without a reference counting mechanism.I think it's best to keep a bit count of the number of valid bits at the end of the array, in addition to the number of words in the underlying array. You might also want to consider keeping a bit count of the number of valid bits at the *start* of the array, so that it's possible to bit-level slicing in a transparent way.Furthermore, a BitArray can function as an arbitrary data reinterpreter via the function: void init(void[] v, size_t numbits) Help is needed for deciding whether setting length should zero the new bits (even if it means stomping other BitArrays.)It shouldn't. Keeping track of the number of valid bits in the last word should solve this problem.Currently I've implemented zero'ing bits upon extension in my PR [1], but it is stalled on this design issue. [1] https://github.com/D-Programming-Language/phobos/pull/2249I think the best solution is to keep an additional count of the number of valid bits in the last word of the array. Then you don't have to worry about stomping other BitArrays, and you can potentially implement bit-level slicing too. T -- IBM = I Blame Microsoft
Jul 22 2014
On Wednesday, 23 July 2014 at 00:59:38 UTC, H. S. Teoh via Digitalmars-d wrote:You might want to consider implementing a way of tracking how many bits in the final word are valid. That way, you can correctly trigger reallocation if the user tries to write to a bit beyond the current end of the array (instead of stomping over other BitArray's data), and you don't have to clear any bits except newly-allocated words. Assuming we want to keep dynamic-array-like behaviour for BitArray, that is. I think it's a good idea.In brief, you're suggesting to always realloc on extension, no matter if it's a sub-word extension. This solves the stomping issue nicely, but it would cause a lot of GC churn in a concatenation loop.
Jul 22 2014
On Wednesday, 23 July 2014 at 02:33:39 UTC, safety0ff wrote:but it would cause a lot of GC churn in a concatenation loop.Nevermind this. :o)
Jul 22 2014
On Wednesday, 23 July 2014 at 02:33:39 UTC, safety0ff wrote:In brief, you're suggesting to always realloc on extension, no matter if it's a sub-word extension. This solves the stomping issue nicelyActually, you'd need to allocate and copy on each extension, not realloc, because realloc would do nothing on subword extension.
Jul 22 2014
On Wednesday, 23 July 2014 at 00:59:38 UTC, H. S. Teoh via Digitalmars-d wrote:I think it's best to keep a bit count of the number of valid bits at the end of the array, in addition to the number of words in the underlying array. You might also want to consider keeping a bit count of the number of valid bits at the *start* of the array, so that it's possible to bit-level slicing in a transparent way.Sorry, being too eager to reply. We would need to use a secondary pointer to the count to avoid breaking code that uses the void init(void[], size_t) function which allows arbitrary bit twiddling of the passed in array.
Jul 22 2014
On Tue, Jul 22, 2014 at 05:57:48PM -0700, H. S. Teoh via Digitalmars-d wrote: [....]You might want to consider implementing a way of tracking how many bits in the final word are valid. That way, you can correctly trigger reallocation if the user tries to write to a bit beyond the current end of the array (instead of stomping over other BitArray's data), and you don't have to clear any bits except newly-allocated words. Assuming we want to keep dynamic-array-like behaviour for BitArray, that is. I think it's a good idea.Hmm. It looks like the code is already counting bits rather than words. So you should already have enough information to implement the correct solution (take the current length modulo the number of bits per word, and that tells you which bits in the last word are valid). T -- Let's not fight disease by killing the patient. -- Sean 'Shaleh' Perry
Jul 22 2014