digitalmars.D - Has AA a bad implementation?
- frame (5/14) Feb 12 2021 I see this is for performance but it also means that an
- welkam (4/19) Feb 12 2021 There can be slices that point to the memory owned by this array
- frame (9/12) Feb 12 2021 The array is overwritten with 0 so any pointers are lost anyway.
- Steven Schveighoffer (14/29) Feb 12 2021 There's a misunderstanding here. The bucket array is not the data
- frame (8/14) Feb 12 2021 No, I'm talking about those Buckets.
- Steven Schveighoffer (16/32) Feb 12 2021 Your numbers are a bit high, but I don't know what else you would
- Joseph Rushton Wakeling (11/14) Feb 12 2021 Do we necessarily want that shrinkage by default? If an AA has
- Steven Schveighoffer (10/24) Feb 12 2021 The shrinkage happens by default if you remove an element and it goes
- frame (9/25) Feb 12 2021 I know.
- Paul Backus (3/8) Feb 12 2021 Think of it this way: when you "clear" a room, you remove all of
- Steven Schveighoffer (28/55) Feb 13 2021 Yes, it's the same.
- Elronnd (5/8) Feb 16 2021 Removing all the keys will trigger resizes, freeing the memory.
Talking about this: aaA.d: 157void clear() pure nothrow { import core.stdc.string : memset; // clear all data, but don't change bucket array length memset(&buckets[firstUsed], 0, (buckets.length - firstUsed) * Bucket.sizeof); deleted = used = 0; firstUsed = cast(uint) dim; }I see this is for performance but it also means that an associative array never frees the memory it has allocated for life time - or I'm wrong? I am missing a real empty() method here.
Feb 12 2021
On Friday, 12 February 2021 at 12:41:32 UTC, frame wrote:Talking about this: aaA.d: 157There can be slices that point to the memory owned by this array so you cant just free it. If you want manually manage the memory then you should look at dub for different containers.void clear() pure nothrow { import core.stdc.string : memset; // clear all data, but don't change bucket array length memset(&buckets[firstUsed], 0, (buckets.length - firstUsed) * Bucket.sizeof); deleted = used = 0; firstUsed = cast(uint) dim; }I see this is for performance but it also means that an associative array never frees the memory it has allocated for life time - or I'm wrong? I am missing a real empty() method here.
Feb 12 2021
On Friday, 12 February 2021 at 12:48:30 UTC, welkam wrote:There can be slices that point to the memory owned by this array so you cant just free it. If you want manually manage the memory then you should look at dub for different containers.The array is overwritten with 0 so any pointers are lost anyway. Yeah, I can use any element I want. But my problem is that any thirdparty code may not care about. In any static context it produces leaks. This behavior is not documented - anyone thinks that clear() means clear, not: fill with 0. In some context it's needed to reduce memory. As a standard element it should at least have an explicit ability to free memory when it's really desired.
Feb 12 2021
On 2/12/21 3:35 PM, frame wrote:On Friday, 12 February 2021 at 12:48:30 UTC, welkam wrote:There's a misunderstanding here. The bucket array is not the data itself, its just basically a pointer and the hash. The Bucket struct is here: https://github.com/dlang/druntime/blob/a026ea46088b369f1182c397e7e046f5fc950358/src/rt/aaA.d#L173 When you use clear, it removes the pointers to all the existing key/value pairs. Those data items are separate GC blocks. If you no longer have any pointers to them, they will be garbage collected. If you want the bucket memory itself to be freed on the next GC cycle, you can just set the AA to null. But the bucket memory shouldn't be that cumbersome. I don't think there's any current way to explicitly free all the items in the AA. Even aa.remove(key) is not going to free the item. -SteveThere can be slices that point to the memory owned by this array so you cant just free it. If you want manually manage the memory then you should look at dub for different containers.The array is overwritten with 0 so any pointers are lost anyway. Yeah, I can use any element I want. But my problem is that any thirdparty code may not care about. In any static context it produces leaks. This behavior is not documented - anyone thinks that clear() means clear, not: fill with 0. In some context it's needed to reduce memory. As a standard element it should at least have an explicit ability to free memory when it's really desired.
Feb 12 2021
On Friday, 12 February 2021 at 20:59:50 UTC, Steven Schveighoffer wrote:There's a misunderstanding here. The bucket array is not the data itself, its just basically a pointer and the hash. The Bucket struct is here:No, I'm talking about those Buckets.If you want the bucket memory itself to be freed on the next GC cycle, you can just set the AA to null. But the bucket memory shouldn't be that cumbersome.Maybe not, but it comes in mind for daemon applications. 10000 items are consuming 600 ~ 800K each array. Yes, nulling helps to get rid of the Bucket list. Also destroy(). But that's not so intentional than using clear() - this is my point.
Feb 12 2021
On 2/12/21 4:56 PM, frame wrote:On Friday, 12 February 2021 at 20:59:50 UTC, Steven Schveighoffer wrote:Your numbers are a bit high, but I don't know what else you would expect. If you have an AA with 10000 elements, it's going to consume a multiple of that number in memory.There's a misunderstanding here. The bucket array is not the data itself, its just basically a pointer and the hash. The Bucket struct is here:No, I'm talking about those Buckets.If you want the bucket memory itself to be freed on the next GC cycle, you can just set the AA to null. But the bucket memory shouldn't be that cumbersome.Maybe not, but it comes in mind for daemon applications. 10000 items are consuming 600 ~ 800K each array.Yes, nulling helps to get rid of the Bucket list. Also destroy(). But that's not so intentional than using clear() - this is my point.The intent of clear is to reset the array so I DON'T have to reallocate the bucket array. Before clear existed, you had to remove each element one at a time, and then it would resize down throwing away all that space. If your use case is to clear an aa to fill it up again, then clear is perfect for that. It also has implications if you have multiple references to the same AA. Note, I thought rehash would shrink the bucket array down, but it looks like it ignores that when the AA is empty. That seems like an omission. Other types in Phobos treat clear the same way (e.g. appender). I think you are just misunderstanding what clear is for. It means, reset but don't deallocate. -Steve
Feb 12 2021
On Friday, 12 February 2021 at 22:23:45 UTC, Steven Schveighoffer wrote:Note, I thought rehash would shrink the bucket array down, but it looks like it ignores that when the AA is empty. That seems like an omission.Do we necessarily want that shrinkage by default? If an AA has had a certain size in the past, and then is cleared, it seems likely that the typical use case is that it will fill up to a similar size again. For comparison, does `rehash` shrink the bucket array when the AA is non-empty but much, much smaller than the bucket array? I don't object to the possibility to shrink the bucket array where possible but it should probably require an explicit and unambiguous request.
Feb 12 2021
On 2/12/21 6:13 PM, Joseph Rushton Wakeling wrote:On Friday, 12 February 2021 at 22:23:45 UTC, Steven Schveighoffer wrote:The shrinkage happens by default if you remove an element and it goes below a threshold. It does not shrink on clear (on purpose, for the reason you say).Note, I thought rehash would shrink the bucket array down, but it looks like it ignores that when the AA is empty. That seems like an omission.Do we necessarily want that shrinkage by default? If an AA has had a certain size in the past, and then is cleared, it seems likely that the typical use case is that it will fill up to a similar size again.For comparison, does `rehash` shrink the bucket array when the AA is non-empty but much, much smaller than the bucket array?Yes. If it has one element, then it will resize the array.I don't object to the possibility to shrink the bucket array where possible but it should probably require an explicit and unambiguous request.I think it already does shrink for a rehash. It's just set up to not shrink when the AA is "empty". I'm not really sure if it's a problem, just a weird inconsistency. Technically nobody should care about the bucket array, it's an implementation detail. -Steve
Feb 12 2021
On Friday, 12 February 2021 at 22:23:45 UTC, Steven Schveighoffer wrote:On 2/12/21 4:56 PM, frame wrote:The intent of clear is to reset the array so I DON'T have to reallocate the bucket array.I know.Other types in Phobos treat clear the same way (e.g. appender).Is this really the same? This looks more like clearing in Appender:void clear() trusted pure nothrow { if (_data) { _data.arr = _data.arr.ptr[0 .. 0]; } }I think you are just misunderstanding what clear is for. It means, reset but don't deallocate.So it should be called reset() or blank() then.I'm not really sure if it's a problem, just a weird inconsistency. Technically nobody should care about the bucket array, it's an implementation detail.I just don't like a widely used but leaky standard element that imply it does not. Better correct than error. But ok, it's not a big deal. Should be maybe documented that null can be useful.
Feb 12 2021
On Saturday, 13 February 2021 at 03:43:13 UTC, frame wrote:On Friday, 12 February 2021 at 22:23:45 UTC, Steven Schveighoffer wrote:Think of it this way: when you "clear" a room, you remove all of its current occupants, but you do not demolish the room itself.I think you are just misunderstanding what clear is for. It means, reset but don't deallocate.So it should be called reset() or blank() then.
Feb 12 2021
On 2/12/21 10:43 PM, frame wrote:On Friday, 12 February 2021 at 22:23:45 UTC, Steven Schveighoffer wrote:Yes, it's the same. Appender uses knowledge of the array allocated length, the _data.arr slice is the current "valid" portion. Your request is the the bucket list is deallocated. Appender does not deallocate the array, as it still deliberately retains a pointer to the GC block.On 2/12/21 4:56 PM, frame wrote:Other types in Phobos treat clear the same way (e.g. appender).Is this really the same? This looks more like clearing in Appender:void clear() trusted pure nothrow { if (_data) { _data.arr = _data.arr.ptr[0 .. 0]; } }You are not the first to interpret the meaning of words differently. These are not winnable arguments. Just learn what it does now, not what you think it should do. And if you want a feature to do what you want, propose it under a different name. The name "clear" is not changing.I think you are just misunderstanding what clear is for. It means, reset but don't deallocate.So it should be called reset() or blank() then.Setting to null is useful, but only if you have one reference to the AA. It won't deallocate the buckets if you have more than one reference. e.g.: int[int] aa = [1 : 2, 3 : 4]; auto bb = aa; aa = null; assert(bb.length == 2); // still there aa = bb; aa.clear; assert(bb.length == 0); // all references are now empty, but bucket array is still fully there There is probably room to have a mechanism to destroy the buckets for all references. Looking at destroy, it looks like it's equivalent to setting equal to null. Which is consistent, but I'm surprised there's no "recursive destroy" type function, even for normal arrays. -SteveI'm not really sure if it's a problem, just a weird inconsistency. Technically nobody should care about the bucket array, it's an implementation detail.I just don't like a widely used but leaky standard element that imply it does not. Better correct than error. But ok, it's not a big deal. Should be maybe documented that null can be useful.
Feb 13 2021
On Friday, 12 February 2021 at 20:59:50 UTC, Steven Schveighoffer wrote:I don't think there's any current way to explicitly free all the items in the AA. Even aa.remove(key) is not going to free the item.Removing all the keys will trigger resizes, freeing the memory. But doing that individually for each key incurs overhead. Why not expose such functionality?
Feb 16 2021