digitalmars.D - Associative array
- Imperatorn (22/22) Oct 13 2023 Hello there.
- bachmeier (3/25) Oct 13 2023 It's something I've wanted. There was a discussion this summer:
- Nick Treleaven (4/11) Oct 14 2023 There was a PR for that but it wasn't merged. It only reserved
- Imperatorn (3/17) Oct 14 2023 Interesting. I will read more about it. Pros and cons etc. Thanks
- Steven Schveighoffer (11/27) Oct 15 2023 Yeah, as identified by Andrei (and myself), reserving for an AA means
- Jonathan M Davis (21/45) Oct 14 2023 My main concern here would be that we not add calls which relate to how ...
- bachmeier (4/28) Oct 14 2023 Someone wanting to work on this could build on the EMSI hashmap:
- Imperatorn (8/24) Oct 14 2023 I think you're basically right. But what could be done is to
- Alexandru Ermicioi (10/16) Oct 15 2023 In java for example you can list initial map size in elements not
- Imperatorn (3/21) Oct 15 2023 Agreed
- Witold (17/39) Oct 26 2023 In my opinion C++ / STL containers in `std` are pretty meh, and
Hello there. I'm wondering if anyone would be opposed to us/me making associative array having resize/reserve public? Currently there exists private members in druntime etc for this but they are not exposed. Basically I would like to be able to reserve n buckets before populating. For reference, see std::unordered_map, which is kvp, which has rehashing etc: ```cpp template<typename _Key, typename _Tp, typename _Hash = hash<_Key>, typename _Pred = equal_to<_Key>, typename _Alloc = allocator<std::pair<const _Key, _Tp>>> class unordered_map { typedef __umap_hashtable<_Key, _Tp, _Hash, _Pred, _Alloc> _Hashtable; _Hashtable _M_h; ``` https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/unordered_map.h https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/hashtable.h
Oct 13 2023
On Friday, 13 October 2023 at 11:32:29 UTC, Imperatorn wrote:Hello there. I'm wondering if anyone would be opposed to us/me making associative array having resize/reserve public? Currently there exists private members in druntime etc for this but they are not exposed. Basically I would like to be able to reserve n buckets before populating. For reference, see std::unordered_map, which is kvp, which has rehashing etc: ```cpp template<typename _Key, typename _Tp, typename _Hash = hash<_Key>, typename _Pred = equal_to<_Key>, typename _Alloc = allocator<std::pair<const _Key, _Tp>>> class unordered_map { typedef __umap_hashtable<_Key, _Tp, _Hash, _Pred, _Alloc> _Hashtable; _Hashtable _M_h; ``` https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/unordered_map.h https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/hashtable.hIt's something I've wanted. There was a discussion this summer: https://forum.dlang.org/post/u8fp23$2462$1 digitalmars.com
Oct 13 2023
On Friday, 13 October 2023 at 11:32:29 UTC, Imperatorn wrote:Hello there. I'm wondering if anyone would be opposed to us/me making associative array having resize/reserve public? Currently there exists private members in druntime etc for this but they are not exposed. Basically I would like to be able to reserve n buckets before populating.There was a PR for that but it wasn't merged. It only reserved bucket space, not space for elements. See: https://github.com/dlang/druntime/pull/1929#issuecomment-336799304
Oct 14 2023
On Saturday, 14 October 2023 at 20:46:45 UTC, Nick Treleaven wrote:On Friday, 13 October 2023 at 11:32:29 UTC, Imperatorn wrote:Interesting. I will read more about it. Pros and cons etc. ThanksHello there. I'm wondering if anyone would be opposed to us/me making associative array having resize/reserve public? Currently there exists private members in druntime etc for this but they are not exposed. Basically I would like to be able to reserve n buckets before populating.There was a PR for that but it wasn't merged. It only reserved bucket space, not space for elements. See: https://github.com/dlang/druntime/pull/1929#issuecomment-336799304
Oct 14 2023
On 10/14/23 4:46 PM, Nick Treleaven wrote:On Friday, 13 October 2023 at 11:32:29 UTC, Imperatorn wrote:Yeah, as identified by Andrei (and myself), reserving for an AA means reserving elements, not just buckets. It does not help to compare to other languages, because D's semantic requirements for AAs require that you can keep a reference to an element long after the AA is gone and deallocated (C++ says as soon as you remove an element, all pointers to it are invalid). The choices are bleak, especially without help from the GC. It's doable, but requires some mechanism that doesn't yet exist. I identified the options here: https://issues.dlang.org/show_bug.cgi?id=17881 -SteveHello there. I'm wondering if anyone would be opposed to us/me making associative array having resize/reserve public? Currently there exists private members in druntime etc for this but they are not exposed. Basically I would like to be able to reserve n buckets before populating.There was a PR for that but it wasn't merged. It only reserved bucket space, not space for elements. See: https://github.com/dlang/druntime/pull/1929#issuecomment-336799304
Oct 15 2023
On Friday, October 13, 2023 5:32:29 AM MDT Imperatorn via Digitalmars-d wrote:Hello there. I'm wondering if anyone would be opposed to us/me making associative array having resize/reserve public? Currently there exists private members in druntime etc for this but they are not exposed. Basically I would like to be able to reserve n buckets before populating. For reference, see std::unordered_map, which is kvp, which has rehashing etc: ```cpp template<typename _Key, typename _Tp, typename _Hash = hash<_Key>, typename _Pred = equal_to<_Key>, typename _Alloc = allocator<std::pair<const _Key, _Tp>>> class unordered_map { typedef __umap_hashtable<_Key, _Tp, _Hash, _Pred, _Alloc> _Hashtable; _Hashtable _M_h; ``` https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/ unordered_map.h https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/ hashtable.hMy main concern here would be that we not add calls which relate to how AAs are currently implemented but which may not relate to how they're implemented in the future if/when we make changes, and I'm inclined to think that anything to do with buckets is an implementation detail, making reserving not something that users should really be doing. It _might_ make sense to have a general reserve function that says that you're planning to insert X number of elements, allowing the AA to pre-allocate some stuff based on that information, but I don't know how much sense that really makes given that what that would mean for a given AA implementation could differ quite a bit. If rehashing is what you want, we already have that: https://dlang.org/spec/hash-map.html#properties Ultimately though, I think that we want to be careful about how much we expose of the AA, since that's going to restrict what we can do to improve it in the future, and if someone wants to do much a tweaking, IMHO, a user-defined type would make more sense than the language-provided AAs. Steven would likely have a much better idea of how much sense this idea makes given that he's been working on the AA stuff and is familiar with some of the changes that have been made to it over time. - Jonathan M Davis
Oct 14 2023
On Sunday, 15 October 2023 at 00:14:39 UTC, Jonathan M Davis wrote:My main concern here would be that we not add calls which relate to how AAs are currently implemented but which may not relate to how they're implemented in the future if/when we make changes, and I'm inclined to think that anything to do with buckets is an implementation detail, making reserving not something that users should really be doing. It _might_ make sense to have a general reserve function that says that you're planning to insert X number of elements, allowing the AA to pre-allocate some stuff based on that information, but I don't know how much sense that really makes given that what that would mean for a given AA implementation could differ quite a bit. If rehashing is what you want, we already have that: https://dlang.org/spec/hash-map.html#properties Ultimately though, I think that we want to be careful about how much we expose of the AA, since that's going to restrict what we can do to improve it in the future, and if someone wants to do much a tweaking, IMHO, a user-defined type would make more sense than the language-provided AAs. Steven would likely have a much better idea of how much sense this idea makes given that he's been working on the AA stuff and is familiar with some of the changes that have been made to it over time. - Jonathan M DavisSomeone wanting to work on this could build on the EMSI hashmap: https://github.com/dlang-community/containers/blob/master/src/ ontainers/hashmap.d That would leave it as a library solution rather than messing with the AAs in the language.
Oct 14 2023
On Sunday, 15 October 2023 at 00:14:39 UTC, Jonathan M Davis wrote:On Friday, October 13, 2023 5:32:29 AM MDT Imperatorn via Digitalmars-d wrote:I think you're basically right. But what could be done is to instead of reserving n buckets like the other languages do, maybe we could (if gc is used in this example) make a calculated gc reserve. But, I've taken a quick look at 3-4 other languages, and they basically reserve buckets.[...]My main concern here would be that we not add calls which relate to how AAs are currently implemented but which may not relate to how they're implemented in the future if/when we make changes, and I'm inclined to think that anything to do with buckets is an implementation detail, making reserving not something that users should really be doing. It _might_ make sense to have a general reserve function that says that you're planning to insert X number of elements, allowing the AA to pre-allocate some stuff based on that information, but I don't know how much sense that really makes given that what that would mean for a given AA implementation could differ quite a bit. [...]
Oct 14 2023
On Sunday, 15 October 2023 at 06:56:25 UTC, Imperatorn wrote:I think you're basically right. But what could be done is to instead of reserving n buckets like the other languages do, maybe we could (if gc is used in this example) make a calculated gc reserve. But, I've taken a quick look at 3-4 other languages, and they basically reserve buckets.In java for example you can list initial map size in elements not buckets: https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html#HashMap-int- If bucket reservation is added, it should be heavily emphasized that method is implementation specific, and no guarantees that would be available in alternative implementations. I.e. it should not be part of public AA interface. Best regards, Alexandru.
Oct 15 2023
On Sunday, 15 October 2023 at 09:06:30 UTC, Alexandru Ermicioi wrote:On Sunday, 15 October 2023 at 06:56:25 UTC, Imperatorn wrote:AgreedI think you're basically right. But what could be done is to instead of reserving n buckets like the other languages do, maybe we could (if gc is used in this example) make a calculated gc reserve. But, I've taken a quick look at 3-4 other languages, and they basically reserve buckets.In java for example you can list initial map size in elements not buckets: https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html#HashMap-int- If bucket reservation is added, it should be heavily emphasized that method is implementation specific, and no guarantees that would be available in alternative implementations. I.e. it should not be part of public AA interface. Best regards, Alexandru.
Oct 15 2023
On Friday, 13 October 2023 at 11:32:29 UTC, Imperatorn wrote:Hello there. I'm wondering if anyone would be opposed to us/me making associative array having resize/reserve public? Currently there exists private members in druntime etc for this but they are not exposed. Basically I would like to be able to reserve n buckets before populating. For reference, see std::unordered_map, which is kvp, which has rehashing etc: ```cpp template<typename _Key, typename _Tp, typename _Hash = hash<_Key>, typename _Pred = equal_to<_Key>, typename _Alloc = allocator<std::pair<const _Key, _Tp>>> class unordered_map { typedef __umap_hashtable<_Key, _Tp, _Hash, _Pred, _Alloc> _Hashtable; _Hashtable _M_h; ``` https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/unordered_map.h https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/hashtable.hIn my opinion C++ / STL containers in `std` are pretty meh, and one probably should not be guided by them. unordered_map design and requirements in particular are bad. I like `reserve`, but I would say: show real world example, and benchmark (not microbenchmark, but actual app), where `reserve` helped (i.e. it is also easy to use reserve and actually make code / memory / cpu usage worse). Then maybe. For example, `reserve` on tree based associative array, does not really make sense. If you need `reserve`, maybe you need to not relay on any specific AA implementation provided by D runtime by default, but rather use library or own solution, that you control over time. Otherwise, change in a compiler or runtime, could obliterate any performance improvements you had before. (i.e. what if it was critical to performance, and at next dmd version, `reserve` started just being ignored, then you are in a big trouble).
Oct 26 2023