www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Associative array

reply Imperatorn <johan_forsberg_86 hotmail.com> writes:
Hello there.

I'm wondering if anyone would be opposed to us/me making 
associative array having resize/reserve public?

Currently there exists private members in druntime etc for this 
but they are not exposed.

Basically I would like to be able to reserve n buckets before 
populating.

For reference, see std::unordered_map, which is kvp, which has 
rehashing etc:

```cpp
   template<typename _Key, typename _Tp,
	   typename _Hash = hash<_Key>,
	   typename _Pred = equal_to<_Key>,
	   typename _Alloc = allocator<std::pair<const _Key, _Tp>>>
     class unordered_map
     {
       typedef __umap_hashtable<_Key, _Tp, _Hash, _Pred, _Alloc>  
_Hashtable;
       _Hashtable _M_h;
```

https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/unordered_map.h

https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/hashtable.h
Oct 13 2023
next sibling parent bachmeier <no spam.net> writes:
On Friday, 13 October 2023 at 11:32:29 UTC, Imperatorn wrote:
 Hello there.

 I'm wondering if anyone would be opposed to us/me making 
 associative array having resize/reserve public?

 Currently there exists private members in druntime etc for this 
 but they are not exposed.

 Basically I would like to be able to reserve n buckets before 
 populating.

 For reference, see std::unordered_map, which is kvp, which has 
 rehashing etc:

 ```cpp
   template<typename _Key, typename _Tp,
 	   typename _Hash = hash<_Key>,
 	   typename _Pred = equal_to<_Key>,
 	   typename _Alloc = allocator<std::pair<const _Key, _Tp>>>
     class unordered_map
     {
       typedef __umap_hashtable<_Key, _Tp, _Hash, _Pred, _Alloc> 
  _Hashtable;
       _Hashtable _M_h;
 ```

 https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/unordered_map.h

 https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/hashtable.h
It's something I've wanted. There was a discussion this summer: https://forum.dlang.org/post/u8fp23$2462$1 digitalmars.com
Oct 13 2023
prev sibling next sibling parent reply Nick Treleaven <nick geany.org> writes:
On Friday, 13 October 2023 at 11:32:29 UTC, Imperatorn wrote:
 Hello there.

 I'm wondering if anyone would be opposed to us/me making 
 associative array having resize/reserve public?

 Currently there exists private members in druntime etc for this 
 but they are not exposed.

 Basically I would like to be able to reserve n buckets before 
 populating.
There was a PR for that but it wasn't merged. It only reserved bucket space, not space for elements. See: https://github.com/dlang/druntime/pull/1929#issuecomment-336799304
Oct 14 2023
next sibling parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Saturday, 14 October 2023 at 20:46:45 UTC, Nick Treleaven 
wrote:
 On Friday, 13 October 2023 at 11:32:29 UTC, Imperatorn wrote:
 Hello there.

 I'm wondering if anyone would be opposed to us/me making 
 associative array having resize/reserve public?

 Currently there exists private members in druntime etc for 
 this but they are not exposed.

 Basically I would like to be able to reserve n buckets before 
 populating.
There was a PR for that but it wasn't merged. It only reserved bucket space, not space for elements. See: https://github.com/dlang/druntime/pull/1929#issuecomment-336799304
Interesting. I will read more about it. Pros and cons etc. Thanks
Oct 14 2023
prev sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 10/14/23 4:46 PM, Nick Treleaven wrote:
 On Friday, 13 October 2023 at 11:32:29 UTC, Imperatorn wrote:
 Hello there.

 I'm wondering if anyone would be opposed to us/me making associative 
 array having resize/reserve public?

 Currently there exists private members in druntime etc for this but 
 they are not exposed.

 Basically I would like to be able to reserve n buckets before populating.
There was a PR for that but it wasn't merged. It only reserved bucket space, not space for elements. See: https://github.com/dlang/druntime/pull/1929#issuecomment-336799304
Yeah, as identified by Andrei (and myself), reserving for an AA means reserving elements, not just buckets. It does not help to compare to other languages, because D's semantic requirements for AAs require that you can keep a reference to an element long after the AA is gone and deallocated (C++ says as soon as you remove an element, all pointers to it are invalid). The choices are bleak, especially without help from the GC. It's doable, but requires some mechanism that doesn't yet exist. I identified the options here: https://issues.dlang.org/show_bug.cgi?id=17881 -Steve
Oct 15 2023
prev sibling next sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, October 13, 2023 5:32:29 AM MDT Imperatorn via Digitalmars-d wrote:
 Hello there.

 I'm wondering if anyone would be opposed to us/me making
 associative array having resize/reserve public?

 Currently there exists private members in druntime etc for this
 but they are not exposed.

 Basically I would like to be able to reserve n buckets before
 populating.

 For reference, see std::unordered_map, which is kvp, which has
 rehashing etc:

 ```cpp
    template<typename _Key, typename _Tp,
      typename _Hash = hash<_Key>,
      typename _Pred = equal_to<_Key>,
      typename _Alloc = allocator<std::pair<const _Key, _Tp>>>
      class unordered_map
      {
        typedef __umap_hashtable<_Key, _Tp, _Hash, _Pred, _Alloc>
 _Hashtable;
        _Hashtable _M_h;
 ```

 https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/
 unordered_map.h

 https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/
 hashtable.h
My main concern here would be that we not add calls which relate to how AAs are currently implemented but which may not relate to how they're implemented in the future if/when we make changes, and I'm inclined to think that anything to do with buckets is an implementation detail, making reserving not something that users should really be doing. It _might_ make sense to have a general reserve function that says that you're planning to insert X number of elements, allowing the AA to pre-allocate some stuff based on that information, but I don't know how much sense that really makes given that what that would mean for a given AA implementation could differ quite a bit. If rehashing is what you want, we already have that: https://dlang.org/spec/hash-map.html#properties Ultimately though, I think that we want to be careful about how much we expose of the AA, since that's going to restrict what we can do to improve it in the future, and if someone wants to do much a tweaking, IMHO, a user-defined type would make more sense than the language-provided AAs. Steven would likely have a much better idea of how much sense this idea makes given that he's been working on the AA stuff and is familiar with some of the changes that have been made to it over time. - Jonathan M Davis
Oct 14 2023
next sibling parent bachmeier <no spam.net> writes:
On Sunday, 15 October 2023 at 00:14:39 UTC, Jonathan M Davis 
wrote:

 My main concern here would be that we not add calls which 
 relate to how AAs are currently implemented but which may not 
 relate to how they're implemented in the future if/when we make 
 changes, and I'm inclined to think that anything to do with 
 buckets is an implementation detail, making reserving not 
 something that users should really be doing. It _might_ make 
 sense to have a general reserve function that says that you're 
 planning to insert X number of elements, allowing the AA to 
 pre-allocate some stuff based on that information, but I don't 
 know how much sense that really makes given that what that 
 would mean for a given AA implementation could differ quite a 
 bit.

 If rehashing is what you want, we already have that:

 https://dlang.org/spec/hash-map.html#properties

 Ultimately though, I think that we want to be careful about how 
 much we expose of the AA, since that's going to restrict what 
 we can do to improve it in the future, and if someone wants to 
 do much a tweaking, IMHO, a user-defined type would make more 
 sense than the language-provided AAs.

 Steven would likely have a much better idea of how much sense 
 this idea makes given that he's been working on the AA stuff 
 and is familiar with some of the changes that have been made to 
 it over time.

 - Jonathan M Davis
Someone wanting to work on this could build on the EMSI hashmap: https://github.com/dlang-community/containers/blob/master/src/ ontainers/hashmap.d That would leave it as a library solution rather than messing with the AAs in the language.
Oct 14 2023
prev sibling parent reply Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Sunday, 15 October 2023 at 00:14:39 UTC, Jonathan M Davis 
wrote:
 On Friday, October 13, 2023 5:32:29 AM MDT Imperatorn via 
 Digitalmars-d wrote:
 [...]
My main concern here would be that we not add calls which relate to how AAs are currently implemented but which may not relate to how they're implemented in the future if/when we make changes, and I'm inclined to think that anything to do with buckets is an implementation detail, making reserving not something that users should really be doing. It _might_ make sense to have a general reserve function that says that you're planning to insert X number of elements, allowing the AA to pre-allocate some stuff based on that information, but I don't know how much sense that really makes given that what that would mean for a given AA implementation could differ quite a bit. [...]
I think you're basically right. But what could be done is to instead of reserving n buckets like the other languages do, maybe we could (if gc is used in this example) make a calculated gc reserve. But, I've taken a quick look at 3-4 other languages, and they basically reserve buckets.
Oct 14 2023
parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Sunday, 15 October 2023 at 06:56:25 UTC, Imperatorn wrote:
 I think you're basically right. But what could be done is to 
 instead of reserving n buckets like the other languages do, 
 maybe we could (if gc is used in this example) make a 
 calculated gc reserve.

 But, I've taken a quick look at 3-4 other languages, and they 
 basically reserve buckets.
In java for example you can list initial map size in elements not buckets: https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html#HashMap-int- If bucket reservation is added, it should be heavily emphasized that method is implementation specific, and no guarantees that would be available in alternative implementations. I.e. it should not be part of public AA interface. Best regards, Alexandru.
Oct 15 2023
parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Sunday, 15 October 2023 at 09:06:30 UTC, Alexandru Ermicioi 
wrote:
 On Sunday, 15 October 2023 at 06:56:25 UTC, Imperatorn wrote:
 I think you're basically right. But what could be done is to 
 instead of reserving n buckets like the other languages do, 
 maybe we could (if gc is used in this example) make a 
 calculated gc reserve.

 But, I've taken a quick look at 3-4 other languages, and they 
 basically reserve buckets.
In java for example you can list initial map size in elements not buckets: https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html#HashMap-int- If bucket reservation is added, it should be heavily emphasized that method is implementation specific, and no guarantees that would be available in alternative implementations. I.e. it should not be part of public AA interface. Best regards, Alexandru.
Agreed
Oct 15 2023
prev sibling parent Witold <witold.baryluk+dlang gmail.com> writes:
On Friday, 13 October 2023 at 11:32:29 UTC, Imperatorn wrote:
 Hello there.

 I'm wondering if anyone would be opposed to us/me making 
 associative array having resize/reserve public?

 Currently there exists private members in druntime etc for this 
 but they are not exposed.

 Basically I would like to be able to reserve n buckets before 
 populating.

 For reference, see std::unordered_map, which is kvp, which has 
 rehashing etc:

 ```cpp
   template<typename _Key, typename _Tp,
 	   typename _Hash = hash<_Key>,
 	   typename _Pred = equal_to<_Key>,
 	   typename _Alloc = allocator<std::pair<const _Key, _Tp>>>
     class unordered_map
     {
       typedef __umap_hashtable<_Key, _Tp, _Hash, _Pred, _Alloc> 
  _Hashtable;
       _Hashtable _M_h;
 ```

 https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/unordered_map.h

 https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/hashtable.h
In my opinion C++ / STL containers in `std` are pretty meh, and one probably should not be guided by them. unordered_map design and requirements in particular are bad. I like `reserve`, but I would say: show real world example, and benchmark (not microbenchmark, but actual app), where `reserve` helped (i.e. it is also easy to use reserve and actually make code / memory / cpu usage worse). Then maybe. For example, `reserve` on tree based associative array, does not really make sense. If you need `reserve`, maybe you need to not relay on any specific AA implementation provided by D runtime by default, but rather use library or own solution, that you control over time. Otherwise, change in a compiler or runtime, could obliterate any performance improvements you had before. (i.e. what if it was critical to performance, and at next dmd version, `reserve` started just being ignored, then you are in a big trouble).
Oct 26 2023