www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Containers

reply deadalnix <deadalnix gmail.com> writes:
I want to start working on a container library. Interestingly 
enough, I don't think we have what we need on that front.

I'll describe here what are the design I intend to go for, and 
the rationales for it, but before, there is an unsolved problem 
that I don't know how to work around that is somewhat of a 
blocker. If one of you guys have an idea how to solve it, I'm 
taking. If I'm being honest, this is probably the most important 
problem that need to be solved for D right now. That problem is 
turtling down type qualifiers.

For instance, if we have a Vector!T, then we want it to convert 
to a Vector!(const T) and we want const(Vector!T) == 
const(Vector!(const T)), all of it implicitly. Same goes for 
other qualifiers.

To put it bluntly, I have no idea how to achieve this. And I 
don't think we'll have a good set of collection before this is 
sorted out.

Now onto the design:
  - Value types. It is very easy to turn a value into a reference 
type, not so much the other way around. reference type also come 
with a bag of edges cases, like the collection being null vs 
empty, that cause confusion and problems in practice.
  - COW. Value type collection can be very expensive to work with 
if the eagerly copy everything. On the other hand, COW ensures 
copies are cheap. There are also very common use case that are 
well served by COW collections, for instance, when one wants 
value types for collection that are infrequently changes - but 
you don't know or can't prove that they won't.
  - Small collection features. As every engineer, but no 
mathematician will tell you, is that most numbers are small. It 
is also true of collection sizes. In order to speed things up for 
collection that are expected to be small, being able to specify 
Vector!(T, 8) to reserve 8 slots for elements, in the base 
collection, and only allocate on heap if the collection size 
grows past this. LLVM has a collection library doing so and it is 
the most useful thing in the world (see 
https://www.youtube.com/watch?v=vElZc6zSIXM for more details).
  - Not expose much of its internals. This like iterators being 
raw pointers (or the result of using the in operator) for 
instance, impose constraint on the internal that almost always 
end up costing a ton of perf as new techniques are discovered 
(recent exemples include open addressing with SSE probing for 
instance). All of this needs to be wrapped.

I have a good idea how to solve all these problems. I have no 
idea how to solve the type qualifier one. I need help.
Aug 31 2021
next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 01/09/2021 5:21 AM, deadalnix wrote:
   - Not expose much of its internals. This like iterators being raw 
 pointers (or the result of using the in operator) for instance, impose 
 constraint on the internal that almost always end up costing a ton of 
 perf as new techniques are discovered (recent exemples include open 
 addressing with SSE probing for instance). All of this needs to be wrapped.
Something I'm doing with my own containers is to create a "custom reference" around the internals. ``` struct Thing { this(RCISharedAllocator allocator) { this.impl = allocator.make!ThingImpl(allocator); } this(ref Thing other) { this.impl = other.impl; if (this.impl !is null) atomicOp!"+="(this.impl.refCount, 1); } ~this() { if (this.impl !is null) { if (atomicOp!"-="(this.impl.refCount, 1) == 0) { RCISharedAllocator allocator = this.impl.allocator; allocator.dispose(this.impl); } } } safe: bool isNull() { return impl is null; } private: ThingImpl* impl; } private: struct ThingImpl { safe: RCISharedAllocator allocator; shared(size_t) refCount; } ``` I've found this to work quite well.
Aug 31 2021
prev sibling next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Tuesday, 31 August 2021 at 17:21:02 UTC, deadalnix wrote:
 For instance, if we have a Vector!T, then we want it to convert 
 to a Vector!(const T) and we want const(Vector!T) == 
 const(Vector!(const T)), all of it implicitly. Same goes for 
 other qualifiers.
As far as I know, the best approach anyone's come up with for this is the `headMutable` pattern described in this blog post: https://dlang.org/blog/2020/06/25/a-pattern-for-head-mutable-structures/ This gives us: * `Vector!T` -> `const(Vector!T)` (built in) * `const(Vector!T)` -> `Vector!(const(T))` (via `headMutable`) * `Vector!(const(T))` -> `const(Vector!(const(T)))` (built in) * `const(Vector!(const(T)))` -> `Vector!(const(T))` (via `headMutable`) The only conversion missing is * `const(Vector!(const(T)))` -> `const(Vector!T)` ...which would have to be implemented as another method (`tailUnqual`?). Of course many of these conversions are explicit, not implicit, but I don't think there's any way around that given D's current capabilities. The only user-defined implicit conversions in D are via inheritance and `alias this`, and neither can do what we need here.
  - COW. Value type collection can be very expensive to work 
 with if the eagerly copy everything. On the other hand, COW 
 ensures copies are cheap. There are also very common use case 
 that are well served by COW collections, for instance, when one 
 wants value types for collection that are infrequently changes 
 - but you don't know or can't prove that they won't.
This sounds similar to Clojure's immutable data structures. I agree these would be a good thing to have, although I think traditional mutable containers should also exist alongside them.
Aug 31 2021
parent reply deadalnix <deadalnix gmail.com> writes:
On Tuesday, 31 August 2021 at 18:11:04 UTC, Paul Backus wrote:
 This sounds similar to Clojure's immutable data structures. I 
 agree these would be a good thing to have, although I think 
 traditional mutable containers should also exist alongside them.
```d class RefContainer { Container c; } ``` Here you go :) There is obviously some boilerplate missing, but I'm sure you get the idea. One nice bonus is that you get a practically free clone method, that will only do so when you actually mutate things. You'll note that in practice, you don't need to clone the container every time you write, you need to keep a ref count and only clone when the number of references to the container is greater than 1. Lastly, my experience tells me that container being traditionally references was a mistakes. I have many scars to show for it. Thank you for the Head mutable article. I'll dig into this and see if I can make it work.
Aug 31 2021
parent deadalnix <deadalnix gmail.com> writes:
On Tuesday, 31 August 2021 at 18:24:54 UTC, deadalnix wrote:
 Thank you for the Head mutable article. I'll dig into this and 
 see if I can make it work.
So I read it and played with the concept a bit. If I'm honest, it fall very much short of where it needs to be. It is inconvenient - but workable - when using things by value, however, the shortcoming become very apparent when it is used as a reference, or within a more complex type construction. This similarly go for other proposed solution in the thread. I don't think we'll ever have a set of container that are as good as they should without this being nailed down.
Aug 31 2021
prev sibling next sibling parent Kagamin <spam here.lot> writes:
On Tuesday, 31 August 2021 at 17:21:02 UTC, deadalnix wrote:
 I have a good idea how to solve all these problems. I have no 
 idea how to solve the type qualifier one. I need help.
Declare opAssign on Vector!(const T) that takes const Vector!T, then the assignment will compile.
Aug 31 2021
prev sibling next sibling parent Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Tuesday, 31 August 2021 at 17:21:02 UTC, deadalnix wrote:
 I want to start working on a container library. Interestingly 
 enough, I don't think we have what we need on that front.

 I'll describe here what are the design I intend to go for, and 
 the rationales for it, but before, there is an unsolved problem 
 that I don't know how to work around that is somewhat of a 
 blocker. If one of you guys have an idea how to solve it, I'm 
 taking. If I'm being honest, this is probably the most 
 important problem that need to be solved for D right now. That 
 problem is turtling down type qualifiers.

 For instance, if we have a Vector!T, then we want it to convert 
 to a Vector!(const T) and we want const(Vector!T) == 
 const(Vector!(const T)), all of it implicitly. Same goes for 
 other qualifiers.
Converting from Vector!T to Vector!(const T) should be possible through copy constructors and opAssign overloads, same for const(Vector!T) to const(Vector!(const T)). having something like this should suffice for initialization for example: ``` static if (is(T == const)) { this(Vector!(Unqual!T) mutable) { this.array = mutable.array; // Or you can do a complete copy. } static if (is(this == const)) { // ... well case for const T and const this } } ```
 To put it bluntly, I have no idea how to achieve this. And I 
 don't think we'll have a good set of collection before this is 
 sorted out.

 Now onto the design:
  - Value types. It is very easy to turn a value into a 
 reference type, not so much the other way around.
Would be nice if even as value types different implementations say of a list would follow a well defined interface, same for other type of collections.
 reference type also come with a bag of edges cases, like the 
 collection being null vs empty, that cause confusion and 
 problems in practice.
Imho, there is no problem here. null pointer to collection class means no collection present, while empty collection is just an empty collection. There should not be any conflation between these notions, in general, and could be made clear in the documentation.
  - Not expose much of its internals. This like iterators being 
 raw pointers (or the result of using the in operator) for 
 instance, impose constraint on the internal that almost always 
 end up costing a ton of perf as new techniques are discovered 
 (recent exemples include open addressing with SSE probing for 
 instance). All of this needs to be wrapped.
It would be nice if the collection api would have rich and consistent api for working with it. I.e have common interfaces for lists, sets, queues and etc. that value and reference type collections implement. And that is not only about the list of methods they implement, but also the behavior they exhibit. Java for example has a good api for collections, from which your collection library can take inspiration, and ofc with D twist of features. I also tried poking around with a collection implementation, though not value types but ref types, and found it being problematic to implement, due to limitations of how safety annotations work. You basically cannot force the interface being safe/trusted/system based on element it contains without doing extensive compile time introspection... So if you plan to add ref wrappers this one should be also resolved somehow to have a safe lib, otherwise I'm afraid class wrappers will be only for system code. Best regards, Alexandru.
Aug 31 2021
prev sibling next sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Tuesday, 31 August 2021 at 17:21:02 UTC, deadalnix wrote:
 I want to start working on a container library. Interestingly 
 enough, I don't think we have what we need on that front.
Good initiative. I'd be happy to assist. I have a bunch of value-type containers (struct + dup) lying around [here](https://github.com/nordlow/phobos-next/tree/master/src/nxt) being ` safe pure nothrow nogc` whenever possible. It supports significantly more members than emsi-containers at least. It also tries to be dip-1000 compliant but there are like bugs lurking here and there. [Here](https://github.com/nordlow/phobos-next#containers)'s a likely incomplete list of the containers. Feel free to give me a ping if you need any guidance. You might find `nxt.trie` of special interest. Beware of its size, though. There's also a benchmark at benchmarks/containers you can try out via either benchmarks/containers/test-dmd-debug or benchmarks/containers/test-ldc-release . It benchmarks a subset of the containers together with D arrays and AAs. Somewhat related, I've also experimented with a run-time variant of Rust's ownership and borrowing rules at [borrown.d](https://github.com/nordlow/phobos-next/blob/master/src/nxt/borrown.d). Until later, Per
Aug 31 2021
next sibling parent Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Tuesday, 31 August 2021 at 21:36:25 UTC, Per Nordlöw wrote:
 you can try out via either

     benchmarks/containers/test-dmd-debug

 or

     benchmarks/containers/test-ldc-release
Should be ./test-dmd-debug or ./test-ldc-release under `benchmarks/containers`.
Aug 31 2021
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Tuesday, 31 August 2021 at 21:36:25 UTC, Per Nordlöw wrote:
 On Tuesday, 31 August 2021 at 17:21:02 UTC, deadalnix wrote:
 I want to start working on a container library. Interestingly 
 enough, I don't think we have what we need on that front.
Good initiative. I'd be happy to assist. I have a bunch of value-type containers (struct + dup) lying around [here](https://github.com/nordlow/phobos-next/tree/master/src/nxt) being ` safe pure nothrow nogc` whenever possible. It supports significantly more members than emsi-containers at least. It also tries to be dip-1000 compliant but there are like bugs lurking here and there. [Here](https://github.com/nordlow/phobos-next#containers)'s a likely incomplete list of the containers. Feel free to give me a ping if you need any guidance. You might find `nxt.trie` of special interest. Beware of its size, though.
There is a ton of good work in there! Do you agree with the direction explicated? I expected to focus on Vector/Set/Map initially. This should serve most use case in practice.
Aug 31 2021
parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Tuesday, 31 August 2021 at 23:24:57 UTC, deadalnix wrote:
 There is a ton of good work in there!
Yes, and I would really like it to come of use upstream. :)
 Do you agree with the direction explicated?
Yes. No objections afaict. All good points. - CoW: sounds like the future direction in other languages such as Swift so no objections on that matter: - Should we enforce CoW or make it opt-in via storages fed as template parameters? - Will you be supporting - Will you be using ranges or iterators? - What about allocators? If you do can you please use std.experimental.allocator instead of stdx.allocator (used by emsi-containers and libdparse and dsymbol) is more than 2 years old now. std.experimental.allocator has received much love since, for instance `allocateZeroed` for the allocators that support it. - An interesting emsi container that was new to me is `UnrolledList`. It's used extensively in dsymbol.
 I expected to focus on Vector/Set/Map initially. This should 
 serve most use case in practice.
You should probably use robin-hood hashing if you pick open-addressing. I'm not using that (yet) in my hash sets/maps. It has very good asymptotic complexity. You should probably have a look at the state-of-the-art robin-hood implementations in C++ aswell. To run my own C++-benchmark for containers you can do ./test under benchmarks/containers-cxx .
Sep 01 2021
parent reply deadalnix <deadalnix gmail.com> writes:
On Wednesday, 1 September 2021 at 10:12:23 UTC, Per Nordlöw wrote:
 - CoW: sounds like the future direction in other languages such 
 as Swift so no objections on that matter:
   - Should we enforce CoW or make it opt-in via storages fed as 
 template parameters?
I think COW should be the default. It is not difficult to build reference container from COW ones, so we probably don't need it to be optional, simply provide wrappr around the COW container to provide references ones.
   - Will you be supporting
I'm not sure what you mean here. If it is bugfixes and all, I don't have much problem with that. Container are fairly straightforward, so I don't expect the workload to be very high anyways.
 - Will you be using ranges or iterators?
This is D, we must provide ranges. But in operator and similar kind forces us to do both.
 - What about allocators? If you do can you please use 
 std.experimental.allocator instead of stdx.allocator (used by 
 emsi-containers and libdparse and dsymbol) is more than 2 years 
 old now. std.experimental.allocator has received much love 
 since, for instance `allocateZeroed` for the allocators that 
 support it.
I have to admit I have not put too much thought into this.
 - An interesting emsi container that was new to me is 
 `UnrolledList`. It's used extensively in dsymbol.
I have no idea what that is.
 I expected to focus on Vector/Set/Map initially. This should 
 serve most use case in practice.
You should probably use robin-hood hashing if you pick open-addressing.
Yes, that being said, I would like to not expose the implementation. PHP and python were able to move from a tree based approach to open addressing for dictionaries, for great benefit. They could do so because they didn't expose the innards of the container. There are very interesting approach leveraging vector capabilities of modern CPU such as F14: https://engineering.fb.com/2019/04/25/developer-tools/f14/ and exposing the innards would prevent moving to them later on.
Sep 01 2021
next sibling parent Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 1 September 2021 at 10:32:45 UTC, deadalnix wrote:
   - Will you be supporting
Ignore that. An editing mistake.
 I'm not sure what you mean here. If it is bugfixes and all, I 
 don't have much problem with that. Container are fairly 
 straightforward, so I don't expect the workload to be very high 
 anyways.
Sep 01 2021
prev sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 01/09/2021 10:32 PM, deadalnix wrote:
 - An interesting emsi container that was new to me is `UnrolledList`. 
 It's used extensively in dsymbol.
I have no idea what that is.
Unrolled linked list, a linked list with X number of entries per node. Very good for cache efficiency especially if you need append only.
Sep 01 2021
parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 1 September 2021 at 12:09:19 UTC, rikki cattermole 
wrote:
 Unrolled linked list, a linked list with X number of entries 
 per node.
Yes, where each node typically is sized to fit in a single cache-line. Suitable for small element types. I don't know why an UnrolledList is preferred over a traditional vector type bundled with an allocator that does the same grouping of allocations, though.
Sep 01 2021
parent rikki cattermole <rikki cattermole.co.nz> writes:
On 02/09/2021 8:55 AM, Per Nordlöw wrote:
 On Wednesday, 1 September 2021 at 12:09:19 UTC, rikki cattermole wrote:
 Unrolled linked list, a linked list with X number of entries per node.
Yes, where each node typically is sized to fit in a single cache-line. Suitable for small element types. I don't know why an UnrolledList is preferred over a traditional vector type bundled with an allocator that does the same grouping of allocations, though.
All depends on how the vector is implemented. If its just an array that is resized as required with extra capacity, then obviously there is memory fragmentation issues that can result in allocation failing. But I think each have different memory patterns associated with it, so it may not be clear cut. For example I'm working on a color palette data structure using an unrolled linked list as the basis. Insertions, removals are both quite common here, but so is only appending. And of course quite importantly you want fairly fast skipping of entries (hence unrolled rather than a straight linked list) to find the one you want.
Sep 01 2021
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/31/21 1:21 PM, deadalnix wrote:
 I want to start working on a container library. Interestingly enough, I 
 don't think we have what we need on that front.
 
 I'll describe here what are the design I intend to go for, and the 
 rationales for it, but before, there is an unsolved problem that I don't 
 know how to work around that is somewhat of a blocker. If one of you 
 guys have an idea how to solve it, I'm taking. If I'm being honest, this 
 is probably the most important problem that need to be solved for D 
 right now. That problem is turtling down type qualifiers.
 
 For instance, if we have a Vector!T, then we want it to convert to a 
 Vector!(const T) and we want const(Vector!T) == const(Vector!(const T)), 
 all of it implicitly. Same goes for other qualifiers.
This is the crux of the tail-const problem. The language allows tail-const for only 2 types, T[] and T*. Everything else is fully const or fully mutable.
 
 To put it bluntly, I have no idea how to achieve this. And I don't think 
 we'll have a good set of collection before this is sorted out.
I had an idea a long time ago, but ashamedly never brought it to fruition. But long story short, I think we need a way to specify "tail" for modifiers, and allow implicitly casting the head. Then you'd have `tailconst(Vector!int)`, and you are good. I actually had a decent syntax for it, but I'm not sure it would fly anyway.
 
 Now onto the design:
   - Value types. It is very easy to turn a value into a reference type, 
 not so much the other way around. reference type also come with a bag of 
 edges cases, like the collection being null vs empty, that cause 
 confusion and problems in practice.
Null vs. empty isn't a problem for non-class reference types. Of course, without tail-const, you are kinda screwed.
   - COW. Value type collection can be very expensive to work with if the 
 eagerly copy everything. On the other hand, COW ensures copies are 
 cheap. There are also very common use case that are well served by COW 
 collections, for instance, when one wants value types for collection 
 that are infrequently changes - but you don't know or can't prove that 
 they won't.
I don't agree that COW is a good idea for generic collections. IIRC, C++ tried this with std::string and it was a disaster.
   - Small collection features. As every engineer, but no mathematician 
 will tell you, is that most numbers are small. It is also true of 
 collection sizes. In order to speed things up for collection that are 
 expected to be small, being able to specify Vector!(T, 8) to reserve 8 
 slots for elements, in the base collection, and only allocate on heap if 
 the collection size grows past this. LLVM has a collection library doing 
 so and it is the most useful thing in the world (see 
 https://www.youtube.com/watch?v=vElZc6zSIXM for more details).
I think this is a great idea.
   - Not expose much of its internals. This like iterators being raw 
 pointers (or the result of using the in operator) for instance, impose 
 constraint on the internal that almost always end up costing a ton of 
 perf as new techniques are discovered (recent exemples include open 
 addressing with SSE probing for instance). All of this needs to be wrapped.
Feel free to copy any of my ideas from dcollections for this. See https://github.com/schveiguy/dcollections/blob/master/concepts.txt Also note that Dcollections were strictly reference types, but had struct-based implementations (which could actually be swapped out with a different implementation if anyone had ever developed any). It came in handy in some cases, like the hash bucket collision linked list was just the implementation for the standard linked list.
 I have a good idea how to solve all these problems. I have no idea how 
 to solve the type qualifier one. I need help.
Feel free to ping, maybe on slack. We can discuss the tail-const problem. -Steve
Aug 31 2021
next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Tuesday, 31 August 2021 at 23:56:51 UTC, Steven Schveighoffer 
wrote:
 To put it bluntly, I have no idea how to achieve this. And I 
 don't think we'll have a good set of collection before this is 
 sorted out.
I had an idea a long time ago, but ashamedly never brought it to fruition. But long story short, I think we need a way to specify "tail" for modifiers, and allow implicitly casting the head. Then you'd have `tailconst(Vector!int)`, and you are good. I actually had a decent syntax for it, but I'm not sure it would fly anyway.
One problem I can forsee with this approach is that, from the compiler's point of view, there is not necessarily any predictable relationship between `qualifier(Template!T)` and `Template!(qualifier(T))`. For all it knows, you could have written code like the following: ```d struct Vector(T) { static if (is(T == const)) { // do one thing } else { // do something completely different } } ``` Maybe in some restricted cases it could examine the uninstantiated template, determine that there's no funny business going on, and conclude that the implicit conversion is valid--but this kind of heuristic-based approach tends to be very brittle and compose poorly (exhibit A: [issue 1807][1]). In general, if we want the compiler to know that a conversion from one template instantiation to another is valid, we are going to have to [prove it by construction][2]--which is to say, by writing a function (like a constructor or an `opCast` overload) that actually performs the conversion. [1]: https://issues.dlang.org/show_bug.cgi?id=1807 [2]: https://en.wikipedia.org/wiki/Constructive_proof
Aug 31 2021
next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Wednesday, 1 September 2021 at 00:26:33 UTC, Paul Backus wrote:
 [snip]

 One problem I can forsee with this approach is that, from the 
 compiler's point of view, there is not necessarily any 
 predictable relationship between `qualifier(Template!T)` and 
 `Template!(qualifier(T))`. For all it knows, you could have 
 written code like the following:

 ```d
 struct Vector(T)
 {
     static if (is(T == const)) {
         // do one thing
     } else {
         // do something completely different
     }
 }
 ```

 Maybe in some restricted cases it could examine the 
 uninstantiated template, determine that there's no funny 
 business going on, and conclude that the implicit conversion is 
 valid--but this kind of heuristic-based approach tends to be 
 very brittle and compose poorly (exhibit A: [issue 1807][1]).

 In general, if we want the compiler to know that a conversion 
 from one template instantiation to another is valid, we are 
 going to have to [prove it by construction][2]--which is to 
 say, by writing a function (like a constructor or an `opCast` 
 overload) that actually performs the conversion.

 [1]: https://issues.dlang.org/show_bug.cgi?id=1807
 [2]: https://en.wikipedia.org/wiki/Constructive_proof
I think you make some good points here. My recollection is that opHeadMutable [1] was intended as a way to inform the compiler that the conversion is valid. On issue 1807, as much as I want that enhancement, I just hope that there is a general solution to resolve it and not just a heuristic-based one. [1] https://forum.dlang.org/post/cpxfgdmklgusodqouqdr forum.dlang.org
Aug 31 2021
parent Paul Backus <snarwin gmail.com> writes:
On Wednesday, 1 September 2021 at 01:38:49 UTC, jmh530 wrote:
 My recollection is that opHeadMutable [1] was intended as a way 
 to inform the compiler that the conversion is valid.

 [1] 
 https://forum.dlang.org/post/cpxfgdmklgusodqouqdr forum.dlang.org
It looks like `opHeadMutable` is basically a restricted form of "implicit `opCast`"--that is, rather than allowing the programmer to define *arbitrary* implicit conversions, it lets them define implicit conversions of the form `qual(Template!(Args))` -> `Template!(qual(Args))`. IMO it is a compromise that both sides would find unsatisfying. Those who want convenient implicit conversions will not be happy with it because there are important cases it does not cover (e.g. `qual(Template!(qual(T)))` -> `qual(Template!T)`), but it is still flexible enough that anyone opposed to user-defined implicit conversions in general will likely find it hard to stomach.
Aug 31 2021
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/31/21 8:26 PM, Paul Backus wrote:
 On Tuesday, 31 August 2021 at 23:56:51 UTC, Steven Schveighoffer wrote:
 To put it bluntly, I have no idea how to achieve this. And I don't 
 think we'll have a good set of collection before this is sorted out.
I had an idea a long time ago, but ashamedly never brought it to fruition. But long story short, I think we need a way to specify "tail" for modifiers, and allow implicitly casting the head. Then you'd have `tailconst(Vector!int)`, and you are good. I actually had a decent syntax for it, but I'm not sure it would fly anyway.
One problem I can forsee with this approach is that, from the compiler's point of view, there is not necessarily any predictable relationship between `qualifier(Template!T)` and `Template!(qualifier(T))`.
It's possible to forward, i.e.: ```d template Vector(T) if (isConst!T) { alias Vector = tailconst(.Vector!(Unqual!T)); } ``` If you look at arrays, `const(T)[]` is the same as `tailconst(T[])`. This would be similar. The problem I have with any other approach is that the compiler doesn't understand the relationships, and you will run into weird corner cases. It can't correctly add or strip the qualifier to the correct members when calling compatible functions. -Steve
Sep 01 2021
next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Wednesday, 1 September 2021 at 11:04:41 UTC, Steven 
Schveighoffer wrote:
 It's possible to forward, i.e.:

 ```d
 template Vector(T) if (isConst!T) { alias Vector = 
 tailconst(.Vector!(Unqual!T)); }
 ```

 If you look at arrays, `const(T)[]` is the same as 
 `tailconst(T[])`. This would be similar.
This runs directly into the issue 1807 problem. The compiler has to recognize this specific pattern as meaningful, and if the programmer writes something *semantically* equivalent using a different pattern, their code will mysteriously break; e.g., ```d alias TailConstOf(T) = tailconst(T); alias Vector(T : const(U)) = TailConstOf!(.Vector!U); ```
 The problem I have with any other approach is that the compiler 
 doesn't understand the relationships, and you will run into 
 weird corner cases. It can't correctly add or strip the 
 qualifier to the correct members when calling compatible 
 functions.
With the implicit `opCast`/constructor approach (or the `opHeadMutable` approach), the compiler doesn't have to understand the relationships. It just needs to know the original type and the desired type, and call the appropriate user-defined function to perform the conversion.
Sep 01 2021
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/1/21 7:57 AM, Paul Backus wrote:
 On Wednesday, 1 September 2021 at 11:04:41 UTC, Steven Schveighoffer wrote:
 It's possible to forward, i.e.:

 ```d
 template Vector(T) if (isConst!T) { alias Vector = 
 tailconst(.Vector!(Unqual!T)); }
 ```

 If you look at arrays, `const(T)[]` is the same as `tailconst(T[])`. 
 This would be similar.
This runs directly into the issue 1807 problem. The compiler has to recognize this specific pattern as meaningful, and if the programmer writes something *semantically* equivalent using a different pattern, their code will mysteriously break; e.g., ```d alias TailConstOf(T) = tailconst(T); alias Vector(T : const(U)) = TailConstOf!(.Vector!U); ```
Right, we have to fix that problem anyway. To further explain for readers, the issue is: ```d void foo(T)(Vector!(const(T)) arg); ``` This should accept a `Vector!(const int)` which is aliased as `tailconst(Vector!int)`, but the compiler can't see how to infer `T` as `int` from `tailconst(Vector!int)`. I think the implicit cast solution suffers from similar problems. It works fine if you use `const(T)[]`, which is the standard we should strive for. This is one of those longstanding issues that needs addressing.
 
 With the implicit `opCast`/constructor approach (or the `opHeadMutable` 
 approach), the compiler doesn't have to understand the relationships. It 
 just needs to know the original type and the desired type, and call the 
 appropriate user-defined function to perform the conversion.
It just means you need to handle all the implicit conversions, which is not trivial. The compiler already has rules for implicit const conversions, some of which have subtle pitfalls if you get them wrong. I'd rather utilize those mechanisms rather than relying on each type author to get them all right. -Steve
Sep 01 2021
next sibling parent Paul Backus <snarwin gmail.com> writes:
On Wednesday, 1 September 2021 at 12:32:36 UTC, Steven 
Schveighoffer wrote:
 With the implicit `opCast`/constructor approach (or the 
 `opHeadMutable` approach), the compiler doesn't have to 
 understand the relationships. It just needs to know the 
 original type and the desired type, and call the appropriate 
 user-defined function to perform the conversion.
It just means you need to handle all the implicit conversions, which is not trivial. The compiler already has rules for implicit const conversions, some of which have subtle pitfalls if you get them wrong. I'd rather utilize those mechanisms rather than relying on each type author to get them all right.
As long as the type author does not use unsafe casts, the compiler will prevent them from getting the conversions wrong.
Sep 01 2021
prev sibling parent Kagamin <spam here.lot> writes:
On Wednesday, 1 September 2021 at 12:32:36 UTC, Steven 
Schveighoffer wrote:
 To further explain for readers, the issue is:

 ```d
 void foo(T)(Vector!(const(T)) arg);
 ```
This compiles: ```d struct Vector(T){} void f(T:const(U),U)(Vector!T arg); void f1() { Vector!(const int) a; f(a); } ```
Sep 02 2021
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Wednesday, 1 September 2021 at 11:57:01 UTC, Paul Backus wrote:
 With the implicit `opCast`/constructor approach (or the 
 `opHeadMutable` approach), the compiler doesn't have to 
 understand the relationships. It just needs to know the 
 original type and the desired type, and call the appropriate 
 user-defined function to perform the conversion.
It still wouldn't work when manipulating things by reference. In fact, the litmus test for a solution to that problem should be: can we reproduce the behavior of builtin types such as T* or T[] with it?
Sep 01 2021
parent reply Paul Backus <snarwin gmail.com> writes:
On Wednesday, 1 September 2021 at 12:52:47 UTC, deadalnix wrote:
 On Wednesday, 1 September 2021 at 11:57:01 UTC, Paul Backus 
 wrote:
 With the implicit `opCast`/constructor approach (or the 
 `opHeadMutable` approach), the compiler doesn't have to 
 understand the relationships. It just needs to know the 
 original type and the desired type, and call the appropriate 
 user-defined function to perform the conversion.
It still wouldn't work when manipulating things by reference. In fact, the litmus test for a solution to that problem should be: can we reproduce the behavior of builtin types such as T* or T[] with it?
It doesn't work when manipulating built-in types by reference either: ```d void manipulate(ref const(int)[] arr) {} void main() { const(int[]) arr; manipulate(arr); // Error: cannot pass argument `arr` of type `const(int[])` // to parameter `ref const(int)[] arr` } ```
Sep 01 2021
parent reply deadalnix <deadalnix gmail.com> writes:
On Wednesday, 1 September 2021 at 13:22:46 UTC, Paul Backus wrote:
 It doesn't work when manipulating built-in types by reference 
 either:

 ```d
 void manipulate(ref const(int)[] arr) {}

 void main()
 {
     const(int[]) arr;
     manipulate(arr);
     // Error: cannot pass argument `arr` of type `const(int[])`
     // to parameter `ref const(int)[] arr`
 }
 ```
No you right, this is not what I meant (but this is def what I wrote). I meant that int[][] or int[]* turtle the qualifier down.
Sep 01 2021
parent reply Paul Backus <snarwin gmail.com> writes:
On Wednesday, 1 September 2021 at 18:23:06 UTC, deadalnix wrote:
 No you right, this is not what I meant (but this is def what I 
 wrote).

 I meant that int[][] or int[]* turtle the qualifier down.
Are you sure? ```d void manipulate(const(int)[][] arr) {} void main() { const(int[][]) arr; manipulate(arr); // Error: cannot pass argument `arr` of type `const(int[][])` // to parameter `const(int)[][] arr` } ```
Sep 01 2021
parent deadalnix <deadalnix gmail.com> writes:
On Wednesday, 1 September 2021 at 18:32:29 UTC, Paul Backus wrote:
 On Wednesday, 1 September 2021 at 18:23:06 UTC, deadalnix wrote:
 No you right, this is not what I meant (but this is def what I 
 wrote).

 I meant that int[][] or int[]* turtle the qualifier down.
Are you sure? ```d void manipulate(const(int)[][] arr) {} void main() { const(int[][]) arr; manipulate(arr); // Error: cannot pass argument `arr` of type `const(int[][])` // to parameter `const(int)[][] arr` } ```
Yes, const(int[][]) is effectively const(const(const(int)[])[]). In the same way, you want const(Vector!int[]) to be const(const(Vector!(const(int))[]). Which means that you should be able to convert implicitly a reference to a Vector!int into a reference to a Vector!(const(int)). In your example, you remove const qualifiers, so it obviously doesn't work. But it work in the other direction, when you add them. It won't for a collection even with an implicit cast operation, as what is really desired is some kind of reinterpret_cast.
Sep 01 2021
prev sibling parent deadalnix <deadalnix gmail.com> writes:
On Wednesday, 1 September 2021 at 11:04:41 UTC, Steven 
Schveighoffer wrote:
 It's possible to forward, i.e.:

 ```d
 template Vector(T) if (isConst!T) { alias Vector = 
 tailconst(.Vector!(Unqual!T)); }
 ```

 If you look at arrays, `const(T)[]` is the same as 
 `tailconst(T[])`. This would be similar.

 The problem I have with any other approach is that the compiler 
 doesn't understand the relationships, and you will run into 
 weird corner cases. It can't correctly add or strip the 
 qualifier to the correct members when calling compatible 
 functions.

 -Steve
If the type is unrelated, then you are back to the heuristic problem.
Sep 01 2021
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Tuesday, 31 August 2021 at 23:56:51 UTC, Steven Schveighoffer 
wrote:
 I don't agree that COW is a good idea for generic collections. 
 IIRC, C++ tried this with std::string and it was a disaster.
You seem to be confused. The C++11 standard prohibits std::string from using COW. Existing C++ implementations were using COW, and getting out of it was what was a disaster (notably, there were breaking ABI changes, which the C++ ecosystem is not well suited to handle).
Aug 31 2021
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/31/21 8:52 PM, deadalnix wrote:
 On Tuesday, 31 August 2021 at 23:56:51 UTC, Steven Schveighoffer wrote:
 I don't agree that COW is a good idea for generic collections. IIRC, 
 C++ tried this with std::string and it was a disaster.
You seem to be confused. The C++11 standard prohibits std::string from using COW. Existing C++ implementations were using COW, and getting out of it was what was a disaster (notably, there were breaking ABI changes, which the C++ ecosystem is not well suited to handle).
Isn't the reason they prohibited COW is because of the problems with it? Not saying COW is terrible in all cases, just that a generic container (non-specialized usage) should not be COW. If you need that feature, you should have to ask for it, not have it be an implementation detail. -Steve
Sep 01 2021
parent deadalnix <deadalnix gmail.com> writes:
On Wednesday, 1 September 2021 at 11:13:27 UTC, Steven 
Schveighoffer wrote:
 Isn't the reason they prohibited COW is because of the problems 
 with it?

 Not saying COW is terrible in all cases, just that a generic 
 container (non-specialized usage) should not be COW. If you 
 need that feature, you should have to ask for it, not have it 
 be an implementation detail.

 -Steve
Well I never experienced any problem with them, so did many other C++ users. As far as i am concerned, this was a net step backward. That being said, the reason were due to iterator invalidation. There are several function in the string API that do not invalidate iterators, as per the C++ standard, and these function might end up copying the content of the string in some specific situations, which would invalidate iterators. In practice, I think the main problem here is related to exposing the innards of the datastructure. An iterator that contains a reference to the container + index, rather than some kind of raw pointer for instance, could avoid that problem.
Sep 01 2021
prev sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 1 September 2021 at 00:52:04 UTC, deadalnix wrote:
 You seem to be confused. The C++11 standard prohibits 
 std::string from using COW. Existing C++ implementations were 
 using COW, and getting out of it was what was a disaster 
 (notably, there were breaking ABI changes, which the C++ 
 ecosystem is not well suited to handle).
In what ways was it a disaster? Because most strings are small enough to benefit from the small string optimization?
Sep 01 2021
parent deadalnix <deadalnix gmail.com> writes:
On Wednesday, 1 September 2021 at 20:48:10 UTC, Per Nordlöw wrote:
 On Wednesday, 1 September 2021 at 00:52:04 UTC, deadalnix wrote:
 You seem to be confused. The C++11 standard prohibits 
 std::string from using COW. Existing C++ implementations were 
 using COW, and getting out of it was what was a disaster 
 (notably, there were breaking ABI changes, which the C++ 
 ecosystem is not well suited to handle).
In what ways was it a disaster? Because most strings are small enough to benefit from the small string optimization?
It was a disaster because gcc's standard lib used a COW string implementation, but had to change it when moving the C++11. Considering virutally everything uses string and that a lot of C++ software is distributed in binary format, and that the standard lib is often liked in live at launch time by the OS, you are faced with an ABI breakage nightmare.
Sep 01 2021
prev sibling next sibling parent reply max haughton <maxhaton gmail.com> writes:
On Tuesday, 31 August 2021 at 17:21:02 UTC, deadalnix wrote:
 I want to start working on a container library. Interestingly 
 enough, I don't think we have what we need on that front.

 I'll describe here what are the design I intend to go for, and 
 the rationales for it, but before, there is an unsolved problem 
 that I don't know how to work around that is somewhat of a 
 blocker. If one of you guys have an idea how to solve it, I'm 
 taking. If I'm being honest, this is probably the most 
 important problem that need to be solved for D right now. That 
 problem is turtling down type qualifiers.

 For instance, if we have a Vector!T, then we want it to convert 
 to a Vector!(const T) and we want const(Vector!T) == 
 const(Vector!(const T)), all of it implicitly. Same goes for 
 other qualifiers.

 To put it bluntly, I have no idea how to achieve this. And I 
 don't think we'll have a good set of collection before this is 
 sorted out.

 Now onto the design:
  - Value types. It is very easy to turn a value into a 
 reference type, not so much the other way around. reference 
 type also come with a bag of edges cases, like the collection 
 being null vs empty, that cause confusion and problems in 
 practice.
  - COW. Value type collection can be very expensive to work 
 with if the eagerly copy everything. On the other hand, COW 
 ensures copies are cheap. There are also very common use case 
 that are well served by COW collections, for instance, when one 
 wants value types for collection that are infrequently changes 
 - but you don't know or can't prove that they won't.
  - Small collection features. As every engineer, but no 
 mathematician will tell you, is that most numbers are small. It 
 is also true of collection sizes. In order to speed things up 
 for collection that are expected to be small, being able to 
 specify Vector!(T, 8) to reserve 8 slots for elements, in the 
 base collection, and only allocate on heap if the collection 
 size grows past this. LLVM has a collection library doing so 
 and it is the most useful thing in the world (see 
 https://www.youtube.com/watch?v=vElZc6zSIXM for more details).
  - Not expose much of its internals. This like iterators being 
 raw pointers (or the result of using the in operator) for 
 instance, impose constraint on the internal that almost always 
 end up costing a ton of perf as new techniques are discovered 
 (recent exemples include open addressing with SSE probing for 
 instance). All of this needs to be wrapped.

 I have a good idea how to solve all these problems. I have no 
 idea how to solve the type qualifier one. I need help.
It's good that you are doing this Amaury, thanks. We need better containers to compete with other languages. On top of that, there are patterns that other languages cannot do. One thing that comes to mind is how D is one of not many languages (languages with macros or similar probably can), that can easily allow one to write containers which lie about their layout. The most profitable example of that I can think of is automatically switching arrays of POD types to be a struct of arrays internally. This transformation makes random access to elements slower, however (say) loops where access to a subset of the elements dominates then this is much faster due to improved cache efficiency.
Aug 31 2021
next sibling parent Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 1 September 2021 at 02:08:40 UTC, max haughton 
wrote:
 layout. The most profitable example of that I can think of is 
 automatically switching arrays of POD types to be a struct of 
 arrays internally. This transformation makes random access to 
 elements slower, however (say) loops where access to a subset 
 of the elements dominates then this is much faster due to 
 improved cache efficiency.
Speaking of which, here's my D implementation of struct of arrays (SoA): https://github.com/nordlow/phobos-next/blob/master/src/nxt/soa.d Peter Kirov has an alternative implementation providing some benefits. I've copied it into https://github.com/nordlow/phobos-next/blob/master/src/nxt/soa_petar_kirov.d The two implementations should be united. Note that Petar's SoA-implementation has its `capacity`-property functionally deduced from the `length` property as ```d /** Returns the current capacity - the current length rounded up to a power of two. This assumption allows us to save on having a separate field. */ size_t capacity() const pure safe { return _length.roundUpToPowerOf2; } ``` thereby saving a word of (stack) space. That behaviour could be opt-in aswell via a template parameter.
Sep 01 2021
prev sibling parent Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 1 September 2021 at 02:08:40 UTC, max haughton 
wrote:
 We need better containers to compete with other languages. On 
 top of that, there are patterns that other languages cannot do. 
 One thing that comes to mind is how D is one of not many 
 languages (languages with macros or similar probably can), that 
 can easily allow one to write containers which lie about their 
 layout. The most profitable example of that I can think of is 
 automatically switching arrays of POD types to be a struct of 
 arrays internally. This transformation makes random access to 
 elements slower, however (say) loops where access to a subset 
 of the elements dominates then this is much faster due to 
 improved cache efficiency.
Great point. Game engines extensively use such data layouts when storing vertex data. That's reason for Jai wanting a builtin SoA type. Jai has interestingly since made that a library feature.
Sep 01 2021
prev sibling next sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Tuesday, 31 August 2021 at 17:21:02 UTC, deadalnix wrote:
 I want to start working on a container library. Interestingly 
 enough, I don't think we have what we need on that front.
You give Razvan and Andrei a ping aswell. They have done deep work on the topic of specifically trying to make RC-backed containers/collections safe pure nothrow nogc. One of Andrei's conclusion was that a run-time flag is needed to reveal whether a const parameter is fed an immutable argument or not.
Sep 01 2021
parent reply deadalnix <deadalnix gmail.com> writes:
On Wednesday, 1 September 2021 at 10:32:41 UTC, Per Nordlöw wrote:
 On Tuesday, 31 August 2021 at 17:21:02 UTC, deadalnix wrote:
 I want to start working on a container library. Interestingly 
 enough, I don't think we have what we need on that front.
You give Razvan and Andrei a ping aswell. They have done deep work on the topic of specifically trying to make RC-backed containers/collections safe pure nothrow nogc. One of Andrei's conclusion was that a run-time flag is needed to reveal whether a const parameter is fed an immutable argument or not.
I started looking into this after a short discussion with Andrei :)
Sep 01 2021
parent Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 1 September 2021 at 10:33:47 UTC, deadalnix wrote:
 I started looking into this after a short discussion with 
 Andrei :)
Also highly related is Atila's work on doing `std.typecons.RefCounted` safe at https://github.com/dlang/phobos/pull/8101. Seems stalled, though.
Sep 01 2021
prev sibling next sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Tuesday, 31 August 2021 at 17:21:02 UTC, deadalnix wrote:
 I want to start working on a container library. Interestingly 
 enough, I don't think we have what we need on that front.
Moreover, Rust collections provides makes rich of static members such as [1] `C.with_capacity(size_t)` instead of guessing what C(42) means which have introduced bugs in my code previously. Might be worthwhile considering making use of these patterns. My containers make use of such static members. [1] https://doc.rust-lang.org/std/vec/struct.Vec.html#method.with_capacity.
Sep 01 2021
parent deadalnix <deadalnix gmail.com> writes:
On Wednesday, 1 September 2021 at 10:44:46 UTC, Per Nordlöw wrote:
 On Tuesday, 31 August 2021 at 17:21:02 UTC, deadalnix wrote:
 I want to start working on a container library. Interestingly 
 enough, I don't think we have what we need on that front.
Moreover, Rust collections provides makes rich of static members such as [1] `C.with_capacity(size_t)` instead of guessing what C(42) means which have introduced bugs in my code previously. Might be worthwhile considering making use of these patterns. My containers make use of such static members. [1] https://doc.rust-lang.org/std/vec/struct.Vec.html#method.with_capacity.
I have adopted that style a lot lately and this is indeed much better than C(42).
Sep 01 2021
prev sibling next sibling parent vit <vit vit.vit> writes:
On Tuesday, 31 August 2021 at 17:21:02 UTC, deadalnix wrote:
 ...
 For instance, if we have a Vector!T, then we want it to convert 
 to a Vector!(const T) and we want const(Vector!T) == 
 const(Vector!(const T)), all of it implicitly. Same goes for 
 other qualifiers.

 To put it bluntly, I have no idea how to achieve this. And I 
 don't think we'll have a good set of collection before this is 
 sorted out.
 ...
Try `this` template parameter: ```d import std.traits : CopyConstness, isMutable; void main(){ { Vector!int a; Vector!(const int) b = a; const Vector!int c = b; } { Vector!int a; Vector!(const int) b; b = a; } } struct Vector(T){ alias ElementType = T; this(U, this This)(ref Vector!U rhs) if(!is(U == T) && isCopyable!(This, typeof(rhs))){ //impl } this(U, this This)(Vector!U rhs) if(isCopyable!(typeof(rhs), This)){ //impl } ///TODO copy ctors... void opAssign(U, this This)(auto ref Vector!U rhs) if(isCopyable!(typeof(rhs), This) && isMutable!This){ //impl } } private enum bool isCopyable(From, To) = is( CopyConstness!(From, From.ElementType[]) : CopyConstness!(To, To.ElementType[]) ); ```
Sep 01 2021
prev sibling next sibling parent ikod <igor.khasilev gmail.com> writes:
On Tuesday, 31 August 2021 at 17:21:02 UTC, deadalnix wrote:
 I want to start working on a container library. Interestingly 
 enough, I don't think we have what we need on that front.
 Now onto the design:
  - Value types. It is very easy to turn a value into a 
 reference type, not so much the other way around. reference 
 type also come with a bag of edges cases, like the collection 
 being null vs empty, that cause confusion and problems in 
 practice.
My collections library[1] safety inherited from the stored data types. Any data access methods that returns reference marked with ' system', so you can't use it easily in safe code.
 I have a good idea how to solve all these problems. I have no 
 idea how to solve the type qualifier one. I need help.
You didn't mention iterator stability. [1] https://github.com/ikod/ikod-containers
Sep 02 2021
prev sibling parent reply Kagamin <spam here.lot> writes:
On Tuesday, 31 August 2021 at 17:21:02 UTC, deadalnix wrote:
  - Not expose much of its internals.
Do you want to expose the COW detail in the interface?
Sep 02 2021
parent deadalnix <deadalnix gmail.com> writes:
On Thursday, 2 September 2021 at 09:15:10 UTC, Kagamin wrote:
 On Tuesday, 31 August 2021 at 17:21:02 UTC, deadalnix wrote:
  - Not expose much of its internals.
Do you want to expose the COW detail in the interface?
It doesn't need to be explicit, but in many instances, this is going to be visible, for instance, if the type stored has a copy constructor, then copying the container will be a very visible operation.
Sep 02 2021