digitalmars.D.learn - How to use sets in D?
- tastyminerals (4/22) Feb 08 2022 Can you please explain what does `bool[E]` mean? And how does the
- H. S. Teoh (24/45) Feb 08 2022 [...]
- Siarhei Siamashka (10/12) Feb 09 2022 Is the current implementation of associative arrays in D language
- Siarhei Siamashka (66/68) Feb 09 2022 Answering to myself. No, it isn't. Here's a simple example:
- H. S. Teoh (12/24) Feb 09 2022 [...]
- tastyminerals (2/10) Feb 10 2022 Thank you! That's neat.
- Paul Backus (26/48) Feb 08 2022 `bool[E]` means an [associative array][1] with keys of type `E`
- H. S. Teoh (9/17) Feb 08 2022 [...]
- Paul Backus (4/19) Feb 08 2022 Ah, yeah, I forgot about that bug.
- H. S. Teoh (7/10) Feb 08 2022 OT1H, nice that works! ... but OTOH, ugh, it looks so ugly. :-P I think
https://forum.dlang.org/post/mailman.1072.1581112984.31109.digitalmars-d-learn puremagic.com On Friday, 7 February 2020 at 22:03:00 UTC, H. S. Teoh wrote:On Fri, Feb 07, 2020 at 07:37:08PM +0000, mark via Digitalmars-d-learn wrote:Can you please explain what does `bool[E]` mean? And how does the code with aliasing void[0] and then using enum even work?[...]bool[E] works just fine. The bool does take up 1 byte, though, so if you're *really* want to optimize that away, you could do this: alias Unit = void[0]; enum unit = Unit.init; // Look, ma! A bona fide set! Unit[E] mySet; mySet[...] = unit; mySet.remove(...); ... // etc. Or you can wrap void[0][E] in a nice user-defined type that gives nice set-like syntax. But IMO, this is all overkill, and adds needless complexity. Just use bool[E] or std.container.rbtree. :-D T
Feb 08 2022
On Tue, Feb 08, 2022 at 09:08:47PM +0000, tastyminerals via Digitalmars-d-learn wrote: [...][...]bool[E] works just fine. The bool does take up 1 byte, though, so if you're *really* want to optimize that away, you could do this: alias Unit = void[0]; enum unit = Unit.init; // Look, ma! A bona fide set! Unit[E] mySet; mySet[...] = unit; mySet.remove(...); ... // etc. Or you can wrap void[0][E] in a nice user-defined type that gives nice set-like syntax. But IMO, this is all overkill, and adds needless complexity. Just use bool[E] or std.container.rbtree. :-DCan you please explain what does `bool[E]` mean?E is whatever key type you want to use. String, or integer ID, or whatever you want to put in your set.And how does the code with aliasing void[0] and then using enum even work?The alias is a workaround for a syntactic limitation in D, because you cannot write `mySet[someKey] = void[0].init;` without a syntax error. So we use the alias to give `void[0]` a name that we can refer to in assignment statements, so that we can assign its values to the AA. As for `void[0]` itself, it's just a zero-element static array of an indeterminate type. The whole point is to create something that occupies 0 bytes. The actual type doesn't actually matter; you could have used int[0] and it'd work too. (You can't use an empty struct because empty structs have size 1.) By using a zero-sized value type for the AA, we effectively turn it into a set: it only stores keys. Well, technically it stores values of zero size too, but since it's zero-sized, it's as if the value isn't there, and it incurs zero memory overhead. (Whereas an empty struct would likely occupy an entire machine word, rounded up from size 1.) But as I said, this is overkill for something so trivial. Using `bool[E]` or an RBTree works just fine. T -- Javascript is what you use to allow third party programs you don't know anything about and doing you know not what to run on your computer. -- Charles Hixson
Feb 08 2022
On Tuesday, 8 February 2022 at 21:42:06 UTC, H. S. Teoh wrote:But as I said, this is overkill for something so trivial. Using `bool[E]` or an RBTree works just fine.Is the current implementation of associative arrays in D language resistant to Denial of Service hash collision attacks? * https://github.com/dlang/dmd/blob/master/src/dmd/root/aav.d * https://arstechnica.com/information-technology/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack/ * https://lwn.net/Articles/474912/ * https://codeforces.com/blog/entry/62393 If not, then redBlackTree may be safer to use when dealing with potentially adversarially constructed input data.
Feb 09 2022
On Wednesday, 9 February 2022 at 21:05:47 UTC, Siarhei Siamashka wrote:Is the current implementation of associative arrays in D language resistant to Denial of Service hash collision attacks?Answering to myself. No, it isn't. Here's a simple example: ```D import std, core.time; const ulong n = 100000; // Count the number of unique elements using associative array ulong count_unique_values_assoc(const ulong[] a) { bool[ulong] set; foreach (x ; a) set[x] = true; return set.length; } // Count the number of unique elements using redBlackTree ulong count_unique_values_rbtree(const ulong[] a) { auto set = redBlackTree!("a<b", false, ulong)(); foreach (x ; a) set.insert(x); return set.length; } // Generate array, which triggers hash collisions in the // current D language associative array implementation ulong[] generate_bad_array(ulong n) { return iota(n).map!(x => (x << 46)).array; } // Benchmark associative array vs. rbtree void run_benchmark(ulong[] a) { auto t1 = MonoTime.currTime; auto result_assoc = a.count_unique_values_assoc; auto t2 = MonoTime.currTime; auto result_rbtree = a.count_unique_values_rbtree; auto t3 = MonoTime.currTime; assert(result_assoc == result_rbtree); writefln("n=%d, unique_elements=%d, assoc time: %d ms, rbtree time: %d ms", a.length, result_assoc, (t2 - t1).total!"msecs", (t3 - t2).total!"msecs"); } void main() { assert([1UL, 1UL, 2UL].count_unique_values_assoc == 2); assert([1UL, 1UL, 2UL].count_unique_values_rbtree == 2); writefln("== consecutive numbers =="); n.iota.array.run_benchmark; writefln("\n== random numbers between 0 and 99 =="); n.iota.map!(_ => uniform(0UL, 100UL)).array.run_benchmark; writefln("\n== adversarially constructed array =="); n.generate_bad_array.run_benchmark; } ``` Benchmark results using a 64-bit LDC compiler: ``` == consecutive numbers == n=100000, unique_elements=100000, assoc time: 11 ms, rbtree time: 20 ms == random numbers between 0 and 99 == n=100000, unique_elements=100, assoc time: 2 ms, rbtree time: 4 ms == adversarially constructed array == n=100000, unique_elements=100000, assoc time: 22626 ms, rbtree time: 15 ms ```
Feb 09 2022
On Thu, Feb 10, 2022 at 12:53:08AM +0000, Siarhei Siamashka via Digitalmars-d-learn wrote:On Wednesday, 9 February 2022 at 21:05:47 UTC, Siarhei Siamashka wrote:[...]Is the current implementation of associative arrays in D language resistant to Denial of Service hash collision attacks?Answering to myself. No, it isn't. Here's a simple example:ulong[] generate_bad_array(ulong n) { return iota(n).map!(x => (x << 46)).array; }[...]== adversarially constructed array == n=100000, unique_elements=100000, assoc time: 22626 ms, rbtree time: 15 ms ```Nice investigation! This should be filed as a bug. I remember browsing through various hash functions in druntime before (built-in AA's use per-type hash functions), and many of them were trivial, which would imply that it's trivial to construct adversarial input that triggers a large number of collisions for various basic types. T -- Give me some fresh salted fish, please.
Feb 09 2022
On Tuesday, 8 February 2022 at 21:42:06 UTC, H. S. Teoh wrote:On Tue, Feb 08, 2022 at 09:08:47PM +0000, tastyminerals via Digitalmars-d-learn wrote: [...]Thank you! That's neat.[...][...][...]E is whatever key type you want to use. String, or integer ID, or whatever you want to put in your set. [...]
Feb 10 2022
On Tuesday, 8 February 2022 at 21:08:47 UTC, tastyminerals wrote:https://forum.dlang.org/post/mailman.1072.1581112984.31109.digitalmars-d-learn puremagic.com On Friday, 7 February 2020 at 22:03:00 UTC, H. S. Teoh wrote:[...]On Fri, Feb 07, 2020 at 07:37:08PM +0000, mark via Digitalmars-d-learn wrote:[...]bool[E] works just fine. The bool does take up 1 byte, though, so if you're *really* want to optimize that away, you could do this: alias Unit = void[0]; enum unit = Unit.init; // Look, ma! A bona fide set! Unit[E] mySet; mySet[...] = unit; mySet.remove(...); ... // etc.Can you please explain what does `bool[E]` mean? And how does the code with aliasing void[0] and then using enum even work?`bool[E]` means an [associative array][1] with keys of type `E` and values of type `bool`. Since a `bool` value takes up 1 byte in memory, using a `bool[E]` associative array to store a set of `E` values means that for every value in your set, you will have to allocate an extra 1 byte for the `bool` in addition to the bytes required for the `E` itself. In most cases, 1 extra byte is not going to matter very much, so that's not a big deal. But if you really, really want to avoid allocating any extra bytes, you can use `void[0]` in place of `bool`, since a `void[0]` takes up 0 bytes in memory. (The `void` part has no special significance here--you could also use `int[0]` or `char[0]`, for example.) The `alias` and the `enum` just make the code a little nicer to read by letting you write `Unit` instead of `void[0]` and `unit` instead of `void[0].init`. You could get rid of them and the code would work exactly the same way; it'd just be a little bit uglier: void[0][E] mySet; mySet[...] = void[0].init; mySet.remove(...); // etc. The name "unit" comes from the concept of a [unit type][2] in theoretical computer science. [1]: https://dlang.org/spec/hash-map.html [2]: https://en.wikipedia.org/wiki/Unit_type
Feb 08 2022
On Tue, Feb 08, 2022 at 09:47:13PM +0000, Paul Backus via Digitalmars-d-learn wrote: [...]The `alias` and the `enum` just make the code a little nicer to read by letting you write `Unit` instead of `void[0]` and `unit` instead of `void[0].init`. You could get rid of them and the code would work exactly the same way; it'd just be a little bit uglier: void[0][E] mySet; mySet[...] = void[0].init;[...] Unfortunately, this doesn't work due to a syntax restriction (the parser isn't expecting a type name after the `=`, and will raise a syntax error). So the alias is in fact necessary. T -- Fact is stranger than fiction.
Feb 08 2022
On Tuesday, 8 February 2022 at 21:58:49 UTC, H. S. Teoh wrote:On Tue, Feb 08, 2022 at 09:47:13PM +0000, Paul Backus via Digitalmars-d-learn wrote: [...]Ah, yeah, I forgot about that bug. It also works if you use parentheses: mySet[...] = (void[0]).init;The `alias` and the `enum` just make the code a little nicer to read by letting you write `Unit` instead of `void[0]` and `unit` instead of `void[0].init`. You could get rid of them and the code would work exactly the same way; it'd just be a little bit uglier: void[0][E] mySet; mySet[...] = void[0].init;[...] Unfortunately, this doesn't work due to a syntax restriction (the parser isn't expecting a type name after the `=`, and will raise a syntax error). So the alias is in fact necessary.
Feb 08 2022
On Tue, Feb 08, 2022 at 10:02:30PM +0000, Paul Backus via Digitalmars-d-learn wrote: [...]It also works if you use parentheses: mySet[...] = (void[0]).init;OT1H, nice that works! ... but OTOH, ugh, it looks so ugly. :-P I think I'll just stick with the alias. T -- Who told you to swim in Crocodile Lake without life insurance??
Feb 08 2022