digitalmars.D.learn - Intersection of two sets

Jan =?UTF-8?B?SMO2bmln?= (24/24) Dec 03 2019 It seems i don't google the right keywords.

Alex (6/30) Dec 03 2019 This depends on the available accesses on your sets. In terms of

Jan =?UTF-8?B?SMO2bmln?= (5/11) Dec 03 2019 In terms of ranges, i need to understand ranges properly first.

mipri (46/62) Dec 03 2019 The top of https://dlang.org/phobos/std_range.html has some

ikod (4/28) Dec 03 2019 Never tried, but depending of the nature of your "something" you

Jan =?UTF-8?B?SMO2bmln?= (3/6) Dec 03 2019 "roaring" is massive overkill for my case, but thanks for

Per =?UTF-8?B?Tm9yZGzDtnc=?= (20/33) Dec 04 2019 One simple optimization is to

Jan =?UTF-8?B?SMO2bmln?= (3/36) Dec 06 2019 Thanks!

Jan =?UTF-8?B?SMO2bmln?= <hrominium gmail.com> writes:

It seems i don't google the right keywords.

What i want to do: I have two sets. (I didn't find how to do 
sets, so i have two associative boolean arrays 
`bool[<something>]`). And i want to join them, via an 
intersection.

I know how to code this. Loop over one AA, if the key is also in 
the other one, we add that to the third array which is going to 
be returned.

pseudocode:
alias set = bool[<something>]
set foo = ...
set bar = ...
set result;

foreach(key; foo)
{
   if (key in bar)
   {
     result[key] = true;
   }
}
return result;


1) Is there a better way for creating a set, other then `alias 
set = bool[keyClass]`?
2) Are there some build-in function for handling such sets?

Dec 03 2019

Alex <sascha.orlov gmail.com> writes:

On Tuesday, 3 December 2019 at 13:43:26 UTC, Jan Hönig wrote:
 It seems i don't google the right keywords.

 What i want to do: I have two sets. (I didn't find how to do 
 sets, so i have two associative boolean arrays 
 `bool[<something>]`). And i want to join them, via an 
 intersection.

 I know how to code this. Loop over one AA, if the key is also 
 in the other one, we add that to the third array which is going 
 to be returned.

 pseudocode:
 alias set = bool[<something>]
 set foo = ...
 set bar = ...
 set result;

 foreach(key; foo)
 {
   if (key in bar)
   {
     result[key] = true;
   }
 }
 return result;


 1) Is there a better way for creating a set, other then `alias 
 set = bool[keyClass]`?

This depends on the available accesses on your sets. In terms of 
ranges:
Are your sets InputRanges, ForwardRange, ... ?


 2) Are there some build-in function for handling such sets?

This is maybe what you are looking for: 
https://dlang.org/phobos/std_algorithm_setops.html

Dec 03 2019

Jan =?UTF-8?B?SMO2bmln?= <hrominium gmail.com> writes:

On Tuesday, 3 December 2019 at 13:55:51 UTC, Alex wrote:
 This depends on the available accesses on your sets. In terms 
 of ranges:
 Are your sets InputRanges, ForwardRange, ... ?


 2) Are there some build-in function for handling such sets?

 This is maybe what you are looking for: 
 https://dlang.org/phobos/std_algorithm_setops.html

In terms of ranges, i need to understand ranges properly first.
The std.algorithm.setops have definitely the functionality i need.
I guess my current implementation would be a simple array.
I just need to make sure delete, or not create double entries.

Dec 03 2019

mipri <mipri minimaltype.com> writes:

On Tuesday, 3 December 2019 at 18:45:18 UTC, Jan Hönig wrote:
 On Tuesday, 3 December 2019 at 13:55:51 UTC, Alex wrote:
 This depends on the available accesses on your sets. In terms 
 of ranges:
 Are your sets InputRanges, ForwardRange, ... ?


 2) Are there some build-in function for handling such sets?

 This is maybe what you are looking for: 
 https://dlang.org/phobos/std_algorithm_setops.html

 In terms of ranges, i need to understand ranges properly first.

The top of https://dlang.org/phobos/std_range.html has some
talks and tutorials to consider.

 The std.algorithm.setops have definitely the functionality i 
 need.
 I guess my current implementation would be a simple array.
 I just need to make sure delete, or not create double entries.

A problem with the unittest examples is that they assume the
reader is very familiar with the language :) Here are some
things to keep in mind with setIntersection:

1. setIntersection expects its inputs to be sorted. If they
aren't, it doesn't notice; you just don't get the behavior you
want.

   int[] a = [4, 2, 3, 1], b = [8, 4, 2, 6];

   assert(setIntersection(a, b).array == []); // !
   assert(setIntersection(a.sort, b.sort).array == [2, 4]);

2. setIntersection returns a range, not an array. This probably
lets you return the infinite intersections of two infinite
ranges, for example. If you're looking at this for Advent of
Code you'll probably want to pass the intersections through
some other operators that will accept a range.

(Also, sort mutates *and* returns the array. if you're
depending on the positions of 2 and 4 in the original arrays,
you should get the intersection of a sorted copy of them.)

In general, pay attention to the static constraints:

   if (Rs.length >= 2 && allSatisfy!(isInputRange, Rs) &&
   !is(CommonType!(staticMap!(ElementType, Rs)) == void));

In English:

1. if you're getting an intersection of ranges, you need to
provide at least two ranges.

2. all of the ranges need to be input ranges

3. the ranges' element types need to have a common type that
isn't 'void'.

You can use these same constraints in your code if you're not
sure if they apply

     static assert (isInputRange!(typeof(a)));
     assert(setIntersection(a, b).array == []);

or you can test them interactively:

   $ rdmd --eval 'isInputRange!(typeof([1, 2, 3])).writeln'
   true

And you can also check on their documentation and see if it's
what you expect:

   https://dlang.org/phobos/std_range_primitives.html

D doesn't have 500+ line errors for the simplest template
misuse like C does, but its much shorter template error
messages still could be friendlier. Usually they're in terms of
these constraints, or you've a more basic error like having a
range type (like the SetIntersection struct that
setIntersection actually returns) where some other code expects
a simple array, or vice versa.

Dec 03 2019

ikod <igor.khasilev gmail.com> writes:

On Tuesday, 3 December 2019 at 13:43:26 UTC, Jan Hönig wrote:
 It seems i don't google the right keywords.

 What i want to do: I have two sets. (I didn't find how to do 
 sets, so i have two associative boolean arrays 
 `bool[<something>]`). And i want to join them, via an 
 intersection.

 I know how to code this. Loop over one AA, if the key is also 
 in the other one, we add that to the third array which is going 
 to be returned.

 pseudocode:
 alias set = bool[<something>]
 set foo = ...
 set bar = ...
 set result;

 foreach(key; foo)
 {
   if (key in bar)
   {
     result[key] = true;
   }
 }
 return result;


 1) Is there a better way for creating a set, other then `alias 
 set = bool[keyClass]`?
 2) Are there some build-in function for handling such sets?

Never tried, but depending of the nature of your "something" you 
can try bit sets. There are efficient algorithms for large bit 
arrays (see "roaring" for example).

Dec 03 2019

Jan =?UTF-8?B?SMO2bmln?= <hrominium gmail.com> writes:

On Tuesday, 3 December 2019 at 15:14:03 UTC, ikod wrote:
 Never tried, but depending of the nature of your "something" 
 you can try bit sets. There are efficient algorithms for large 
 bit arrays (see "roaring" for example).

"roaring" is massive overkill for my case, but thanks for 
suggesting it. I didn't know about it.

Dec 03 2019

Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:

On Tuesday, 3 December 2019 at 13:43:26 UTC, Jan Hönig wrote:
 pseudocode:
 alias set = bool[<something>]
 set foo = ...
 set bar = ...
 set result;

One simple optimization is to

set* smallest;
set* largest;

if (foo.length < bar.length)
{
     smallest = &foo;
     largest = &bar;
}
else
{
     smallest = &bar;
     largest = &foo;
}

 foreach(key; *smallest)
 {
   if (key in *largest)
   {
     result[key] = true;
   }
 }
 return result;

Provided that your set type has an `in`-operator with 
time-complexity O(1), this will, in the worst case, reduce time 
complexity from

     O(max(foo.length, bar.length))

to

     O(min(foo.length, bar.length))

Dec 04 2019

Jan =?UTF-8?B?SMO2bmln?= <hrominium gmail.com> writes:

On Wednesday, 4 December 2019 at 08:01:59 UTC, Per Nordlöw wrote:
 On Tuesday, 3 December 2019 at 13:43:26 UTC, Jan Hönig wrote:
 pseudocode:
 alias set = bool[<something>]
 set foo = ...
 set bar = ...
 set result;

 One simple optimization is to

 set* smallest;
 set* largest;

 if (foo.length < bar.length)
 {
     smallest = &foo;
     largest = &bar;
 }
 else
 {
     smallest = &bar;
     largest = &foo;
 }

 foreach(key; *smallest)
 {
   if (key in *largest)
   {
     result[key] = true;
   }
 }
 return result;

 Provided that your set type has an `in`-operator with 
 time-complexity O(1), this will, in the worst case, reduce time 
 complexity from

     O(max(foo.length, bar.length))

 to

     O(min(foo.length, bar.length))

Thanks!
I didn't even thought about optimizations like this :)

Dec 06 2019

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Intersection of two sets