www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 6718] New: "nWayUnion" => "nWayMerge", plus true nWayUnion

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6718

           Summary: "nWayUnion" => "nWayMerge", plus true nWayUnion
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: bearophile_hugs eml.cc



This is a bug report plus an optional enhancement request.

In std.algorithm there is a section of set functions that work on ordered
iterables, like nWayUnion:
http://www.d-programming-language.org/phobos/std_algorithm.html#nWayUnion

This is an example of nWayUnion usage:


import std.algorithm;
void main() {
    auto data = [[1, 2, 3], [1, 3, 5]];
    auto expected = [1, 1, 2, 3, 3, 5];
    assert(equal(nWayUnion(data), expected));
}


But this is not a set operation, and the result is not a set. A set never
contains repeated items (a certain definition of "duplication" is possible only
if you define your own equality relation).

So in my opinion a nWayUnion function has to return:

auto data = [[1, 2, 3], [1, 3, 5]];
auto expected = [1, 2, 3, 5];
assert(equal(nWayUnion(data), expected));


While the current nWayUnion function has to be renamed "nWayMerge".

auto data = [[1, 2, 3], [1, 3, 5]];
auto expected = [1, 1, 2, 3, 3, 5];
assert(equal(nWayMerge(data), expected));

In my opinion both nWayUnion and nWayMerge are very useful operations useful in
various situations, but they are different things. In my opinion confusing the
two leads to bugs in programs that use arrays to represent sets (I have just
hit a but caused by this).

nWayUnion is probably uniq(nWayMerge). If you don't want to add the "nWayUnion"
function, then I suggest to just rename "nWayUnion" => "nWayMerge" (and put it
outside the documentation section about set functions) to avoid some mistakes.
Names are important, and classifications too.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Sep 22 2011
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6718


Andrei Alexandrescu <andrei erdani.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrei erdani.com



PST ---
Here "union" has the meaning in relational algebra, not in set theory. I'll
keep this opened until I add a mention to the documentation.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 27 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6718





 Here "union" has the meaning in relational algebra, not in set theory. I'll
 keep this opened until I add a mention to the documentation.
Thank you for the answer. So all those set functions turn out not being about _sets_. Adding a mention in the documentation is not enough. So: 1) I suggest to rename "nWayUnion" as "nWayMerge", to avoid any confusion in D programmers like me. In the Merge Sort that algorithm is named "Merge", so calling it "merge" has a very long tradition in computer science (http://en.wikipedia.org/wiki/Merge_algorithm ). 2) In this page: http://dlang.org/phobos/std_algorithm.html The "Set operations" column is misnamed because those are NOT sets. In Computer Science and in Mathematics those are named bags or multisets (http://en.wikipedia.org/wiki/Multiset ). But those functions work with a comparison operator and work on ordered sequences, so I suggest to call that column "Multiset operations" or "Ordered bags operations". 3) setIntersection is equally badly named. And it should be renamed "bagsIntersection", or something similar. Also the examples given in the site for setIntersection show only true sets, so this contribute to the confusion, it's even more easy to miss that it is actually a bags intersection: int[] a = [ 1, 2, 4, 5, 7, 9 ]; int[] b = [ 0, 1, 2, 4, 7, 8 ]; int[] c = [ 0, 1, 4, 5, 7, 8 ]; assert(equal(setIntersection(a, a), a)); assert(equal(setIntersection(a, b), [1, 2, 4, 7][])); assert(equal(setIntersection(a, b, c), [1, 4, 7][])); Those examples need to show clearly some repetitions, like this: int[] a = [1, 1, 2, 4, 5, 5, 5, 5, 7, 9]; int[] b = [0, 1, 2, 4, 7, 7, 8]; int[] c = [0, 1, 1, 1, 1, 4, 4, 5, 7, 8]; Extra) Is it worth to add a true nWayUnion (in set theory sense) to std.algorithm? It's a useful operation. A nWayUnion is possibly computed as: seqs.nWayMerge.uniq.array So maybe it's not worth it, I don't know. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Feb 27 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6718


bearophile_hugs eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |major



Raised importance to Major, because this naming cancer in std.algorithm is
worse than I expected in 2011.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 27 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6718


Andrei Alexandrescu <andrei erdani.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|major                       |minor



PST ---
No need to raise importance out of proportion. Thanks.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 27 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6718





 No need to raise importance out of proportion. Thanks.
OK, sorry. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Feb 27 2013
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6718


Denis Shelomovskij <verylonglogin.reg gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |verylonglogin.reg gmail.com
           Severity|minor                       |normal



18:00:41 MSD ---
Bearophile, what about to start NG thread about it? Bad names are always causes
of user code bugs so this issue should be thoroughly considered.

Also restored "normal" importance as terminology isn't a minor issue.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Sep 30 2013