digitalmars.D.bugs - [Issue 6718] New: "nWayUnion" => "nWayMerge", plus true nWayUnion
- d-bugmail puremagic.com (46/46) Sep 22 2011 http://d.puremagic.com/issues/show_bug.cgi?id=6718
- d-bugmail puremagic.com (11/11) Feb 27 2013 http://d.puremagic.com/issues/show_bug.cgi?id=6718
- d-bugmail puremagic.com (39/41) Feb 27 2013 http://d.puremagic.com/issues/show_bug.cgi?id=6718
- d-bugmail puremagic.com (11/11) Feb 27 2013 http://d.puremagic.com/issues/show_bug.cgi?id=6718
- d-bugmail puremagic.com (10/10) Feb 27 2013 http://d.puremagic.com/issues/show_bug.cgi?id=6718
- d-bugmail puremagic.com (7/8) Feb 27 2013 http://d.puremagic.com/issues/show_bug.cgi?id=6718
- d-bugmail puremagic.com (13/13) Sep 30 2013 http://d.puremagic.com/issues/show_bug.cgi?id=6718
http://d.puremagic.com/issues/show_bug.cgi?id=6718
           Summary: "nWayUnion" => "nWayMerge", plus true nWayUnion
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: bearophile_hugs eml.cc
This is a bug report plus an optional enhancement request.
In std.algorithm there is a section of set functions that work on ordered
iterables, like nWayUnion:
http://www.d-programming-language.org/phobos/std_algorithm.html#nWayUnion
This is an example of nWayUnion usage:
import std.algorithm;
void main() {
    auto data = [[1, 2, 3], [1, 3, 5]];
    auto expected = [1, 1, 2, 3, 3, 5];
    assert(equal(nWayUnion(data), expected));
}
But this is not a set operation, and the result is not a set. A set never
contains repeated items (a certain definition of "duplication" is possible only
if you define your own equality relation).
So in my opinion a nWayUnion function has to return:
auto data = [[1, 2, 3], [1, 3, 5]];
auto expected = [1, 2, 3, 5];
assert(equal(nWayUnion(data), expected));
While the current nWayUnion function has to be renamed "nWayMerge".
auto data = [[1, 2, 3], [1, 3, 5]];
auto expected = [1, 1, 2, 3, 3, 5];
assert(equal(nWayMerge(data), expected));
In my opinion both nWayUnion and nWayMerge are very useful operations useful in
various situations, but they are different things. In my opinion confusing the
two leads to bugs in programs that use arrays to represent sets (I have just
hit a but caused by this).
nWayUnion is probably uniq(nWayMerge). If you don't want to add the "nWayUnion"
function, then I suggest to just rename "nWayUnion" => "nWayMerge" (and put it
outside the documentation section about set functions) to avoid some mistakes.
Names are important, and classifications too.
-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
 Sep 22 2011
http://d.puremagic.com/issues/show_bug.cgi?id=6718
Andrei Alexandrescu <andrei erdani.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrei erdani.com
PST ---
Here "union" has the meaning in relational algebra, not in set theory. I'll
keep this opened until I add a mention to the documentation.
-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
 Feb 27 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6718Here "union" has the meaning in relational algebra, not in set theory. I'll keep this opened until I add a mention to the documentation.Thank you for the answer. So all those set functions turn out not being about _sets_. Adding a mention in the documentation is not enough. So: 1) I suggest to rename "nWayUnion" as "nWayMerge", to avoid any confusion in D programmers like me. In the Merge Sort that algorithm is named "Merge", so calling it "merge" has a very long tradition in computer science (http://en.wikipedia.org/wiki/Merge_algorithm ). 2) In this page: http://dlang.org/phobos/std_algorithm.html The "Set operations" column is misnamed because those are NOT sets. In Computer Science and in Mathematics those are named bags or multisets (http://en.wikipedia.org/wiki/Multiset ). But those functions work with a comparison operator and work on ordered sequences, so I suggest to call that column "Multiset operations" or "Ordered bags operations". 3) setIntersection is equally badly named. And it should be renamed "bagsIntersection", or something similar. Also the examples given in the site for setIntersection show only true sets, so this contribute to the confusion, it's even more easy to miss that it is actually a bags intersection: int[] a = [ 1, 2, 4, 5, 7, 9 ]; int[] b = [ 0, 1, 2, 4, 7, 8 ]; int[] c = [ 0, 1, 4, 5, 7, 8 ]; assert(equal(setIntersection(a, a), a)); assert(equal(setIntersection(a, b), [1, 2, 4, 7][])); assert(equal(setIntersection(a, b, c), [1, 4, 7][])); Those examples need to show clearly some repetitions, like this: int[] a = [1, 1, 2, 4, 5, 5, 5, 5, 7, 9]; int[] b = [0, 1, 2, 4, 7, 7, 8]; int[] c = [0, 1, 1, 1, 1, 4, 4, 5, 7, 8]; Extra) Is it worth to add a true nWayUnion (in set theory sense) to std.algorithm? It's a useful operation. A nWayUnion is possibly computed as: seqs.nWayMerge.uniq.array So maybe it's not worth it, I don't know. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
 Feb 27 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6718
bearophile_hugs eml.cc changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |major
Raised importance to Major, because this naming cancer in std.algorithm is
worse than I expected in 2011.
-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
 Feb 27 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6718
Andrei Alexandrescu <andrei erdani.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|major                       |minor
PST ---
No need to raise importance out of proportion. Thanks.
-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
 Feb 27 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6718No need to raise importance out of proportion. Thanks.OK, sorry. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
 Feb 27 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6718
Denis Shelomovskij <verylonglogin.reg gmail.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |verylonglogin.reg gmail.com
           Severity|minor                       |normal
18:00:41 MSD ---
Bearophile, what about to start NG thread about it? Bad names are always causes
of user code bugs so this issue should be thoroughly considered.
Also restored "normal" importance as terminology isn't a minor issue.
-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
 Sep 30 2013








 
  
  
 
 d-bugmail puremagic.com
 d-bugmail puremagic.com 