www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.announce - cashew.utils.array 0.1a.2

reply Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
I've been pushing for some array utilities to get into Phobos, yes, but in the
meantime I 
updated the array module in my personal Cashew lib.  Included in the updates
are the 
rotl/rotr that I recall someone asking about.  In the process I've discovered
two bugs, as 
well: the behavior of 'is' versus '==' is incompatable if the operands are
arrays, and 
DDoc for abbreviated function templates is borked.  For an example of the
latter, just 
look at Cashew's own docs.

That said: if people think this is a decent collection, and Walter would take
it, I would 
be willing to release it to public domain for Phobos' sake.

The array module is attached, and the docs are at:
http://www.codemeu.com:81/~pontiff/projects/cashew/doc/array.html

-- Christopher Nicholson-Sauls
Sep 12 2006
next sibling parent Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
And just a few minutes later, I issue another release...  I'd neglected a
little something 
from rotl() and rotr(), namely I hadn't accounted for iterations greater than
the array's 
length.  Heh.  Fixed now.  New version attached.

-- Chris Nicholson-Sauls
Sep 12 2006
prev sibling next sibling parent reply Lionello Lunesu <lio lunesu.remove.com> writes:
Chris Nicholson-Sauls wrote:
 I've been pushing for some array utilities to get into Phobos, yes, but 
 in the meantime I updated the array module in my personal Cashew lib.  
 Included in the updates are the rotl/rotr that I recall someone asking 
 about.  In the process I've discovered two bugs, as well: the behavior 
 of 'is' versus '==' is incompatable if the operands are arrays, and DDoc 
 for abbreviated function templates is borked.  For an example of the 
 latter, just look at Cashew's own docs.
Nice.. I hope something like this will make it into Phobos at some point, since everybody seems to have implemented these functions anyway.. One remark though: your removeAll is pretty slow. It should be enough to iterate the array only once. A indexOf that takes a starting_index would be a simple fix. L.
Sep 13 2006
parent reply Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
Lionello Lunesu wrote:
 Chris Nicholson-Sauls wrote:
 
 I've been pushing for some array utilities to get into Phobos, yes, 
 but in the meantime I updated the array module in my personal Cashew 
 lib.  Included in the updates are the rotl/rotr that I recall someone 
 asking about.  In the process I've discovered two bugs, as well: the 
 behavior of 'is' versus '==' is incompatable if the operands are 
 arrays, and DDoc for abbreviated function templates is borked.  For an 
 example of the latter, just look at Cashew's own docs.
Nice.. I hope something like this will make it into Phobos at some point, since everybody seems to have implemented these functions anyway.. One remark though: your removeAll is pretty slow. It should be enough to iterate the array only once. A indexOf that takes a starting_index would be a simple fix. L.
True... and fixed. :) Functions indexOf, rindexOf, indexOfSub, and rindexOfSub all take an optional 'start' parameter now. This effected the design of functions removeAll and unique. I did, however, have to do something weird with removeAll. It just didn't want to work right any other way... must play with it some more later. -- Chris Nicholson-Sauls
Sep 13 2006
next sibling parent reply clayasaurus <clayasaurus gmail.com> writes:
Chris Nicholson-Sauls wrote:
 Lionello Lunesu wrote:
 Chris Nicholson-Sauls wrote:

 I've been pushing for some array utilities to get into Phobos, yes, 
 but in the meantime I updated the array module in my personal Cashew 
 lib.  Included in the updates are the rotl/rotr that I recall someone 
 asking about.  In the process I've discovered two bugs, as well: the 
 behavior of 'is' versus '==' is incompatable if the operands are 
 arrays, and DDoc for abbreviated function templates is borked.  For 
 an example of the latter, just look at Cashew's own docs.
Nice.. I hope something like this will make it into Phobos at some point, since everybody seems to have implemented these functions anyway.. One remark though: your removeAll is pretty slow. It should be enough to iterate the array only once. A indexOf that takes a starting_index would be a simple fix. L.
True... and fixed. :) Functions indexOf, rindexOf, indexOfSub, and rindexOfSub all take an optional 'start' parameter now. This effected the design of functions removeAll and unique. I did, however, have to do something weird with removeAll. It just didn't want to work right any other way... must play with it some more later. -- Chris Nicholson-Sauls ------------------------------------------------------------------------ /***************************************************************************************** * Utility template functions for arrays. Example: * * $(CASHEW_HEAD) *---------------------------------------------------------------------------------------- * import cashew .utils .array ; * // ... * int[] foo = ...; * foo.remove(4); * foo.eat(foo.indexOf(2)); *---------------------------------------------------------------------------------------- */ module cashew.utils.array ; /*********************************************************************************** * Unit tests support. */ version (Unittest) { import std .stdio ; } unittest { writefln("Unittest: cashew.utils.array: begin"); } /*********************************************************************************** * Create an array. */ T[] array (T) (T[] arr ...) body { return arr.dup; } unittest { writef("\t array(...) --------> "); auto foo = array!(int)(1, 2, 3); assert(foo.length == 3U); assert(typeid(typeof(foo[0])) == typeid(typeof(1))); writefln("Pass"); } /*********************************************************************************** * Verify that a given value is present in an array. */ bool contains (T) (inout T[] haystack, T needle) body { return haystack.indexOf(needle) != size_t.max; } unittest { writef("\t.contains(T) -------> "); auto foo = array!(int)(1, 2, 3); assert( foo.contains(2)); assert(! foo.contains(4)); writefln("Pass"); } /*********************************************************************************** * Compose a new array of the values in the first parameter not present in the second. */ T[] diff (T) (inout T[] alpha, inout T[] beta) body { T[] result = alpha.dup; foreach (x; alpha) { if (beta.contains(x)) { while (result.contains(x)) { result.remove(x); } } } return result; } unittest { writef("\t.diff(T[]) ---------> "); auto foo = array!(int)(1, 2, 3, 4, 5); auto bar = array!(int)(1, 3, 5); assert(foo.diff(bar) == array!(int)(2, 4)); writefln("Pass"); } /*********************************************************************************** * Remove some of the beginning of an array. */ void eat (T) (inout T[] haystack, size_t count) body { if (count > haystack.length) { haystack.length = 0; } else { haystack = haystack[count .. haystack.length]; } } unittest { writef("\t.eat(N) ------------> "); auto foo = array!(int)(1, 2, 3, 4, 5); foo.eat(3U); assert(foo == array!(int)(4, 5)); writefln("Pass"); } /*********************************************************************************** * Search an array for a given item, and return its index, or size_t.max if not found. */ size_t indexOf (T) (inout T[] haystack, T needle, size_t start = 0U) body { size_t result = size_t.max ; foreach (index, item; haystack[start .. haystack.length]) { if (item is needle) { result = index; break; } } if (result != size_t.max) { result += start; } return result; } unittest { writef("\t.indexOf(T) --------> "); auto foo = array!(int)(1, 2, 3); assert(foo.indexOf(2) == 1U); writefln("Pass"); } /*********************************************************************************** * Search an array for a given sub-array, and return its index, or size_t.max if not found. */ size_t indexOfSub (T) (inout T[] haystack, inout T[] bale, size_t start = 0U) body { size_t result = size_t.max , wall = haystack.length - bale.length ; for (size_t i = start; i < wall; i++) { if (haystack[i .. i + bale.length] == bale) { result = i; break; } } return result; } unittest { writef("\t.indexOfSub(T[]) ---> "); auto foo = array!(int)(1, 2, 3, 4, 5); auto sub = array!(int)( 3, 4 ); assert(foo.indexOfSub(sub) == 2U); writefln("Pass"); } /*********************************************************************************** * Same as indexOf but work backwards from the end. */ size_t rindexOf (T) (inout T[] haystack, T needle, size_t start = size_t.max) body { size_t result = size_t.max ; if (start == size_t.max) { start = haystack.length - 1U; } for (size_t i = start; i >= 0U; i--) { if (haystack[i] is needle) { result = i; break; } } return result; } unittest { writef("\t.rindexOf(T) -------> "); auto foo = array!(int)(1, 9, 2, 9, 3); assert(foo.rindexOf(9) == 3U); writefln("Pass"); } /*********************************************************************************** * Same as indexOf but work backwards from the end. */ size_t rindexOfSub (T) (inout T[] haystack, inout T[] bale, size_t start = size_t.max) body { size_t result = size_t.max ; if (start == size_t.max) { start = haystack.length; } for (size_t i = start - bale.length; i >= 0; i--) { if (haystack[i .. i + bale.length] == bale) { result = i; break; } } return result; } unittest { writef("\t.rindexOfSub(T[]) --> "); auto foo = array!(int)(1, 2, 9, 8, 1, 2, 9, 8); auto sub = array!(int)(1, 2 ); assert(foo.rindexOfSub(sub) == 4U); writefln("Pass"); } /*********************************************************************************** * Return an array composed of common elements of two arrays. */ T[] intersect (T) (inout T[] alpha, inout T[] beta) body { T[] result; foreach (x; alpha) { if (beta.contains(x)) { result ~= x; } } return result; } unittest { writef("\t.intersect(T[]) ----> "); auto foo = array!(int)(1, 2, 3, 4, 5 ); auto bar = array!(int)( 3, 4, 5, 6, 7); assert(foo.intersect(bar) == array!(int)(3, 4, 5)); writefln("Pass"); } /*********************************************************************************** * Remove an item from an array. */ void remove (T) (inout T[] haystack, T needle) body { size_t index = haystack.indexOf(needle) ; if (index != size_t.max) { haystack.removeIndex(index); } } unittest { writef("\t.remove(T) ---------> "); auto foo = array!(int)(1, 2, 3); foo.remove(2); assert(foo == array!(int)(1, 3)); writefln("Pass"); } /*********************************************************************************** * Remove all occurances of an item from an array. */ void removeAll (T) (inout T[] haystack, T needle) body { size_t index = 0U ; size_t[] indices ; while ((index = haystack.indexOf(needle, index)) != size_t.max) { indices ~= index; index++; } foreach (i, x; indices) { haystack.removeIndex(x - i); } } unittest { writef("\t.removeAll(T) ------> "); auto foo = array!(int)(1, 2, 1, 3, 1, 4, 1, 5); foo.removeAll(1); assert(foo == array!(int)(2, 3, 4, 5)); writefln("Pass"); } /*********************************************************************************** * Remove the item at a given index. */ void removeIndex (T) (inout T[] haystack, size_t index) body { haystack = haystack[0 .. index] ~ haystack[index + 1 .. haystack.length]; } unittest { writef("\t.removeIndex(N) ----> "); auto foo = array!(int)(1, 2, 3, 4); foo.removeIndex(2U); assert(foo == array!(int)(1, 2, 4)); writefln("Pass"); } /*********************************************************************************** * Build an array by repeating an item. */ void repeat (T) (inout T[] haystack, T needle, size_t len = size_t.max) body { if (len == size_t.max) { len = haystack.length; } haystack.length = len; haystack[] = needle; } unittest { writef("\t.repeat(T [, N]) ---> "); int[] foo ; foo.repeat(3, 3U); assert(foo == array!(int)(3, 3, 3)); auto bar = array!(int)(0, 0, 0, 0, 0, 0, 0); bar.repeat(7); assert(bar == array!(int)(7, 7, 7, 7, 7, 7, 7)); writefln("Pass"); } /*********************************************************************************** * Build an array by repeating a smaller array. */ void repeatSub (T) (inout T[] haystack, inout T[] bale, size_t count) body { haystack.length = 0; for (int i; i < count; i++) { haystack ~= bale; } } unittest { writef("\t.repeatSub(T[], N) -> "); int[] foo , sub = array!(int)(4, 2) ; foo.repeatSub(sub, 3U); assert(foo == array!(int)(4, 2, 4, 2, 4, 2)); writefln("Pass"); } /*********************************************************************************** * Fill an array with a given smaller array. */ void fill (T) (inout T[] haystack, inout T[] bale, size_t len = size_t.max) body { if (len == size_t.max) { len = haystack.length; } haystack.length = 0; while (haystack.length < len) { haystack ~= bale; } haystack.length = len; } unittest { writef("\t.fill (T[] [, N]) --> "); auto foo = array!(int)(1, 2, 3, 4, 5); auto sub = array!(int)(3, 2, 1 ); foo.fill(sub); assert(foo == array!(int)(3, 2, 1, 3, 2)); writefln("Pass"); } /*********************************************************************************** * Remove duplicate values from an array. */ void unique (T) (inout T[] haystack) body { size_t ridx ; for (int i = 0; i < haystack.length; i++) { ridx = size_t.max; while ((ridx = haystack.rindexOf(haystack[i], ridx)) != i) { haystack.removeIndex(ridx); } } } unittest { writef("\t.unique() ----------> "); auto foo = array!(int)(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5); foo.unique(); assert(foo == array!(int)(1, 2, 3, 4, 5)); writefln("Pass"); } /*********************************************************************************** * Pull a number of items from one array into another. */ T[] pull (T) (inout T[] haystack, size_t idx) body { T[] result ; if (idx >= haystack.length) { idx = haystack.length; } result = haystack[0 .. idx ]; haystack = haystack[idx .. haystack.length]; return result; } unittest { writef("\t.pull(N) -----------> "); auto foo = array!(int)(1, 2, 3, 4, 5); auto bar = foo.pull(3U); assert(foo == array!(int)( 4, 5)); assert(bar == array!(int)(1, 2, 3 )); writefln("Pass"); } /*********************************************************************************** * Same as pull, but indexed from the end. */ T[] rpull (T) (inout T[] haystack, size_t idx) body { T[] result ; if (idx >= haystack.length) { result = haystack; haystack.length = 0; } else { idx--; result = haystack[idx .. haystack.length]; haystack = haystack[0 .. idx ]; } return result; } unittest { writef("\t.rpull(N) ----------> "); auto foo = array!(int)(1, 2, 3, 4, 5); auto bar = foo.rpull(3U); assert(foo == array!(int)(1, 2 )); assert(bar == array!(int)( 3, 4, 5)); auto alpha = array!(int)(1, 2, 3); auto beta = alpha.rpull(4U); assert(alpha == array!(int)( )); assert(beta == array!(int)(1, 2, 3)); writefln("Pass"); } /*********************************************************************************** * Rotate an array's contents toward the left. */ void rotl (T) (inout T[] haystack, size_t iter) body { iter %= haystack.length; haystack = haystack[iter .. haystack.length] ~ haystack[0 .. iter]; } unittest { writef("\t.rotl(N) -----------> "); auto foo = array!(int)(1, 2, 3, 4, 5, 6); foo.rotl(2U); assert(foo == array!(int)(3, 4, 5, 6 ,1, 2)); auto bar = array!(int)(1, 2, 3); bar.rotl(4U); assert(bar == array!(int)(2, 3, 1)); writefln("Pass"); } /*********************************************************************************** * Rotate an array's contents toward the right. */ void rotr (T) (inout T[] haystack, size_t iter) body { size_t idx ; iter %= haystack.length ; idx = haystack.length - iter ; haystack = haystack[idx .. haystack.length] ~ haystack[0 .. idx]; } unittest { writef("\t.rotr(N) -----------> "); auto foo = array!(int)(1, 2, 3, 4, 5, 6); foo.rotr(2U); assert(foo == array!(int)(5, 6, 1, 2, 3, 4)); auto bar = array!(int)(1, 2, 3); bar.rotr(4U); assert(bar == array!(int)(3, 1, 2)); writefln("Pass"); } /*********************************************************************************** * Append to an array. */ void push (T) (inout T[] haystack, T[] bale ...) body { haystack ~= bale; } unittest { writef("\t.push(...) ---------> "); auto foo = array!(int)(1, 2, 3); foo.push(4, 5); assert(foo == array!(int)(1, 2, 3, 4, 5)); writefln("Pass"); } /*********************************************************************************** * Retrieve and remove the last value of an array. */ T pop (T) (inout T[] haystack) body { T result = haystack[haystack.length - 1]; haystack.length = haystack.length - 1; return result; } unittest { writef("\t.pop() -------------> "); auto foo = array!(int)(1, 2, 3); auto elm = foo.pop(); assert(foo == array!(int)(1, 2)); assert(elm == 3); writefln("Pass"); } /*********************************************************************************** * Prepend to an array. */ void backpush (T) (inout T[] haystack, T[] bale ...) body { haystack ~= bale; haystack.rotr(bale.length); } unittest { writef("\t.backpush(...) -----> "); auto foo = array!(int)(3, 4); foo.backpush(1, 2); assert(foo == array!(int)(1, 2, 3, 4)); writefln("Pass"); } /*********************************************************************************** * Retrieve and remove the first value of an array. */ T backpop (T) (inout T[] haystack) body { T result = haystack[0]; haystack = haystack[1 .. haystack.length]; return result; } unittest { writef("\t.backpop() ---------> "); auto foo = array!(int)(1, 2, 3); auto elm = foo.backpop(); assert(foo == array!(int)(2, 3)); assert(elm == 1); writefln("Pass"); } /*********************************************************************************** * Drop values from the beginning of an array. */ T[] shift (T) (inout T[] haystack, size_t count) body { T[] result ; if (count >= haystack.length) { result = haystack; haystack.length = 0; } else { result = haystack[0 .. count]; haystack = haystack[count .. haystack.length]; } return result; } unittest { writef("\t.shift(N) ----------> "); auto foo = array!(int)(1, 2, 3, 4, 5); auto sub = foo.shift(3U); assert(foo == array!(int)( 4, 5)); assert(sub == array!(int)(1, 2, 3 )); writefln("Pass"); } /*********************************************************************************** * Drop values from the end of an array. */ T[] rshift (T) (inout T[] haystack, size_t count) body { T[] result ; if (count >= haystack.length) { result = haystack; haystack.length = 0; } else { result = haystack[haystack.length - count .. haystack.length]; haystack.length = haystack.length - count; } return result; } unittest { writef("\t.rshift(N) ---------> "); auto foo = array!(int)(1, 2, 3, 4, 5); auto sub = foo.rshift(3U); assert(foo == array!(int)(1, 2 )); assert(sub == array!(int)( 3, 4, 5)); writefln("Pass"); } /*********************************************************************************** * Unit tests support. */ unittest { writefln("Unittest: cashew.utils.array: end\n"); }
Phobos does need something like this.
Sep 13 2006
parent J Duncan <jtd514 nospam.ameritech.net> writes:
clayasaurus wrote:
 Phobos does need something like this.
very badly, everyone rolls their own so it would be nice to have stuff like that consolidated in the community.... I also think this would give a more 'complete' feeling to the built-in arrays
Sep 13 2006
prev sibling parent reply Mikola Lysenko <mclysenk mtu.edu> writes:
Chris Nicholson-Sauls wrote:
 /***********************************************************************************
  * Return an array composed of common elements of two arrays.
  */
 T[] intersect (T) (inout T[] alpha, inout T[] beta)
 body {
   T[] result;
 
   foreach (x; alpha) {
     if (beta.contains(x)) {
       result ~= x;
     }
   }
   return result;
 }
 unittest {
   writef("\t.intersect(T[]) ----> ");
   auto foo = array!(int)(1, 2, 3, 4, 5      );
   auto bar = array!(int)(      3, 4, 5, 6, 7);
   assert(foo.intersect(bar) == array!(int)(3, 4, 5));
   writefln("Pass");
 }
 
 /***********************************************************************************
  * Remove duplicate values from an array.
  */
 void unique (T) (inout T[] haystack)
 body {
   size_t ridx ;
 
   for (int i = 0; i < haystack.length; i++) {
     ridx = size_t.max;
     while ((ridx = haystack.rindexOf(haystack[i], ridx)) != i) {
       haystack.removeIndex(ridx);
     }
   }
 }
 unittest {
   writef("\t.unique() ----------> ");
   auto foo = array!(int)(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5);
   foo.unique();
   assert(foo == array!(int)(1, 2, 3, 4, 5));
   writefln("Pass");
 }
 
These algorithms can be implemented more efficiently. The running time for the intersect is O(n * m) (where n,m are the length of the two arrays). However, what about the case where alpha and beta are ordered? Here is a simple algorithm for getting the intersect: T[] intersect (T) (inout T[] alpha, inout T[] beta) body { /* Assume alpha, beta are ordered such that alpha[i] < alpha[i+1] * and beta[i] < beta[i+1] for all i. */ T[] result; size_t a_pos = 0; size_t b_pos = 0; while(a_pos < alpha.length && b_pos < beta.length) { if(alpha[a_pos] == beta[b_pos]) { result ~= alpha[a_pos]; ++a_pos; ++b_pos; } else if(alpha[a_pos] < beta[b_pos]) ++a_pos; else if(beta[b_pos] < alpha[a_pos]) ++b_pos; } return result; } The cost for this algorithm is only O(n + m). However, ordering the two arrays is effectively an O(nlogn + mlogm) operation. Therefore the running time for this modified algorithm is overall O(nlogn + mlogm). A similar argument can be made for the unique function.
Sep 14 2006
parent reply Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
Mikola Lysenko wrote:
 Chris Nicholson-Sauls wrote:
 
 /***************************************************************
******************* 

  * Return an array composed of common elements of two arrays.
  */
 T[] intersect (T) (inout T[] alpha, inout T[] beta)
 body {
   T[] result;

   foreach (x; alpha) {
     if (beta.contains(x)) {
       result ~= x;
     }
   }
   return result;
 }
 unittest {
   writef("\t.intersect(T[]) ----> ");
   auto foo = array!(int)(1, 2, 3, 4, 5      );
   auto bar = array!(int)(      3, 4, 5, 6, 7);
   assert(foo.intersect(bar) == array!(int)(3, 4, 5));
   writefln("Pass");
 }

 /***************************************************************
******************* 

  * Remove duplicate values from an array.
  */
 void unique (T) (inout T[] haystack)
 body {
   size_t ridx ;

   for (int i = 0; i < haystack.length; i++) {
     ridx = size_t.max;
     while ((ridx = haystack.rindexOf(haystack[i], ridx)) != i) {
       haystack.removeIndex(ridx);
     }
   }
 }
 unittest {
   writef("\t.unique() ----------> ");
   auto foo = array!(int)(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5);
   foo.unique();
   assert(foo == array!(int)(1, 2, 3, 4, 5));
   writefln("Pass");
 }
These algorithms can be implemented more efficiently. The running time for the intersect is O(n * m) (where n,m are the length of the two arrays). However, what about the case where alpha and beta are ordered? Here is a simple algorithm for getting the intersect: T[] intersect (T) (inout T[] alpha, inout T[] beta) body { /* Assume alpha, beta are ordered such that alpha[i] < alpha[i+1] * and beta[i] < beta[i+1] for all i. */ T[] result; size_t a_pos = 0; size_t b_pos = 0; while(a_pos < alpha.length && b_pos < beta.length) { if(alpha[a_pos] == beta[b_pos]) { result ~= alpha[a_pos]; ++a_pos; ++b_pos; } else if(alpha[a_pos] < beta[b_pos]) ++a_pos; else if(beta[b_pos] < alpha[a_pos]) ++b_pos; } return result; } The cost for this algorithm is only O(n + m). However, ordering the two arrays is effectively an O(nlogn + mlogm) operation. Therefore the running time for this modified algorithm is overall O(nlogn + mlogm). A similar argument can be made for the unique function.
True, and not a bad idea. I'm not sure how it would be with arrays of classes/structs which have no defined order. I'll try a rewrite and post it later. :) Maybe I should just ask for a DSource entry for Cashew. -- Chris Nicholson-Sauls
Sep 14 2006
parent Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
Chris Nicholson-Sauls wrote:
 Mikola Lysenko wrote:
 
 Chris Nicholson-Sauls wrote:

 /***************************************************************
******************* 

  * Return an array composed of common elements of two arrays.
  */
 T[] intersect (T) (inout T[] alpha, inout T[] beta)
 body {
   T[] result;

   foreach (x; alpha) {
     if (beta.contains(x)) {
       result ~= x;
     }
   }
   return result;
 }
 unittest {
   writef("\t.intersect(T[]) ----> ");
   auto foo = array!(int)(1, 2, 3, 4, 5      );
   auto bar = array!(int)(      3, 4, 5, 6, 7);
   assert(foo.intersect(bar) == array!(int)(3, 4, 5));
   writefln("Pass");
 }

 /***************************************************************
******************* 

  * Remove duplicate values from an array.
  */
 void unique (T) (inout T[] haystack)
 body {
   size_t ridx ;

   for (int i = 0; i < haystack.length; i++) {
     ridx = size_t.max;
     while ((ridx = haystack.rindexOf(haystack[i], ridx)) != i) {
       haystack.removeIndex(ridx);
     }
   }
 }
 unittest {
   writef("\t.unique() ----------> ");
   auto foo = array!(int)(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5);
   foo.unique();
   assert(foo == array!(int)(1, 2, 3, 4, 5));
   writefln("Pass");
 }
These algorithms can be implemented more efficiently. The running time for the intersect is O(n * m) (where n,m are the length of the two arrays). However, what about the case where alpha and beta are ordered? Here is a simple algorithm for getting the intersect: T[] intersect (T) (inout T[] alpha, inout T[] beta) body { /* Assume alpha, beta are ordered such that alpha[i] < alpha[i+1] * and beta[i] < beta[i+1] for all i. */ T[] result; size_t a_pos = 0; size_t b_pos = 0; while(a_pos < alpha.length && b_pos < beta.length) { if(alpha[a_pos] == beta[b_pos]) { result ~= alpha[a_pos]; ++a_pos; ++b_pos; } else if(alpha[a_pos] < beta[b_pos]) ++a_pos; else if(beta[b_pos] < alpha[a_pos]) ++b_pos; } return result; } The cost for this algorithm is only O(n + m). However, ordering the two arrays is effectively an O(nlogn + mlogm) operation. Therefore the running time for this modified algorithm is overall O(nlogn + mlogm). A similar argument can be made for the unique function.
True, and not a bad idea. I'm not sure how it would be with arrays of classes/structs which have no defined order. I'll try a rewrite and post it later. :) Maybe I should just ask for a DSource entry for Cashew. -- Chris Nicholson-Sauls
Well I just tried it and... woo, intersect is notably faster indeed. I did a benchmark with arrays of 1000 elements (random values), reset and intersected 1000 times, and the numbers are striking: <Benchmark old> 19.330000 <Benchmark new> 3.080000 I'll rewrite unique later, for now here's an attachment of the module with the rewritten intersect, which is actually almost identical to yours, so I'll be sure and include credit for it. :) -- Chris Nicholson-Sauls
Sep 14 2006
prev sibling next sibling parent "Nikita Kalaganov" <sapfirnik rambler.ru> writes:
On Wed, 13 Sep 2006 07:56:36 +0400, Chris Nicholson-Sauls  
<ibisbasenji gmail.com> wrote:

 I've been pushing for some array utilities to get into Phobos, yes, but  
 in the meantime I
 updated the array module in my personal Cashew lib.  Included in the  
 updates are the
 rotl/rotr that I recall someone asking about.
Ahh, thanks!
Sep 13 2006
prev sibling parent reply Reiner Pope <reiner.pope REMOVE.THIS.gmail.com> writes:
Chris Nicholson-Sauls wrote:
 I've been pushing for some array utilities to get into Phobos, yes, but 
 in the meantime I updated the array module in my personal Cashew lib.  
 Included in the updates are the rotl/rotr that I recall someone asking 
 about.  In the process I've discovered two bugs, as well: the behavior 
 of 'is' versus '==' is incompatable if the operands are arrays, and DDoc 
 for abbreviated function templates is borked.  For an example of the 
 latter, just look at Cashew's own docs.
 
 That said: if people think this is a decent collection, and Walter would 
 take it, I would be willing to release it to public domain for Phobos' 
 sake.
 
 The array module is attached, and the docs are at:
 http://www.codemeu.com:81/~pontiff/projects/cashew/doc/array.html
 
 -- Christopher Nicholson-Sauls
 
It looks good. Thanks for doing this. Some comments/questions: Why does indexOfSub have bale as 'inout', not simply 'in'? Same question for fill Couldn't indexOfSub be named indexOf? Same question for repeatSub and repeat. Also, thinking about its possible use as the starting point for a standard library, I've got some other thoughts for functions which are implemented in other array processing libraries (std.string, Oskar Linde's templated array functions, mintl.array...). Here is what I was thinking could be useful: splitting/joining: split join string processing: stripl/stripr/strip/chomp string matching/replacing: replace insert count squeeze // from std.string startsWith endsWith searching: findmax findmin find(delegate matches) findAll(delegate matches) I've written some code: T[][] split (T) (T[] arr, T[] delim ...) in { assert (delim.length > 0, "Splitting with no delimiters is useless"); } body { T[][] results; T[] token; foreach (i, x; arr) { if (delim.contains(x)) { if (token.length > 0) { results ~= token; token.length = 0; } } else { token ~= x; } } if (token.length > 0) results ~= token; return results; } unittest { writef("\t.split(...) --------> "); /+ auto foo = "a b cde fg"; auto result = foo.split(' '); assert(result == array!(char[])("a", "b", "cde", "fg")); +/ auto bar = "cashew casehew"; auto result2 = bar.split('s', 'h'); assert(result2 == array!(char[])("ca", "ew ca", "e", "ew")); writefln("Pass"); } Also, couldn't you avoid the temporary in removeAll by doing this: void removeAll (T) (inout T[] haystack, T needle) { size_t index = 0U; size_t indices; while ((index = haystack.indexOf(needle, index)) != size_t.max) { haystack.removeIndex(index); } } If you're interested, I'll write some more code. Cheers, Reiner
Sep 13 2006
next sibling parent Reiner Pope <reiner.pope REMOVE.THIS.gmail.com> writes:
Oh, and I forgot: a reverse iterator, like mintl.array has.
Sep 13 2006
prev sibling parent reply Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
Reiner Pope wrote:
 Chris Nicholson-Sauls wrote:
 
 I've been pushing for some array utilities to get into Phobos, yes, 
 but in the meantime I updated the array module in my personal Cashew 
 lib.  Included in the updates are the rotl/rotr that I recall someone 
 asking about.  In the process I've discovered two bugs, as well: the 
 behavior of 'is' versus '==' is incompatable if the operands are 
 arrays, and DDoc for abbreviated function templates is borked.  For an 
 example of the latter, just look at Cashew's own docs.

 That said: if people think this is a decent collection, and Walter 
 would take it, I would be willing to release it to public domain for 
 Phobos' sake.

 The array module is attached, and the docs are at:
 http://www.codemeu.com:81/~pontiff/projects/cashew/doc/array.html

 -- Christopher Nicholson-Sauls
It looks good. Thanks for doing this. Some comments/questions: Why does indexOfSub have bale as 'inout', not simply 'in'? Same question for fill
Because I'm mildly paranoid and wanted to avoid memory allocation where I could. :) I do suppose it could safely be 'in' though.
 Couldn't indexOfSub be named indexOf?
 Same question for repeatSub and repeat.
That's a D limitation. At the moment, function templates cannot be overloaded. *snort stomp* Originally I /did/ have them named the same, but alas.
 Also, thinking about its possible use as the starting point for a 
 standard library, I've got some other thoughts for functions which are 
 implemented in other array processing libraries (std.string, Oskar 
 Linde's templated array functions, mintl.array...). Here is what I was 
 thinking could be useful:
 
 splitting/joining:
   split
   join
 
 string processing:
   stripl/stripr/strip/chomp
 
 string matching/replacing:
   replace
   insert
   count
   squeeze // from std.string
   startsWith
   endsWith
 
 searching:
   findmax
   findmin
   find(delegate matches)
   findAll(delegate matches)
 
 I've written some code:
 T[][] split (T) (T[] arr, T[] delim ...)
 in { assert (delim.length > 0, "Splitting with no delimiters is 
 useless"); }
 body
 {
   T[][] results;
   T[] token;
 
   foreach (i, x; arr)
   {
     if (delim.contains(x))
     {
       if (token.length > 0)
       {
          results ~= token;
          token.length = 0;
       }
     }
     else
     {
       token ~= x;
     }
   }
   if (token.length > 0) results ~= token;
   return results;
 }
 
 unittest {
   writef("\t.split(...) --------> ");
 /+  auto foo = "a b cde   fg";
   auto result = foo.split(' ');
   assert(result == array!(char[])("a", "b", "cde", "fg")); +/
 
   auto bar = "cashew casehew";
   auto result2 = bar.split('s', 'h');
   assert(result2 == array!(char[])("ca", "ew ca", "e", "ew"));
   writefln("Pass");
 }
 
 Also, couldn't you avoid the temporary in removeAll by doing this:
 void removeAll (T) (inout T[] haystack, T needle)
 {
   size_t index = 0U;
   size_t indices;
 
   while ((index = haystack.indexOf(needle, index)) != size_t.max) {
     haystack.removeIndex(index);
   }
 }
That's exactly how it was written (once I'd modified .indexOf) but it doesn't act quite right for some reason. Until I figure out what the bug is, I figured I could slap together a less pretty but working version.
 If you're interested, I'll write some more code.
 
 Cheers,
 
 Reiner
-- Chris Nicholson-Sauls
Sep 14 2006
parent reply Reiner Pope <reiner.pope REMOVE.THIS.gmail.com> writes:
Chris Nicholson-Sauls wrote:
 
 
 Reiner Pope wrote:
 Chris Nicholson-Sauls wrote:

 I've been pushing for some array utilities to get into Phobos, yes, 
 but in the meantime I updated the array module in my personal Cashew 
 lib.  Included in the updates are the rotl/rotr that I recall someone 
 asking about.  In the process I've discovered two bugs, as well: the 
 behavior of 'is' versus '==' is incompatable if the operands are 
 arrays, and DDoc for abbreviated function templates is borked.  For 
 an example of the latter, just look at Cashew's own docs.

 That said: if people think this is a decent collection, and Walter 
 would take it, I would be willing to release it to public domain for 
 Phobos' sake.

 The array module is attached, and the docs are at:
 http://www.codemeu.com:81/~pontiff/projects/cashew/doc/array.html

 -- Christopher Nicholson-Sauls
It looks good. Thanks for doing this. Some comments/questions: Why does indexOfSub have bale as 'inout', not simply 'in'? Same question for fill
Because I'm mildly paranoid and wanted to avoid memory allocation where I could. :) I do suppose it could safely be 'in' though.
Passing by 'inout' is probably less efficient and certainly doesn't save any memory allocation. 'inout' is equivalent to passing a pointer to the array reference, which itself contains a pointer to the data. The extra level of indirection can only add computation time.
 Also, couldn't you avoid the temporary in removeAll by doing this:
 void removeAll (T) (inout T[] haystack, T needle)
 {
   size_t index = 0U;
   size_t indices;

   while ((index = haystack.indexOf(needle, index)) != size_t.max) {
     haystack.removeIndex(index);
   }
 }
That's exactly how it was written (once I'd modified .indexOf) but it doesn't act quite right for some reason. Until I figure out what the bug is, I figured I could slap together a less pretty but working version.
This passes the unit tests, though. Cheers, Reiner
Sep 14 2006
parent Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
Reiner Pope wrote:
 Chris Nicholson-Sauls wrote:
 
 Reiner Pope wrote:

 Chris Nicholson-Sauls wrote:

 I've been pushing for some array utilities to get into Phobos, yes, 
 but in the meantime I updated the array module in my personal Cashew 
 lib.  Included in the updates are the rotl/rotr that I recall 
 someone asking about.  In the process I've discovered two bugs, as 
 well: the behavior of 'is' versus '==' is incompatable if the 
 operands are arrays, and DDoc for abbreviated function templates is 
 borked.  For an example of the latter, just look at Cashew's own docs.

 That said: if people think this is a decent collection, and Walter 
 would take it, I would be willing to release it to public domain for 
 Phobos' sake.

 The array module is attached, and the docs are at:
 http://www.codemeu.com:81/~pontiff/projects/cashew/doc/array.html

 -- Christopher Nicholson-Sauls
It looks good. Thanks for doing this. Some comments/questions: Why does indexOfSub have bale as 'inout', not simply 'in'? Same question for fill
Because I'm mildly paranoid and wanted to avoid memory allocation where I could. :) I do suppose it could safely be 'in' though.
Passing by 'inout' is probably less efficient and certainly doesn't save any memory allocation. 'inout' is equivalent to passing a pointer to the array reference, which itself contains a pointer to the data. The extra level of indirection can only add computation time.
Possibly true, so for now I took out some of the 'inout's.
 Also, couldn't you avoid the temporary in removeAll by doing this:
 void removeAll (T) (inout T[] haystack, T needle)
 {
   size_t index = 0U;
   size_t indices;

   while ((index = haystack.indexOf(needle, index)) != size_t.max) {
     haystack.removeIndex(index);
   }
 }
That's exactly how it was written (once I'd modified .indexOf) but it doesn't act quite right for some reason. Until I figure out what the bug is, I figured I could slap together a less pretty but working version.
This passes the unit tests, though. Cheers, Reiner
So it does... although I could swear that it didn't behave right before. Ah well, it works fine now. Also I've written a new version of .unique(), in the process of which I've added a .removeRange() function. A quick note on .removeRange() though: it mimics the slicing logic of D arrays. So, foo.removeRange(1, 4) actually removes [1, 2, 3], not [1, 2, 3, 4]. -- Chris Nicholson-Sauls
Sep 16 2006