www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - std.array suggestion

reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Hello,

With the new IFTI support I have been looking at ways of upgrading the 
standard library with new and more generic functions. What follows is my 
suggestions for functions to be added to std.array. I have implemented 
all of them and the current (0.149) limited IFTI-support is enough to 
support them. That being said, I wish to hold off making a source code 
submission until a) I get review comments on the suggested function 
prototypes and b) it is more clear how far D is taking IFTI support 
(meaning possibly neater implementation).

All functions are designed to be used both as free function and as 
implicit array methods. Except for the inplace versions, no functions 
modifies the array.

The prototype notation is my own. a|b means two alternative types. T is 
the generic element type. T[] is the array.

-----------

T fold(T[] arr, T init, T delegate|function combiner(T,T));

Generic array recursion function. combiner is called recursively:
	return combiner(init,fold(arr[1..$],arr[0],combiner));
(The actual implementation will of course call the combiner iteratively)


T max(T[] arr)

Returns the maximum element in arr as defined by the > operator.


T min(T[] arr)

Returns the minimum element in arr as defined by the < operator.


T sum(T[] arr)

Returns the sum of the element in arr as defined by the + operator.


ptrdiff_t find(T[] arr, T|delegate|function d);

Returns the index of the first occurence of d or the first true 
predicate d applied to the elements in order. Returns -1 when no element 
is found.


size_t indexOf(T[] arr, T|delegate|function d);

Like find, but throws on missing element.


T[][] split(T[] arr, T|T[]|delegate|function d);

Split the array arr using a predicate/element/subarray d.
(obsoletes std.string.split that only works for char[])


T[] join(T[][] arr, T|T[]|delegate|function separator);

Join the elements array arr using separator.
(obsoletes std.string.join that only works for char[])


T[] join(T[][] arr);

Join the array T[][] without separator.


U[] map(T[] arr, U delegate|function f(T));

Map the elements of arr over function f, returning an array of the results.


T[] filter(T[] arr, delegate|function p(T));

Filters arr over the predicate p, returning array of elements of arr 
where the predicate is true.


Possible inplace version of map:

T[] doMap(T[], T delegate|function f(T));

---------

I would also prefer those over the language built in .sort .reverse:

T[] sort(T[]);
T[] stableSort(T[]);
T[] sort(T[], delegate|function wrongOrder(T,T));
T[] reverse(T[]);


With the corresponding inplace versions:

T[] doSort(T[]);
T[] doStableSort(T[]);
T[] doSort(T[], delegate|function wrongOrder(T,T));
T[] doReverse(T[]);

---------

Is there in general even any interest in adding generic functions to the 
standard library?

Regards,

Oskar
Mar 09 2006
next sibling parent reply "Ameer Armaly" <ameer_armaly hotmail.com> writes:
"Oskar Linde" <oskar.lindeREM OVEgmail.com> wrote in message 
news:dup4fr$2c0b$3 digitaldaemon.com...
 Hello,

 With the new IFTI support I have been looking at ways of upgrading the 
 standard library with new and more generic functions. What follows is my 
 suggestions for functions to be added to std.array. I have implemented all 
 of them and the current (0.149) limited IFTI-support is enough to support 
 them. That being said, I wish to hold off making a source code submission 
 until a) I get review comments on the suggested function prototypes and b) 
 it is more clear how far D is taking IFTI support (meaning possibly neater 
 implementation).

 All functions are designed to be used both as free function and as 
 implicit array methods. Except for the inplace versions, no functions 
 modifies the array.

 The prototype notation is my own. a|b means two alternative types. T is 
 the generic element type. T[] is the array.

 -----------

 T fold(T[] arr, T init, T delegate|function combiner(T,T));

 Generic array recursion function. combiner is called recursively:
 return combiner(init,fold(arr[1..$],arr[0],combiner));
 (The actual implementation will of course call the combiner iteratively)


 T max(T[] arr)

 Returns the maximum element in arr as defined by the > operator.


 T min(T[] arr)

 Returns the minimum element in arr as defined by the < operator.


 T sum(T[] arr)

 Returns the sum of the element in arr as defined by the + operator.


 ptrdiff_t find(T[] arr, T|delegate|function d);

 Returns the index of the first occurence of d or the first true predicate 
 d applied to the elements in order. Returns -1 when no element is found.


 size_t indexOf(T[] arr, T|delegate|function d);

 Like find, but throws on missing element.


 T[][] split(T[] arr, T|T[]|delegate|function d);

 Split the array arr using a predicate/element/subarray d.
 (obsoletes std.string.split that only works for char[])


 T[] join(T[][] arr, T|T[]|delegate|function separator);

 Join the elements array arr using separator.
 (obsoletes std.string.join that only works for char[])


 T[] join(T[][] arr);

 Join the array T[][] without separator.


 U[] map(T[] arr, U delegate|function f(T));

 Map the elements of arr over function f, returning an array of the 
 results.


 T[] filter(T[] arr, delegate|function p(T));

 Filters arr over the predicate p, returning array of elements of arr where 
 the predicate is true.


 Possible inplace version of map:

 T[] doMap(T[], T delegate|function f(T));

 ---------

 I would also prefer those over the language built in .sort .reverse:

 T[] sort(T[]);
 T[] stableSort(T[]);
 T[] sort(T[], delegate|function wrongOrder(T,T));
 T[] reverse(T[]);


 With the corresponding inplace versions:

 T[] doSort(T[]);
 T[] doStableSort(T[]);
 T[] doSort(T[], delegate|function wrongOrder(T,T));
 T[] doReverse(T[]);

 ---------

 Is there in general even any interest in adding generic functions to the 
 standard library?

 Regards,

 Oskar
I like it. It would be especially cool if we could get rid of the necessary () after each call when using property syntax, thus making truely plugable properties.
Mar 09 2006
parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Ameer Armaly wrote:
 "Oskar Linde" <oskar.lindeREM OVEgmail.com> wrote in message 
 news:dup4fr$2c0b$3 digitaldaemon.com...
 Hello,

 With the new IFTI support I have been looking at ways of upgrading the 
 standard library with new and more generic functions. What follows is my 
 suggestions for functions to be added to std.array. I have implemented all 
 of them and the current (0.149) limited IFTI-support is enough to support 
 them. That being said, I wish to hold off making a source code submission 
 until a) I get review comments on the suggested function prototypes and b) 
 it is more clear how far D is taking IFTI support (meaning possibly neater 
 implementation).

 All functions are designed to be used both as free function and as 
 implicit array methods. Except for the inplace versions, no functions 
 modifies the array.

 The prototype notation is my own. a|b means two alternative types. T is 
 the generic element type. T[] is the array.

 -----------

 T fold(T[] arr, T init, T delegate|function combiner(T,T));

 Generic array recursion function. combiner is called recursively:
 return combiner(init,fold(arr[1..$],arr[0],combiner));
 (The actual implementation will of course call the combiner iteratively)


 T max(T[] arr)

 Returns the maximum element in arr as defined by the > operator.


 T min(T[] arr)

 Returns the minimum element in arr as defined by the < operator.


 T sum(T[] arr)

 Returns the sum of the element in arr as defined by the + operator.


 ptrdiff_t find(T[] arr, T|delegate|function d);

 Returns the index of the first occurence of d or the first true predicate 
 d applied to the elements in order. Returns -1 when no element is found.


 size_t indexOf(T[] arr, T|delegate|function d);

 Like find, but throws on missing element.


 T[][] split(T[] arr, T|T[]|delegate|function d);

 Split the array arr using a predicate/element/subarray d.
 (obsoletes std.string.split that only works for char[])


 T[] join(T[][] arr, T|T[]|delegate|function separator);

 Join the elements array arr using separator.
 (obsoletes std.string.join that only works for char[])


 T[] join(T[][] arr);

 Join the array T[][] without separator.


 U[] map(T[] arr, U delegate|function f(T));

 Map the elements of arr over function f, returning an array of the 
 results.


 T[] filter(T[] arr, delegate|function p(T));

 Filters arr over the predicate p, returning array of elements of arr where 
 the predicate is true.


 Possible inplace version of map:

 T[] doMap(T[], T delegate|function f(T));

 ---------

 I would also prefer those over the language built in .sort .reverse:

 T[] sort(T[]);
 T[] stableSort(T[]);
 T[] sort(T[], delegate|function wrongOrder(T,T));
 T[] reverse(T[]);


 With the corresponding inplace versions:

 T[] doSort(T[]);
 T[] doStableSort(T[]);
 T[] doSort(T[], delegate|function wrongOrder(T,T));
 T[] doReverse(T[]);

 ---------

 Is there in general even any interest in adding generic functions to the 
 standard library?

 Regards,

 Oskar
Upon rereading, I realized that the inplace versions should be void functions - not returning an array.
 I like it.  It would be especially cool if we could get rid of the necessary 
 () after each call when using property syntax, thus making truely plugable 
 properties. 
Yes, I agree. I would like to know if all pairs of empty parentheses after functions are supposed to be redundant or if calls without parentheses should be reserved to property like methods. Considering the current .sort and .reverse semantics, I guess the former is the case and DMD not allowing calls without parentheses for implicit array methods is an unintentional omission. /Oskar
Mar 09 2006
parent reply Ivan Senji <ivan.senji_REMOVE_ _THIS__gmail.com> writes:
Oskar Linde wrote:

...

Totally agree that something like this (and more) should be in the 
standard library.

 
 Upon rereading, I realized that the inplace versions should be void 
 functions - not returning an array.
Not sure about this one. Returning an array allows chaining: array.doMap(someDelegate).doSort(); ...
 
 I like it.  It would be especially cool if we could get rid of the 
 necessary () after each call when using property syntax, thus making 
 truely plugable properties. 
Yes, I agree. I would like to know if all pairs of empty parentheses after functions are supposed to be redundant or if calls without parentheses should be reserved to property like methods. Considering the current .sort and .reverse semantics, I guess the former is the case and DMD not allowing calls without parentheses for implicit array methods is an unintentional omission. /Oskar
Mar 09 2006
parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Ivan Senji wrote:
 Oskar Linde wrote:
 
 ...
 
 Totally agree that something like this (and more) should be in the 
 standard library.
Please elaborate on (and more).
 Upon rereading, I realized that the inplace versions should be void 
 functions - not returning an array.
Not sure about this one. Returning an array allows chaining: array.doMap(someDelegate).doSort();
Yes, but void removes the chance of using inplace versions by mistake and makes it extra obvious that they are not ordinary functions. I think this is more important than the ability of chaining. /Oskar
Mar 09 2006
parent Ivan Senji <ivan.senji_REMOVE_ _THIS__gmail.com> writes:
Oskar Linde wrote:
 Ivan Senji wrote:
 
 Oskar Linde wrote:

 ...

 Totally agree that something like this (and more) should be in the 
 standard library.
Please elaborate on (and more).
:P I knew you were going to say that. I was just trying to say that the new features open up a lot of possibilites. (not trying to say you didn't to a great job, more that there is more that could be done in a standard library) No ideas at the moment, but when I think of something I'll let you know.
 Upon rereading, I realized that the inplace versions should be void 
 functions - not returning an array.
Not sure about this one. Returning an array allows chaining: array.doMap(someDelegate).doSort();
Yes, but void removes the chance of using inplace versions by mistake and makes it extra obvious that they are not ordinary functions. I think this is more important than the ability of chaining.
What you say does make sense. But I don't think it would be souch a big problem. It is always safe to use ordinary array functions, and it would be nice to be able to just replace them with inplace ones for performance reasons if the original array is not needed any more. So if I have: array.map(someDelegate).sort(); and I figure out that it could be optimized because I don'n need the original array I could just change the functions used. But if the general consensus would be for inplace functions to return void I would have to agree :)
Mar 09 2006
prev sibling next sibling parent reply Don Clugston <dac nospam.com.au> writes:
Oskar Linde wrote:
 Hello,
 
 With the new IFTI support I have been looking at ways of upgrading the 
 standard library with new and more generic functions. 
[snip]
 Is there in general even any interest in adding generic functions to the
 standard library?
Some of these functions are the last thing remaining in std.math2 (but they definitely don't below there, none of them are truly mathematical). We definitely want to remove std.math2 prior to 1.0, std.array sounds good to me. I'll just comment on one function:
 T sum(T[] arr)

 Returns the sum of the element in arr as defined by the + operator.
This one has an interesting twist. I'm not sure that the sum should necessarily be of type T. I've been playing around with the concept of what I've called the Archetype of a type, which is the largest type with the same semantics as T (possibly with the same calculation speed). (ie, Archetype!(byte)= Archetype!(short) Archetype!(int) = long Archetype!(float) = Archetype!(double) = real, Archetype!(cfloat)= creal, etc). Obviously it's a trivial template. I think that at least, sum(double[] ) should internally use a real while accumulating the sum, so that it can satisfy this test (at least on x86 platforms): unittest { const double a = [ double.max, double.max, -double.max]; assert(sum(a) == double.max); } After all, this is one of the reasons why reals exist. I'm still not sure if sum(double []) should return a double or a real, although I'm inclined to think that *any* function that returns a single floating point value should return a real (on x87, the 80-bit result is just left on the FPU stack anyway). But, I'm less confident about how a sum of ints should behave. However, all the other functions seem to be free of mathematical subtleties. sum() is the only one which involves arithmetic operators, and therefore it might not belong with the rest.
Mar 09 2006
next sibling parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Don Clugston wrote:
 Oskar Linde wrote:
 I'll just comment on one function:
 
  > T sum(T[] arr)
  >
  > Returns the sum of the element in arr as defined by the + operator.
 
 This one has an interesting twist. I'm not sure that the sum should 
 necessarily be of type T. I've been playing around with the concept of 
 what I've called the Archetype of a type, which is the largest type with 
 the same semantics as T (possibly with the same calculation speed). (ie,
 Archetype!(byte)= Archetype!(short)
 Archetype!(int) = long
 Archetype!(float) = Archetype!(double) = real,
 Archetype!(cfloat)= creal, etc). Obviously it's a trivial template.
Interesting...
 I think that at least,  sum(double[] ) should internally use a real 
 while accumulating the sum, so that it can satisfy this test (at least 
 on x86 platforms):
 
 unittest {
    const double a = [ double.max, double.max, -double.max];
    assert(sum(a) == double.max);
 }
 
 After all, this is one of the reasons why reals exist. I'm still not 
 sure if sum(double []) should return a double or a real, although I'm 
 inclined to think that *any* function that returns a single floating 
 point value should return a real (on x87, the 80-bit result is just left 
 on the FPU stack anyway). But, I'm less confident about how a sum of 
 ints should behave.
Int overflows is well defined, making the sum of ints behave correctly in the case above. Int is also the promotion type for integral operations and would be the natural return type for sum(int[]). A sum of shorts should probably return an int though, following integer promotion rules. Should Archetype for integers follow the integer promotion rules or all be long?
 However, all the other functions seem to be free of mathematical 
 subtleties. sum() is the only one which involves arithmetic operators, 
 and therefore it might not belong with the rest.
You make a strong argument and I agree. sum() is used as join() in other languages that lack the distinction between addition and concatenation. D doesn't need a non-arithmetic sum function. With the proposed fold function, you could also easily implement sum as: sum = fold(arr,0,int function(int a, int b) { return a+b; }); prod = fold(arr,1,int function(int a, int b) { return a*b; }); Or generic, using Archetype: fold(arr,0,Archetype!(T) function(Archetype!(T) a, T b) {return a+b;}); /Oskar
Mar 09 2006
next sibling parent Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Oskar Linde wrote:
 sum = fold(arr,0,int function(int a, int b) { return a+b; });
 prod = fold(arr,1,int function(int a, int b) { return a*b; });
 fold(arr,0,Archetype!(T) function(Archetype!(T) a, T b) {return a+b;});
Oops, those should of course read: sum = fold(arr,0, function int(int a, int b) { return a+b; }); prod = fold(arr,1, function int(int a, int b) { return a*b; }); fold(arr,0, function Archetype!(T)(Archetype!(T) a, T b) {return a+b;}); /Oskar
Mar 09 2006
prev sibling parent Don Clugston <dac nospam.com.au> writes:
Oskar Linde wrote:
 Don Clugston wrote:
 Oskar Linde wrote:
 I'll just comment on one function:

  > T sum(T[] arr)
  >
  > Returns the sum of the element in arr as defined by the + operator.

 This one has an interesting twist. I'm not sure that the sum should 
 necessarily be of type T. I've been playing around with the concept of 
 what I've called the Archetype of a type, which is the largest type 
 with the same semantics as T (possibly with the same calculation 
 speed). (ie,
 Archetype!(byte)= Archetype!(short)
 Archetype!(int) = long
 Archetype!(float) = Archetype!(double) = real,
 Archetype!(cfloat)= creal, etc). Obviously it's a trivial template.
Interesting...
 I think that at least,  sum(double[] ) should internally use a real 
 while accumulating the sum, so that it can satisfy this test (at least 
 on x86 platforms):

 unittest {
    const double a = [ double.max, double.max, -double.max];
    assert(sum(a) == double.max);
 }

 After all, this is one of the reasons why reals exist. I'm still not 
 sure if sum(double []) should return a double or a real, although I'm 
 inclined to think that *any* function that returns a single floating 
 point value should return a real (on x87, the 80-bit result is just 
 left on the FPU stack anyway). But, I'm less confident about how a sum 
 of ints should behave.
Int overflows is well defined, making the sum of ints behave correctly in the case above. Int is also the promotion type for integral operations and would be the natural return type for sum(int[]). A sum of shorts should probably return an int though, following integer promotion rules. Should Archetype for integers follow the integer promotion rules or all be long?
I'm really not sure. But you make a good point -- if it mimics the promotion rules, it's easy to justify the choice. There's an implementation of Archetype!() with a sum!() that behaves in this way at http://svn.dsource.org/projects/mathextra/trunk/mathtempl.d I also have a sumList(a[]...) implementation there, but it's just an experiment. Doesn't work too well with the current IFTI limitations. If IFTI worked for ... arguments, we could just write template min(T) { T min(T[]...) { } } and then it would work for both: int z = 4; int [] a = [3, 5, 67]; int x = min(5, 6, z, 2, 9); int y = min(a); but unfortunately ... arguments don't let you do the implicit property trick. So I don't know if it's a good idea or not.
 However, all the other functions seem to be free of mathematical 
 subtleties. sum() is the only one which involves arithmetic operators, 
 and therefore it might not belong with the rest.
You make a strong argument and I agree. sum() is used as join() in other languages that lack the distinction between addition and concatenation. D doesn't need a non-arithmetic sum function.
The ~ operator was a stroke of genius. It has many wonderful consequences.
Mar 09 2006
prev sibling next sibling parent Sean Kelly <sean f4.ca> writes:
Don Clugston wrote:
 Oskar Linde wrote:
 Hello,

 With the new IFTI support I have been looking at ways of upgrading the 
 standard library with new and more generic functions. 
[snip] > Is there in general even any interest in adding generic functions to the > standard library? Some of these functions are the last thing remaining in std.math2 (but they definitely don't below there, none of them are truly mathematical). We definitely want to remove std.math2 prior to 1.0, std.array sounds good to me. I'll just comment on one function: > T sum(T[] arr) > > Returns the sum of the element in arr as defined by the + operator. This one has an interesting twist. I'm not sure that the sum should necessarily be of type T. I've been playing around with the concept of what I've called the Archetype of a type, which is the largest type with the same semantics as T (possibly with the same calculation speed). (ie, Archetype!(byte)= Archetype!(short) Archetype!(int) = long Archetype!(float) = Archetype!(double) = real, Archetype!(cfloat)= creal, etc). Obviously it's a trivial template.
Interesting idea. I've used "LargerOf" templates for min/max comparisons before: template min(T,U) { LargerOf!(T,U) min( T t, U u ) {} } but the idea of Archetypes is a nice abstraction of this idea.
 I think that at least,  sum(double[] ) should internally use a real 
 while accumulating the sum, so that it can satisfy this test (at least 
 on x86 platforms)
Agreed.
 However, all the other functions seem to be free of mathematical 
 subtleties. sum() is the only one which involves arithmetic operators, 
 and therefore it might not belong with the rest.
sum almost seems out of place here, except that it's an array algorithm. I wonder if it might be more appropriate to create a numerics module for this sort of thing? Sean
Mar 09 2006
prev sibling parent reply Kevin Bealer <Kevin_member pathlink.com> writes:
In article <dup8rj$68s$1 digitaldaemon.com>, Don Clugston says...
Oskar Linde wrote:
 Hello,
 
 With the new IFTI support I have been looking at ways of upgrading the 
 standard library with new and more generic functions. 
[snip]
 Is there in general even any interest in adding generic functions to the
 standard library?
Some of these functions are the last thing remaining in std.math2 (but they definitely don't below there, none of them are truly mathematical). We definitely want to remove std.math2 prior to 1.0, std.array sounds good to me. I'll just comment on one function:
 T sum(T[] arr)

 Returns the sum of the element in arr as defined by the + operator.
This one has an interesting twist. I'm not sure that the sum should necessarily be of type T. I've been playing around with the concept of what I've called the Archetype of a type, which is the largest type with the same semantics as T (possibly with the same calculation speed). (ie, Archetype!(byte)= Archetype!(short) Archetype!(int) = long Archetype!(float) = Archetype!(double) = real, Archetype!(cfloat)= creal, etc). Obviously it's a trivial template. I think that at least, sum(double[] ) should internally use a real while accumulating the sum, so that it can satisfy this test (at least on x86 platforms): unittest { const double a = [ double.max, double.max, -double.max]; assert(sum(a) == double.max); } After all, this is one of the reasons why reals exist. I'm still not sure if sum(double []) should return a double or a real, although I'm inclined to think that *any* function that returns a single floating point value should return a real (on x87, the 80-bit result is just left on the FPU stack anyway). But, I'm less confident about how a sum of ints should behave. However, all the other functions seem to be free of mathematical subtleties. sum() is the only one which involves arithmetic operators, and therefore it might not belong with the rest.
I would prefer a syntax like this, instead of (or in addition to) the archetype version: void sum(in T1 a, out T2 b) This way, I can sum ints into a double or doubles into a float if that's what I really want to do. A user who is worried about overflow of int, can sum into a long. Otherwise, most users will do a cast of the return value back to T, which means there is cast proliferation. One can imagine a "product" version: void product(int T1 a, out T2 b); In this case, "long" won't cut it for many cases - but a user can supply a variable length integer type of their own design, so long as it provides a method like "opAddAssign(int foo)". Also note: I deliberately use T1 instead of T1[] here. I think the container should not be restrained to a regular array. This code (template syntax omitted): : void sum(T1 inp, T2 outp) : { : foreach(auto sub; inp) { : outp += sub; : } : } . is more flexible than: : void sum(T1[] inp, T2 outp) : { : foreach(T1 sub; inp) { : outp += sub; : } : } Because it works with any class that can do opApply(), not just a vanilla array. Kevin
Mar 09 2006
parent Sean Kelly <sean f4.ca> writes:
Kevin Bealer wrote:
 
 Also note:  I deliberately use T1 instead of T1[] here.  I think the container
 should not be restrained to a regular array.  This code (template syntax
 omitted):
 
 : void sum(T1 inp, T2 outp)
 : {
 :     foreach(auto sub; inp) {
 :         outp += sub;
 :     }
 : }
 
 . is more flexible than:
 
 : void sum(T1[] inp, T2 outp)
 : {
 :     foreach(T1 sub; inp) {
 :         outp += sub;
 :     }
 : }
 
 Because it works with any class that can do opApply(), not just a vanilla
array.
I agree. However, opApply ranges don't support random access, so I think there's value in providing an array-specific overload. find() is a good example here, as the opApply version would need to accept a delegate(size_t,inout T) and fewer optimizations would be possible. Sean
Mar 09 2006
prev sibling next sibling parent reply "John C" <johnch_atms hotmail.com> writes:
"Oskar Linde" <oskar.lindeREM OVEgmail.com> wrote in message 
news:dup4fr$2c0b$3 digitaldaemon.com...
 With the corresponding inplace versions:

 T[] doSort(T[]);
 T[] doStableSort(T[]);
 T[] doSort(T[], delegate|function wrongOrder(T,T));
 T[] doReverse(T[]);
My only reservation would be the decision to prefix these functions with "do". It's tautological and doesn't express anything about them being in-place versions. I suggest just calling them sortInPlace/inPlaceSort (or even sortInSitu or inSituSort).
Mar 09 2006
parent Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
John C wrote:
 "Oskar Linde" <oskar.lindeREM OVEgmail.com> wrote in message 
 news:dup4fr$2c0b$3 digitaldaemon.com...
 With the corresponding inplace versions:

 T[] doSort(T[]);
 T[] doStableSort(T[]);
 T[] doSort(T[], delegate|function wrongOrder(T,T));
 T[] doReverse(T[]);
My only reservation would be the decision to prefix these functions with "do". It's tautological and doesn't express anything about them being in-place versions. I suggest just calling them sortInPlace/inPlaceSort (or even sortInSitu or inSituSort).
I have not been giving much thought to the naming of the in-place versions. I briefly considered sortInPlace and inPlaceSort, but they appeared too unwieldy. "do" was meant to signal the side-effect by implying that the function "does" something, rather than just being a definition of a mapping of values onto other values. (A void return type would help too.) If I were a native English speaker, I might have felt differently about the "do" prefix though. The general idea is to make the default (short) form not have side-effects. (Which I believe reduces surprises). Otherwise, I guess you could have named the non-in-place versions sorted() filtered() mapped() reversed() etc. Before trying to justify the naming, I would need to justify the decision to make non-in-place versions the default. This is really a philosophical question, but I think the most important thing is to be consistent throughout the library. This choice seems to go hand in hand with the Phobos philosophy of using copy-on-write as often as possible. There are also plenty of functions that behave like this: std.string.tolower, toupper, capitalize, capwords, ljustify and replace to name a few. On the other hand, we have the in-place .sort and .reverse that behaves contrary to this. I'm not really set on any naming and appreciate all suggestions. /Oskar
Mar 09 2006
prev sibling next sibling parent reply David Medlock <noone nowhere.com> writes:
Oskar Linde wrote:
<snip>
 Oskar
I would also add: T[] left( T[] array, int n ) T[] right( T[] array, int n ) T[] skip( T[] array, int n ) In my own personal parsing stuff I regularly use the above plus: T[] countUntil( char[] str, bool delegate(char) pred ); T[] countWhile( char[] str, bool delegate(char) pred ); So skipping whitespace becomes: text = skip( text.countWhile( delegate bool(char c){ c<=32; } ) ); -DavidM
Mar 09 2006
parent reply Oskar Linde <olREM OVEnada.kth.se> writes:
David Medlock wrote:

 Oskar Linde wrote:
 <snip>
 I would also add:
I think the following are redundant as functions as they all have an equivalent splice syntax:
 T[]  left( T[] array, int n )
identical to array[0..n];
 T[]  right( T[] array, int n )
identical to array[$-n..$];
 T[]  skip( T[] array, int n )
identical to array[n..$];
 In my own personal parsing stuff I regularly use the above plus:
 
 T[]  countUntil( char[] str, bool delegate(char) pred );
 T[]  countWhile( char[] str, bool delegate(char) pred );
I assume those should return a size_t and not a T[]? If I understand correctly, those functions would be similar to find, but return arr.length instead of -1 when no matching element is found? (The second one would also invert the predicate.)
 So skipping whitespace becomes:
 
 text = skip( text.countWhile( delegate bool(char c){ c<=32; } ) );
/Oskar
Mar 09 2006
parent reply David Medlock <noone nowhere.com> writes:
Oskar Linde wrote:
 David Medlock wrote:
 
 
Oskar Linde wrote:
<snip>
I would also add:
I think the following are redundant as functions as they all have an equivalent splice syntax:
T[]  left( T[] array, int n )
identical to array[0..n];
T[]  right( T[] array, int n )
identical to array[$-n..$];
T[]  skip( T[] array, int n )
identical to array[n..$];
Yes but the last two avoid the hideous Perl-like $...hehe. These 3 are just more readable to me, doesnt matter though.
 
In my own personal parsing stuff I regularly use the above plus:

T[]  countUntil( char[] str, bool delegate(char) pred );
T[]  countWhile( char[] str, bool delegate(char) pred );
I assume those should return a size_t and not a T[]? If I understand correctly, those functions would be similar to find, but return arr.length instead of -1 when no matching element is found? (The second one would also invert the predicate.)
Actually I copied the wrong utility functions. The countWhile/countUntil do return size_t/uint. The ones I meant to post are skipWhile/skipUntil which do return a slice of the T[].
 
So skipping whitespace becomes:

text = skip( text.countWhile( delegate bool(char c){ c<=32; } ) );
should be: text = text.skip( text.countWhile( delegate bool(char c){ c<=32; } ) ); -DavidM
Mar 10 2006
parent David Medlock <noone nowhere.com> writes:
David Medlock wrote:

 should be:
 
 text = text.skip( text.countWhile( delegate bool(char c){ c<=32; } ) );
 -DavidM
Oops again. should be: text = text.skipWhile( delegate bool(char c){ c<=32; } ); arrgh. -DavidM
Mar 10 2006
prev sibling next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Oskar Linde wrote:
<snip>
 T[] doSort(T[], delegate|function wrongOrder(T,T));
<snip> You've left off the return type of wrongOrder. Moreover, why the name wrongOrder? To simply return whether or not these two elements are out of order? Why not use what people are used to, i.e. a function that returns -ve, 0 or +ve for <, = or > respectively? Stewart. -- -----BEGIN GEEK CODE BLOCK----- Version: 3.1 GCS/M d- s:- C++ a->--- UB P+ L E W++ N+++ o K- w++ O? M V? PS- PE- Y? PGP- t- 5? X? R b DI? D G e++>++++ h-- r-- !y ------END GEEK CODE BLOCK------ My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
Mar 09 2006
parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Stewart Gordon wrote:
 Oskar Linde wrote:
 <snip>
 T[] doSort(T[], delegate|function wrongOrder(T,T));
<snip> You've left off the return type of wrongOrder. Moreover, why the name wrongOrder? To simply return whether or not these two elements are out of order? Why not use what people are used to, i.e. a function that returns -ve, 0 or +ve for <, = or > respectively?
The return value would be anything usable as a condition in a conditional function if(wrongOrder(a,b)) {...} The reason for not having -ve,0,+ve is that you only need a binary predicate for sorting. It is also in many cases simpler to define a ordering predicate than a 3-valued ordering. The name wrongOrder just helps as a reminder of what the predicate should return. names.sort(delegate bool(char[] a, char[] b) { return a > b; }); would sort the names in alphabetical order. To sort in reverse alphabetical order: names.sort(delegate bool(char[] a, char[] b) { return a < b; }); records.sort(delegate bool(Person a, Person b) { return a.name < b.name;}); /Oskar
Mar 09 2006
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Oskar Linde wrote:
 Stewart Gordon wrote:
 Oskar Linde wrote:
 <snip>
 T[] doSort(T[], delegate|function wrongOrder(T,T));
<snip> You've left off the return type of wrongOrder. Moreover, why the name wrongOrder? To simply return whether or not these two elements are out of order? Why not use what people are used to, i.e. a function that returns -ve, 0 or +ve for <, = or > respectively?
The return value would be anything usable as a condition in a conditional function if(wrongOrder(a,b)) {...} The reason for not having -ve,0,+ve is that you only need a binary predicate for sorting.
But depending on the sorting algorithm, I can imagine that either may be more efficient. Quite a lot of things at the moment rely on a three-valued ordering. C: strcmp, memcmp, qsort Java: compareTo (sic), Comparator D: opCmp, std.string.cmp Even if you do have it documented that doSort is unusual, somebody might either forget this or be already thinking on the three-valued terms when they see code that uses it.
 It is also in many cases simpler to define a 
 ordering predicate than a 3-valued ordering. The name wrongOrder just 
 helps as a reminder of what the predicate should return.
<snip> If that's so, I wonder why so many things have stuck with the three-valued ordering. Stewart. -- -----BEGIN GEEK CODE BLOCK----- Version: 3.1 GCS/M d- s:- C++ a->--- UB P+ L E W++ N+++ o K- w++ O? M V? PS- PE- Y? PGP- t- 5? X? R b DI? D G e++>++++ h-- r-- !y ------END GEEK CODE BLOCK------ My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
Mar 09 2006
parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Stewart Gordon wrote:
 Oskar Linde wrote:
 It is also in many cases simpler to define a ordering predicate than a 
 3-valued ordering. The name wrongOrder just helps as a reminder of 
 what the predicate should return.
<snip> If that's so, I wonder why so many things have stuck with the three-valued ordering.
A comparison operator is not the same thing as an ordering predicate. Consider sorting phone-book entries (Sorted by first city, then surname and finally first name): bool phoneBookOrder(Record a, Record b) { return a.city > b.city || a.surname > b.surname || a.name > b.name; } Forcing the user to write three-valued ordering functions for this is both unnecessary (the sorthing function will not use all 3 values), harder to get right (try it yourself) and probably less efficient. Instead of wrongOrder, I could name the predicate lessThan and swap its arguments. This is what C++ STL uses and may be more logical. /Oskar
Mar 10 2006
next sibling parent Don Clugston <dac nospam.com.au> writes:
Oskar Linde wrote:
 Stewart Gordon wrote:
 Oskar Linde wrote:
 It is also in many cases simpler to define a ordering predicate than 
 a 3-valued ordering. The name wrongOrder just helps as a reminder of 
 what the predicate should return.
<snip> If that's so, I wonder why so many things have stuck with the three-valued ordering.
A comparison operator is not the same thing as an ordering predicate. Consider sorting phone-book entries (Sorted by first city, then surname and finally first name): bool phoneBookOrder(Record a, Record b) { return a.city > b.city || a.surname > b.surname || a.name > b.name; } Forcing the user to write three-valued ordering functions for this is both unnecessary (the sorthing function will not use all 3 values), harder to get right (try it yourself) and probably less efficient. Instead of wrongOrder, I could name the predicate lessThan and swap its arguments. This is what C++ STL uses and may be more logical.
I like wrongOrder. It clearly states that it's a < rather than a <= comparison. Also, 'lessThan' sounds rather formal, whereas wrongOrder sounds more general.
Mar 10 2006
prev sibling parent reply Kevin Bealer <Kevin_member pathlink.com> writes:
In article <dus0hn$2tb0$1 digitaldaemon.com>, Oskar Linde says...
Stewart Gordon wrote:
 Oskar Linde wrote:
 It is also in many cases simpler to define a ordering predicate than a 
 3-valued ordering. The name wrongOrder just helps as a reminder of 
 what the predicate should return.
<snip> If that's so, I wonder why so many things have stuck with the three-valued ordering.
A comparison operator is not the same thing as an ordering predicate. Consider sorting phone-book entries (Sorted by first city, then surname and finally first name): bool phoneBookOrder(Record a, Record b) { return a.city > b.city || a.surname > b.surname || a.name > b.name; } Forcing the user to write three-valued ordering functions for this is both unnecessary (the sorthing function will not use all 3 values), harder to get right (try it yourself) and probably less efficient. Instead of wrongOrder, I could name the predicate lessThan and swap its arguments. This is what C++ STL uses and may be more logical. /Oskar
This contains a bug. It should be like this:
bool phoneBookOrder(Record a, Record b) {
    if (a.city < b.city) return false;
    if (a.city > b.city) return true;

    if (a.surname < b.surname) return false;
    if (a.surname > b.surname) return true;

    return a.name > b.name;
}
.. Otherwise, (a < b) && (b < a) is possible, i.e. when the cities are equal. Kevin
Mar 10 2006
parent Oskar Linde <olREM OVEnada.kth.se> writes:
Kevin Bealer wrote:

 In article <dus0hn$2tb0$1 digitaldaemon.com>, Oskar Linde says...
Consider sorting phone-book entries (Sorted by first city, then surname
and finally first name):

bool phoneBookOrder(Record a, Record b) {
return a.city > b.city ||
                a.surname > b.surname ||
                a.name > b.name;
}
 This contains a bug.  It should be like this:
 
bool phoneBookOrder(Record a, Record b) {
    if (a.city < b.city) return false;
    if (a.city > b.city) return true;

    if (a.surname < b.surname) return false;
    if (a.surname > b.surname) return true;

    return a.name > b.name;
}
.. Otherwise, (a < b) && (b < a) is possible, i.e. when the cities are equal.
You are correct. How embarassing. I was sure I had read the above example somewhere, so I didn't care to think before typing it in... /Oskar
Mar 10 2006
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Oskar Linde wrote:
 Hello,
 
 With the new IFTI support I have been looking at ways of upgrading the 
 standard library with new and more generic functions. What follows is my 
 suggestions for functions to be added to std.array. I have implemented 
 all of them and the current (0.149) limited IFTI-support is enough to 
 support them. That being said, I wish to hold off making a source code 
 submission until a) I get review comments on the suggested function 
 prototypes and b) it is more clear how far D is taking IFTI support 
 (meaning possibly neater implementation).
Just a few quick comments, as I've begun thinking about this for Ares as well.
 All functions are designed to be used both as free function and as 
 implicit array methods. Except for the inplace versions, no functions 
 modifies the array.
 
 The prototype notation is my own. a|b means two alternative types. T is 
 the generic element type. T[] is the array.
To those who are unfamiliar with ITI, it's worth noting that any template parameters that cannot be determined implicitly should be placed at the beginning of the template parameter list. Doing so should allow this to be possible: template func( Ret, Val ) { Ret func( Val val ) {} } int i; char c = func!(char)( i ); This isn't implemented yet, but I suspect it will be soon.
 -----------
 
 T fold(T[] arr, T init, T delegate|function combiner(T,T));
 
 Generic array recursion function. combiner is called recursively:
     return combiner(init,fold(arr[1..$],arr[0],combiner));
 (The actual implementation will of course call the combiner iteratively)
 
 
 T max(T[] arr)
 
 Returns the maximum element in arr as defined by the > operator.
 
 
 T min(T[] arr)
 
 Returns the minimum element in arr as defined by the < operator.
min and max should probably allow a comparison predicate to be supplied as well. Thus the declarations would be something like this: T min(T[] arr, P pred = &less!(T)); I'm also hoping the above syntax will actually work, and that P will resolve to being a function pointer by default.
 T sum(T[] arr)
 
 Returns the sum of the element in arr as defined by the + operator.

 ptrdiff_t find(T[] arr, T|delegate|function d);
 
 Returns the index of the first occurence of d or the first true 
 predicate d applied to the elements in order. Returns -1 when no element 
 is found.
I suggest returning size_t instead and using size_t.max in place of -1. The two are basically equivalent, but using size_t offers a bit more range in legal array sizes as the flag value occupies the least significant bit rather than the most significant. I also almost suggested adding a "substring" find function, except that seems more appropriate for a std.string module. Have you thought about how the two will overlap? I'm not entirely certain how template overloading will work for functions defined in different modules. In fact, if this is indeed a limitation them it might be prudent to put all such algorithms in std.array instead of splitting them between std.array and std.string (I just thought of this--I had been considering multiple modules for overloads in Ares, and I suspect this won't work).
 size_t indexOf(T[] arr, T|delegate|function d);
 
 Like find, but throws on missing element.
 
 
 T[][] split(T[] arr, T|T[]|delegate|function d);
 
 Split the array arr using a predicate/element/subarray d.
 (obsoletes std.string.split that only works for char[])
 
 
 T[] join(T[][] arr, T|T[]|delegate|function separator);
 
 Join the elements array arr using separator.
 (obsoletes std.string.join that only works for char[])
How do you envision the delegate version being applied? Is it expected to return an element/array for a separator?
 T[] join(T[][] arr);
 
 Join the array T[][] without separator.
 
 
 U[] map(T[] arr, U delegate|function f(T));
 
 Map the elements of arr over function f, returning an array of the results.
 
 
 T[] filter(T[] arr, delegate|function p(T));
 
 Filters arr over the predicate p, returning array of elements of arr 
 where the predicate is true.
 
 
 Possible inplace version of map:
 
 T[] doMap(T[], T delegate|function f(T));
 
 ---------
 
 I would also prefer those over the language built in .sort .reverse:
 
 T[] sort(T[]);
 T[] stableSort(T[]);
 T[] sort(T[], delegate|function wrongOrder(T,T));
 T[] reverse(T[]);
 
 
 With the corresponding inplace versions:
 
 T[] doSort(T[]);
 T[] doStableSort(T[]);
 T[] doSort(T[], delegate|function wrongOrder(T,T));
 T[] doReverse(T[]);
 
 ---------
 
 Is there in general even any interest in adding generic functions to the 
 standard library?
Heck yes. Many of the functions you've mentioned above were ones I was planning to implement. I also like that you're using delegates as predicates, as it makes them far more generic than some of the hardcoded versions in Phobos' std.string. I don't suppose you'd like to submit them to Ares as well? Or please don't object if I end up implementing something quite similar :-) Sean
Mar 09 2006
parent Oskar Linde <olREM OVEnada.kth.se> writes:
Thank you for your comments.

Sean Kelly wrote:
 Oskar Linde wrote:
 All functions are designed to be used both as free function and as
 implicit array methods. Except for the inplace versions, no functions
 modifies the array.
 
 The prototype notation is my own. a|b means two alternative types. T is
 the generic element type. T[] is the array.
To those who are unfamiliar with ITI, it's worth noting that any template parameters that cannot be determined implicitly should be placed at the beginning of the template parameter list. Doing so should allow this to be possible: template func( Ret, Val ) { Ret func( Val val ) {} } int i; char c = func!(char)( i ); This isn't implemented yet, but I suspect it will be soon.
Yes, I see no obstacles in implementing this.
 
 T min(T[] arr)
 
 Returns the minimum element in arr as defined by the < operator.
min and max should probably allow a comparison predicate to be supplied as well.
That is a good idea. The meaning should be the same as for the sort predicate.
 Thus the declarations would be something like this:
 
 T min(T[] arr, P pred = &less!(T));
 
 I'm also hoping the above syntax will actually work, and that P will
 resolve to being a function pointer by default.
What you can do today is this: template min(ArrTy, PredTy) { eltype!(ArrTy) min(ArrTy array, PredTy predicate) { ... } } template min(ArrTy) { eltype!(ArrTy) min(ArrTy array) { return array.min(&less!(eltype!(ArrTy))); } }
 T sum(T[] arr)
 
 Returns the sum of the element in arr as defined by the + operator.

 ptrdiff_t find(T[] arr, T|delegate|function d);
 
 Returns the index of the first occurence of d or the first true
 predicate d applied to the elements in order. Returns -1 when no element
 is found.
I suggest returning size_t instead and using size_t.max in place of -1. The two are basically equivalent, but using size_t offers a bit more range in legal array sizes as the flag value occupies the least significant bit rather than the most significant.
Yes. This is probably better. size_t.max is still == -1 in comparisons.
 I also almost suggested adding a "substring" find function, except that
 seems more appropriate for a std.string module.  Have you thought about
 how the two will overlap?  I'm not entirely certain how template
 overloading will work for functions defined in different modules.  In
 fact, if this is indeed a limitation them it might be prudent to put all
 such algorithms in std.array instead of splitting them between std.array
 and std.string (I just thought of this--I had been considering multiple
 modules for overloads in Ares, and I suspect this won't work).
I'm not certain, but I believe the current dmd-ifti will have problems with splitting template functions with the same name and number of template arguments. It would probably be best to put a generic subarray find in std.array rather than extending the find function for just {,d,w}char[]. But while one may argue for a generic subarray find, it becomes less certain that there should be a generic subarray replace for instance.
 T[] join(T[][] arr, T|T[]|delegate|function separator);
 
 Join the elements array arr using separator.
 (obsoletes std.string.join that only works for char[])
How do you envision the delegate version being applied? Is it expected to return an element/array for a separator?
Oops, this is a copy-paste error. I have not implemented this as taking a delegate/function. I did not see any use for that. The signature should be: T[] join(T[][] arr, T|T[] separator);
 ---------
 
 Is there in general even any interest in adding generic functions to the
 standard library?
Heck yes. Many of the functions you've mentioned above were ones I was planning to implement. I also like that you're using delegates as predicates, as it makes them far more generic than some of the hardcoded versions in Phobos' std.string. I don't suppose you'd like to submit them to Ares as well? Or please don't object if I end up implementing something quite similar :-)
I would definitely not want to exclude my implementation from Ares. Once the function specifications are defined, the implementation is really straight forward. I have no objections to you implementing similar functions either. My only concern is having this functionality in some form in a D standard library. :) /Oskar
Mar 09 2006