digitalmars.D.learn - A few range questions

Charles Smith (32/32) Jan 05 2016 Hi,

Gary Willoughby (6/11) Jan 05 2016 Use this instead:
=?UTF-8?Q?Ali_=c3=87ehreli?= (39/45) Jan 05 2016 You are traversing the array per index value (1000 times), which can be

Charles Smith (10/32) Jan 05 2016 I'm not sure what this explicitly is doing... Are you defining an

=?UTF-8?Q?Ali_=c3=87ehreli?= (51/67) Jan 05 2016 Yes. Since benchmark() expects no-parameter functions, that's a useful

Charles Smith (7/12) Jan 05 2016 Yeah, I've been reading an algorithm book, and this was in there.

Charles Smith <csmith.ku2013 gmail.com> writes:

Hi,

I'm trying to implement counting sort with ranges; however, hit a 
roadblock here while testing. I'm using the following code:

     int[1_000_000] arr;

     void main() {
         import std.conv : to;
         import std.datetime;
         import std.random;
         import std.stdio;

         for(int i; i < arr.length; ++i)
             arr[i] = uniform(0, 1000);

         auto benchmarks = benchmark!(algorithmSort, 
countingSort)(1);

         writeln("Agorithm's Sort: ", to!Duration(benchmarks[0]));
         writeln("Counting Sort: ", to!Duration(benchmarks[1]));
     }

     void algorithmSort() {
         import std.algorithm : sort;

         auto result = arr[].sort;
     }

     void countingSort() {
         import std.algorithm : count, map;
         import std.range : array, chain, iota, repeat;

         auto result = 1_000.iota.map!(a => a.repeat(count(arr[], 
a))).chain.array;
     }

1. `arr[].sort` is changing arr in place. Any way to not do that?
2. I noticed that result within `countingSort` is an array of 
arrays. I thought `chain` would concat them... did I miss 
something obvious?
3. It's kind of hard to compare performance because of (1), but 
is there a better way to do this?

Jan 05 2016

Gary Willoughby <dev nomad.so> writes:

On Tuesday, 5 January 2016 at 18:47:30 UTC, Charles Smith wrote:
 1. `arr[].sort` is changing arr in place. Any way to not do 
 that?

Use this instead:

auto result = sort(arr[].dup);

.dup duplicates the array and sort(...) uses the std.algorithm 
sort and not the built-in array sort method.

 2. I noticed that result within `countingSort` is an array of 
 arrays. I thought `chain` would concat them... did I miss 
 something obvious?

No idea yet.

Jan 05 2016

=?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:

On 01/05/2016 10:47 AM, Charles Smith wrote:

          auto result = 1_000.iota.map!(a => a.repeat(count(arr[],
 a))).chain.array;

You are traversing the array per index value (1000 times), which can be 
done once up front:

enum elementCount = 1_000_000;
enum valueCount = 1000;

void main() {
     import std.conv : to;
     import std.datetime;
     import std.random;
     import std.stdio;

     int[elementCount] arr;

     for(int i; i < arr.length; ++i)
         arr[i] = uniform(0, valueCount);

     // Now they get their own arrays:
     auto benchmarks = benchmark!(() => algorithmSort(arr.dup),
                                  () => countingSort(arr.dup))(1);

     writeln("Algorithm's Sort: ", to!Duration(benchmarks[0]));
     writeln("Counting Sort   : ", to!Duration(benchmarks[1]));
}

auto algorithmSort(int[] arr) {
     import std.algorithm : sort, sum;

     auto result = sort(arr[]);
     return result.sum;
}

auto countingSort(int[] arr) {
     import std.algorithm : count, map, joiner, sum, each;
     import std.range : array, repeat, enumerate;

     uint[valueCount] hist;
     arr.each!(a => ++hist[a]);

     auto result = hist[].enumerate.map!(t => t[0].repeat(t[1])).joiner;
     // To make sure that we consume the lazy range:
     return result.sum;
}

 2. I noticed that result within `countingSort` is an array of arrays. I
 thought `chain` would concat them... did I miss something obvious?

I think .joiner is what you're looking for.

 3. It's kind of hard to compare performance because of (1), but is there
 a better way to do this?

I changed the code to pass each function a different array.

My results with -O -inline -noboundscheck:

Algorithm's Sort: 76 ms, 910 μs, and 8 hnsecs
Counting Sort   : 21 ms, 563 μs, and 9 hnsecs

Ali

Jan 05 2016

Charles Smith <csmith.ku2013 gmail.com> writes:

On Tuesday, 5 January 2016 at 19:42:42 UTC, Ali Çehreli wrote:
 You are traversing the array per index value (1000 times), 
 which can be done once up front:

Good point. I guess I assumed `map!` was magic.


     // Now they get their own arrays:
     auto benchmarks = benchmark!(() => algorithmSort(arr.dup),
                                  () => 
 countingSort(arr.dup))(1);

I'm not sure what this explicitly is doing... Are you defining an 
anonymous function? If so, that seems really useful.

     uint[valueCount] hist;
     arr.each!(a => ++hist[a]);

     auto result = hist[].enumerate.map!(t => 
 t[0].repeat(t[1])).joiner;
     // To make sure that we consume the lazy range:
     return result.sum;
 }
 I think .joiner is what you're looking for.

Thanks a lot for this. Coming from a webdev background I'm 
ashamed I didn't guess `join`. That said, I don't think the 
example for `chain` should compile then. Just as an aside, I was 
searching the terms `concat` and `flatten ` when I was looking 
for this.

 3. It's kind of hard to compare performance because of (1),

 but is there
 a better way to do this?

 I changed the code to pass each function a different array.

 My results with -O -inline -noboundscheck:

 Algorithm's Sort: 76 ms, 910 μs, and 8 hnsecs
 Counting Sort   : 21 ms, 563 μs, and 9 hnsecs

 Ali

Awesome

Jan 05 2016

=?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:

On 01/05/2016 01:48 PM, Charles Smith wrote:
 On Tuesday, 5 January 2016 at 19:42:42 UTC, Ali Çehreli wrote:

     // Now they get their own arrays:
     auto benchmarks = benchmark!(() => algorithmSort(arr.dup),
                                  () => countingSort(arr.dup))(1);

 I'm not sure what this explicitly is doing... Are you defining an
 anonymous function? If so, that seems really useful.

Yes. Since benchmark() expects no-parameter functions, that's a useful 
way of testing functions that take parameters.

However, I should have created the arrays before calling benchmark(). 
Otherwise, the timings include .dup as well:

     auto asArr = arr.dup;
     auto csArr = arr.dup;

     auto benchmarks = benchmark!(() => algorithmSort(asArr),
                                  () => countingSort(csArr))(10);

 I think .joiner is what you're looking for.

 I was searching the terms `concat` and `flatten ` when I was
 looking for this.

Yep, I always start searching for 'flatten' and then remember joiner. :p

 That said, I don't think the example for `chain` should compile then.

That worked because chain() received just a single range (of ranges), in 
which case it had no effect.

 My results with -O -inline -noboundscheck:

 Algorithm's Sort: 76 ms, 910 μs, and 8 hnsecs
 Counting Sort   : 21 ms, 563 μs, and 9 hnsecs


 Awesome

I am amazed! :) countingSort() beats algorithmSort() almost always. :) 
Here is the final version of the program with 10 million elements and 4 
million values (array that are so large cannot be allocated on the stack 
for me; so, I used new):

enum elementCount = 10_000_000;
enum valueCount = 4_000_000;

void main() {
     import std.conv : to;
     import std.datetime;
     import std.random;
     import std.stdio;

     auto arr = new int[elementCount];

     for(int i; i < arr.length; ++i)
         arr[i] = uniform(0, valueCount);

     auto asArr = arr.dup;
     auto csArr = arr.dup;

     // Now they get their own arrays:
     auto benchmarks = benchmark!(() => algorithmSort(asArr),
                                  () => countingSort(csArr))(10);

     writeln("Algorithm's Sort: ", to!Duration(benchmarks[0]));
     writeln("Counting Sort   : ", to!Duration(benchmarks[1]));
}

auto algorithmSort(int[] arr) {
     import std.algorithm : sort, sum;

     arr.sort();
     return arr.sum;
}

auto countingSort(int[] arr) {
     import std.algorithm : count, map, joiner, sum, each;
     import std.range : array, repeat, enumerate;

     auto hist = new uint[valueCount];
     arr.each!(a => ++hist[a]);

     auto result = hist[].enumerate.map!(t => t[0].repeat(t[1])).joiner;
     return result.sum;
}

Now they are comparable:

Algorithm's Sort: 3 secs, 881 ms, 146 μs, and 1 hnsec
Counting Sort   : 3 secs, 990 ms, 315 μs, and 4 hnsecs

Ali

Jan 05 2016

Charles Smith <csmith.ku2013 gmail.com> writes:

On Tuesday, 5 January 2016 at 22:34:52 UTC, Ali Çehreli wrote:
 Algorithm's Sort: 76 ms, 910 μs, and 8 hnsecs
 Counting Sort   : 21 ms, 563 μs, and 9 hnsecs


 Awesome

 I am amazed! :) countingSort() beats algorithmSort() almost 
 always. :)

Yeah, I've been reading an algorithm book, and this was in there.

It has O(elementCount + valueCount) time complexity under a 
correct implementation, so it is fast. Also its best case is 
quick sort's worst case (lots of duplicated data), at which point 
it has O(elementCount) time complexity. I think D's sort uses 
timsort, so I'd expect it to be no worse.

Jan 05 2016

D Programming

C/C++ Programming

Other

digitalmars.D.learn - A few range questions