digitalmars.D.learn - A few range questions
- Charles Smith (32/32) Jan 05 2016 Hi,
- Gary Willoughby (6/11) Jan 05 2016 Use this instead:
- =?UTF-8?Q?Ali_=c3=87ehreli?= (39/45) Jan 05 2016 You are traversing the array per index value (1000 times), which can be
- Charles Smith (10/32) Jan 05 2016 I'm not sure what this explicitly is doing... Are you defining an
- =?UTF-8?Q?Ali_=c3=87ehreli?= (51/67) Jan 05 2016 Yes. Since benchmark() expects no-parameter functions, that's a useful
- Charles Smith (7/12) Jan 05 2016 Yeah, I've been reading an algorithm book, and this was in there.
Hi, I'm trying to implement counting sort with ranges; however, hit a roadblock here while testing. I'm using the following code: int[1_000_000] arr; void main() { import std.conv : to; import std.datetime; import std.random; import std.stdio; for(int i; i < arr.length; ++i) arr[i] = uniform(0, 1000); auto benchmarks = benchmark!(algorithmSort, countingSort)(1); writeln("Agorithm's Sort: ", to!Duration(benchmarks[0])); writeln("Counting Sort: ", to!Duration(benchmarks[1])); } void algorithmSort() { import std.algorithm : sort; auto result = arr[].sort; } void countingSort() { import std.algorithm : count, map; import std.range : array, chain, iota, repeat; auto result = 1_000.iota.map!(a => a.repeat(count(arr[], a))).chain.array; } 1. `arr[].sort` is changing arr in place. Any way to not do that? 2. I noticed that result within `countingSort` is an array of arrays. I thought `chain` would concat them... did I miss something obvious? 3. It's kind of hard to compare performance because of (1), but is there a better way to do this?
Jan 05 2016
On Tuesday, 5 January 2016 at 18:47:30 UTC, Charles Smith wrote:1. `arr[].sort` is changing arr in place. Any way to not do that?Use this instead: auto result = sort(arr[].dup); .dup duplicates the array and sort(...) uses the std.algorithm sort and not the built-in array sort method.2. I noticed that result within `countingSort` is an array of arrays. I thought `chain` would concat them... did I miss something obvious?No idea yet.
Jan 05 2016
On 01/05/2016 10:47 AM, Charles Smith wrote:auto result = 1_000.iota.map!(a => a.repeat(count(arr[], a))).chain.array;You are traversing the array per index value (1000 times), which can be done once up front: enum elementCount = 1_000_000; enum valueCount = 1000; void main() { import std.conv : to; import std.datetime; import std.random; import std.stdio; int[elementCount] arr; for(int i; i < arr.length; ++i) arr[i] = uniform(0, valueCount); // Now they get their own arrays: auto benchmarks = benchmark!(() => algorithmSort(arr.dup), () => countingSort(arr.dup))(1); writeln("Algorithm's Sort: ", to!Duration(benchmarks[0])); writeln("Counting Sort : ", to!Duration(benchmarks[1])); } auto algorithmSort(int[] arr) { import std.algorithm : sort, sum; auto result = sort(arr[]); return result.sum; } auto countingSort(int[] arr) { import std.algorithm : count, map, joiner, sum, each; import std.range : array, repeat, enumerate; uint[valueCount] hist; arr.each!(a => ++hist[a]); auto result = hist[].enumerate.map!(t => t[0].repeat(t[1])).joiner; // To make sure that we consume the lazy range: return result.sum; }2. I noticed that result within `countingSort` is an array of arrays. I thought `chain` would concat them... did I miss something obvious?I think .joiner is what you're looking for.3. It's kind of hard to compare performance because of (1), but is there a better way to do this?I changed the code to pass each function a different array. My results with -O -inline -noboundscheck: Algorithm's Sort: 76 ms, 910 μs, and 8 hnsecs Counting Sort : 21 ms, 563 μs, and 9 hnsecs Ali
Jan 05 2016
On Tuesday, 5 January 2016 at 19:42:42 UTC, Ali Çehreli wrote:You are traversing the array per index value (1000 times), which can be done once up front:Good point. I guess I assumed `map!` was magic.// Now they get their own arrays: auto benchmarks = benchmark!(() => algorithmSort(arr.dup), () => countingSort(arr.dup))(1);I'm not sure what this explicitly is doing... Are you defining an anonymous function? If so, that seems really useful.uint[valueCount] hist; arr.each!(a => ++hist[a]); auto result = hist[].enumerate.map!(t => t[0].repeat(t[1])).joiner; // To make sure that we consume the lazy range: return result.sum; } I think .joiner is what you're looking for.Thanks a lot for this. Coming from a webdev background I'm ashamed I didn't guess `join`. That said, I don't think the example for `chain` should compile then. Just as an aside, I was searching the terms `concat` and `flatten ` when I was looking for this.Awesome3. It's kind of hard to compare performance because of (1),but is therea better way to do this?I changed the code to pass each function a different array. My results with -O -inline -noboundscheck: Algorithm's Sort: 76 ms, 910 μs, and 8 hnsecs Counting Sort : 21 ms, 563 μs, and 9 hnsecs Ali
Jan 05 2016
On 01/05/2016 01:48 PM, Charles Smith wrote:On Tuesday, 5 January 2016 at 19:42:42 UTC, Ali Çehreli wrote:Yes. Since benchmark() expects no-parameter functions, that's a useful way of testing functions that take parameters. However, I should have created the arrays before calling benchmark(). Otherwise, the timings include .dup as well: auto asArr = arr.dup; auto csArr = arr.dup; auto benchmarks = benchmark!(() => algorithmSort(asArr), () => countingSort(csArr))(10);// Now they get their own arrays: auto benchmarks = benchmark!(() => algorithmSort(arr.dup), () => countingSort(arr.dup))(1);I'm not sure what this explicitly is doing... Are you defining an anonymous function? If so, that seems really useful.Yep, I always start searching for 'flatten' and then remember joiner. :pI think .joiner is what you're looking for.I was searching the terms `concat` and `flatten ` when I was looking for this.That said, I don't think the example for `chain` should compile then.That worked because chain() received just a single range (of ranges), in which case it had no effect.My results with -O -inline -noboundscheck: Algorithm's Sort: 76 ms, 910 μs, and 8 hnsecs Counting Sort : 21 ms, 563 μs, and 9 hnsecsAwesomeI am amazed! :) countingSort() beats algorithmSort() almost always. :) Here is the final version of the program with 10 million elements and 4 million values (array that are so large cannot be allocated on the stack for me; so, I used new): enum elementCount = 10_000_000; enum valueCount = 4_000_000; void main() { import std.conv : to; import std.datetime; import std.random; import std.stdio; auto arr = new int[elementCount]; for(int i; i < arr.length; ++i) arr[i] = uniform(0, valueCount); auto asArr = arr.dup; auto csArr = arr.dup; // Now they get their own arrays: auto benchmarks = benchmark!(() => algorithmSort(asArr), () => countingSort(csArr))(10); writeln("Algorithm's Sort: ", to!Duration(benchmarks[0])); writeln("Counting Sort : ", to!Duration(benchmarks[1])); } auto algorithmSort(int[] arr) { import std.algorithm : sort, sum; arr.sort(); return arr.sum; } auto countingSort(int[] arr) { import std.algorithm : count, map, joiner, sum, each; import std.range : array, repeat, enumerate; auto hist = new uint[valueCount]; arr.each!(a => ++hist[a]); auto result = hist[].enumerate.map!(t => t[0].repeat(t[1])).joiner; return result.sum; } Now they are comparable: Algorithm's Sort: 3 secs, 881 ms, 146 μs, and 1 hnsec Counting Sort : 3 secs, 990 ms, 315 μs, and 4 hnsecs Ali
Jan 05 2016
On Tuesday, 5 January 2016 at 22:34:52 UTC, Ali Çehreli wrote:Yeah, I've been reading an algorithm book, and this was in there. It has O(elementCount + valueCount) time complexity under a correct implementation, so it is fast. Also its best case is quick sort's worst case (lots of duplicated data), at which point it has O(elementCount) time complexity. I think D's sort uses timsort, so I'd expect it to be no worse.Algorithm's Sort: 76 ms, 910 μs, and 8 hnsecs Counting Sort : 21 ms, 563 μs, and 9 hnsecsAwesomeI am amazed! :) countingSort() beats algorithmSort() almost always. :)
Jan 05 2016