www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - sort, .array and folding on immutable data (finding most common

reply Ali <something something.com> writes:
Hey, so I have this data file that has a list of a string of 
characters separated by new lines. The task is to find the most 
common letter in each column. Ie if file is:

abc
axy
cxc

Then the letters are a (column 1), x and c.

I've written the code to do this at compile time. But I have a 
few questions about sorting, immutablity casting and the need to 
use .array. The code follows:

==================================
import std.stdio, std.range, std.algorithm;

// Line 1
static immutable data = 
import("data_06.txt").split("\n").transposed.map!"a.array.sort().group.array".array;

// Line 2
alias T = typeof(data[0][0]);

auto redux(T[] charData) {
     return charData.fold!((a, b) {
         return a[1] > b[1] ? a : typeof(a)(b[0], b[1]);
     })(T.init);
}

// Line 3
static immutable word = data.map!redux.array.map!"a[0]".array;

void main() {
     word.writeln;
}
==================================

Well mostly I'm looking for how to simplify this even further. So 
any hints there would be awesome.

And, there're a few questions I have about the code (I guess 
mainly because of my of my lack of understanding of the type 
system)

1. The first line with the splitting, I need to use .array three 
times. The last one I understand is because on "line 2" I alias T 
as the type of the data, and if I don't call .array then it's a 
MapResult type which has no opIndex. Yes?

2. On "line 1" I have to call .array.sort(). Why can't I just 
call .sort() on the transposed rows?

3. If I call .sort (without parenthesis) I get a compiler error. 
What up with that?

4. On "line 3" I call map with the free function "redux". In this 
function, if I return just "a", it's all good. If I return "b", 
then I get "Incompatible function/seed/element: 
__lambda2/Tuple!(dchar, uint)/immutable(Tuple!(dchar, uint))". So 
my seed is not immutable, but the elements of "data" are. I can 
understand that. But how do I work around it without having to do 
"typeof(a)(b[0], b[1])" ?

5. Is there anyway to get rid of the the "alias" I have in the 
code and just use an inline lambda in my map call on "line 3"?
Dec 18 2016
parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Sunday, 18 December 2016 at 22:26:50 UTC, Ali wrote:
 Hey, so I have this data file that has a list of a string of 
 characters separated by new lines. The task is to find the most 
 common letter in each column. Ie if file is:

 abc
 axy
 cxc

 Then the letters are a (column 1), x and c.

 I've written the code to do this at compile time. But I have a 
 few questions about sorting, immutablity casting and the need 
 to use .array. The code follows:

 ==================================
 import std.stdio, std.range, std.algorithm;

 // Line 1
 static immutable data = 
 import("data_06.txt").split("\n").transposed.map!"a.array.sort().group.array".array;

 // Line 2
 alias T = typeof(data[0][0]);

 auto redux(T[] charData) {
     return charData.fold!((a, b) {
         return a[1] > b[1] ? a : typeof(a)(b[0], b[1]);
     })(T.init);
 }

 // Line 3
 static immutable word = data.map!redux.array.map!"a[0]".array;

 void main() {
     word.writeln;
 }
 ==================================

 Well mostly I'm looking for how to simplify this even further. 
 So any hints there would be awesome.

 And, there're a few questions I have about the code (I guess 
 mainly because of my of my lack of understanding of the type 
 system)

 1. The first line with the splitting, I need to use .array 
 three times. The last one I understand is because on "line 2" I 
 alias T as the type of the data, and if I don't call .array 
 then it's a MapResult type which has no opIndex. Yes?

 2. On "line 1" I have to call .array.sort(). Why can't I just 
 call .sort() on the transposed rows?
Because sort requires a random access range.
 3. If I call .sort (without parenthesis) I get a compiler 
 error. What up with that?
Unfortunately arrays have a builtin property sort that for some reason takes precedence of std.algorithm.sort when called without (). The compiler error is probably because .sort is a mutating operation and you're working with immutable data.
 4. On "line 3" I call map with the free function "redux". In 
 this function, if I return just "a", it's all good. If I return 
 "b", then I get "Incompatible function/seed/element: 
 __lambda2/Tuple!(dchar, uint)/immutable(Tuple!(dchar, uint))". 
 So my seed is not immutable, but the elements of "data" are. I 
 can understand that. But how do I work around it without having 
 to do "typeof(a)(b[0], b[1])" ?
Try })(ImmutableOf!T.init); or use the seedless version of fold. may need to import std.traits(?)
 5. Is there anyway to get rid of the the "alias" I have in the 
 code and just use an inline lambda in my map call on "line 3"?
does `data.map!fold!"a[1] > b[1] ? a : b".map!"a[0]".array;` work?
Dec 18 2016
parent reply Ali <something something.com> writes:
On Monday, 19 December 2016 at 00:11:49 UTC, Nicholas Wilson 
wrote:
 On Sunday, 18 December 2016 at 22:26:50 UTC, Ali wrote:
 1. The first line with the splitting, I need to use .array 
 three times. The last one I understand is because on "line 2" 
 I alias T as the type of the data, and if I don't call .array 
 then it's a MapResult type which has no opIndex. Yes?

 2. On "line 1" I have to call .array.sort(). Why can't I just 
 call .sort() on the transposed rows?
Because sort requires a random access range.
Ok so laziness stops as soon as sort is required on a range then? Ahh, because in place algorithms? Are there any plans in D to make is to that you can output copies to collections so that you could do something like filter.transpose.sort and just have it output a modified copy?
 Unfortunately arrays have a builtin property sort that for some 
 reason takes precedence of std.algorithm.sort when called 
 without (). The compiler error is probably because .sort is a 
 mutating operation and you're working with immutable data.
Ah ok. I think it's just that array.sort doesn't have CTFE capabilities then: "Error: _adSort cannot be interpreted at compile time, because it has no available source code"
 Try
      })(ImmutableOf!T.init);
 or use the seedless version of fold.
 may need to import std.traits(?)
Same error message :(
 does `data.map!fold!"a[1] > b[1] ? a : b".map!"a[0]".array;` 
 work?
Actually I guess what's stopping this is my problem above maybe? Since I can't jut return b because of that error. I can't use the seedless version of fold either because Another version of the problem requires the seed be T.init(dchar.max, uint.max). Thanks for input so far!
Dec 19 2016
next sibling parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Monday, 19 December 2016 at 09:24:38 UTC, Ali wrote:
 On Monday, 19 December 2016 at 00:11:49 UTC, Nicholas Wilson 
 wrote:
 [...]
Ok so laziness stops as soon as sort is required on a range then?
No. Because a lazy range is not random access, and therefore does not meet sorts requirement.
 Ahh, because in place algorithms?
Yes
Are there any plans in
 D to make is to that you can output copies to collections so 
 that you could do something like filter.transpose.sort and just 
 have it output a modified copy?
None that I know.
 [...]
Hmm, for the other problem you could do chain(only(T(dchar.max, uint.max)),range) and still use the seedless version.
 Thanks for input so far!
Dec 19 2016
parent reply Ali <something something.com> writes:
On Monday, 19 December 2016 at 10:03:34 UTC, Nicholas Wilson 
wrote:
 On Monday, 19 December 2016 at 09:24:38 UTC, Ali wrote:
 On Monday, 19 December 2016 at 00:11:49 UTC, Nicholas Wilson 
 wrote:
 [...]
Ok so laziness stops as soon as sort is required on a range then?
No. Because a lazy range is not random access, and therefore does not meet sorts requirement.
 Ahh, because in place algorithms?
Yes
Are there any plans in
 D to make is to that you can output copies to collections so 
 that you could do something like filter.transpose.sort and 
 just have it output a modified copy?
None that I know.
 [...]
Hmm, for the other problem you could do chain(only(T(dchar.max, uint.max)),range) and still use the seedless version.
 Thanks for input so far!
The seedless version without the typeof(a)(b[0], b[1]) hack (with default inited T) seems to crap out with: "Unable to deduce an acceptable seed type for __lambda2 with element type immutable(Tuple!(dchar, uint))." BTW: Your chain fix does indeed make the T(dchar.init, uint.init) seeded fold work on a seedless version of fold :D. Only problem is the need for the typeof hack again though. Some more info: The following alternate version of this program works great. // Swapping the tuple elements that are produced by "group" around is necessary for reduce!max to work. auto data = import("data_06.txt").split("\n").transposed.map!`a.array.sort().gr up.map!"tuple(a[1], a[0])".array`.array; void main() { auto word = data.map!(reduce!max).array.map!"a[1]".array; word.writeln; } But this stops working as soon as you take the "word" variable outside the scope of main and in to global scope. If you make everything static immutable then you get: "Error: static assert "Unable to deduce an acceptable seed type for std.algorithm.comparison.max with element type immutable(Tuple!(uint, dchar))."
Dec 19 2016
parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Monday, 19 December 2016 at 11:42:55 UTC, Ali wrote:
 On Monday, 19 December 2016 at 10:03:34 UTC, Nicholas Wilson 
 wrote:
 [...]
The seedless version without the typeof(a)(b[0], b[1]) hack (with default inited T) seems to crap out with: [...]
Ah, oh well. It was nice in theory.
 [...]
auto word = data.map!(reduce!max).array.map!"a[1]".array; you want auto word = data.map!"a[1]".map!(reduce!max).array; which will try to take the max of a single variable, not a tuple, which should succeed.
Dec 19 2016
parent Ali <something something.com> writes:
On Monday, 19 December 2016 at 12:45:48 UTC, Nicholas Wilson 
wrote:
 Ah, oh well. It was nice in theory.
Indeed. Thank you for trying Nicholas :)
 auto word = data.map!(reduce!max).array.map!"a[1]".array;

 you want

 auto word = data.map!"a[1]".map!(reduce!max).array;
Problem max has to performed on the frequency part of the tuple and "word" has to result in a concatenation of all the character parts of the highest frequencied letters. So that up there will result in "word" = 364782 // or something.
Dec 19 2016
prev sibling parent reply pineapple <meapineapple gmail.com> writes:
On Monday, 19 December 2016 at 09:24:38 UTC, Ali wrote:
 Ok so laziness stops as soon as sort is required on a range 
 then? Ahh, because in place algorithms? Are there any plans in 
 D to make is to that you can output copies to collections so 
 that you could do something like filter.transpose.sort and just 
 have it output a modified copy?
This is a shortcoming of Phobos - here is a package of sorting algorithms including some that do not require their inputs to be mutable, random access, and/or finite: https://github.com/pineapplemachine/mach.d/tree/master/mach/range/sort The library is permissively licensed; feel free to take out whatever you need. It's worth noting that giving up eagerness, random access, etc. often comes with a speed penalty. It may be more efficient just to copy the lazy things into memory first and then sort in-place, as you have been doing.
Dec 19 2016
parent Ali <something something.com> writes:
On Monday, 19 December 2016 at 14:09:47 UTC, pineapple wrote:
 This is a shortcoming of Phobos - here is a package of sorting 
 algorithms including some that do not require their inputs to 
 be mutable, random access, and/or finite:

 https://github.com/pineapplemachine/mach.d/tree/master/mach/range/sort
Oh nice! Will take a looksies
 It's worth noting that giving up eagerness, random access, etc. 
 often comes with a speed penalty. It may be more efficient just 
 to copy the lazy things into memory first and then sort 
 in-place, as you have been doing.
True dat!
Dec 19 2016