digitalmars.D.learn - sort, .array and folding on immutable data (finding most common
- Ali (50/50) Dec 18 2016 Hey, so I have this data file that has a list of a string of
- Nicholas Wilson (11/61) Dec 18 2016 Unfortunately arrays have a builtin property sort that for some
- Ali (16/37) Dec 19 2016 Ok so laziness stops as soon as sort is required on a range then?
- Nicholas Wilson (8/20) Dec 19 2016 No. Because a lazy range is not random access, and therefore does
- Ali (25/47) Dec 19 2016 The seedless version without the typeof(a)(b[0], b[1]) hack (with
- Nicholas Wilson (7/14) Dec 19 2016 auto word = data.map!(reduce!max).array.map!"a[1]".array;
- Ali (7/11) Dec 19 2016 Indeed. Thank you for trying Nicholas :)
- pineapple (11/16) Dec 19 2016 This is a shortcoming of Phobos - here is a package of sorting
- Ali (3/11) Dec 19 2016 True dat!
Hey, so I have this data file that has a list of a string of characters separated by new lines. The task is to find the most common letter in each column. Ie if file is: abc axy cxc Then the letters are a (column 1), x and c. I've written the code to do this at compile time. But I have a few questions about sorting, immutablity casting and the need to use .array. The code follows: ================================== import std.stdio, std.range, std.algorithm; // Line 1 static immutable data = import("data_06.txt").split("\n").transposed.map!"a.array.sort().group.array".array; // Line 2 alias T = typeof(data[0][0]); auto redux(T[] charData) { return charData.fold!((a, b) { return a[1] > b[1] ? a : typeof(a)(b[0], b[1]); })(T.init); } // Line 3 static immutable word = data.map!redux.array.map!"a[0]".array; void main() { word.writeln; } ================================== Well mostly I'm looking for how to simplify this even further. So any hints there would be awesome. And, there're a few questions I have about the code (I guess mainly because of my of my lack of understanding of the type system) 1. The first line with the splitting, I need to use .array three times. The last one I understand is because on "line 2" I alias T as the type of the data, and if I don't call .array then it's a MapResult type which has no opIndex. Yes? 2. On "line 1" I have to call .array.sort(). Why can't I just call .sort() on the transposed rows? 3. If I call .sort (without parenthesis) I get a compiler error. What up with that? 4. On "line 3" I call map with the free function "redux". In this function, if I return just "a", it's all good. If I return "b", then I get "Incompatible function/seed/element: __lambda2/Tuple!(dchar, uint)/immutable(Tuple!(dchar, uint))". So my seed is not immutable, but the elements of "data" are. I can understand that. But how do I work around it without having to do "typeof(a)(b[0], b[1])" ? 5. Is there anyway to get rid of the the "alias" I have in the code and just use an inline lambda in my map call on "line 3"?
Dec 18 2016
On Sunday, 18 December 2016 at 22:26:50 UTC, Ali wrote:Hey, so I have this data file that has a list of a string of characters separated by new lines. The task is to find the most common letter in each column. Ie if file is: abc axy cxc Then the letters are a (column 1), x and c. I've written the code to do this at compile time. But I have a few questions about sorting, immutablity casting and the need to use .array. The code follows: ================================== import std.stdio, std.range, std.algorithm; // Line 1 static immutable data = import("data_06.txt").split("\n").transposed.map!"a.array.sort().group.array".array; // Line 2 alias T = typeof(data[0][0]); auto redux(T[] charData) { return charData.fold!((a, b) { return a[1] > b[1] ? a : typeof(a)(b[0], b[1]); })(T.init); } // Line 3 static immutable word = data.map!redux.array.map!"a[0]".array; void main() { word.writeln; } ================================== Well mostly I'm looking for how to simplify this even further. So any hints there would be awesome. And, there're a few questions I have about the code (I guess mainly because of my of my lack of understanding of the type system) 1. The first line with the splitting, I need to use .array three times. The last one I understand is because on "line 2" I alias T as the type of the data, and if I don't call .array then it's a MapResult type which has no opIndex. Yes? 2. On "line 1" I have to call .array.sort(). Why can't I just call .sort() on the transposed rows?Because sort requires a random access range.3. If I call .sort (without parenthesis) I get a compiler error. What up with that?Unfortunately arrays have a builtin property sort that for some reason takes precedence of std.algorithm.sort when called without (). The compiler error is probably because .sort is a mutating operation and you're working with immutable data.4. On "line 3" I call map with the free function "redux". In this function, if I return just "a", it's all good. If I return "b", then I get "Incompatible function/seed/element: __lambda2/Tuple!(dchar, uint)/immutable(Tuple!(dchar, uint))". So my seed is not immutable, but the elements of "data" are. I can understand that. But how do I work around it without having to do "typeof(a)(b[0], b[1])" ?Try })(ImmutableOf!T.init); or use the seedless version of fold. may need to import std.traits(?)5. Is there anyway to get rid of the the "alias" I have in the code and just use an inline lambda in my map call on "line 3"?does `data.map!fold!"a[1] > b[1] ? a : b".map!"a[0]".array;` work?
Dec 18 2016
On Monday, 19 December 2016 at 00:11:49 UTC, Nicholas Wilson wrote:On Sunday, 18 December 2016 at 22:26:50 UTC, Ali wrote:Ok so laziness stops as soon as sort is required on a range then? Ahh, because in place algorithms? Are there any plans in D to make is to that you can output copies to collections so that you could do something like filter.transpose.sort and just have it output a modified copy?1. The first line with the splitting, I need to use .array three times. The last one I understand is because on "line 2" I alias T as the type of the data, and if I don't call .array then it's a MapResult type which has no opIndex. Yes? 2. On "line 1" I have to call .array.sort(). Why can't I just call .sort() on the transposed rows?Because sort requires a random access range.Unfortunately arrays have a builtin property sort that for some reason takes precedence of std.algorithm.sort when called without (). The compiler error is probably because .sort is a mutating operation and you're working with immutable data.Ah ok. I think it's just that array.sort doesn't have CTFE capabilities then: "Error: _adSort cannot be interpreted at compile time, because it has no available source code"Try })(ImmutableOf!T.init); or use the seedless version of fold. may need to import std.traits(?)Same error message :(does `data.map!fold!"a[1] > b[1] ? a : b".map!"a[0]".array;` work?Actually I guess what's stopping this is my problem above maybe? Since I can't jut return b because of that error. I can't use the seedless version of fold either because Another version of the problem requires the seed be T.init(dchar.max, uint.max). Thanks for input so far!
Dec 19 2016
On Monday, 19 December 2016 at 09:24:38 UTC, Ali wrote:On Monday, 19 December 2016 at 00:11:49 UTC, Nicholas Wilson wrote:No. Because a lazy range is not random access, and therefore does not meet sorts requirement.[...]Ok so laziness stops as soon as sort is required on a range then?Ahh, because in place algorithms?YesAre there any plans in D to make is to that you can output copies to collections so that you could do something like filter.transpose.sort and just have it output a modified copy?None that I know.[...]Hmm, for the other problem you could do chain(only(T(dchar.max, uint.max)),range) and still use the seedless version.Thanks for input so far!
Dec 19 2016
On Monday, 19 December 2016 at 10:03:34 UTC, Nicholas Wilson wrote:On Monday, 19 December 2016 at 09:24:38 UTC, Ali wrote:The seedless version without the typeof(a)(b[0], b[1]) hack (with default inited T) seems to crap out with: "Unable to deduce an acceptable seed type for __lambda2 with element type immutable(Tuple!(dchar, uint))." BTW: Your chain fix does indeed make the T(dchar.init, uint.init) seeded fold work on a seedless version of fold :D. Only problem is the need for the typeof hack again though. Some more info: The following alternate version of this program works great. // Swapping the tuple elements that are produced by "group" around is necessary for reduce!max to work. auto data = import("data_06.txt").split("\n").transposed.map!`a.array.sort().gr up.map!"tuple(a[1], a[0])".array`.array; void main() { auto word = data.map!(reduce!max).array.map!"a[1]".array; word.writeln; } But this stops working as soon as you take the "word" variable outside the scope of main and in to global scope. If you make everything static immutable then you get: "Error: static assert "Unable to deduce an acceptable seed type for std.algorithm.comparison.max with element type immutable(Tuple!(uint, dchar))."On Monday, 19 December 2016 at 00:11:49 UTC, Nicholas Wilson wrote:No. Because a lazy range is not random access, and therefore does not meet sorts requirement.[...]Ok so laziness stops as soon as sort is required on a range then?Ahh, because in place algorithms?YesAre there any plans in D to make is to that you can output copies to collections so that you could do something like filter.transpose.sort and just have it output a modified copy?None that I know.[...]Hmm, for the other problem you could do chain(only(T(dchar.max, uint.max)),range) and still use the seedless version.Thanks for input so far!
Dec 19 2016
On Monday, 19 December 2016 at 11:42:55 UTC, Ali wrote:On Monday, 19 December 2016 at 10:03:34 UTC, Nicholas Wilson wrote:Ah, oh well. It was nice in theory.The seedless version without the typeof(a)(b[0], b[1]) hack (with default inited T) seems to crap out with: [...][...][...]auto word = data.map!(reduce!max).array.map!"a[1]".array; you want auto word = data.map!"a[1]".map!(reduce!max).array; which will try to take the max of a single variable, not a tuple, which should succeed.
Dec 19 2016
On Monday, 19 December 2016 at 12:45:48 UTC, Nicholas Wilson wrote:Ah, oh well. It was nice in theory.Indeed. Thank you for trying Nicholas :)auto word = data.map!(reduce!max).array.map!"a[1]".array; you want auto word = data.map!"a[1]".map!(reduce!max).array;Problem max has to performed on the frequency part of the tuple and "word" has to result in a concatenation of all the character parts of the highest frequencied letters. So that up there will result in "word" = 364782 // or something.
Dec 19 2016
On Monday, 19 December 2016 at 09:24:38 UTC, Ali wrote:Ok so laziness stops as soon as sort is required on a range then? Ahh, because in place algorithms? Are there any plans in D to make is to that you can output copies to collections so that you could do something like filter.transpose.sort and just have it output a modified copy?This is a shortcoming of Phobos - here is a package of sorting algorithms including some that do not require their inputs to be mutable, random access, and/or finite: https://github.com/pineapplemachine/mach.d/tree/master/mach/range/sort The library is permissively licensed; feel free to take out whatever you need. It's worth noting that giving up eagerness, random access, etc. often comes with a speed penalty. It may be more efficient just to copy the lazy things into memory first and then sort in-place, as you have been doing.
Dec 19 2016
On Monday, 19 December 2016 at 14:09:47 UTC, pineapple wrote:This is a shortcoming of Phobos - here is a package of sorting algorithms including some that do not require their inputs to be mutable, random access, and/or finite: https://github.com/pineapplemachine/mach.d/tree/master/mach/range/sortOh nice! Will take a looksiesIt's worth noting that giving up eagerness, random access, etc. often comes with a speed penalty. It may be more efficient just to copy the lazy things into memory first and then sort in-place, as you have been doing.True dat!
Dec 19 2016