digitalmars.D.learn - sort, .array and folding on immutable data (finding most common

• Ali (50/50) Dec 18 2016 Hey, so I have this data file that has a list of a string of
• Nicholas Wilson (11/61) Dec 18 2016 Unfortunately arrays have a builtin property sort that for some
• Ali (16/37) Dec 19 2016 Ok so laziness stops as soon as sort is required on a range then?
• Nicholas Wilson (8/20) Dec 19 2016 No. Because a lazy range is not random access, and therefore does
• Ali (25/47) Dec 19 2016 The seedless version without the typeof(a)(b[0], b[1]) hack (with
• Nicholas Wilson (7/14) Dec 19 2016 auto word = data.map!(reduce!max).array.map!"a[1]".array;
• Ali (7/11) Dec 19 2016 Indeed. Thank you for trying Nicholas :)
• pineapple (11/16) Dec 19 2016 This is a shortcoming of Phobos - here is a package of sorting
• Ali (3/11) Dec 19 2016 True dat!
Ali <something something.com> writes:
```Hey, so I have this data file that has a list of a string of
characters separated by new lines. The task is to find the most
common letter in each column. Ie if file is:

abc
axy
cxc

Then the letters are a (column 1), x and c.

I've written the code to do this at compile time. But I have a
few questions about sorting, immutablity casting and the need to
use .array. The code follows:

==================================
import std.stdio, std.range, std.algorithm;

// Line 1
static immutable data =
import("data_06.txt").split("\n").transposed.map!"a.array.sort().group.array".array;

// Line 2
alias T = typeof(data[0][0]);

auto redux(T[] charData) {
return charData.fold!((a, b) {
return a[1] > b[1] ? a : typeof(a)(b[0], b[1]);
})(T.init);
}

// Line 3
static immutable word = data.map!redux.array.map!"a[0]".array;

void main() {
word.writeln;
}
==================================

Well mostly I'm looking for how to simplify this even further. So
any hints there would be awesome.

And, there're a few questions I have about the code (I guess
mainly because of my of my lack of understanding of the type
system)

1. The first line with the splitting, I need to use .array three
times. The last one I understand is because on "line 2" I alias T
as the type of the data, and if I don't call .array then it's a
MapResult type which has no opIndex. Yes?

2. On "line 1" I have to call .array.sort(). Why can't I just
call .sort() on the transposed rows?

3. If I call .sort (without parenthesis) I get a compiler error.
What up with that?

4. On "line 3" I call map with the free function "redux". In this
function, if I return just "a", it's all good. If I return "b",
then I get "Incompatible function/seed/element:
__lambda2/Tuple!(dchar, uint)/immutable(Tuple!(dchar, uint))". So
my seed is not immutable, but the elements of "data" are. I can
understand that. But how do I work around it without having to do
"typeof(a)(b[0], b[1])" ?

5. Is there anyway to get rid of the the "alias" I have in the
code and just use an inline lambda in my map call on "line 3"?
```
Dec 18 2016
Nicholas Wilson <iamthewilsonator hotmail.com> writes:
```On Sunday, 18 December 2016 at 22:26:50 UTC, Ali wrote:
Hey, so I have this data file that has a list of a string of
characters separated by new lines. The task is to find the most
common letter in each column. Ie if file is:

abc
axy
cxc

Then the letters are a (column 1), x and c.

I've written the code to do this at compile time. But I have a
few questions about sorting, immutablity casting and the need
to use .array. The code follows:

==================================
import std.stdio, std.range, std.algorithm;

// Line 1
static immutable data =
import("data_06.txt").split("\n").transposed.map!"a.array.sort().group.array".array;

// Line 2
alias T = typeof(data[0][0]);

auto redux(T[] charData) {
return charData.fold!((a, b) {
return a[1] > b[1] ? a : typeof(a)(b[0], b[1]);
})(T.init);
}

// Line 3
static immutable word = data.map!redux.array.map!"a[0]".array;

void main() {
word.writeln;
}
==================================

Well mostly I'm looking for how to simplify this even further.
So any hints there would be awesome.

And, there're a few questions I have about the code (I guess
mainly because of my of my lack of understanding of the type
system)

1. The first line with the splitting, I need to use .array
three times. The last one I understand is because on "line 2" I
alias T as the type of the data, and if I don't call .array
then it's a MapResult type which has no opIndex. Yes?

2. On "line 1" I have to call .array.sort(). Why can't I just
call .sort() on the transposed rows?

Because sort requires a random access range.

3. If I call .sort (without parenthesis) I get a compiler
error. What up with that?

Unfortunately arrays have a builtin property sort that for some
reason takes precedence of std.algorithm.sort when called without
(). The compiler error is probably because .sort is a mutating
operation and you're working with immutable data.

4. On "line 3" I call map with the free function "redux". In
this function, if I return just "a", it's all good. If I return
"b", then I get "Incompatible function/seed/element:
__lambda2/Tuple!(dchar, uint)/immutable(Tuple!(dchar, uint))".
So my seed is not immutable, but the elements of "data" are. I
can understand that. But how do I work around it without having
to do "typeof(a)(b[0], b[1])" ?

Try
})(ImmutableOf!T.init);
or use the seedless version of fold.
may need to import std.traits(?)
5. Is there anyway to get rid of the the "alias" I have in the
code and just use an inline lambda in my map call on "line 3"?

does `data.map!fold!"a[1] > b[1] ? a : b".map!"a[0]".array;` work?
```
Dec 18 2016
Ali <something something.com> writes:
```On Monday, 19 December 2016 at 00:11:49 UTC, Nicholas Wilson
wrote:
On Sunday, 18 December 2016 at 22:26:50 UTC, Ali wrote:
1. The first line with the splitting, I need to use .array
three times. The last one I understand is because on "line 2"
I alias T as the type of the data, and if I don't call .array
then it's a MapResult type which has no opIndex. Yes?

2. On "line 1" I have to call .array.sort(). Why can't I just
call .sort() on the transposed rows?

Because sort requires a random access range.

Ok so laziness stops as soon as sort is required on a range then?
Ahh, because in place algorithms? Are there any plans in D to
make is to that you can output copies to collections so that you
could do something like filter.transpose.sort and just have it
output a modified copy?

Unfortunately arrays have a builtin property sort that for some
reason takes precedence of std.algorithm.sort when called
without (). The compiler error is probably because .sort is a
mutating operation and you're working with immutable data.

Ah ok. I think it's just that array.sort doesn't have CTFE
capabilities then: "Error: _adSort cannot be interpreted at
compile time, because it has no available source code"

Try
})(ImmutableOf!T.init);
or use the seedless version of fold.
may need to import std.traits(?)

Same error message :(

does `data.map!fold!"a[1] > b[1] ? a : b".map!"a[0]".array;`
work?

Actually I guess what's stopping this is my problem above maybe?
Since I can't jut return b because of that error. I can't use the
seedless version of fold either because Another version of the
problem requires the seed be T.init(dchar.max, uint.max).

Thanks for input so far!
```
Dec 19 2016
Nicholas Wilson <iamthewilsonator hotmail.com> writes:
```On Monday, 19 December 2016 at 09:24:38 UTC, Ali wrote:
On Monday, 19 December 2016 at 00:11:49 UTC, Nicholas Wilson
wrote:
[...]

Ok so laziness stops as soon as sort is required on a range
then?

No. Because a lazy range is not random access, and therefore does
not meet sorts requirement.

Ahh, because in place algorithms?

Yes

Are there any plans in
D to make is to that you can output copies to collections so
that you could do something like filter.transpose.sort and just
have it output a modified copy?

None that I know.

[...]

Hmm, for the other problem you could do

chain(only(T(dchar.max, uint.max)),range)

and still use the seedless version.

Thanks for input so far!

```
Dec 19 2016
Ali <something something.com> writes:
```On Monday, 19 December 2016 at 10:03:34 UTC, Nicholas Wilson
wrote:
On Monday, 19 December 2016 at 09:24:38 UTC, Ali wrote:
On Monday, 19 December 2016 at 00:11:49 UTC, Nicholas Wilson
wrote:
[...]

Ok so laziness stops as soon as sort is required on a range
then?

No. Because a lazy range is not random access, and therefore
does not meet sorts requirement.

Ahh, because in place algorithms?

Yes

Are there any plans in
D to make is to that you can output copies to collections so
that you could do something like filter.transpose.sort and
just have it output a modified copy?

None that I know.

[...]

Hmm, for the other problem you could do

chain(only(T(dchar.max, uint.max)),range)

and still use the seedless version.

Thanks for input so far!

The seedless version without the typeof(a)(b[0], b[1]) hack (with
default inited T) seems to crap out with:

"Unable to deduce an acceptable seed type for __lambda2 with
element type immutable(Tuple!(dchar, uint))."

BTW: Your chain fix does indeed make the T(dchar.init, uint.init)
seeded fold work on a seedless version of fold :D. Only problem
is the need for the typeof hack again though.

works great.

// Swapping the tuple elements that are produced by "group"
around is necessary for reduce!max to work.
auto data =
import("data_06.txt").split("\n").transposed.map!`a.array.sort().gr
up.map!"tuple(a[1], a[0])".array`.array;
void main() {
auto word = data.map!(reduce!max).array.map!"a[1]".array;
word.writeln;
}

But this stops working as soon as you take the "word" variable
outside the scope of main and in to global scope. If you make
everything static immutable then you get:

"Error: static assert  "Unable to deduce an acceptable seed type
for std.algorithm.comparison.max with element type
immutable(Tuple!(uint, dchar))."
```
Dec 19 2016
Nicholas Wilson <iamthewilsonator hotmail.com> writes:
```On Monday, 19 December 2016 at 11:42:55 UTC, Ali wrote:
On Monday, 19 December 2016 at 10:03:34 UTC, Nicholas Wilson
wrote:
[...]

The seedless version without the typeof(a)(b[0], b[1]) hack
(with default inited T) seems to crap out with:

[...]

Ah, oh well. It was nice in theory.

[...]

auto word = data.map!(reduce!max).array.map!"a[1]".array;

you want

auto word = data.map!"a[1]".map!(reduce!max).array;

which will try to take the max of a single variable, not a tuple,
which should succeed.
```
Dec 19 2016
Ali <something something.com> writes:
```On Monday, 19 December 2016 at 12:45:48 UTC, Nicholas Wilson
wrote:
Ah, oh well. It was nice in theory.

Indeed. Thank you for trying Nicholas :)

auto word = data.map!(reduce!max).array.map!"a[1]".array;

you want

auto word = data.map!"a[1]".map!(reduce!max).array;

Problem max has to performed on the frequency part of the tuple
and "word" has to result in a concatenation of all the character
parts of the highest frequencied letters. So that up there will
result in "word" = 364782 // or something.
```
Dec 19 2016
pineapple <meapineapple gmail.com> writes:
```On Monday, 19 December 2016 at 09:24:38 UTC, Ali wrote:
Ok so laziness stops as soon as sort is required on a range
then? Ahh, because in place algorithms? Are there any plans in
D to make is to that you can output copies to collections so
that you could do something like filter.transpose.sort and just
have it output a modified copy?

This is a shortcoming of Phobos - here is a package of sorting
algorithms including some that do not require their inputs to be
mutable, random access, and/or finite:

https://github.com/pineapplemachine/mach.d/tree/master/mach/range/sort

The library is permissively licensed; feel free to take out
whatever you need.

It's worth noting that giving up eagerness, random access, etc.
often comes with a speed penalty. It may be more efficient just
to copy the lazy things into memory first and then sort in-place,
as you have been doing.
```
Dec 19 2016
Ali <something something.com> writes:
```On Monday, 19 December 2016 at 14:09:47 UTC, pineapple wrote:
This is a shortcoming of Phobos - here is a package of sorting
algorithms including some that do not require their inputs to
be mutable, random access, and/or finite:

https://github.com/pineapplemachine/mach.d/tree/master/mach/range/sort

Oh nice! Will take a looksies

It's worth noting that giving up eagerness, random access, etc.
often comes with a speed penalty. It may be more efficient just
to copy the lazy things into memory first and then sort
in-place, as you have been doing.

True dat!
```
Dec 19 2016