digitalmars.D.learn - Why std.algorithm.sort can't be applied to char[]?

"hane" <han.ehit.suzi.0 gmail.com> writes:
```and is there any way to sort char array with algorithm.sort?
---
import std.algorithm;
import std.range;

void main()
{
int[] arr = [5, 3, 7];
sort(arr); // OK

char[] arr2 = ['z', 'g', 'c'];
sort(arr2); // error
sort!q{ a[0] > b[0] }(zip(arr, arr2)); // error
}
---
I don't know what's difference between int[] and char[] in D, but
it's very unnatural.
```
May 12 2014
"John Colvin" <john.loughran.colvin gmail.com> writes:
```On Monday, 12 May 2014 at 14:49:53 UTC, hane wrote:
and is there any way to sort char array with algorithm.sort?
---
import std.algorithm;
import std.range;

void main()
{
int[] arr = [5, 3, 7];
sort(arr); // OK

char[] arr2 = ['z', 'g', 'c'];
sort(arr2); // error
sort!q{ a[0] > b[0] }(zip(arr, arr2)); // error
}
---
I don't know what's difference between int[] and char[] in D,
but it's very unnatural.

char[] is a rather special type of array: the language has
unicode support and iterates over it by code-point (i.e. not
guaranteed to be a single char per iteration).

If you want to sort chars and are assuming ASCII, you can just
use std.string.representation to work with them as integer types:

import std.algorithm;
import std.range;
import std.string;

void main()
{
int[] arr = [5, 3, 7];
sort(arr); // OK

char[] arr2 = ['z', 'g', 'c'];
sort(arr2.representation); // error
sort!q{ a[0] > b[0] }(zip(arr, arr2.representation)); // error
}
```
May 12 2014
"John Colvin" <john.loughran.colvin gmail.com> writes:
```On Monday, 12 May 2014 at 14:56:46 UTC, John Colvin wrote:
On Monday, 12 May 2014 at 14:49:53 UTC, hane wrote:
and is there any way to sort char array with algorithm.sort?
---
import std.algorithm;
import std.range;

void main()
{
int[] arr = [5, 3, 7];
sort(arr); // OK

char[] arr2 = ['z', 'g', 'c'];
sort(arr2); // error
sort!q{ a[0] > b[0] }(zip(arr, arr2)); // error
}
---
I don't know what's difference between int[] and char[] in D,
but it's very unnatural.

char[] is a rather special type of array: the language has
unicode support and iterates over it by code-point (i.e. not
guaranteed to be a single char per iteration).

If you want to sort chars and are assuming ASCII, you can just
use std.string.representation to work with them as integer
types:

import std.algorithm;
import std.range;
import std.string;

void main()
{
int[] arr = [5, 3, 7];
sort(arr); // OK

char[] arr2 = ['z', 'g', 'c'];
sort(arr2.representation); // error
sort!q{ a[0] > b[0] }(zip(arr, arr2.representation)); // error
}

woops, should have deleted the // error comments
```
May 12 2014
"hane" <han.ehit.suzi.0 gmail.com> writes:
```On Monday, 12 May 2014 at 14:56:46 UTC, John Colvin wrote:
char[] is a rather special type of array: the language has
unicode support and iterates over it by code-point (i.e. not
guaranteed to be a single char per iteration).

If you want to sort chars and are assuming ASCII, you can just
use std.string.representation to work with them as integer
types:

On Monday, 12 May 2014 at 16:30:12 UTC, Jonathan M Davis via
Digitalmars-d-learn wrote:
All strings in D are treated as ranges of dchar, not their
element type. This
has to with the fact that a char or wchar are only part of a
character. If you
want to sort arrays of characters, you need to use dchar[].

I understand the reason. Thanks!
```
May 13 2014
Jonathan M Davis via Digitalmars-d-learn writes:
```On Mon, 12 May 2014 14:49:52 +0000
hane via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:

and is there any way to sort char array with algorithm.sort?
---
import std.algorithm;
import std.range;

void main()
{
int[] arr = [5, 3, 7];
sort(arr); // OK

char[] arr2 = ['z', 'g', 'c'];
sort(arr2); // error
sort!q{ a[0] > b[0] }(zip(arr, arr2)); // error
}
---
I don't know what's difference between int[] and char[] in D, but
it's very unnatural.

All strings in D are treated as ranges of dchar, not their element type. This
has to with the fact that a char or wchar are only part of a character. If you
want to sort arrays of characters, you need to use dchar[].

http://stackoverflow.com/questions/12288465
http://stackoverflow.com/questions/16590650

- Jonathan M Davis
```
May 12 2014
Charles Hixson via Digitalmars-d-learn writes:
```On 05/12/2014 09:29 AM, Jonathan M Davis via Digitalmars-d-learn wrote:
On Mon, 12 May 2014 14:49:52 +0000
hane via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:

and is there any way to sort char array with algorithm.sort?
---
import std.algorithm;
import std.range;

void main()
{
int[] arr = [5, 3, 7];
sort(arr); // OK

char[] arr2 = ['z', 'g', 'c'];
sort(arr2); // error
sort!q{ a[0] > b[0] }(zip(arr, arr2)); // error
}
---
I don't know what's difference between int[] and char[] in D, but
it's very unnatural.

All strings in D are treated as ranges of dchar, not their element type. This
has to with the fact that a char or wchar are only part of a character. If you
want to sort arrays of characters, you need to use dchar[].

http://stackoverflow.com/questions/12288465
http://stackoverflow.com/questions/16590650

- Jonathan M Davis

Given that he was working with pure ASCII, he should be able to cast the
array to byte[] and sort it, but I haven't tried.  Also char[] isn't
string.  Strings are immutable, and thus cannot be sorted in place.  To
me this looks like an error in sort that he should be able to work
around with a cast.

--
Charles Hixson
```
May 12 2014
Jonathan M Davis via Digitalmars-d-learn writes:
```On Mon, 12 May 2014 11:08:47 -0700
Charles Hixson via Digitalmars-d-learn
<digitalmars-d-learn puremagic.com> wrote:

On 05/12/2014 09:29 AM, Jonathan M Davis via Digitalmars-d-learn
wrote:
On Mon, 12 May 2014 14:49:52 +0000
hane via Digitalmars-d-learn <digitalmars-d-learn puremagic.com>
wrote:

and is there any way to sort char array with algorithm.sort?
---
import std.algorithm;
import std.range;

void main()
{
int[] arr = [5, 3, 7];
sort(arr); // OK

char[] arr2 = ['z', 'g', 'c'];
sort(arr2); // error
sort!q{ a[0] > b[0] }(zip(arr, arr2)); // error
}
---
I don't know what's difference between int[] and char[] in D, but
it's very unnatural.

All strings in D are treated as ranges of dchar, not their element
type. This has to with the fact that a char or wchar are only part
of a character. If you want to sort arrays of characters, you need
to use dchar[].

http://stackoverflow.com/questions/12288465
http://stackoverflow.com/questions/16590650

- Jonathan M Davis

Given that he was working with pure ASCII, he should be able to cast
the array to byte[] and sort it, but I haven't tried.

Sure, you can cast char[] to ubyte[] and sort that if you know that the array
only holds pure ASCII. In fact, you can use std.string.representation to do it
- e.g.

auto ascii = str.representation;

and if str were mutable, then you could sort it. But that will only work if
the string only contains ASCII characters. Regardless, he wanted to know why
he couldn't sort char[], and I explained why - all strings are treated as
ranges of dchar, making it so that if their element type is char or wchar, so
they're not random access and thus can't be sorted.

Also char[] isn't string.

Yes, string is aliased to immutable(string)[], but char[] is still a string
type, and what I said applies to all string types. In particular, arrays of
char or wchar are called "narrow strings," because it's not guaranteed that
one of their code units is a full code point, unlike arrays of dchar. But all
arrays of char, wchar, or dchar are strings.

Strings are immutable, and thus cannot be sorted in place.

"string" is immutable and thus couldn't be sorted regardless of whether narrow
strings were treated as ranges of dchar or not, but char[] is a string just as
much as immutable(char)[] is. It's just not string.

- Jonathan M Davis
```
May 12 2014
"monarch_dodra" <monarchdodra gmail.com> writes:
```On Monday, 12 May 2014 at 18:44:22 UTC, Jonathan M Davis via
Digitalmars-d-learn wrote:
Sure, you can cast char[] to ubyte[] and sort that if you know
that the array
only holds pure ASCII. In fact, you can use
std.string.representation to do it
- e.g.

auto ascii = str.representation;

and if str were mutable, then you could sort it. But that will
only work if
the string only contains ASCII characters. Regardless, he
wanted to know why
he couldn't sort char[], and I explained why - all strings are
treated as
ranges of dchar, making it so that if their element type is
char or wchar, so
they're not random access and thus can't be sorted.

Arguably, a smart enough implementation should know how to sort a
"char[]", while still preserving codepoint integrity.

As a matter of fact, the built in "sort" property does it.

void main()
{
char[] s = "éöeèûà".dup;
s.sort;
writeln(s);
}
//prints:
eàèéöû
```
May 14 2014
"bearophile" <bearophileHUGS lycos.com> writes:
```monarch_dodra:

As a matter of fact, the built in "sort" property does it.

It's going to be deprecated "soon".

Bye,
bearophile
```
May 14 2014
"John Colvin" <john.loughran.colvin gmail.com> writes:
```On Wednesday, 14 May 2014 at 08:27:46 UTC, monarch_dodra wrote:
On Monday, 12 May 2014 at 18:44:22 UTC, Jonathan M Davis via
Digitalmars-d-learn wrote:
Sure, you can cast char[] to ubyte[] and sort that if you know
that the array
only holds pure ASCII. In fact, you can use
std.string.representation to do it
- e.g.

auto ascii = str.representation;

and if str were mutable, then you could sort it. But that will
only work if
the string only contains ASCII characters. Regardless, he
wanted to know why
he couldn't sort char[], and I explained why - all strings are
treated as
ranges of dchar, making it so that if their element type is
char or wchar, so
they're not random access and thus can't be sorted.

Arguably, a smart enough implementation should know how to sort
a "char[]", while still preserving codepoint integrity.

As a matter of fact, the built in "sort" property does it.

void main()
{
char[] s = "éöeèûà".dup;
s.sort;
writeln(s);
}
//prints:
eàèéöû

Why would anyone ever want to sort code-points?

They might want to sort graphemes, but that's difficult to do
in-place (needs O(n) memory, I think...). If out-of-place is good
enough

someStr.byGrapheme.array.sort();
```
May 14 2014
"monarch_dodra" <monarchdodra gmail.com> writes:
```On Wednesday, 14 May 2014 at 09:01:23 UTC, John Colvin wrote:
Why would anyone ever want to sort code-points?

Why not? To remove duplicate characters?

They might want to sort graphemes, but that's difficult to do
in-place (needs O(n) memory, I think...). If out-of-place is
good enough

someStr.byGrapheme.array.sort();

The current "status quo" in D is that a dchar basically
represents a character.
```
May 15 2014
Jonathan M Davis via Digitalmars-d-learn writes:
```On Wed, 14 May 2014 08:27:45 +0000
monarch_dodra via Digitalmars-d-learn
<digitalmars-d-learn puremagic.com> wrote:

On Monday, 12 May 2014 at 18:44:22 UTC, Jonathan M Davis via
Digitalmars-d-learn wrote:
Sure, you can cast char[] to ubyte[] and sort that if you know
that the array
only holds pure ASCII. In fact, you can use
std.string.representation to do it
- e.g.

auto ascii = str.representation;

and if str were mutable, then you could sort it. But that will
only work if
the string only contains ASCII characters. Regardless, he
wanted to know why
he couldn't sort char[], and I explained why - all strings are
treated as
ranges of dchar, making it so that if their element type is
char or wchar, so
they're not random access and thus can't be sorted.

Arguably, a smart enough implementation should know how to sort a
"char[]", while still preserving codepoint integrity.

I don't think that that can be done at the same algorithmic complexity though.
So, I don't know if that would be acceptable or not from the standpoint of
std.algorithm.sort. But even if it's a good idea, someone would have to
special case sort for char[], and no one has done that.

As a matter of fact, the built in "sort" property does it.

void main()
{
char[] s = "éöeèûà".dup;
s.sort;
writeln(s);
}
//prints:
eàèéöû

I'm surprised. I thought that one of Bearophile's favorite complaints was that
it didn't sort unicode properly (and hence one of the reasons that it should
be removed). Regardless, I do think that it should be removed.

- Jonathan M Davis
```
May 14 2014
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
```On Wed, 14 May 2014 05:13:42 -0400, Jonathan M Davis via  =

Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:

On Wed, 14 May 2014 08:27:45 +0000
monarch_dodra via Digitalmars-d-learn
<digitalmars-d-learn puremagic.com> wrote:

On Monday, 12 May 2014 at 18:44:22 UTC, Jonathan M Davis via
Digitalmars-d-learn wrote:
Sure, you can cast char[] to ubyte[] and sort that if you know
that the array
only holds pure ASCII. In fact, you can use
std.string.representation to do it
- e.g.

auto ascii =3D str.representation;

and if str were mutable, then you could sort it. But that will
only work if
the string only contains ASCII characters. Regardless, he
wanted to know why
he couldn't sort char[], and I explained why - all strings are
treated as
ranges of dchar, making it so that if their element type is
char or wchar, so
they're not random access and thus can't be sorted.

Arguably, a smart enough implementation should know how to sort a
"char[]", while still preserving codepoint integrity.

I don't think that that can be done at the same algorithmic complexity=

=

though.
So, I don't know if that would be acceptable or not from the standpoin=

t  =

of
std.algorithm.sort. But even if it's a good idea, someone would have t=

o
special case sort for char[], and no one has done that.

I think it can be done at O(nlgn) complexity, but you must allocate a  =

block of scratch space to do the sorting. You can't do it in-place becau=
se  =

swapping isn't available.

Might as well convert to dchar and back explicitly.

Merge sort may allow more promising speedups, but I still think you will=
=

need to allocate extra space.

As a matter of fact, the built in "sort" property does it.

void main()
{
char[] s =3D "=C3=A9=C3=B6e=C3=A8=C3=BB=C3=A0".dup;
s.sort;
writeln(s);
}
//prints:
e=C3=A0=C3=A8=C3=A9=C3=B6=C3=BB

I'm surprised. I thought that one of Bearophile's favorite complaints =

=

was that
it didn't sort unicode properly (and hence one of the reasons that it =

=

should
be removed). Regardless, I do think that it should be removed.

I can't believe this worked. I want to say that it's a freak accident fo=
r  =

that set of characters. Looking in druntime, I don't see where the speci=
al  =

case is.

-Steve
```
May 15 2014
"monarch_dodra" <monarchdodra gmail.com> writes:
```On Thursday, 15 May 2014 at 13:26:45 UTC, Steven Schveighoffer
wrote:
On Wed, 14 May 2014 05:13:42 -0400, Jonathan M Davis via
Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:

On Wed, 14 May 2014 08:27:45 +0000
monarch_dodra via Digitalmars-d-learn
<digitalmars-d-learn puremagic.com> wrote:
As a matter of fact, the built in "sort" property does it.

void main()
{
char[] s = "éöeèûà".dup;
s.sort;
writeln(s);
}
//prints:
eàèéöû

I'm surprised. I thought that one of Bearophile's favorite
complaints was that
it didn't sort unicode properly (and hence one of the reasons
that it should
be removed). Regardless, I do think that it should be removed.

I can't believe this worked. I want to say that it's a freak
accident for that set of characters. Looking in druntime, I
don't see where the special case is.

-Steve

Must be a hell of a freak accident ;)

auto s = "é東öe京ûタèワà".dup;
writeln(s.sort);

=> eàèéöûタワ京東

It's basically: string=>dstring=>sort=>dstring=>string.

BTW, the built in reverse also works with char[], and so does
std.algorithm.reverse (and it does it pretty cleverly too, might
I say).

As far as I'm concerned, if we *can* do it in n.log(n), and
somebody provides the implementation, then there is no reason to
not offer dchar sorting for char[]/wchar.
```
May 15 2014
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
```On Thu, 15 May 2014 11:37:07 -0400, monarch_dodra <monarchdodra gmail.co=
m>  =

wrote:

On Thursday, 15 May 2014 at 13:26:45 UTC, Steven Schveighoffer
wrote:
On Wed, 14 May 2014 05:13:42 -0400, Jonathan M Davis via  =

Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:

On Wed, 14 May 2014 08:27:45 +0000
monarch_dodra via Digitalmars-d-learn
<digitalmars-d-learn puremagic.com> wrote:
As a matter of fact, the built in "sort" property does it.

void main()
{
char[] s =3D "=C3=A9=C3=B6e=C3=A8=C3=BB=C3=A0".dup;
s.sort;
writeln(s);
}
//prints:
e=C3=A0=C3=A8=C3=A9=C3=B6=C3=BB

I'm surprised. I thought that one of Bearophile's favorite complaint=

s  =

was that
it didn't sort unicode properly (and hence one of the reasons that i=

t  =

should
be removed). Regardless, I do think that it should be removed.

I can't believe this worked. I want to say that it's a freak accident=

=

for that set of characters. Looking in druntime, I don't see where th=

e  =

special case is.

-Steve

Must be a hell of a freak accident ;)

auto s =3D "=C3=A9=E6=9D=B1=C3=B6e=E4=BA=AC=C3=BB=E3=82=BF=C3=A8=

=E3=83=AF=C3=A0".dup;
writeln(s.sort);

=3D> e=C3=A0=C3=A8=C3=A9=C3=B6=C3=BB=E3=82=BF=E3=83=AF=E4=BA=AC=E6=9D=B1=

Seriously, I fucking hate the file names in druntime.

I looked in qsort.d.

It's basically: string=3D>dstring=3D>sort=3D>dstring=3D>string.

OK, that's what I would have expected. I don't think we should support  =

this.

BTW, the built in reverse also works with char[], and so does  =

std.algorithm.reverse (and it does it pretty cleverly too, might I say=

).

This is a different algorithm. Sort requires random-access swapping.  =

Reverse does not.

As far as I'm concerned, if we *can* do it in n.log(n), and somebody  =

provides the implementation, then there is no reason to not offer dcha=

r  =

sorting for char[]/wchar.

I think there is nothing wrong with requiring the steps to be explicit. =
We  =

should not hide such bloat behind sort.

-Steve
```
May 15 2014
"monarch_dodra" <monarchdodra gmail.com> writes:
```On Thursday, 15 May 2014 at 17:46:52 UTC, Steven Schveighoffer
wrote:
As far as I'm concerned, if we *can* do it in n.log(n), and
somebody provides the implementation, then there is no reason
to not offer dchar sorting for char[]/wchar.

I think there is nothing wrong with requiring the steps to be
explicit. We should not hide such bloat behind sort.

-Steve

Of course, but I meant if we could find an algoirthm O(1) (or