digitalmars.D.learn - shuffle a character array
- celavek (26/26) Jul 20 2016 Hi
- Mike Parker (23/24) Jul 20 2016 You can. However, if you take a look at the documentation for
- Mike Parker (4/7) Jul 20 2016 And I forgot to add:
- Mike Parker (4/12) Jul 20 2016 Actually, std.conv.to might be better, since toUTF32 returns
- celavek (11/21) Jul 20 2016 Ahhh! That again. I was thinking about using the representation.
- Mike Parker (14/20) Jul 20 2016 representation does not allocate any new memory. It points to the
- celavek (5/18) Jul 20 2016 Thank you for the very useful information. I really appreciate
- pineapple (4/12) Jul 20 2016 There's also the shuffle module in mach.range which doesn't do
- Mike Parker (4/17) Jul 20 2016 There is no auto-decoding going on here, as char[] and wchar[]
- pineapple (4/7) Jul 20 2016 They are considered random access ranges by my ranges library,
- pineapple (4/11) Jul 20 2016 On second thought that's not even relevant - the linked-to module
- pineapple (8/20) Jul 20 2016 Pardon my being scatterbrained (and there not being an "edit
- Mike Parker (7/14) Jul 20 2016 The relevant lines I quoted from the docs above explain quite
- ag0aep6g (6/11) Jul 20 2016 Without auto decoding, char[] would (most probably) be a random access
- Jesse Phillips (9/16) Jul 20 2016 I think you mean that your range library treats them as arrays of
- pineapple (6/9) Jul 21 2016 Right - I disagree with the assessment that all (or even most)
- ketmar (3/6) Jul 20 2016 ...due to autodecoding.
- Ali Çehreli (7/13) Jul 20 2016 I think both not being random access ranges and there is auto-decoding
- ketmar (5/10) Jul 20 2016 but it does happen that we have autodecoding, and
- Jack Stouffer (3/4) Jul 20 2016 making it impossible to access randomly __correctly__, unless
- Ali Çehreli (4/8) Jul 20 2016 Yes, perhaps I should have said "making it not meaningful to access
- Mike Parker (5/11) Jul 20 2016 No, due to them being multi-byte formats. I don't see what auto
- celavek (2/5) Jul 20 2016 Interesting project. Thanks for the link.
Hi,

I'm trying to shuffle a character array, but I get some compilation errors.

    char[] upper = std.ascii.uppercase.dup;
    randomShuffle!(typeof(upper))(upper);
    randomShuffle(upper);

    example.d(34): Error: template std.random.randomShuffle cannot deduce function from argument types !(char[])(char[]), candidates are:
    /usr/include/dmd/phobos/std/random.d(1822): std.random.randomShuffle(Range, RandomGen)(Range r, ref RandomGen gen) if (isRandomAccessRange!Range && isUniformRNG!RandomGen)
    /usr/include/dmd/phobos/std/random.d(1829): std.random.randomShuffle(Range)(Range r) if (isRandomAccessRange!Range)
    example.d(34): Error: template std.random.randomShuffle cannot deduce function from argument types !()(char[]), candidates are:
    /usr/include/dmd/phobos/std/random.d(1822): std.random.randomShuffle(Range, RandomGen)(Range r, ref RandomGen gen) if (isRandomAccessRange!Range && isUniformRNG!RandomGen)
    /usr/include/dmd/phobos/std/random.d(1829): std.random.randomShuffle(Range)(Range r) if (isRandomAccessRange!Range)

I thought that I could use a dynamic array as a range ...
Jul 20 2016
On Wednesday, 20 July 2016 at 07:49:38 UTC, celavek wrote:
> I thought that I could use a dynamic array as a range ...

You can. However, if you take a look at the documentation for std.random.randomShuffle [1], you'll find the following constraint:

if (isRandomAccessRange!Range);

You can then go to the documentation for std.range.primitives.isRandomAccessRange [2], where you'll find the following:

"Although char[] and wchar[] (as well as their qualified versions including string and wstring) are arrays, isRandomAccessRange yields false for them because they use variable-length encodings (UTF-8 and UTF-16 respectively). These types are bidirectional ranges only."

If you are absolutely, 100% certain that you are dealing with ASCII, you can do this:

```
import std.string : representation;
randomShuffle(charArray.representation);
```

That will give you a ubyte[] for char[] and a ushort[] for wchar[].

[1] https://dlang.org/phobos/std_random.html#.randomShuffle
[2] https://dlang.org/phobos/std_range_primitives.html#isRandomAccessRange
Jul 20 2016
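A minimal sketch of the ASCII-only approach described above, for readers who want something to compile. The helper name `shuffleAscii` and the runtime ASCII check are assumptions added for illustration; they are not part of Phobos or of the original reply.

```d
import std.algorithm : all, sort;
import std.random : randomShuffle;
import std.string : representation;

/// Shuffles an ASCII-only char[] in place (hypothetical helper, not in Phobos).
void shuffleAscii(char[] s)
{
    // Every code unit below 0x80 is a complete ASCII character, so
    // reordering single bytes cannot split a multi-byte sequence.
    assert(s.representation.all!(b => b < 0x80), "input must be ASCII");

    // representation returns a ubyte[] over the same memory; ubyte[]
    // satisfies isRandomAccessRange, so randomShuffle accepts it.
    randomShuffle(s.representation);
}

void main()
{
    import std.ascii : uppercase;

    char[] upper = uppercase.dup;
    shuffleAscii(upper);

    // Same 26 letters in a new order: sorting a copy restores the alphabet.
    auto check = upper.dup.representation;
    sort(check);
    assert(cast(const(char)[]) check == uppercase);
}
```

The assertion guards the "100% certain" precondition in debug builds; in release builds it is compiled out, so callers would still need to know their data is ASCII.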
On Wednesday, 20 July 2016 at 08:02:07 UTC, Mike Parker wrote:
> If you are absolutely, 100% certain that you are dealing with ASCII, you can do this:

And I forgot to add: Otherwise, you'll want to convert to dchar[] (probably via std.utf.toUTF32) and pass that along instead.
Jul 20 2016
On Wednesday, 20 July 2016 at 08:05:20 UTC, Mike Parker wrote:
> On Wednesday, 20 July 2016 at 08:02:07 UTC, Mike Parker wrote:
>> If you are absolutely, 100% certain that you are dealing with ASCII, you can do this:
>
> And I forgot to add: Otherwise, you'll want to convert to dchar[] (probably via std.utf.toUTF32) and pass that along instead.

Actually, std.conv.to might be better, since toUTF32 returns dstring:

    auto dcharArray = to!(dchar[])(charArray);
Jul 20 2016
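The dchar[] route suggested above can be sketched as follows. The helper name `shuffledCopy` is hypothetical, and returning a fresh string rather than shuffling in place is a choice made for this example only.

```d
import std.conv : to;
import std.random : randomShuffle;

/// Returns a shuffled copy of a UTF-8 string (hypothetical helper).
string shuffledCopy(const(char)[] text)
{
    // Decode to UTF-32: each dchar is one whole code point, and
    // dchar[] is a random access range.
    dchar[] codePoints = text.to!(dchar[]);
    randomShuffle(codePoints);

    // Re-encode the shuffled code points as UTF-8.
    return codePoints.to!string;
}

void main()
{
    import std.range : walkLength;
    import std.utf : validate;

    // "naïve café" written with escapes: ten code points.
    string shuffled = shuffledCopy("na\u00EFve caf\u00E9");
    validate(shuffled);                  // throws if the UTF-8 were malformed
    assert(shuffled.walkLength == 10);   // still ten code points
}
```

Unlike the representation approach, this allocates a new array, but it keeps every code point intact. Combining sequences can still be separated from their base characters, since the shuffle works on code points rather than graphemes.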
On Wednesday, 20 July 2016 at 08:02:07 UTC, Mike Parker wrote:
> If you are absolutely, 100% certain that you are dealing with ASCII, you can do this:
>
> ```
> import std.string : representation;
> randomShuffle(charArray.representation);
> ```
>
> That will give you a ubyte[] for char[] and a ushort[] for wchar[].
>
> [1] https://dlang.org/phobos/std_random.html#.randomShuffle
> [2] https://dlang.org/phobos/std_range_primitives.html#isRandomAccessRange

Ahhh! That again. I was thinking about using the representation. I should take a deeper look at the documentation.

As far as my current understanding goes, the shuffle will be done in place. If I use the "representation", would that still hold? That is, will I be able to use the same char[], but in shuffled form? (Of course I will test that.)

Thank you
Jul 20 2016
On Wednesday, 20 July 2016 at 08:18:55 UTC, celavek wrote:
> As far as my current understanding goes the shuffle will be done in place. If I use the "representation" would that still hold, that is will I be able to use the same char[] but in the shuffled form? (of course I will test that)

representation does not allocate any new memory. It points to the same memory, same data. If we think of D arrays as something like this:

    struct Array(T) { size_t len; T* ptr; }

Then representation is doing this:

    Array!char original;
    auto view = Array!ubyte(original.len, cast(ubyte*) original.ptr);

So, yes, the char data will still be shuffled in place. All you're doing is getting a ubyte view onto it so that it can be treated as a random access range.
Jul 20 2016
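The aliasing described above can be checked directly. This is a small sketch; the helper name `aliasesSameMemory` is an assumption made for the example.

```d
import std.string : representation;

/// True if the ubyte view aliases the array's own storage (hypothetical helper).
bool aliasesSameMemory(char[] s)
{
    ubyte[] view = s.representation;
    return view.ptr is cast(ubyte*) s.ptr && view.length == s.length;
}

void main()
{
    char[] letters = "abc".dup;
    assert(aliasesSameMemory(letters));   // same pointer, same length

    // A write through the view is visible through the original array.
    ubyte[] view = letters.representation;
    view[0] = cast(ubyte) 'z';
    assert(letters == "zbc");
}
```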
On Wednesday, 20 July 2016 at 08:30:37 UTC, Mike Parker wrote:
> representation does not allocate any new memory. It points to the same memory, same data. If we think of D arrays as something like this: struct Array(T) { size_t len; T* ptr; } Then representation is doing this: Array original; Array representation(original.len, original.ptr); So, yes, the char data will still be shuffled in place. All you're doing is getting a ubyte view onto it so that it can be treated as a range.

Thank you for the very useful information. I really appreciate you taking the time to explain these, maybe trivial, things to me. I confirmed the behavior with a test. It works as expected.
Jul 20 2016
On Wednesday, 20 July 2016 at 08:02:07 UTC, Mike Parker wrote:
> You can then go to the documentation for std.range.primitives.isRandomAccessRange [2], where you'll find the following: "Although char[] and wchar[] (as well as their qualified versions including string and wstring) are arrays, isRandomAccessRange yields false for them because they use variable-length encodings (UTF-8 and UTF-16 respectively). These types are bidirectional ranges only."

There's also the shuffle module in mach.range which doesn't do any auto-decoding: https://github.com/pineapplemachine/mach.d/blob/master/mach/range/random/shuffle.d
Jul 20 2016
On Wednesday, 20 July 2016 at 10:40:04 UTC, pineapple wrote:
> On Wednesday, 20 July 2016 at 08:02:07 UTC, Mike Parker wrote:
>> You can then go to the documentation for std.range.primitives.isRandomAccessRange [2], where you'll find the following: "Although char[] and wchar[] (as well as their qualified versions including string and wstring) are arrays, isRandomAccessRange yields false for them because they use variable-length encodings (UTF-8 and UTF-16 respectively). These types are bidirectional ranges only."
>
> There's also the shuffle module in mach.range which doesn't do any auto-decoding: https://github.com/pineapplemachine/mach.d/blob/master/mach/range/random/shuffle.d

There is no auto-decoding going on here, as char[] and wchar[] are rejected outright since they are not considered random access ranges.
Jul 20 2016
On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
> There is no auto-decoding going on here, as char[] and wchar[] are rejected outright since they are not considered random access ranges.

They are considered random access ranges by my ranges library, because they are treated as arrays of characters and not as unicode strings.
Jul 20 2016
On Wednesday, 20 July 2016 at 16:03:27 UTC, pineapple wrote:
> On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
>> There is no auto-decoding going on here, as char[] and wchar[] are rejected outright since they are not considered random access ranges.
>
> They are considered random access ranges by my ranges library, because they are treated as arrays of characters and not as unicode strings.

On second thought that's not even relevant - the linked-to module performs an out-of-place shuffle and so does not even require the input range to have random access.
Jul 20 2016
On Wednesday, 20 July 2016 at 16:04:50 UTC, pineapple wrote:
> On Wednesday, 20 July 2016 at 16:03:27 UTC, pineapple wrote:
>> On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
>>> There is no auto-decoding going on here, as char[] and wchar[] are rejected outright since they are not considered random access ranges.
>>
>> They are considered random access ranges by my ranges library, because they are treated as arrays of characters and not as unicode strings.
>
> On second thought that's not even relevant - the linked-to module performs an out-of-place shuffle and so does not even require the input range to have random access.

Pardon my being scatterbrained (and there not being an "edit post" function) - you're referring to phobos not considering char[] and wchar[] to have random access? The reason they are not considered to have random access is that they are auto-decoded by other functions that handle them, and the auto-decoding makes random access inefficient. Not because randomShuffle itself auto-decodes them.
Jul 20 2016
On Wednesday, 20 July 2016 at 16:08:26 UTC, pineapple wrote:
> Pardon my being scatterbrained (and there not being an "edit post" function) - you're referring to phobos not considering char[] and wchar[] to have random access? The reason they are not considered to have random access is because they are auto-decoded by other functions that handle them, and the auto-decoding makes random access inefficient. Not because randomShuffle itself auto-decodes them.

The relevant lines I quoted from the docs above explain quite clearly that it's because they are multi-byte formats. Indexing them is not inefficient, it simply makes no sense. What does it mean to take the value at index i when it is part of a multi-byte sequence that continues at index i+1? Auto-decoding has nothing to do with it.
Jul 20 2016
On 07/20/2016 06:18 PM, Mike Parker wrote:
> The relevant lines I quoted from the docs above explain quite clearly that it's because they are multi-byte formats. Indexing them is not inefficient, it simply makes no sense. What does it mean to take the value at index i when it is part of a multi-byte sequence that continues at index i+1? Auto-decoding has nothing to do with it.

Without auto decoding, char[] would (most probably) be a random access range of code units. Taking the value at index i would return the code unit at index i, like it does for the array. It's not that way, because narrow strings are decoded by the range primitives (auto decoding).
Jul 20 2016
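The difference between array indexing and the range primitives discussed in the last few posts can be seen in a short sketch. It only uses Phobos traits and functions; the sample string is an arbitrary choice for the example.

```d
import std.range.primitives : front, isBidirectionalRange, isRandomAccessRange;
import std.utf : byCodeUnit;

void main()
{
    string s = "\u00E9t\u00E9";   // "été": 3 code points, 5 UTF-8 code units

    // Array indexing returns a code unit; here, the first byte of 'é'.
    assert(s.length == 5);
    assert(s[0] == 0xC3);

    // The range primitive front decodes a whole code point (auto-decoding).
    assert(s.front == '\u00E9');
    static assert(is(typeof(s.front) == dchar));

    // Phobos therefore classifies narrow strings as bidirectional only.
    static assert(isBidirectionalRange!string);
    static assert(!isRandomAccessRange!string);
    static assert(!isRandomAccessRange!(char[]));

    // dchar[] and an explicit code-unit view are random access.
    static assert(isRandomAccessRange!(dchar[]));
    static assert(isRandomAccessRange!(typeof(s.byCodeUnit)));
}
```

std.utf.byCodeUnit is the Phobos way to opt in to the "random access range of code units" view explicitly, without changing how char[] itself is classified.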
On Wednesday, 20 July 2016 at 16:03:27 UTC, pineapple wrote:
> On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
>> There is no auto-decoding going on here, as char[] and wchar[] are rejected outright since they are not considered random access ranges.
>
> They are considered random access ranges by my ranges library, because they are treated as arrays of characters and not as unicode strings.

I think you mean that your range library treats them as arrays of code units, meaning your library will break (some) unicode strings.

Note that auto decoding and being a random access range are different things. The isRandomAccessRange check must make a special condition that the string is not "narrow"; otherwise narrow strings would be considered random access even though front automatically decodes.

    922: static assert(!isNarrowString!R);
Jul 20 2016
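To make "will break (some) unicode strings" concrete, here is a deterministic sketch that swaps two code units the way a byte-level shuffle might. The helper `isValidUtf8` is hypothetical and exists only to turn std.utf.validate's exception into a bool.

```d
import std.algorithm.mutation : swap;
import std.string : representation;
import std.utf : UTFException, validate;

/// Wraps std.utf.validate in a boolean check (hypothetical helper).
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    char[] s = "\u00E9a".dup;      // 'é' (bytes C3 A9) followed by 'a'
    assert(isValidUtf8(s));

    // Swap the last two code units, as a code-unit shuffle could.
    auto bytes = s.representation;
    swap(bytes[1], bytes[2]);      // bytes are now C3 61 A9

    // The lead byte C3 is no longer followed by a continuation byte.
    assert(!isValidUtf8(s));
}
```

With ASCII-only input no such swap can produce an invalid sequence, which is why the representation approach is safe under that assumption.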
On Wednesday, 20 July 2016 at 18:32:15 UTC, Jesse Phillips wrote:
> I think you mean that your range library treats them as arrays of code units, meaning your library will break (some) unicode strings.

Right - I disagree with the assessment that all (or even most) char[] types are intended to represent unicode strings, rather than arrays containing chars. If you want your array to be interpreted as a unicode string, then you should use std.uni's byGrapheme or similar functions.
Jul 21 2016
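For reference, a small sketch of the three levels of "character" that come up in this thread, using std.uni.byGrapheme as mentioned above. The sample string is an arbitrary choice for the example.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    string s = "e\u0301";   // 'e' followed by a combining acute accent

    assert(s.length == 3);                  // UTF-8 code units
    assert(s.walkLength == 2);              // code points (auto-decoded)
    assert(s.byGrapheme.walkLength == 1);   // user-perceived characters
}
```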
On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
> There is no auto-decoding going on here,...as char[] and wchar[] are rejected outright since they are not considered random access ranges.

...due to autodecoding.
Jul 20 2016
On 07/20/2016 09:44 AM, ketmar wrote:
> On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
>> There is no auto-decoding going on here,...as char[] and wchar[] are rejected outright since they are not considered random access ranges.
>
> ...due to autodecoding.

I think that both their not being random access ranges and the auto-decoding in Phobos are design decisions that follow from the fact that char[] holds a multi-byte encoding. Phobos could choose not to auto-decode, but char[] would still be multi-byte, making it impossible to access randomly.

Ali
Jul 20 2016
On Wednesday, 20 July 2016 at 17:31:18 UTC, Ali Çehreli wrote:
> I think both not being random access ranges and there is auto-decoding in Phobos are design decisions due to the fact that char[] is a multi-byte encoding. Phobos could choose not to auto-decode but char[] would still be multi-byte, making it impossible to access randomly.

but it does happen that we have autodecoding, and non-random-access char ranges, and it is clearly tied. so, leaving aside "what if..." things, we can say that it is autodecoding issue. ;-)
Jul 20 2016
On Wednesday, 20 July 2016 at 17:31:18 UTC, Ali Çehreli wrote:
> making it impossible to access randomly

making it impossible to access randomly __correctly__, unless you're safely assuming there's only ASCII in your string.
Jul 20 2016
On 07/20/2016 10:40 AM, Jack Stouffer wrote:
> On Wednesday, 20 July 2016 at 17:31:18 UTC, Ali Çehreli wrote:
>> making it impossible to access randomly
>
> making it impossible to access randomly __correctly__, unless you're safely assuming there's only ASCII in your string.

Yes, perhaps I should have said "making it not meaningful to access randomly" (in general, as you note).

Ali
Jul 20 2016
On Wednesday, 20 July 2016 at 16:44:11 UTC, ketmar wrote:
> On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
>> There is no auto-decoding going on here,...as char[] and wchar[] are rejected outright since they are not considered random access ranges.
>
> ...due to autodecoding.

No, due to them being multi-byte formats. I don't see what auto decoding has to do with it. That's a separate concept. We could take auto decoding out of Phobos and still disqualify them as random access ranges.
Jul 20 2016
On Wednesday, 20 July 2016 at 10:40:04 UTC, pineapple wrote:
> There's also the shuffle module in mach.range which doesn't do any auto-decoding: https://github.com/pineapplemachine/mach.d/blob/master/mach/range/random/shuffle.d

Interesting project. Thanks for the link.
Jul 20 2016