digitalmars.D.learn - Generating Strings with Random Contents


"Nordlöw" <per.nordlow gmail.com> writes:

Is there a natural way of generating/filling a 
string/wstring/dstring of a specific length with random contents?

Jul 14 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Nordlöw:

 Is there a natural way of generating/filling a 
 string/wstring/dstring of a specific length with random 
 contents?

Do you mean something like this?


import std.stdio, std.random, std.ascii, std.range, std.conv, std.algorithm;

string genRandomString(in size_t len) {
     return len
            .iota
            .map!(_ => lowercase[uniform(0, $)])
            .text;
}

void main() {
     import std.stdio;

     10.genRandomString.writeln;
}


Bye,
bearophile

Jul 14 2014

"Brad Anderson" <eco gnuk.net> writes:

On Monday, 14 July 2014 at 22:21:36 UTC, bearophile wrote:
 Nordlöw:

 Is there a natural way of generating/filling a 
 string/wstring/dstring of a specific length with random 
 contents?

 Do you mean something like this?


 import std.stdio, std.random, std.ascii, std.range, std.conv;

 string genRandomString(in size_t len) {
     return len
            .iota
            .map!(_ => lowercase[uniform(0, $)])
            .text;
 }

 void main() {
     import std.stdio;

     10.genRandomString.writeln;
 }


 Bye,
 bearophile

Alternative:

randomSample(lowercase, 10, lowercase.length).writeln;

Jul 14 2014

"Brad Anderson" <eco gnuk.net> writes:

On Monday, 14 July 2014 at 22:27:57 UTC, Brad Anderson wrote:
 Alternative:

 randomSample(lowercase, 10, lowercase.length).writeln;

std.ascii should really be using std.encoding.AsciiString. Then
that length wouldn't be necessary.

Jul 14 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Brad Anderson:

 Alternative:

 randomSample(lowercase, 10, lowercase.length).writeln;

From the randomSample docs:

"Selects a random subsample out of r, containing exactly n 
elements. The order of elements is the same as in the original 
range."

Bye,
bearophile

Jul 14 2014

"Brad Anderson" <eco gnuk.net> writes:

On Monday, 14 July 2014 at 22:32:25 UTC, bearophile wrote:
 Brad Anderson:

 Alternative:

 randomSample(lowercase, 10, lowercase.length).writeln;

 From the randomSample docs:

 "Selects a random subsample out of r, containing exactly n 
 elements. The order of elements is the same as in the original 
 range."

 Bye,
 bearophile

Hmm, good catch. Not the behavior I expected.

Jul 14 2014

Joseph Rushton Wakeling via Digitalmars-d-learn writes:

 Alternative:

 randomSample(lowercase, 10, lowercase.length).writeln;

No, I don't think that's appropriate, because it will pick 10 distinct 
characters from a, b, c, ... , z (i.e. no character will appear more than 
once), and the characters picked will appear in alphabetical order.

Incidentally, if lowercase has the .length property, there's no need to pass 
the length separately to randomSample.  Just passing lowercase itself and the 
number of sample points desired is sufficient.
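
To make the difference concrete, here is a minimal sketch contrasting the two behaviours. The helper names independentLetters and distinctLetters are made up for this illustration, and the total length is still passed to randomSample on the assumption that lowercase is a plain string.

```d
import std.algorithm : isSorted;
import std.ascii : lowercase;
import std.random : randomSample, uniform;
import std.stdio : writeln;

/// Draws each letter independently, so repeats are possible and the order is random.
string independentLetters(size_t len)
{
    auto buf = new char[len];
    foreach (ref c; buf)
        c = lowercase[uniform(0, lowercase.length)];
    return buf.idup;
}

/// Draws n distinct letters; randomSample keeps them in their original order.
string distinctLetters(size_t n)
{
    string result;
    // lowercase is assumed to be a plain string, so the total is passed explicitly.
    foreach (dchar c; randomSample(lowercase, n, lowercase.length))
        result ~= c;
    return result;
}

void main()
{
    writeln(independentLetters(10)); // may contain repeated letters
    writeln(distinctLetters(10));    // alphabetical, never a repeat
    assert(distinctLetters(10).isSorted);
}
```

So randomSample answers "give me a random subset", while the uniform-based loop answers "give me random contents".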

Jul 16 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Monday, 14 July 2014 at 22:21:36 UTC, bearophile wrote:
 Nordlöw:

 Is there a natural way of generating/filling a 
 string/wstring/dstring of a specific length with random 
 contents?

 Do you mean something like this?


 import std.stdio, std.random, std.ascii, std.range, std.conv;

 string genRandomString(in size_t len) {
     return len
            .iota
            .map!(_ => lowercase[uniform(0, $)])
            .text;
 }

 void main() {
     import std.stdio;

     10.genRandomString.writeln;
 }


 Bye,
 bearophile

I was specifically interested in something that exercises (random 
samples) potentially _all_ code points for string, wstring and 
dstring (all code units that is).

Jul 14 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Monday, 14 July 2014 at 22:32:51 UTC, Nordlöw wrote:

I believe defining a complete random sampling of all code units 
in dchar is a good start right? This can then be reused to lazily 
convert while filling in a string and wstring.
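
A minimal sketch of that idea, assuming random code points are generated as a dstring first and then transcoded. The helpers randomScalar and randomDString are hypothetical names for this illustration, and the conversion here is eager (std.conv.to) rather than lazy.

```d
import std.conv : to;
import std.random : uniform;
import std.stdio : writeln;
import std.utf : validate;

/// Hypothetical helper: one random code point outside the surrogate range.
dchar randomScalar()
{
    enum surrogates = 0xE000 - 0xD800;
    uint v = uniform(0u, 0x110000u - surrogates);
    if (v >= 0xD800)
        v += surrogates; // jump over U+D800 .. U+DFFF
    return cast(dchar) v;
}

/// Hypothetical helper: len random code points.
dstring randomDString(size_t len)
{
    auto buf = new dchar[len];
    foreach (ref c; buf)
        c = randomScalar();
    return buf.idup;
}

void main()
{
    dstring d = randomDString(20);
    string s = d.to!string;   // UTF-8, one to four code units per code point
    wstring w = d.to!wstring; // UTF-16, one or two code units per code point

    validate(s); // throws UTFException if the encoding were malformed
    validate(w);
    assert(s.to!dstring == d && w.to!dstring == d);
    writeln(d.length, " code points -> ", s.length, " UTF-8 units, ",
            w.length, " UTF-16 units");
}
```

Note that a "specific length" then means a number of code points; the string and wstring results generally hold more code units than that. Newer Phobos releases also provide std.utf.byChar and std.utf.byWchar for doing the transcoding lazily.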

Jul 14 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Monday, 14 July 2014 at 22:35:59 UTC, Nordlöw wrote:
 On Monday, 14 July 2014 at 22:32:51 UTC, Nordlöw wrote:

 I believe defining a complete random sampling of all code units 
 in dchar is a good start right? This can then be reused to 
 lazily convert while filling in a string and wstring.

isValidCodePoint()

at

http://dlang.org/phobos/std_encoding.html

might be where to start.

Jul 14 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Monday, 14 July 2014 at 22:39:08 UTC, Nordlöw wrote:
 might be where to start.

Is it really this simple?

bool isValidCodePoint(dchar c)
{
     return c < 0xD800 || (c >= 0xE000 && c < 0x110000);
}

Jul 14 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Nordlöw:

 I believe defining a complete random sampling of all code units 
 in dchar is a good start right? This can then be reused to 
 lazily convert while filling in a string and wstring.

Several combinations of Unicode chars are not meaningful/valid 
(like pairs of ligatures). Anything that has to work correctly 
with Unicode is complex.

Bye,
bearophile

Jul 14 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Monday, 14 July 2014 at 22:39:15 UTC, bearophile wrote:
 Several combinations of Unicode chars are not meaningful/valid 
 (like pairs of ligatures). Anything that has to work correctly 
 with Unicode is complex.

So I guess we need something more than just isValidCodePoint 
right?

Jul 14 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Monday, 14 July 2014 at 22:45:29 UTC, Nordlöw wrote:
 So I guess we need something more than just isValidCodePoint 
 right?

Here's a first try:

https://github.com/nordlow/justd/blob/master/random_ex.d#L53

Jul 14 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Nordlöw:

 https://github.com/nordlow/justd/blob/master/random_ex.d#L53

Isn't @trusted mostly for small parts of Phobos code? I suggest 
to avoid using @trusted in most cases.

Bye,
bearophile

Jul 14 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Tuesday, 15 July 2014 at 00:03:04 UTC, bearophile wrote:
 to avoid using @trusted in most cases.

Could someone briefly elaborate on which cases this means?

Jul 15 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Nordlöw:

 Could someone briefly elaborate on which cases this means?

All cases where you really can't live without it :-) It's like a 
cast().
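
One common way to follow that advice is to keep the function itself @safe and confine the unchecked part to a tiny @trusted lambda. The sketch below is only an illustration of that idiom; copyInto is a hypothetical helper and has nothing to do with the code linked earlier in the thread.

```d
import core.stdc.string : memcpy;
import std.exception : enforce;
import std.stdio : writeln;

/// @safe API: the compiler checks everything except the one wrapped call.
void copyInto(ubyte[] dst, const(ubyte)[] src) @safe
{
    // The bounds check is what makes the @trusted call below defensible.
    enforce(dst.length >= src.length, "destination too small");

    // Only the raw-pointer call is marked @trusted, not the whole function.
    () @trusted { memcpy(dst.ptr, src.ptr, src.length); }();
}

void main()
{
    ubyte[4] buf;
    immutable ubyte[] src = [1, 2, 3];
    copyInto(buf[], src);
    writeln(buf); // the first three bytes now come from src
}
```

With this layout the part that has to be reviewed by hand is a single line, which is roughly the sense in which @trusted resembles a cast.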

Bye,
bearophile

Jul 15 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Tuesday, 15 July 2014 at 18:50:06 UTC, bearophile wrote:
 All cases where you really can't live without it :-) It's like

Hmm. I guess I'm gonna have to remove some @trusted tagging then 
;)

Jul 15 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Nordlöw:

 I was specifically interested in something that exercises 
 (random samples) potentially _all_ code points for string, 
 wstring and dstring (all code units that is).

That's harder. Generating random uints and then testing whether each is a 
valid Unicode dchar seems possible.
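
A minimal sketch of that generate-and-test approach, with one assumption added here: candidates are drawn from 0 .. 0x10FFFF rather than from the full uint range, since almost every 32-bit value would otherwise be rejected. The function name randomDcharByRejection is made up for this illustration.

```d
import std.random : uniform;
import std.stdio : writefln;
import std.utf : isValidDchar;

/// Draws candidates until one passes std.utf.isValidDchar.
dchar randomDcharByRejection()
{
    for (;;)
    {
        // uniform!"[]" includes both end points of the interval.
        immutable candidate = cast(dchar) uniform!"[]"(0u, 0x10FFFFu);
        if (isValidDchar(candidate))
            return candidate;
        // Otherwise the candidate was a surrogate; just draw again.
    }
}

void main()
{
    foreach (_; 0 .. 5)
        writefln("U+%04X", cast(uint) randomDcharByRejection());
}
```

Within that interval only the 2048 surrogates are rejected, so the loop almost always finishes on the first draw.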

Bye,
bearophile

Jul 14 2014

Joseph Rushton Wakeling via Digitalmars-d-learn writes:

On 15/07/14 00:16, "Nordlöw" via Digitalmars-d-learn wrote:
 Is there a natural way of generating/filling a string/wstring/dstring of a
 specific length with random contents?

I think you need to be more specific about what kind of random contents you are 
interested in having.

Are you interested in having each character in the sequence randomly chosen 
independently of all the others, or do you want a random subset of all 
available characters (i.e. no character appears more than once), or something 
else again?

Jul 16 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Wednesday, 16 July 2014 at 23:24:24 UTC, Joseph Rushton 
Wakeling via Digitalmars-d-learn wrote:
 Are you interested in having each character in the sequence 
 randomly chosen independently of all the others, or do you want 
 a random subset of all available characters (i.e. no character 
 appears more than once), or something else again?

Just a random dchar (code point) sample like this:

/** Generate Random Contents of $(D x).
    See also: http://forum.dlang.org/thread/emlgflxpgecxsqweauhc@forum.dlang.org
 */
auto ref randInPlace(ref dchar x) @trusted
{
     import std.random : uniform;

     // Candidates: everything below the surrogates plus everything from
     // U+E000 up, minus two for U+FFFE and U+FFFF.
     auto ui = uniform(0,
                       0xD800 +
                       (0x110000 - 0xE000) - 2);
     if (ui < 0xD800)
     {
         return x = cast(dchar) ui;
     }
     else
     {
         // skip the surrogate range
         ui -= 0xD800;
         ui += 0xE000;

         // skip undefined
         if (ui < 0xFFFE)
             return x = cast(dchar) ui;
         else
             ui += 2;

         assert(ui < 0x110000);
         return x = cast(dchar) ui;
     }
}

I don't know how well this plays with

unittest
{
     import dbg;
     dln(randomized!dchar);
     dstring d = 
"alphaalphaalphaalphaalphaalphaalphaalphaalphaalpha";
     dln(d.randomize);
}

though.

See complete logic at

https://github.com/nordlow/justd/blob/master/random_ex.d
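
The index arithmetic in randInPlace can be checked without any randomness by pulling it out into a pure function. The following is only a sketch of such a check; candidateCount and indexToCodePoint are hypothetical names introduced here and are not taken from random_ex.d.

```d
import std.utf : isValidDchar;

/// All code points except the surrogates, U+FFFE and U+FFFF.
enum uint candidateCount = 0xD800 + (0x110000 - 0xE000) - 2;

/// Hypothetical pure version of the mapping used in randInPlace.
dchar indexToCodePoint(uint i) pure nothrow @safe @nogc
{
    assert(i < candidateCount);
    if (i < 0xD800)
        return cast(dchar) i;
    i += 0xE000 - 0xD800; // skip the surrogate range
    if (i >= 0xFFFE)
        i += 2;           // skip U+FFFE and U+FFFF
    return cast(dchar) i;
}

void main()
{
    // Boundary cases around each skipped range.
    assert(indexToCodePoint(0xD7FF) == 0xD7FF);
    assert(indexToCodePoint(0xD800) == 0xE000);
    assert(indexToCodePoint(0xF7FD) == 0xFFFD);
    assert(indexToCodePoint(0xF7FE) == 0x10000);
    assert(indexToCodePoint(candidateCount - 1) == 0x10FFFF);

    // Every index maps to a valid code point, and no two indices collide.
    uint previous = 0;
    foreach (uint i; 0 .. candidateCount)
    {
        immutable c = indexToCodePoint(i);
        assert(isValidDchar(c) && c != 0xFFFE && c != 0xFFFF);
        assert(i == 0 || c > previous);
        previous = c;
    }
}
```

Because the mapping is strictly increasing and covers every index, a uniform index yields a uniform choice among the remaining code points.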

Jul 17 2014
