digitalmars.D.learn - Generating Strings with Random Contents

"Nordlöw" <per.nordlow gmail.com> writes:
Is there a natural way of generating/filling a 
string/wstring/dstring of a specific length with random contents?
Jul 14 2014
"bearophile" <bearophileHUGS lycos.com> writes:
Nordlöw:

 Is there a natural way of generating/filling a 
 string/wstring/dstring of a specific length with random 
 contents?
Do you mean something like this?

import std.stdio, std.random, std.ascii, std.range, std.conv, std.algorithm;

string genRandomString(in size_t len) {
    return len
           .iota
           .map!(_ => lowercase[uniform(0, $)])
           .text;
}

void main() {
    import std.stdio;
    10.genRandomString.writeln;
}

Bye,
bearophile
Jul 14 2014
"Brad Anderson" <eco gnuk.net> writes:
On Monday, 14 July 2014 at 22:21:36 UTC, bearophile wrote:
 Nordlöw:

 Is there a natural way of generating/filling a 
 string/wstring/dstring of a specific length with random 
 contents?
 Do you mean something like this?

 import std.stdio, std.random, std.ascii, std.range, std.conv, std.algorithm;

 string genRandomString(in size_t len) {
     return len
            .iota
            .map!(_ => lowercase[uniform(0, $)])
            .text;
 }

 void main() {
     import std.stdio;
     10.genRandomString.writeln;
 }

 Bye,
 bearophile

Alternative:

randomSample(lowercase, 10, lowercase.length).writeln;
Jul 14 2014
"Brad Anderson" <eco gnuk.net> writes:
On Monday, 14 July 2014 at 22:27:57 UTC, Brad Anderson wrote:
 Alternative:

 randomSample(lowercase, 10, lowercase.length).writeln;
std.ascii should really be using std.encoding.AsciiString. Then that length wouldn't be necessary.
Jul 14 2014
"bearophile" <bearophileHUGS lycos.com> writes:
Brad Anderson:

 Alternative:

 randomSample(lowercase, 10, lowercase.length).writeln;
From randomSample docs:
"Selects a random subsample out of r, containing exactly n elements. The order of elements is the same as in the original range."
Bye, bearophile
Jul 14 2014
"Brad Anderson" <eco gnuk.net> writes:
On Monday, 14 July 2014 at 22:32:25 UTC, bearophile wrote:
 Brad Anderson:

 Alternative:

 randomSample(lowercase, 10, lowercase.length).writeln;
From randomSample docs:
"Selects a random subsample out of r, containing exactly n elements. The order of elements is the same as in the original range."
Bye, bearophile
Hmm, good catch. Not the behavior I expected.
Jul 14 2014
Joseph Rushton Wakeling via Digitalmars-d-learn writes:
 Alternative:

 randomSample(lowercase, 10, lowercase.length).writeln;
No, I don't think that's appropriate, because it will pick 10 individual characters from a, b, c, ... , z (i.e. no character will appear more than once), and the characters picked will appear in alphabetical order. Incidentally, if lowercase has the .length property, there's no need to pass the length separately to randomSample. Just passing lowercase itself and the number of sample points desired is sufficient.
Jul 16 2014
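The difference described above can be seen in a small sketch. It assumes, as in the quoted call, that randomSample is given the total length explicitly because lowercase is a narrow string; the helper names distinctLetters and independentLetters are made up for illustration.

```d
import std.array : array;
import std.ascii : lowercase;
import std.random : randomSample, uniform;
import std.stdio : writeln;

/// Draws n distinct letters; randomSample keeps them in source order,
/// so the result is always in alphabetical order.
dchar[] distinctLetters(size_t n)
{
    // The length is passed explicitly, as in the call quoted above.
    return randomSample(lowercase, n, lowercase.length).array;
}

/// Draws n letters independently of each other, so repeats are possible
/// and the order is arbitrary.
string independentLetters(size_t n)
{
    auto buf = new char[n];
    foreach (ref c; buf)
        c = lowercase[uniform(0, lowercase.length)];
    return buf.idup;
}

void main()
{
    distinctLetters(10).writeln;    // sorted, no repeats
    independentLetters(10).writeln; // any order, repeats allowed
}
```

For a random string of a given length, the second form is usually what is wanted; the first cannot produce more than 26 characters at all.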
"Nordlöw" <per.nordlow gmail.com> writes:
On Monday, 14 July 2014 at 22:21:36 UTC, bearophile wrote:
 Nordlöw:

 Is there a natural way of generating/filling a 
 string/wstring/dstring of a specific length with random 
 contents?
 Do you mean something like this?

 import std.stdio, std.random, std.ascii, std.range, std.conv, std.algorithm;

 string genRandomString(in size_t len) {
     return len
            .iota
            .map!(_ => lowercase[uniform(0, $)])
            .text;
 }

 void main() {
     import std.stdio;
     10.genRandomString.writeln;
 }

 Bye,
 bearophile
I was specifically interested in something that exercises (random samples) potentially _all_ code points for string, wstring and dstring (all code units that is).
Jul 14 2014
"Nordlöw" <per.nordlow gmail.com> writes:
On Monday, 14 July 2014 at 22:32:51 UTC, Nordlöw wrote:

I believe defining a complete random sampling of all code units 
in dchar is a good start right? This can then be reused to lazily 
convert while filling in a string and wstring.
Jul 14 2014
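A small sketch of the "sample as dchar, then convert" idea, using a fixed dstring instead of random data so the numbers are reproducible. It only relies on std.conv.to for the transcoding; the sample text is an arbitrary choice. It also shows why code points and code units should be kept apart: the same four code points occupy a different number of code units in each string type.

```d
import std.conv : to;
import std.stdio : writeln;

void main()
{
    // Four code points: 'a', U+00F6, U+20AC and U+1D11E.
    dstring d = "a\u00F6\u20AC\U0001D11E"d;
    string  s = d.to!string;  // UTF-8:  1 + 2 + 3 + 4 code units
    wstring w = d.to!wstring; // UTF-16: 1 + 1 + 1 + 2 code units

    writeln(d.length, " ", w.length, " ", s.length); // prints: 4 5 10
}
```

So a string or wstring "of a specific length" has to be specified either in code points or in code units; the two only coincide for dstring.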
"Nordlöw" <per.nordlow gmail.com> writes:
On Monday, 14 July 2014 at 22:35:59 UTC, Nordlöw wrote:
 On Monday, 14 July 2014 at 22:32:51 UTC, Nordlöw wrote:

 I believe defining a complete random sampling of all code units 
 in dchar is a good start right? This can then be reused to 
 lazily convert while filling in a string and wstring.
isValidCodePoint() at http://dlang.org/phobos/std_encoding.html might be where to start.
Jul 14 2014
"Nordlöw" <per.nordlow gmail.com> writes:
On Monday, 14 July 2014 at 22:39:08 UTC, Nordlöw wrote:
 might be where to start.

Is it really this simple?

bool isValidCodePoint(dchar c)
{
    return c < 0xD800 || (c >= 0xE000 && c < 0x110000);
}
Jul 14 2014
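One way to gain some confidence in that predicate is to compare it with std.utf.isValidDchar at the boundary values. This is only a spot check under the assumption that isValidDchar accepts exactly the non-surrogate values up to U+10FFFF, which is how current Phobos behaves to my knowledge; the list of sample values is my own choice.

```d
import std.utf : isValidDchar;

// The predicate quoted above, copied unchanged.
bool isValidCodePoint(dchar c)
{
    return c < 0xD800 || (c >= 0xE000 && c < 0x110000);
}

void main()
{
    // Values on both sides of the surrogate block and of the upper limit.
    uint[] samples = [0x0, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000];
    foreach (v; samples)
        assert(isValidCodePoint(cast(dchar) v) == isValidDchar(cast(dchar) v));
}
```

Note that both predicates only exclude surrogates and out-of-range values. They say nothing about unassigned code points or about which sequences of code points are meaningful.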
"bearophile" <bearophileHUGS lycos.com> writes:
Nordlöw:

 I believe defining a complete random sampling of all code units 
 in dchar is a good start right? This can then be reused to 
 lazily convert while filling in a string and wstring.
Several combinations of Unicode chars are not meaningful/valid (like pairs of ligatures). Anything that has to work correctly with Unicode is complex.

Bye,
bearophile
Jul 14 2014
"Nordlöw" <per.nordlow gmail.com> writes:
On Monday, 14 July 2014 at 22:39:15 UTC, bearophile wrote:
 Several combinations of Unicode chars are not meaningful/valid 
 (like pairs of ligatures). Anything that has to work correctly 
 with Unicode is complex.
So I guess we need something more than just isValidCodePoint right?
Jul 14 2014
"Nordlöw" <per.nordlow gmail.com> writes:
On Monday, 14 July 2014 at 22:45:29 UTC, Nordlöw wrote:
 So I guess we need something more than just isValidCodePoint 
 right?
Here's a first try: https://github.com/nordlow/justd/blob/master/random_ex.d#L53
Jul 14 2014
"bearophile" <bearophileHUGS lycos.com> writes:
Nordlöw:

 https://github.com/nordlow/justd/blob/master/random_ex.d#L53
Isn't @trusted mostly for small parts of Phobos code? I suggest avoiding @trusted in most cases.

Bye,
bearophile
Jul 14 2014
"Nordlöw" <per.nordlow gmail.com> writes:
On Tuesday, 15 July 2014 at 00:03:04 UTC, bearophile wrote:
 to avoid using @trusted in most cases.
Could someone briefly elaborate on which cases this means?
Jul 15 2014
"bearophile" <bearophileHUGS lycos.com> writes:
Nordlöw:

 Could someone briefly elaborate on which cases this means?
All cases where you really can't live without it :-) It's like a cast().

Bye,
bearophile
Jul 15 2014
"Nordlöw" <per.nordlow gmail.com> writes:
On Tuesday, 15 July 2014 at 18:50:06 UTC, bearophile wrote:
 All cases where you really can't live without it :-) It's like
Hmm. I guess I'm gonna have to remove some @trusted tagging then ;)
Jul 15 2014
"bearophile" <bearophileHUGS lycos.com> writes:
Nordlöw:

 I was specifically interested in something that exercises 
 (random samples) potentially _all_ code points for string, 
 wstring and dstring (all code units that is).
That's harder. Generating all uints and then testing if it's a Unicode dchar seems possible.

Bye,
bearophile
Jul 14 2014
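The "generate and test" approach suggested above is rejection sampling. A minimal sketch follows, with two assumptions of mine: the draw is limited to [0, 0x110000) instead of the full uint range, since nearly every larger value would be rejected anyway, and validity is decided by std.utf.isValidDchar. The names randomCodePoint and randomDString are hypothetical.

```d
import std.random : uniform;
import std.stdio : writeln;
import std.utf : isValidDchar;

/// Rejection sampling: draw a candidate, retry until it is a valid code point.
dchar randomCodePoint()
{
    for (;;)
    {
        // Only surrogates (U+D800..U+DFFF) can be rejected in this range,
        // so the loop almost always finishes on the first draw.
        immutable c = cast(dchar) uniform(0u, 0x110000u);
        if (isValidDchar(c))
            return c;
    }
}

/// Fills a dstring of n code points with independent random code points.
dstring randomDString(size_t n)
{
    auto buf = new dchar[n];
    foreach (ref c; buf)
        c = randomCodePoint();
    return buf.idup;
}

void main()
{
    // Most of the drawn code points are unassigned or unprintable,
    // so only the length is printed here.
    randomDString(16).length.writeln;
}
```

Each accepted value is uniformly distributed over the valid code points, because rejection does not favour any of the remaining values.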
Joseph Rushton Wakeling via Digitalmars-d-learn writes:
On 15/07/14 00:16, "Nordlöw" via Digitalmars-d-learn wrote:
 Is there a natural way of generating/filling a string/wstring/dstring of a
 specific length with random contents?
I think you need to be more specific about what kind of random contents you are interested in having. Are you interested in having each character in the sequence randomly chosen independently of all the others, or do you want a random subset of all available characters (i.e. no character appears more than once), or something else again?
Jul 16 2014
"Nordlöw" <per.nordlow gmail.com> writes:
On Wednesday, 16 July 2014 at 23:24:24 UTC, Joseph Rushton 
Wakeling via Digitalmars-d-learn wrote:
 Are you interested in having each character in the sequence 
 randomly chosen independently of all the others, or do you want 
 a random subset of all available characters (i.e. no character 
 appears more than once), or something else again?
Just a random dchar (code point) sample like this:

/** Generate Random Contents of $(D x).
    See also: http://forum.dlang.org/thread/emlgflxpgecxsqweauhc@forum.dlang.org
 */
auto ref randInPlace(ref dchar x) @trusted
{
    auto ui = uniform(0,
                      0xD800 +
                      (0x110000 - 0xE000) -
                      2 // minus two for U+FFFE and U+FFFF
                      );
    if (ui < 0xD800)
    {
        return x = cast(dchar) ui; // int does not convert to dchar implicitly
    }
    else
    {
        ui -= 0xD800;
        ui += 0xE000; // skip the surrogate range U+D800..U+DFFF
        if (ui < 0xFFFE)
            return x = cast(dchar) ui;
        else
            ui += 2; // skip U+FFFE and U+FFFF
        assert(ui < 0x110000);
        return x = cast(dchar) ui;
    }
}

I don't know how well this plays with

unittest
{
    import dbg;
    dln(randomized!dchar);
    dstring d = "alphaalphaalphaalphaalphaalphaalphaalphaalphaalpha";
    dln(d.randomize);
}

though. See complete logic at https://github.com/nordlow/justd/blob/master/random_ex.d
Jul 17 2014