digitalmars.D - Random string samples & unicode

reply bearophile <bearophileHUGS lycos.com> writes:
Taking a random sample without replacement is a very common need. For
example, this is how in Python 2.x I create a fixed-size random string, sampled
without replacement from an input string of chars:

from random import sample
d = "0123456789"
print "".join(sample(d, 2))


This seems to be similar D2 code:

import std.stdio, std.random, std.array, std.range;
void main() {
    dchar[] d = "0123456789"d.dup;
    dchar[] res = array(take(randomCover(d, rndGen), 2));
    writeln(res);
}


There, randomCover() doesn't work with a string, a dstring, or a char[]. If
you later need to process that res dchar[] with std.string, you will have
trouble.
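
A minimal sketch of one way around this, assuming a recent Phobos in which randomCover() accepts any random-access range and has an overload that uses the default generator. The helper name sampleChars is hypothetical and not from the thread. It decodes the input into code points, samples those, and re-encodes the result, so the caller only ever sees a string:

```d
import std.array : array;
import std.conv : to;
import std.random : randomCover;
import std.range : take;
import std.stdio : writeln;

// Hypothetical helper (not from the thread): returns n distinct code points
// of s, in random order, re-encoded as a UTF-8 string.
string sampleChars(string s, size_t n)
{
    dstring points = s.to!dstring;   // one array element per code point
    auto picked = randomCover(points).take(n).array;
    return picked.to!string;         // transcode back to UTF-8
}

void main()
{
    writeln(sampleChars("0123456789", 2));
}
```

The cost is two conversions per call, which is probably acceptable for small script-like programs but may matter in a hot loop.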


But randomShuffle() is able to shuffle a char[] in place:

import std.stdio, std.random;
void main() {
    char[] d = "0123456789".dup;
    randomShuffle(d);
    writeln(d);
}


If randomCover() receives a char[], I think in theory it has to yield its
chars in shuffled order. And if it receives a string, it has to yield its
shuffled dchars (converted from the chars). A string may contain UTF-8 chars
that are longer than 1 byte, but a char[] is not a string, and if you want its
items in random order, it has to act like randomShuffle().
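
To make the concern concrete, here is a small sketch of what can go wrong when code units rather than code points are reordered. The helper names isValidUtf8 and reverseUnits are hypothetical, and reversing the units is only a deterministic stand-in for an unlucky shuffle:

```d
import std.algorithm.mutation : reverse;
import std.range : walkLength;
import std.string : representation;
import std.utf : UTFException, validate;

// Hypothetical helper: true if s is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

// Hypothetical helper: reverses the UTF-8 code units (bytes) of s,
// ignoring code point boundaries.
char[] reverseUnits(string s)
{
    char[] units = s.dup;
    reverse(units.representation);   // in-place reversal of the raw bytes
    return units;
}

void main()
{
    string e = "\u00E9";                      // one code point, two code units
    assert(e.length == 2 && e.walkLength == 1);
    assert(isValidUtf8(reverseUnits("0123")));   // ASCII survives reordering
    assert(!isValidUtf8(reverseUnits(e)));       // a multi-byte sequence does not
}
```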

My head hurts, and I don't know what the right thing to do is.

Maybe I have to work with ubyte[] instead of char[], and add casts:

import std.stdio, std.random, std.array, std.range;
void main() {
    char[] d = "0123456789".dup;
    char[] res = cast(char[])array(take(randomCover(cast(ubyte[])d, rndGen), 2));
    writeln(res);
}
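
A cast-free variant of the same idea can be sketched with std.string.representation and std.string.assumeUTF, assuming a Phobos recent enough to provide both. The helper name sampleAscii and the ASCII check are additions for illustration, not something proposed in the thread:

```d
import std.algorithm.searching : all;
import std.array : array;
import std.exception : enforce;
import std.random : randomCover;
import std.range : take;
import std.stdio : writeln;
import std.string : assumeUTF, representation;

// Hypothetical helper: samples n distinct bytes from an ASCII-only string
// and returns them as a string, without explicit casts.
string sampleAscii(string s, size_t n)
{
    immutable(ubyte)[] bytes = s.representation;   // view the chars as bytes
    enforce(bytes.all!(b => b < 0x80), "input must be ASCII");
    auto picked = randomCover(bytes.dup).take(n).array;
    return picked.assumeUTF.idup;                  // bytes back to chars
}

void main()
{
    writeln(sampleAscii("0123456789", 2));
}
```

The ASCII check is what makes the byte-level sampling safe; without it the result could contain broken multi-byte sequences.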


Ideas welcome.

Bye,
bearophile
Sep 10 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
 There randomCover() doesn't work with a string, a dstrings or with a char[].
 If later you need to process that res dchar[] with std.string you will have
troubles.
The problems are more widespread, this is a simple generator of terms of the
"look and say" sequence (to generate a member of the sequence from the previous
member, read off the digits of the previous member, counting the number of
digits in groups of the same digit:
http://en.wikipedia.org/wiki/Look_and_say_sequence ):

import std.stdio, std.conv, std.algorithm;

string lookAndSay(string input) {
    string result;
    foreach (g; group(input))
        result ~= to!string(g._1) ~ (cast(char)g._0);
    return result;
}

void main() {
    string last = "1";
    writeln(last);
    foreach (i; 0 .. 10) {
        last = lookAndSay(last);
        writeln(last);
    }
}

I was not able to remove that cast(char), even if I replace all strings in
that program with dstrings.
Is someone else using D2?

Bye,
bearophile
Sep 11 2010
parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
I think this might be a compiler bug:


import std.conv : to;

void main()
{
    string mystring;
    dchar mydchar;

    // ok, appending dchar to string
    mystring ~= mydchar;

    // error:  incompatible types for
    // ((cast(uint)mydchar) ~ (cast(uint)mydchar)): 'uint' and 'uint'
    mystring ~= mydchar ~ mydchar;
}


On Sat, Sep 11, 2010 at 3:42 PM, bearophile <bearophileHUGS lycos.com> wrote:
 There randomCover() doesn't work with a string, a dstrings or with a char[].
 If later you need to process that res dchar[] with std.string you will have
 troubles.
 The problems are more widespread, this is a simple generator of terms of the
 "look and say" sequence (to generate a member of the sequence from the
 previous member, read off the digits of the previous member, counting the
 number of digits in groups of the same digit:
 http://en.wikipedia.org/wiki/Look_and_say_sequence ):

 import std.stdio, std.conv, std.algorithm;

 string lookAndSay(string input) {
     string result;
     foreach (g; group(input))
         result ~= to!string(g._1) ~ (cast(char)g._0);
     return result;
 }

 void main() {
     string last = "1";
     writeln(last);
     foreach (i; 0 .. 10) {
         last = lookAndSay(last);
         writeln(last);
     }
 }

 I was not able to remove that cast(char), even if I replace all strings in
 that program with dstrings.
 Is someone else using D2?

 Bye,
 bearophile
Sep 11 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrej Mitrovic:

 I think this might be a compiler bug:
I'll add it to Bugzilla later. But even if you remove that bug, forcing me to
use dstrings in the whole program is strange. Or maybe it's a good thing, and
the natural state for D programs is to just use dstrings everywhere.

Andrei may offer his opinion on the situation.

Bye,
bearophile
Sep 11 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/11/10 10:24 CDT, bearophile wrote:
 Andrej Mitrovic:

 I think this might be a compiler bug:
 I'll add it to Bugzilla later. But even if you remove that bug, forcing me to
 use dstrings in the whole program is strange. Or maybe it's a good thing, and
 the natural state for D programs is to just use dstrings everywhere.
 Andrei may offer his opinion on the situation.
 Bye,
 bearophile

This goes into "bearophile's odd posts coming now and then".

Andrei
Sep 11 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 This goes into "bearophile's odd posts coming now and then".
You aren't helping solve those problems.

Bye,
bearophile
Sep 11 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/11/10 9:48 CDT, Andrej Mitrovic wrote:
 I think this might be a compiler bug:


 import std.conv : to;

 void main()
 {
      string mystring;
      dchar mydchar;

      // ok, appending dchar to string
      mystring ~= mydchar;

      // error:  incompatible types for
      // ((cast(uint)mydchar) ~ (cast(uint)mydchar)): 'uint' and 'uint'
      mystring ~= mydchar ~ mydchar;
 }

You can't concatenate two integrals.

Andrei
Sep 11 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 You can't concatenate two integrals.
The compiler has full type information, so what's wrong in concatenating two
char or two dchar into a string or dstring?

And I think there are other problems:
http://d.puremagic.com/issues/show_bug.cgi?id=4853

Bye,
bearophile
Sep 11 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
 The compiler has full type information, so what's wrong in concatenating two
 char or two dchar into a string or dstring?
But in C the ~ between two chars has a different meaning, so in D you may at best disallow it.
 And I think there are other problems:
 http://d.puremagic.com/issues/show_bug.cgi?id=4853
So that's invalid, I have closed it.

Using a bit of contortions it's possible to write lookAndSay() with no casts,
but the code is not good still:

import std.stdio, std.conv, std.algorithm;

string lookAndSay(string input) {
    string result;
    foreach (g; group(input)) {
        string s = to!string(g._1);
        s ~= g._0; // string ~ dchar wrong, string ~= dchar good
        result ~= s;
    }
    return result;
}

void main() {
    string last = "1";
    writeln(last);
    foreach (i; 0 .. 10) {
        last = lookAndSay(last);
        writeln(last);
    }
}

Bye,
bearophile
Sep 11 2010
parent bearophile <bearophileHUGS lycos.com> writes:
     foreach (g; group(input)) {
         string s = to!string(g._1);
         s ~= g._0; // string ~ dchar wrong, string ~= dchar good
         result ~= s;
     }
Shorter:

    foreach (g; group(input))
        result ~= text(g._1, g._0);

bearophile
Sep 11 2010
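
For readers trying the text()-based look-and-say generator above on a current compiler, here is a sketch under one assumption: that the tuples yielded by group() are indexed as g[0] (the element) and g[1] (the run length) rather than through the _0/_1 fields used in the thread. Everything else follows the posted code:

```d
import std.algorithm.iteration : group;
import std.conv : text;
import std.stdio : writeln;

// Same algorithm as in the thread; only the tuple access syntax differs.
string lookAndSay(string input)
{
    string result;
    foreach (g; group(input))
        result ~= text(g[1], g[0]);   // run length first, then the digit
    return result;
}

void main()
{
    string last = "1";
    writeln(last);
    foreach (i; 0 .. 10)
    {
        last = lookAndSay(last);
        writeln(last);
    }
}
```

Because text() formats each argument and concatenates the results, neither the cast(char) nor the temporary string is needed.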
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 11 Sep 2010 13:20:25 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 Andrei Alexandrescu:
 You can't concatenate two integrals.
The compiler has full type information, so what's wrong in concatenating two char or two dchar into a string or dstring?
It's ambiguous also:

string s1 = "abc", s2 = "def";
auto x = s1 ~ s2;

would you expect x to be "abcdef" or ["abc", "def"]?

Essentially, one of the arguments to concatenation must be an array type in
order to avoid ambiguity. Fortunately, you can get the results you wish with
the bracket notation:

auto x = [s1, s2];

-Steve
Sep 13 2010
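
The rule Steven describes, together with the append form that bearophile found to work, can be sketched as follows. The helper names pairOf and pairAsArray are hypothetical; the sketch only uses forms mentioned in the thread (array ~ array, array literals, and string ~= dchar):

```d
import std.stdio : writeln;

// Hypothetical helper: builds a string from two characters without casts,
// using only the append form (string ~= dchar).
string pairOf(dchar a, dchar b)
{
    string s;
    s ~= a;   // the dchar is encoded as UTF-8 when appended
    s ~= b;
    return s;
}

// Hypothetical helper: the bracket notation applied to characters.
dchar[] pairAsArray(dchar a, dchar b)
{
    return [a, b];
}

void main()
{
    string s1 = "abc", s2 = "def";
    assert(s1 ~ s2 == "abcdef");          // array ~ array joins the two
    assert([s1, s2] == ["abc", "def"]);   // a literal builds an array of both
    writeln(pairOf('x', 'y'));
    writeln(pairAsArray('x', 'y'));
}
```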
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Jonathan M Davis:

 Well, then in comparing python 3 with D, [...]
If you want to discuss Python 2/Python 3, I think that the Python newsgroup is
a better place. I know this sounds like a somewhat rough answer, but Python in
the end is OT here, and most people here show some ignorance about Python
matters.

-----------------------

Daniel Gibson:
 Can't you just use byte[] for that? If you're 100% sure your string
 only contains ASCII characters, you can just cast it to byte[], feed
 that into algorithms and cast it back to char[] afterwards, I guess.
ubyte[] sounds better :-) (Yes, I'd like D to use sbyte/ubyte names).
Yes, that's what I sometimes do. The usage of ubyte[] is the last possible
solution I have suggested in my first post on this dual thread:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=117206

But that strategy needs casts, I don't like casts a lot, and I don't know how
much SafeD supports casts (and you may want to use SafeD for the typically
small script-like programs used for some intermediate processing of biological
information). I think that generally it's better to use strategies that avoid
casts. dsimcha's AsciiString may be able to reduce the need of casts.

-----------------------

Kagamin:
 Why they're chars but not numbers?
I presume it's mostly a matter of taste. There's no need to use chars here,
but in scripting languages (especially Tcl) you sometimes use strings/chars
even in situations where in C you want to use just numbers. Strings in Python
are very handy to use, safe, compact in both memory and visual representation
on screen.

-----------------------

Steven Schveighoffer:
 Fortunately, you can get the results you wish with the bracket notation:
 auto x = [s1, s2];
Right, thank you.

Bye,
bearophile
Sep 13 2010
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, September 13, 2010 10:45:48 bearophile wrote:
 Jonathan M Davis:
 Well, then in comparing python 3 with D, [...]
If you want to discuss about Python2/Python3 I think that the Python newsgroup is a better place. I know this sounds like a bit rough answer, but Python in the end is OT here, and most people here show some ignorance about Python matters.
I wasn't really trying to discuss Python 2 vs. 3 so much as point out that
while you were lamenting the issues with porting Python 2 code to D, it looks
like the situation that you described for Python 3 is essentially the same as
for D, so it's a non-issue with regards to porting Python 3 code.

- Jonathan M Davis
Sep 13 2010