digitalmars.D - Random string samples & unicode

digitalmars.D - Random string samples & unicode - Reprise

bearophile (40/41) Sep 12 2010 I assume you have missed most of the things I was trying to say, maybe y...

Jonathan M Davis (20/47) Sep 12 2010 You do seem to try to do a lot of things that most other folks never eve...

Andrei Alexandrescu (5/25) Sep 12 2010 I think it's not difficult to infer I wouldn't advocate using 32 bits

bearophile (5/7) Sep 12 2010 Yet, using std.algorithm on strings you may end doing that.

bearophile (6/13) Sep 12 2010 I see.

Jonathan M Davis (15/28) Sep 12 2010 Well, I don't think that I've ever seen a program that did that sort of ...

bearophile (5/9) Sep 12 2010 I understand, this is probably the answer I was looking for, thank you :...

dsimcha (11/20) Sep 12 2010 I think what we need here is an AsciiString type. Such a type would be ...

Jonathan M Davis (9/38) Sep 12 2010 It's not necessarily a bad idea, but I'm not sure that we want to encour...

bearophile (5/11) Sep 12 2010 On the other hand there are situations when you know you are dealing jus...

Daniel Gibson (13/24) Sep 13 2010 Can't you just use byte[] for that? If you're 100% sure your string

Brad Roberts (4/10) Sep 12 2010 Existence != common.

bearophile (4/7) Sep 12 2010 Please Brad. I didn't mean that you are able to find thousands of string...

Andrej Mitrovic (9/16) Sep 12 2010 The "".join idiom itself is widespread (amongst those who know about

Walter Bright (6/10) Sep 17 2010 Yes, taking random substring samples seems very obscure to me.

Andrei Alexandrescu (6/17) Sep 12 2010 No, you end up having string-processing code dealing with ranges of

bearophile (5/10) Sep 12 2010 Right. My code was written in Python 2.x. In Python 3.x the situation is...

Jonathan M Davis (21/40) Sep 12 2010 Personally, I've had to use strict functions rather than lazy ones in ha...

Andrei Alexandrescu (17/35) Sep 12 2010 Well it's not that common code. How often would one need to generate a

Jonathan M Davis (4/10) Sep 12 2010 Skipping the array() call and going straight to to!string() would certai...
bearophile (15/36) Sep 12 2010 It's not easy to give a good answer to this question. In Python it's nor...

Kagamin (3/11) Sep 12 2010 Why they're chars but not numbers?
Walter Bright (3/9) Sep 17 2010 Generate a 4 digit random integer and convert it to a string. It's proba...

Jérôme M. Berger (8/22) Sep 18 2010 Except that the Python version ensures that you don't have the same

Walter Bright (2/4) Sep 18 2010 I didn't know sample() did that.
Kagamin (2/7) Sep 18 2010 So this trick is not good for captcha and password generation.

dsimcha (13/44) Sep 12 2010 Just to clear up some confusion, I specialized array() for narrow string...

bearophile (5/6) Sep 12 2010 Good. I was thinking about something similar, but your details are more ...
Pelle (17/61) Sep 13 2010 pp ~% python

bearophile (11/18) Sep 13 2010 On the other hand D/Phobos/DMD have several thousand problems, small, bi...

Walter Bright (2/3) Sep 18 2010 Python has 2507 open issues. http://bugs.python.org/

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:
 This goes into "bearophile's odd posts coming now and then".

I assume you have missed most of the things I was trying to say, maybe you have
not even read the original post. So I try to explain better a subset of the
things I have written.

This is a quite common piece of Python code:

from random import sample
d = "0123456789"
print "".join(sample(d, 2))


I need to perform the same thing in D.
For me it's not easy to do that in D2 with Phobos2.

This doesn't work:

import std.stdio, std.random, std.array, std.range;
void main() {
    string d = "0123456789";
    string res = array(take(randomCover(d, rndGen), 2));
    writeln(res);
}

It returns:
test.d(4): Error: cannot implicitly convert expression
(array(take(randomCover(d,rndGen()),2u))) of type dchar[] to string


If I change it like this:

import std.stdio, std.random, std.array, std.range;
void main() {
    string d = "0123456789";
    dchar[] res = array(take(randomCover(d, rndGen), 2));
    writeln(res);
}

It doesn't work, and gives a cloud of errors:

...\dmd\src\phobos\std\random.d(890): Error:
cast(dchar)(this._input[this._current]) is not an lvalue
...\dmd\src\phobos\std\random.d(907): Error: template std.random.uniform(string
boundaries = "[)",T1,T2,UniformRandomNumberGenerator) if
(is(CommonType!(T1,UniformRandomNumberGenerator) == void) &&
!is(CommonType!(T1,T2) == void)) does not match any function template
declaration
...\dmd\src\phobos\std\random.d(907): Error: template std.random.uniform(string
boundaries = "[)",T1,T2,UniformRandomNumberGenerator) if
(is(CommonType!(T1,UniformRandomNumberGenerator) == void) &&
!is(CommonType!(T1,T2) == void)) cannot deduce template function from argument
types !()(int,uint,MersenneTwisterEngine!(uint,32,624,397,31,-1727483681u,11,7,-1658038656u,15,-272236544u,18))


If I replace the d string with a dchar[], it works:

import std.stdio, std.random, std.array, std.range;
void main() {
    dchar[] d = "0123456789"d.dup;
    dchar[] res = array(take(randomCover(d, rndGen), 2));
    writeln(res);
}


But now all strings in this little program are dchar arrays.

What I am trying to say is that with the recent changes to the management of
strings in std.algorithm, when you use strings and char arrays and run
algorithms over them, dchar becomes viral, and most of the code ends up
composed of dchar arrays or dstrings (unless you cast things back to
char[]/string, and I don't know if this is possible in SafeD).

Do you understand now? Am I mistaken?

Bye,
bearophile

Sep 12 2010

Jonathan M Davis <jmdavisprog gmail.com> writes:

On Sunday 12 September 2010 17:09:04 bearophile wrote:
 Andrei Alexandrescu:
 This goes into "bearophile's odd posts coming now and then".

 
 I assume you have missed most of the things I was trying to say, maybe you
 have not even read the original post. So I try to explain better a subset
 of the things I have written.
 
 This is a quite common piece of Python code:
 
 from random import sample
 d = "0123456789"
 print "".join(sample(d, 2))

You do seem to try to do a lot of things that most other folks never even think 
of doing, let alone have a need to. This is one of them. That's probably why 
Andrei reacted the way that he did.

 I need to perform the same thing in D.
 For me it's not easy to do that in D2 with Phobos2.
 
 This doesn't work:
 
 import std.stdio, std.random, std.array, std.range;
 void main() {
     string d = "0123456789";
     string res = array(take(randomCover(d, rndGen), 2));
     writeln(res);
 }
 
 It returns:
 test.d(4): Error: cannot implicitly convert expression
 (array(take(randomCover(d,rndGen()),2u))) of type dchar[] to string

I've found that if you want a string out of array(), what you need to do is
to!string(array(...)). I don't know about this particular case, and it's a bit
annoying - particularly when you started with a string in the first place - so
perhaps take(), until(), and the others like them that have this problem
should be altered so that array() would produce a string if you passed them a
string, but for the moment to!string seems to be the solution.

I would point out, however, that if you're trying to grab random characters
from a string, that's likely to work best with a dstring because it supports
random access, so there's a decent chance that dstring is really what you want
anyway, and trying to use string is just going to mean a lot of conversions no
matter how well put together the Phobos functions are, simply because the
underlying algorithm works best with random access and string doesn't provide
it. Just one of the irritations of UTF-8 vs UTF-16 vs UTF-32. Unicode is
wonderful and Unicode sucks. At least D handles it explicitly as part of the
language, which is a big improvement over languages like C, C++, or Java.

- Jonathan M Davis

Sep 12 2010
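
The random-access point in the post above can be illustrated with a small
sketch. It is written in Python 3 (not the thread's Python 2) and is only an
illustration, not part of the original discussion: it counts how many code
units the same text needs in UTF-8, UTF-16 and UTF-32. The sample text is an
arbitrary choice.

```python
# Illustration only: code units needed per encoding for the same text.
# The text has 10 code points: 7 ASCII characters, two other BMP
# characters (U+00EF and U+20AC) and one outside the BMP (U+1D11E).
text = "na\u00efve \u20ac5 \U0001D11E"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")   # "-le" avoids a byte-order mark
utf32 = text.encode("utf-32-le")

code_points = len(text)            # a Python 3 str indexes by code point
utf8_units = len(utf8)             # 1-byte code units
utf16_units = len(utf16) // 2      # 2-byte code units
utf32_units = len(utf32) // 4      # 4-byte code units

# Only UTF-32 has exactly one code unit per code point, so only there does
# "the i-th code unit" always mean "the i-th code point".
print(code_points, utf8_units, utf16_units, utf32_units)  # 10 16 11 10
```

In D terms, string, wstring and dstring correspond to the UTF-8, UTF-16 and
UTF-32 rows, which is why only dstring offers constant-time indexing by code
point.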

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 09/12/2010 07:28 PM, Jonathan M Davis wrote:
 On Sunday 12 September 2010 17:09:04 bearophile wrote:
 Andrei Alexandrescu:
 This goes into "bearophile's odd posts coming now and then".

 I assume you have missed most of the things I was trying to say,
 maybe you have not even read the original post. So I try to explain
 better a subset of the things I have written.

 This is a quite common piece of Python code:

 from random import sample
 d = "0123456789"
 print "".join(sample(d, 2))

 You do seem to try to do a lot of things that most other folks never
 even think of doing, let alone have a need to. This is one of them.
 That's probably why Andrei reacted the way that he did.

No, it's not that at all. It's just this:

 I'll add it to Bugzilla later. But even if you remove that bug,
 forcing me to use dstrings in the whole program is strange. Or maybe
 it's a good thing, and the natural state for D programs is to just
 use dstrings everywhere. Andrei may offer his opinion on the
 situation.

I think it's not difficult to infer I wouldn't advocate using 32 bits 
characters everywhere.


Andrei

Sep 12 2010

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:
 I think it's not difficult to infer I wouldn't advocate using 32 bits 
 characters everywhere.

Yet, using std.algorithm on strings you may end up doing that.

How exactly do you suggest I translate something like the original Python
code to D?

Bye,
bearophile

Sep 12 2010

bearophile <bearophileHUGS lycos.com> writes:

Jonathan M Davis:
 You do seem to try to do a lot of things that most other folks never even think
 of doing, let alone have a need to. This is one of them.

Choosing a few random chars out of a sequence of possible chars is very normal in
Python, it's even a common thing.


 I've found that if you want a string out of array(), what you need to do is
 to!string(array(...)).

I see.


 if you're trying to grab random characters from 
 a string, that's likely to work best with a dstring because it supports random 
 access, so there's a decent chance that dstring is really what you want anyway,

This may be right. But as I have tried to explain two times, you may end up
having string-processing code made mostly of dstrings.

Bye,
bearophile

Sep 12 2010

Jonathan M Davis <jmdavisprog gmail.com> writes:

On Sunday 12 September 2010 18:13:47 bearophile wrote:
 Jonathan M Davis:
 You do seem to try to do a lot of things that most other folks never even
 think of doing, let alone have a need to. This is one of them.

 
 Choosing few random chars out of a sequence of possible chars is very
 normal in Python, it's even a common thing.

Well, I don't think that I've ever seen a program that did that sort of thing.
Of course, I don't program in Python (I bought a book on it but haven't gotten
around to reading it yet), but I suspect that either it's simply an artifact of
what you are trying to do as opposed to what's typical in Python, or that it's
something that's typical to do in Python but not in other languages (which
could be an artifact of which language people use for which task).

 if you're trying to grab random characters from
 a string, that's likely to work best with a dstring because it supports
 random access, so there's a decent chance that dstring is really what
 you want anyway,

 
 This may be right. But as I have tried to explain two times, you may end up
 having string-processing code made mostly of dstrings.

Well, yes. That would be a side effect of using algorithms that need dstrings.
If the algorithms that you're using are primarily random access-based, then
naturally, most of your code will end up using dstrings rather than strings.
There's not really any way around that unless you want to keep translating back
and forth. If your string processing doesn't require random access, then you
avoid the problem, but as long as it needs random access, you're pretty much
stuck.

- Jonathan M Davis

Sep 12 2010

bearophile <bearophileHUGS lycos.com> writes:

Jonathan M Davis:
 Well, I don't think that I've ever seen a program that did that sort of thing.

It's common Python code (and maybe in the future it will be common D2 code). In
another answer I have given a few examples to Andrei.


 If your string processing doesn't require random access, then you 
 avoid the problem, but as long as it needs random access, you're pretty much 
 stuck.

I understand, this is probably the answer I was looking for, thank you :-)

Bye,
bearophile

Sep 12 2010

dsimcha <dsimcha yahoo.com> writes:

== Quote from bearophile (bearophileHUGS lycos.com)'s article
 Jonathan M Davis:
 Well, I don't think that I've ever seen a program that did that sort of thing.

 It's common Python code (and maybe in future it will be common D2 code). In
 another answer I have given few examples to Andrei.

 If your string processing doesn't require random access, then you
 avoid the problem, but as long as it needs random access, you're pretty much
 stuck.

 I understand, this is probably the answer I was looking for, thank you :-)
 Bye,
 bearophile

I think what we need here is an AsciiString type.  Such a type would be a thin
wrapper over char[], or maybe immutable(char)[] for added safety.  On
construction it would enforce that the underlying string does not contain any
multi-byte characters.  It would only allow appending of chars, not wchars or
dchars.  If you appended a regular string to it, it would throw if the
appended string contained any characters that couldn't be represented in a
single byte.  It would be a random access range of chars with lvalue elements,
and would provide a way of documenting the assumption that you're only working
with ASCII, and a mechanism for verifying this assumption at runtime.

Sep 12 2010
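
The AsciiString proposal above might look roughly like this. This is a
hypothetical sketch in Python 3, not a Phobos type and not dsimcha's design:
the class name follows the post, while the method names and the choice of a
bytearray as backing store are assumptions made for illustration.

```python
class AsciiString:
    """Hypothetical sketch of the AsciiString idea; not a Phobos type."""

    def __init__(self, data=""):
        # Verify the ASCII-only assumption once, at construction time.
        self._bytes = bytearray(self._check(data))

    @staticmethod
    def _check(data):
        # str.encode("ascii") raises UnicodeEncodeError on any non-ASCII
        # character, which gives the runtime verification the post asks for.
        return data.encode("ascii")

    def append(self, data):
        # Appending a regular string re-checks it and raises if needed.
        self._bytes += self._check(data)

    def __len__(self):
        return len(self._bytes)

    def __getitem__(self, i):
        # Integer indices only. O(1): one byte is always one character.
        return chr(self._bytes[i])

    def __setitem__(self, i, ch):
        # Elements are assignable in place, like lvalue elements in D.
        encoded = self._check(ch)
        if len(encoded) != 1:
            raise ValueError("expected a single ASCII character")
        self._bytes[i] = encoded[0]

    def __str__(self):
        return self._bytes.decode("ascii")
```

For example, AsciiString("0123") accepts append("45") and s[0] = "9", while
constructing it from a string containing a non-ASCII character raises
UnicodeEncodeError.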

Jonathan M Davis <jmdavisprog gmail.com> writes:

On Sunday 12 September 2010 19:15:10 dsimcha wrote:
 == Quote from bearophile (bearophileHUGS lycos.com)'s article
 
 Jonathan M Davis:
 Well, I don't think that I've ever seen a program that did that sort of
 thing.

 It's common Python code (and maybe in future it will be common D2 code).
 In another answer I have given few examples to Andrei.

 If your string processing doesn't require random access, then you
 avoid the problem, but as long as it needs random access, you're pretty
 much stuck.

 I understand, this is probably the answer I was looking for, thank you
 :-) Bye,
 bearophile

 
 I think what we need here is an AsciiString type.  Such a type would be a
 thin wrapper over char[], or maybe immutable(char)[] for added safety.  On
 construction it would enforce that the underlying string does not contain
 any multiple byte characters.  It would only allow appending of chars, not
 wchars or dchars.  If you appended a regular to it, it would throw if the
 appended string contained any characters that couldn't be represented in a
 single byte.  It would be a random access range of chars with lvalue
 elements, and would provide a way of documenting the assumption that
 you're only working with ASCII, and a mechanism for verifying this
 assumption at runtime.

It's not necessarily a bad idea, but I'm not sure that we want to encourage
code that assumes ASCII. It's far too easy for English-speaking programmers to
end up making that assumption in their code and then run into problems later
when they unexpectedly end up with Unicode characters in their input, or have
to change their code to work with Unicode. I'm inclined to force the issue and
keep the status quo that _all_ strings in D are Unicode of some variety. There's
far too much code out there which is not Unicode compliant when it should be.

- Jonathan M Davis

Sep 12 2010

bearophile <bearophileHUGS lycos.com> writes:

Jonathan M Davis:

 It's not necessarily a bad idea,

I don't know if it's a good idea.


 but I'm not sure that we want to encourage code
 that assumes ASCII. It's far too easy for English-speaking programmers to end up
 making that assumption in their code and then they run into problems later when
 they unexpectedly end up with unicode characters in their input, or they have to
 change their code to work with unicode.

On the other hand there are situations when you know you are dealing just with
digits, or a few predetermined symbols like ()+-*/", or when you process very
large biological strings that are composed of a restricted and limited number
of different ASCII chars.

Bye,
bearophile

Sep 12 2010

Daniel Gibson <metalcaedes gmail.com> writes:

On Mon, Sep 13, 2010 at 4:50 AM, bearophile <bearophileHUGS lycos.com> wrote:
 Jonathan M Davis:

 It's not necessarily a bad idea,

 I don't know if it's a good idea.


 but I'm not sure that we want to encourage code
 that assumes ASCII. It's far too easy for English-speaking programmers to end up
 making that assumption in their code and then they run into problems later when
 they unexpectedly end up with unicode characters in their input, or they have to
 change their code to work with unicode.

 On the other hand there are situations when you know you are dealing just
 with digits, or few predetermined symbols like ()+-*/", or when you process
 very large biological strings that are composed by a restricted and limited
 number of different ASCII chars.
 Bye,
 bearophile

Can't you just use byte[] for that? If you're 100% sure your string
only contains ASCII characters, you can just cast it to byte[], feed
that into algorithms and cast it back to char[] afterwards, I guess.

Cheers,
- Daniel

Sep 13 2010
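
The byte-level round trip suggested above can be sketched in Python 3, with
one difference that should be stated plainly: a D cast to byte[] checks
nothing, whereas encode("ascii") here raises on non-ASCII input. The symbol
set, the variable names and the fixed seed are assumptions chosen for
illustration.

```python
import random

symbols = "()+-*/0123456789"

# Validating step: raises UnicodeEncodeError if anything is not ASCII.
raw = symbols.encode("ascii")

# Work on plain bytes: indexing and sampling are per byte, no decoding.
rng = random.Random(42)               # fixed seed only for repeatable runs
picked = bytes(rng.sample(raw, 4))    # four distinct positions of raw

# Convert back to text once the byte-level work is done.
result = picked.decode("ascii")
print(result)
```

Because every ASCII character is exactly one byte, the sampled bytes always
decode back to four valid characters.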

Brad Roberts <braddr puremagic.com> writes:

On 9/12/2010 7:09 PM, bearophile wrote:
 Jonathan M Davis:
 Well, I don't think that I've ever seen a program that did that sort of
 thing.

 
 It's common Python code (and maybe in future it will be common D2 code). In
 another answer I have given few examples to Andrei.

Existence != common.

27 hits among millions != common.

I think your viewpoint might be a little skewed.

Sep 12 2010

bearophile <bearophileHUGS lycos.com> writes:

Brad Roberts:
 Existence != common.
 27 hits among millions != common.
 I think your viewpoint might be a little skewed.

Please Brad. I didn't mean that you are able to find thousands of occurrences
of the string "".join(sample(d, 2)) in Python code around the world. What I
meant to say is that in Python2 that's a very natural idiom. I have no idea how
to demonstrate this last statement of mine.

Bye,
bearophile

Sep 12 2010

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

The "".join idiom itself is widespread (amongst those who know about
it, at least). It's mentioned in several books and Python tutorials.
As for taking random string samples, I've never used it so I can't
judge whether it's common or not.

On Mon, Sep 13, 2010 at 4:34 AM, bearophile <bearophileHUGS lycos.com> wrote:
 Brad Roberts:
 Existence != common.
 27 hits among millions != common.
 I think your viewpoint might be a little skewed.

 Please Brad. I didn't mean that you are able to find thousands of strings
 """.join(sample(d, 2))" in Python code around the world. What I meant to say
 is that in Python2 that's a very natural idiom. I have no idea how to
 demonstrate this last statement of mine.
 Bye,
 bearophile

Sep 12 2010

Walter Bright <newshound2 digitalmars.com> writes:

Andrej Mitrovic wrote:
 The "".join idiom itself is widespread (amongst those who know about
 it, at least). It's mentioned in several books and Python tutorials.
 As for taking random string samples, I've never used it so I can't
 judge whether it's common or not.

Yes, taking random substring samples seems very obscure to me.

Sure, taking a random index into a string may wind up in the middle of a UTF8 
sequence. But, in practice, indices into strings are not random. They are the 
result of some other operation on a string, and so they point to the start of a 
UTF8 sequence.

Sep 17 2010
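
The point above about indices landing in the middle of a UTF-8 sequence can
be made concrete with a short Python 3 sketch. The helper names
is_sequence_start and align_to_start are hypothetical, introduced only for
this illustration.

```python
def is_sequence_start(data: bytes, i: int) -> bool:
    """True if data[i] begins a UTF-8 sequence (is not a continuation byte)."""
    # Continuation bytes have the bit pattern 10xxxxxx.
    return (data[i] & 0xC0) != 0x80


def align_to_start(data: bytes, i: int) -> int:
    """Move i left until it points at the first byte of a sequence."""
    while i > 0 and not is_sequence_start(data, i):
        i -= 1
    return i


# "a", U+00E9, U+20AC, "z" take 1 + 2 + 3 + 1 = 7 bytes in UTF-8.
data = "a\u00e9\u20acz".encode("utf-8")

# Offsets where a character begins; the others are mid-sequence.
starts = [i for i in range(len(data)) if is_sequence_start(data, i)]
print(starts)                        # [0, 1, 3, 6]

# An arbitrary offset such as 5 falls inside the 3-byte sequence.
print(align_to_start(data, 5))       # 3

# An index produced by a search already points at a sequence start.
found = data.find("\u20ac".encode("utf-8"))
print(found)                         # 3
```

The last lines show the case described in the post: an index that comes from
another string operation, here a search, is already aligned.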

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 09/12/2010 08:13 PM, bearophile wrote:
 Jonathan M Davis:
 You do seem to try to do a lot of things that most other folks never even think
 of doing, let alone have a need to. This is one of them.

 Choosing few random chars out of a sequence of possible chars is very normal
 in Python, it's even a common thing.


 I've found that if you want a string out of array(), what you need to do is
 to!string(array(...)).

 I see.


 if you're trying to grab random characters from
 a string, that's likely to work best with a dstring because it supports random
 access, so there's a decent chance that dstring is really what you want anyway,

 This may be right. But as I have tried to explain two times, you may end up
 having string-processing code made mostly of dstrings.

No, you end up having string-processing code dealing with ranges of 
dchar. Which is in fact exactly as it should be. If you want to keep the 
comparison with Python complete, Python's support for Unicode also needs 
to be part of the discussion.

Andrei

Sep 12 2010

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:

 No, you end up having string-processing code dealing with ranges of 
 dchar.

Well, in several situations it's better to produce a real string/dstring. Even
in Haskell, which is designed to manage lazy computation well, you sometimes
create eager lists/arrays to simplify the types or the code or to make the code
more deterministic.


 If you want to keep the 
 comparison with Python complete, Python's support for Unicode also needs 
 to be part of the discussion.

Right. My code was written in Python 2.x. In Python 3.x the situation is
different: all strings are Unicode by default (they are all UTF-16 or UTF-32
depending on how you have compiled CPython), and there is a built-in
bytearray, an array of bytes that in some situations is seen as an
ASCII string. So in Python it's like using dstrings everywhere (in Python
there's no char type, a char is a string of length 1) or using lazy generators
of them.

Bye,
bearophile

Sep 12 2010
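
The Python 3 types mentioned in the post above can be tried directly. This is
a minimal sketch with arbitrary sample values: str is a sequence of code
points, bytes is an immutable sequence of integers, and bytearray is its
mutable counterpart.

```python
s = "d\u00e9j\u00e0"          # str: 4 Unicode code points
b = s.encode("utf-8")        # bytes: immutable, integers in 0..255
ba = bytearray(b"0123")      # bytearray: mutable counterpart of bytes

first = s[1]                 # indexing a str gives a str of length 1
first_byte = b[1]            # indexing bytes gives an int, not a character

ba[0] = ord("9")             # in-place mutation is allowed on bytearray
print(len(s), len(b), type(first).__name__, first_byte, ba.decode("ascii"))
```

Indexing the str never splits a character, while indexing the encoded bytes
can return the first byte of a two-byte sequence, as first_byte does here.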

Jonathan M Davis <jmdavisprog gmail.com> writes:

On Sunday 12 September 2010 19:22:02 bearophile wrote:
 Andrei Alexandrescu:
 No, you end up having string-processing code dealing with ranges of
 dchar.

 
 Well, in several situations it's better to produce a real string/dstring.
 Even in Haskell, that is designed to manage lazy computation well, you
 sometimes create eager lists/arrays to simplify the types or the code or
 to make the code more deterministic.

Personally, I've had to use strict functions rather than lazy ones in Haskell
primarily to save memory by forcing the program to actually do the computations
rather than putting them off and piling up the whole list of operations in
memory to possibly do later. When working on my thesis, I had a program which
made me run out of memory - all 4 GB of memory and 6 GB of swap - because it
wasn't processing _any_ of the files that I gave it until it had gotten the
last one. I had to make it process each file and save the result before
processing the next file rather than processing them all and then saving the
result.

 If you want to keep the
 comparison with Python complete, Python's support for Unicode also needs
 to be part of the discussion.

 
 Right. My code was written in Python 2.x. In Python 3.x the situation is
 different, all strings are Unicode on default (they are all UTF 16 or UTF
 32 according to the way you have compiled CPython) (and there is a
 built-in bytearray, that is an array of bytes that in some situations is
 seen as an ASCII string). So in Python it's like using dstrings everywhere
 (in Python there's no char type, it's a string of length 1) or using lazy
 generators of them.

Well, then in comparing Python 3 with D, it would seem like you wouldn't
really lose anything by using dstrings everywhere. Sure, it's nice to be able
to save space by using string, but if it's a comparison between Python and D
and you end up using UTF-32 in both, then it doesn't seem to me that it's all
that big a deal when porting code. Now, in comparing Python 2 and D, that may
be a different issue, but it sounds like the Python 2 strings aren't Unicode,
which could be problematic.

The issues with UTF-8 vs UTF-32 and random access are just a natural
side-effect of having all strings be Unicode. And honestly, I _really_ don't
want having non-Unicode strings to be at all normal in D. The fact that D
forces Unicode is a _good_ thing.

- Jonathan M Davis

Sep 12 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 09/12/2010 07:09 PM, bearophile wrote:
 Andrei Alexandrescu:
 This goes into "bearophile's odd posts coming now and then".

 I assume you have missed most of the things I was trying to say, maybe you
 have not even read the original post. So I try to explain better a subset of
 the things I have written.

 This is a quite common piece of Python code:

 from random import sample
 d = "0123456789"
 print "".join(sample(d, 2))

Well it's not that common code. How often would one need to generate a 
string that contains two random but distinct digits?

 I need to perform the same thing in D.
 For me it's not easy to do that in D2 with Phobos2.

 This doesn't work:

 import std.stdio, std.random, std.array, std.range;
 void main() {
      string d = "0123456789";
      string res = array(take(randomCover(d, rndGen), 2));
      writeln(res);
 }

 It returns:
 test.d(4): Error: cannot implicitly convert expression
 (array(take(randomCover(d,rndGen()),2u))) of type dchar[] to string

The code compiles and runs as written on my system. I think it's David 
Simcha who changed the return type to ForEachType!Range[]. I'm not sure 
I agree with that, as it takes an oddity of foreach that I hoped would 
go away some time and propagates it.

About the original problem: strings are bidirectional ranges of dchar, 
which is the way they ought to be. Algorithms used on top of strings 
will inherently traffic in dchar. If you want to get a string back, this 
should work:

string res = to!string(take(randomCover(d, rndGen), 2));

That doesn't work for a different reason, and is a bug worth filing. In 
fact - no need, I just submitted a fix 
(http://www.dsource.org/projects/phobos/changeset/1988). Thanks for 
bringing this up!


Andrei

Sep 12 2010

Jonathan M Davis <jmdavisprog gmail.com> writes:

On Sunday 12 September 2010 18:25:27 Andrei Alexandrescu wrote:
 string res = to!string(take(randomCover(d, rndGen), 2));
 
 That doesn't work for a different reason, and is a bug worth filing. In
 fact - no need, I just submitted a fix
 (http://www.dsource.org/projects/phobos/changeset/1988). Thanks for
 bringing this up!

Skipping the array() call and going straight to to!string() would certainly 
clean this sort of code up.

- Jonathan M Davis

Sep 12 2010

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:
 from random import sample
 d = "0123456789"
 print "".join(sample(d, 2))

 
 Well it's not that common code. How often would one need to generate a 
 string that contains two random but distinct digits?

It's not easy to give a good answer to this question. In Python it's normal
code, almost common.
Google Code Search gives 27 answers:
http://www.google.com/codesearch?hl=en&lr=&q=%22.join%28sample%28%22+lang%3Apython&sbtn=Search

Think about a "Bulls and cows" game (it's a task on the Rosettacode site). It's
similar to MasterMind: at the beginning you need to generate the secret key,
four random distinct digits, that is used later in the program. The user has
to guess them using the number of right items in the right place, or right
items in the wrong place. To generate the key in Python you may use
"".join(sample(d, 4)).


 The code compiles and runs as written on my system.

Sorry. I have used the normal DMD 2.048, I don't use the svn head :-)


 I think it's David 
 Simcha who changed the return type to ForEachType!Range[]. I'm not sure 
 I agree with that, as it takes an oddity of foreach that I hoped would 
 go away some time and propagates it.

I see. If there is something you don't like about this situation, then I think
it's a good moment to discuss it :-)


 About the original problem: strings are bidirectional ranges of dchar, 
 which is the way they ought to be. Algorithms used on top of strings 
 will inherently traffic in dchar. If you want to get a string back, this 
 should work:
 
 string res = to!string(take(randomCover(d, rndGen), 2));

OK, I accept this (but what you have just said has some consequences). Thank
you for your answer.

With one of my suggestions:
http://d.puremagic.com/issues/show_bug.cgi?id=4851
that line becomes
string res = to!string(take(randomCover(d), 2));


 That doesn't work for a different reason, and is a bug worth filing. In 
 fact - no need, I just submitted a fix 
 (http://www.dsource.org/projects/phobos/changeset/1988). Thanks for 
 bringing this up!

You are welcome and thank you for the answers and the fix.

Bye,
bearophile

Sep 12 2010
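
The "Bulls and cows" use case described in the post above can be sketched as
follows, in Python 3 rather than the thread's Python 2. The key generation is
the idiom from the post; the function names make_key and score, and the
scoring logic, are assumptions added for illustration and assume the guess
also has distinct digits.

```python
from random import sample

DIGITS = "0123456789"


def make_key(length=4):
    """Secret key of distinct digits, using the idiom from the post."""
    return "".join(sample(DIGITS, length))


def score(key, guess):
    """Return (bulls, cows) for a guess of the same length as key."""
    # Bulls: right digit in the right place.
    bulls = sum(k == g for k, g in zip(key, guess))
    # Digits are distinct, so the set intersection counts shared digits;
    # cows are the shared digits that are not bulls.
    cows = len(set(key) & set(guess)) - bulls
    return bulls, cows


key = make_key()
print(key, score(key, key))
```

For instance, score("1234", "1243") gives (2, 2): two digits in the right
place and two right digits in the wrong place.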

Kagamin <spam here.lot> writes:

bearophile Wrote:

 string that contains two random but distinct digits?

 
 It's not easy to give a good answer to this question. In Python it's normal
 code, almost common.
 Google Code Search gives 27 answers:
 http://www.google.com/codesearch?hl=en&lr=&q=%22.join%28sample%28%22+lang%3Apython&sbtn=Search
 

Well, captcha is a good example, but a simple to!string(rand()) is ok as a
password generator.

 Think about a "Bulls and cows" game (it's a task of Rosettacode site), it's
 similar to MasterMind, at the beginning you need to generate the secret key,
 four random distinct digits, that later are used in the program, the user has
 to guess them using the number of right items in the right place, or right
 items in the wrong place. To generate the key in Python you may use
 "".join(sample(d, 4)).
 

Why are they chars rather than numbers?

Sep 12 2010

Walter Bright <newshound2 digitalmars.com> writes:

bearophile wrote:
 Think about a "Bulls and cows" game (it's a task of Rosettacode site), it's
 similar to MasterMind, at the beginning you need to generate the secret key,
 four random distinct digits, that later are used in the program, the user has
 to guess them using the number of right items in the right place, or right
 items in the wrong place. To generate the key in Python you may use
 "".join(sample(d, 4)).

Generate a 4 digit random integer and convert it to a string. It's probably a 
lot more efficient than the Python version.

Sep 17 2010

Jérôme M. Berger <jeberger free.fr> writes:

Walter Bright wrote:
 bearophile wrote:
 Think about a "Bulls and cows" game (it's a task of Rosettacode site),
 it's similar to MasterMind, at the beginning you need to generate the
 secret key, four random distinct digits, that later are used in the
 program, the user has to guess them using the number of right items in
 the right place, or right items in the wrong place. To generate the key
 in Python you may use "".join(sample(d, 4)).

 Generate a 4 digit random integer and convert it to a string. It's
 probably a lot more efficient than the Python version.

	Except that the Python version ensures that you don't have the same
digit twice, which just generating a 4 digits random integer won't...

		Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Sep 18 2010
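
The difference pointed out above, that sample() never repeats a digit while a
random integer can, is easy to check empirically. This Python 3 sketch is an
illustration only: the zero-padded range 0000 to 9999, the trial count and
the seed are assumptions.

```python
import random

rng = random.Random(0)   # seeded only so the experiment is repeatable
digits = "0123456789"
trials = 10_000

sample_repeats = 0
integer_repeats = 0
for _ in range(trials):
    by_sample = "".join(rng.sample(digits, 4))
    by_integer = "%04d" % rng.randrange(10_000)
    # A string repeats a digit if it has fewer than 4 distinct characters.
    sample_repeats += len(set(by_sample)) < 4
    integer_repeats += len(set(by_integer)) < 4

# sample() draws without replacement, so it never repeats a digit.
# A uniform 4-digit string has distinct digits with probability
# 10*9*8*7 / 10**4 = 0.504, so roughly half of those repeat a digit.
print(sample_repeats, integer_repeats)
```

The same arithmetic shows the cost of distinctness: only 5040 of the 10000
possible 4-digit strings can be produced by sample().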

Walter Bright <newshound2 digitalmars.com> writes:

Jérôme M. Berger wrote:
 	Except that the Python version ensures that you don't have the same
 digit twice, which just generating a 4 digits random integer won't...

I didn't know sample() did that.

Sep 18 2010

Kagamin <spam here.lot> writes:

Jérôme M. Berger Wrote:

 Generate a 4 digit random integer and convert it to a string. It's
 probably a lot more efficient than the Python version.

 
 	Except that the Python version ensures that you don't have the same
 digit twice, which just generating a 4 digits random integer won't...

So this trick is not good for captcha and password generation.

Sep 18 2010

dsimcha <dsimcha yahoo.com> writes:

== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 On 09/12/2010 07:09 PM, bearophile wrote:
 Andrei Alexandrescu:
 This goes into "bearophile's odd posts coming now and then".

 I assume you have missed most of the things I was trying to say, maybe you
 have not even read the original post. So I try to explain better a subset of
 the things I have written.
 This is a quite common piece of Python code:

 from random import sample
 d = "0123456789"
 print "".join(sample(d, 2))

 Well it's not that common code. How often would one need to generate a
 string that contains two random but distinct digits?
 I need to perform the same thing in D.
 For me it's not easy to do that in D2 with Phobos2.

 This doesn't work:

 import std.stdio, std.random, std.array, std.range;
 void main() {
      string d = "0123456789";
      string res = array(take(randomCover(d, rndGen), 2));
      writeln(res);
 }

 It returns:
 test.d(4): Error: cannot implicitly convert expression
 (array(take(randomCover(d,rndGen()),2u))) of type dchar[] to string
 The code compiles and runs as written on my system. I think it's David
 Simcha who changed the return type to ForEachType!Range[]. I'm not sure
 I agree with that, as it takes an oddity of foreach that I hoped would
 go away some time and propagates it.

Just to clear up some confusion, I specialized array() for narrow strings so it
always returns a dchar[] instead of using ForeachType.  Therefore, the behavior
is effectively the same as before I changed array() to work with opApply, when
it used ElementType.

I figured there's two use cases for calling array() on a narrow string:  Generic
code and non-generic code.  In generic code you want to be able to assume that
the array returned will be a random access range with lvalue elements like every
array type besides narrow strings is.  In non-generic code you can just use
std.conv to get exactly the type you want.

Sep 12 2010

bearophile <bearophileHUGS lycos.com> writes:

dsimcha:
 I think what we need here is an AsciiString type.

Good. I was thinking about something similar, but your details are more
developed. In Python3 there is something vaguely similar, named bytearray.
(Eventually I may like a Dstring-like struct that contains compressed dchars
and allows random access with [] in O(ln n) time. The compression reduces
cache misses and memory traffic, improving performance in some special
situations. But this type is for later).

Bye,
bearophile

Sep 12 2010

Pelle <pelle.mansson gmail.com> writes:

On 09/13/2010 02:09 AM, bearophile wrote:
 Andrei Alexandrescu:
 This goes into "bearophile's odd posts coming now and then".

 I assume you have missed most of the things I was trying to say, maybe you
have not even read the original post. So I try to explain better a subset of
the things I have written.

 This is a quite common piece of Python code:

 from random import sample
 d = "0123456789"
 print "".join(sample(d, 2))


 I need to perform the same thing in D.
 For me it's not easy to do that in D2 with Phobos2.

 This doesn't work:

 import std.stdio, std.random, std.array, std.range;
 void main() {
      string d = "0123456789";
      string res = array(take(randomCover(d, rndGen), 2));
      writeln(res);
 }

 It returns:
 test.d(4): Error: cannot implicitly convert expression
(array(take(randomCover(d,rndGen()),2u))) of type dchar[] to string


 If I change it like this:

 import std.stdio, std.random, std.array, std.range;
 void main() {
      string d = "0123456789";
      dchar[] res = array(take(randomCover(d, rndGen), 2));
      writeln(res);
 }

 It doesn't work, and gives a cloud of errors:

 ...\dmd\src\phobos\std\random.d(890): Error:
cast(dchar)(this._input[this._current]) is not an lvalue
 ...\dmd\src\phobos\std\random.d(907): Error: template
std.random.uniform(string boundaries = "[)",T1,T2,UniformRandomNumberGenerator)
if (is(CommonType!(T1,UniformRandomNumberGenerator) == void)&& 
!is(CommonType!(T1,T2) == void)) does not match any function template
declaration
 ...\dmd\src\phobos\std\random.d(907): Error: template
std.random.uniform(string boundaries = "[)",T1,T2,UniformRandomNumberGenerator)
if (is(CommonType!(T1,UniformRandomNumberGenerator) == void)&& 
!is(CommonType!(T1,T2) == void)) cannot deduce template function from argument
types !()(int,uint,MersenneTwisterEngine!(uint,32,624,397,31,-1727483681u,11,7,-1658038656u,15,-272236544u,18))


 If I replace the d string with a dchar[], it works:

 import std.stdio, std.random, std.array, std.range;
 void main() {
      dchar[] d = "0123456789"d.dup;
      dchar[] res = array(take(randomCover(d, rndGen), 2));
      writeln(res);
 }


 But now all strings in this little program are dchar arrays.

 What I am trying to say is that with the recent changes to the management of
the strings in std.algorithm, when you use strings and char arrays, and you use
algorithms over them, the dchar becomes viral, and you end using in most of the
code composed dchar arrays or dstrings (unless you cast things back to
char[]/string, and I don't know if this is possible in SafeD).

 Do you understand now? I am mistaken?

 Bye,
 bearophile

pp ~% python
Python 2.6.5 (r265:79063, Apr  1 2010, 05:22:20)
[GCC 4.4.3 20100316 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from random import sample
>>> "äö"
'\xc3\xa4\xc3\xb6'
>>> "".join(sample("äö", 2))
'\xb6\xc3'

Doesn't work with utf8. The D version is clearly superior. :-)

pp ~/dee% cat test.d | tail -50 | head -8
void main() {


     string s = "äö";

     writeln(take(randomCover(to!dstring(s), rndGen), 2));

     return;
pp ~/dee% rdmd test.d
��
pp ~/dee% rdmd test.d
��

Sep 13 2010

bearophile <bearophileHUGS lycos.com> writes:

Pelle:

  >>> from random import sample
  >>> "äö"
 '\xc3\xa4\xc3\xb6'
  >>> "".join(sample("äö", 2))
 '\xb6\xc3'
 
 Doesn't work with utf8. The D version is clearly superior. :-)

On the other hand D/Phobos/DMD have several thousand problems, small, big and
HUGE, that Python lacks :-)

You are using Python 2.6.5, where you need to use unicode strings ("u" prefix).
This works correctly on both Windows and Linux with Python 2.6.6, if your
source code is UTF-8:


from random import sample
print u"äö".encode("utf-8")
print "".join(sample(u"äö", 2)).encode("utf-8")


Strings have been changed in Python 3.x, where unicode strings are the default.
So there is no need to use the "u" prefix.
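
For illustration, a Python 3 sketch of the two behaviours side by side. The function names are hypothetical. Sampling the encoded bytes is only an approximation of what the Python 2 byte string did in Pelle's session:

```python
import random

def sample_text(text, k, rng=random):
    """Sample k distinct code points from a str (the Python 3 default)."""
    return "".join(rng.sample(text, k))

def sample_utf8_bytes(text, k, rng=random):
    """Sample k distinct byte positions from the UTF-8 encoding of text.

    This mimics sampling a Python 2 byte string: multi-byte characters
    can be split, so the result is not always valid UTF-8.
    """
    raw = text.encode("utf-8")
    return bytes(rng.sample(raw, k))

if __name__ == "__main__":
    print(sample_text("äö", 2))  # always the two letters, in some order
    picked = sample_utf8_bytes("äö", 2)
    try:
        print(picked.decode("utf-8"))
    except UnicodeDecodeError:
        print("invalid UTF-8:", picked)
```

Sampling over code points is also what the D version does once the string is converted with to!dstring, which is why it keeps the characters intact.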

Mine was not a comparison, and it didn't have the purpose to show that Python
is better, it was a way to put in the limelight a possible problem with Phobos.

Bye,
bearophile

Sep 13 2010

Walter Bright <newshound2 digitalmars.com> writes:

bearophile wrote:
 On the other hand D/Phobos/DMD have several thousand problems, small, big and
HUGE, that Python lacks :-)

Python has 2507 open issues. http://bugs.python.org/

Sep 18 2010
