digitalmars.D - Re: String implementations

bearophile <bearophileHUGS lycos.com> Jan 19 2008

Walter Bright <newshound1 digitalmars.com> Jan 20 2008
Sean Kelly <sean f4.ca> Jan 20 2008

bearophile <bearophileHUGS lycos.com> writes:

Janice Caron:
Also, isn't perl an interpreted language? You can get away with a lot
more in an interpreted language, but you pay the price in speed.

I'm not a Perl expert, and I don't know how well Perl manages Unicode (maybe
Python manages Unicode better than Perl), but Perl was designed to process
text, so if you process strings you will find that Perl is pretty *fast*: it's
easy to write Perl programs that process text faster (and in a more flexible
way) than C++ ones... (Note that Python 3.0 will use Unicode strings by
default.)

For example, Python dicts (AAs) with string keys seem faster than the current
DMD AAs, and that's probably true for the Perl ones too. This was a tiny
example:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=57986
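
A minimal sketch of the kind of workload meant here, a dict keyed by strings,
written in Python. The function name and the sample text are made up for
illustration; this is not the code from the linked article.

```python
def word_counts(text):
    """Count how often each whitespace-separated word occurs in text."""
    counts = {}
    for word in text.split():
        # dict.get returns 0 for a word that has not been seen yet
        counts[word] = counts.get(word, 0) + 1
    return counts


sample = "the cat and the dog and the bird"  # illustrative input only
counts = word_counts(sample)
print(counts["the"], counts["and"], counts["cat"])
```

The D counterpart would be an associative array such as int[char[]], filled
in the same kind of loop.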

Perl and Python have well-refined GCs, so they may be faster than the current
DMD GC if you manage a lot of strings. This was an example where D was slower
than Python too:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=62369

With Python you can also use Psyco, which is a JIT, to speed it up, etc. Psyco
uses tricks to avoid actually copying strings and string slices in most cases,
as D does too. This works because Python strings are immutable (plain Python
copies them when you perform a slice).
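
To make the difference between a copying slice and a non-copying one concrete,
here is a small Python sketch. It does not show how Psyco works internally; it
uses the standard memoryview type on a bytes object (str has no such view) as
a stand-in for a slice that only records offsets. The helper name is
hypothetical.

```python
def slice_without_copy(buf, start, stop):
    """Return a view of buf[start:stop] that shares memory with buf."""
    return memoryview(buf)[start:stop]


data = b"ACGT" * 1000

# A plain slice of a bytes object builds a new object holding a copy.
copied = data[4:4000]

# A memoryview slice only refers to a range of the original buffer.
view = slice_without_copy(data, 4, 4000)

print(bytes(view) == copied)  # same content
print(view.obj is data)       # the view still points at the original object
```

Sharing memory like this is only safe because the underlying data cannot
change, which is why immutability matters for this trick.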

REs in current DMD are *way* slower than the Perl/Python/Tcl ones. Some time
ago I found a situation where the RE sub() of D looks O(n^2):
http://shootout.alioth.debian.org/gp4/benchmark.php?test=regexdna&lang=dlang&id=4

String methods of Python are written in really refined C, like this one:
http://effbot.org/zone/stringlib.htm
And they are usually faster than the unrefined versions you can find in the
current Phobos. I have implemented, and I use, a fastJoin, an xsplit, etc.,
that are faster than the Phobos ones.

The built-in sort of Python is Timsort, which is way faster than the D
built-in (I have written a rather simple sort that is up to 3 times faster than
the built-in one in D, and it's always faster no matter what data I use).
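
One reason Timsort does well on real data is that it detects runs that are
already ordered. The Python sketch below counts comparisons rather than time,
so it says nothing about D's sort; the Counted wrapper and the input sizes are
illustrative assumptions.

```python
import random


class Counted:
    """Wraps a value and counts every '<' comparison made on it."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def count_comparisons(values):
    """Return how many comparisons sorted() performs on values."""
    Counted.comparisons = 0
    sorted(Counted(v) for v in values)
    return Counted.comparisons


n = 1000
ordered = list(range(n))
shuffled = ordered[:]
random.Random(0).shuffle(shuffled)  # fixed seed so runs are repeatable

print("already sorted:", count_comparisons(ordered))
print("shuffled:      ", count_comparisons(shuffled))
```

On already sorted input the count stays close to n, while shuffled input needs
on the order of n log n comparisons.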

Now and then the text I/O on disk of the current DMD is slower than Python's;
this comes from some of my benchmarks.

I know all those parts of DMD can be improved later. When you create a new
language you can't (and you don't want to) optimize every little bit (because
it may be premature optimization); optimization must come later, so I
understand Walter in this regard. But all this is just to show you that if
today you have to process a lot of text in a very flexible way, it's not easy
to beat the languages designed for it, like Perl (but Python/Ruby/Tcl too.
Ruby is less good than Python for Unicode texts, I think).

If you take a look near the bottom of this thread:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/0b3ded6d0f494d06/0068cb1406ab9e4c
you can see that I'd like to use D to speed up some text-processing-related
bioinformatics scripts of mine, but often I find that the Python programs are
faster for that purpose ;-)

Bye,
bearophile

Jan 19 2008

Walter Bright <newshound1 digitalmars.com> writes:

bearophile wrote:
 The built-in sort of Python is Timsort, which is way faster than
 the D built-in (I have written a rather simple sort that is up to 3
 times faster than the built-in one in D, and it's always faster no
 matter what data I use).


D's sort is in phobos/internal/qsort.d and qsort.d. If you have a faster 
qsort, and want to contribute it, please do so! Same goes for other 
faster routines you've written.

 Now and then the text I/O on disk of the current DMD is slower than
 Python's; this comes from some of my benchmarks.


The D 2.0 I/O is much faster than the 1.0 I/O. But it still suffers a 
bit from the requirement (I imposed) of being compatible with C stdio. I 
don't know if Python does this or not.

 
 I know all those parts of DMD can be improved later. When you create
 a new language you can't (and you don't want to) optimize every
 little bit (because it may be premature optimization); optimization
 must come later, so I understand Walter in this regard. But all this
 is just to show you that if today you have to process a lot of text
 in a very flexible way, it's not easy to beat the languages designed
 for it, like Perl (but Python/Ruby/Tcl too. Ruby is less good than
 Python for Unicode texts, I think).


I don't believe there are any fundamental reasons why D string 
processing should be slower; it's just a matter of spending the effort on it.

Jan 20 2008

Sean Kelly <sean f4.ca> writes:

bearophile wrote:
 The built-in sort of Python is Timsort, which is way faster than the D
 built-in (I have written a rather simple sort that is up to 3 times faster
 than the built-in one in D, and it's always faster no matter what data I use).


I'd be interested in seeing that.  I've been able to beat the D sort for
some data sets and match it in others, but not beat it across the board.


Sean

Jan 20 2008
