
digitalmars.D - Re: String implementations

bearophile <bearophileHUGS lycos.com> writes:
Janice Caron:
 Also, isn't perl an interpreted language? You can get away with a lot
 more in an interpreted language, but you pay the price in speed.

I'm not a Perl expert, and I don't know how well Perl handles Unicode (Python may handle it better than Perl), but Perl was designed to process text, so if you process strings you will find that Perl is pretty *fast*: it's easy to write Perl programs that process text faster (and more flexibly) than C++ ones. (Note that Python 3.0 will use Unicode strings by default.)

For example, Python dicts (AAs) with string keys seem faster than the current DMD AAs, and that's probably true for Perl's hashes too. This was a tiny example: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=57986

Perl and Python have well-refined GCs, so they may be faster than the current DMD GC when you handle lots of strings. This was an example where D was slower than Python too: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=62369

With Python you can also use Psyco, a JIT, to speed things up. Because Python strings are immutable (as D's are), Psyco can use tricks to avoid actually copying strings and string slices in most cases, whereas plain Python copies the data when you take a slice.

REs in the current DMD are *way* slower than Perl/Python/Tcl ones. Some time ago I found a case where D's RE sub() looks O(n^2): http://shootout.alioth.debian.org/gp4/benchmark.php?test=regexdna&lang=dlang&id=4

Python's string methods are written in really refined C, like this: http://effbot.org/zone/stringlib.htm and they are usually faster than the less refined versions in the current Phobos. I have implemented, and I use, a fastJoin, an xsplit, etc., that are faster than the Phobos ones.

Python's built-in sort is Timsort, which is way faster than the D built-in one (I have written a rather simple sort that is up to 3 times faster than D's built-in sort, and it's always faster no matter what data I use).
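To make the Timsort point concrete: the Python sketch below is *not* Timsort itself (it omits minrun, galloping and the merge-stack invariants) and is not the sort mentioned above. It is a plain natural merge sort, assuming only `<` comparisons, that shows the idea Timsort builds on: find the runs that are already ordered and merge them, so nearly sorted input costs close to O(n). The names `find_runs`, `merge` and `natural_merge_sort` are hypothetical.

```python
from typing import List, TypeVar

T = TypeVar("T")


def find_runs(a: List[T]) -> List[List[T]]:
    """Split a into maximal ordered runs, each returned in ascending order."""
    runs: List[List[T]] = []
    i, n = 0, len(a)
    while i < n:
        j = i + 1
        if j < n and a[j] < a[i]:
            # Strictly descending run: reversing it keeps the sort stable
            # because a strictly descending run contains no equal elements.
            while j < n and a[j] < a[j - 1]:
                j += 1
            runs.append(a[i:j][::-1])
        else:
            # Non-descending run.
            while j < n and not (a[j] < a[j - 1]):
                j += 1
            runs.append(a[i:j])
        i = j
    return runs


def merge(left: List[T], right: List[T]) -> List[T]:
    """Stable merge of two ascending lists."""
    out: List[T] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:  # on ties take from the left: stability
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def natural_merge_sort(a: List[T]) -> List[T]:
    """Return a sorted copy of a by repeatedly merging adjacent runs."""
    runs = find_runs(a)
    if not runs:
        return []
    while len(runs) > 1:
        merged: List[List[T]] = []
        for k in range(0, len(runs) - 1, 2):
            merged.append(merge(runs[k], runs[k + 1]))
        if len(runs) % 2 == 1:
            merged.append(runs[-1])  # odd run out waits for the next pass
        runs = merged
    return runs[0]
```

On already sorted or reversed input `find_runs` returns a single run and no merge pass happens, which is the kind of data where a run-adaptive sort gains most over a plain quicksort.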
Now and then the text I/O to disk of the current DMD is slower than Python's; this comes from some of my benchmarks. I know all those parts of DMD can be improved later. When you create a new language you can't (and don't want to) optimize every little bit (that may be premature optimization); optimization must come later, so I understand Walter in this regard. But all this is just to show you that if today you have to process lots of text in a very flexible way, it's not easy to beat languages designed for it, like Perl (but Python/Ruby/Tcl too; Ruby is less good than Python for Unicode text, I think).

If you look near the bottom of this thread: http://groups.google.com/group/comp.lang.python/browse_thread/thread/0b3ded6d0f494d06/0068cb1406ab9e4c you can see that I'd like to use D to speed up some text-processing bioinformatics scripts of mine, but often I find that the Python programs are faster for that purpose ;-)

Bye,
bearophile
Jan 19 2008
Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 The built-in sort of Python is the Timsort, that's way faster than
 the D built-in (I have written a rather simple sort that is up to 3
 times faster than the built in in D, and it's always faster no matter
 what data I use).

D's sort is in phobos/internal/qsort.d and qsort.d. If you have a faster qsort and want to contribute it, please do so! The same goes for other faster routines you've written.
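As a rough illustration of the kind of "faster qsort" being discussed, here is a hypothetical Python sketch (not the code in qsort.d, and not bearophile's sort) of two common quicksort refinements: a median-of-three pivot and an insertion-sort cutoff for small ranges. `CUTOFF = 16` is an assumed value, and `quicksort`, `_quicksort`, `median_of_three` and `insertion_sort` are made-up names. It recurses only into the smaller partition, so stack depth stays O(log n). It only shows the structure; pure Python will not beat `list.sort`, and an actual contribution would be written in D.

```python
from typing import List, TypeVar

T = TypeVar("T")

CUTOFF = 16  # assumed threshold: ranges this small go to insertion sort


def insertion_sort(a: List[T], lo: int, hi: int) -> None:
    """Sort a[lo..hi] (inclusive bounds) in place."""
    for i in range(lo + 1, hi + 1):
        x = a[i]
        j = i - 1
        while j >= lo and x < a[j]:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def median_of_three(a: List[T], lo: int, hi: int) -> T:
    """Order a[lo], a[mid], a[hi] among themselves and return a[mid]."""
    mid = (lo + hi) // 2
    if a[mid] < a[lo]:
        a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]:
        a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]:
        a[mid], a[hi] = a[hi], a[mid]
    return a[mid]


def _quicksort(a: List[T], lo: int, hi: int) -> None:
    while hi - lo + 1 > CUTOFF:
        pivot = median_of_three(a, lo, hi)
        i, j = lo, hi
        # Hoare-style partition: afterwards a[lo..j] <= pivot <= a[i..hi].
        while i <= j:
            while a[i] < pivot:
                i += 1
            while pivot < a[j]:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Recurse into the smaller side and loop on the larger one,
        # so the recursion depth stays O(log n).
        if j - lo < hi - i:
            _quicksort(a, lo, j)
            lo = i
        else:
            _quicksort(a, i, hi)
            hi = j
    insertion_sort(a, lo, hi)


def quicksort(a: List[T]) -> None:
    """Sort the list a in place."""
    _quicksort(a, 0, len(a) - 1)
```

For example, `data = [5, 3, 8, 1]` followed by `quicksort(data)` leaves `data` as `[1, 3, 5, 8]`. Timing a sketch like this against the built-in sort on several data shapes (random, sorted, reversed, many duplicates) is what an "always faster" claim would need to cover.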
 Now and then the text I/O on disk of the current DMD is slower than
 Python, this comes from some of my benchmarks.

The D 2.0 I/O is much faster than the 1.0 I/O. But it still suffers a bit from the requirement (I imposed) of being compatible with C stdio. I don't know if Python does this or not.
 
 I know all those parts of DMD can be improved later. When you create
 a new language you can't (and you don't want to) optimize every
 little bit (because it may be premature optimization), optimization
 must come later, so I understand Walter in this regard. But all this
 is just to show you that if today you have to process lot of text in
 a very flexible way it's not easy to beat the languages like Perl
 (but Python/Ruby/Tcl too. Ruby is less good than Python for Unicode
 texts, I think) designed for it.

I don't believe there are any fundamental reasons why D string processing should be slower; it's just a matter of spending the effort on it.
Jan 20 2008
Sean Kelly <sean f4.ca> writes:
bearophile wrote:
 The built-in sort of Python is the Timsort, that's way faster than the D
 built-in (I have written a rather simple sort that is up to 3 times faster than
 the built in in D, and it's always faster no matter what data I use).

I'd be interested in seeing that. I've been able to beat the D sort for some data sets and match it in others, but not beat it across the board.

Sean
Jan 20 2008