
digitalmars.D - Investigation: downsides of being generic and correct

reply "Dicebot" <m.strashun gmail.com> writes:
I want to bring people who are not on Google+ into this discussion. 
Samuel recently posted some simple bioinformatics experiments 
there, and the poor performance of the Phobos-based snippet 
surprised me.

I explored the issue a bit and reported the results in a blog post 
(the snippets are really small and simple): 
http://dicebot.blogspot.com/2013/05/short-performance-tuning-story.html

One open question remains, though: can D/Phobos do better here? 
Can the Phobos functions in question be changed to improve 
performance, or is creating a bioinformatics-specialized library 
the only practical solution?
May 16 2013
next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Thursday, 16 May 2013 at 10:35:12 UTC, Dicebot wrote:
 One open question remains though - can D/Phobos do better here? 
 Can some changes be done to Phobos functions in question to 
 improve performance or creating bioinformatics-specialized 
 library is only practical solution?
Of course things can be improved. For a start, the pattern could be a template parameter, so that most of the checks are inlined and const-folded. Using count!(c => c=='G' || c=='C')(line) from std.algorithm would probably perform better as well.

Simply put, countchars is just the obvious, naive implementation of the algorithm. It hasn't been tuned at all, and isn't suitable for use in a small kernel like this.
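A minimal sketch of the predicate-based counting suggested above (the helper name is illustrative, not from the thread):

```d
import std.algorithm : count;

// GC-content counting with a compile-time predicate: the lambda is a
// template argument, so the two comparisons can be inlined rather than
// dispatched through countchars' runtime pattern parsing.
size_t gcCount(const(char)[] line)
{
    return line.count!(c => c == 'G' || c == 'C');
}

void main()
{
    assert(gcCount("GATTACA") == 2); // one 'G', one 'C'
}
```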
May 16 2013
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Dicebot:

 I did explore issue a bit and reported results in a blog post 
 (snippets are really small and simple) : 
 http://dicebot.blogspot.com/2013/05/short-performance-tuning-story.html
In the first of his posts I don't see -noboundscheck used, and it compares different algorithms in C++ (a switch) and D (two nested ifs, which are not optimal). In my experience, with some care you can write D code for LDC that is about as fast as equivalent C code, or faster.
 One open question remains though - can D/Phobos do better here?
Of course. Bye, bearophile
May 16 2013
parent "Dicebot" <m.strashun gmail.com> writes:
On Thursday, 16 May 2013 at 11:37:14 UTC, bearophile wrote:
 Dicebot:
 In the first of his posts I don't see -noboundscheck used, and 
 it compares different algorithms from C++ (a switch) and D (two 
 nested ifs, that are not optimal).

 From my experience if you have some care you are able to write 
 D code for LDC that is about as fast as equivalent as C code, 
 or better.
Sure. I am not interested in benchmarks. What made me curious was: what made this code so slow when you _don't_ take that care?
May 17 2013
prev sibling next sibling parent reply "Juan Manuel Cabo" <juanmanuel.cabo gmail.com> writes:
On Thursday, 16 May 2013 at 10:35:12 UTC, Dicebot wrote:
 Want to bring into discussion people that are not on Google+. 
 Samuel recently has posted there some simple experiments with 
 bioinformatics and bad performance of Phobos-based snippet has 
 surprised me.

 I did explore issue a bit and reported results in a blog post 
 (snippets are really small and simple) : 
 http://dicebot.blogspot.com/2013/05/short-performance-tuning-story.html

 One open question remains though - can D/Phobos do better here? 
 Can some changes be done to Phobos functions in question to 
 improve performance or creating bioinformatics-specialized 
 library is only practical solution?
I bet the problem is in readln. Currently, File.byLine() and readln() are extremely slow, because they call fgetc() one char at a time.

I made a "ByLineFast" implementation some time ago that is 10x faster than std.stdio.byLine. It reads lines through rawRead, using buffers instead of going char by char. I don't have the time to make it Phobos-ready (Unicode, etc.), but I'll paste it here for anyone to use (it works perfectly).

--jm

-------------------------------------
module ByLineFast;

import std.stdio;
import std.string: indexOf;
import std.c.string: memmove;

/** Reads by line in an efficient way (10 times faster than
    File.byLine from std.stdio).
    This is accomplished by reading entire buffers (fgetc() is
    not used), and allocating as little as possible.
    The char \n is considered the separator, removing the
    previous \r if it exists.
    The \n is never returned. The \r is not returned if it was
    part of a \r\n (but it is returned if it was by itself).
    The returned string is always a substring of a temporary
    buffer, and must not be stored. If necessary, you must use
    str[] or .dup or .idup to copy to another string.

    Example:
        File f = File("file.txt");
        foreach (string line; ByLineFast(f)) {
            //...process line...
            //Make a copy:
            string copy = line[];
        }

    The file isn't closed when done iterating, unless it was
    the only reference to the file (same as std.stdio.byLine).
    (example: ByLineFast(File("file.txt"))).
*/
struct ByLineFast {
    File file;
    char[] line;
    bool first_call = true;
    char[] buffer;
    char[] strBuffer;

    this(File f, int bufferSize=4096) {
        assert(bufferSize > 0);
        file = f;
        buffer.length = bufferSize;
    }

    @property bool empty() const {
        //It's important to check "line !is null" instead of
        //"line.length != 0", otherwise no empty lines could
        //be returned and the iteration would be closed early.
        if (line !is null) {
            return false;
        }
        if (!file.isOpen) {
            //Clean the buffer to avoid pointer false positives:
            (cast(char[])buffer)[] = 0;
            return true;
        }
        //First read. Determine if it's empty and put the char back.
        auto mutableFP = (cast(File*) &file).getFP();
        auto c = fgetc(mutableFP);
        if (c == -1) {
            //Clean the buffer to avoid pointer false positives:
            (cast(char[])buffer)[] = 0;
            return true;
        }
        if (ungetc(c, mutableFP) != c) {
            assert(false, "Bug in cstdlib implementation");
        }
        return false;
    }

    @property char[] front() {
        if (first_call) {
            popFront();
            first_call = false;
        }
        return line;
    }

    void popFront() {
        if (strBuffer.length == 0) {
            strBuffer = file.rawRead(buffer);
            if (strBuffer.length == 0) {
                file.detach();
                line = null;
                return;
            }
        }
        int pos = strBuffer.indexOf('\n');
        if (pos != -1) {
            if (pos != 0 && strBuffer[pos-1] == '\r') {
                line = strBuffer[0 .. (pos-1)];
            } else {
                line = strBuffer[0 .. pos];
            }
            //Pop the line, skipping the terminator:
            strBuffer = strBuffer[(pos+1) .. $];
        } else {
            //More needs to be read here. Copy the tail of the
            //buffer to the beginning, and try to read into the
            //empty part of the buffer.
            //If no buffer space was left, extend the buffer before
            //reading. If the file has ended, then the line is the
            //entire buffer.
            if (strBuffer.ptr != buffer.ptr) {
                //Must use memmove because there might be overlap
                memmove(buffer.ptr, strBuffer.ptr,
                        strBuffer.length * char.sizeof);
            }
            int spaceBegin = strBuffer.length;
            if (strBuffer.length == buffer.length) {
                //Must extend the buffer to keep reading.
                assumeSafeAppend(buffer);
                buffer.length = buffer.length * 2;
            }
            char[] readPart = file.rawRead(buffer[spaceBegin .. $]);
            if (readPart.length == 0) {
                //End of the file. Return what's in the buffer.
                //The next popFront() will try to read again, and
                //then mark the empty condition.
                if (spaceBegin != 0 && buffer[spaceBegin-1] == '\r') {
                    line = buffer[0 .. spaceBegin-1];
                } else {
                    line = buffer[0 .. spaceBegin];
                }
                strBuffer = null;
                return;
            }
            strBuffer = buffer[0 .. spaceBegin + readPart.length];
            //Now that we have new data in strBuffer, we can go on.
            //If a line isn't found, the buffer will be extended
            //again to read more.
            popFront();
        }
    }
}
May 16 2013
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/16/13 9:48 AM, Juan Manuel Cabo wrote:
 I bet the problem is in readln. Currently, File.byLine() and readln()
 are extremely slow, because they call fgetc() one char at a time.
Depends on the OS. Andrei
May 16 2013
prev sibling parent "Dicebot" <m.strashun gmail.com> writes:
On Thursday, 16 May 2013 at 13:48:45 UTC, Juan Manuel Cabo wrote:
 On Thursday, 16 May 2013 at 10:35:12 UTC, Dicebot wrote:
 Want to bring into discussion people that are not on Google+. 
 Samuel recently has posted there some simple experiments with 
 bioinformatics and bad performance of Phobos-based snippet has 
 surprised me.

 I did explore issue a bit and reported results in a blog post 
 (snippets are really small and simple) : 
 http://dicebot.blogspot.com/2013/05/short-performance-tuning-story.html

 One open question remains though - can D/Phobos do better 
 here? Can some changes be done to Phobos functions in question 
 to improve performance or creating bioinformatics-specialized 
 library is only practical solution?
I bet the problem is in readln. Currently, File.byLine() and readln() are extremely slow, because they call fgetc() one char at a time.
Both the manual and the naive Phobos versions use the same readln approach, but the former is more than 10x faster. It was my first guess too, but comparing the snippets showed that this is not the issue this time.
May 17 2013
prev sibling next sibling parent reply "Juan Manuel Cabo" <juanmanuel.cabo gmail.com> writes:
On Thursday, 16 May 2013 at 10:35:12 UTC, Dicebot wrote:
 Want to bring into discussion people that are not on Google+. 
 Samuel recently has posted there some simple experiments with 
 bioinformatics and bad performance of Phobos-based snippet has 
 surprised me.

 I did explore issue a bit and reported results in a blog post 
 (snippets are really small and simple) : 
 http://dicebot.blogspot.com/2013/05/short-performance-tuning-story.html

 One open question remains though - can D/Phobos do better here? 
 Can some changes be done to Phobos functions in question to 
 improve performance or creating bioinformatics-specialized 
 library is only practical solution?
May I also recommend my tool "avgtime" for making simple benchmarks, instead of "time" (you can see an ASCII histogram in the output):

     https://github.com/jmcabo/avgtime/tree/

For example:

$ avgtime -r10 -h -q  ls
------------------------
Total time (ms): 27.413
Repetitions    : 10
Sample mode    : 2.6 (4 ocurrences)
Median time    : 2.6695
Avg time       : 2.7413
Std dev.       : 0.260515
Minimum        : 2.557
Maximum        : 3.505
95% conf.int.  : [2.2307, 3.2519]  e = 0.510599
99% conf.int.  : [2.07026, 3.41234]  e = 0.671041
EstimatedAvg95%: [2.57983, 2.90277]  e = 0.161466
EstimatedAvg99%: [2.5291, 2.9535]  e = 0.212202
Histogram      :
    msecs: count  normalized bar

--jm
May 16 2013
next sibling parent reply 1100110 <0b1100110 gmail.com> writes:
 May I also recommend my tool "avgtime" to make simple benchmarks,
 instead of "time" (you can see an ascii histogram as the output):
      https://github.com/jmcabo/avgtime/tree/
 For example:
=20
 $ avgtime -r10 -h -q  ls
 ------------------------
 Total time (ms): 27.413
 Repetitions    : 10
 Sample mode    : 2.6 (4 ocurrences)
 Median time    : 2.6695
 Avg time       : 2.7413
 Std dev.       : 0.260515
 Minimum        : 2.557
 Maximum        : 3.505
 95% conf.int.  : [2.2307, 3.2519]  e = 0.510599
 99% conf.int.  : [2.07026, 3.41234]  e = 0.671041
 EstimatedAvg95%: [2.57983, 2.90277]  e = 0.161466
 EstimatedAvg99%: [2.5291, 2.9535]  e = 0.212202
 Histogram      :
     msecs: count  normalized bar




 --jm
Thank you for the self-promotion; I had missed that tool.
May 16 2013
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On Thu, 16 May 2013 09:03:36 -0500
1100110 <0b1100110 gmail.com> wrote:

 May I also recommend my tool "avgtime" to make simple benchmarks,
 instead of "time" (you can see an ascii histogram as the output):
 
      https://github.com/jmcabo/avgtime/tree/
 
 For example:
 
 $ avgtime -r10 -h -q  ls
 ------------------------
 Total time (ms): 27.413
 Repetitions    : 10
 Sample mode    : 2.6 (4 ocurrences)
 Median time    : 2.6695
 Avg time       : 2.7413
 Std dev.       : 0.260515
 Minimum        : 2.557
 Maximum        : 3.505
 95% conf.int.  : [2.2307, 3.2519]  e = 0.510599
 99% conf.int.  : [2.07026, 3.41234]  e = 0.671041
 EstimatedAvg95%: [2.57983, 2.90277]  e = 0.161466
 EstimatedAvg99%: [2.5291, 2.9535]  e = 0.212202
 Histogram      :
     msecs: count  normalized bar




 
 --jm
 
Thank you for self-promotion, I miss that tool.
Indeed. I had totally forgotten about that, and yet it *should* be the first thing I think of when I think "timing a program". IMO, that should be a standard tool in any unixy installation.
May 16 2013
parent reply 1100110 <0b1100110 gmail.com> writes:
On 05/16/2013 01:46 PM, Nick Sabalausky wrote:
 On Thu, 16 May 2013 09:03:36 -0500
 1100110 <0b1100110 gmail.com> wrote:
 May I also recommend my tool "avgtime" to make simple benchmarks,
 instead of "time" (you can see an ascii histogram as the output):

      https://github.com/jmcabo/avgtime/tree/

 For example:

 $ avgtime -r10 -h -q  ls
 ------------------------
 Total time (ms): 27.413
 Repetitions    : 10
 Sample mode    : 2.6 (4 ocurrences)
 Median time    : 2.6695
 Avg time       : 2.7413
 Std dev.       : 0.260515
 Minimum        : 2.557
 Maximum        : 3.505
 95% conf.int.  : [2.2307, 3.2519]  e = 0.510599
 99% conf.int.  : [2.07026, 3.41234]  e = 0.671041
 EstimatedAvg95%: [2.57983, 2.90277]  e = 0.161466
 EstimatedAvg99%: [2.5291, 2.9535]  e = 0.212202
 Histogram      :
     msecs: count  normalized bar





 --jm
Thank you for self-promotion, I miss that tool.
Indeed. I had totally forgotten about that, and yet it *should* be the first thing I think of when I think "timing a program". IMO, that should be a standard tool in any unixy installation.
+1 That's worth creating a package for.
May 16 2013
parent reply "Juan Manuel Cabo" <juanmanuel.cabo gmail.com> writes:
On Thursday, 16 May 2013 at 22:58:42 UTC, 1100110 wrote:
 On 05/16/2013 01:46 PM, Nick Sabalausky wrote:
 On Thu, 16 May 2013 09:03:36 -0500
 1100110 <0b1100110 gmail.com> wrote:
 
 May I also recommend my tool "avgtime" to make simple 
 benchmarks,
 instead of "time" (you can see an ascii histogram as the 
 output):

      https://github.com/jmcabo/avgtime/tree/

 For example:

 $ avgtime -r10 -h -q  ls
 ------------------------
 Total time (ms): 27.413
 Repetitions    : 10
 Sample mode    : 2.6 (4 ocurrences)
 Median time    : 2.6695
 Avg time       : 2.7413
 Std dev.       : 0.260515
 Minimum        : 2.557
 Maximum        : 3.505
 95% conf.int.  : [2.2307, 3.2519]  e = 0.510599
 99% conf.int.  : [2.07026, 3.41234]  e = 0.671041
 EstimatedAvg95%: [2.57983, 2.90277]  e = 0.161466
 EstimatedAvg99%: [2.5291, 2.9535]  e = 0.212202
 Histogram      :
     msecs: count  normalized bar





 --jm
Thank you for self-promotion, I miss that tool.
Indeed. I had totally forgotten about that, and yet it *should* be the first thing I think of when I think "timing a program". IMO, that should be a standard tool in any unixy installation.
+1 That's worth creating a package for.
Thanks! I currently don't have much time to make an Ubuntu/Arch/etc. package, between work and university; I might in the future.

Keep in mind that it also works on Windows, though the process-creation overhead is bigger on Windows than on Linux (because of the OS). Also, you can open up the source and easily modify it to measure times directly inside your own programs.

--jm
May 16 2013
parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On Fri, 17 May 2013 03:01:38 +0200
"Juan Manuel Cabo" <juanmanuel.cabo gmail.com> wrote:

 On Thursday, 16 May 2013 at 22:58:42 UTC, 1100110 wrote:
 On 05/16/2013 01:46 PM, Nick Sabalausky wrote:
 
 Indeed. I had totally forgotten about that, and yet it 
 *should* be the
 first thing I think of when I think "timing a program". IMO, 
 that
 should be a standard tool in any unixy installation.
 
[...]
 
 Keep in mind that it also works in windows. Though the process 
 creation overhead is bigger in windows than in linux (because of 
 the OS). Also, you can open the source up and easily modify it to 
 measure your times directly, inside your programs.
Yea, I almost said "should be a standard tool in any OS installation", but there's a *lot* of things that should be a standard part of any Windows box (bash, grep, a pre-Vista GUI...) and yet never will be ;)
May 16 2013
prev sibling parent "Dicebot" <m.strashun gmail.com> writes:
On Thursday, 16 May 2013 at 13:52:01 UTC, Juan Manuel Cabo wrote:
 ...
Thanks for the tool, it is a good one. But I was not doing benchmarks this time; I only cared about differences of 2x or more, so "time" was enough :)
May 17 2013
prev sibling next sibling parent reply "nazriel" <spam dzfl.pl> writes:
On Thursday, 16 May 2013 at 10:35:12 UTC, Dicebot wrote:
 Want to bring into discussion people that are not on Google+. 
 Samuel recently has posted there some simple experiments with 
 bioinformatics and bad performance of Phobos-based snippet has 
 surprised me.

 I did explore issue a bit and reported results in a blog post 
 (snippets are really small and simple) : 
 http://dicebot.blogspot.com/2013/05/short-performance-tuning-story.html

 One open question remains though - can D/Phobos do better here? 
 Can some changes be done to Phobos functions in question to 
 improve performance or creating bioinformatics-specialized 
 library is only practical solution?
Very nice blog post.

Something similar should go into the D wiki so it won't get lost among the "In the 80s we had..." topics.

For sure there is space for improvements in Phobos, but such articles are a good start to prevent a wave of "D is slow and sucks" and to force people to rethink whether they are using the right tools (functions, in this case, i.e. UTF-8-aware vs. plain-ASCII ones) for the job.

Btw, you've got nice articles on your blog overall. Bookmarked ;)
May 16 2013
parent "Dicebot" <m.strashun gmail.com> writes:
On Thursday, 16 May 2013 at 14:23:22 UTC, nazriel wrote:
 Very nice blog post.

 Something similar should go into D wiki database so it won't 
 get lost in "In 80s we had..." topics.

 For sure there is a space for improvements in Phobos but such 
 articles are good start to prevent wave of "D is slow and 
 sucks" and force people to rethink if they are using right 
 tools (functions in this case ie UTF8 aware vs plain ASCII 
 ones) for their job.
Thank you, I am glad at least someone has noticed it is not a call for a benchmarking contest :)

Yes, my interest was exactly in the case where a newbie comes along and tries to write some trivial code. If it is too slow, that hypothetical newcomer won't benchmark or investigate; he will just say "D sucks" and move on to the next language. It is more of an informational issue than a Phobos one.
May 17 2013
prev sibling next sibling parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On Thu, 16 May 2013 12:35:11 +0200
"Dicebot" <m.strashun gmail.com> wrote:
 
 I did explore issue a bit and reported results in a blog post 
 (snippets are really small and simple) : 
 http://dicebot.blogspot.com/2013/05/short-performance-tuning-story.html
 
For anyone else who has trouble viewing that like I did, there appears to be an HTML version of it here: http://dicebot.blogspot.com/2013/05/short-performance-tuning-story.html?m=1
May 16 2013
prev sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, May 16, 2013 12:35:11 Dicebot wrote:
 Want to bring into discussion people that are not on Google+.
 Samuel recently has posted there some simple experiments with
 bioinformatics and bad performance of Phobos-based snippet has
 surprised me.
 
 I did explore issue a bit and reported results in a blog post
 (snippets are really small and simple) :
 http://dicebot.blogspot.com/2013/05/short-performance-tuning-story.html
 
 One open question remains though - can D/Phobos do better here?
 Can some changes be done to Phobos functions in question to
 improve performance or creating bioinformatics-specialized
 library is only practical solution?
1. In general, if you want to operate on ASCII, and you want your code to be fast, use immutable(ubyte)[], not immutable(char)[]. Obviously, that's not going to work in this case, because the function is in std.string, but maybe that's a reason for some std.string functions to have ubyte overloads which are ASCII-specific.

2. We actually discussed removing all of the pattern stuff completely and replacing it with regexes (which is why countchars doesn't follow Phobos' naming scheme correctly - I left the pattern-using functions alone). However, that requires that someone who is appropriately familiar with regexes go and implement new versions of all of these functions using std.regex. It should definitely be done, but no one has taken the time to do so yet.

3. While some functions in Phobos are well-optimized, there are plenty which aren't. They do the job, but no one has taken the time to optimize their implementations. This should be fixed, but again, it requires that someone spend the time on the optimizations; that has been done for some functions, but definitely not for all. And if Python is faster than D at something, odds are that either the code in question is poorly written or the Phobos functions it uses haven't been properly optimized yet.

- Jonathan M Davis
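A hedged sketch of point 1 (the function name is illustrative): counting over immutable(ubyte)[] instead of string skips UTF-8 decoding entirely, since each element is already a raw byte.

```d
import std.algorithm : count;

// Counting over raw bytes: only valid when the input is known to be
// ASCII. The cast reinterprets the same memory; no copy is made, and
// no code-point decoding happens in the inner loop.
size_t gcCountAscii(string line)
{
    auto bytes = cast(immutable(ubyte)[]) line;
    return bytes.count!(b => b == 'G' || b == 'C');
}
```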
May 16 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/16/2013 12:15 PM, Jonathan M Davis wrote:
 And if python is faster than
 D at something, odds are that either the code in question is poorly written or
 that whatever Phobos functions it's using haven't been properly optimized yet.
We should also be aware that while Python code itself is slow, its library functions are heavily optimized C code. So, if the benchmark consists of calling a Python library function, it'll run as fast as any optimized C code.
May 16 2013
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-05-16 21:54, Walter Bright wrote:

 We should also be aware that while Python code itself is slow, its
 library functions are heavily optimized C code. So, if the benchmark
 consists of calling a Python library function, it'll run as fast as any
 optimized C code.
But someone using Python won't care about that. Most of them will think they just use Python and have no idea there's optimized C code under the hood. -- /Jacob Carlborg
May 17 2013
parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Friday, 17 May 2013 at 08:28:38 UTC, Jacob Carlborg wrote:
 On 2013-05-16 21:54, Walter Bright wrote:

 We should also be aware that while Python code itself is slow, 
 its
 library functions are heavily optimized C code. So, if the 
 benchmark
 consists of calling a Python library function, it'll run as 
 fast as any
 optimized C code.
But someone using Python won't care about that. Most of them will think they just use Python and have no idea there's optimized C code under the hood.
I'm not sure how we can respond to that. If naive D code has to be significantly faster than optimised C for people not to go "D sucks, it's only as fast as Python", then we're pretty much doomed by people's stupidity.
May 17 2013
parent reply "deadalnix" <deadalnix gmail.com> writes:
On Friday, 17 May 2013 at 10:09:11 UTC, John Colvin wrote:
 If naive D code has to be significantly faster than optimised C 
 for people to not go "D sucks, it's only as fast as python" 
 then we're pretty much doomed by peoples stupidity.
No. The whole benefit of D is lost if you have to tweak everything in complex ways to get it to run fast. It means we failed at designing a nice API. Devs don't have years to spend on every existing language to know whether it is good and to figure out all the subtleties.
May 17 2013
parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Friday, 17 May 2013 at 11:26:27 UTC, deadalnix wrote:
 On Friday, 17 May 2013 at 10:09:11 UTC, John Colvin wrote:
 If naive D code has to be significantly faster than optimised 
 C for people to not go "D sucks, it's only as fast as python" 
 then we're pretty much doomed by peoples stupidity.
No. The whole benefit of D is lost if you have to tweak everything in complex way to get it run fast.
Define fast. In some cases, if a naive call to a generic Phobos function is as fast as the equivalent Python library function, then I'd say that's pretty good. Those Python library functions are often impressively fast.
May 17 2013
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, May 16, 2013 12:54:35 Walter Bright wrote:
 On 5/16/2013 12:15 PM, Jonathan M Davis wrote:
 And if python is faster than
 D at something, odds are that either the code in question is poorly
 written or that whatever Phobos functions it's using haven't been
 properly optimized yet.
We should also be aware that while Python code itself is slow, its library functions are heavily optimized C code. So, if the benchmark consists of calling a Python library function, it'll run as fast as any optimized C code.
I keep forgetting about that. That's a good thing to keep in mind when comparing performance - though part of me thinks that it says very poor things about your language if you have to write your code in other languages in order to make it fast enough (even if it were only the standard library where that happened). - Jonathan M Davis
May 17 2013
prev sibling parent reply "Dicebot" <m.strashun gmail.com> writes:
On Thursday, 16 May 2013 at 19:15:57 UTC, Jonathan M Davis wrote:
 1. In general, if you want to operate on ASCII, and you want 
 your code to be
 fast, use immutable(ubyte)[], not immutable(char)[]. Obviously, 
 that's not
 gonig to work in this case, because the function is in 
 std.string, but maybe
 that's a reason for some std.string functions to have ubyte 
 overloads which
 are ASCII-specific.
I was thinking exactly about that. The only thing I want advice on: is it better to add those overloads in std.string, or is a separate module better from the point of view of self-documentation?
 2. We actually discussed removing all of the pattern stuff 
 completely and
 replacing it with regexes.
Is it kind of pre-approved? I am willing to add this to my TODO list together with the needed benchmarks, but I had some doubts that std.string depending on std.regex would be tolerated.
 3. While some functions in Phobos are well-optimized, there are 
 plenty of them
 which aren't. They do the job, but no one has taken the time to 
 optimize their
 implementations. This should be fixed, but again, it requires 
 that someone
 spends the time to do the optimizations, and while that has 
 been done for some
 functions, it definitely hasn't been done for all. And if 
 python is faster than
 D at something, odds are that either the code in question is 
 poorly written or
 that whatever Phobos functions it's using haven't been properly 
 optimized yet.
I understand that. What I tried to bring attention to is how big the difference may be for someone who just picks random functions and writes some simple code. It is very tempting to just say "Phobos (D) sucks" and not get into the details. In other words, I consider it more of an informational/marketing issue than a technical one.
 - Jonathan M Davis
Thanks for your response, it was really helpful.
May 17 2013
next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, May 17, 2013 11:15:24 Dicebot wrote:
 On Thursday, 16 May 2013 at 19:15:57 UTC, Jonathan M Davis wrote:
 1. In general, if you want to operate on ASCII, and you want
 your code to be
 fast, use immutable(ubyte)[], not immutable(char)[]. Obviously,
 that's not
 gonig to work in this case, because the function is in
 std.string, but maybe
 that's a reason for some std.string functions to have ubyte
 overloads which
 are ASCII-specific.
I was thinking exactly about that. Only thing I want to be advised on - is it better to add those overloads in std.string or separate module is better from the point of self-documentation?
I'm not sure. My first inclination would be to simply put them as overloads in the same module, but that probably merits some discussion. And while I think that having ubyte overloads of string functions for ASCII is something that we should at least explore, it probably merits some discussion as well, as we haven't really done much with handling ASCII outside of std.ascii at this point (which currently only operates on characters, not strings). My first inclination is to handle ASCII where necessary by accepting arrays of ubytes, but others here may have other (possibly better) ideas.

A side note is that we might want to consider having a function called assumeASCII which casts from string to immutable(ubyte)[] (similar to assumeUnique). I think that might have been suggested before, but even if it has, we've never actually added it.
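A minimal sketch of what such an assumeASCII helper could look like; the debug-mode validation is my own assumption, since no such function exists in Phobos at this point:

```d
// Hypothetical helper, modeled on assumeUnique: reinterpret a string
// as raw bytes, asserting (in debug builds only) that it is ASCII.
immutable(ubyte)[] assumeASCII(string s) pure
{
    debug
    {
        // foreach over a string iterates char by char, no decoding.
        foreach (c; s)
            assert(c < 0x80, "non-ASCII byte passed to assumeASCII");
    }
    return cast(immutable(ubyte)[]) s;
}
```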
 2. We actually discussed removing all of the pattern stuff
 completely and
 replacing it with regexes.
Is is kind of pre-approved? I am willing to add this to my TODO list together with needed benchmarks, but had some doubts that std.string depending on std.regex will be tolerated.
AFAIK, there would be no problem with doing so. Maybe Dmitry would have something to say about it, since he's the regex guru, but IIRC, the last time it was discussed, it was pretty clear that we wanted those functions to use std.regex instead of patterns. So, if you did the work at the appropriate quality level, I expect that it would be merged in. And we might or might not deprecate the pattern functions at that point (that was originally my intention, and is why I never fixed their names, but we're not deprecating much now, so I don't know whether we'll want to in this case).
 I understand that. What I tried to bring attention to is how big
 difference it may be for someone who just picks random functions
 and writes some simple code. It is very tempting to just say
 "Phobos (D) sucks" and don't get into details. In other words I
 consider it more of informational/marketing issue than a
 technical one.
We need to do more to optimize Phobos, but given our stance of correctness by default, we're kind of stuck with string functions taking a performance hit in a number of common cases simply due to the necessary decoding of code points. We can do better at making them fast, and reduce problems like this, but ultimately, if you want fast ASCII-only operations, you almost certainly need to operate on something like ubyte[] rather than string, and that requires educating people. It's one of the costs of trying to be both correct and performant. - Jonathan M Davis
May 17 2013
prev sibling parent Samuel Lampa <samuel.lampa gmail.com> writes:
On 05/17/2013 11:41 AM, Jonathan M Davis wrote:
 We need to do more to optimize Phobos, but given our stance of correctness by
 default, we're kind of stuck with string functions taking a performance hit in
 a number of common cases simply due to the necessary decoding of code points.
 We can do better at making them fast, and reduce problems like this, but
 ultimately, if you want fast ASCII-only operations, you almost certainly need
 to operate on something like ubyte[] rather than string, and that requires
 educating people. It's one of the costs of trying to be both correct and
 performant.
At least I'm now educated on this :") // Samuel
May 17 2013