digitalmars.D - D vs Haskell
- Szymon Gatner (2/2) Jun 22 2013 Word counting problem in D and Haskell:
- Juan Manuel Cabo (84/87) Jun 22 2013 I thought that the D time could be improved further with
- Joseph Rushton Wakeling (2/11) Jun 22 2013 What about with GDC or LDC?
- Juan Manuel Cabo (13/26) Jun 22 2013 Right, the author of the article used ldc. I'm always used
- Joseph Rushton Wakeling (5/14) Jun 23 2013 You know that you can use ldmd2 to invoke LDC using the same
- Juan Manuel Cabo (15/36) Jun 23 2013 The computer where I tested has Kubuntu 12.04, which is
Word counting problem in D and Haskell: http://leonardo-m.livejournal.com/109201.html
Jun 22 2013
On 06/22/2013 03:25 PM, Szymon Gatner wrote:Word counting problem in D and Haskell: http://leonardo-m.livejournal.com/109201.htmlI thought that the D time could be improved further with little changes. Testing against Complete Works of William Shakespeare (5.3 MiB plaintext): http://www.gutenberg.org/ebooks/100 and using "dmd -O -inline -noboundscheck -release" on both the last version of the author, and my version using "byWord" I got these times (minimum of 10 runs): before: 2781 ms after: 805 ms Here is the code, with a "byWord" range using std.ascii.toAlpha: import std.stdio, std.conv, std.file, std.string, std.algorithm, std.range, std.traits, std.ascii; auto hashCounter(R)(R items) if (isForwardRange!R) { size_t[ForeachType!R] result; foreach (x; items) result[x]++; return result.byKey.zip(result.byValue); } void main(string[] args) { //Slow: // args[1] // .readText // .toLower // .tr("A-Za-z", "\n", "cs") // .split //Faster: args[1] .readText .byWord .map!toLower() .array .hashCounter .array .sort!"-a[1] < -b[1]"() .take(args[2].to!uint) .map!q{ text(a[0], " ", a[1]) } .join("\n") .writeln; } /** Range that extracts words from a string. Words are strings composed only of chars accepted by std.ascii.toAlpha() */ struct byWord { string s; size_t pos; string word; this(string s) { this.s = s; popFront(); } property bool empty() const { return s.length == 0; } property string front() { return word; } void popFront() { if (pos == s.length) { //Mark the range as empty, only after popFront fails: s = null; return; } while (pos < s.length && !std.ascii.isAlpha(s[pos])) { ++pos; } auto start = pos; while (pos < s.length && std.ascii.isAlpha(s[pos])) { ++pos; } if (start == s.length) { //No more words. Range empty: s = null; } else { word = s[start .. pos]; } } } unittest { assert([] == array(byWord(""))); assert(["a", "b"] == array(byWord("a b"))); assert(["a", "b", "c"] == array(byWord("a b c"))); } --jm
Jun 22 2013
On 06/22/2013 10:11 PM, Juan Manuel Cabo wrote:Testing against Complete Works of William Shakespeare (5.3 MiB plaintext): http://www.gutenberg.org/ebooks/100 and using "dmd -O -inline -noboundscheck -release" on both the last version of the author, and my version using "byWord" I got these times (minimum of 10 runs): before: 2781 ms after: 805 msWhat about with GDC or LDC?
Jun 22 2013
On 06/22/2013 06:29 PM, Joseph Rushton Wakeling wrote:On 06/22/2013 10:11 PM, Juan Manuel Cabo wrote:Right, the author of the article used ldc. I'm always used to dmd. Keep in mind that he hasn't released the 9MB text that he used, but instead pointed to that 5.3MB Shakespeare file. This time I used LDC 0.11.0, with: ldc2 -O -release -disable-boundscheck Times (minimum of 10 runs): before: 1617 ms after: 554 ms Conclusion: using "byWords" was 3.4 times faster with DMD and 2.9 times faster with LDC. --jmTesting against Complete Works of William Shakespeare (5.3 MiB plaintext): http://www.gutenberg.org/ebooks/100 and using "dmd -O -inline -noboundscheck -release" on both the last version of the author, and my version using "byWord" I got these times (minimum of 10 runs): before: 2781 ms after: 805 msWhat about with GDC or LDC?
Jun 22 2013
On Saturday, 22 June 2013 at 21:45:48 UTC, Juan Manuel Cabo wrote:Right, the author of the article used ldc. I'm always used to dmd.You know that you can use ldmd2 to invoke LDC using the same flags as DMD?This time I used LDC 0.11.0, with: ldc2 -O -release -disable-boundscheck Times (minimum of 10 runs): before: 1617 ms after: 554 ms Conclusion: using "byWords" was 3.4 times faster with DMD and 2.9 times faster with LDC.Any chance of GDC results? You can again use gdmd for DMD-like interface.
Jun 23 2013
On 06/23/2013 05:42 AM, Joseph Rushton Wakeling wrote:On Saturday, 22 June 2013 at 21:45:48 UTC, Juan Manuel Cabo wrote:The computer where I tested has Kubuntu 12.04, which is too old even for gcc-4.7, and I cannot compile GDC there. The gdc version which comes in the repository fails to compile the program, even after some rearranging so that it doesn't use UFCS. Anyway, using my "byWord" range to iterate the string will surely be faster with GDC too, because tr and split can be a little slow, especially since the string is big (5MB). I tested a little more and found that using a handcoded function instead of tr, and using splitter instead of split, it's almost as fast as the byWord version. The handcoded function assumes ascii and preallocates the resulting string (with result.length = source.length). --jmRight, the author of the article used ldc. I'm always used to dmd.You know that you can use ldmd2 to invoke LDC using the same flags as DMD?This time I used LDC 0.11.0, with: ldc2 -O -release -disable-boundscheck Times (minimum of 10 runs): before: 1617 ms after: 554 ms Conclusion: using "byWords" was 3.4 times faster with DMD and 2.9 times faster with LDC.Any chance of GDC results? You can again use gdmd for DMD-like interface.
Jun 23 2013