digitalmars.D.learn - how to be faster than perl?


Boris Bukowski <boris.bukowski lycos-europe.com> writes:

Hi,

I am currently testing D for log processing.
My Perl script is more than ten times faster than my D program.
How can I read lines from a file faster?

Boris 

---snip---
private import std.stream;
private import std.stdio;
private import std.string;

void main (char[][] args) {
        int c;
        Stream file = new BufferedFile(args[1]);
        foreach(ulong n, char[] line; file) {
                if(std.regexp.find(line, "horizontal") > -1){
                        c++;
                }
        }

        writefln("%d", c);

}
---snip---



while($line=<>) {
        if ($line=~/horizontal/) {
                $c++;
        }
}

print "$c\n";

---snip---

Jan 30 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Boris Bukowski wrote:
 currently I am testing D for log processing.
 My perl script is more than ten times faster than my D Prog.
 How can I get Lines faster from a File?

I don't think the file handling is your problem.

 ---snip---
 private import std.stream;
 private import std.stdio;
 private import std.string;
 
 void main (char[][] args) {
         int c;
         Stream file = new BufferedFile(args[1]);
         foreach(ulong n, char[] line; file) {
                 if(std.regexp.find(line, "horizontal") > -1){
                         c++;
                 }
         }
 
         writefln("%d", c);
 
 }

std.regexp.find instantiates a RegExp object, compiles the regex and 
uses it once, then deletes it. This is fine for one-time searches, but 
if you're using it for each line of a file, you're allocating and 
deleting an object for each line and performing unnecessary work to 
recompile the same regex over and over.

Try something like this instead:
-----
// (untested code)
auto re = new RegExp("horizontal");
foreach (ulong n, char[] line; file) {
     if (re.find(line) > -1) {
// ...
-----
as the start of your foreach loop.
That should be faster.

I don't know how fast it'll be compared to Perl; I don't know anything 
about the relative performance of D vs. Perl regexes. (In fact, I hardly 
ever use regexes, and have never used Perl)

Jan 30 2007

Boris Bukowski <boris.bukowski lycos-europe.com> writes:

 // (untested code)
 auto re = new RegExp("horizontal");
 foreach (ulong n, char[] line; file) {
      if (re.find(line) > -1) {
 // ...
 -----
 as the start of your foreach loop.
 That should be faster.
 
 I don't know how fast it'll be compared to Perl; I don't know anything
 about the relative performance of D vs. Perl regexes. (In fact, I hardly
 ever use regexes, and have never used Perl)

buko01 dizit:~/d$ time ./lineio.pl access.log
1087

real    0m0.105s
user    0m0.092s
sys     0m0.012s
buko01 dizit:~/d$ time ./lineio2 access.log
1087

real    0m1.547s
user    0m1.528s
sys     0m0.020s

Still 15 times slower :-(
Perl's string handling and I/O must somehow be black magic.
Looks like I have to write my own line reader.

Boris

Jan 30 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Boris Bukowski wrote:
 buko01 dizit:~/d$ time ./lineio.pl access.log
 1087
 
 real    0m0.105s
 user    0m0.092s
 sys     0m0.012s
 buko01 dizit:~/d$ time ./lineio2 access.log
 1087
 
 real    0m1.547s
 user    0m1.528s
 sys     0m0.020s
 
 still 15 times slower :-(
 Perl strings/IO must be somehow black magic.
 Looks like I have to write my own lineReader.

Some obvious questions:

Did you use -O -inline? If not, try those. I don't think they'll make 
much difference.

Do you actually search for "horizontal" (or a similar fixed string)? To 
search for a non-regex string, std.string.find will likely be faster.
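
The same idea can be sketched in Python, the language already used elsewhere in this thread for the test data. This is only an illustration, not the D code; the function name count_lines_containing and the sample lines are made up for the example. For a fixed string, a plain substring test avoids the regex engine entirely:

```python
def count_lines_containing(lines, needle="horizontal"):
    """Count lines that contain needle as a plain substring (no regex involved)."""
    count = 0
    for line in lines:
        # 'in' does a literal, case-sensitive substring search
        if needle in line:
            count += 1
    return count


# Hypothetical sample input standing in for lines of a log file
sample = ["a horizontal b\n", "vertical\n", "horizontal\n"]
print(count_lines_containing(sample))  # prints 2
```

Whether this is actually faster than a precompiled regex depends on the implementation and the data, so it is worth timing both on the real log.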


Other than that, I'm out of ideas.

IIRC Perl compiles regexes inline, presumably optimizing them along with 
the rest of the code, so that might explain why it's faster. This sort 
of stuff is what Perl was designed for...

Jan 30 2007

mario pernici <mario.pernici mi.infn.it> writes:

Boris Bukowski Wrote:

 
 // (untested code)
 auto re = new RegExp("horizontal");
 foreach (ulong n, char[] line; file) {
      if (re.find(line) > -1) {
 // ...
 -----
 as the start of your foreach loop.
 That should be faster.
 
 I don't know how fast it'll be compared to Perl; I don't know anything
 about the relative performance of D vs. Perl regexes. (In fact, I hardly
 ever use regexes, and have never used Perl)

 
 buko01 dizit:~/d$ time ./lineio.pl access.log
 1087
 
 real    0m0.105s
 user    0m0.092s
 sys     0m0.012s
 buko01 dizit:~/d$ time ./lineio2 access.log
 1087
 
 real    0m1.547s
 user    0m1.528s
 sys     0m0.020s
 
 still 15 times slower :-(
 Perl strings/IO must be somehow black magic.
 Looks like I have to write my own lineReader.
 
 Boris

Hello,
on my PC the D example is faster than the Perl one, with the data produced by
the Python script

f = open('data','w')
for j in range(1087):
  for i in range(100):
    f.write("%d\n" % i)
  f.write("horizontal\n")
  for i in range(100):
    f.write("%d\n" % i)
f.close()

On my PC the Perl example takes 0.068s; the D example with auto re takes
0.068s.

Bye
   Mario

Jan 30 2007

Boris Bukowski <boris.bukowski lycos-europe.com> writes:

mario pernici wrote:

 Boris Bukowski Wrote:
 
 
 // (untested code)
 auto re = new RegExp("horizontal");
 foreach (ulong n, char[] line; file) {
      if (re.find(line) > -1) {
 // ...
 -----
 as the start of your foreach loop.
 That should be faster.
 
 I don't know how fast it'll be compared to Perl; I don't know anything
 about the relative performance of D vs. Perl regexes. (In fact, I
 hardly ever use regexes, and have never used Perl)

 
 buko01 dizit:~/d$ time ./lineio.pl access.log
 1087
 
 real    0m0.105s
 user    0m0.092s
 sys     0m0.012s
 buko01 dizit:~/d$ time ./lineio2 access.log
 1087
 
 real    0m1.547s
 user    0m1.528s
 sys     0m0.020s
 
 still 15 times slower :-(
 Perl strings/IO must be somehow black magic.
 Looks like I have to write my own lineReader.
 
 Boris

 
 Hello,
 on my PC the D example is faster than the Perl one, with the data produced
 by the Python script
 
 f = open('data','w')
 for j in range(1087):
   for i in range(100):
     f.write("%d\n" % i)
   f.write("horizontal\n")
   for i in range(100):
     f.write("%d\n" % i)
 f.close()
 
 the Perl example takes on my PC  0.068s,  the D example with auto re takes
 0.068s.

Hi,

With that generated data file D is faster, because Perl spends more time in
the loop.
I use a 20 MB Squid access log for testing.
Looks like I have to write my own readline for this.

Boris

Jan 31 2007

mario pernici <mario.pernici mi.infn.it> writes:

Boris Bukowski Wrote:

 
 // (untested code)
 auto re = new RegExp("horizontal");
 foreach (ulong n, char[] line; file) {
      if (re.find(line) > -1) {
 // ...
 -----
 as the start of your foreach loop.
 That should be faster.
 
 I don't know how fast it'll be compared to Perl; I don't know anything
 about the relative performance of D vs. Perl regexes. (In fact, I hardly
 ever use regexes, and have never used Perl)

 
 buko01 dizit:~/d$ time ./lineio.pl access.log
 1087
 
 real    0m0.105s
 user    0m0.092s
 sys     0m0.012s
 buko01 dizit:~/d$ time ./lineio2 access.log
 1087
 
 real    0m1.547s
 user    0m1.528s
 sys     0m0.020s
 
 still 15 times slower :-(
 Perl strings/IO must be somehow black magic.
 Looks like I have to write my own lineReader.
 
 Boris


CORRECTION:
on my PC the D example is faster than the Perl one, with the data produced by
the Python script

f = open('data','w')
for j in range(1087):
  for i in range(100):
    f.write("%d\n" % i)
  f.write("horizontal\n")
  for i in range(100):
    f.write("%d\n" % i)
f.close()

On my PC the Perl example takes 0.148s; the D example with auto re takes
0.068s.

Bye
   Mario

Jan 30 2007

"Unknown W. Brackets" <unknown simplemachines.org> writes:

I'm a bit tired, but does BufferedFile's opApply use a fixed buffer?  I 
doubt it does.  In this case, the foreach method is going to be a lot 
slower than reading lines into a buffer.

Check on the other methods of BufferedFile.

Sorry, I'd give a code example but I'm just doing a drive by.

-[Unknown]


 Hi,
 
 currently I am testing D for log processing.
 My perl script is more than ten times faster than my D Prog.
 How can I get Lines faster from a File?
 
 Boris 
 
 ---snip---
 private import std.stream;
 private import std.stdio;
 private import std.string;
 
 void main (char[][] args) {
         int c;
         Stream file = new BufferedFile(args[1]);
         foreach(ulong n, char[] line; file) {
                 if(std.regexp.find(line, "horizontal") > -1){
                         c++;
                 }
         }
 
         writefln("%d", c);
 
 }
 ---snip---
 

 
 while($line=<>) {
         if ($line=~/horizontal/) {
                 $c++;
         }
 }
 
 print "$c\n";
 
 ---snip---

Jan 30 2007

Derek Parnell <derek nomail.afraid.org> writes:

On Tue, 30 Jan 2007 13:21:53 +0100, Boris Bukowski wrote:

 currently I am testing D for log processing.
 My perl script is more than ten times faster than my D Prog.
 How can I get Lines faster from a File?

Your example code seemed to be trying to count the number of times a
certain string occurred in a file so I didn't bother with working with
'lines' as such. Anyhow, here is one way to do it...

// findtext.d ---------
private import std.file;
private import std.stdio;
private import std.string;

void main (char[][] args) {
        char[] lFileText; // Buffer for file contents.

        int lCnt;   // Number of hits
        int lPos;   // Found at position, or Not Found flag.
        int lFrom;  // Where in the file to look from.

        // Grab the whole file into RAM
        lFileText = cast(char[]) std.file.read(args[1]);

        // Start scanning for the substring.
        lFrom = 0;
        while(lFrom < lFileText.length)
        {
            lPos = std.string.find(lFileText[lFrom..$], args[2]);
            if (lPos != -1)
            {
                // Adjust next starting position.
                lFrom += lPos + args[2].length;
                // And count the hits, of course.
                lCnt++;
            }
            else
            {
                // Force end of scanning.
                lFrom = lFileText.length;
            }
        }

        writefln("Count of '%s' found in '%s': %d",
                        args[2], args[1], lCnt);
}
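
Note that a scan like the one above counts occurrences of the string, not matching lines, so its total can differ from the line-based versions when a line contains the string more than once. A minimal sketch of the difference in Python (the function names and sample text are hypothetical, chosen only for this illustration):

```python
def count_occurrences(text, needle):
    """Count non-overlapping occurrences of needle in text, scanning left to right."""
    if not needle:
        raise ValueError("needle must not be empty")
    count = 0
    start = 0
    while True:
        pos = text.find(needle, start)
        if pos == -1:
            break  # no further hits
        count += 1
        # Continue scanning just past the hit that was found
        start = pos + len(needle)
    return count


def count_matching_lines(text, needle):
    """Count lines that contain needle at least once."""
    return sum(1 for line in text.splitlines() if needle in line)


text = "horizontal horizontal\nvertical\nhorizontal\n"
print(count_occurrences(text, "horizontal"))     # prints 3
print(count_matching_lines(text, "horizontal"))  # prints 2
```

For a typical access log the two numbers are probably close, but they only agree exactly when no line contains the string twice.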

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocrity!"
31/01/2007 7:18:25 PM

Jan 31 2007

David Medlock <noone nowhere.com> writes:

Boris Bukowski wrote:
 Hi,
 
 currently I am testing D for log processing.
 My perl script is more than ten times faster than my D Prog.
 How can I get Lines faster from a File?
 
 Boris 
 
 ---snip---
 private import std.stream;
 private import std.stdio;
 private import std.string;
 
 void main (char[][] args) {
         int c;
         Stream file = new BufferedFile(args[1]);
         foreach(ulong n, char[] line; file) {
                 if(std.regexp.find(line, "horizontal") > -1){
                         c++;
                 }
         }
 
         writefln("%d", c);
 
 }
 ---snip---
 

 
 while($line=<>) {
         if ($line=~/horizontal/) {
                 $c++;
         }
 }
 
 print "$c\n";
 
 ---snip---
 

I am too lazy to look, but does the regexp module cache any regexes 
passed to it? Otherwise that's probably the major slowdown.

I am pretty sure all 'fixed' regexen in Perl are pre-compiled into the 
AST so they aren't re-evaluated each time they are used.

I may be wrong though, I've only been using Perl for about 7 months (and I 
despise it).

-DavidM

Jan 31 2007

Dejan Lekic <dejan.lekic gmail.com> writes:

Mr. Medlock,
D will compile regexp too. Mr. Bommel has already explained that in one of his
replies...

Kind regards

Dejan

Feb 01 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Dejan Lekic wrote:
 Mr. Medlock,
 D will compile regexp too. Mr. Bommel has already explained that in one of his
replies...

Well, AFAIK std.regexp only compiles to a bytecode that is then 
interpreted for every operation on it. (I have no idea what Perl does 
exactly, so it may be something similar, but I just thought I'd clear 
that up)


P.S. If you want to get formal, it's 'Mr. van Bommel', the 'van' is part 
of my surname.

Feb 01 2007

David Medlock <noone nowhere.com> writes:

Dejan Lekic wrote:
 Mr. Medlock,
 D will compile regexp too. Mr. Bommel has already explained that in one of his
replies...
 
 Kind regards
 
 Dejan

I know that.

I was talking about recompiling each time through the loop.

In python you would say:

pat = re.compile( "horizontal" )

then use pat within your loop.
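
Spelled out as a small runnable sketch (the function name count_matching_lines and the sample lines are made up for illustration, and stand in for lines read from the log file):

```python
import re


def count_matching_lines(lines, pattern="horizontal"):
    """Count lines matching pattern, compiling the regex only once."""
    pat = re.compile(pattern)  # compiled once, outside the loop
    count = 0
    for line in lines:
        # search() looks for a match anywhere in the line
        if pat.search(line):
            count += 1
    return count


# Hypothetical sample input
sample = ["GET /horizontal.gif\n", "GET /index.html\n", "horizontal horizontal\n"]
print(count_matching_lines(sample))  # prints 2
```

Python's re module also keeps an internal cache of recently compiled patterns, so the penalty for calling re.search with a string pattern inside the loop is smaller there than it would be for a library that recompiles on every call.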

-DavidM

Feb 01 2007
