digitalmars.D.learn - Which "if" choice is faster?

jicman (35/35) Aug 03 2005 Greetings.

Vathix (20/57) Aug 03 2005 Well, std.string.find() needs to start at the beginning of the array for...

jicman (6/72) Aug 03 2005 Well, the problem is that I am not really searching for "one letter". I...

Walter (4/4) Aug 03 2005 They're the same. In general, though, you need to profile code to see wh...

jicman (10/14) Aug 03 2005 Thanks Walter. Yes, I figured that some if's will be encountered less t...

Manfred Nowak (8/9) Aug 03 2005 Have you thaught of

jicman (154/161) Aug 03 2005 Will this be faster than std.string.find? I already have some RegExp fo...

Ben Hinkle (8/28) Aug 03 2005 If you could find a way to use string comparison instead of find then yo...

jicman (6/36) Aug 03 2005 Ok, thanks. These values are returned to a case which then, depending o...

Manfred Nowak (8/10) Aug 03 2005 There is no predefined solution for your problem in D.

Manfred Nowak (46/47) Aug 03 2005 For this task you can use re2c from r2c.org.

Ben Hinkle (1/5) Aug 03 2005 any particular reason for the .dups?

Manfred Nowak (4/5) Aug 03 2005 references to local data aren't allowed, are they?

Ben Hinkle (8/13) Aug 03 2005 String literals are stored in the shared data segment so they can be saf...

Manfred Nowak (7/9) Aug 03 2005 I cannot find any hint on this in the specs. So it is current but

Ben Hinkle (6/15) Aug 04 2005 Since C behaves that way Walter probably assumed people would assume D

Manfred Nowak (5/7) Aug 04 2005 I will have a closer look to C-code ported to 64-bit OS. But the fact

Derek Parnell (125/127) Aug 03 2005 I have a much faster string finding routine coded up in Build. It uses t...

Ben Hinkle (8/14) Aug 03 2005 Hmm. when I try it on strings pContainer of length between 50 and 150 wi...

Derek Parnell (19/37) Aug 03 2005 Ouch!

Walter (4/8) Aug 05 2005 do

Ben Hinkle (3/7) Aug 03 2005 I admit this is a truly nit-picky complaint but the "l" prefix above too...

jicman <jicman_member pathlink.com> writes:

Greetings.

I have a program which handles lots of text files on a per line basis.  I am
trying to get some info from the D gurus as to which "if" option is fastest and
best for a large amount of text handling lines.  Let us imagine these two
instances of code:

Code a:
|char[] GetTheChoice(char[] line)
|{
|  if (std.string.find(line,"A") > -1)
|    return "A";
|  if (std.string.find(line,"B") > -1)
|    return "A";
|  ...
|  ...
|  ...
|  if (std.string.find(line,"Y") > -1)
|    return "A";
|  if (std.string.find(line,"Z") > -1)
|    return "A";
|}


Code b:
|char[] GetTheChoice(char[] line)
|{
|  if (std.string.find(line,"A") > -1 ||
|      std.string.find(line,"B") > -1 ||
|  ...
|  ...
|  ...
|      std.string.find(line,"Y") > -1 ||
|      std.string.find(line,"Z") > -1)
|    return "A";
|}

Also, if there is a faster way than this.

thanks,

jos�

Aug 03 2005

Vathix <chris dprogramming.com> writes:

On Wed, 03 Aug 2005 12:38:01 -0400, jicman <jicman_member pathlink.com>  
wrote:

 Greetings.

 I have a program which handles lots of text files on a per line basis.   
 I am
 trying to get some info from the D gurus as to which "if" option is  
 fastest and
 best for a large amount of text handling lines.  Let us imagine these two
 instances of code:

 Code a:
 |char[] GetTheChoice(char[] line)
 |{
 |  if (std.string.find(line,"A") > -1)
 |    return "A";
 |  if (std.string.find(line,"B") > -1)
 |    return "A";
 |  ...
 |  ...
 |  ...
 |  if (std.string.find(line,"Y") > -1)
 |    return "A";
 |  if (std.string.find(line,"Z") > -1)
 |    return "A";
 |}


 Code b:
 |char[] GetTheChoice(char[] line)
 |{
 |  if (std.string.find(line,"A") > -1 ||
 |      std.string.find(line,"B") > -1 ||
 |  ...
 |  ...
 |  ...
 |      std.string.find(line,"Y") > -1 ||
 |      std.string.find(line,"Z") > -1)
 |    return "A";
 |}

 Also, if there is a faster way than this.

 thanks,

 josé

Well, std.string.find() needs to start at the beginning of the array for  
each search, so not using it at all and just looping over the string once  
would be fastest.

char[] GetTheChoice(char[] line)
{
    foreach(char ch; line)
    {
       switch(ch)
       {
          case 'A', 'B':
             return "A";
          case 'Y', 'Z':
             return "A";
          default: ;
       }
    }
    //return "?";
}

Aug 03 2005

jicman <jicman_member pathlink.com> writes:

Vathix says...
On Wed, 03 Aug 2005 12:38:01 -0400, jicman <jicman_member pathlink.com>  
wrote:

 Greetings.

 I have a program which handles lots of text files on a per line basis.   
 I am
 trying to get some info from the D gurus as to which "if" option is  
 fastest and
 best for a large amount of text handling lines.  Let us imagine these two
 instances of code:

 Code a:
 |char[] GetTheChoice(char[] line)
 |{
 |  if (std.string.find(line,"A") > -1)
 |    return "A";
 |  if (std.string.find(line,"B") > -1)
 |    return "A";
 |  ...
 |  ...
 |  ...
 |  if (std.string.find(line,"Y") > -1)
 |    return "A";
 |  if (std.string.find(line,"Z") > -1)
 |    return "A";
 |}


 Code b:
 |char[] GetTheChoice(char[] line)
 |{
 |  if (std.string.find(line,"A") > -1 ||
 |      std.string.find(line,"B") > -1 ||
 |  ...
 |  ...
 |  ...
 |      std.string.find(line,"Y") > -1 ||
 |      std.string.find(line,"Z") > -1)
 |    return "A";
 |}

 Also, if there is a faster way than this.

 thanks,

 josé

Well, std.string.find() needs to start at the beginning of the array for  
each search, so not using it at all and just looping over the string once  
would be fastest.

char[] GetTheChoice(char[] line)
{
    foreach(char ch; line)
    {
       switch(ch)
       {
          case 'A', 'B':
             return "A";
          case 'Y', 'Z':
             return "A";
          default: ;
       }
    }
    //return "?";
}

Well, the problem is that I am not really searching for "one letter".  I should
have entered another example. ;-)  My fault.  There is a lot of information
missing here, but I need to use std.string.find().

Thanks for the help, though.

jic

Aug 03 2005

"Walter" <newshound digitalmars.com> writes:

They're the same. In general, though, you need to profile code to see where
the real bottlenecks are, as the results can be counterintuitive and
surprising. This should get you started:

    www.digitalmars.com/techtips/timing_code.html

Aug 03 2005

jicman <jicman_member pathlink.com> writes:

Thanks Walter.  Yes, I figured that some if's will be encountered less than
others, and those I am placing at the bottom of the list.  This will allow for
those most used if's to return faster.  I will try both of these if options and
tell you if there is a real time difference between them.  However, don't wait
standing up.  I am working on a project which I chosed to do in D, instead of
Java or c, and I need to finish it first. ;-)  It's due Friday.  My boss does
not know, but when he sees the outcome, he'll be very glad. ;-)

thanks,

jic

Walter says...
They're the same. In general, though, you need to profile code to see where
the real bottlenecks are, as the results can be counterintuitive and
surprising. This should get you started:

    www.digitalmars.com/techtips/timing_code.html

Aug 03 2005

Manfred Nowak <svv1999 hotmail.com> writes:

jicman <jicman_member pathlink.com> wrote:

[...]
 Also, if there is a faster way than this.

Have you thaught of

import std.regexP;
//...
  if(( new RegExp( "[A-Z]", "")).test( line)) return( "A");
 
-manfred

Aug 03 2005

jicman <jicman_member pathlink.com> writes:

Manfred Nowak says...
jicman <jicman_member pathlink.com> wrote:

[...]
 Also, if there is a faster way than this.

Have you thaught of

import std.regexP;
//...
  if(( new RegExp( "[A-Z]", "")).test( line)) return( "A");

Will this be faster than std.string.find?  I already have some RegExp for
something else, but I wouldn't think that opening a RegExp object for every
different if would be faster.  Perhaps, I should have explained it better.  I am
not just searching for one letter.  It was just an example.  Here is the full
test, and perhaps, some of you may think of a better way:

|char[] GetTheChoice(char[] line)
|{
|  //writefln(line);
|  if (std.string.find(line,"JUNKDATA") > -1)
|    return "JUNKDATA";
|  if (std.string.find(line,"Auto stopping job") > -1)
|    return "AutoStopJob";
|  if (std.string.find(line,"Auto starting the next job") > -1)
|    return "AutoStartJob";
|  if (std.string.find(line,"HTTP/NSA job") > -1)
|    return "HTTP/NSA job";
|  if (std.string.find(line,"HTTP/IFax job") > -1)
|    return "HTTP/IFax job";
|  if (std.string.find(line,"IFax processing message") > -1)
|    return "IFax processing message";
|  if (std.string.find(line,"IFax Job received") > -1)
|    return "IFax job";
|  if (std.string.find(line,"IFax Job POP3 job") > -1)
|    return "IFax POP3 job";
|  if (std.string.find(line,"Login failed because password incorrect:") > -1)
|    return "Failed NSA FTP login";
|  if (std.string.find(line,"User logged in:") > -1)
|    return "NSA FTP login";
|  if (std.string.find(line,"NSA job from ") > -1)
|    return "NSA job from ";
|  if (std.string.find(line,"Processing image of Cover Sheet ") > -1)
|    return "Processing image of Cover Sheet ";
|  if (std.string.find(line,"e-mail confirmation of") > -1)
|    return "e-mail confirmation of";
|  if (std.string.find(line,"Printing Confirmation Page") > -1)
|    return "Printing Confirmation Page";
|  if (std.string.find(line,"Web Centre URL ") > -1)
|    return "Web Centre URL ";
|  if (std.string.find(line,"Restarted job: ") > -1)
|    return "Restarted job: ";
|  if (std.string.find(line,"Starting remote job") > -1)
|    return "Starting remote job";
|  if (std.string.find(line,"Error handler reports") > -1)
|    return "Error handler reports";
|  if (std.string.find(line," Attempting to send mail to ") > -1)
|    return "CreatingUser0";
|  if (std.string.find(line," User Created: ") > -1)
|    return "CreatingUser1";
|  if (std.string.find(line,"ERROR ") > -1)
|    return "ERROR ";
|  if (std.string.find(line,"DEBUG ") > -1)
|    return "DEBUG ";
|  if (std.string.find(line,"WEB ") > -1)
|    return "WEB ";
|  if (std.string.find(line,"OCR: begin=") > -1)
|    return "OCR: begin=";
|  if (std.string.find(line,"OCR: fail=") > -1)
|    return "OCR: fail=";
|  if (std.string.find(line,"OCR: end=") > -1)
|    return "OCR: end=";
|  if (std.string.find(line,">>> Rejecting login ") > -1)
|    return "HTTPProgrammaticAccess";
|  if (std.string.find(line,"WARNING ") > -1)
|    return "WARNING ";
|  if (std.string.find(line,"SYSTEM.ERR ") > -1)
|    return "SYSTEM.ERR ";
|  if (std.string.find(line,"SYSTEM.OUT ") > -1)
|    return "SYSTEM.OUT ";
|  if (std.string.find(line,"DocuShare retry loop required ") > -1)
|    return "DSRetryLoop";
|  if (std.string.find(line,"FTP command rejected,") > -1)
|    return "FTP command rejected";
|  if (std.string.find(line,"Login: User ") > -1)
|    return "UserLogin";
|  if (std.string.find(line,"Logout from web") > -1)
|    return "UserLogout";
|  if (std.string.find(line,"Login: Username ") > -1)
|    return "BadPassWord";
|  if (std.string.find(line," Advanced access ") > -1) // this one covers
|    return "AdvancedAccess";  // both the request and the access
|  if (std.string.find(line," The scheduled backup did not occur") > -1)
|    return "MissedBackup";
|  if (std.string.find(line,">>> Last backup time of ") > -1)
|    return "MissedBackup";
|  if (std.string.find(line,"There are ") > -1 &&
|      std.string.find(line," jobs waiting to restart") > -1)
|    return "JobsBackedUp";
|  if (std.string.find(line,"Waiting for ") > -1 &&
|      (std.string.find(line," job to complete") > -1 ||
|      std.string.find(line," jobs to complete") > -1 ))
|    return "JobsBackedUp";
|  if (std.string.find(line," Persistent GC is slow") > -1)
|    return "GCSlow";
|  if (std.string.find(line," Persistent store GC reclaimed ") > -1)
|    return "GarbageCollection";
|  if (std.string.find(line,"Shutdown called:") > -1)
|    return "Shutdown called:";
|  if (std.string.find(line,"About to shutdown due to timeout") > -1)
|    return "TimeOutShutDown";
|  if (std.string.find(line,"Persistent Store has been closed!") > -1 ||
|      std.string.find(line,"Terminating persistent store session") > -1)
|    return "ClosedPersistentStore";
|  if (
|      std.string.find(line," >>> About to do GC: ") > -1 ||
|      std.string.find(line," >>> Did GC: ") > -1 ||
|      std.string.find(line," Initial class load") > -1 ||
|      std.string.find(line," Cleaning Database prior ") > -1 ||
|      std.string.find(line," Creating unpack.ini") > -1 ||
|      std.string.find(line," Fixing internal state ") > -1 ||
|      std.string.find(line," Recording Unpack ") > -1 ||
|      std.string.find(line," Deleting info inserted ") > -1 ||
|      std.string.find(line," Packing Properties and ") > -1 ||
|      std.string.find(line," Packing Independent Entities") > -1 ||
|      std.string.find(line," Noting Replaceable objects") > -1 ||
|      std.string.find(line," Starting to unpack objects") > -1 ||
|      std.string.find(line," >>> Credentials support provided ") > -1 ||
|      std.string.find(line," >>> Gathered ") > -1 ||
|      std.string.find(line," >>> Finished unpacking") > -1 ||
|      std.string.find(line," >>> Finished Unpacking") > -1 ||
|      std.string.find(line," >>> SystemParameter ") > -1 ||
|      std.string.find(line," Writing objects to ") > -1 ||
|      std.string.find(line," Exporting database objects...") > -1 ||
|      std.string.find(line," Importing database objects...") > -1 ||
|      std.string.find(line,"Administrative Notice Logging Facility") > -1 ||
|      std.string.find(line,"Starting NSA FTP with server") > -1 ||
|      std.string.find(line," >>> Unable to read ") > -1 ||
|      std.string.find(line," >>> Deleting ") > -1 ||
|      std.string.find(line," >>> Installing Patch ") > -1 ||
|      std.string.find(line," Installing services and ") > -1 ||
|      std.string.find(line," >>> Finished Packing") > -1 ||
|      std.string.find(line,"Starting NSA FTP server") > -1 ||
|      std.string.find(line,"Using Hostname class from ") > -1 ||
|      std.string.find(line,"Started POP3 poller for ") > -1 ||
|      std.string.find(line," >>> POP3 polling enabled") > -1 ||
|      std.string.find(line," >>> POP3 enabled polling") > -1 ||
|      std.string.find(line,"Starting POP3 poller for ") > -1 ||
|      std.string.find(line,"Loading ApplicationJarEntities") > -1 ||
|      std.string.find(line,">>> Loading application ") > -1 ||
|      std.string.find(line,">>> Loading service ") > -1 ||
|      std.string.find(line," Debugging and Error Logging Faci") > -1 ||
|      std.string.find(line,"FlowPort Version FlowPort") > -1 ||
|      std.string.find(line,"TextBridge SDK 5.0-PP") > -1 ||
|      std.string.find(line," Deflate compressed in pass ") > -1 ||
|      std.string.find(line," Terminating persistent store GC ") > -1 ||
|      std.string.find(line,"Attempting to start IFax SMTP") > -1 ||
|      std.string.find(line,"Enabled SMTP IFax server COM.") > -1 ||
|      std.string.find(line,"Started SMTP IFax server COM.") > -1
|     )
|    return "IgnoreEntry"; // ignore these STATUS entries... not used
|  return "NotUsableData";
|}

thanks,

jos�

Aug 03 2005

"Ben Hinkle" <bhinkle mathworks.com> writes:

"jicman" <jicman_member pathlink.com> wrote in message 
news:dcr6dh$1kv5$1 digitaldaemon.com...
 Manfred Nowak says...
jicman <jicman_member pathlink.com> wrote:

[...]
 Also, if there is a faster way than this.

Have you thaught of

import std.regexP;
//...
  if(( new RegExp( "[A-Z]", "")).test( line)) return( "A");

 Will this be faster than std.string.find?  I already have some RegExp for
 something else, but I wouldn't think that opening a RegExp object for 
 every
 different if would be faster.  Perhaps, I should have explained it better. 
 I am
 not just searching for one letter.  It was just an example.  Here is the 
 full
 test, and perhaps, some of you may think of a better way:

If you could find a way to use string comparison instead of find then you 
might be able to use an associative array with the keys the strings like 
"JUNKDATA", "Auto stopping job" etc and the values strings like 
"JUNKDATA","AutoStopJob", etc. Without any assumptions about where in the 
line they keys can appear it would be tough to do any better than what you 
are currently doing.

Aug 03 2005

jicman <jicman_member pathlink.com> writes:

Ben Hinkle says...
"jicman" <jicman_member pathlink.com> wrote in message 
news:dcr6dh$1kv5$1 digitaldaemon.com...
 Manfred Nowak says...
jicman <jicman_member pathlink.com> wrote:

[...]
 Also, if there is a faster way than this.

Have you thaught of

import std.regexP;
//...
  if(( new RegExp( "[A-Z]", "")).test( line)) return( "A");

 Will this be faster than std.string.find?  I already have some RegExp for
 something else, but I wouldn't think that opening a RegExp object for 
 every
 different if would be faster.  Perhaps, I should have explained it better. 
 I am
 not just searching for one letter.  It was just an example.  Here is the 
 full
 test, and perhaps, some of you may think of a better way:

If you could find a way to use string comparison instead of find then you 
might be able to use an associative array with the keys the strings like 
"JUNKDATA", "Auto stopping job" etc and the values strings like 
"JUNKDATA","AutoStopJob", etc. Without any assumptions about where in the 
line they keys can appear it would be tough to do any better than what you 
are currently doing. 

Ok, thanks.  These values are returned to a case which then, depending on the
return string will execute more string splits, checks, etc.  I just wanted to
get an idea, since the amount of data is getting bigger and bigger.

thanks again folks.

jic

Aug 03 2005

Manfred Nowak <svv1999 hotmail.com> writes:

jicman <jicman_member pathlink.com> wrote:

[...]
  Here is the full test, and
 perhaps, some of you may think of a better way: 

There is no predefined solution for your problem in D.

If you want a fast running solution for a big data pool you have to 
implement a lexer.

If you want a prototype to show how fast developing in D can be, you 
are already done.

-manfred

Aug 03 2005

Manfred Nowak <svv1999 hotmail.com> writes:

Manfred Nowak <svv1999 hotmail.com> wrote:

 you have to implement a lexer.

For this task you can use re2c from r2c.org.

A partially filled template for your taks would look like this:

<code>
alias char YYCTYPE;
YYCTYPE* YYCURSOR, YYLIMIT, YYMARKER;
char[] choose( char[] line){
  void yyLimitAdjust(){
    YYLIMIT= &line[ line.length-1];
    YYLIMIT++;
  }
  void YYFILL( int n){
    uint now= line.length;
    line.length= now + n;
    yyLimitAdjust();
    for( uint i= now; i< line.length; i++)
      line[ i]= ' ';
  }
  YYCURSOR= &line[0];
  yyLimitAdjust();
  try{
    while( 1)
/*!re2c
"JUNKDATA"	{ return "JUNKDATA".dup;}
"Auto stopping job"	{ return "AutoStopJob".dup;}
"Waiting for ".*"job""s"?" to complete"	{ return 
"JobsBackedUp".dup; }
'\n'	{}
.	{}
*/
  }
  catch{ return "NotUsableData";};
}
unittest{
  assert( choose( "JUNKDAT") == "NotUsableData");
  assert( choose( "JUNKDATA") == "JUNKDATA");
  assert( choose( "Waiting for one job to complete") == 
"JobsBackedUp");
  assert( choose( "Waiting for 20 jobs to complete") == 
"JobsBackedUp");
}
void main(){
}
</char>

This run through re2c can be fed into dmd nearly as is.

-manfred

Aug 03 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

 "JUNKDATA" { return "JUNKDATA".dup;}
 "Auto stopping job" { return "AutoStopJob".dup;}
 "Waiting for ".*"job""s"?" to complete" { return
 "JobsBackedUp".dup; }

any particular reason for the .dups?

Aug 03 2005

Manfred Nowak <svv1999 hotmail.com> writes:

"Ben Hinkle" <ben.hinkle gmail.com> wrote:

[...]
 any particular reason for the .dups? 

references to local data aren't allowed, are they?

-manfred

Aug 03 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

"Manfred Nowak" <svv1999 hotmail.com> wrote in message 
news:dcrovi$23ik$1 digitaldaemon.com...
 "Ben Hinkle" <ben.hinkle gmail.com> wrote:

 [...]
 any particular reason for the .dups?

 references to local data aren't allowed, are they?

 -manfred

String literals are stored in the shared data segment so they can be safely 
returned from functions (like in C). The local info on the stack is the ptr 
to the data segment and length to the string data and that is being copied 
to the return value. If the function returned the address of a local 
variable or a string stored locally on the stack then that would indeed be 
illegal.

Aug 03 2005

Manfred Nowak <svv1999 hotmail.com> writes:

"Ben Hinkle" <ben.hinkle gmail.com> wrote:

 String literals are stored in the shared data segment so they
 can be safely returned from functions (like in C).

I cannot find any hint on this in the specs. So it is current but 
undocumented and therefore unpromised behaviour.

Furthermore and as far as I understand, the 64-bit OS's have given up 
the idea of segments. If this cannot be reestablished, the move to 
64-bit will break masses of D-code?

-manfred

Aug 03 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

"Manfred Nowak" <svv1999 hotmail.com> wrote in message 
news:dcs6lm$2eod$1 digitaldaemon.com...
 "Ben Hinkle" <ben.hinkle gmail.com> wrote:

 String literals are stored in the shared data segment so they
 can be safely returned from functions (like in C).

 I cannot find any hint on this in the specs. So it is current but
 undocumented and therefore unpromised behaviour.

Since C behaves that way Walter probably assumed people would assume D 
behaves that way, too. It would be nice to have it mentioned in the doc, 
though.

 Furthermore and as far as I understand, the 64-bit OS's have given up
 the idea of segments. If this cannot be reestablished, the move to
 64-bit will break masses of D-code?

That would make existing C code break, too, so I doubt it will be a problem.

 -manfred

Aug 04 2005

Manfred Nowak <svv1999 hotmail.com> writes:

"Ben Hinkle" <ben.hinkle gmail.com> wrote:

[...]
 That would make existing C code break, too, so I doubt it will
 be a problem. 

I will have a closer look to C-code ported to 64-bit OS. But the fact 
that C-code may break is no reason for letting D-code break also.

-manfred

Aug 04 2005

Derek Parnell <derek psych.ward> writes:

On Wed, 3 Aug 2005 19:37:21 +0000 (UTC), jicman wrote:


[snip]
 I am
 not just searching for one letter.

I have a much faster string finding routine coded up in Build. It uses the
Boyer-Moore algorithm. Here it is ...

<code>
template BMScan(T) {
int BMScan(T[] pContainer, T[] pFind, uint pStartPos=0)
{
    int[T] lShift; // Shift table.
    int lCP;    // Index into container array
    int lCE;    // Current high-water mark in containter array
    int lFP;    // Index into find array

    // Do length checks first.
    if (pFind.length == 0)
        return -1;
    if (pStartPos > pContainer.length)
        return -1;
    if (pStartPos < 0)
        return -1;
    if (pStartPos > 0)
        pContainer = pContainer[pStartPos .. $];
    if (pContainer.length == 0)
        return -1;
    if (pFind.length > pContainer.length)
        return -1;

    // If find string is only one element long, do a brute force scan.
    if (pFind.length == 1)
    {
        T pElem = pFind[0];
        foreach(int i, T lElem; pContainer[pStartPos .. $])
        {
            if ( lElem == pElem)
                return i + pStartPos;
        }
        return -1;
    }

    // Prepare shift table.
    foreach(int i, T c; pFind)
    {
        lShift[c] = pFind.length - i - 1;
    }

    // Start scan at rightmost position.
    lCP = pFind.length - 1;
    while(lCP < pContainer.length)
    {
        lFP = pFind.length - 1;
        assert(lCP == 0 || lCP > lCE);
        lCE = lCP;  // Remember where I'm up to so far.

        if (pContainer[lCP] != pFind[lFP])
        {   // Found a mismatch, so realign the arrays.
            if ((pContainer[lCP] in lShift) == null)
            {
                // This element is not anywhere in the Find array
                // so we can skip over it to align up just beyond it.
                lCP += pFind.length;
            }
            else
            {
                // This element is somewhere in the Find array
                // so if we haven't checked all Find elements we
                // can slide up to align with the rightmost
                // position where the element occurs in the Find
                // array. However, if we have checked the entire
                // Find array, then it isn't in the Container array.
                if (lFP == 0)
                    return -1;
                lCP += lShift[pContainer[lCP]];
            }
        }
        else
        {   // Found a match, so scan right-to-left...
            // unless this we are at the first Find array element
            // in which case we have found it.
            if (lFP == 0)
               return lCP + pStartPos;

           // Keep scanning until we run out of elements in the Find array.
            while(lFP > 0)
            {
                // Point to next pair to check.
                lFP--;
                lCP--;

                if (pContainer[lCP] != pFind[lFP])
                {   // Found a mismatch, so realign the arrays.
                    if ((pContainer[lCP] in lShift) == null)
                    {
                      // This element is not anywhere in the Find array
                      // so we can skip over it to align up just beyond it.
                        lCP += pFind.length;
                    }
                    else
                    {
                        // This element is somewhere in the Find array
                        // so if we haven't checked all Find elements we
                        // can realign the match so far beyond the current
                        // 'block'.
                        // However, if we have checked the entire
                       // Find array, then it isn't in the Container array.
                        if (lFP == 0)
                            return -1;
                        lCP = lCE + (pFind.length - lFP - 1);
                    }
                    // Stop element-by-element scanning and resume
                    // block-end checking.
                    break;
                }

                if (lFP == 0)
                    // We are at the begining of the Find array so
                    // that means we found a complete match!
                    return lCP + pStartPos;
            }
        }

    }
    // If we get here, then there is no match.
    return -1;
  }
}

</code>

And you use it like the phobos find() function ...

<code>
  if ( BMScan!(char)(line,"Loading ApplicationJarEntities") > -1) ...
</code>

-- 
Derek Parnell
Melbourne, Australia
4/08/2005 7:05:14 AM

Aug 03 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

"Derek Parnell" <derek psych.ward> wrote in message 
news:9lcyd77i1y0t.uvlaprrcooxk.dlg 40tude.net...
 On Wed, 3 Aug 2005 19:37:21 +0000 (UTC), jicman wrote:


 [snip]
 I am
 not just searching for one letter.

 I have a much faster string finding routine coded up in Build. It uses the
 Boyer-Moore algorithm. Here it is ...

Hmm. when I try it on strings pContainer of length between 50 and 150 with a 
pFind of "this is a test of phobos" I get std.string.find outperforms the 
posted code by an order of magnitude and it seems to get worse as the GC 
load increses (I notice the lShift is an assoc array which will allocate 
nodes for every unique key on every call). What sorts of problem sizes are 
appropriate for BMScan?

Aug 03 2005

Derek Parnell <derek psych.ward> writes:

On Wed, 3 Aug 2005 19:58:08 -0400, Ben Hinkle wrote:

 "Derek Parnell" <derek psych.ward> wrote in message 
 news:9lcyd77i1y0t.uvlaprrcooxk.dlg 40tude.net...
 On Wed, 3 Aug 2005 19:37:21 +0000 (UTC), jicman wrote:


 [snip]
 I am
 not just searching for one letter.

 I have a much faster string finding routine coded up in Build. It uses the
 Boyer-Moore algorithm. Here it is ...

 
 Hmm. when I try it on strings pContainer of length between 50 and 150 with a 
 pFind of "this is a test of phobos" I get std.string.find outperforms the 
 posted code by an order of magnitude and it seems to get worse as the GC 
 load increses (I notice the lShift is an assoc array which will allocate 
 nodes for every unique key on every call). What sorts of problem sizes are 
 appropriate for BMScan?

Ouch! 

I've just tried some time trials myself and I have to say that phobos is
faster in all cases, even when I have to do conversions of wchar[] and
dchar[]. I also tried to remove the AA dependancies but the best I could do
was to come in a half phobos's speed. 

Conclusion: Don't use the BMScan code I supplied. There is something not
quite right about.

On the lower-case 'L' prefix issue, I use a modified Courier font that
displays this character's glyph as a lower-case "T" but without the bar.
The net effect is a character that has a curved base and is easily
distinguishable from both 'i' and 1. The same modified font also displays a
zero with dot in the center of the oval to avoid confusion with 'O' and
'o'.

Sorry about that ... ;-)
-- 
Derek
Melbourne, Australia
4/08/2005 1:21:06 PM

Aug 03 2005

"Walter" <newshound digitalmars.com> writes:

"Derek Parnell" <derek psych.ward> wrote in message
news:s7w75fouxa82$.1x9tu12mubkvu.dlg 40tude.net...
 I've just tried some time trials myself and I have to say that phobos is
 faster in all cases, even when I have to do conversions of wchar[] and
 dchar[]. I also tried to remove the AA dependancies but the best I could

do
 was to come in a half phobos's speed.

It's just really hard to beat memchr() (which find() is based on).

Aug 05 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

    int[T] lShift; // Shift table.
    int lCP;    // Index into container array
    int lCE;    // Current high-water mark in containter array
    int lFP;    // Index into find array

I admit this is a truly nit-picky complaint but the "l" prefix above took me 
a while to figure out - I thought it was an I for "index". I only really 
confirmed those were l's when I changed my font.

Aug 03 2005

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Which "if" choice is faster?