digitalmars.D - Who wants regexps anyway?

Georg Wrede (18/18) Mar 14 2005 Hands up?

Andrew Fedoniouk (35/53) Mar 14 2005 Hi, George,

Georg Wrede (2/44) Mar 14 2005

Charles (9/27) Mar 14 2005 Hand down.

Georg Wrede (6/13) Mar 14 2005 Well, in a language like D that treats strings as char[] one would not

Charles (8/21) Mar 14 2005 Oh I'm sorry I thought this thread was a continuation of putting built-i...

Jarrett Billingsley (9/20) Mar 14 2005 These sound very interesting, but.. mind explaining their use to those o...

Derek Parnell (17/39) Mar 14 2005 They are very handy for applications that do a *lot* of text processing,

Georg Wrede (6/33) Mar 15 2005 Right. (So this was totally separate from the thread a few weeks ago

pragma (28/33) Mar 14 2005 Gladly. :)

Jarrett Billingsley (6/16) Mar 14 2005 Oh, well then. I suppose they might be useful when writing something li...

Georg Wrede (15/34) Mar 15 2005 Right. And that's exactly what I'm doing with the meta thing right now.
pragma (17/19) Mar 15 2005 There are a couple different reasons why one would build regular-express...

Charlie Patterson (5/23) Mar 15 2005 I'm not sure what this is about, but if you have some standard regexps y...

Georg Wrede <georg.wrede nospam.org> writes:

Hands up?

I do too.

What if we collected the most used and needed regexps? They could be 
precompiled, and maybe put into Phobos?

Stuff like

  - find next number
  - find next float
  - find next identifier
  - find next URL
  - find next filename (one for windows, unix, etc.)
  - find next _valid_ http tag
  - find next regexp


The last one would be cool. While I could never write one, I bet there 
is one on the internet somewhere!!

What other regexps would be nice? Those that are harder to write would 
be most valuable, I think.

This would save a lot of time when coding.

Why reinvent the wheel over and over?
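A minimal sketch of the idea, written in C++17 with the standard `<regex>` library as a stand-in for Phobos (no D library API is shown or implied). The `stock` namespace, the three patterns and `find_next` are hypothetical names chosen for illustration; the patterns are ASCII-only and ignore signs and locale.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <regex>
#include <string>

// Hypothetical "stock" patterns, each compiled once when the program starts.
// Illustrative assumptions: ASCII only, no leading sign, a float needs a '.'.
namespace stock {
const std::regex number(R"([0-9]+)");
const std::regex floating(R"([0-9]*\.[0-9]+([eE][-+]?[0-9]+)?)");
const std::regex identifier(R"([A-Za-z_][A-Za-z0-9_]*)");
}  // namespace stock

// Returns the first match of `re` in `text` at or after offset `pos`,
// or std::nullopt when there is none.
std::optional<std::string> find_next(const std::string& text,
                                     const std::regex& re,
                                     std::size_t pos = 0) {
    if (pos > text.size()) return std::nullopt;
    std::smatch m;
    if (std::regex_search(text.cbegin() + pos, text.cend(), m, re))
        return m.str(0);
    return std::nullopt;
}
```

Because the patterns are namespace-scope constants, each is compiled once at start-up rather than on every call, which is roughly what "precompiled" would buy a library user.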

Mar 14 2005

"Andrew Fedoniouk" <news terrainformatica.com> writes:

Hi, George,

Here is (attached) my collection of URL REs used in http://BlockNote.net.
I think it makes sense to include them; they are pretty frequently used these days.

Andrew.
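As an independent illustration of the kind of pattern such a collection holds (this is not the content of Andrew's attachment), here is a deliberately loose C++ sketch using `std::regex`. The pattern and the `find_urls` name are assumptions; it accepts http, https and ftp URLs with a dotted host, an optional port and an optional path, and makes no attempt at RFC 3986 compliance.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Loose URL pattern: scheme, host (dotted name or dotted-quad address),
// optional :port, optional path. Illustrative only; not RFC 3986 compliant.
const std::regex url_re(
    R"((https?|ftp)://[A-Za-z0-9_-]+(\.[A-Za-z0-9_-]+)*(:[0-9]+)?(/[^\s"'<>]*)?)",
    std::regex::ECMAScript | std::regex::icase);

// Returns every URL-looking substring of `text`, in order of appearance.
std::vector<std::string> find_urls(const std::string& text) {
    std::vector<std::string> urls;
    for (std::sregex_iterator it(text.begin(), text.end(), url_re), end;
         it != end; ++it)
        urls.push_back(it->str());
    return urls;
}
```

The path stops at whitespace, quotes and angle brackets, so a URL embedded in HTML or plain prose is cut at a sensible point; trailing punctuation such as a full stop is still included.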



[Attachment: URL_REs.cpp (uuencoded). The encoded data is damaged in this archive; the recoverable fragments show C++ #define pattern strings such as RE_TCP_IP_ADDR_NAME and RE_EMAIL_ADDR, and tool::regexp objects such as re_email.]

Mar 14 2005

Georg Wrede <georg.wrede nospam.org> writes:

Excellent!

Andrew Fedoniouk wrote:
 Hi, George,
 
 Here is (on the clip) my collection of url REs used in http://BlockNote.net 
 .
 I think it make sense to include them. Pretty frequently used these days.
 
 Andrew.

Mar 14 2005

"Charles" <cee-lo green.com> writes:

Hand down.

I think built-in regexes work well in Perl; in fact they are maybe one of its
most powerful features. But they serve such a specific purpose that I don't
think they would fit well in a general-purpose language like D. How many of
us have used regexes in a compiled language? How many feel the Regex class
is insufficient?

Charlie

Mar 14 2005

Georg Wrede <georg.wrede nospam.org> writes:

Charles wrote:
 Hand down.
 
 I think built-in regexes work well in Perl; in fact they are maybe one of its
 most powerful features. But they serve such a specific purpose that I don't
 think they would fit well in a general-purpose language like D. How many of
 us have used regexes in a compiled language? How many feel the Regex class
 is insufficient?

Well, in a language like D that treats strings as char[], one would not 
need to have substring, trim, and other functions. You can always write 
them when needed.

Since D provides them, it would only be natural to provide the most used 
regexps.

Mar 14 2005

"Charles" <cee-lo green.com> writes:

Oh, I'm sorry; I thought this thread was a continuation of the one about
putting built-in regexp support like Perl's into D. If it's just adding
existing regular expressions, I'm all for it :D.

Charlie

"Georg Wrede" <georg.wrede nospam.org> wrote in message
news:42360AE1.40709 nospam.org...
 Charles wrote:
 Hand down.

 I think built-in regexes work well in Perl; in fact they are maybe one of its
 most powerful features. But they serve such a specific purpose that I don't
 think they would fit well in a general-purpose language like D. How many of
 us have used regexes in a compiled language? How many feel the Regex class
 is insufficient?

 Well, in a language like D that treats strings as char[] one would not
 need to have substring, trim, and other functions. You can always write
 them when needed.

 Since D provides them, it would only be natural to provide the most used
 regexps.

Mar 14 2005

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Georg Wrede" <georg.wrede nospam.org> wrote in message 
news:4235751B.5020009 nospam.org...
 Hands up?
 Stuff like
  - find next number
  - find next float
  - find next identifier
  - find next URL
  - find next filename (one for windows, unix, etc.)
  - find next _valid_ http tag
  - find next regexp
 This would save a lot of time when coding.
 Why invent the wheel over and over! ??

These sound very interesting, but.. mind explaining their use to those of us 
not so acquainted with them?  I'm sure there are several windows users here 
who have never used Perl and have no idea what a regexp is for.  I know it's 
a kind of string pattern-matching, but what use would it serve in a 
programming language?

And if it does what I think it does, wouldn't it be a sort of.. 
preprocessor?

Mar 14 2005

Derek Parnell <derek psych.ward> writes:

On Mon, 14 Mar 2005 21:31:34 -0500, Jarrett Billingsley wrote:

 "Georg Wrede" <georg.wrede nospam.org> wrote in message 
 news:4235751B.5020009 nospam.org...
 Hands up?
 Stuff like
  - find next number
  - find next float
  - find next identifier
  - find next URL
  - find next filename (one for windows, unix, etc.)
  - find next _valid_ http tag
  - find next regexp
 This would save a lot of time when coding.
 Why invent the wheel over and over! ??

 
 These sound very interesting, but.. mind explaining their use to those of us 
 not so acquainted with them?  I'm sure there are several windows users here 
 who have never used Perl and have no idea what a regexp is for.  I know it's 
 a kind of string pattern-matching, but what use would it serve in a 
 programming language?
 
 And if it does what I think it does, wouldn't it be a sort of.. 
 preprocessor?

They are very handy for applications that do a *lot* of text processing,
but I don't think that they warrant being built into the language. Library
routines ought to suffice. In fact, I'm sure that if they were built-in,
they would only be syntax sweeteners for library calls anyway. That is, the
dmd compiler would insert a call to a library routine to perform the
requested functionality. Ok, with regexp literals it would do this at
compile time rather than at run time, but that is not going to be a
world-beating performance improvement.

I suspect an interface into the PCRE library ( http://www.pcre.org/ ) will
suit nearly all needs in this area.
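To make the "library routine" point concrete, here is a small C++ sketch (standard `<regex>`, not PCRE or the D library; `is_identifier` is a hypothetical name). The pattern is hidden inside the function and compiled once, on the first call, via a function-local static.

```cpp
#include <cassert>
#include <regex>
#include <string>

// A library-style routine: the caller never sees the regex. The static is
// constructed (i.e. the pattern is compiled) once, on first call; C++11
// guarantees that this initialization is thread-safe.
bool is_identifier(const std::string& s) {
    static const std::regex re(R"([A-Za-z_][A-Za-z0-9_]*)");
    return std::regex_match(s, re);  // the whole string must match
}
```

The compile cost is paid once at run time instead of at compile time; for most programs that one-off cost is small compared with the matching itself.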

-- 
Derek Parnell
Melbourne, Australia
http://www.dsource.org/projects/build/
http://www.prowiki.org/wiki4d/wiki.cgi?FrontPage
15/03/2005 2:05:02 PM

Mar 14 2005

Georg Wrede <georg.wrede nospam.org> writes:

Derek Parnell wrote:
 On Mon, 14 Mar 2005 21:31:34 -0500, Jarrett Billingsley wrote:
"Georg Wrede" <georg.wrede nospam.org> wrote in message 
news:4235751B.5020009 nospam.org...

Hands up?
Stuff like
 - find next number
 - find next float
 - find next identifier
 - find next URL
 - find next filename (one for windows, unix, etc.)
 - find next _valid_ http tag
 - find next regexp
This would save a lot of time when coding.
Why invent the wheel over and over! ??


 They are very handy for applications that do a *lot* of text processing,
 but I don't think that they warrant being built-in to the language. Library

Right. (So this was totally separate from the thread a few weeks ago 
about Unquoted Regular Expressions in D.)

 routines ought to suffice. In fact, I'm sure that if they were built-in,
 they would only be syntax sweeteners for library calls anyway. That is, the
 dmd compiler would insert a call to a library routine to perform the

I was thinking of having some stock functions in the library.

 requested functionality. Ok, with regexp literals it would do this at
 compile time rather than at run time, but that is not going to be a
 world-beating performance improvement.
 
 I suspect an interface into the PCRE library ( http://www.pcre.org/ ) will
 suit nearly all needs in this area.

Hmm, I took a look at it. I'll look some more; hopefully there is 
something useful to bring into the D library too.

Mar 15 2005

pragma <pragma_member pathlink.com> writes:

In article <d15h9v$29i$1 digitaldaemon.com>, Jarrett Billingsley says...
mind explaining their use to those of us 
not so acquainted with them?  I'm sure there are several windows users here 
who have never used Perl and have no idea what a regexp is for.  I know it's 
a kind of string pattern-matching, but what use would it serve in a 
programming language?

Gladly. :)

The thumbnail sketch of a regexp is basically the world's most compact parser
language.  It makes relatively sizable parsing chores a breeze, by offloading all
the messy details to a common syntax and a runtime engine to support it.
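As an illustration of offloading the "messy details", here is a C++ sketch (standard `<regex>`; `parse_setting` and the `key = value  # comment` line format are hypothetical). A single pattern does the whitespace skipping, name scanning, '=' check and comment stripping that a hand-written parser would spell out step by step.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Parses a line of the hypothetical form "key = value  # optional comment".
// Group 1 is the key; group 2 is the value, matched lazily so that trailing
// blanks and the comment are left to the rest of the pattern.
std::optional<std::pair<std::string, std::string>>
parse_setting(const std::string& line) {
    static const std::regex re(
        R"(^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*([^#]*?)\s*(#.*)?$)");
    std::smatch m;
    if (!std::regex_match(line, m, re)) return std::nullopt;
    return std::make_pair(m.str(1), m.str(2));
}
```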

Wikipedia has all the gory details:

http://en.wikipedia.org/wiki/Regular_expression

"A regular expression (abbreviated as regexp, regex or regxp) is a string that
describes or matches a set of strings, according to certain syntax rules.
Regular expressions are used by many text editors and utilities to search and
manipulate bodies of text based on certain patterns. Many programming languages
support regular expressions for string manipulation. For example, Perl has a
powerful regular expression engine built directly into its syntax. The set of
utilities (including the editor sed and the filter grep) provided by Unix
distributions were the first to popularize the concept of regular expressions."

(The wiki page goes into a great deal of depth and is a pretty good read. If
you're anything like me, you'll want to skip the Language Theory part and get
right to the syntax section.)
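The definition quoted above ("a string that describes or matches a set of strings") in miniature: in this C++ sketch (standard `<regex>`; `is_color_word` is a hypothetical name) the pattern `colou?r` describes exactly the set {"color", "colour"}, and `std::regex_match` accepts a candidate only if the whole string belongs to that set.

```cpp
#include <cassert>
#include <regex>
#include <string>

// "colou?r": the '?' makes the 'u' optional, so the pattern describes
// exactly two strings. Matching is case-sensitive.
bool is_color_word(const std::string& s) {
    static const std::regex re("colou?r");
    return std::regex_match(s, re);  // whole-string membership test
}
```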


One more thing to consider about regexps is that they aren't well suited to
large parsing tasks.  Since the engine works like an infinite-lookahead parser,
it spends a lot of time resolving negative matches.  Also, it interprets the
regular expression 'language' as it goes (okay, some regexp engines compile to
an intermediate code first).  So you have a tool that is less efficient than a
hand-coded solution for the same task.

All things said and done, I don't use regexps very often.  However, they come in
*very handy* when I want a quick solution to a parsing problem, or simply don't
mind trading runtime performance for sane-looking code.

- EricAnderton at yahoo

Mar 14 2005

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"pragma" <pragma_member pathlink.com> wrote in message 
news:d15jro$4e1$1 digitaldaemon.com...
 The thumbnail sketch of a regexp is basically the world's most compact
 parser language.  It makes relatively sizable parsing chores a breeze, by
 offloading all the messy details to a common syntax and a runtime engine to
 support it.

Oh, well then.  I suppose they might be useful when writing something like a 
source analyzer tool, or anything else that parses text.

 All things said and done, I don't use regexps very often.  However, they
 come in *very handy* when I want a quick solution to a parsing problem, or
 simply don't mind trading runtime performance for sane-looking code.

I can see what you mean.  I'm not sure why they'd need to be part of the 
language, as they are useful, but not _that_ integral.

Mar 14 2005

Georg Wrede <georg.wrede nospam.org> writes:

Jarrett Billingsley wrote:
 "pragma" <pragma_member pathlink.com> wrote in message 
 news:d15jro$4e1$1 digitaldaemon.com...
 
The thumbnail sketch of a regexp is basically the world's most compact
parser language.  It makes relatively sizable parsing chores a breeze, by
offloading all the messy details to a common syntax and a runtime engine to
support it.

 
 Oh, well then.  I suppose they might be useful when writing something like a 
 source analyzer tool, or anything else that parses text.

Right. And that's exactly what I'm doing with the meta thing right now.

Also, whenever you make a program that takes a text file as its input, 
regexps are more flexible than the functions in std.string.

find, cmp, ifind, inPattern, inPatterns

are all useful and handy for what they do, but they are most useful in 
"simple" situations.

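A small C++ sketch of where a regexp goes beyond plain substring search (standard `<regex>` stands in for the D library here; `find_date` is a hypothetical name). A fixed-string search can look for "2005", but not for "any four digits, a dash, two digits, a dash, two digits".

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Position of the first ISO-style date (YYYY-MM-DD) in `text`, or npos.
// \b keeps the match from starting or ending inside a longer run of digits.
// Only the shape is checked; "2005-13-99" would also be accepted.
std::size_t find_date(const std::string& text) {
    static const std::regex re(R"(\b[0-9]{4}-[0-9]{2}-[0-9]{2}\b)");
    std::smatch m;
    return std::regex_search(text, m, re)
               ? static_cast<std::size_t>(m.position(0))
               : std::string::npos;
}
```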
All things said and done, I don't use regexps very often.  However, they
come in *very handy* when I want a quick solution to a parsing problem, or
simply don't mind trading runtime performance for sane-looking code.

 
 I can see what you mean.  I'm not sure why they'd need to be part of the 
 language, as they are useful, but not _that_ integral. 

True, this time I was talking about having some of the most useful in 
the standard library. Not in the language itself.

Incidentally, unfamiliarity with regexps is yet another reason why one 
should install Linux and use it. Familiarity with Linux is increasingly 
becoming a must for anybody who even remotely considers computers or 
programming as a future career (or even serious hobby).

And that brings regexps sort of "easily and naturally" into your life. 
You'll hardly even notice.

Mar 15 2005

pragma <pragma_member pathlink.com> writes:

In article <d15lj4$5rt$1 digitaldaemon.com>, Jarrett Billingsley says...
I can see what you mean.  I'm not sure why they'd need to be part of the 
language, as they are useful, but not _that_ integral. 

There are a couple of different reasons why one would build regular expressions
into a compiled language.  IMO, it's past the point of diminishing returns, and
outside D's stated goals.

If a given regular expression were compiled into native code, it would obviously
be faster than its runtime counterpart.  That alone could make building
high-performance parsing applications easier.  However, this would require an
additional parse/lex/compile unit to be added onto the D compiler; this is not
as trivial a task as some would believe.
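To show roughly what "compiled into native code" means, here is a hand translation of the pattern `[0-9]+(\.[0-9]+)?` into a deterministic state machine in C++. This illustrates the technique only: `match_decimal` is a hypothetical name, and neither DMD nor any library mentioned in this thread is claimed to generate such code.

```cpp
#include <cassert>
#include <string>

// Hand-translated DFA for [0-9]+(\.[0-9]+)? : the kind of code a regexp
// compiler could emit instead of interpreting the pattern at run time.
// Returns true only if the whole of `s` matches.
bool match_decimal(const std::string& s) {
    enum State { Start, Int, Dot, Frac } st = Start;
    for (char c : s) {
        const bool digit = c >= '0' && c <= '9';
        switch (st) {
            case Start: if (digit) st = Int; else return false; break;
            case Int:   if (digit) st = Int;
                        else if (c == '.') st = Dot;
                        else return false;
                        break;
            case Dot:   if (digit) st = Frac; else return false; break;
            case Frac:  if (digit) st = Frac; else return false; break;
        }
    }
    return st == Int || st == Frac;  // the accepting states
}
```

Each input character costs one switch and one comparison, with no backtracking, which is where the speed advantage over a run-time engine comes from.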

Another angle is "evangelizing" users of other languages to D on the grounds
that D "eats perl's lunch" when it comes to text processing.  Perl has the
advantage in that regexps are a "built-in" feature of the language syntax.
Bringing D up to that standard of use for parsing would go a long way to
enticing folks used to programming like this.

IMO, D is a hammer and perl is a screwdriver: D can be used to do perl's job,
but that's for folks who like pounding screws all afternoon.

- EricAnderton at yahoo

Mar 15 2005

"Charlie Patterson" <charliep1 excite.com> writes:

I'm not sure what this is about, but if you have some standard regexps you 
want, such as "find next http tag", a hand-built version is much faster than 
the same thing written in generic regexp-isms.
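A sketch of the kind of hand-built routine meant here, in C++ (`find_next_tag` is a hypothetical name). It returns the same substring as a leftmost search for the regexp `<[^>]*>`, using two plain character scans instead of a pattern engine. Like that regexp, it does not check that the tag is *valid*, and it does not handle '>' inside quoted attribute values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hand-built "find next tag": returns the next "<...>" at or after `pos`,
// or an empty string if there is none.
std::string find_next_tag(const std::string& html, std::size_t pos = 0) {
    const std::size_t open = html.find('<', pos);
    if (open == std::string::npos) return "";
    const std::size_t close = html.find('>', open + 1);
    if (close == std::string::npos) return "";  // unterminated tag
    return html.substr(open, close - open + 1);
}
```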

Mar 15 2005
