digitalmars.D - Who wants regexps anyway?

reply Georg Wrede <georg.wrede nospam.org> writes:
Hands up?

I do too.

What if we collected the most used and needed regexps? They could be 
precompiled, and maybe put as part of Phobos?

Stuff like

  - find next number
  - find next float
  - find next identifier
  - find next URL
  - find next filename (one for windows, unix, etc.)
  - find next _valid_ http tag
  - find next regexp


The last one would be cool. While I could never write one, I bet there 
is one on the internet somewhere!!

What other regexps would be nice? Those that are harder to write would 
be most valuable, I think.

This would save a lot of time when coding.

Why reinvent the wheel over and over?
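
As a rough illustration of the idea above, here is a minimal sketch of a small table of precompiled "stock" patterns with a single lookup function. It is written in Python only for illustration; the names `STOCK` and `find_next` and the exact expressions are assumptions made up for this example, not part of Phobos or any existing library.

```python
import re

# Hypothetical stock patterns; the names and the exact expressions are
# illustrative assumptions, not taken from Phobos or any real library.
STOCK = {
    "number": re.compile(r"[+-]?[0-9]+"),
    "float": re.compile(r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)(?:[eE][+-]?[0-9]+)?"),
    "identifier": re.compile(r"[A-Za-z_][A-Za-z0-9_]*"),
}

def find_next(kind, text, start=0):
    """Return (matched_text, end_offset) for the next match of the stock
    pattern `kind` at or after `start`, or None if nothing matches."""
    m = STOCK[kind].search(text, start)
    if m is None:
        return None
    return m.group(0), m.end()
```

The patterns are compiled once when the table is built, so callers only pay for the search itself. The harder entries on the list (URLs, filenames, tags) would need considerably more careful expressions than these.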
Mar 14 2005
next sibling parent reply "Andrew Fedoniouk" <news terrainformatica.com> writes:
Hi, Georg,

Attached is my collection of URL REs used in http://BlockNote.net.
I think it makes sense to include them. They are used pretty frequently these days.

Andrew.
[Attachment: URL_REs.cpp]
Mar 14 2005
parent Georg Wrede <georg.wrede nospam.org> writes:
Excellent!

Mar 14 2005
prev sibling next sibling parent reply "Charles" <cee-lo green.com> writes:
Hand down.

I think built-in regexes work well in Perl, and they may be one of its most
powerful features, but they serve such a specific purpose that I don't think
they would fit well in a general-purpose language like D.  How many of us have
used regexes in a compiled language?  How many feel the Regex class is
insufficient?

Charlie
Mar 14 2005
parent reply Georg Wrede <georg.wrede nospam.org> writes:
Charles wrote:
 Hand down.
 
 I think built-in regex's work well in PERL, in fact maybe one of its most
 powerful features, but it serves such a specific purpose I dont think it
 would fit well in a general purpose language like D.  How many of us have
 used regex's in a compiled language ?  How many feel the Regex class is
 insufficient ?

Well, by that logic a language like D, which treats strings as char[], would not need substring, trim, and other such functions either: you can always write them when needed. Since D provides them anyway, it would only be natural to provide the most used regexps too.
Mar 14 2005
parent "Charles" <cee-lo green.com> writes:
Oh, I'm sorry, I thought this thread was a continuation of the one about putting
built-in regexp support like Perl's into D.  If it's just about adding existing
regular expressions, I'm all for it :D.

Charlie
Mar 14 2005
prev sibling next sibling parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Georg Wrede" <georg.wrede nospam.org> wrote in message 
news:4235751B.5020009 nospam.org...
 Hands up?
 Stuff like
  - find next number
  - find next float
  - find next identifier
  - find next URL
  - find next filename (one for windows, unix, etc.)
  - find next _valid_ http tag
  - find next regexp
 This would save a lot of time when coding.
 Why invent the wheel over and over! ??

These sound very interesting, but... mind explaining their use to those of us not so acquainted with them? I'm sure there are several Windows users here who have never used Perl and have no idea what a regexp is for. I know it's a kind of string pattern matching, but what use would it serve in a programming language? And if it does what I think it does, wouldn't it be a sort of... preprocessor?
Mar 14 2005
next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Mon, 14 Mar 2005 21:31:34 -0500, Jarrett Billingsley wrote:

 "Georg Wrede" <georg.wrede nospam.org> wrote in message 
 news:4235751B.5020009 nospam.org...
 Hands up?
 Stuff like
  - find next number
  - find next float
  - find next identifier
  - find next URL
  - find next filename (one for windows, unix, etc.)
  - find next _valid_ http tag
  - find next regexp
 This would save a lot of time when coding.
 Why invent the wheel over and over! ??
 These sound very interesting, but.. mind explaining their use to those of us
 not so acquainted with them? I'm sure there are several windows users here
 who have never used Perl and have no idea what a regexp is for. I know it's
 a kind of string pattern-matching, but what use would it serve in a
 programming language? And if it does what I think it does, wouldn't it be a
 sort of.. preprocessor?

They are very handy for applications that do a *lot* of text processing, but I don't think that they warrant being built into the language. Library routines ought to suffice.

In fact, I'm sure that if they were built in, they would only be syntax sweeteners for library calls anyway. That is, the dmd compiler would insert a call to a library routine to perform the requested functionality. OK, with regexp literals the compiler could process the pattern at compile time rather than at run time, but that is not going to be a world-beating performance improvement.

I suspect an interface into the PCRE library ( http://www.pcre.org/ ) will suit nearly all needs in this area.

--
Derek Parnell
Melbourne, Australia
http://www.dsource.org/projects/build/
http://www.prowiki.org/wiki4d/wiki.cgi?FrontPage
15/03/2005 2:05:02 PM
Mar 14 2005
parent Georg Wrede <georg.wrede nospam.org> writes:
Derek Parnell wrote:
 On Mon, 14 Mar 2005 21:31:34 -0500, Jarrett Billingsley wrote:
"Georg Wrede" <georg.wrede nospam.org> wrote in message 
news:4235751B.5020009 nospam.org...

Hands up?
Stuff like
 - find next number
 - find next float
 - find next identifier
 - find next URL
 - find next filename (one for windows, unix, etc.)
 - find next _valid_ http tag
 - find next regexp
This would save a lot of time when coding.
Why invent the wheel over and over! ??
 They are very handy for applications that do a *lot* of text processing, but
 I don't think that they warrant being built-in to the language. Library

Right. (So this was totally separate from the thread a few weeks ago about Unquoted Regular Expressions in D.)

 routines ought to suffice. In fact, I'm sure that if they were built-in,
 they would only be syntax sweeteners for library calls anyway. That is, the
 dmd compiler would insert a call to a library routine to perform the

I was thinking of having some stock functions in the library.

 requested functionality. Ok, with regexp literals it would do this at
 compile time rather than at run time, but that is not going to be a
 world-beating performance improvement.

 I suspect an interface into the PCRE library ( http://www.pcre.org/ ) will
 suit nearly all needs in this area.

Hmm, I took a look at it. I will look some more; hopefully there is something useful to bring into the D library too.
Mar 15 2005
prev sibling parent reply pragma <pragma_member pathlink.com> writes:
In article <d15h9v$29i$1 digitaldaemon.com>, Jarrett Billingsley says...
 mind explaining their use to those of us not so acquainted with them?  I'm
 sure there are several windows users here who have never used Perl and have
 no idea what a regexp is for.  I know it's a kind of string pattern-matching,
 but what use would it serve in a programming language?

Gladly. :)

The thumbnail sketch of a regexp is basically the world's most compact parser language. It makes relatively sizable parsing chores a breeze, by offloading all the messy details to a common syntax and a runtime engine to support it.

Wikipedia has all the gory details: http://en.wikipedia.org/wiki/Regular_expression

"A regular expression (abbreviated as regexp, regex or regxp) is a string that describes or matches a set of strings, according to certain syntax rules. Regular expressions are used by many text editors and utilities to search and manipulate bodies of text based on certain patterns. Many programming languages support regular expressions for string manipulation. For example, Perl has a powerful regular expression engine built directly into its syntax. The set of utilities (including the editor sed and the filter grep) provided by Unix distributions were the first to popularize the concept of regular expressions."

(The wiki page goes into a great deal of depth and is a pretty good read. If you're anything like me, you'll want to skip the Language Theory part and get right to the syntax section.)

One more thing to consider about regexps is that they aren't well suited to large parsing tasks. Since the grammar needs an infinite-lookahead parser, the engine spends a lot of time resolving negative matches. Also, it's interpreting the regular expression 'language' as it goes (okay, some regexp engines compile to an intermediate code first). So you have a tool that is less efficient than a hand-coded solution for the same task.

All things said and done, I don't use regexps very often. However, they come in *very handy* when I want a quick solution to a parsing problem, or simply don't mind trading runtime performance for sane-looking code.

- EricAnderton at yahoo
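
To make the "compact parser language" point concrete, here is a minimal sketch in Python (used only for illustration). The `key = value` line format, the `LINE` pattern, and the `parse_settings` function are all made up for this example and do not come from the thread.

```python
import re

# One pattern describes the whole line format: a key, '=', and a value.
# The "key = value" format is a made-up example, not something from the thread.
LINE = re.compile(r"^\s*([A-Za-z_]\w*)\s*=\s*(.*?)\s*$")

def parse_settings(text):
    """Collect key/value pairs from lines shaped like `key = value`;
    lines that do not match the pattern are skipped."""
    settings = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            settings[m.group(1)] = m.group(2)
    return settings
```

A hand-written version would need explicit loops for skipping whitespace, collecting the key, and trimming the value. That is the readability-for-runtime trade described above.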
Mar 14 2005
parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"pragma" <pragma_member pathlink.com> wrote in message 
news:d15jro$4e1$1 digitaldaemon.com...
 The thumbnail sketch of a regexp is basically the world's most compact
 parser language.  It makes relatively sizable parsing chores a breeze, by
 offloading all the messy details to a common syntax and a runtime engine
 to support it.

Oh, well then. I suppose they might be useful when writing something like a source analyzer tool, or anything else that parses text.

 All things said and done, I don't use regexps very often.  However, they
 come in *very handy* when I want a quick solution to a parsing problem, or
 simply don't mind trading runtime performance for sane-looking code.

I can see what you mean. I'm not sure why they'd need to be part of the language, as they are useful, but not _that_ integral.
Mar 14 2005
next sibling parent Georg Wrede <georg.wrede nospam.org> writes:
Jarrett Billingsley wrote:
 "pragma" <pragma_member pathlink.com> wrote in message 
 news:d15jro$4e1$1 digitaldaemon.com...
 
 The thumbnail sketch of a regexp is basically the world's most compact
 parser language.  It makes relatively sizable parsing chores a breeze, by
 offloading all the messy details to a common syntax and a runtime engine
 to support it.

 Oh, well then. I suppose they might be useful when writing something like
 a source analyzer tool, or anything else that parses text.

Right. And that's exactly what I'm doing with the meta thing right now.

Also, whenever you make a program that takes a text file as its input, regexps are more flexible than the functions in std.string. find, cmp, ifind, inPattern, inPatterns are all useful and handy for what they do, but they are most useful in "simple" situations.
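
The difference between a plain substring search and a pattern search can be sketched as follows. This is a Python illustration with made-up sample text, not D's std.string or its regexp module. The fixed search needs the exact text up front, while the pattern describes only its shape.

```python
import re

text = "build 17 failed on 2005-03-14, retry on 2005-03-15"

# Plain substring search: works only when the exact text is known up front.
fixed = text.find("2005-03-14")

# Pattern search: describes the *shape* of a date, so it finds both dates.
DATE = re.compile(r"\b[0-9]{4}-[0-9]{2}-[0-9]{2}\b")
dates = DATE.findall(text)
```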

 All things said and done, I don't use regexps very often.  However, they
 come in *very handy* when I want a quick solution to a parsing problem, or
 simply don't mind trading runtime performance for sane-looking code.

 I can see what you mean. I'm not sure why they'd need to be part of the
 language, as they are useful, but not _that_ integral.

True, this time I was talking about having some of the most useful in the standard library. Not in the language itself.

Incidentally, unfamiliarity with regexps is yet another reason why one should install Linux and use it. Familiarity with Linux is increasingly becoming a must for anybody who even remotely considers computers or programming as a future career (or even serious hobby). And that brings regexps sort of "easily and naturally" into your life. You'll hardly even notice.
Mar 15 2005
prev sibling parent pragma <pragma_member pathlink.com> writes:
In article <d15lj4$5rt$1 digitaldaemon.com>, Jarrett Billingsley says...
 I can see what you mean.  I'm not sure why they'd need to be part of the
 language, as they are useful, but not _that_ integral.

There are a couple of different reasons why one would build regular expressions into a compiled language. IMO, it's past the point of diminishing returns, and outside D's stated goals.

If a given regular expression were compiled into native code, it would presumably be faster than its interpreted runtime counterpart. That alone could make building high-performance parsing applications easier. However, this would require an additional parse/lex/compile unit to be added onto the D compiler; this is not as trivial a task as some would believe.

Another angle is "evangelizing" users of other languages to D on the grounds that D "eats Perl's lunch" when it comes to text processing. Perl has the advantage in that regexps are a "built-in" feature of the language syntax. Bringing D up to that standard of use for parsing would go a long way to enticing folks used to programming like this.

IMO, D is a hammer and Perl is a screwdriver: D can be used to do Perl's job, but that's for folks who like pounding screws all afternoon.

- EricAnderton at yahoo
Mar 15 2005
prev sibling parent "Charlie Patterson" <charliep1 excite.com> writes:
I'm not sure what this is about, but if you have some standard regexps you 
want, such as find next http tag, a hand-built version is much faster than 
writing it in generic regexp'isms.
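
For comparison, here is a minimal sketch of the two approaches for "find next number", written in Python for illustration only. The function names are hypothetical. Which version is faster depends on the language and the regexp engine, so the sketch shows only that the two are equivalent and makes no timing claim.

```python
import re

NUMBER = re.compile(r"[0-9]+")

def next_number_regex(text, start=0):
    """Regex version: return (start, end) of the next run of digits, or None."""
    m = NUMBER.search(text, start)
    return (m.start(), m.end()) if m else None

def next_number_manual(text, start=0):
    """Hand-written version of the same search, scanning character by character."""
    i, n = start, len(text)
    # Skip forward to the first ASCII digit.
    while i < n and not ("0" <= text[i] <= "9"):
        i += 1
    if i >= n:
        return None
    # Extend to the end of the digit run.
    j = i
    while j < n and "0" <= text[j] <= "9":
        j += 1
    return (i, j)
```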
Mar 15 2005