std.regexp
Regular expressions are a powerful method of string pattern matching. The regular expression language is the one in common use, although some of the very advanced forms may behave slightly differently.
std.regexp is designed to work only with valid UTF strings as input. To validate untrusted input, use std.utf.validate().
In the following guide, pattern[] refers to a regular expression. attributes[] refers to a string controlling the interpretation of the regular expression. It consists of a sequence of one or more of the following characters:
Attribute | Action |
---|---|
g | global; repeat over the whole input string |
i | case insensitive |
m | treat as multiple lines separated by newlines |
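The effect of the attributes can be seen with a short sketch. It assumes the D1-era Phobos API documented on this page (char[] strings, std.regexp.find and sub); the input string is made up for illustration.

```d
import std.regexp;
import std.stdio;

void main()
{
    char[] s = "Hello World, hello D";

    // No attributes: matching is case sensitive, so "world" is not found.
    writefln("%s", std.regexp.find(s, "world"));       // -1

    // "i": ignore case, so "World" at index 6 matches.
    writefln("%s", std.regexp.find(s, "world", "i"));  // 6

    // Without "g" only the first match is replaced.
    writefln("%s", sub(s, "hello", "Bye"));            // Hello World, Bye D

    // Attributes can be combined: "gi" replaces every match, ignoring case.
    writefln("%s", sub(s, "hello", "Bye", "gi"));      // Bye World, Bye D
}
```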
The format[] string has the formatting characters:
Format | Replaced With |
---|---|
$$ | $ |
$& | The matched substring. |
$` | The portion of string that precedes the matched substring. |
$' | The portion of string that follows the matched substring. |
$n | The nth capture, where n is a single digit 1-9 and n is not followed by a decimal digit. |
$nn | The nnth capture, where nn is a two-digit decimal number 01-99. If the nnth capture is undefined or nn exceeds the number of parenthesized subexpressions, the empty string is used instead. |
Any other $ are left as is.
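As a minimal sketch of the format characters in use, the following passes a few format[] strings to sub(). It assumes the D1-era API documented on this page; the sample strings and patterns are illustrative only.

```d
import std.regexp;
import std.stdio;

void main()
{
    // $1 and $2 refer to the first and second parenthesized subexpressions.
    writefln("%s", sub("John Smith", r"(\w+) (\w+)", "$2, $1"));  // Smith, John

    // $& is the whole matched substring.
    writefln("%s", sub("John Smith", "Smith", "<$&>"));           // John <Smith>

    // $` is the text before the match, $' the text after it.
    writefln("%s", sub("abc", "b", "[$`|$']"));                   // a[a|c]c
}
```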
References:
Wikipedia
Source:
std/regexp.d
- const char[] email;
- Regular expression to extract an email address
- const char[] url;
- Regular expression to extract a URL
- class RegExpException: object.Exception;
- Thrown when a regular expression fails to compile
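A pattern that comes from user input can be checked by catching RegExpException. The sketch below assumes the D1-era API documented on this page; compiles() is a hypothetical helper, not part of std.regexp.

```d
import std.regexp;
import std.stdio;

// Hypothetical helper: reports whether pattern is accepted by RegExp.
bool compiles(char[] pattern)
{
    try
    {
        auto r = new RegExp(pattern);  // throws if the pattern is malformed
        return true;
    }
    catch (RegExpException e)
    {
        return false;
    }
}

void main()
{
    writefln("%s", compiles("a(b|c)d"));  // well-formed pattern
    writefln("%s", compiles("a(b"));      // unbalanced parenthesis, expected to be rejected
}
```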
- char[] sub(char[] string, char[] pattern, char[] format, char[] attributes = null);
- Search string for matches with regular expression
pattern with attributes.
Replace each match with string generated from format.
Params:
Parameter | Description |
---|---|
char[] string | String to search. |
char[] pattern | Regular expression pattern. |
char[] format | Replacement string format. |
char[] attributes | Regular expression attributes. |
Returns:
the resulting string
Example:
Replace the letters 'a' with the letters 'ZZ'.
    s = "Strap a rocket engine on a chicken.";
    sub(s, "a", "ZZ")       // result: StrZZp a rocket engine on a chicken.
    sub(s, "a", "ZZ", "g")  // result: StrZZp ZZ rocket engine on ZZ chicken.
The replacement format can reference the matches using the $&, $$, $', $`, $n and $nn notation:
    sub(s, "[ar]", "[$&]", "g")  // result: St[r][a]p [a] [r]ocket engine on [a] chicken.
- char[] sub(char[] string, char[] pattern, char[] delegate(RegExp) dg, char[] attributes = null);
- Search string for matches with regular expression
pattern with attributes.
Pass each match to delegate dg.
Replace each match with the return value from dg.
Params:
Parameter | Description |
---|---|
char[] string | String to search. |
char[] pattern | Regular expression pattern. |
char[] delegate(RegExp) dg | Delegate |
char[] attributes | Regular expression attributes. |
Returns:
the resulting string.
Example:
Capitalize the letters 'a' and 'r':
    s = "Strap a rocket engine on a chicken.";
    sub(s, "[ar]",
        delegate char[] (RegExp m)
        {
            return toupper(m.match(0));
        },
        "g");  // result: StRAp A Rocket engine on A chicken.
- ptrdiff_t find(rchar[] string, char[] pattern, char[] attributes = null);
- Search string[] for first match with pattern[] with attributes[].
Params:
Parameter | Description |
---|---|
rchar[] string | String to search. |
char[] pattern | Regular expression pattern. |
char[] attributes | Regular expression attributes. |
Returns:
index into string[] of match if found, -1 if no match.
Example:
    auto s = "abcabcabab";
    std.regexp.find(s, "b");  // match, returns 1
    std.regexp.find(s, "f");  // no match, returns -1
- ptrdiff_t rfind(rchar[] string, char[] pattern, char[] attributes = null);
- Search string[] for last match with pattern[] with attributes[].
Params:
Parameter | Description |
---|---|
rchar[] string | String to search. |
char[] pattern | Regular expression pattern. |
char[] attributes | Regular expression attributes. |
Returns:
index into string[] of match if found, -1 if no match.
Example:
    auto s = "abcabcabab";
    std.regexp.rfind(s, "b");  // match, returns 9
    std.regexp.rfind(s, "f");  // no match, returns -1
- char[][] split(char[] string, char[] pattern, char[] attributes = null);
- Split string[] into an array of strings, using the regular
expression pattern[] with attributes[] as the separator.
Params:
Parameter | Description |
---|---|
char[] string | String to search. |
char[] pattern | Regular expression pattern. |
char[] attributes | Regular expression attributes. |
Returns:
array of slices into string[]
Example:
    foreach (s; split("abcabcabab", "C.", "i"))
    {
        writefln("s = '%s'", s);
    }
    // Prints:
    // s = 'ab'
    // s = 'b'
    // s = 'bab'
- RegExp search(char[] string, char[] pattern, char[] attributes = null);
- Search string[] for first match with pattern[] with attributes[].
Params:
Parameter | Description |
---|---|
char[] string | String to search. |
char[] pattern | Regular expression pattern. |
char[] attributes | Regular expression attributes. |
Returns:
corresponding RegExp if found, null if not.
Example:
    import std.stdio;
    import std.regexp;

    void main()
    {
        if (auto m = std.regexp.search("abcdef", "c"))
        {
            writefln("%s[%s]%s", m.pre, m.match(0), m.post);
        }
    }
    // Prints:
    // ab[c]def
- class RegExp;
- RegExp is a class to handle regular expressions.
It is the core foundation for adding powerful string pattern matching capabilities to programs like grep, text editors, awk, sed, etc.
- this(rchar[] pattern, rchar[] attributes = null);
- Construct a RegExp object. Compile pattern
with attributes into
an internal form for fast execution.
Params:
Parameter | Description |
---|---|
rchar[] pattern | regular expression |
rchar[] attributes | attributes |
Throws:
RegExpException if there are any compilation errors.
Example:
Declare two variables and assign a RegExp object to each:
    auto r = new RegExp("pattern");
    auto s = new RegExp(r"p[1-5]\s*");
- static RegExp opCall(rchar[] pattern, rchar[] attributes = null);
- Generate instance of RegExp.
Params:
Parameter | Description |
---|---|
rchar[] pattern | regular expression |
rchar[] attributes | attributes |
Throws:
RegExpException if there are any compilation errors.
Example:
Declare two variables and assign a RegExp object to each:
    auto r = RegExp("pattern");
    auto s = RegExp(r"p[1-5]\s*");
- RegExp search(rchar[] string);
int opApply(int delegate(ref RegExp) dg);
- Set up for start of foreach loop.
Returns:
search() returns instance of RegExp set up to search string[].
Example:
    import std.stdio;
    import std.regexp;

    void main()
    {
        foreach(m; RegExp("ab").search("abcabcabab"))
        {
            writefln("%s[%s]%s", m.pre, m.match(0), m.post);
        }
    }
    // Prints:
    // [ab]cabcabab
    // abc[ab]cabab
    // abcabc[ab]ab
    // abcabcab[ab]
- char[] match(size_t n);
- Retrieve match n.
n==0 means the matched substring, n>0 means the nth parenthesized subexpression. If n is larger than the number of parenthesized subexpressions, null is returned.
- char[] pre();
- Return the slice of the input that precedes the matched substring.
- char[] post();
- Return the slice of the input that follows the matched substring.
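The three accessors above can be combined to pick a match apart. This is a small sketch assuming the D1-era API documented on this page; the input string and pattern are illustrative.

```d
import std.regexp;
import std.stdio;

void main()
{
    // Two parenthesized subexpressions: the name and the value.
    if (auto m = std.regexp.search("set key=value now", r"(\w+)=(\w+)"))
    {
        writefln("pre   = '%s'", m.pre);       // 'set '
        writefln("whole = '%s'", m.match(0));  // 'key=value'
        writefln("name  = '%s'", m.match(1));  // 'key'
        writefln("value = '%s'", m.match(2));  // 'value'
        writefln("post  = '%s'", m.post);      // ' now'
    }
}
```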
- rchar[][] split(rchar[] string);
- Split string[] into an array of strings, using the regular
expression as the separator.
Returns:
array of slices into string[]
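For instance, a comma followed by optional whitespace can serve as the separator. A minimal sketch, assuming the D1-era API documented on this page and a pattern without parenthesized subexpressions:

```d
import std.regexp;
import std.stdio;

void main()
{
    // Split on a comma and any whitespace that follows it.
    auto fields = RegExp(r",\s*").split("a, b,c,  d");

    foreach (f; fields)
    {
        writefln("'%s'", f);
    }
}
// Expected to print:
// 'a'
// 'b'
// 'c'
// 'd'
```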
- ptrdiff_t find(rchar[] string);
- Search string[] for match with regular expression.
Returns:
index of match if successful, -1 if not found
- rchar[][] match(rchar[] string);
- Search string[] for match.
Returns:
If global attribute, return array of all matches. If not global attribute, return same value as exec(string).
- rchar[] replace(rchar[] string, rchar[] format);
- Find regular expression matches in string[]. Replace those matches
with a new string composed of format[] merged with the result of the
matches.
If global, replace all matches. Otherwise, replace first match.
Returns:
the new string
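The difference between a global and a non-global replace can be sketched as follows. It assumes the D1-era API documented on this page; the inputs are illustrative.

```d
import std.regexp;
import std.stdio;

void main()
{
    char[] s = "a  b   c";

    // Without "g", only the first run of whitespace is replaced.
    writefln("'%s'", RegExp(r"\s+").replace(s, " "));       // 'a b   c'

    // With "g", every run of whitespace is replaced.
    writefln("'%s'", RegExp(r"\s+", "g").replace(s, " "));  // 'a b c'

    // format[] may reference captures: swap each name=value pair.
    writefln("'%s'", RegExp(r"(\w+)=(\w+)", "g").replace("x=1 y=2", "$2=$1"));  // '1=x 2=y'
}
```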
- rchar[][] exec(rchar[] string);
- Search string[] for match.
Returns:
array of slices into string[] representing matches
- rchar[][] exec();
- Pick up where last exec(string) or exec() left off,
searching string[] for next match.
Returns:
array of slices into string[] representing matches
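The two exec() overloads are meant to be used together to walk through successive matches. The sketch below assumes the D1-era API documented on this page, and also assumes that the returned array holds the whole match at index 0 followed by the parenthesized subexpressions, and that it is empty when there is no further match.

```d
import std.regexp;
import std.stdio;

void main()
{
    auto r = new RegExp(r"(\d+)-(\d+)");

    // exec(string) starts a new search; exec() resumes after the previous match.
    auto m = r.exec("range 10-20 and 30-45");
    while (m.length)
    {
        // m[0] is the whole match, m[1] and m[2] are the captures.
        writefln("%s: from %s to %s", m[0], m[1], m[2]);
        m = r.exec();
    }
}
// Expected to print:
// 10-20: from 10 to 20
// 30-45: from 30 to 45
```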
- int test(rchar[] string);
- Search string[] for match.
Returns:
0 for no match, !=0 for match
Example:
    import std.stdio;
    import std.regexp;
    import std.string;

    int grep(int delegate(char[]) pred, char[][] list)
    {
        int count;
        foreach (s; list)
        {
            if (pred(s))
                ++count;
        }
        return count;
    }

    void main()
    {
        auto x = grep(&RegExp("[Ff]oo").test, std.string.split("mary had a foo lamb"));
        writefln(x);
    }
which prints: 1
- int test();
- Pick up where last test(string) or test() left off, and search again.
Returns:
0 for no match, !=0 for match
- int test(char[] string, ptrdiff_t startindex);
- Test string[] starting at startindex against regular expression.
Returns:
0 for no match, !=0 for match
- rchar[] replace(rchar[] format);
- After a match is found with test(), this function
will take the match results and, using the format
string, generate and return a new string.
- rchar[] replaceOld(rchar[] format);
- Like replace(char[] format), but uses old style formatting:
Format | Description |
---|---|
& | replace with the match |
\n | replace with the nth parenthesized match, n is 1..9 |
\c | replace with char c. |