digitalmars.D - Empty subexpressions captures in std.regex
- PC (37/37) Jul 11 2010 Hi, I've been lurking in this group for a few months, have read
- Andrei Alexandrescu (9/46) Jul 11 2010 Hi PC,
- PC (31/91) Jul 12 2010 Sorry about the lack of clarity in the last post. I actually
Hi, I've been lurking in this group for a few months, have read through TDPL (which is great, Andrei) and have started using D for some small programs. So far it's been a joy to use (you may have a C++ convert on your hands), and with the convenience of rdmd I've been using it where I'd normally use a scripting language.
It's been pretty good for this, especially as Phobos has covered almost everything I've wanted to do. I have run into some issues with std.regex matching empty subexpressions, though (dmd 2.047, win32):
import std.regex, std.stdio;
auto r1 = regex( "(a*)b" );
auto m = match( "b", r1 );
writefln( "captures = %s, empty = %s", m.captures.length, m.empty );
=> captures = 0, empty = true
If I disable the call to optimize, it gives the expected results:
=> captures = 2, empty = false
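For comparison, a minimal sketch of the same check written against the later std.regex API (matchFirst, which appeared in Phobos releases after dmd 2.047). This assumes a recent compiler and does not exercise the 2.047 code path discussed here; under that assumption the result should be the one expected above: two entries, the whole match plus group 1 captured as an empty string.

```d
import std.regex : matchFirst, regex;
import std.stdio : writefln;

void main()
{
    // (a*) may match zero characters, so "b" should still match,
    // with group 1 captured as an empty string.
    auto m = matchFirst("b", regex("(a*)b"));

    // m.length counts the whole match plus every group.
    writefln("captures = %s, empty = %s, group 1 = \"%s\"",
             m.length, m.empty, m[1]);
}
```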
Also, with optimize disabled:
auto r = regex("([^,]*),([^,]*),([^,]*)");
m = match( ",,", r );
writefln( "captures = %s, empty = %s", m.captures.length, m.empty );
=> captures = 3, empty = false
I'd expect 4 here (the whole match plus three empty groups).
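A sketch of the three-field case against the later matchFirst API (assuming a recent Phobos; threeFields is a hypothetical helper name, not part of std.regex). It collects groups 1 to 3, so an empty field comes back as an empty string instead of being dropped from the count.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: matches the three-field pattern from the post and
// returns groups 1..3. Empty fields come back as empty strings.
string[] threeFields(string line)
{
    auto m = matchFirst(line, regex("([^,]*),([^,]*),([^,]*)"));
    if (m.empty)
        return null; // no match at all

    string[] fields;
    // m[0] is the whole match; on success m.length is 4 (whole match + 3 groups).
    foreach (i; 1 .. m.length)
        fields ~= m[i];
    return fields;
}

void main()
{
    writeln(threeFields(",,"));
    writeln(threeFields("x,,z"));
}
```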
I noticed this in Captures:
@property size_t length()
{
    foreach (i; 0 .. matches.length)
    {
        if (matches[i].startIdx >= input.length) return i;
    }
    return matches.length;
}
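In this case matches[3].startIdx = 2 and matches[3].endIdx = 2, i.e. the third group is an empty match sitting at the very end of the input, so the loop stops before counting it. Should this line be:
if (matches[i].startIdx > input.length) return i;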
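To see why the comparison matters, here is a small model of the loop quoted above. Span, lengthOld, lengthFixed and the unmatched marker are hypothetical names for this sketch, and treating a non-participating group as an out-of-range index is an assumption, not something taken from regex.d. With the spans for ",," the posted loop returns 3, while the proposed > comparison returns 4 and still stops at a group that did not participate.

```d
import std.stdio : writeln;

// Hypothetical stand-in for one entry of Captures.matches; the field
// names mirror the snippet above, but this is not Phobos code.
struct Span
{
    size_t startIdx;
    size_t endIdx;
}

// Assumption for this sketch: a group that did not participate in the
// match is marked with an out-of-range index.
enum size_t unmatched = size_t.max;

// The loop as posted: stops at the first entry starting at or past the end.
size_t lengthOld(const Span[] matches, size_t inputLength)
{
    foreach (i; 0 .. matches.length)
        if (matches[i].startIdx >= inputLength) return i;
    return matches.length;
}

// The proposed fix: an empty group sitting exactly at the end still counts.
size_t lengthFixed(const Span[] matches, size_t inputLength)
{
    foreach (i; 0 .. matches.length)
        if (matches[i].startIdx > inputLength) return i;
    return matches.length;
}

void main()
{
    // ",," against ([^,]*),([^,]*),([^,]*): whole match plus three empty groups.
    auto spans = [Span(0, 2), Span(0, 0), Span(1, 1), Span(2, 2)];
    writeln(lengthOld(spans, 2), " vs ", lengthFixed(spans, 2)); // 3 vs 4

    // A trailing group that did not participate is still cut off.
    auto partial = [Span(0, 1), Span(unmatched, unmatched)];
    writeln(lengthFixed(partial, 1)); // 1
}
```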
Anyway, kudos to everyone involved with D; I'm certainly going to be using it a lot in the future.
Jul 11 2010
Hi PC,
Thanks for your kind words.
Regarding regex, we need to get a report into bugzilla so we keep track of the problem. When you say "disable the call to optimize", are you referring to the -O compiler flag? If so, it's a compiler problem (otherwise it might be a library issue). Could you please clarify?
Thanks,
Andrei
On 07/11/2010 06:29 AM, PC wrote:
Jul 11 2010
Sorry about the lack of clarity in my last post. I actually commented out the call to Regex.optimize in Regex.compile:
auto r1 = regex( "(a*)b" );
r1.printProgram();
With optimize in place, this prints:
printProgram()
0: REtestbit 98, 13
18: REparen len=15 n=0, pc=>42
27: REnm len=2, n=0, m=4294967295, pc=>42
40: REchar 'a'
42: REchar 'b'
44: REend
With the optimize(buf); call commented out, I get:
printProgram()
0: REparen len=15 n=0, pc=>24
9: REnm len=2, n=0, m=4294967295, pc=>24
22: REchar 'a'
24: REchar 'b'
26: REend
I don't understand why the optimize routine inserts REtestbit at the start of the program, but with it in place the program will not match an input that doesn't start with "a" (e.g. "b").
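The expectation can be stated with a toy model of a first-character prefilter. This is the general idea a test-bit check relies on, not a reconstruction of what optimize in regex.d actually does; Item and mayMatchAtStart are hypothetical names, and each element is simplified to a single character that is either required or may match the empty string. Because (a*) can match nothing, the set of possible first characters for (a*)b has to contain both 'a' and 'b', so "b" must pass the filter.

```d
import std.stdio : writeln;

// Hypothetical, simplified element of a concatenation: one literal
// character that is either required or may match the empty string.
struct Item
{
    char ch;
    bool canBeEmpty; // true for something like `a*` or `(a*)`
}

// Returns true if a match of the concatenation `items` could begin at
// input[0]. First characters are collected until an element that must
// consume a character is reached.
bool mayMatchAtStart(const Item[] items, string input)
{
    bool[char] first; // characters that can begin a match
    foreach (item; items)
    {
        first[item.ch] = true;
        if (!item.canBeEmpty)
        {
            // This element must consume a character, so the set is complete.
            return input.length > 0 && (input[0] in first) !is null;
        }
    }
    // Every element can match "", so an empty match is always possible.
    return true;
}

void main()
{
    // (a*)b modelled as an optional 'a' followed by a required 'b'.
    auto pattern = [Item('a', true), Item('b', false)];
    writeln(mayMatchAtStart(pattern, "b")); // true
    writeln(mayMatchAtStart(pattern, "c")); // false
}
```

Continuing past elements that can match the empty string is the essential step; stopping after the first element regardless would give the set {'a'} and reject "b".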
I think I need to spend some more time looking through the regex.d source to understand it better.
- Pete
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
Jul 12 2010
PC <petevik38 yahoo.com.au>