digitalmars.D - Empty subexpressions captures in std.regex
- PC (37/37) Jul 11 2010 Hi, I've been lurking in this group for a few months, have read
- Andrei Alexandrescu (9/46) Jul 11 2010 Hi PC,
- PC (31/91) Jul 12 2010 Sorry about the lack of clarity in the last post. I actually
Hi,

I've been lurking in this group for a few months, have read through TDPL (which is great, Andrei) and have started using D for some small programs. So far it's been a joy to use (you may have a C++ convert on your hands), and with the convenience of rdmd I've been using it where I'd normally use a scripting language. It's been pretty good for this, especially as Phobos has had almost everything I've wanted to do covered.

I have run into some issues with std.regex matching empty subexpressions though (dmd 2.047, win32):

    auto r1 = regex( "(a*)b" );
    auto m = match( "b", r1 );
    writefln( "captures = %s, empty = %s", m.captures.length, m.empty );
    => captures = 0, empty = true

If I disable the call to optimize, it gives the expected results:

    => captures = 2, empty = false

Also, with optimize disabled:

    auto r = regex("([^,]*),([^,]*),([^,]*)");
    m = match( ",,", r );
    writefln( "captures = %s, empty = %s", m.captures.length, m.empty );
    => captures = 3, empty = false

I noticed in Captures:

    @property size_t length()
    {
        foreach (i; 0 .. matches.length)
        {
            if (matches[i].startIdx >= input.length) return i;
        }
        return matches.length;
    }

In this case matches[3].startIdx = 2 and matches[3].endIdx = 2. Should this line be:

    if (matches[i].startIdx > input.length) return i;

Anyway, kudos to everyone involved with D. I'm certainly going to be using it a lot in the future.
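The off-by-one question about Captures.length can be checked in isolation. The following is only a sketch, not the Phobos source: Span and captureCount are hypothetical names, and the slot positions for the first three entries are assumed (the post only states that matches[3] starts and ends at index 2). It compares the quoted `>=` test with the proposed `>` test for the input ",,".

```d
import std.stdio;

// Hypothetical stand-in for one capture slot: [startIdx, endIdx) into the input.
struct Span { size_t startIdx, endIdx; }

// Counts captures the way the quoted length() does (useStrict == false, `>=`)
// or with the change proposed in the post (useStrict == true, `>`).
size_t captureCount(const Span[] matches, size_t inputLength, bool useStrict)
{
    foreach (i; 0 .. matches.length)
    {
        immutable past = useStrict
            ? matches[i].startIdx > inputLength
            : matches[i].startIdx >= inputLength;
        if (past) return i;
    }
    return matches.length;
}

void main()
{
    // ",," against ([^,]*),([^,]*),([^,]*): the whole match plus three empty
    // groups. Slots 0..2 are assumed positions; slot 3 is (2, 2) as reported.
    Span[] matches = [Span(0, 2), Span(0, 0), Span(1, 1), Span(2, 2)];
    writeln(captureCount(matches, 2, false)); // 3: the trailing empty group is dropped
    writeln(captureCount(matches, 2, true));  // 4: the trailing empty group is kept
}
```

An empty group at the very end of the input has startIdx equal to input.length, which is why the `>=` comparison cuts it off in this model.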
Jul 11 2010
Hi PC,

Thanks for your kind words. Regarding regex, we need to get a report into bugzilla so we keep track of the problem. When you say "disable the call to optimize" are you referring to the -O compiler flag? In that case it's a compiler problem (otherwise it might be a library issue). Could you please clarify?

Thanks,
Andrei
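For a bugzilla report, a self-contained reproduction is usually the most useful attachment. The sketch below combines the two cases from the original post into one program, using the same match and captures calls. The asserted values are the results the thread treats as correct (2 for the first case, and 4 for the second if the trailing empty group is counted); they are expectations, not output captured from dmd 2.047.

```d
import std.regex;
import std.stdio;

void main()
{
    // Case 1: the group (a*) matches the empty string before "b".
    auto m1 = match("b", regex("(a*)b"));
    writefln("captures = %s, empty = %s", m1.captures.length, m1.empty);
    assert(!m1.empty);
    assert(m1.captures.length == 2); // whole match + one (empty) group

    // Case 2: three empty groups separated by commas.
    auto m2 = match(",,", regex("([^,]*),([^,]*),([^,]*)"));
    writefln("captures = %s, empty = %s", m2.captures.length, m2.empty);
    assert(!m2.empty);
    assert(m2.captures.length == 4); // whole match + three (empty) groups
}
```

Compiling this with and without -O would also answer the question of whether the compiler flag plays any role.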
Jul 11 2010
Sorry about the lack of clarity in the last post. I actually commented out the call to Regex.optimize in Regex.compile.

    auto r1 = regex( "(a*)b" );
    r1.printProgram();

Prints out:

    printProgram()
    0: REtestbit 98, 13
    18: REparen len=15 n=0, pc=>42
    27: REnm len=2, n=0, m=4294967295, pc=>42
    40: REchar 'a'
    42: REchar 'b'
    44: REend

With optimize(buf); commented out I get:

    printProgram()
    0: REparen len=15 n=0, pc=>24
    9: REnm len=2, n=0, m=4294967295, pc=>24
    22: REchar 'a'
    24: REchar 'b'
    26: REend

I don't understand why REtestbit is inserted at the start of the program by the optimize routine, but it will not match if there is no "a" at the start of the input (e.g. "b"). I think I need to spend some more time looking through the regex.d source to understand it better.

- Pete
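One possible reading of the dump, offered as an assumption rather than a confirmed diagnosis: if REtestbit works as a first-character filter, then for (a*)b the allowed set has to contain both 'a' and 'b', because a* may match nothing. The sketch below uses a hypothetical mini-representation (Item, firstChars), not the std.regex bytecode, to show how such a set would be computed when a leading element can be empty.

```d
import std.stdio;

// Hypothetical mini-AST, not the std.regex program format: a pattern is a
// sequence of literal characters, each either required or starred.
struct Item { char ch; bool starred; }

// Collects every character that can begin a match of the sequence.
// A starred item may match nothing, so the scan continues past it.
bool[char] firstChars(const Item[] seq)
{
    bool[char] set;
    foreach (item; seq)
    {
        set[item.ch] = true;
        if (!item.starred) break; // a required item always consumes the first char
    }
    return set;
}

void main()
{
    // (a*)b modelled as: 'a' starred, then 'b' required.
    auto set = firstChars([Item('a', true), Item('b', false)]);
    writeln('a' in set ? "a allowed" : "a rejected");
    writeln('b' in set ? "b allowed" : "b rejected");
}
```

A filter built only from the first element would contain 'a' alone and would reject the input "b", which is consistent with the behaviour described above. If every item in the sequence is starred, the pattern can match the empty string and no first-character filter is safe at all.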
Jul 12 2010