digitalmars.D - Limitation with current regex API
- Jerry (19/19) Jan 16 2012 Hi all,
- Vladimir Panteleev (3/6) Jan 16 2012 Not sure if this is what you were referring to, but you can do...
- Vladimir Panteleev (4/10) Jan 16 2012 Even simpler: m.captures[1].ptr - s.ptr
- Jerry (2/10) Jan 16 2012 Ah ok, that'll work.
- Mail Mantis (2/4) Jan 16 2012 No, it wouldn't. Somehow, I forgot the rules for pointer arithmetic. So...
- Nick Sabalausky (3/11) Jan 16 2012 That wouldn't work in @safe mode, would it?
- Timon Gehr (3/16) Jan 16 2012 There is nothing unsafe about the operation, so I'd actually expect it
- Nick Sabalausky (3/23) Jan 16 2012 I thought pointer arithmetic was forbidden in @safe?
- Timon Gehr (5/29) Jan 16 2012 I don't know exactly, since @safe is neither fully specified nor
- Jonathan M Davis (5/9) Jan 17 2012 Pointer arithmetic is definitely forbidden in @safe, but I'm not sure th...
- Don Clugston (7/16) Jan 17 2012 My guess is that @safe D is supposed to enforce C pointer semantics.
- Andrei Alexandrescu (5/25) Jan 17 2012 Yah, that C rule is to allow segmented memory architectures to work
- Mail Mantis (3/14) Jan 16 2012 Correct me if I'm wrong, but wouldn't this be better:
- Jerry (6/23) Jan 16 2012 I *think* pointer arithmetic handles that. However, this is much uglier
Hi all,

In general, I'm enjoying the regex respin. However, I ran into one issue that seems to have no clean workaround.

Generally, I want to be able to get the start and end indices of matches. With the complete match, this info can be pieced together with match.pre().length and match.hit.length(). However, I can't do this with captures.

For example: I have a string and the regex ".*(a).*(b).*(c).*". I want to find where a, b, and c are located when I match. As far as I can tell, the only way to do this would be to capture every chunk of text, then iterate to determine the offsets. That seems wasteful.

If you look at the ICU and Java regex APIs, you'll see that this information is retrievable. I believe it's available under the covers of the D regex library API too. Can this please be exposed? It's very helpful for doing text processing where you need to be able to align the results of multiple transformations to the input text.

Thanks
Jerry
Jan 16 2012
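For the complete match, the offsets Jerry mentions can be read off the match object without any pointer arithmetic. The sketch below assumes the later `std.regex.matchFirst` function, which returns a `Captures` object directly (roughly what the 2012 API exposed through `match(...).captures`); the `Span` struct and the `matchSpan` helper are hypothetical names used only for illustration.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Hypothetical result type for illustration; not part of std.regex.
struct Span { size_t begin, end; }

// [begin, end) of the whole match, pieced together from m.pre and m.hit.
// Assumes the pattern matches somewhere in `s`.
Span matchSpan(string s, string pattern)
{
    auto m = matchFirst(s, regex(pattern));
    assert(!m.empty, "pattern did not match");
    immutable begin = m.pre.length;           // length of the text before the match
    return Span(begin, begin + m.hit.length); // the match itself follows
}

void main()
{
    auto sp = matchSpan("xxaxxbxxcxx", "a.*b");
    writeln(sp.begin, " ", sp.end); // 2 6
}
```

Capture groups have no `pre` counterpart, which is the gap the post describes.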
On Monday, 16 January 2012 at 19:28:42 UTC, Jerry wrote:
> As far as I can tell, the only way to do this would be to capture every chunk of text, then iterate to determine the offsets.

Not sure if this is what you were referring to, but you can do...

m.pre.length + m.captures[1].ptr - m.hit.ptr
Jan 16 2012
On Tuesday, 17 January 2012 at 01:44:37 UTC, Vladimir Panteleev wrote:
> Not sure if this is what you were referring to, but you can do...

Even simpler:

m.captures[1].ptr - s.ptr

(s is the string being matched)
Jan 16 2012
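Vladimir's two formulas should give the same index for any group that took part in the match. The sketch below checks both against each other. It again assumes `matchFirst`, so `m[n]` plays the role of the thread's `m.captures[n]`; `captureStart` and `captureStartViaHit` are hypothetical helpers. The pattern deliberately does not start with `.*`, so the whole match begins at index 2 and `m.pre` is non-empty.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: start index of group `n` via `m[n].ptr - s.ptr`.
// Assumes group `n` took part in the match, so its slice points into `s`.
ptrdiff_t captureStart(M)(string s, M m, size_t n)
{
    return m[n].ptr - s.ptr;
}

// Hypothetical helper: the same index via the longer formula, i.e. the
// offset of the whole match plus the group's offset inside the match.
ptrdiff_t captureStartViaHit(M)(M m, size_t n)
{
    return cast(ptrdiff_t)(m.pre.length + (m[n].ptr - m.hit.ptr));
}

void main()
{
    string s = "xxaxxbxxcxx";
    auto m = matchFirst(s, regex("a.*(b).*(c)"));
    foreach (n; 1 .. m.length)
        writeln(n, ": ", captureStart(s, m, n), " ", captureStartViaHit(m, n));
    // 1: 5 5
    // 2: 8 8
}
```

Parenthesising `m[n].ptr - m.hit.ptr` keeps the longer form a pointer difference plus a length, rather than first computing `length + pointer`.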
"Vladimir Panteleev" <vladimir thecybershadow.net> writes:
> Even simpler: m.captures[1].ptr - s.ptr (s is the string being matched)

Ah ok, that'll work.
Jan 16 2012
2012/1/17 Mail Mantis <mail.mantis.88 gmail.com>:
> Correct me if I'm wrong, but wouldn't this be better:
> (m.captures[1].ptr - s.ptr) / s[0].sizeof;

No, it wouldn't. Somehow, I forgot the rules for pointer arithmetic. Sorry.
Jan 16 2012
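The retraction is right: in D, as in C, subtracting two pointers of type `T*` already yields a count of `T` elements, not a count of bytes. For `string` (`char`, one byte wide) the extra division by `s[0].sizeof` happens to be harmless, but for wider element types it gives the wrong answer. A small demonstration with a UTF-32 `dstring`:

```d
import std.stdio : writeln;

void main()
{
    // UTF-32 string: each element (dchar) is 4 bytes wide.
    dstring d = "abcdef"d;
    dstring tail = d[3 .. $];

    // Pointer - pointer already counts elements of the pointed-to type.
    ptrdiff_t elems = tail.ptr - d.ptr;
    // The raw address difference, in bytes, is elems * dchar.sizeof.
    size_t bytes = cast(size_t) tail.ptr - cast(size_t) d.ptr;

    writeln(elems, " elements, ", bytes, " bytes"); // 3 elements, 12 bytes
    // Dividing the element count by the element size again is wrong:
    writeln(elems / dchar.sizeof);                  // 0
}
```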
"Vladimir Panteleev" <vladimir thecybershadow.net> wrote in message news:klzeekkilpzwmjmkudhh dfeed.kimsufi.thecybershadow.net...
> Even simpler: m.captures[1].ptr - s.ptr (s is the string being matched)

That wouldn't work in @safe mode, would it?
Jan 16 2012
On 01/17/2012 04:03 AM, Nick Sabalausky wrote:
> That wouldn't work in @safe mode, would it?

There is nothing unsafe about the operation, so I'd actually expect it to work.
Jan 16 2012
"Timon Gehr" <timon.gehr gmx.ch> wrote in message news:jf2p5d$2ria$1 digitalmars.com...
> There is nothing unsafe about the operation, so I'd actually expect it to work.

I thought pointer arithmetic was forbidden in @safe?
Jan 16 2012
On 01/17/2012 05:00 AM, Nick Sabalausky wrote:
> I thought pointer arithmetic was forbidden in @safe?

I don't know exactly, since @safe is neither fully specified nor implemented. In my understanding, in @safe code, operations that may lead to memory corruption are forbidden. Pointer - pointer cannot; other kinds of pointer arithmetic may.
Jan 16 2012
On Tuesday, January 17, 2012 05:04:39 Timon Gehr wrote:
> I don't know exactly, since @safe is neither fully specified nor implemented. In my understanding, in @safe code, operations that may lead to memory corruption are forbidden. Pointer - pointer cannot, other kinds of pointer arithmetic may.

Pointer arithmetic is definitely forbidden in @safe, but I'm not sure that that forbids pointer - pointer, since it's not dangerous. It's changing a pointer via arithmetic which is dangerous.

- Jonathan M Davis
Jan 17 2012
On 17/01/12 10:40, Jonathan M Davis wrote:
> Pointer arithmetic is definitely forbidden in @safe, but I'm not sure that that forbids pointer - pointer, since it's not dangerous. It's changing a pointer via arithmetic which is dangerous.

My guess is that @safe D is supposed to enforce C pointer semantics. At least, code which is both @safe and pure must do so. The semantics are currently enforced in CTFE. Pointer - pointer is undefined behaviour in C if the pointers come from different arrays. It's OK if they are from the same array, which is true in this case.
Jan 17 2012
On 1/17/12 6:59 AM, Don Clugston wrote:
> My guess is that @safe D is supposed to enforce C pointer semantics. At least, code which is both @safe and pure must do so. The semantics are currently enforced in CTFE. Pointer - pointer is undefined behaviour in C if the pointers come from different arrays. It's OK if they are from the same array, which is true in this case.

Yah, that C rule is to allow segmented memory architectures to work properly. One possibility for D is to require a flat memory model, in which the difference between any two pointers can be taken.

Andrei
Jan 17 2012
2012/1/17 Vladimir Panteleev <vladimir thecybershadow.net>:
> Even simpler: m.captures[1].ptr - s.ptr (s is the string being matched)

Correct me if I'm wrong, but wouldn't this be better:

(m.captures[1].ptr - s.ptr) / s[0].sizeof;
Jan 16 2012
Mail Mantis <mail.mantis.88 gmail.com> writes:
> Correct me if I'm wrong, but wouldn't this be better: (m.captures[1].ptr - s.ptr) / s[0].sizeof;

I *think* pointer arithmetic handles that. However, this is much uglier than:

m.captures[1].begin
m.captures[1].end

Jerry
Jan 16 2012
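Until something like `begin`/`end` exists on captures, a hypothetical wrapper can compute both for every group at once from the slice pointers. This is only a sketch. It assumes `matchFirst` and assumes every group took part in the match (a group that did not participate need not point into `s`). `Span` and `captureSpans` are illustrative names, not `std.regex` API. It uses the regex from the original post.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Hypothetical begin/end pair (not part of std.regex).
struct Span { size_t begin, end; }

// Hypothetical helper: [begin, end) of every group in `m`, where index 0 is
// the whole match. Assumes every group took part in the match.
Span[] captureSpans(M)(string s, M m)
{
    Span[] spans;
    foreach (n; 0 .. m.length)
    {
        auto cap = m[n];
        // Pointer difference within the same array: an index into `s`.
        immutable begin = cast(size_t)(cap.ptr - s.ptr);
        spans ~= Span(begin, begin + cap.length);
    }
    return spans;
}

void main()
{
    string s = "xxaxxbxxcxx";
    auto m = matchFirst(s, regex(".*(a).*(b).*(c).*"));
    foreach (n, sp; captureSpans(s, m))
        writeln(n, ": [", sp.begin, ", ", sp.end, ")");
    // 0: [0, 11)
    // 1: [2, 3)
    // 2: [5, 6)
    // 3: [8, 9)
}
```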