
digitalmars.D.bugs - [Issue 1750] New: RegExp: lack of support for wchar, dchar; lack of lookingAt() method

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1750

           Summary: RegExp: lack of support for wchar, dchar; lack of
                    lookingAt() method
           Product: D
           Version: 2.008
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: aarti interia.pl


1. RegExp should work with at least wchar and dchar strings. Perhaps also with
integral array types (e.g. int[]).

2. There is no bool lookingAt() method, which tries to match the pattern at the
beginning of the string and returns immediately if it does not match. For
reference:
http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Matcher.html
Currently it is very inefficient to match a pattern against an incoming stream
of data. A solution with lookingAt() would be much faster.
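For readers unfamiliar with the Java API linked above, here is a minimal sketch of how java.util.regex.Matcher distinguishes matches() (the whole input must match), lookingAt() (the match must start at the beginning but need not consume everything) and find() (the match may start anywhere). The class name MatchModes, its helper methods and the sample input are hypothetical, chosen only for illustration; the sketch says nothing about how D's RegExp is implemented.

```java
import java.util.regex.Pattern;

// Hypothetical demo class contrasting the three Matcher entry points.
class MatchModes {
    // matches(): the pattern must cover the entire input.
    static boolean whole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // lookingAt(): the pattern must match starting at index 0,
    // but may leave the rest of the input unconsumed.
    static boolean prefix(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // find(): the pattern may match at any position in the input.
    static boolean anywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "GET /index.html HTTP/1.1";
        System.out.println(whole("[A-Z]+", input));   // false: " /index.html ..." is not [A-Z]+
        System.out.println(prefix("[A-Z]+", input));  // true: "GET" matches at the start
        System.out.println(anywhere("HTTP", input));  // true: found later in the line
        System.out.println(prefix("HTTP", input));    // false: "HTTP" is not at the start
    }
}
```

A failed lookingAt() can give up as soon as the pattern cannot match at index 0, whereas find() goes on to try later start positions; that difference is the behaviour the report asks for.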


-- 
Dec 26 2007
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1750


Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |andrei metalanguage.com
         AssignedTo|nobody puremagic.com        |andrei metalanguage.com


-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 11 2009
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1750




11:37:42 PDT ---
The new RegEx supports wchar and dchar. Regarding lookingAt(), I'm unclear: how
is it different from searching for a pattern starting with the anchor "^"?
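A minimal Java sketch of the comparison being asked about, assuming Java's default (non-MULTILINE) semantics, in which '^' matches only at the beginning of the input. The class and helper names are hypothetical. Note that the original pattern is wrapped in a non-capturing group before the anchor is added: '^a|b' anchors only the first alternative, while '^(?:a|b)' anchors both.

```java
import java.util.regex.Pattern;

// Hypothetical demo: lookingAt() versus find() with a leading '^'.
class AnchorVsLookingAt {
    // Match must begin at the start of the input.
    static boolean viaLookingAt(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // Without the MULTILINE flag, '^' only matches at the start of the input.
    // The non-capturing group makes '^' apply to every alternative.
    static boolean viaAnchor(String regex, String input) {
        return Pattern.compile("^(?:" + regex + ")").matcher(input).find();
    }

    public static void main(String[] args) {
        String regex = "\\d+|x";
        String[] inputs = {"123abc", "abc123", "", "x|y"};
        for (String in : inputs) {
            // Both columns should agree for every input.
            System.out.println(in + ": " + viaLookingAt(regex, in) + " " + viaAnchor(regex, in));
        }
    }
}
```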

-- 
Sep 26 2010
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1750




---
lookingAt() can be used on streams without needing to read the whole string
from the stream. Also, ^ cannot be used to match a specific pattern in a
stream: you simply cannot assume that your input starts after a line end. The
input may not even be split into lines.
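For comparison, in Java (whose Matcher the report cites), '^' is tied to line starts only when the MULTILINE flag is set; in the default mode it matches only at the beginning of the input, whether or not the input contains line breaks. A small sketch of both modes follows; the class and helper names are hypothetical, and it illustrates Java's semantics only, not those of D's std.regex.

```java
import java.util.regex.Pattern;

// Hypothetical demo of '^' in Java's default and MULTILINE modes.
class CaretModes {
    // Default mode: '^' matches only at the very beginning of the input.
    static boolean findDefault(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // MULTILINE mode: '^' also matches right after a line terminator.
    static boolean findMultiline(String regex, String input) {
        return Pattern.compile(regex, Pattern.MULTILINE).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "noise\nHEADER:1";
        System.out.println(findDefault("^HEADER", input));                    // false
        System.out.println(findMultiline("^HEADER", input));                  // true
        System.out.println(findDefault("^noise", "noise without a newline")); // true
    }
}
```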

-- 
Sep 27 2010
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1750


Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED



01:45:41 PDT ---
OK. I've meant to do this for ages.
The second point raised in this bug report is unsupported and, in fact,
invalid.
Looking through all of Java's regex documentation, I observe the following:
1. There is no such thing as a regex on a stream in Java; everything it works on
is one of three variants of character buffers, i.e. wrapped arrays and their ilk.
2. lookingAt is indeed equivalent to prepending '^' to a regex pattern, and as
far as performance goes, both versions should use the same optimization,
namely the "no search" optimization. And the current std.regex does at least
optimize for a '^' _somewhere_ at the start; e.g. silly things like "(^...)..."
still get optimized.
3. Due to implementation details of Java-style regex, there is no way it could
work directly on a stream and keep all its syntax features, even if one tried;
this problem is common to all backtracking engines. And yes, in some cases it
has to walk the entire input to make sure it matched what it should match.
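Points 1 and 2 can be illustrated in Java: Pattern.matcher takes a CharSequence, which String, StringBuilder and java.nio.CharBuffer all implement, so the input is always an in-memory character sequence rather than a stream. The sketch below, with a hypothetical class name, pattern and sample data, runs lookingAt() over each of those kinds of input.

```java
import java.nio.CharBuffer;
import java.util.regex.Pattern;

// Hypothetical demo: Matcher works on any CharSequence, not on a stream.
class CharSequenceInputs {
    // Hypothetical "key=number" pattern used only for illustration.
    static final Pattern KEY = Pattern.compile("[a-z]+=\\d+");

    // lookingAt(): does the buffer start with a key=number pair?
    static boolean startsWithKey(CharSequence input) {
        return KEY.matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(startsWithKey("id=42; rest"));                      // true (String)
        System.out.println(startsWithKey(new StringBuilder("id=42; rest")));  // true (StringBuilder)
        System.out.println(startsWithKey(CharBuffer.wrap("id=42; rest")));    // true (CharBuffer)
        System.out.println(startsWithKey(CharBuffer.wrap("; id=42")));        // false: not at start
    }
}
```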

Marking as fixed: the first point of the report was solved long ago, and the
second is invalid as stated. It does, however, raise a good point, one that was
already accounted for.

-- 
Mar 12 2012