digitalmars.D - earthquake changes of std.regexp to come
- Andrei Alexandrescu (14/14) Feb 17 2009 I'm quite unhappy with the API of std.regexp. It's a chaotic design that...
- bearophile (10/13) Feb 17 2009 I have no problems in accepting changes here too. D2 is already essentia...
- Joel C. Salomon (14/15) Feb 17 2009 So steal one, rather than invent something new. My suggestion would be
- Andrei Alexandrescu (4/19) Feb 17 2009 s/string/input range/
- dsimcha (14/28) Feb 17 2009 As I've said before, anyone who can't stomach breaking changes w/o compl...
- dsimcha (3/31) Feb 17 2009 BTW, can you elaborate on how arrays, both builtin and any library versi...
- Andrei Alexandrescu (34/66) Feb 17 2009 Well finalizations hinges not only on me but on Walter (bugfixes and a
- Yigal Chripun (16/101) Feb 18 2009 I've got a few questions about the proposed container value semantics:
- Yigal Chripun (4/89) Feb 18 2009 Another question regarding the container design - have you considered
- Georg Wrede (11/20) Feb 20 2009 I admit I'm tired right now... You mention disadvantages, the one I
- Andrei Alexandrescu (13/37) Feb 20 2009 Better said, I was too tired when I posted that. I gave too little
- Bill Baxter (9/13) Feb 17 2009 So what do you think it should be, a struct?
- Andrei Alexandrescu (26/40) Feb 17 2009 Well you'd be surprised. The RegEx class saves the state of the last
- bearophile (7/10) Feb 17 2009 (I often use xplit() that is like split but yields items lazily, for lar...
- Bill Baxter (12/53) Feb 17 2009 So that sounds to me like RegEx should have a .dup, and then it would
- Andrei Alexandrescu (7/13) Feb 17 2009 I lost that perspective when criticizing RegExp, you're right. But still...
- Bill Baxter (10/22) Feb 17 2009 Ok. I'm certainly not in love with the API either. Though, the only
- bearophile (5/6) Feb 17 2009 I agree, I too need the Python docs every time I want to use something m...
- Jarrett Billingsley (5/12) Feb 17 2009 Is there ever a situation where you want to use a single regexp for
- Andrei Alexandrescu (17/31) Feb 17 2009 Ehm, that's odd. You'd think that after Perl has set the precedent, it
- Bill Baxter (9/41) Feb 17 2009 All I know is that I found one incantation that works and I've been
- Walter Bright (5/9) Feb 20 2009 std.regexp evolved out of the ECMAscript regex functions - they have the...
- Andrei Alexandrescu (3/13) Feb 20 2009 s/good \(bad\?\)/REALLY BAD/
- Denis Koroskin (4/16) Feb 20 2009 Backward compatibility is almost always a bad thing.
- Andrei Alexandrescu (4/25) Feb 20 2009 In this case it's even worse, as I don't think anyone expects to paste
- Jarrett Billingsley (20/33) Feb 17 2009 Well I don't mean to, uh, toot my own horn but.. I recently bound
- Bill Baxter (9/22) Feb 17 2009 Btw, I've got no problems with you breaking the API of 2.0 either.
- Andrei Alexandrescu (3/29) Feb 17 2009 I was thinking of moving older stuff to etc, is that ok?
- Walter Bright (3/4) Feb 17 2009 Yes. But you should also rename the new one, perhaps to std.regex. That
- Andrei Alexandrescu (5/10) Feb 17 2009 Terrific. I prefer "regex" to "regexp" because it's easier to pronounce,...
- bearophile (4/7) Feb 17 2009 I'd like std.re :-)
- Chris Nicholson-Sauls (7/20) Feb 17 2009 It sounds to me like a frog who, immediately post-utterance, just got
- Leandro Lucarella (11/20) Feb 19 2009 What's the rationale for "etc"? Why not "deprecated", o something shorte...
- Andrei Alexandrescu (3/17) Feb 19 2009 In the words of George Costanza: "Because it's there!"
- Ellery Newcomer (2/21) Feb 19 2009 Shouldn't that be George Mallory?
- Andrei Alexandrescu (14/37) Feb 19 2009 No, he said "because it is there". George said "because it's there":
- Georg Wrede (4/12) Feb 20 2009 With the critique you've given to the existing regexp stuff, deprecated
- Bill Baxter (10/23) Feb 20 2009 Agreed.
- Leandro Lucarella (9/21) Feb 20 2009 Why not "misc" for that? =)
- BCS (4/24) Feb 17 2009 For what it's worth, I have a partial clone of the .NET API built on top...
- Daniel de Kok (8/11) Feb 17 2009 Actually, I was wondering why nobody is considering real regular
- Andrei Alexandrescu (6/16) Feb 17 2009 I am considering that. One nice feature of "classic" regexes is that
- Jarrett Billingsley (12/21) Feb 17 2009 Tango's regex engine is just that. It uses a tagged NFA method.
- bearophile (4/5) Feb 17 2009 A modern CPU is able to do something like 60*2*2E9 operations in that ti...
- BCS (14/31) Feb 17 2009 could this be transitioned to CTFE? you could even have a debug mode tha...
- Jarrett Billingsley (6/19) Feb 17 2009 For what it's worth the Tango regexes actually have a method to output
- BCS (4/28) Feb 17 2009 For any kind of debug, yeah, that's a problem. OTOH for release, as long...
- Chris Nicholson-Sauls (5/28) Feb 17 2009 I feature which I *adore* by the way. So long as the precompiled regex
- Daniel de Kok (3/6) Feb 17 2009 I have only been tinkering with Phobos, but that's good to hear, thanks!
- Daniel de Kok (13/20) Feb 17 2009 Hmmm, define "complex", I suppose it's ok for the general
- Jarrett Billingsley (5/6) Feb 17 2009 \w+([\-+.]\w+)*@\w+([\-.]\w+)*\.\w+([\-.]\w+)*
- BCS (4/16) Feb 17 2009 I wonder how well it would work on this:
- Daniel de Kok (17/23) Feb 17 2009 Hmm, odd. I have translated that regexp to the syntax of the tool that
- Andrei Alexandrescu (4/30) Feb 17 2009 That would be cool; I find the engine in std.regexp rather hard to
- Derek Parnell (8/13) Feb 17 2009 If your changes are going to make things better for coding and maintenan...
I'm quite unhappy with the API of std.regexp. It's a chaotic design that provides a hodgepodge of functionality and tries to use as many synonyms of "to find" as the dictionary offers (e.g. search, match). I could swear Walter never really cared for using regexps, and that is felt throughout the design: it fills the bullet point but it's asinine to use.

Besides, std.regexp only works with (narrow) strings and we want it to work on streams of all widths and structures. One pet complaint I have is that std.regexp puts a class around it all as if everybody's favorite pastime would be to inherit Regexp and override some random function in it.

In the upcoming releases of D 2.0 there will be rather dramatic breaking changes of Phobos. I just wanted to ask whether y'all could stomach yet another rewritten API or you'd rather use std.regexp as it is for the time being.

Andrei
Feb 17 2009
Don't be too hard on the good Walter, please :-) One good thing in his designs (in D1) is that they are often simple to use: they give you back much more than you give them. D2 seems to ask much more from the programmer. I agree that the API of regexes in Phobos is not much good, but I think designing a good API for it is quite hard.

> I just wanted to ask whether y'all could stomach yet another rewritten API or you'd rather use std.regexp as it is for the time being.

I have no problems accepting changes here too. D2 is already essentially another language compared to D1.

Regarding the regexes of D1 Phobos, they have problems bigger than just the API: in the past I have found some common cases where matching is O(n^2) or worse. You can see a case of such behaviour here (look at my comments that show which parts are slow; I have also commented out versions that are more logical but much slower):
http://shootout.alioth.debian.org/debian/benchmark.php?test=regexdna&lang=gdc&id=4

If you want to test that code you can generate test data with this other code:
http://shootout.alioth.debian.org/debian/benchmark.php?test=fasta&lang=dlang&id=1

Bye,
bearophile
Feb 17 2009
bearophile wrote:
> I agree that the API of regexes in Phobos is not much good, but I think designing a good API for it is quite hard.

So steal one, rather than invent something new. My suggestion would be to expose the DFA object, as in Plan 9’s library (documentation at <http://plan9.bell-labs.com/magic/man2html/2/regexp>, implementation at <http://plan9.bell-labs.com/sources/plan9/sys/src/libregexp/>, discussion and links to a Unix implementation at <http://swtch.com/~rsc/regexp/>).

Simple API:
• regcomp: Compile a regexp DFA;
• regexec: Apply it to a string, returning a slice of the string that matches the first hit (or an array of slices if parenthesized expressions are used); and
• regsub: Apply substitutions to subexpressions of the matching slice.

--Joel Salomon
Feb 17 2009
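The compile / execute / substitute split suggested above can be sketched in D. This is only an illustration: it uses the std.regex module of present-day Phobos (regex, matchFirst, replaceFirst), which is not the Plan 9 library and not the std.regexp API under discussion in this thread. The sample pattern and input are made up.

```d
import std.regex : regex, matchFirst, replaceFirst;
import std.stdio : writeln;

void main()
{
    // "regcomp" step: compile the pattern once and reuse it.
    auto re = regex(`(\w+)@(\w+)\.com`);

    // "regexec" step: find the first hit; the whole match and each
    // parenthesized group come back as slices of the input.
    string input = "mail joel@example.com today";
    auto m = matchFirst(input, re);
    assert(!m.empty);
    assert(m.hit == "joel@example.com");
    assert(m[1] == "joel");
    assert(m[2] == "example");

    // "regsub" step: rewrite the match using its sub-expressions.
    auto swapped = replaceFirst(input, re, "$2!$1");
    assert(swapped == "mail example!joel today");

    writeln(m.hit, " -> ", swapped);
}
```

The point of the three-step shape is that the compiled object is separate from any particular input, so one compiled pattern can serve many searches.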
Joel C. Salomon wrote:
> bearophile wrote:
>> I agree that the API of regexes in Phobos is not much good, but I think designing a good API for it is quite hard.
> So steal one, rather than invent something new. My suggestion would be to expose the DFA object, as in Plan 9’s library (documentation at <http://plan9.bell-labs.com/magic/man2html/2/regexp>, implementation at <http://plan9.bell-labs.com/sources/plan9/sys/src/libregexp/>, discussion and links to a Unix implementation at <http://swtch.com/~rsc/regexp/>).
> Simple API:
> • regcomp: Compile a regexp DFA;
> • regexec: Apply it to a string, returning a slice of the string that matches the first hit (or an array of slices if parenthesized expressions are used); and

s/string/input range/

Also returning a range instead of an array of slices is more flexible.

Andrei
Feb 17 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
> In the upcoming releases of D 2.0 there will be rather dramatic breaking changes of phobos. I just wanted to ask whether y'all could stomach yet another rewritten API or you'd rather use std.regexp as it is for the time being.
> Andrei

As I've said before, anyone who can't stomach breaking changes w/o complaining has no business using D2 at this point. I'd rather deal with the aggravation of stuff breaking in the short run to have a nice language and libraries to go with it in the long run.

This whole concept of ranges as you've created them seems to have achieved the holy grail of both making simple things simple and complex things possible, where "complex things" includes needing code to be efficient, so I can see your reason for wanting to redo all kinds of stuff in them. This compares favorably to C++ STL iterators, which are very flexible and efficient but a huge PITA to use for simple things because the syntax is so low-level and ugly, and to the D1/early D2 way, which gives beautiful, simple notation for the more common cases (basic dynamic arrays), at the expense of flexibility when doing more complicated things like streams, chaining, strides, etc.
Feb 17 2009
== Quote from dsimcha (dsimcha yahoo.com)'s article
> This whole concept of ranges as you've created them seems to have achieved the holy grail of both making simple things simple and complex things possible, where "complex things" includes needing code to be efficient, so I can see your reason for wanting to redo all kinds of stuff in them.

BTW, can you elaborate on how arrays, both builtin and any library versions, will work when everything is finalized?
Feb 17 2009
dsimcha wrote:
> BTW, can you elaborate on how arrays, both builtin and any library versions, will work when everything is finalized?

Well, finalization hinges not only on me but on Walter (bugfixes and a couple of new features) and on all of you with the continuous stream of great suggestions and ideas. Again, without being able to experiment much I don't have a clear idea of what arrays/containers should ideally look like. The interesting challenge is accommodating good, precise semantics with the freedom given by garbage collection. Here are some highlights:

* Today's T[] will be firmly an incarnation of the random-access range concept, to the extent that all code expecting a random-access range can always be passed a T[] without any impedance adaptation.
* $ will be generalized to mean "end of range" even for infinite ranges.
* We don't have a solution to address the perils of extending a slice by using ~=. We're considering adding the type T[new], but I'm not sure we should take the hit of a new built-in type constructor, particularly when it's implementable as a library.
* Fixed-size arrays will in all likelihood be value types. We couldn't find any other semantics that works.
* Containers will have value semantics.
* "Resources come and go; memory is forever" is the likely default in D resource management. This means that destroying e.g. an array of File objects will close the underlying files, but will not deallocate the memory allocated for them. In essence, destroying values means calling the destructor but not delete-ing them (unless of course they're on the stack). This approach has a number of disadvantages, but plenty of advantages that compensate for them in most applications.
* std.matrix will define memory layouts for a variety of popular libraries and also the common means to iterate said layouts.
* For those who want containers with reference semantics, they can use the type Class!(T) for any value type T. That includes built-in value types (int, float...) and whichever value containers we define. It's unclear to me whether this is enough to satisfy those in need of complex container hierarchies.

Andrei
Feb 17 2009
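Two of the points in the list above can be poked at with a small sketch. The first check uses isRandomAccessRange from present-day Phobos to confirm that a plain T[] models a random-access range. The Class template below is purely hypothetical: the post names Class!(T) but gives no definition, so this wrapper (including the field name payload) is an assumption about what such a type might look like, not the proposed design.

```d
import std.range.primitives : isRandomAccessRange;

// A built-in dynamic array of int satisfies the random-access range concept.
static assert(isRandomAccessRange!(int[]));

// Hypothetical wrapper giving reference semantics to a value type T.
final class Class(T)
{
    T payload;
    this(T value) { payload = value; }
}

void main()
{
    // Value semantics: the copy is independent of the original.
    int a = 1;
    int b = a;
    b = 2;
    assert(a == 1);

    // Reference semantics: both names refer to the same wrapped value.
    auto r1 = new Class!int(1);
    auto r2 = r1;
    r2.payload = 2;
    assert(r1.payload == 2);
}
```

Note that the check is made for int[] on purpose; narrow strings are treated differently by the range primitives in present-day Phobos.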
Andrei Alexandrescu wrote:
> * Containers will have value semantics.
> * For those who want containers with reference semantics, they can use the type Class!(T) for any value type T. That includes built-in value types (int, float...) and whichever value containers we define. It's unclear to me whether this is enough to satisfy those in need for complex container hierarchies.

I've got a few questions about the proposed container value semantics:

a) I'd like to be able to do, for instance:
List lst = new LinkedList();
i.e. use interfaces everywhere, and especially in functions, so that I can switch implementations easily when the need arises. In the above I can choose to use a singly or doubly linked list without making changes throughout the code, by using the List interface. Will this be possible, and how? Is D going to get proper struct interfaces?

b) It is sometimes useful to have a container!(Base) store references to instances of derived classes. A canonical example of this is a container of Widget objects in a UI framework, where you can, for instance, iterate over the container and paint all the different kinds of widgets on the screen by calling the virtual paint method of the base class. How can this be implemented with your proposed Class template?

-- Yigal
Feb 18 2009
Andrei Alexandrescu wrote:
> * Containers will have value semantics.

Another question regarding the container design: have you considered mutable containers vs. functional-style immutable containers? Does it make sense to provide both options?
Feb 18 2009
Andrei Alexandrescu wrote:
> * "Resources come and go; memory is forever" is the likely default in D resource management. This means that destroying e.g. an array of File objects will close the underlying files, but will not deallocate the memory allocated for them. In essence, destroying values means calling the destructor but not delete-ing them (unless of course they're on the stack). This approach has a number of disadvantages, but plenty of advantages that compensate them in most applications.

I admit I'm tired right now... You mention disadvantages, the one I can't avoid thinking of is memory leak! Which means you can't write e.g. a simple web server that opens and closes files, instead of creating and managing a file object pool? Eventually it'll run out of memory, unless I'm way too tired now...

> * std.matrix will define memory layouts for a variety of popular libraries and also the common means to iterate said layouts.

I assume this is for handy and practical rectangular (and cubic, etc.) "arrays". Which would be most welcome.

This "memory is forever" philosophy, is this discussed in depth somewhere? (With the current amount of traffic here, I simply can't follow every thread anymore. :-( )
Feb 20 2009
Georg Wrede wrote:
> I admit I'm tired right now... You mention disadvantages, the one I can't avoid thinking of is memory leak! Which means you can't write e.g. a simple web server that opens and closes files, instead of creating and managing a file object pool? Eventually it'll run out of memory, unless I'm way too tired now...

Better said, I was too tired when I posted that. I gave too little detail. Files are resources, so they will "come and go", i.e. will be under deterministic control; there's no need to worry. Only memory will have a "lives forever" regime for safety reasons. It's not really forever as the GC collects it.

In short, my proposed system is to admit that GC is good _only_ for memory, and that deterministic management must prevail for other resources. I'll get back later on this.

> This "memory is forever" philosophy, is this discussed in depth somewhere? (With the current amount of traffic here, I simply can't follow every thread anymore. :-( )

I decided to curb my posting as well. Beyond a point even passable content becomes just white noise. Also since we don't have an off-topic group, off-topic discussions tend to carry on here as well and are not trivial to ignore. I'm happy they are civilized (congrats to all involved).

Andrei
Feb 20 2009
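One way to picture the split between deterministic resource release and garbage-collected memory is sketched below. It relies on destroy from present-day D, which runs an object's destructor immediately while leaving the memory itself to the collector. Whether this matches the scheme Andrei had in mind is an assumption; LogFile and its openCount counter are hypothetical stand-ins for a real file handle.

```d
// Hypothetical resource type: the counter stands in for OS handles in use.
class LogFile
{
    static int openCount;

    this()  { ++openCount; }  // "open" the resource
    ~this() { --openCount; }  // "close" the resource
}

void main()
{
    auto f = new LogFile;
    assert(LogFile.openCount == 1);

    // Release the resource now: the destructor runs immediately.
    // The object's memory is not freed here; the GC reclaims it later.
    destroy(f);
    assert(LogFile.openCount == 0);
}
```

A server that opens and closes files in a loop would therefore release each handle as soon as it is done with it, while the small objects wrapping those handles are collected at the GC's leisure.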
On Wed, Feb 18, 2009 at 3:36 AM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> Besides std.regexp only works with (narrow) strings and we want it to work on streams of all widths and structures. One pet complaint I have is that std.regexp puts a class around it all as if everybody's favorite pastime would be to inherit Regexp and override some random function in it.

So what do you think it should be, a struct?

That would imply to me that everybody's favorite pastime is making value copies of regex structures, when in fact nobody does that.

Regex is a class in order to give it reference semantics and provide encapsulation of some re-usable state. Maybe it should be a final class, but my impression is "final class" doesn't really work in D.

--bb
Feb 17 2009
Bill Baxter wrote:
> On Wed, Feb 18, 2009 at 3:36 AM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>> Besides std.regexp only works with (narrow) strings and we want it to work on streams of all widths and structures. One pet complaint I have is that std.regexp puts a class around it all as if everybody's favorite pastime would be to inherit Regexp and override some random function in it.
> So what do you think it should be, a struct?

Yes.

> That would imply to me that everybody's favorite pastime is making value copies of regex structures, when in fact nobody does that.

Well, you'd be surprised. The RegEx class saves the state of the last search, which is a sensible thing to do. But then consider a simple range Splitter that, when iterated, nicely gives you...

    string a = ",a, bcd, def,gh,";
    foreach (e; splitter(a, pattern(", *")))
        writeln("[", e, "]");

writes

    []
    [a]
    [bcd]
    [def]
    [gh]

This is similar to the function std.regex.split with the notable difference that no extra memory is allocated.

Now Splitter is an input range. This means that if you copy a Splitter, you wouldn't expect iterating the original to affect the copy. Well, that's exactly what happens when you use the "good" reference semantics of the RegEx stored inside splitter. Worse, RegExp has no cloning primitive, so I need to resort to storing the pattern and recompiling it from scratch at every copy of Splitter. So essentially the "good" semantics of RegEx are useless when it comes to composing it in larger objects.

> Regex is a class in order to give it reference semantics and provide encapsulation of some re-usable state. Maybe it should be a final class, but my impression is "final class" doesn't really work in D.

Re-usable state is provided by structs too. In addition they can choose value vs. reference semantics with ease.

Andrei
Feb 17 2009
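The copy problem described above can be sketched in a few lines of D. This is only an illustration under stated assumptions: ClassMatcher, StructMatcher and Walker are hypothetical stand-ins invented here, not the std.regexp types or the Splitter from the post.

```d
import std.stdio;

// Hypothetical stand-ins for a matcher that remembers where its last
// search ended. These are not the std.regexp types.
class ClassMatcher   { size_t lastPos; }
struct StructMatcher { size_t lastPos; }

// A toy range-like wrapper. It stores its matcher with whatever copy
// semantics the matcher type itself has.
struct Walker(M)
{
    M matcher;
    void advance()    { matcher.lastPos++; }
    size_t position() { return matcher.lastPos; }
}

void main()
{
    auto byRef = Walker!ClassMatcher(new ClassMatcher);
    auto byRefCopy = byRef;              // copies only the class reference
    byRef.advance();
    assert(byRefCopy.position() == 1);   // the copy moved as well

    auto byVal = Walker!StructMatcher(StructMatcher());
    auto byValCopy = byVal;              // copies the matcher state itself
    byVal.advance();
    assert(byValCopy.position() == 0);   // the copy is unaffected

    writeln("class-backed copy: ", byRefCopy.position(),
            ", struct-backed copy: ", byValCopy.position());
}
```

The class-backed wrapper shows the aliasing complained about in the post: advancing the original also advances the copy. The struct-backed one keeps the two independent without any explicit cloning call.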
Andrei Alexandrescu:
> string a = ",a, bcd, def,gh,";
> foreach (e; splitter(a, pattern(", *")))
>     writeln("[", e, "]");

(I often use xsplit(), which is like split but yields items lazily; for larger strings it's much faster.)

A better approach is to fuse xsplit and such an xsplitter function into a single lazy generator that can take as a second argument a string or char or RE pattern. A 3rd optional argument can be the max number of splits (so after such max it yields all the rest of the string). You can then add an eager splitter function with the same signature, that outputs an array.

Bye,
bearophile
Feb 17 2009
On Wed, Feb 18, 2009 at 6:56 AM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> Worse, RegExp has no cloning primitive, so I need to resort to storing the pattern and recompiling it from scratch at every copy of Splitter. So essentially the "good" semantics of RegEx are useless when it comes to composing it in larger objects.

So that sounds to me like RegEx should have a .dup, and then it would be fine, no? I agree it should have a dup for the odd occasion when you do want to make a copy for some reason.

> Re-usable state is provided by structs too. In addition they can choose value vs. reference semantics with ease.

I think this choice is not so much available with D1, plus the constructor situation with D1 is less than ideal. Given that, I think the choice of class for RegEx was appropriate. But if the struct problems are all going away in D2, then that's great. Sounds like you're saying we'll really be able to use D structs just like one uses a non-polymorphic C++ class. If so, then that's super.

--bb
Feb 17 2009
Bill Baxter wrote:
> I think this choice is not so much available with D1, plus the constructor situation with D1 is less than ideal. Given that, I think the choice of class for RegEx was appropriate. But if the struct problems are all going away in D2, then that's great. Sounds like you're saying we'll really be able to use D structs just like one uses a non-polymorphic C++ class. If so, then that's super.

I lost that perspective when criticizing RegExp, you're right. But still the API is lousy - every single time I am using a RegExp, I find myself fumbling through the thoroughly overlapping primitives in the documentation, and never seem to find an idiom that's simple, comfortable, and memorable.

Andrei
Feb 17 2009
On Wed, Feb 18, 2009 at 7:44 AM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> But still the API is lousy - every single time I am using a RegExp, I find myself fumbling through the thoroughly overlapping primitives in the documentation, and never seem to find an idiom that's simple, comfortable, and memorable.

Ok. I'm certainly not in love with the API either. Though, the only RegEx API I've ever used that I felt totally comfortable with was Perl's, which in large part is syntax instead of an API.

Python's syntax I have to look over the documentation every time I use it, too. Maybe it's because of the "matching" vs "searching" distinction that I find impossible to remember. (http://docs.python.org/library/re.html)

--bb
Feb 17 2009
Bill Baxter:
> Python's syntax I have to look over the documentation every time I use it, too. Maybe it's because of the "matching" vs "searching" distinction that I find impossible to remember.

I agree, I too need the Python docs every time I want to use something more than the basics. The syntax for group catching too is bad (groups? group? itersomething? etc). I have proposed an improvement (using [5] to grab the 5th group()), but it was not implemented. Such syntax is possible in D too *hint*.

It's because of situations like this that I say that designing a good API for std.re isn't easy at all. It will require care, brain, and maybe two or more tries :-)

Bye,
bearophile
Feb 17 2009
On Tue, Feb 17, 2009 at 7:13 PM, Bill Baxter <wbaxter gmail.com> wrote:
> Python's syntax I have to look over the documentation every time I use it, too. Maybe it's because of the "matching" vs "searching" distinction that I find impossible to remember. (http://docs.python.org/library/re.html)

Is there ever a situation where you want to use a single regexp for both matching _and_ searching? And if not, couldn't you just use ^ to anchor it? I never understood why Python's API makes such a distinction.
Feb 17 2009
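The anchoring idea raised above can be tried out in D. The sketch below assumes the present-day Phobos std.regex module (its regex and matchFirst functions), which postdates this thread; it is not the std.regexp API under discussion.

```d
import std.regex;
import std.stdio;

void main()
{
    auto subject = "say hello";

    // Unanchored: succeeds if the pattern occurs anywhere ("searching").
    assert(!matchFirst(subject, regex("hello")).empty);

    // Anchored with ^: succeeds only at the start of the input ("matching").
    assert(matchFirst(subject, regex("^hello")).empty);
    assert(!matchFirst(subject, regex("^say")).empty);

    writeln("anchoring checks passed");
}
```

With a single find-style primitive, the difference between the two modes is carried by the pattern itself rather than by two separately named functions.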
Jarrett Billingsley wrote:
> Is there ever a situation where you want to use a single regexp for both matching _and_ searching? And if not, couldn't you just use ^ to anchor it? I never understood why Python's API makes such a distinction.

Ehm, that's odd. You'd think that after Perl has set the precedent, it would be hard to do major goofs in designing a regex API.

By the way, the more I dig into std.regexp, the stiffer the hair on my neck gets. Get this: the API offers both global functions and member functions, with both RegExp and plain string arguments. The latter are carefully designed to maximize the number of clashes, potential confusions, and errors when using both std.string and std.regex.

But wait, there's more. The API defines the following functions that all ostensibly do some sort of mattern patching (sic): find, search, test, match, and exec. I wish I were kidding. There's some opIndex and opEquals thrown in for good measure. Knuth wouldn't know what each of them does after studying them for a week and then watching an episode from "The Bachelor". And get this: global search() does not do what member search() does. Nope. Global search() does what member test() does.

I have only contempt for such designs.

Andrei
Feb 17 2009
On Wed, Feb 18, 2009 at 11:38 AM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> Jarrett Billingsley wrote:
>> Is there ever a situation where you want to use a single regexp for both matching _and_ searching? And if not, couldn't you just use ^ to anchor it? I never understood why Python's API makes such a distinction.

All I know is that I found one incantation that works and I've been copy-pasting that ever since. :-)

> And get this: global search() does not do what member search() does. Nope. Global search() does what member test() does. I have only contempt for such designs.

Maybe "design" is too strong a word. Most Phobos modules seem to have been put together rather hastily in order to fill a pressing need. Often *something* is better than nothing at all, even if the something is not so great.

--bb
Feb 17 2009
Bill Baxter wrote:
> Maybe "design" is too strong a word. Most Phobos modules seem to have been put together rather hastily in order to fill a pressing need. Often *something* is better than nothing at all, even if the something is not so great.

std.regexp evolved out of the ECMAscript regex functions - they have the same names and functionality. Layered on top of that was ruby-like names and functionality. It's a good (bad?) example of an api evolving without sacrificing backwards compatibility.
Feb 20 2009
Walter Bright wrote:
> std.regexp evolved out of the ECMAscript regex functions - they have the same names and functionality. Layered on top of that was ruby-like names and functionality. It's a good (bad?) example of an api evolving without sacrificing backwards compatibility.

s/good \(bad\?\)/REALLY BAD/

Andrei
Feb 20 2009
On Fri, 20 Feb 2009 16:35:54 +0300, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> Walter Bright wrote:
>> It's a good (bad?) example of an api evolving without sacrificing backwards compatibility.
> s/good \(bad\?\)/REALLY BAD/

Backward compatibility is almost always a bad thing. Look what's happened to C++ and OpenGL.
Feb 20 2009
Denis Koroskin wrote:
> Backward compatibility is almost always a bad thing. Look what's happened to C++ and OpenGL.

In this case it's even worse, as I don't think anyone expects to paste their Ruby code and compile it with dmd.

Andrei
Feb 20 2009
On Tue, Feb 17, 2009 at 9:38 PM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> The API defines the following functions that all ostensibly do some sort of mattern patching (sic): find, search, test, match, and exec. I wish I were kidding. [...] And get this: global search() does not do what member search() does. Nope. Global search() does what member test() does. I have only contempt for such designs.

Well I don't mean to, uh, toot my own horn but.. I recently bound libpcre to MiniD and came up with a relatively simple but powerful and orthogonal API.

http://www.dsource.org/projects/minid/wiki/Addons/PcreLib#LibraryReference

The regex object has a single "subject" string at a time, the string that it's matching against. The subject is set with "search" and "test" does everything. All other functions are basically defined in terms of those two. "test" looks for the next match of the regex in the subject and returns true if it matched. "match" returns match groups (0 for the whole regex and 1..n for subgroups, as well as string indices for named subgroups). opApply is just a quicker way of writing something like:

    re.search(someSubject)
    while(re.test())
        // use re.match to get matches

You'll notice that opApply is also just defined in terms of test.

I've found it far more intuitive than other APIs. I've never used Perl and I doubt I ever will, though.
Feb 17 2009
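For comparison, the search/test/match trio described above can be imitated in D. This is a rough sketch, not the MiniD binding: SubjectRegex is a hypothetical name, and it is built on the present-day Phobos std.regex (regex, matchFirst, Captures), which postdates this thread. It is deliberately simplified: a pattern that can match the empty string would make the loop spin forever, and ^ would re-anchor at every remaining tail.

```d
import std.regex;
import std.stdio;

// Hypothetical wrapper imitating the search/test/match trio.
struct SubjectRegex
{
    private Regex!char re;
    private string rest;            // part of the subject not yet searched
    private Captures!string last;   // groups of the most recent match

    this(string pattern) { re = regex(pattern); }

    // Set the subject string to match against.
    void search(string subject) { rest = subject; }

    // Look for the next match; returns false when none is left.
    bool test()
    {
        last = matchFirst(rest, re);
        if (last.empty)
            return false;
        rest = last.post;           // continue after this match next time
        return true;
    }

    // Group 0 is the whole match, 1..n are the subgroups.
    string match(size_t group = 0) { return last[group]; }
}

void main()
{
    auto re = SubjectRegex(`(\w+)=(\d+)`);
    re.search("a=1, bb=22");
    while (re.test())
        writeln(re.match(1), " -> ", re.match(2));
}
```

Everything the caller needs goes through three names, and iteration is just a loop over test(), which is the orthogonality the post argues for.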
On Wed, Feb 18, 2009 at 3:36 AM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> I'm quite unhappy with the API of std.regexp. It's a chaotic design that provides a hodgepodge of functionality and tries to use as many synonyms of "to find" in the dictionary (e.g. search, match). I could swear Walter never really cared for using regexps, and that is felt throughout the design: it fills the bullet point but it's asinine to use.
>
> Besides std.regexp only works with (narrow) strings and we want it to work on streams of all widths and structures. One pet complaint I have is that std.regexp puts a class around it all as if everybody's favorite pastime would be to inherit Regexp and override some random function in it.
>
> In the upcoming releases of D 2.0 there will be rather dramatic breaking changes of phobos. I just wanted to ask whether y'all could stomach yet another rewritten API or you'd rather use std.regexp as it is for the time being.

Btw, I've got no problems with you breaking the API of 2.0 either. Though you might consider moving the current implementation to std.deprecated.regex and leaving it there for a year with a pragma(msg, "This module is deprecated"). That way making a quick fix to broken code is just a matter of inserting ".deprecated" into your import statements.

--bb
Feb 17 2009
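The pragma(msg, ...) suggestion above can be tried in a single file. The sketch below is an assumption-laden illustration: the module name legacy_regexp_demo and the function legacyFind are hypothetical, and the deprecated("...") attribute with a message is a language feature added to D after this thread.

```d
// Single-file sketch; in the proposal this would be the relocated old module.
module legacy_regexp_demo;   // hypothetical module name

import std.stdio;

// Printed by the compiler when it processes this module.
pragma(msg, "This module is deprecated");

// Hypothetical legacy function. The deprecated attribute makes the
// compiler flag every call site with the given message.
deprecated("use the replacement module instead")
bool legacyFind(string haystack, string needle)
{
    import std.algorithm.searching : canFind;
    return haystack.canFind(needle);
}

void main()
{
    // Still compiles by default, with a deprecation message at this call.
    writeln(legacyFind("regexp", "ex"));
}
```

Either mechanism keeps old code building while making the migration visible in the compiler output, which is the point of the one-year grace period proposed in the post.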
Bill Baxter wrote:
> Btw, I've got no problems with you breaking the API of 2.0 either. Though you might consider moving the current implementation to std.deprecated.regex and leaving it there for a year with a pragma(msg, "This module is deprecated"). That way making a quick fix to broken code is just a matter of inserting ".deprecated" into your import statements.

I was thinking of moving older stuff to etc, is that ok?

Andrei
Feb 17 2009
Andrei Alexandrescu wrote:
> I was thinking of moving older stuff to etc, is that ok?

Yes. But you should also rename the new one, perhaps to std.regex. That way, legacy code will refuse to compile, rather than compile wrongly.
Feb 17 2009
Walter Bright wrote:
> Yes. But you should also rename the new one, perhaps to std.regex. That way, legacy code will refuse to compile, rather than compile wrongly.

Terrific. I prefer "regex" to "regexp" because it's easier to pronounce, particularly if you're a foreigner. "Regex" sounds like a frog utterance by a forest lake, "regexp" sounds like nothing in particular.

Andrei
Feb 17 2009
Andrei Alexandrescu:
> Terrific. I prefer "regex" to "regexp" because it's easier to pronounce, particularly if you're a foreigner. "Regex" sounds like a frog utterance by a forest lake, "regexp" sounds like nothing in particular.

I'd like std.re :-)

Bye,
bearophile
Feb 17 2009
Andrei Alexandrescu wrote:
> Terrific. I prefer "regex" to "regexp" because it's easier to pronounce, particularly if you're a foreigner. "Regex" sounds like a frog utterance by a forest lake, "regexp" sounds like nothing in particular.

It sounds to me like a frog who, immediately post-utterance, just got gigged. I guess that makes "regex" sound even better... as it's still alive (sounding).

-- Chris Nicholson-Sauls
-- Who so far agrees with pretty much everything you've said, and therefore has no real contribution...
Feb 17 2009
Andrei Alexandrescu, on February 17 at 13:56 you wrote to me:
>> Btw, I've got no problems with you breaking the API of 2.0 either. Though you might consider moving the current implementation to std.deprecated.regex and leaving it there for a year with a pragma(msg, "This module is deprecated"). That way making a quick fix to broken code is just a matter of inserting ".deprecated" into your import statements.
> I was thinking of moving older stuff to etc, is that ok?

What's the rationale for "etc"? Why not "deprecated", or something shorter like "old", or "d1" (this last one could be good for future deprecated libraries, like when D3 is available there will probably be a "d2" too).

--
Leandro Lucarella (luca) | Group blog: http://www.mazziblog.com.ar/blog/
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
Hey you, don't tell me there's no hope at all
Together we stand, divided we fall.
Feb 19 2009
Leandro Lucarella wrote:
> What's the rationale for "etc"? Why not "deprecated", or something shorter like "old", or "d1" (this last one could be good for future deprecated libraries, like when D3 is available there will probably be a "d2" too).

In the words of George Costanza: "Because it's there!"

Andrei
Feb 19 2009
Andrei Alexandrescu wrote:
> In the words of George Costanza: "Because it's there!"

Shouldn't that be George Mallory?
Feb 19 2009
Ellery Newcomer wrote:
> Shouldn't that be George Mallory?

No, he said "because it is there". George said "because it's there":

http://www.classictvquotes.com/quotes/characters/george-costanza/page_14.html

George: So, she fell, and then she started screaming, "My back! My back!" So, I picked her up and took her to the hospital.
Elaine: How is she?
George: She's in traction.
Elaine: Okay, I'm sorry.
George: It's not funny, Elaine.
Elaine: I know. I'm sorry. I'm serious.
George: Her back went out. She's gotta be there for a couple of days. All she said on the way over in the car was, "Why, George, why?!" I said, "Because it's there!"

Andrei
Feb 19 2009
Andrei Alexandrescu wrote:
>>>> That way making a quick fix to broken code is just a matter of inserting ".deprecated" into your import statements.
>>> I was thinking of moving older stuff to etc, is that ok?
>> What's the rationale for "etc"? Why not "deprecated"
> In the words of George Costanza: "Because it's there!"

With the critique you've given to the existing regexp stuff, deprecated would be the obvious choice. Then we could have etc for Miscellaneous Stuff.
Feb 20 2009
On Sat, Feb 21, 2009 at 2:38 AM, Georg Wrede <georg.wrede iki.fi> wrote:
> With the critique you've given to the existing regexp stuff, deprecated would be the obvious choice. Then we could have etc for Miscellaneous Stuff.

Agreed. etc implies to me that it's stuff that might be useful sometimes but not very often. It does not suggest to me that you shouldn't use it if you can avoid it.

Or how about make it std.etc.deprecated.regexp

That way it's clear that it's *both* something that might be useful occasionally but something that you should avoid if possible.

... Deprecated is a keyword though, isn't it. Dang. :-P

--bb
Feb 20 2009
Georg Wrede, on February 20 at 19:38 you wrote to me:
> With the critique you've given to the existing regexp stuff, deprecated would be the obvious choice. Then we could have etc for Miscellaneous Stuff.

Why not "misc" for that? =)

--
Leandro Lucarella (luca) | Group blog: http://www.mazziblog.com.ar/blog/
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
TIGER ATE CIRCUS EMPLOYEE: OWNER AND TAMER ARRESTED
-- Crónica TV
Feb 20 2009
Reply to Andrei,
> In the upcoming releases of D 2.0 there will be rather dramatic breaking changes of phobos. I just wanted to ask whether y'all could stomach yet another rewritten API or you'd rather use std.regexp as it is for the time being.
>
> Andrei

For what it's worth, I have a partial clone of the .NET API built on top of PCRE. I would have to ask my boss but I expect I could donate it if anyone wants to use it as a basis.
Feb 17 2009
On Tue, Feb 17, 2009 at 8:39 PM, BCS <ao pathlink.com> wrote:
> For what it's worth, I have a partial clone of the .NET API built on top of PCRE. I would have to ask my boss but I expect I could donate it if anyone wants to use it as a basis.

Actually, I was wondering why nobody is considering real regular languages anymore, that can be compiled to a normal finite state recognizer or transducer. While this may not be as fancy as Perl-like extensions, they are much faster, and it's easier to do fun stuff such as composition.

Take care,
Daniel
Feb 17 2009
Daniel de Kok wrote:
> Actually, I was wondering why nobody is considering real regular languages anymore, that can be compiled to a normal finite state recognizer or transducer. While this may not be as fancy as Perl-like extensions, they are much faster, and it's easier to do fun stuff such as composition.

I am considering that. One nice feature of "classic" regexes is that they never backtrack, so they work with pure input iterators. This has crucial consequences with regard to where and how regexes fit the range concept hierarchy.

Andrei
Feb 17 2009
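The no-backtracking property mentioned above is easy to see with a hand-written recognizer. The following is a minimal sketch, assuming a toy regular language a b* c chosen here purely for illustration; the function name acceptsABStarC is hypothetical and not part of any library.

```d
import std.range.primitives : isInputRange;
import std.stdio;

// Recognizer for the regular language  a b* c.
// Each element is inspected exactly once and never revisited,
// so a single-pass input range is all it needs.
bool acceptsABStarC(R)(R input)
    if (isInputRange!R)
{
    enum State { start, sawA, done, dead }
    auto state = State.start;

    foreach (c; input)
    {
        final switch (state)
        {
            case State.start:
                state = (c == 'a') ? State.sawA : State.dead;
                break;
            case State.sawA:
                state = (c == 'b') ? State.sawA
                      : (c == 'c') ? State.done
                      : State.dead;
                break;
            case State.done:
                state = State.dead;   // anything after the final c is rejected
                break;
            case State.dead:
                break;
        }
    }
    return state == State.done;
}

void main()
{
    assert(acceptsABStarC("abbbc"));
    assert(acceptsABStarC("ac"));
    assert(!acceptsABStarC("abx"));
    assert(!acceptsABStarC("abcc"));
    writeln("recognizer checks passed");
}
```

Because the loop only ever moves forward, the same function would work on input that cannot be rewound, which is why automaton-style regexes sit lower in the range hierarchy than backtracking ones.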
On Tue, Feb 17, 2009 at 2:47 PM, Daniel de Kok <me danieldk.org> wrote:
> Actually, I was wondering why nobody is considering real regular languages anymore, that can be compiled to a normal finite state recognizer or transducer. While this may not be as fancy as Perl-like extensions, they are much faster, and it's easier to do fun stuff such as composition.

Tango's regex engine is just that. It uses a tagged NFA method.

http://www.dsource.org/projects/tango/docs/current/tango.text.Regex.html

The problem with this method is that while it's certainly faster to match, it's MUCH slower to compile. There are no pathological matches; only pathological compiles ;) I'm talking 60-70 seconds to compile a more complex regex.

This might be an acceptable tradeoff for when you need to compile a regex in a long-running app like a server, but it's completely unacceptable for most small, Perl-like text munging programs.

Unless of course this slowdown is unique to Tango's implementation of this method!
Feb 17 2009
Jarrett Billingsley:
> I'm talking 60-70 seconds to compile a more complex regex.

A modern CPU is able to do something like 60*2*2E9 operations in that time, and DMD needs 6 seconds or less to compile about 60000-80000 lines of my D code, so I think it's a bit too much time (probably 100 or 1000 times too much).

Bye,
bearophile
Feb 17 2009
Reply to Jarrett,
> The problem with this method is that while it's certainly faster to match, it's MUCH slower to compile. There are no pathological matches; only pathological compiles ;) I'm talking 60-70 seconds to compile a more complex regex.

Could this be transitioned to CTFE? You could even have a debug mode that delays till runtime:

    RegEx matcher = new CTFERegEx!("some regex");

    class CTFERegEx(char[] regex) : RegEx
    {
        // In debug(NoCTFE) builds the compiled form is filled in lazily at run time;
        // otherwise it is computed during compilation.
        debug(NoCTFE) static char[] done;
        else static const char[] done = CTFECompile(regex);

        public this()
        {
            debug(NoCTFE) if (done is null) done = CTFECompile(regex);
            super(done);   // pass the compiled form to the RegEx base class
        }
    }
Feb 17 2009
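The compile-time idea sketched above can be tried with a facility that exists in present-day Phobos. The example below assumes std.regex's ctRegex template, which postdates this thread and is unrelated to the hypothetical CTFERegEx/CTFECompile names in the post.

```d
import std.regex;
import std.stdio;

void main()
{
    // The pattern is turned into specialized matching code while the
    // program is being built, so no pattern compilation happens at run time.
    auto digits = ctRegex!(`[0-9]+`);
    auto m = matchFirst("order 66 shipped", digits);
    assert(!m.empty && m.hit == "66");

    // The same pattern built at run time, for comparison.
    auto runtimeDigits = regex(`[0-9]+`);
    assert(matchFirst("order 66 shipped", runtimeDigits).hit == "66");

    writeln(m.hit);
}
```

Both objects are used through the same matching functions, so switching between build-time and run-time compilation is a one-line change, much like the debug(NoCTFE) switch proposed in the post. The cost moves to build time, which is the tradeoff debated in the replies.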
On Tue, Feb 17, 2009 at 3:16 PM, BCS <ao pathlink.com> wrote:
> could this be transitioned to CTFE? you could even have a debug mode that delays till runtime

For what it's worth the Tango regexes actually have a method to output a D function that will implement the regex after it's compiled. So you _could_ precompile the regex into D code and use that.

But seriously, man - if something takes 60 seconds to complete at _runtime_, making it CTFE would simply make your computer explode.
Feb 17 2009
Reply to Jarrett,
> But seriously, man - if something takes 60 seconds to complete at _runtime_, making it CTFE would simply make your computer explode.

For any kind of debug, yeah, that's a problem. OTOH for release, as long as it /does/ compile, who cares? How many real release builds does anyone do a week?
Feb 17 2009
Jarrett Billingsley wrote:

> On Tue, Feb 17, 2009 at 3:16 PM, BCS <ao pathlink.com> wrote:
>
>> Could this be transitioned to CTFE? You could even have a debug mode that delays till runtime:
>>
>>     RegEx matcher = new CTFERegEx!("some regex");
>>
>>     class CTFERegEx(char[] regex) : RegEx
>>     {
>>         debug(NoCTFE) static char[] done;
>>         else static const char[] done = CTFECompile(regex);
>>
>>         public this()
>>         {
>>             debug(NoCTFE) if (done is null) done = CTFECompile(regex);
>>             super(done);
>>         }
>>     }
>
> For what it's worth the Tango regexes actually have a method to output a D function that will implement the regex after it's compiled. So you _could_ precompile the regex into D code and use that.

A feature which I *adore* by the way. So long as the precompiled regex is "guaranteed" to run at best possible performance (hand-rolled, hand-optimized solutions notwithstanding) I for one prefer them.

-- Chris Nicholson-Sauls
Feb 17 2009
On Tue, Feb 17, 2009 at 9:26 PM, Jarrett Billingsley <jarrett.billingsley gmail.com> wrote:

> For what it's worth the Tango regexes actually have a method to output a D function that will implement the regex after it's compiled. So you _could_ precompile the regex into D code and use that.

I have only been tinkering with Phobos, but that's good to hear, thanks!
Feb 17 2009
On Tue, Feb 17, 2009 at 8:57 PM, Jarrett Billingsley <jarrett.billingsley gmail.com> wrote:

> The problem with this method is that while it's certainly faster to match, it's MUCH slower to compile. There are no pathological matches; only pathological compiles ;) I'm talking 60-70 seconds to compile a more complex regex. This might be an acceptable tradeoff for when you need to compile a regex in a long-running app like a server, but it's completely unacceptable for most small, Perl-like text munging programs.

Hmmm, define "complex", I suppose it's ok for the general line-splitting/matching stuff? I got into trouble (time-wise) when we compiled a part of speech tagger into a transducer. In those cases we generally pre-compile stuff, and output it as a large struct in the target language. Of course, it would be fun if we can do it at compile-time ;).

Besides that, if we'd have a good general recognizer/transducer implementation it could also be used for compact dictionary storage, perfect hashing automata, etc.

Take care,
Daniel
Feb 17 2009
On Tue, Feb 17, 2009 at 3:30 PM, Daniel de Kok <me danieldk.org> wrote:

> Hmmm, define "complex"

\w+([\-+.]\w+)*@\w+([\-.]\w+)*\.\w+([\-.]\w+)*

This is a simple email regexp. This takes about 4 or 5 seconds to compile on my lappy (Pentium M). It only goes up from there.
Feb 17 2009
Reply to Jarrett,

> On Tue, Feb 17, 2009 at 3:30 PM, Daniel de Kok <me danieldk.org> wrote:
>
>> Hmmm, define "complex"
>
> \w+([\-+.]\w+)*@\w+([\-.]\w+)*\.\w+([\-.]\w+)*
>
> This is a simple email regexp. This takes about 4 or 5 seconds to compile on my lappy (Pentium M). It only goes up from there.

I wonder how well it would work on this: http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html :b
Feb 17 2009
On Tue, Feb 17, 2009 at 9:50 PM, Jarrett Billingsley <jarrett.billingsley gmail.com> wrote:

> On Tue, Feb 17, 2009 at 3:30 PM, Daniel de Kok <me danieldk.org> wrote:
>
>> Hmmm, define "complex"
>
> \w+([\-+.]\w+)*@\w+([\-.]\w+)*\.\w+([\-.]\w+)*
>
> This is a simple email regexp. This takes about 4 or 5 seconds to compile on my lappy (Pentium M).

Hmm, odd. I have translated that regexp to the syntax of the tool that we used, which is written in Prolog (it is generally a constant factor slower than C/C++/D equivalents). Generating a minimized DFA takes far less than a second. I used the following expression (abstracted a bit with macros):

---
macro(letter, {a..z, 'A'..'Z'}).
macro(punctlet,[{-,+,.},letter+]).
macro(dompunctlet,[{-,.},letter+]).
macro(email,[letter+,punctlet*,@,letter+,dompunctlet*,.,letter+,dompunctlet*]).
---

The software is available from: http://www.let.rug.nl/~vannoord/Fsa/fsa.html

Take care,
Daniel
Feb 17 2009
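To make the "minimized DFA" concrete, here is a hand-written table-driven recognizer in D for the language those macros describe (letters only, as in the `letter` macro). It is a sketch, not output from the FSA tool or from Tango; the state numbering and the names `Cls`, `classify`, `delta` and `accepts` are made up for illustration.

```d
enum dead = -1; // marks a missing transition

// Character classes used as columns of the transition table.
enum Cls { letter, dash, dot, plus, at, other }

Cls classify(char c) pure nothrow @nogc @safe
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
        return Cls.letter;
    switch (c)
    {
        case '-': return Cls.dash;
        case '.': return Cls.dot;
        case '+': return Cls.plus;
        case '@': return Cls.at;
        default:  return Cls.other;
    }
}

// Rows are states, columns follow the order of Cls.
immutable int[6][8] delta = [
    // letter dash  dot   plus  at    other
    [1,       dead, dead, dead, dead, dead], // 0: start
    [1,       2,    2,    2,    3,    dead], // 1: letters of the local part
    [1,       dead, dead, dead, dead, dead], // 2: after '-', '+' or '.' in the local part
    [4,       dead, dead, dead, dead, dead], // 3: just after '@'
    [4,       5,    6,    dead, dead, dead], // 4: domain letters, no '.' seen yet
    [4,       dead, dead, dead, dead, dead], // 5: after '-' in the domain, no '.' seen yet
    [7,       dead, dead, dead, dead, dead], // 6: after a separator, '.' already seen
    [7,       6,    6,    dead, dead, dead], // 7: domain letters after a '.' (accepting)
];

// Runs the automaton over input: one table lookup per character, no backtracking.
bool accepts(const(char)[] input) pure nothrow @nogc @safe
{
    int state = 0;
    foreach (char c; input)
    {
        state = delta[state][classify(c)];
        if (state == dead)
            return false;
    }
    return state == 7;
}

void main()
{
    assert(accepts("jane.doe+d@mail.example.org"));
    assert(!accepts("jane@localhost"));    // the domain needs at least one '.'
    assert(!accepts("jane@example..org")); // separators must be followed by letters
}
```

Matching cost here is linear in the input length whatever the pattern looks like; the cost that the thread argues about is the time needed to construct such a table from the regex.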
BCS wrote:

> Reply to Andrei,
>
>> I'm quite unhappy with the API of std.regexp. It's a chaotic design that provides a hodgepodge of functionality and tries to use as many synonyms of "to find" in the dictionary (e.g. search, match). I could swear Walter never really cared for using regexps, and that is felt throughout the design: it fills the bullet point but it's asinine to use.
>>
>> Besides std.regexp only works with (narrow) strings and we want it to work on streams of all widths and structures. One pet complaint I have is that std.regexp puts a class around it all as if everybody's favorite pastime would be to inherit Regexp and override some random function in it.
>>
>> In the upcoming releases of D 2.0 there will be rather dramatic breaking changes of phobos. I just wanted to ask whether y'all could stomach yet another rewritten API or you'd rather use std.regexp as it is for the time being.
>>
>> Andrei
>
> For what it's worth, I have a partial clone of the .NET API built on top of PCRE. I would have to ask my boss but I expect I could donate it if anyone wants to use it as a basis.

That would be cool; I find the engine in std.regexp rather hard to understand.

Andrei
Feb 17 2009
On Tue, 17 Feb 2009 10:36:06 -0800, Andrei Alexandrescu wrote:

> I'm quite unhappy with the API of std.regexp.

I was so happy with using it I wrote my own simplified regex ;-)

> In the upcoming releases of D 2.0 there will be rather dramatic breaking changes of phobos. I just wanted to ask whether y'all could stomach yet another rewritten API or you'd rather use std.regexp as it is for the time being.

If your changes are going to make things better for coding and maintenance then go for it.

--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Feb 17 2009