digitalmars.D - Why does readln include the line terminator?
- Georg Wrede (9/9) Apr 13 2009 Readln returns a string which contains the line terminator.
- Daniel Keep (7/20) Apr 14 2009 Because if it stripped it, there's no way to know what it was. If you
- Walter Bright (8/11) Apr 14 2009 That's right; there are currently at least 6 different line terminators:
- Georg Wrede (20/32) Apr 14 2009 So the programmer who wants to write portable code, has to implement
- BCS (4/9) Apr 14 2009 Only if you considering wanting to maintain merge-ability/diff-ability a...
- Georg Wrede (6/17) Apr 14 2009 Doesn't this kind of prove my point? Changing a line ending /should not/...
- BCS (2/16) Apr 14 2009 I make no assertions about what should be, only what is.
- bearophile (8/11) Apr 14 2009 You use a string function or string method that removes the eventually p...
- Georg Wrede (2/13) Apr 14 2009 Your code ends up printing the output on every other line.
- Andrei Alexandrescu (3/5) Apr 14 2009 25 years and no networking code?
- Steven Schveighoffer (5/9) Apr 14 2009 Been writing code for about 12 years, lots and lots of networking code. ...
- Sean Kelly (6/17) Apr 14 2009 With HTTP, for example, lines are terminated with \r\n. The lines
- Georg Wrede (4/9) Apr 14 2009 I can see having to use one or another line ending in the whole output
- Nick Sabalausky (5/8) Apr 14 2009 Source code with unescaped nl's/cr's embedded in a string literal? Thoug...
- Andrei Alexandrescu (31/41) Apr 14 2009 I think there are a few concerns when designing an API for reading
- Daniel Keep (7/20) Apr 14 2009 Why not:
- Andrei Alexandrescu (3/26) Apr 14 2009 And how about when sep is elaborate (e.g. regex)?
- Daniel Keep (6/44) Apr 14 2009 Whatever was matched. If we have a file containing:
- Andrei Alexandrescu (3/52) Apr 14 2009 Where did you specify the separator in the call to byLine?
- Steven Schveighoffer (25/71) Apr 14 2009 I think he's not read the docs. Consider this usage instead:
- Christopher Wright (4/5) Apr 15 2009 Why specify anything at compile time when a user could reasonably
- Robert Fraser (3/10) Apr 15 2009 Yes, and for maximum abstraction, the config file should be stored as
- Christopher Wright (5/16) Apr 15 2009 I just really hate to see templates when a regular function would
- Steven Schveighoffer (9/24) Apr 15 2009 It's just a demonstration of what the OP was talking about but wasn't
- Georg Wrede (2/16) Apr 15 2009
- Stewart Gordon (21/26) Apr 14 2009 But readln only stops on '\n' (or whatever character you tell it to
- Christopher Wright (5/18) Apr 14 2009 By default, tango does not exhibit this behavior. If you wish, you can
- Georg Wrede (12/32) Apr 14 2009 Now this is more like it. The default should really be (in Phobos too)
- Manfred Nowak (5/7) Apr 14 2009 This is false in case of simple copying. And I doubt, that for more
- Denis Koroskin (2/9) Apr 14 2009 Tango does the best by having an optional parameter that denotes whether...
- Georg Wrede (6/14) Apr 14 2009 For copying there is the operating system command, copy.
- Manfred Nowak (3/4) Apr 14 2009 Agreed.
- Kagamin (2/13) Apr 15 2009 I think, only (d) is important, all others are *strange* things. I usual...
- Stewart Gordon (5/21) Apr 15 2009 So you expect text editors to discard both kinds of information?
- Kagamin (2/3) Apr 16 2009 No. Text editor is a *specialized* text processing tool and it usually u...
Readln returns a string which contains the line terminator. Is there a grand reason for this? Currently there are a few drawbacks with this. The naive user doesn't expect it, and the seasoned user has to keep stripping it. And then he has to search the docs (or get hold of other OSs) to determine what terminator to expect on other systems. And it can't really be a speed optimization either, because to do anything useful with a string, you have to strip the terminator anyway at some point.
Apr 13 2009
Georg Wrede wrote:
> Readln returns a string which contains the line terminator. Is there a grand reason for this? Currently there are a few drawbacks with this. The naive user doesn't expect it, and the seasoned user has to keep stripping it. And then he has to search the docs (or get hold of other OSs) to determine what terminator to expect on other systems. And it can't really be a speed optimization either, because to do anything useful with a string, you have to strip the terminator anyway at some point.

Because if it stripped it, there's no way to know what it was. If you want to do per-line processing but don't want to clobber the line endings, readln has to return the line terminator.

Besides which, it's a single function call to strip it off irrespective of OS.

-- Daniel
Apr 14 2009
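A minimal sketch of the "single function call" Daniel refers to, assuming current Phobos, where `std.string.chomp` removes one trailing terminator of any recognized kind. The helper name `stripTerminator` is made up for illustration and is not part of any library.

```d
import std.stdio : writeln;
import std.string : chomp;

// Hypothetical helper: drop one trailing line terminator of any kind.
string stripTerminator(string line)
{
    return line.chomp;
}

void main()
{
    // The same call covers LF, CRLF and bare CR endings.
    assert(stripTerminator("hello\n") == "hello");
    assert(stripTerminator("hello\r\n") == "hello");
    assert(stripTerminator("hello\r") == "hello");
    // A last line without a terminator comes back unchanged.
    assert(stripTerminator("hello") == "hello");
    writeln("ok");
}
```

With console input the same idea would read `auto line = readln().chomp;`.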
Daniel Keep wrote:
> Because if it stripped it, there's no way to know what it was. If you want to do per-line processing but don't want to clobber the line endings, readln has to return the line terminator.

That's right; there are currently at least 6 different line terminators: CR, LF, CRLF, FF, PS, LS.
Apr 14 2009
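As a rough illustration of why a kept terminator is recoverable information, the sketch below extracts whichever terminator ends a line by comparing it with its `chomp`ed form. It assumes current Phobos `std.string.chomp`, which recognizes the six terminators Walter lists; `terminatorOf` is a hypothetical name.

```d
import std.string : chomp;

// Hypothetical helper: return the terminator that ends `line`,
// or an empty string if the line has none.
string terminatorOf(string line)
{
    return line[line.chomp.length .. $];
}

void main()
{
    assert(terminatorOf("a\r") == "\r");          // CR
    assert(terminatorOf("a\n") == "\n");          // LF
    assert(terminatorOf("a\r\n") == "\r\n");      // CRLF
    assert(terminatorOf("a\f") == "\f");          // FF
    assert(terminatorOf("a\u2029") == "\u2029");  // PS
    assert(terminatorOf("a\u2028") == "\u2028");  // LS
    assert(terminatorOf("a") == "");              // no terminator
}
```

Once the terminator has been stripped by the reading function, none of this can be reconstructed, which is Daniel's point.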
Walter Bright wrote:
> Daniel Keep wrote:
>> Because if it stripped it, there's no way to know what it was. If you want to do per-line processing but don't want to clobber the line endings, readln has to return the line terminator.

Who wants to receive a line with varying line endings anyway???

> That's right; there are currently at least 6 different line terminators: CR LF CRLF FF PS LS

So the programmer who wants to write portable code has to implement awareness for all of these cases, in each of his programs? This seems a bit laborious. Replacing stuff at the end of the string forces him to check, for *each* line, the length of the terminator, and then use ...$-1 and at other times ...$-2, etc. in his code.

In 25 years of computing, I have yet to see a file where variation of line terminators in the file contained some /deliberate/ information.

And the only purpose for keeping the line endings would be to edit files while preserving the particular line terminator for each line. Which raises the question, how do you decide which terminator to use if you've inserted a line? So the whole point is absurd. A reasonable default behavior for a file mongering program would be to output line terminators according to the operating system default. The case where one *wants* to preserve them should be considered the exception.

I'm simply asking for the default to be to strip the terminator, thus relieving the programmer from, imho, gratuitous labor. You can still preserve the current functionality as an option.
Apr 14 2009
Reply to Georg,
> So the whole point is absurd. A reasonable default behavior for a file mongering program would be to output line terminators according to the operating system default. The case where one *wants* to preserve them, should be considered the exception.

Only if you consider wanting to maintain merge-ability/diff-ability the exception. Some, if not most, source control/diff/merge tools consider changes in line endings as changes.
Apr 14 2009
BCS wrote:
> Reply to Georg,
>> So the whole point is absurd. A reasonable default behavior for a file mongering program would be to output line terminators according to the operating system default. The case where one *wants* to preserve them, should be considered the exception.
> Only if you consider wanting to maintain merge-ability/diff-ability the exception. Some, if not most, source control/diff/merge tools consider changes in line endings as changes.

Doesn't this kind of prove my point? Changing a line ending /should not/ be a "difference". Not by default. They should have a switch to explicitly turn it on. A good diff is complex enough that it should not stumble on form when it is supposed to examine content.
Apr 14 2009
Reply to Georg,
> BCS wrote:
>> Only if you consider wanting to maintain merge-ability/diff-ability the exception. Some, if not most, source control/diff/merge tools consider changes in line endings as changes.
> Doesn't this kind of prove my point? Changing a line ending /should not/ be a "difference". Not by default. They should have a switch to explicitly turn it on. A good diff is complex enough that it should not stumble on form when it is supposed to examine content.

I make no assertions about what should be, only what is.
Apr 14 2009
Georg Wrede:
> This seems a bit laborious. Replacing stuff at the end of the string forces him to check, for *each* line, the length of the terminator, and then use ...$-1 and at other times ...$-2, etc. in his code.

You use a string function or string method that removes the trailing newline, if present, of whatever kind. There is one in std.string too. Its main problem (besides working with char[] only in D1) is that its name is too similar to that of another string function. I complained about this some time ago.

Regarding the newline at the end of lines, in Python:

    for line in file("somefilename.txt"):
        print line

line contains the ending newline too.

Bye,
bearophile
Apr 14 2009
bearophile wrote:
> Georg Wrede:
>> This seems a bit laborious. Replacing stuff at the end of the string forces him to check, for *each* line, the length of the terminator, and then use ...$-1 and at other times ...$-2, etc. in his code.
> You use a string function or string method that removes the trailing newline, if present, of whatever kind. There is one in std.string too. Its main problem (besides working with char[] only in D1) is that its name is too similar to that of another string function. I complained about this some time ago.
>
> Regarding the newline at the end of lines, in Python:
>
>     for line in file("somefilename.txt"):
>         print line
>
> line contains the ending newline too.

Your code ends up printing the output on every other line.
Apr 14 2009
Georg Wrede wrote:
> In 25 years of computing, I have yet to see a file where variation of line terminators in the file contained some /deliberate/ information.

25 years and no networking code?

Andrei
Apr 14 2009
On Tue, 14 Apr 2009 13:19:49 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> Georg Wrede wrote:
>> In 25 years of computing, I have yet to see a file where variation of line terminators in the file contained some /deliberate/ information.
> 25 years and no networking code?

Been writing code for about 12 years, lots and lots of networking code. Still have never seen this. Don't see your point either.

-Steve
Apr 14 2009
Steven Schveighoffer wrote:
> On Tue, 14 Apr 2009 13:19:49 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>> Georg Wrede wrote:
>>> In 25 years of computing, I have yet to see a file where variation of line terminators in the file contained some /deliberate/ information.
>> 25 years and no networking code?
> Been writing code for about 12 years, lots and lots of networking code. Still have never seen this. Don't see your point either.

With HTTP, for example, lines are terminated with \r\n. The lines themselves (in the header, at least) have constraints on the character range they allow, so one might want to error on solo \n but break on a \r\n, etc. Still, I don't know why anyone would use readln() for processing a network protocol, so perhaps the issue is moot.
Apr 14 2009
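The HTTP case Sean describes, breaking on \r\n but erroring on a solo \n, could look roughly like the sketch below. It is only an illustration built on Phobos `splitter`, `canFind` and `enforce`; the function name `splitCrlfStrict` and the sample header text are invented, and real HTTP parsing involves much more than this.

```d
import std.algorithm.iteration : splitter;
import std.algorithm.searching : canFind;
import std.array : array;
import std.exception : assertThrown, enforce;

// Hypothetical helper: split a header block on CRLF only and
// reject any line that still contains a stray CR or LF.
string[] splitCrlfStrict(string header)
{
    auto lines = header.splitter("\r\n").array;
    foreach (line; lines)
        enforce(!line.canFind('\r') && !line.canFind('\n'),
                "bare CR or LF inside a header line");
    return lines;
}

void main()
{
    assert(splitCrlfStrict("Host: example.org\r\nAccept: */*")
           == ["Host: example.org", "Accept: */*"]);
    // A lone LF is treated as an error rather than as a line break.
    assertThrown(splitCrlfStrict("Host: example.org\nAccept: */*"));
}
```

A reader that silently normalizes every terminator would make this distinction impossible to enforce.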
Andrei Alexandrescu wrote:
> Georg Wrede wrote:
>> In 25 years of computing, I have yet to see a file where variation of line terminators in the file contained some /deliberate/ information.
> 25 years and no networking code?

I can see having to use one or another line ending in the whole output file, but not a situation where some lines but not others need this or that kind of line ending.
Apr 14 2009
"Georg Wrede" <georg.wrede iki.fi> wrote in message news:gs2o15$233h$2 digitalmars.com...
> I can see having to use one or another line ending in the whole output file, but not a situation where some lines but not others need this or that kind of line ending.

Source code with unescaped nl's/cr's embedded in a string literal? Though I admit that may not be a particularly compelling case for at least a couple of different reasons. (I do agree with your original point though.)
Apr 14 2009
Nick Sabalausky wrote:
> "Georg Wrede" <georg.wrede iki.fi> wrote in message news:gs2o15$233h$2 digitalmars.com...
>> I can see having to use one or another line ending in the whole output file, but not a situation where some lines but not others need this or that kind of line ending.
> Source code with unescaped nl's/cr's embedded in a string literal? Though I admit that may not be a particularly compelling case for at least a couple of different reasons. (I do agree with your original point though.)

I think there are a few concerns when designing an API for reading separated lines.

1. Reasonably complex separators should be allowed, e.g. regexes. For streams that have lookahead = 1, only regexes without backtracking (i.e., classic regular expressions) can be allowed.

2. Alternate separators should be allowed, and information should be passed as to which one, if any, matched:

    readln(stream, '\n', '\r', "Brought to you by Carl's Jr.\n");

You should be able to somehow extract which one of these matched, or whether the stream ended without having seen any. The match process is similar to regexes, but the information returned would be difficult to extract from a regex match.

3. Given (1) and (2), the process of eliminating the matched separator can become rather involved. So there should be an option to just eliminate the separator.

4. However, the separator should be made available to the caller. That makes for programs that preserve the separator, whatever it was.

I plan to implement a little API around these considerations, but haven't gotten around to it. Particularly the regex thing is rather thorny because std.regex does not distinguish classic regular expressions from those needing backtracking, and does not have an implementation that works with limited-lookahead streams. I suspect that that would be a major effort.

Right now readln preserves the separator. The newer File.byLine eliminates it by default and offers to keep it by calling File.byLine(KeepTerminator.yes). The allowed terminators are one character or a string. See

http://erdani.dreamhosters.com/d/web/phobos/std_stdio.html#byLine

I consider such an API adequate but insufficient; we need to add to it.

Andrei
Apr 14 2009
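For readers who want to try the byLine behavior Andrei describes, here is a small sketch written against current Phobos, where `std.stdio.File.byLine` takes a `KeepTerminator` flag. The helper `readLines` and the scratch file name are made up for the example, and the API as it stood in 2009 may have differed in detail.

```d
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File, KeepTerminator;

// Hypothetical helper: collect the lines of `path`, with or
// without their terminators depending on `keep`.
string[] readLines(string path, KeepTerminator keep)
{
    string[] lines;
    foreach (line; File(path, "r").byLine(keep))
        lines ~= line.idup; // byLine reuses its buffer, so copy each line
    return lines;
}

void main()
{
    // Scratch file name chosen only for this demonstration.
    auto path = buildPath(tempDir(), "byline_demo.txt");
    write(path, "one\ntwo\n");
    scope (exit) remove(path);

    // Default behavior: terminators are stripped.
    assert(readLines(path, KeepTerminator.no) == ["one", "two"]);
    // Opt-in behavior: terminators are preserved.
    assert(readLines(path, KeepTerminator.yes) == ["one\n", "two\n"]);
}
```

This covers points 3 and 4 of the list only for a single fixed terminator; the regex and alternate-separator cases in points 1 and 2 are not addressed by it.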
Andrei Alexandrescu wrote:
> ...
> Right now readln preserves the separator. The newer File.byLine eliminates it by default and offers to keep it by calling File.byLine(KeepTerminator.yes). The allowed terminators are one character or a string. See http://erdani.dreamhosters.com/d/web/phobos/std_stdio.html#byLine
>
> I consider such an API adequate but insufficient; we need to add to it.
>
> Andrei

Why not:

    char[] line, sep;
    line = File.byLine();    // discard sep
    line = File.byLine(sep); // pass sep out

The separator is likely to be more useful once extracted.

-- Daniel
Apr 14 2009
Daniel Keep wrote:
> Andrei Alexandrescu wrote:
>> ...
>> Right now readln preserves the separator. The newer File.byLine eliminates it by default and offers to keep it by calling File.byLine(KeepTerminator.yes). The allowed terminators are one character or a string. See http://erdani.dreamhosters.com/d/web/phobos/std_stdio.html#byLine
>>
>> I consider such an API adequate but insufficient; we need to add to it.
>>
>> Andrei
> Why not:
>
>     char[] line, sep;
>     line = File.byLine();    // discard sep
>     line = File.byLine(sep); // pass sep out
>
> The separator is likely to be more useful once extracted.

And how about when sep is elaborate (e.g. regex)?

Andrei
Apr 14 2009
Andrei Alexandrescu wrote:
> Daniel Keep wrote:
>> Andrei Alexandrescu wrote:
>>> ...
>>> Right now readln preserves the separator. The newer File.byLine eliminates it by default and offers to keep it by calling File.byLine(KeepTerminator.yes). The allowed terminators are one character or a string. See http://erdani.dreamhosters.com/d/web/phobos/std_stdio.html#byLine
>>>
>>> I consider such an API adequate but insufficient; we need to add to it.
>>>
>>> Andrei
>> Why not:
>>
>>     char[] line, sep;
>>     line = File.byLine();    // discard sep
>>     line = File.byLine(sep); // pass sep out
>>
>> The separator is likely to be more useful once extracted.
> And how about when sep is elaborate (e.g. regex)?
>
> Andrei

Whatever was matched. If we have a file containing:

    "A.B,C"

And we split lines using /[.,]/, then this:

    char[] line, sep;
    line = File.byLine(sep);
    while( line != "" )
    {
        writefln(`line = "%s", sep = "%s"`, line, sep);
        line = File.byLine(sep);
    }

Would output this:

    line = "A", sep = "."
    line = "B", sep = ","
    line = "C", sep = ""

-- Daniel
Apr 14 2009
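Daniel's "A.B,C" example can be approximated on an in-memory string with current `std.regex`. This is only a sketch of the idea of returning the matched separator alongside each piece; `splitWithSeparators` and the `Piece` tuple are invented names, and nothing here reflects an actual byLine signature.

```d
import std.regex : matchAll, regex;
import std.typecons : Tuple;

// Each piece of text together with the separator that ended it.
alias Piece = Tuple!(string, "line", string, "sep");

// Hypothetical helper: split `input` on every match of `pattern`
// and report which separator text was matched each time.
Piece[] splitWithSeparators(string input, string pattern)
{
    Piece[] pieces;
    size_t start = 0;
    foreach (m; input.matchAll(regex(pattern)))
    {
        immutable pos = m.pre.length; // offset where this separator begins
        pieces ~= Piece(input[start .. pos], m.hit);
        start = pos + m.hit.length;
    }
    if (start < input.length)
        pieces ~= Piece(input[start .. $], ""); // last piece, no separator seen
    return pieces;
}

void main()
{
    auto pieces = splitWithSeparators("A.B,C", "[.,]");
    assert(pieces == [Piece("A", "."), Piece("B", ","), Piece("C", "")]);
}
```

Working on a whole string sidesteps the limited-lookahead stream problem Andrei raised, so this says nothing about how hard the streaming version would be.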
Daniel Keep wrote:
> Andrei Alexandrescu wrote:
>> Daniel Keep wrote:
>>> Andrei Alexandrescu wrote:
>>>> ...
>>>> Right now readln preserves the separator. The newer File.byLine eliminates it by default and offers to keep it by calling File.byLine(KeepTerminator.yes). The allowed terminators are one character or a string. See http://erdani.dreamhosters.com/d/web/phobos/std_stdio.html#byLine
>>>>
>>>> I consider such an API adequate but insufficient; we need to add to it.
>>>>
>>>> Andrei
>>> Why not:
>>>
>>>     char[] line, sep;
>>>     line = File.byLine();    // discard sep
>>>     line = File.byLine(sep); // pass sep out
>>>
>>> The separator is likely to be more useful once extracted.
>> And how about when sep is elaborate (e.g. regex)?
>>
>> Andrei
> Whatever was matched. If we have a file containing:
>
>     "A.B,C"
>
> And we split lines using /[.,]/, then this:
>
>     char[] line, sep;
>     line = File.byLine(sep);
>     while( line != "" )
>     {
>         writefln(`line = "%s", sep = "%s"`, line, sep);
>         line = File.byLine(sep);
>     }
>
> Would output this:
>
>     line = "A", sep = "."
>     line = "B", sep = ","
>     line = "C", sep = ""
>
> -- Daniel

Where did you specify the separator in the call to byLine?

Andrei
Apr 14 2009
On Wed, 15 Apr 2009 00:21:48 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> Daniel Keep wrote:
>> Andrei Alexandrescu wrote:
>>> Daniel Keep wrote:
>>>> Andrei Alexandrescu wrote:
>>>>> ...
>>>>> Right now readln preserves the separator. The newer File.byLine eliminates it by default and offers to keep it by calling File.byLine(KeepTerminator.yes). The allowed terminators are one character or a string. See http://erdani.dreamhosters.com/d/web/phobos/std_stdio.html#byLine
>>>>>
>>>>> I consider such an API adequate but insufficient; we need to add to it.
>>>>>
>>>>> Andrei
>>>> Why not:
>>>>
>>>>     char[] line, sep;
>>>>     line = File.byLine();    // discard sep
>>>>     line = File.byLine(sep); // pass sep out
>>>>
>>>> The separator is likely to be more useful once extracted.
>>> And how about when sep is elaborate (e.g. regex)?
>>>
>>> Andrei
>> Whatever was matched. If we have a file containing:
>>
>>     "A.B,C"
>>
>> And we split lines using /[.,]/, then this:
>>
>>     char[] line, sep;
>>     line = File.byLine(sep);
>>     while( line != "" )
>>     {
>>         writefln(`line = "%s", sep = "%s"`, line, sep);
>>         line = File.byLine(sep);
>>     }
>>
>> Would output this:
>>
>>     line = "A", sep = "."
>>     line = "B", sep = ","
>>     line = "C", sep = ""
>>
>> -- Daniel
> Where did you specify the separator in the call to byLine?

I think he's not read the docs. Consider this usage instead:

    auto reader = file.byLine!("/[.,]/")();

    // normal usage, doesn't return separators
    foreach(line; reader)
    {
        ...
    }

    // alternate usage, returns separators as well
    while(!reader.empty)
    {
        char[] sep;
        char[] line = reader.front(sep); // can't remember if this is what you decided on.
        ...
        reader.popFront(); // ditto
    }

    // Note that if foreach on ranges was extended to allow multiple
    // parameters per pass, you could do:
    foreach(sep, line; reader)
    {
        ...
    }

-Steve
Apr 14 2009
Steven Schveighoffer wrote:
> auto reader = file.byLine!("/[.,]/")();

Why specify anything at compile time when a user could reasonably generate the value at runtime?

    auto reader = file.byLine(readConfig().separator);
Apr 15 2009
Christopher Wright wrote:
> Steven Schveighoffer wrote:
>> auto reader = file.byLine!("/[.,]/")();
> Why specify anything at compile time when a user could reasonably generate the value at runtime?
>
>     auto reader = file.byLine(readConfig().separator);

Yes, and for maximum abstraction, the config file should be stored as XML in a TEXT field of a database on another server.
Apr 15 2009
Robert Fraser wrote:
> Christopher Wright wrote:
>> Steven Schveighoffer wrote:
>>> auto reader = file.byLine!("/[.,]/")();
>> Why specify anything at compile time when a user could reasonably generate the value at runtime?
>>
>>     auto reader = file.byLine(readConfig().separator);
> Yes, and for maximum abstraction, the config file should be stored as XML in a TEXT field of a database on another server.

I just really hate to see templates when a regular function would suffice and be so close to the same efficiency as makes no difference for most reasonable situations. If there's a significant performance increase, I want to see both options.
Apr 15 2009
On Wed, 15 Apr 2009 22:54:50 -0400, Christopher Wright <dhasenan gmail.com> wrote:
> Robert Fraser wrote:
>> Christopher Wright wrote:
>>> Steven Schveighoffer wrote:
>>>> auto reader = file.byLine!("/[.,]/")();
>>> Why specify anything at compile time when a user could reasonably generate the value at runtime?
>>>
>>>     auto reader = file.byLine(readConfig().separator);
>> Yes, and for maximum abstraction, the config file should be stored as XML in a TEXT field of a database on another server.
> I just really hate to see templates when a regular function would suffice and be so close to the same efficiency as makes no difference for most reasonable situations. If there's a significant performance increase, I want to see both options.

It's just a demonstration of what the OP was talking about but wasn't explaining properly. I have no intention of writing or supporting this code. I think it's fine if Andrei decides to write this code and uses a function parameter instead of a template parameter. That I used a template parameter instead of a function parameter is not a hidden suggestion.

-Steve
Apr 15 2009
Andrei Alexandrescu wrote:
> I plan to implement a little API around these considerations, but haven't gotten around to it. Particularly the regex thing is rather thorny because std.regex does not distinguish classic regular expressions from those needing backtracking, and does not have an implementation that works with limited-lookahead streams. I suspect that that would be a major effort.
>
> Right now readln preserves the separator. The newer File.byLine eliminates it by default and offers to keep it by calling File.byLine(KeepTerminator.yes). The allowed terminators are one character or a string. See http://erdani.dreamhosters.com/d/web/phobos/std_stdio.html#byLine

Excellent!!

> I consider such an API adequate but insufficient; we need to add to it.
Apr 15 2009
Daniel Keep wrote:
> Georg Wrede wrote:
>> Readln returns a string which contains the line terminator.
<snip>
> Because if it stripped it, there's no way to know what it was. If you want to do per-line processing but don't want to clobber the line endings, readln has to return the line terminator.

But readln only stops on '\n' (or whatever character you tell it to otherwise), so will miss Mac "\r" endings altogether. As such, it's useless for this purpose.

The big question, however, is why std.stream.InputStream doesn't have readln. It has readLine, which has different semantics - it understands all three line break styles and strips them. This is absurd since you're more likely to care about what line ending is used when reading in a text file than when reading from stdin.

Take these four cases:

(a) you want to process only files with a specific line ending style
(b) you want to know what line endings are used
(c) you don't care about what line endings are used, but still want to know whether or not the file ends with one
(d) you just want to read the file line by line, without caring about the line endings or the presence or absence of one at the end

At the moment, readln is good only for (a). readLine is good only for (d). If you want (b) or (c), you'll have to come up with an alternative means.

Stewart.
Apr 14 2009
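Cases (b) and (c) from Stewart's list can be sketched against an in-memory string using current Phobos, assuming `std.string.lineSplitter` with `KeepTerminator.yes` and `chomp`. The names `EndingReport` and `inspectEndings` are hypothetical; this is one possible "alternative means", not something the thread's participants proposed.

```d
import std.string : chomp, KeepTerminator, lineSplitter;

// Hypothetical summary of the line endings found in some text.
struct EndingReport
{
    size_t[string] counts;   // case (b): terminator -> number of lines using it
    bool endsWithTerminator; // case (c): does the last line have a terminator?
}

EndingReport inspectEndings(string text)
{
    EndingReport report;
    foreach (line; text.lineSplitter!(KeepTerminator.yes)())
    {
        // Whatever chomp would remove is this line's terminator.
        auto term = line[line.chomp.length .. $];
        report.endsWithTerminator = term.length != 0;
        if (term.length)
            report.counts[term]++;
    }
    return report;
}

void main()
{
    auto mixed = inspectEndings("a\r\nb\nc");
    assert(mixed.counts["\r\n"] == 1);
    assert(mixed.counts["\n"] == 1);
    assert(!mixed.endsWithTerminator);

    assert(inspectEndings("a\nb\n").endsWithTerminator);
}
```

Both answers depend on the reader handing back the terminator; with a strip-only function they cannot be computed.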
Georg Wrede wrote:
> Readln returns a string which contains the line terminator. Is there a grand reason for this? Currently there are a few drawbacks with this. The naive user doesn't expect it, and the seasoned user has to keep stripping it. And then he has to search the docs (or get hold of other OSs) to determine what terminator to expect on other systems. And it can't really be a speed optimization either, because to do anything useful with a string, you have to strip the terminator anyway at some point.

By default, Tango does not exhibit this behavior. If you wish, you can include newlines:

    auto str = Cin.copyln;        // no newline in str
    auto str2 = Cin.copyln(true); // has system-dependent newline
Apr 14 2009
Christopher Wright wrote:
> Georg Wrede wrote:
>> Readln returns a string which contains the line terminator. Is there a grand reason for this? Currently there are a few drawbacks with this. The naive user doesn't expect it, and the seasoned user has to keep stripping it. And then he has to search the docs (or get hold of other OSs) to determine what terminator to expect on other systems. And it can't really be a speed optimization either, because to do anything useful with a string, you have to strip the terminator anyway at some point.
> By default, Tango does not exhibit this behavior. If you wish, you can include newlines:
>
>     auto str = Cin.copyln;        // no newline in str
>     auto str2 = Cin.copyln(true); // has system-dependent newline

Now this is more like it. The default should really be (in Phobos too) to not return the newline.

(Hint to Walter: Tango is for users, by users, and if they have no newline as the default, it should be considered a serious hint as to what the programmer prefers.)

If one is really interested in doing some file manipulation which might *preserve* varying line terminators in files that might have been edited in both linux and dos, then he should use "the non-default" line reading, like the Cin.copyln(true) above. Not that I'd see the point.

I'm certain that in the overwhelming majority of cases where one reads lines (_especially_ from the console, but from text files, too), one just wants the contents of the string.
Apr 14 2009
Georg Wrede wrote:
> because to do anything useful with a string, you have to strip the terminator

This is false in the case of simple copying. And I doubt that, for more complex operations, splitting `readln' into `readlnBody' and `readlnEOL' and calling them alternately would be of any benefit.

-manfred
Apr 14 2009
On Tue, 14 Apr 2009 18:01:52 +0400, Manfred Nowak <svv1999 hotmail.com> wrote:
> Georg Wrede wrote:
>> because to do anything useful with a string, you have to strip the terminator
> This is false in the case of simple copying. And I doubt that, for more complex operations, splitting `readln' into `readlnBody' and `readlnEOL' and calling them alternately would be of any benefit.
>
> -manfred

Tango does it best by having an optional parameter that denotes whether a line ending needs to be retained.
Apr 14 2009
Manfred Nowak wrote:
> Georg Wrede wrote:
>> because to do anything useful with a string, you have to strip the terminator
> This is false in the case of simple copying. And I doubt that, for more complex operations, splitting `readln' into `readlnBody' and `readlnEOL' and calling them alternately would be of any benefit.

For copying there is the operating system command, copy.

Additionally, simple copy is hardly the most used thing when readln is invoked.

So, either, there should be two functions, one of which preserves the terminator, or (like in Tango) there should be a parameter to turn them on.
Apr 14 2009
Georg Wrede wrote:
> So, either, there should be [...]

Agreed.

-manfred
Apr 14 2009
Stewart Gordon Wrote:
> Take these four cases:
>
> (a) you want to process only files with a specific line ending style
> (b) you want to know what line endings are used
> (c) you don't care about what line endings are used, but still want to know whether or not the file ends with one
> (d) you just want to read the file line by line, without caring about the line endings or the presence or absence of one at the end
>
> At the moment, readln is good only for (a). readLine is good only for (d). If you want (b) or (c), you'll have to come up with an alternative means.

I think only (d) is important; all the others are *strange* things. I usually use ReadLine in conjunction with WriteLine.
Apr 15 2009
Kagamin wrote:
> Stewart Gordon Wrote:
>> Take these four cases:
>>
>> (a) you want to process only files with a specific line ending style
>> (b) you want to know what line endings are used
>> (c) you don't care about what line endings are used, but still want to know whether or not the file ends with one
>> (d) you just want to read the file line by line, without caring about the line endings or the presence or absence of one at the end
>>
>> At the moment, readln is good only for (a). readLine is good only for (d). If you want (b) or (c), you'll have to come up with an alternative means.
> I think only (d) is important; all the others are *strange* things. I usually use ReadLine in conjunction with WriteLine.

So you expect text editors to discard both kinds of information?

I expect any text editor (don't get me started on Notepad) to do (c), and any decent text editor to be capable of (b).

Stewart.
Apr 15 2009
Stewart Gordon Wrote:
> So you expect text editors to discard both kinds of information?

No. A text editor is a *specialized* text processing tool and it usually uses specialized text processing algorithms. Otherwise it is *quite* decent.
Apr 16 2009