www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - one step towards unification of std.algorithm and std.string

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
I've wanted for a long time to move more stuff from std.string into 
std.algorithm. One issue that has kept me from doing that is the case 
issue, i.e. some string functions have mind/ignore case flavors that 
don't make sense for other data types. For example consider cmp() and 
icmp(): http://www.digitalmars.com/d/2.0/phobos/std_string.html#cmp or 
the recently changed indexOf: 
http://www.digitalmars.com/d/2.0/phobos/std_string.html#indexOf

It occurred to me that I can transfer the case sensitivity away from the 
algorithm into the data. To do so, we only need to define one more data 
type NoCase that behaves much like a dchar but defines opEquals to 
compare ignoring case. Then, we need to define a NoCase range that 
behaves like a bidirectional range of dchar but again uses 
case-insensitive comparisons. Add some garnishing and you get to write:

string a = "Hello, World!"
auto x = indexOf(a, nocase("world"));
assert(x == 7);

I'm quite excited about this because it modularizes the entire case 
business, opens strings to many algorithms, and allows generalization of 
string algorithms.

Well, that is until I hit http://d.puremagic.com/issues/show_bug.cgi?id=3659

Any thoughts and ideas would be appreciated.


Andrei
Dec 30 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 I'm quite excited about this because it modularizes the entire case 
 business, opens strings to many algorithms, and allows generalization of 
 string algorithms.
It's nice. In my Python code the caseless operations aren't that common. How much common are those in your code? Is the nocase leading to a lower performance for some algorithms (like KMP search)?
 Well, that is until I hit http://d.puremagic.com/issues/show_bug.cgi?id=3659
I don't know C++ much, and I have to confess that I have to fully understand the const business still. I hope your book will teach me this topic very well :-) Bye, bearophile
Dec 30 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 I'm quite excited about this because it modularizes the entire case 
 business, opens strings to many algorithms, and allows generalization of 
 string algorithms.
It's nice. In my Python code the caseless operations aren't that common. How much common are those in your code?
So and so (e.g. quite some in NLP and a lot in HTML parsing because it has case-insensitive tags), but regardless, we can't pretend the need doesn't exist.
 Is the nocase leading to a lower performance for some algorithms (like KMP
search)?
There are a couple more comparisons per item comparison, so performance will be degraded by a constant factor.
 Well, that is until I hit http://d.puremagic.com/issues/show_bug.cgi?id=3659
I don't know C++ much, and I have to confess that I have to fully understand the const business still. I hope your book will teach me this topic very well :-)
One thing about const that is slowly downing on this community is that it will _not_ be used as often as in C++. It will be rare, and the compiler and standard library should not require it without very good reason. I think opEquals for classes is at fault for requiring const. Andrei
Dec 30 2009
next sibling parent reply "Steven E. Harris" <seh panix.com> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

 I think opEquals for classes is at fault for requiring const.
Something seems different from C++'s const here. One can always call a const member function on a class instance in C++, regardless of whether the instance is referred to through a const or non-const reference. Is this bug saying that you can't call a const member function through a non-const reference to an instance? Or maybe it's complaining that your opEquals() declaration isn't const? If it's declared non-const, can one then not call it through a const reference to an instance? That would be bad. -- Steven E. Harris
Dec 30 2009
next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-12-30 17:44:16 -0500, "Steven E. Harris" <seh panix.com> said:

 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
 
 I think opEquals for classes is at fault for requiring const.
Something seems different from C++'s const here. One can always call a const member function on a class instance in C++, regardless of whether the instance is referred to through a const or non-const reference. Is this bug saying that you can't call a const member function through a non-const reference to an instance? Or maybe it's complaining that your opEquals() declaration isn't const? If it's declared non-const, can one then not call it through a const reference to an instance? That would be bad.
The thing is that const is transitive in D. That and you can't make a variable mutable in a const object; you can in C++ with the mutable keyword. So you want to use const only when you know you won't change anything through that reference. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Dec 30 2009
prev sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
On Wed, 30 Dec 2009 23:44:16 +0100, Steven E. Harris <seh panix.com> wrote:

 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

 I think opEquals for classes is at fault for requiring const.
Something seems different from C++'s const here. One can always call a const member function on a class instance in C++, regardless of whether the instance is referred to through a const or non-const reference. Is this bug saying that you can't call a const member function through a non-const reference to an instance? Or maybe it's complaining that your opEquals() declaration isn't const? If it's declared non-const, can one then not call it through a const reference to an instance? That would be bad.
Anything is implicitly castable to const in D, so it's the latter. This is correct D: struct foo { bool opEquals( const ref foo rhs ) const { return true; } } As long as that function is defined though, one may add as many other opEquals signatures as one fancies. -- Simen
Dec 31 2009
prev sibling parent reply grauzone <none example.net> writes:
Andrei Alexandrescu wrote:
 bearophile wrote:
 I don't know C++ much, and I have to confess that I have to fully 
 understand the const business still. I hope your book will teach me 
 this topic very well :-)
One thing about const that is slowly downing on this community is that it will _not_ be used as often as in C++. It will be rare, and the compiler and standard library should not require it without very good reason. I think opEquals for classes is at fault for requiring const.
Interesting statement. Does this apply to immutable as well, or only const? Because I thought const/immutable was supposed to make program logic clearer etc... That implied it would be heavily used in "normal" code. That's all a bit vague to me, care to clarify this a bit?
 
 Andrei
Dec 31 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
grauzone wrote:
 Andrei Alexandrescu wrote:
 bearophile wrote:
 I don't know C++ much, and I have to confess that I have to fully 
 understand the const business still. I hope your book will teach me 
 this topic very well :-)
One thing about const that is slowly downing on this community is that it will _not_ be used as often as in C++. It will be rare, and the compiler and standard library should not require it without very good reason. I think opEquals for classes is at fault for requiring const.
Interesting statement. Does this apply to immutable as well, or only const? Because I thought const/immutable was supposed to make program logic clearer etc... That implied it would be heavily used in "normal" code. That's all a bit vague to me, care to clarify this a bit?
The statement doesn't apply to immutable because C++ doesn't have it. Andrei
Dec 31 2009
parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 31 Dec 2009 16:37:57 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 grauzone wrote:
 Andrei Alexandrescu wrote:
 bearophile wrote:
 I don't know C++ much, and I have to confess that I have to fully  
 understand the const business still. I hope your book will teach me  
 this topic very well :-)
One thing about const that is slowly downing on this community is that it will _not_ be used as often as in C++. It will be rare, and the compiler and standard library should not require it without very good reason. I think opEquals for classes is at fault for requiring const.
Interesting statement. Does this apply to immutable as well, or only const? Because I thought const/immutable was supposed to make program logic clearer etc... That implied it would be heavily used in "normal" code. That's all a bit vague to me, care to clarify this a bit?
The statement doesn't apply to immutable because C++ doesn't have it. Andrei
You won't be able to call opEquals on immutable objects if opEquals would require mutable pointer.
Dec 31 2009
prev sibling next sibling parent reply "Simen kjaeraas" <simen.kjaras gmail.com> writes:
On Wed, 30 Dec 2009 19:45:02 +0100, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I've wanted for a long time to move more stuff from std.string into  
 std.algorithm. One issue that has kept me from doing that is the case  
 issue, i.e. some string functions have mind/ignore case flavors that  
 don't make sense for other data types. For example consider cmp() and  
 icmp(): http://www.digitalmars.com/d/2.0/phobos/std_string.html#cmp or  
 the recently changed indexOf:  
 http://www.digitalmars.com/d/2.0/phobos/std_string.html#indexOf

 It occurred to me that I can transfer the case sensitivity away from the  
 algorithm into the data. To do so, we only need to define one more data  
 type NoCase that behaves much like a dchar but defines opEquals to  
 compare ignoring case. Then, we need to define a NoCase range that  
 behaves like a bidirectional range of dchar but again uses  
 case-insensitive comparisons. Add some garnishing and you get to write:

 string a = "Hello, World!"
 auto x = indexOf(a, nocase("world"));
 assert(x == 7);

 I'm quite excited about this because it modularizes the entire case  
 business, opens strings to many algorithms, and allows generalization of  
 string algorithms.

 Well, that is until I hit  
 http://d.puremagic.com/issues/show_bug.cgi?id=3659

 Any thoughts and ideas would be appreciated.


 Andrei
Sound nice. I've also wanted to see the two combined. I take it the NoCase range is a lazy wrapper of a (d|w)?char range? Some testing shows that 3659 can be sidestepped by making opEquals a template function, or by creating more than one opEquals, where one matches const bool( const ref typeof( this ) ). -- Simen
Dec 30 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Simen kjaeraas wrote:
 On Wed, 30 Dec 2009 19:45:02 +0100, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 I've wanted for a long time to move more stuff from std.string into 
 std.algorithm. One issue that has kept me from doing that is the case 
 issue, i.e. some string functions have mind/ignore case flavors that 
 don't make sense for other data types. For example consider cmp() and 
 icmp(): http://www.digitalmars.com/d/2.0/phobos/std_string.html#cmp or 
 the recently changed indexOf: 
 http://www.digitalmars.com/d/2.0/phobos/std_string.html#indexOf

 It occurred to me that I can transfer the case sensitivity away from 
 the algorithm into the data. To do so, we only need to define one more 
 data type NoCase that behaves much like a dchar but defines opEquals 
 to compare ignoring case. Then, we need to define a NoCase range that 
 behaves like a bidirectional range of dchar but again uses 
 case-insensitive comparisons. Add some garnishing and you get to write:

 string a = "Hello, World!"
 auto x = indexOf(a, nocase("world"));
 assert(x == 7);

 I'm quite excited about this because it modularizes the entire case 
 business, opens strings to many algorithms, and allows generalization 
 of string algorithms.

 Well, that is until I hit 
 http://d.puremagic.com/issues/show_bug.cgi?id=3659

 Any thoughts and ideas would be appreciated.


 Andrei
Sound nice. I've also wanted to see the two combined. I take it the NoCase range is a lazy wrapper of a (d|w)?char range?
I think there will be a NoCase character type that can be compared to dchar, and a NoCase range type that wraps any other range that traffics in dchar.
 Some testing shows that 3659 can be sidestepped by making opEquals a
 template function, or by creating more than one opEquals, where one
 matches const bool( const ref typeof( this ) ).
Thanks a lot Simen, that's a lifesaver. Andrei
Dec 30 2009
prev sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-12-30 13:45:02 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 I'm quite excited about this because it modularizes the entire case
 business, opens strings to many algorithms, and allows generalization of
 string algorithms.
Looks like a great idea... but it'd be clearer if named caseinsensitive(str), or perhaps foldcase(str) for shorter. But whatever the name, this approach is great in that it also makes it easy to implement different collations.
 Well, that is until I hit http://d.puremagic.com/issues/show_bug.cgi?id=3659
 
 Any thoughts and ideas would be appreciated.
That's a bug. Get it fixed. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Dec 30 2009