digitalmars.D - one step towards unification of std.algorithm and std.string
- Andrei Alexandrescu (Dec 30 2009)
- bearophile (Dec 30 2009)
- Andrei Alexandrescu (Dec 30 2009)
- Steven E. Harris (Dec 30 2009)
- Michel Fortin (Dec 30 2009)
- Simen kjaeraas (Dec 31 2009)
- grauzone (Dec 31 2009)
- Andrei Alexandrescu (Dec 31 2009)
- Denis Koroskin (Dec 31 2009)
- Simen kjaeraas (Dec 30 2009)
- Andrei Alexandrescu (Dec 30 2009)
- Michel Fortin (Dec 30 2009)
I've wanted for a long time to move more stuff from std.string into std.algorithm. One issue that has kept me from doing that is the case issue, i.e. some string functions have mind/ignore case flavors that don't make sense for other data types. For example consider cmp() and icmp():

http://www.digitalmars.com/d/2.0/phobos/std_string.html#cmp

or the recently changed indexOf:

http://www.digitalmars.com/d/2.0/phobos/std_string.html#indexOf

It occurred to me that I can transfer the case sensitivity away from the algorithm into the data. To do so, we only need to define one more data type NoCase that behaves much like a dchar but defines opEquals to compare ignoring case. Then, we need to define a NoCase range that behaves like a bidirectional range of dchar but again uses case-insensitive comparisons. Add some garnishing and you get to write:

    string a = "Hello, World!";
    auto x = indexOf(a, nocase("world"));
    assert(x == 7);

I'm quite excited about this because it modularizes the entire case business, opens strings to many algorithms, and allows generalization of string algorithms.

Well, that is until I hit http://d.puremagic.com/issues/show_bug.cgi?id=3659

Any thoughts and ideas would be appreciated.

Andrei
Dec 30 2009
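For illustration, here is a minimal sketch of the idea in the post above: case handling lives in the element type, and the search itself only compares elements. Everything here is an assumption made for the example, not the design that was eventually proposed for Phobos: the `NoCase` struct, the eager `nocase` helper (the post describes a lazy range), and the hypothetical `indexOfElements` search. Unlike the post's snippet, both haystack and needle are wrapped, to keep the sketch short.

```d
import std.uni : toLower;

// Hypothetical type: one code point whose equality ignores case.
struct NoCase
{
    dchar c;

    // Two NoCase values are equal when their lowercase forms match.
    bool opEquals(const NoCase rhs) const
    {
        return toLower(c) == toLower(rhs.c);
    }
}

// Hypothetical helper: decode a string and wrap each code point.
// This is eager; the post describes a lazy range instead.
NoCase[] nocase(string s)
{
    NoCase[] result;
    foreach (dchar ch; s)
        result ~= NoCase(ch);
    return result;
}

// Generic naive search that knows nothing about case:
// it only applies != to pairs of elements.
ptrdiff_t indexOfElements(T)(const(T)[] haystack, const(T)[] needle)
{
    if (needle.length > haystack.length)
        return -1;
    foreach (start; 0 .. haystack.length - needle.length + 1)
    {
        bool matched = true;
        foreach (offset; 0 .. needle.length)
        {
            if (haystack[start + offset] != needle[offset])
            {
                matched = false;
                break;
            }
        }
        if (matched)
            return cast(ptrdiff_t) start;
    }
    return -1;
}

unittest
{
    // Case-insensitive search falls out of the element type.
    assert(indexOfElements(nocase("Hello, World!"), nocase("world")) == 7);
    assert(indexOfElements(nocase("Hello, World!"), nocase("planet")) == -1);
    // The same function works unchanged on non-string data.
    assert(indexOfElements([1, 2, 3, 4], [3, 4]) == 2);
}
```

The unittest block can be run with `dmd -unittest -main -run` followed by the file name. The point of the design is visible in the last assertion: the search function has no case flag at all, so it is equally usable for integers.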
Andrei Alexandrescu:
> I'm quite excited about this because it modularizes the entire case business, opens strings to many algorithms, and allows generalization of string algorithms.

It's nice. In my Python code the caseless operations aren't that common. How common are they in your code?
Does nocase lead to lower performance for some algorithms (like KMP search)?

> Well, that is until I hit http://d.puremagic.com/issues/show_bug.cgi?id=3659

I don't know C++ much, and I have to confess that I have yet to fully understand the const business. I hope your book will teach me this topic very well :-)

Bye,
bearophile
Dec 30 2009
bearophile wrote:
> Andrei Alexandrescu:
>> I'm quite excited about this because it modularizes the entire case business, opens strings to many algorithms, and allows generalization of string algorithms.
> It's nice. In my Python code the caseless operations aren't that common. How common are they in your code?

So and so (e.g. quite some in NLP and a lot in HTML parsing because it has case-insensitive tags), but regardless, we can't pretend the need doesn't exist.

> Does nocase lead to lower performance for some algorithms (like KMP search)?

There are a couple more comparisons per item comparison, so performance will be degraded by a constant factor.

>> Well, that is until I hit http://d.puremagic.com/issues/show_bug.cgi?id=3659
> I don't know C++ much, and I have to confess that I have yet to fully understand the const business. I hope your book will teach me this topic very well :-)

One thing about const that is slowly dawning on this community is that it will _not_ be used as often as in C++. It will be rare, and the compiler and standard library should not require it without very good reason. I think opEquals for classes is at fault for requiring const.

Andrei
Dec 30 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
> I think opEquals for classes is at fault for requiring const.

Something seems different from C++'s const here. One can always call a const member function on a class instance in C++, regardless of whether the instance is referred to through a const or non-const reference.

Is this bug saying that you can't call a const member function through a non-const reference to an instance? Or maybe it's complaining that your opEquals() declaration isn't const? If it's declared non-const, can one then not call it through a const reference to an instance? That would be bad.

--
Steven E. Harris
Dec 30 2009
On 2009-12-30 17:44:16 -0500, "Steven E. Harris" <seh panix.com> said:
> Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
>> I think opEquals for classes is at fault for requiring const.
> Something seems different from C++'s const here. One can always call a const member function on a class instance in C++, regardless of whether the instance is referred to through a const or non-const reference.
>
> Is this bug saying that you can't call a const member function through a non-const reference to an instance? Or maybe it's complaining that your opEquals() declaration isn't const? If it's declared non-const, can one then not call it through a const reference to an instance? That would be bad.

The thing is that const is transitive in D. On top of that, you can't make a member variable mutable inside a const object, as you can in C++ with the mutable keyword. So you want to use const only when you know you won't change anything through that reference.

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Dec 30 2009
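To make the transitivity point concrete, here is a small sketch; the `Node` type and the variable names are made up for the example. A const view of the first node makes everything reachable through it const as well, while the original mutable reference can still change the data.

```d
// Hypothetical linked node, used only to show how const propagates.
struct Node
{
    int value;
    Node* next;
}

unittest
{
    auto tail = new Node;
    tail.value = 2;
    auto head = new Node;
    head.value = 1;
    head.next = tail;

    // A mutable pointer converts implicitly to a pointer to const.
    const(Node)* view = head;

    // Reading through the const view is fine at any depth.
    assert(view.value == 1);
    assert(view.next.value == 2);

    // Transitivity: view.next is itself a pointer to const,
    // so writes are rejected at every level.
    static assert(!__traits(compiles, view.value = 10));
    static assert(!__traits(compiles, view.next.value = 20));

    // The original mutable reference can still modify the data,
    // and the change is visible through the const view.
    head.next.value = 20;
    assert(view.next.value == 20);
}
```

In C++ a const pointer to the first node would still allow writes through its `next` member, which is the difference the post is pointing at.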
On Wed, 30 Dec 2009 23:44:16 +0100, Steven E. Harris <seh panix.com> wrote:
> Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
>> I think opEquals for classes is at fault for requiring const.
> Something seems different from C++'s const here. One can always call a const member function on a class instance in C++, regardless of whether the instance is referred to through a const or non-const reference.
>
> Is this bug saying that you can't call a const member function through a non-const reference to an instance? Or maybe it's complaining that your opEquals() declaration isn't const? If it's declared non-const, can one then not call it through a const reference to an instance? That would be bad.

Anything is implicitly castable to const in D, so it's the latter. This is correct D:

    struct foo {
        bool opEquals( const ref foo rhs ) const {
            return true;
        }
    }

As long as that function is defined though, one may add as many other opEquals signatures as one fancies.

--
Simen
Dec 31 2009
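As a rough check of the two questions raised in this sub-thread, the sketch below uses made-up types `Foo` and `Bar`. It shows a const `opEquals` being called on mutable, const and immutable instances, and a non-const method being rejected on a const instance. It is written against a current D compiler and does not try to reproduce the behaviour reported in issue 3659.

```d
// Hypothetical struct with the const signature shown in the post.
struct Foo
{
    int x;

    // const method: usable on mutable, const and immutable instances.
    bool opEquals(const ref Foo rhs) const
    {
        return x == rhs.x;
    }
}

// Hypothetical struct with a non-const comparison method.
struct Bar
{
    int x;

    // Not marked const: requires a mutable `this`.
    bool sameAs(Bar rhs)
    {
        return x == rhs.x;
    }
}

unittest
{
    Foo a = Foo(1);
    const Foo b = Foo(1);
    immutable Foo c = Foo(1);

    // Every combination works because opEquals is const
    // and takes its argument by const reference.
    assert(a == b);
    assert(b == a);
    assert(c == a);
    assert(c == b);

    Bar m = Bar(1);
    Bar n = Bar(1);
    const Bar k = Bar(1);

    assert(m.sameAs(n));
    // A non-const method cannot be called on a const instance.
    static assert(!__traits(compiles, k.sameAs(m)));
}
```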
Andrei Alexandrescu wrote:
> bearophile wrote:
>> I don't know C++ much, and I have to confess that I have yet to fully understand the const business. I hope your book will teach me this topic very well :-)
> One thing about const that is slowly dawning on this community is that it will _not_ be used as often as in C++. It will be rare, and the compiler and standard library should not require it without very good reason. I think opEquals for classes is at fault for requiring const.
>
> Andrei

Interesting statement. Does this apply to immutable as well, or only const? Because I thought const/immutable was supposed to make program logic clearer etc... That implied it would be heavily used in "normal" code. That's all a bit vague to me, care to clarify this a bit?
Dec 31 2009
grauzone wrote:
> Andrei Alexandrescu wrote:
>> One thing about const that is slowly dawning on this community is that it will _not_ be used as often as in C++. It will be rare, and the compiler and standard library should not require it without very good reason. I think opEquals for classes is at fault for requiring const.
> Interesting statement. Does this apply to immutable as well, or only const? Because I thought const/immutable was supposed to make program logic clearer etc... That implied it would be heavily used in "normal" code. That's all a bit vague to me, care to clarify this a bit?

The statement doesn't apply to immutable because C++ doesn't have it.

Andrei
Dec 31 2009
On Thu, 31 Dec 2009 16:37:57 +0300, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> grauzone wrote:
>> Interesting statement. Does this apply to immutable as well, or only const? Because I thought const/immutable was supposed to make program logic clearer etc... That implied it would be heavily used in "normal" code. That's all a bit vague to me, care to clarify this a bit?
> The statement doesn't apply to immutable because C++ doesn't have it.
>
> Andrei

You won't be able to call opEquals on immutable objects if opEquals requires a mutable pointer.
Dec 31 2009
On Wed, 30 Dec 2009 19:45:02 +0100, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> I've wanted for a long time to move more stuff from std.string into std.algorithm. One issue that has kept me from doing that is the case issue, i.e. some string functions have mind/ignore case flavors that don't make sense for other data types. For example consider cmp() and icmp():
>
> http://www.digitalmars.com/d/2.0/phobos/std_string.html#cmp
>
> or the recently changed indexOf:
>
> http://www.digitalmars.com/d/2.0/phobos/std_string.html#indexOf
>
> It occurred to me that I can transfer the case sensitivity away from the algorithm into the data. To do so, we only need to define one more data type NoCase that behaves much like a dchar but defines opEquals to compare ignoring case. Then, we need to define a NoCase range that behaves like a bidirectional range of dchar but again uses case-insensitive comparisons. Add some garnishing and you get to write:
>
>     string a = "Hello, World!";
>     auto x = indexOf(a, nocase("world"));
>     assert(x == 7);
>
> I'm quite excited about this because it modularizes the entire case business, opens strings to many algorithms, and allows generalization of string algorithms.
>
> Well, that is until I hit http://d.puremagic.com/issues/show_bug.cgi?id=3659
>
> Any thoughts and ideas would be appreciated.
>
> Andrei

Sounds nice. I've also wanted to see the two combined.

I take it the NoCase range is a lazy wrapper of a (d|w)?char range?

Some testing shows that 3659 can be sidestepped by making opEquals a template function, or by creating more than one opEquals, where one matches const bool( const ref typeof( this ) ).

--
Simen
Dec 30 2009
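The first workaround mentioned above, a templated opEquals, might look roughly like the sketch below. The `NoCaseChar` name and the template constraint are assumptions for the example, and the code is written against a current D compiler, so it shows the shape of the workaround without confirming how a 2009 compiler treated it.

```d
import std.uni : toLower;

// Hypothetical case-insensitive character type with a templated opEquals.
struct NoCaseChar
{
    dchar c;

    // One template covers comparison with another NoCaseChar
    // and with any type that converts implicitly to dchar.
    bool opEquals(T)(const T rhs) const
        if (is(T == NoCaseChar) || is(T : dchar))
    {
        static if (is(T == NoCaseChar))
            return toLower(c) == toLower(rhs.c);
        else
            return toLower(c) == toLower(cast(dchar) rhs);
    }
}

unittest
{
    // Comparison between two wrapped characters.
    assert(NoCaseChar('w') == NoCaseChar('W'));

    // Comparison against a plain character.
    assert(NoCaseChar('w') == 'W');
    assert(NoCaseChar('w') != 'x');

    // Works through const instances on either side.
    const NoCaseChar k = NoCaseChar('A');
    NoCaseChar m = NoCaseChar('a');
    assert(k == m);
    assert(m == k);
}
```

The second workaround in the post, several opEquals overloads with one of them matching the const signature, would replace the template with ordinary overloaded functions.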
Simen kjaeraas wrote:
> Sounds nice. I've also wanted to see the two combined.
>
> I take it the NoCase range is a lazy wrapper of a (d|w)?char range?

I think there will be a NoCase character type that can be compared to dchar, and a NoCase range type that wraps any other range that traffics in dchar.

> Some testing shows that 3659 can be sidestepped by making opEquals a template function, or by creating more than one opEquals, where one matches const bool( const ref typeof( this ) ).

Thanks a lot Simen, that's a lifesaver.

Andrei
Dec 30 2009
On 2009-12-30 13:45:02 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:
> I'm quite excited about this because it modularizes the entire case business, opens strings to many algorithms, and allows generalization of string algorithms.

Looks like a great idea... but it'd be clearer if named caseinsensitive(str), or perhaps foldcase(str) for short. But whatever the name, this approach is great in that it also makes it easy to implement different collations.

> Well, that is until I hit http://d.puremagic.com/issues/show_bug.cgi?id=3659
>
> Any thoughts and ideas would be appreciated.

That's a bug. Get it fixed.

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Dec 30 2009