digitalmars.D - Shouldn't Phobos have a non-case sensitive find() and rfind()
- David L. Davis (91/91) Jun 03 2004 Shouldn't there be a non-case sensitive version of find (ifind) and rfin...
- Sean Kelly (4/8) Jun 03 2004 How about allowing the user to pass a comparison delegate? Case means d...
- David L. Davis (3/12) Jun 03 2004 Sean: I hate to ask you this, but could you explain this a bit more. I'm...
- Ivan Senji (17/34) Jun 03 2004 rfind
- Oskar Linde (19/33) Jun 03 2004 Since a D char[] is considered UTF-8 encoded, a delegate(char,char)
- Sean Kelly (13/18) Jun 03 2004 Some languages don't have upper and lowercase letters. And many others ...
- David L. Davis (21/42) Jun 03 2004 Sean: Correct me if I'm wrong, but looking at the Phobos std.string html...
- Arcane Jill (17/20) Jun 03 2004 The Unicode Standard defines the uppercasing, lowercasing and titlecasin...
- KTC (7/10) Jun 03 2004 Tabs doesn't show up on some newsreader like OE. I'm afraid you need to ...
Shouldn't there be a case-insensitive version of find (ifind) and rfind (irfind) in the Phobos std.string library? An additional position parameter would also make the functions even more useful, unless of course we already have this and I've somehow missed it looking through the "D" docs.

Below is some sample code I've tested, which shows its usefulness. I've indented the code this time with tabs, so hopefully it all appears readable in my posted message. <*crosses-fingers*> ;)

import std.string;

int main()
{
    char[] sStr = "ApO 123355 PO Box 23, Waterpool Street Portland, Texas";

    printf("Case Insensitive ifind and irfind tests\n");
    printf("Test String sStr=%.*s\n\n", sStr);

    printf("Default = 0, ifind \'PO\' in sStr, result=%d\n", ifind( sStr, "PO" ) );
    printf("StartPos= 2, ifind \'PO\' in sStr, result=%d\n", ifind( sStr, "PO", 2 ) );
    printf("StartPos=15, ifind \'PO\' in sStr, result=%d\n", ifind( sStr, "PO", 15 ) );
    printf("StartPos=33, ifind \'PO\' in sStr, result=%d\n\n", ifind( sStr, "PO", 33 ) );

    printf("Default = sStr.length - 1, irfind \'PO\' in sStr, result=%d\n", irfind( sStr, "PO" ) );
    printf("StartPos= 2, irfind \'PO\' in sStr, result=%d\n", irfind( sStr, "PO", 2 ) );
    printf("StartPos=15, irfind \'PO\' in sStr, result=%d\n", irfind( sStr, "PO", 15 ) );
    printf("StartPos=33, irfind \'PO\' in sStr, result=%d\n", irfind( sStr, "PO", 33 ) );

    return 0;
} // end-function int main( void )

// Case insensitive version of std.string.find
int ifind ( in char[] sStr, in char[] sSubStr )
{
    return ifind( sStr, sSubStr, 0 );
} // end-function int ifind( char[], char[] )

// Case insensitive version of std.string.find
// with an optional "String Start Position" parameter.
// Returns the index into sStr of the match, or -1 if not found.
int ifind ( in char[] sStr, in char[] sSubStr, in int iStartPos )
{
    char[] sTmpStr;
    int iRtnVal;

    // Out of Boundary return not found
    // (check for a negative position first, since sStr.length is unsigned)
    if ( iStartPos < 0 )
        return -1;
    if ( iStartPos > sStr.length - 1 )
        return -1;

    // Search a lowercased copy of the tail that starts at iStartPos
    sTmpStr = tolower( sStr[ iStartPos .. sStr.length ] );

    if ( iStartPos == 0 )
        return find( sTmpStr, tolower( sSubStr ) );
    else
    {
        iRtnVal = find( sTmpStr, tolower( sSubStr ) );

        // Translate the index within the tail back to an index in sStr
        if ( iRtnVal != -1 )
            return iStartPos + iRtnVal;
        else
            return -1;
        // end-if
    } // end-if
} // end-function int ifind( char[],char[], int )

// Case insensitive version of std.string.rfind
int irfind ( in char[] sStr, in char[] sSubStr )
{
    return irfind( sStr, sSubStr, sStr.length - 1 );
} // end-function int irfind( char[], char[] )

// Case insensitive version of std.string.rfind
// with an optional "String End Position" parameter
int irfind ( in char[] sStr, in char[] sSubStr, in int iEndPos )
{
    char[] sTmpStr;

    // If "Out of Boundary" return not found
    if ( iEndPos < 0 )
        return -1;
    if ( iEndPos > sStr.length - 1 )
        return -1;

    // Search a lowercased copy of the head that ends at iEndPos (inclusive)
    sTmpStr = tolower( sStr[ 0 .. iEndPos + 1 ] );

    return rfind( sTmpStr, tolower( sSubStr ) );
} // end-function int irfind( char[],char[], int )
Jun 03 2004
In article <c9ntu5$kch$1 digitaldaemon.com>, David L. Davis says...
>Shouldn't there be a non-case sensitive version of find (ifind) and rfind (irfind) in the Phobos std.string library? Also an additional position parameter would make the function(s) even more useful, unless of course we really have this, and I've missed it somehow looking thru the "D" docs.

How about allowing the user to pass a comparison delegate? Case means different things in different languages.

Sean
Jun 03 2004
In article <c9nuts$ls2$1 digitaldaemon.com>, Sean Kelly says...
>In article <c9ntu5$kch$1 digitaldaemon.com>, David L. Davis says...
>>Shouldn't there be a non-case sensitive version of find (ifind) and rfind (irfind) in the Phobos std.string library? Also an additional position parameter would make the function(s) even more useful, unless of course we really have this, and I've missed it somehow looking thru the "D" docs.
>
>How about allowing the user to pass a comparison delegate? Case means different things in different languages.
>
>Sean

Sean: I hate to ask you this, but could you explain this a bit more. I'm not sure I follow you. Thxs in advance. :)
Jun 03 2004
"David L. Davis" <SpottedTiger yahoo.com> wrote in message news:c9o1k8$pp2$1 digitaldaemon.com...
>In article <c9nuts$ls2$1 digitaldaemon.com>, Sean Kelly says...
>>In article <c9ntu5$kch$1 digitaldaemon.com>, David L. Davis says...
>>>Shouldn't there be a non-case sensitive version of find (ifind) and rfind (irfind) in the Phobos std.string library? Also an additional position parameter would make the function(s) even more useful, unless of course we really have this, and I've missed it somehow looking thru the "D" docs.
>>
>>How about allowing the user to pass a comparison delegate? Case means different things in different languages.
>>
>>Sean
>
>Sean: I hate to ask you this, but could you explain this a bit more. I'm not sure I follow you. Thxs in advance. :)

The problem is that a universal case-insensitive function can't be written, because many languages have special letters (like čćžšđ; I don't know if you will see these correctly).

So the idea would be something like:

findCaseSensitive(char[] str, char[] search, bool delegate(char,char) comparefunc);

I'm not sure about the prototype, but the idea is to give the find function a comparefunc which decides whether two characters are considered the same or not.
Jun 03 2004
Ivan Senji wrote:
>The problem is that a universal case-insensitive function can't be written, because many languages have special letters (like čćžšđ; I don't know if you will see these correctly).
>
>So the idea would be something like:
>
>findCaseSensitive(char[] str, char[] search, bool delegate(char,char) comparefunc);
>
>I'm not sure about the prototype, but the idea is to give the find function a comparefunc which decides whether two characters are considered the same or not.

Since a D char[] is considered UTF-8 encoded, a delegate(char,char) won't be enough: a single char is only one UTF-8 code unit, and any non-ASCII character takes more than one. Sure, std.string.icmp(char[], char[]) only considers English ASCII, but that behavior seems broken, as char[]s really are supposed to be UTF-8. std.string.cmp(char[], char[]) is broken in that respect too (it uses memcmp()).

However, some general kind of locale handling is needed even for UTF-8. Some letters have a different ordering in different languages. One solution is to use something like the C library's setlocale() and change the comparison functions correspondingly. That way, findCaseSensitive could be defined as findCaseSensitive(char[] str, char[] search) together with a global (hidden) locale state. I'm not sure how thread-safe the C library locale is when multiple locales are in use.

A different solution would be a String class template that keeps track of the locale a string is represented in.

The best solution is probably to use delegate comparison functions taking dchars or whatever, and also to make cmp and icmp versions taking such.

/Oskar
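
As a rough illustration of the dchar-based comparison delegate suggested above, here is a minimal sketch. It assumes a present-day D compiler and Phobos (std.utf.decode, std.uni.toLower), not the 2004 library discussed in this thread, and findWith is a hypothetical name, not an existing Phobos function. It only compares one code point against one code point, so it does not handle locale rules or characters whose case mapping changes length.

```d
import std.stdio : writeln;
import std.uni : toLower;
import std.utf : decode;

/// Hypothetical helper (not part of Phobos): returns the byte index of the
/// first match of `needle` in `haystack`, or -1 if there is none.
/// Both strings are decoded from UTF-8 one code point at a time, and `same`
/// decides whether two code points count as equal.
ptrdiff_t findWith(const(char)[] haystack, const(char)[] needle,
                   bool delegate(dchar, dchar) same)
{
    for (size_t start = 0; ; )
    {
        size_t i = start; // read position in haystack
        size_t j = 0;     // read position in needle
        bool matched = true;

        while (j < needle.length)
        {
            if (i >= haystack.length) { matched = false; break; }
            immutable dchar a = decode(haystack, i); // advances i
            immutable dchar b = decode(needle, j);   // advances j
            if (!same(a, b)) { matched = false; break; }
        }

        if (matched)
            return cast(ptrdiff_t) start;
        if (start >= haystack.length)
            break;
        decode(haystack, start); // step to the start of the next code point
    }
    return -1;
}

void main()
{
    // Case-insensitive comparison using the Unicode simple lowercase mapping.
    bool delegate(dchar, dchar) ignoreCase =
        delegate bool(dchar a, dchar b) { return toLower(a) == toLower(b); };

    // Exact comparison, for contrast.
    bool delegate(dchar, dchar) exact =
        delegate bool(dchar a, dchar b) { return a == b; };

    writeln(findWith("ApO 123355 PO Box 23", "po", ignoreCase)); // 1
    writeln(findWith("ApO 123355 PO Box 23", "PO", exact));      // 11
}
```

Passing the comparison in as a delegate keeps the search loop independent of any particular casing or locale policy, which is the point made in the post above.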
Jun 03 2004
In article <c9o1k8$pp2$1 digitaldaemon.com>, David L. Davis says...
>In article <c9nuts$ls2$1 digitaldaemon.com>, Sean Kelly says...
>>How about allowing the user to pass a comparison delegate? Case means different things in different languages.
>
>Sean: I hate to ask you this, but could you explain this a bit more. I'm not sure I follow you. Thxs in advance. :)

Some languages don't have upper and lowercase letters. And many others don't convert properly using the default routines, even if the ASCII character set contains all the appropriate symbols. So tolower(x)==tolower(y) may yield the incorrect result if the string contains characters beyond the usual 52 ASCII English values. I'd like to assume that a D string is a sequence of characters, Unicode or otherwise, and I think it would be a mistake to provide methods that don't work properly outside of ASCII English. While I'm not much of an expert on localization, I do think that the library should be designed with localization in mind.

For a more thorough explanation, Scott Meyers discusses the problem in one of his "Effective C++" books, the second one IIRC.

Sean
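
To make the tolower(x)==tolower(y) concern concrete, here is a small sketch. It assumes a present-day D compiler, where std.ascii.toLower only maps A-Z and std.uni.toLower applies the Unicode simple lowercase mapping; sameAscii and sameUnicode are hypothetical names chosen for this example.

```d
import std.stdio : writeln;
static import std.ascii;
static import std.uni;

/// Hypothetical helper: equality after ASCII-only lowercasing.
bool sameAscii(dchar a, dchar b)
{
    return std.ascii.toLower(a) == std.ascii.toLower(b);
}

/// Hypothetical helper: equality after Unicode simple lowercasing.
bool sameUnicode(dchar a, dchar b)
{
    return std.uni.toLower(a) == std.uni.toLower(b);
}

void main()
{
    dchar upper = '\u00C4'; // LATIN CAPITAL LETTER A WITH DIAERESIS
    dchar lower = '\u00E4'; // LATIN SMALL LETTER A WITH DIAERESIS

    // ASCII-only lowering leaves both characters untouched, so they differ.
    writeln(sameAscii(upper, lower));   // false

    // Unicode-aware lowering maps the capital to the small letter.
    writeln(sameUnicode(upper, lower)); // true
}
```

Even the Unicode-aware version is only a per-character mapping; it does not cover the locale-specific cases raised elsewhere in this thread.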
Jun 03 2004
In article <c9o7nn$1360$1 digitaldaemon.com>, Sean Kelly says...
>In article <c9o1k8$pp2$1 digitaldaemon.com>, David L. Davis says...
>>In article <c9nuts$ls2$1 digitaldaemon.com>, Sean Kelly says...
>>>How about allowing the user to pass a comparison delegate? Case means different things in different languages.
>>
>>Sean: I hate to ask you this, but could you explain this a bit more. I'm not sure I follow you. Thxs in advance. :)
>
>Some languages don't have upper and lowercase letters. And many others don't convert properly using the default routines, even if the ASCII character set contains all the appropriate symbols. So tolower(x)==tolower(y) may yield the incorrect result if the string contains characters beyond the usual 52 ASCII English values. I'd like to assume that a D string is a sequence of characters, Unicode or otherwise, and I think it would be a mistake to provide methods that don't work properly outside of ASCII English. While I'm not much of an expert on localization, I do think that the library should be designed with localization in mind.
>
>For a more thorough explanation, Scott Meyers discusses the problem in one of his "Effective C++" books, the second one IIRC.
>
>Sean

Sean: Correct me if I'm wrong, but looking at the Phobos std.string html online information, it looks like std.string mainly handles "ASCII English", given the way the consts are defined. Plus I've been playing around with these functions for a few weeks, converting my VB6 ProperCase() to a "D" propercase() function, and toupper() and tolower() use the consts below to do their work. If a string is, say, all lower-case and it's passed into the tolower() function, it will do nothing, since none of the characters in the string are upper-case according to the uppercase const. It also works the other way around: if an all upper-case string is passed into toupper(), it will use the lowercase const.

const char[] lowercase;    "abcdefghijklmnopqrstuvwxyz"
const char[] uppercase;    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
const char[] letters;      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
const char[] whitespace;   " \t\v\r\n\f"

Off the subject just a little bit: why are the consts defined in lowercase characters, and not in the standard "C/C++" way, where all consts are defined in uppercase characters?
Jun 03 2004
In article <c9nuts$ls2$1 digitaldaemon.com>, Sean Kelly says...
>How about allowing the user to pass a comparison delegate? Case means different things in different languages.
>
>Sean

The Unicode Standard defines the uppercasing, lowercasing and titlecasing of all Unicode characters. There are regional variations only for Lithuanian, Turkish and Azeri; otherwise it's a world standard. Casing rules can be found in the document http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt, with explanatory text on the Unicode web site.

One interesting feature of Unicode casing is that sometimes one character becomes two characters. For instance, from SpecialCasing.txt:

00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S

This shows that the German eszett character (Unicode codepoint 0x00DF) uppercases to the two-character sequence "SS".

Perhaps even more useful would be the Unicode Collation algorithm. Collation, as opposed to casing, is definitely regional, so here you do need to specify the manner in which you need to do your collating.

But anyway, the algorithms are all there. All we need is for someone to implement them.

Arcane Jill
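
The one-character-becomes-two case can be sketched as follows. This assumes a present-day D compiler; upperWithSpecialCasing is a hypothetical name, and its "table" is a single hard-coded entry for U+00DF, where a real implementation would load every rule from SpecialCasing.txt. All other characters fall back to the one-to-one mapping of std.uni.toUpper(dchar).

```d
import std.stdio : writeln;
import std.uni : toUpper;

/// Hypothetical helper: uppercases `s`, applying one special-casing rule
/// (U+00DF uppercases to "SS") and the simple one-to-one Unicode mapping
/// for every other character.
string upperWithSpecialCasing(string s)
{
    string result;
    foreach (dchar c; s) // decodes the UTF-8 string one code point at a time
    {
        if (c == '\u00DF')        // LATIN SMALL LETTER SHARP S
            result ~= "SS";       // one character becomes two
        else
            result ~= toUpper(c); // appending a dchar re-encodes it as UTF-8
    }
    return result;
}

void main()
{
    writeln(upperWithSpecialCasing("stra\u00DFe")); // STRASSE
}
```

Because the result can be longer than the input, a search that lowercases or uppercases both strings first cannot assume that indices in the converted text line up with indices in the original.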
Jun 03 2004
>Below I wrote some sample code I've tested, which sort of show its usefulness. I've indented the code this time with tabs, so hopeful it all appears readable in my posted message. <*crosses-fingers*> ;)

Tabs don't show up in some newsreaders, like OE. I'm afraid you need to use spaces...

(I can't actually comment on anything to do with D, as I haven't looked at a single line of D code yet because of university work and my lack of programming knowledge :S)
Jun 03 2004