
digitalmars.D - std.string and std.algorithm: what to do?

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
I'm not sure what needs to be done about the combo string + algorithm. 
There's quite some overlap, and also functions that have the same name 
in both modules (e.g. find()), which forces you to disambiguate.

Should std.algorithm automatically recognize strings and proceed 
accordingly, should it just consider them straight arrays and leave 
everything else to std.string (risky!), or refuse to handle strings?

Also, I dislike the signature int find() that returns -1 if not found. 
Time and again experience shows that find() returning a range is much 
better in real code because it works seamlessly when the 
substring/element was not found (no more need for an extra test!)

So I want to rename std.string.find() into something like findIndex or 
indexOf. But then we also have ifind, rfind, irfind. Ideas for renaming?


Andrei
May 14 2009
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 14 May 2009 09:55:08 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Also, I dislike the signature int find() that returns -1 if not found.  
 Time and again experience shows that find() returning a range is much  
 better in real code because it works seamlessly when the  
 substring/element was not found (no more need for an extra test!)
the end of the string, regardless of whether it is a range or index. I could never understand the mentality of returning -1 versus end of string. -Steve
May 14 2009
Daniel Keep <daniel.keep.lists gmail.com> writes:
Steven Schveighoffer wrote:
 On Thu, 14 May 2009 09:55:08 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Also, I dislike the signature int find() that returns -1 if not found.
 Time and again experience shows that find() returning a range is much
 better in real code because it works seamlessly when the
 substring/element was not found (no more need for an extra test!)
be the end of the string, regardless of whether it is a range or index. I could never understand the mentality of returning -1 versus end of string. -Steve
    int idx = find(str);
    if( idx == -1 ) doStuff;

- or -

    if( idx == str.length ) doStuff;

-- Daniel
May 14 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Daniel Keep wrote:
 
 Steven Schveighoffer wrote:
 On Thu, 14 May 2009 09:55:08 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Also, I dislike the signature int find() that returns -1 if not found.
 Time and again experience shows that find() returning a range is much
 better in real code because it works seamlessly when the
 substring/element was not found (no more need for an extra test!)
be the end of the string, regardless of whether it is a range or index. I could never understand the mentality of returning -1 versus end of string. -Steve
int idx = find(str); if( idx == -1 ) doStuff; - or - if( idx == str.length ) doStuff;
I think Steve's point is that often you want to do something like:

    auto before = str[0 .. find(str, something)];

or

    auto after = str[find(str, something) .. $];

which behave nicely at limit conditions (when something isn't found). In contrast, the version with -1 needs a separate variable and a test - a mess.

Andrei
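
The slicing idiom above can be sketched in D roughly as follows. This is only an illustration under stated assumptions: `indexOrLength` is a hypothetical helper (not a Phobos function), built on `std.string.indexOf` as it exists in current Phobos, where a miss is reported as -1.

```d
import std.string : indexOf;

// Hypothetical helper (not part of Phobos): like indexOf, but a miss is
// reported as haystack.length instead of -1, so the result can always be
// used directly as a slice bound.
size_t indexOrLength(string haystack, string needle)
{
    immutable i = haystack.indexOf(needle);
    return i < 0 ? haystack.length : cast(size_t) i;
}

// Everything before the first occurrence of needle (the whole string on a miss).
string partBefore(string str, string needle)
{
    return str[0 .. indexOrLength(str, needle)];
}

// Everything from the first occurrence of needle onward (empty on a miss).
string partFrom(string str, string needle)
{
    return str[indexOrLength(str, needle) .. $];
}
```

Neither slicing function needs a separate variable or a not-found test; the limit case falls out of the slice bounds.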
May 14 2009
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 14 May 2009 11:06:44 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Daniel Keep wrote:
  Steven Schveighoffer wrote:
 On Thu, 14 May 2009 09:55:08 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Also, I dislike the signature int find() that returns -1 if not found.
 Time and again experience shows that find() returning a range is much
 better in real code because it works seamlessly when the
 substring/element was not found (no more need for an extra test!)
be the end of the string, regardless of whether it is a range or index. I could never understand the mentality of returning -1 versus end of string. -Steve
int idx = find(str); if( idx == -1 ) doStuff; - or - if( idx == str.length ) doStuff;
I think Steve's point is that often you want to do something like: auto before = str[0 .. find(str, something)]; or auto after = str[find(str, something) .. $]; which behave nicely at limit conditions (when something isn't found). In contrast, the version with -1 needs a separate variable and a test - a mess.
Yes, and to Daniel's point, I don't see a drawback. Typing str.length vs -1 is really not that bad :)

-Steve
May 14 2009
grauzone <none example.net> writes:
Andrei Alexandrescu wrote:
 Daniel Keep wrote:
 Steven Schveighoffer wrote:
 On Thu, 14 May 2009 09:55:08 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Also, I dislike the signature int find() that returns -1 if not found.
 Time and again experience shows that find() returning a range is much
 better in real code because it works seamlessly when the
 substring/element was not found (no more need for an extra test!)
be the end of the string, regardless of whether it is a range or index. I could never understand the mentality of returning -1 versus end of string. -Steve
int idx = find(str); if( idx == -1 ) doStuff; - or - if( idx == str.length ) doStuff;
I think Steve's point is that often you want to do something like: auto before = str[0 .. find(str, something)]; or auto after = str[find(str, something) .. $]; which behave nicely at limit conditions (when something isn't found). In contrast, the version with -1 needs a separate variable and a test - a mess.
But most of the time, you want to know both _if_ something was found and where. Returning the length of the string makes checking whether something was found harder. That's also quite a mess.

Maybe a good way would be to return a pair of slices, before and after something was found (ignoring that there are no MRVs):

    (char[], char[]) myfind(char[] str, char[] tofind) {
        int index = find(str, tofind); //returns str.length if not found
        return str[0..index], str[index..$];
    }

This would give the user lots of flexibility, without having to declare useless temporaries, especially not for the first argument passed to myfind().

For example: was something found?

    bool found = myfind(bla, blubb)[1].length;

(That's still more elegant than comparing lengths.)
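
Since D has no multiple return values, the pair-of-slices idea can be approximated with a library tuple. The following is a minimal sketch, assuming `std.typecons.Tuple` and `std.string.indexOf` from current Phobos; `splitAtFirst` is a hypothetical name chosen for illustration.

```d
import std.string : indexOf;
import std.typecons : Tuple, tuple;

// Hypothetical helper: returns (everything before the match, everything from
// the match onward). On a miss the first slice is the whole string and the
// second slice is empty.
Tuple!(string, string) splitAtFirst(string str, string toFind)
{
    immutable i = str.indexOf(toFind);
    immutable cut = i < 0 ? str.length : cast(size_t) i;
    return tuple(str[0 .. cut], str[cut .. $]);
}

// "Was something found?" expressed through the second slice, as in the post.
// Note: an empty toFind at the very end of str would also look like a miss.
bool wasFound(string str, string toFind)
{
    return splitAtFirst(str, toFind)[1].length != 0;
}
```

The caller gets both halves from one call and can still ask whether anything was found without comparing against the string length.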
 
 Andrei
May 14 2009
Michel Fortin <michel.fortin michelf.com> writes:
On 2009-05-14 11:06:44 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 auto before = str[0 .. find(str, something)];
Hum, wouldn't this be more intuitive:

    auto before = str.before(str.find(something));

That could work by having "find" return a range over the searched string. At first glance having find(str, "def") returning "def" sounds silly and unhelpful, but since the returned "def" is a range over the original, you can do a lot of interesting things with it, such as:

    auto str = "abcdefghi";
    auto strSlice = str.find("def");       // strSlice == "def"
    auto strBefore = str.before(strSlice); // strBefore == "abc"
    auto strAfter = str.after(strSlice);   // strAfter == "ghi"

An interesting thing is that it makes a lot of sense with a regex argument in find:

    auto str = "abcdefghi";
    auto strSlice = str.find(regex("c.*g")); // strSlice = "cdefg"
    auto strBefore = str.before(strSlice);   // strBefore = "ab"
    auto strAfter = str.after(strSlice);     // strAfter = "hi"

And you can modify the result in place:

    auto str = "hello world";
    str.find("world")[] = "range";
    assert(str == "hello range");

... although that's somewhat limited since slice[] = x can't change the number of characters in the slice.

The "not found" result would be the end of the string: str[$..$].

Getting the position of the found string would be done like this:

    auto str = "abcdefhij";
    auto pos = str.before(str.find("def")).length; // pos == 3

But I guess having "findPos" or "indexOf" for that would be more readable.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
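
The before/after proposal can be sketched with plain slices and pointer arithmetic. Everything here is an assumption made for illustration: `findSlice`, `before`, and `after` are hypothetical names rather than Phobos functions, the search is delegated to `std.string.indexOf` from current Phobos, and `before`/`after` are only meaningful when `part` really is a slice of `whole`.

```d
import std.string : indexOf;

// Hypothetical: returns the matched portion of haystack as a slice over the
// original, or the empty end slice haystack[$ .. $] when nothing is found.
string findSlice(string haystack, string needle)
{
    immutable i = haystack.indexOf(needle);
    if (i < 0)
        return haystack[$ .. $];
    immutable start = cast(size_t) i;
    return haystack[start .. start + needle.length];
}

// Hypothetical: the part of whole that precedes part.
// Assumes part is a slice of whole (not checked).
string before(string whole, string part)
{
    return whole[0 .. cast(size_t)(part.ptr - whole.ptr)];
}

// Hypothetical: the part of whole that follows part.
// Assumes part is a slice of whole (not checked).
string after(string whole, string part)
{
    immutable offset = cast(size_t)(part.ptr - whole.ptr);
    return whole[offset + part.length .. $];
}
```

Because the not-found result is the empty slice at the end of the string, `before` then returns the whole string and `after` returns an empty one, matching the limit behaviour described in the post.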
May 16 2009
"Denis Koroskin" <2korden gmail.com> writes:
On Sat, 16 May 2009 14:31:04 +0400, Michel Fortin <michel.fortin michelf.com>
wrote:

 On 2009-05-14 11:06:44 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> said:

 auto before = str[0 .. find(str, something)];
Hum, wouldn't this be more intuitive: auto before = str.before(str.find(something)); That could work by having "find" return a range over the searched string. At first glance having find(str, "def") returning "def" sounds silly and unhelpful, but since the returned "def" is a range over the original, you can do a lot of interesting things with it, such as: auto str = "abcdefghi"; auto strSlice = str.find("def"); // strSlice == "def" auto strBefore = str.before(strSlice); // strBefore == "abc" auto strAfter = str.after(strSlice); // strAfter == "ghi" An interesting thing is that it makes a lot of sense with a regex argument in find: auto str = "abcdefghi"; auto strSlice = str.find(regex("c.*g")); // strSlice = "cdefg" auto strBefore = str.before(strSlice); // strBefore = "ab" auto strAfter = str.after(strSlice); // strAfter = "hi" And you can modify the result in place: auto str = "hello world"; str.find("world")[] = "range"; assert(str == "hello range"); ... although that's somewhat limited since slice[] = x can't change the number of characters in the slice. The "not found" result would be the end of the string: str[$..$]. Getting the position of the found string would be done like this: auto str = "abcdefhij"; auto pos = str.before(str.find("def")).length; // pos == 3 But I guess having "findPos" or "indexOf" for that would be more readable.
Interesting idea! I like it.
May 16 2009
Walter Bright <newshound1 digitalmars.com> writes:
Steven Schveighoffer wrote:
 I could never understand the mentality of returning -1 versus end of 
 string.
It's not -1, it's uint.max. The reason for returning that value is that it is the only integer value that can never be a valid index.
May 14 2009
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 14 May 2009 16:51:27 -0400, Walter Bright  
<newshound1 digitalmars.com> wrote:

 Steven Schveighoffer wrote:
 I could never understand the mentality of returning -1 versus end of  
 string.
It's not -1, it's uint.max. The reason for returning that value is that it is the only integer value that can never be a valid index.
Pretty sure uint.max - 1 is also invalid (as well as a host of other values) ;) but in any case, it's much less useful than str.length, as you can't directly use it, you simply have to check for it. If all you ever do is check to see if it's a certain value, why does it have to be uint.max? I guess I understand the mentality of picking -1 or uint.max as a reasonable "failure" return, but what I don't understand is why you would do so when you could use the length of the string, which is useful for many other purposes. -1 or uint.max just is a failure, and needs to be checked, it can't be used to do any slicing operations. I always hated doing string processing with Java vs. C++ because of this. -Steve
May 14 2009
Derek Parnell <derek psych.ward> writes:
On Thu, 14 May 2009 16:59:32 -0400, Steven Schveighoffer wrote:

 I guess I understand the mentality of picking -1 or uint.max as a  
 reasonable "failure" return, but what I don't understand is why you would  
 do so when you could use the length of the string, which is useful for  
 many other purposes.  -1 or uint.max just is a failure, and needs to be  
 checked, it can't be used to do any slicing operations.
A problem with .length is that it requires knowledge of the string for it to be meaningful. If find() can return any integer, we can only know whether that integer represents WasFound or WasNotFound if we have the string in context.

Currently we can do this ...

    funcA( find(needle, haystack), xyzzy);

and 'funcA' doesn't need to know anything more about the 'haystack' to work.

If find/indexOf() can return any integer we would have to code ...

    funcA( indexOf(needle, haystack) != haystack.length ? uint.max : 0, xyzzy);

(assuming that funcA is expecting uint.max to mean WasNotFound).

So there are really arguments for employing both methods. It just depends on why you are trying to 'find' stuff.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
May 14 2009
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 14 May 2009 17:21:02 -0400, Derek Parnell <derek psych.ward> wrote:

 On Thu, 14 May 2009 16:59:32 -0400, Steven Schveighoffer wrote:

 I guess I understand the mentality of picking -1 or uint.max as a
 reasonable "failure" return, but what I don't understand is why you  
 would
 do so when you could use the length of the string, which is useful for
 many other purposes.  -1 or uint.max just is a failure, and needs to be
 checked, it can't be used to do any slicing operations.
A problem with .length is that it requires the knowledge of the string for it to be meaningful. If find() can return any integer, we can only know if that integer represents WasFound or WasNotFound if we have the string in context. Currently we can do this ... funcA( find(needle, haystack), xyzzy); and 'funcA' doesn't need to know anything more about the 'haystack' to work. If find/indexOf() can return any integer we would have to code ... funcA( indexOf(needle, haystack) != haystack.length ? uint.max : 0, xyzzy); (assuming that funcA is expecting uint.max to mean WasNotFound). So there are really arguments for employing both methods. It just depends on why you are trying to 'find' stuff.
Not really. What could funcA possibly do with the index without the string itself? If it's just a flag (uint.max or not), then funcA should be:

    funcA(bool found, ...)

and you call it with

    funcA(find(needle, haystack) < haystack.length, xyzzy)

This doesn't cause any problems with people who use Tango, which returns the length if not found. In other words, if you find yourself writing code to "morph" the length into uint.max or -1, you are thinking about the problem incorrectly.

-Steve
May 14 2009
Derek Parnell <derek psych.ward> writes:
On Thu, 14 May 2009 17:33:40 -0400, Steven Schveighoffer wrote:

 On Thu, 14 May 2009 17:21:02 -0400, Derek Parnell <derek psych.ward> wrote:
 Not really.  What could funcA possibly do with the index without the  
 string itself?  If it's just a flag (uint.max or not), then funcA should  
 be:
 
 funcA(bool found, ...)
 
 and you call it with
 
 funcA(find(needle, haystack) < haystack.length, xyzzy)
 
 This doesn't cause any problems with people who use Tango, which returns  
 the length if not found.  In other words, if you find yourself writing  
 code to "morph" the length into uint.max or -1, you are thinking about the  
 problem incorrectly.
Who said that I had control of how funcA() was implemented? -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
May 14 2009
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 14 May 2009 18:15:01 -0400, Derek Parnell <derek psych.ward> wrote:

 On Thu, 14 May 2009 17:33:40 -0400, Steven Schveighoffer wrote:

 On Thu, 14 May 2009 17:21:02 -0400, Derek Parnell <derek psych.ward>  
 wrote:
 Not really.  What could funcA possibly do with the index without the
 string itself?  If it's just a flag (uint.max or not), then funcA should
 be:

 funcA(bool found, ...)

 and you call it with

 funcA(find(needle, haystack) < haystack.length, xyzzy)

 This doesn't cause any problems with people who use Tango, which returns
 the length if not found.  In other words, if you find yourself writing
 code to "morph" the length into uint.max or -1, you are thinking about  
 the
 problem incorrectly.
Who said that I had control of how funcA() was implemented?
Then I guess the rebuttal to that is, why should we make the design of std.string suffer to support a poorly designed legacy function? It's easy enough to write a wrapper around the properly designed string function to return what you wish. -Steve
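
A wrapper of the kind mentioned above could look roughly like this. It is a sketch under assumptions: `findOrLength` stands in for a length-on-miss search (hypothetical, implemented here via `std.string.indexOf` from current Phobos), and `legacyFind` is a hypothetical adapter for callers that still expect -1.

```d
import std.string : indexOf;

// Hypothetical stand-in for a search that returns haystack.length on a miss.
size_t findOrLength(string haystack, string needle)
{
    immutable i = haystack.indexOf(needle);
    return i < 0 ? haystack.length : cast(size_t) i;
}

// Hypothetical adapter for legacy code that expects -1 to mean "not found".
// Caveat: a genuine match at position haystack.length (possible only for an
// empty needle) is indistinguishable from a miss here.
ptrdiff_t legacyFind(string haystack, string needle)
{
    immutable i = findOrLength(haystack, needle);
    return i == haystack.length ? -1 : cast(ptrdiff_t) i;
}
```

Code that must feed an unmodifiable function expecting the old sentinel calls the adapter; everything else can slice with the raw result.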
May 15 2009
Georg Wrede <georg.wrede iki.fi> writes:
Derek Parnell wrote:
 On Thu, 14 May 2009 17:33:40 -0400, Steven Schveighoffer wrote:
 
 On Thu, 14 May 2009 17:21:02 -0400, Derek Parnell <derek psych.ward> wrote:
 Not really.  What could funcA possibly do with the index without the  
 string itself?  If it's just a flag (uint.max or not), then funcA should  
 be:

 funcA(bool found, ...)

 and you call it with

 funcA(find(needle, haystack) < haystack.length, xyzzy)

 This doesn't cause any problems with people who use Tango, which returns  
 the length if not found.  In other words, if you find yourself writing  
 code to "morph" the length into uint.max or -1, you are thinking about the  
 problem incorrectly.
Who said that I had control of how funcA() was implemented?
Well, if you wrote

    funcA(find(needle, haystack) < haystack.length, xyzzy)

then it is you, who decided to use funcA. Now, if that wasn't to your liking, you'd of course write your own funcA (or use another function), that works as you want.

Therefore, you're "in control of funcA() here".
May 15 2009
Derek Parnell <derek psych.ward> writes:
On Sat, 16 May 2009 04:36:29 +0300, Georg Wrede wrote:


 Well, if you wrote
 
    funcA(find(needle, haystack) < haystack.length, xyzzy)
 
 then it is you, who decided to use funcA. Now, if that wasn't to your 
 liking, you'd of course write your own funcA (or use another function), 
 that works as you want.
 
 Therefore, you're "in control of funcA() here".
Thank you, Georg. That is a conclusion I would never have come to. -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
May 15 2009
dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 I'm not sure what needs to be done about the combo string + algorithm.
 There's quite some overlap, and also functions that have the same name
 in both modules (e.g. find()), which forces you to disambiguate.
 Should std.algorithm automatically recognize strings and proceed
 accordingly, should it just consider them straight arrays and leave
 everything else to std.string (risky!), or refuse to handle strings?
I like the idea of automatically specializing on strings, at least on the surface. It's less to remember for the programmer, less annoying naming collisions, and stuff "just works". You get your generic std.algorithm stuff when you need it and your more optimized/variable length character encoding stuff when you need it without having to explicitly specify which one you want.
May 14 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 I'm not sure what needs to be done about the combo string + algorithm.
 There's quite some overlap, and also functions that have the same name
 in both modules (e.g. find()), which forces you to disambiguate.
 Should std.algorithm automatically recognize strings and proceed
 accordingly, should it just consider them straight arrays and leave
 everything else to std.string (risky!), or refuse to handle strings?
I like the idea of automatically specializing on strings, at least on the surface. It's less to remember for the programmer, less annoying naming collisions, and stuff "just works". You get your generic std.algorithm stuff when you need it and your more optimized/variable length character encoding stuff when you need it without having to explicitly specify which one you want.
Cool! So then how do I rename find, ifind, rfind, and irfind in std.string? Andrei
May 14 2009
John C <johnch_atms hotmail.com> writes:
Andrei Alexandrescu Wrote:

 Cool! So then how do I rename find, ifind, rfind, and irfind in std.string?
 
 Andrei
indexOf(bool ignoreCase = false), lastIndexOf(bool ignoreCase = false).
May 14 2009
"Lionello Lunesu" <lionello lunesu.remove.com> writes:
"John C" <johnch_atms hotmail.com> wrote in message 
news:guhel3$18kk$1 digitalmars.com...
 Andrei Alexandrescu Wrote:

 Cool! So then how do I rename find, ifind, rfind, and irfind in 
 std.string?

 Andrei
indexOf(bool ignoreCase = false), lastIndexOf(bool ignoreCase = false).
Please, an enum instead of bool... It's not apparent what indexOf(true) is supposed to do when you encounter it. Yes, named parameters would solve this :) L.
May 14 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Lionello Lunesu wrote:
 
 "John C" <johnch_atms hotmail.com> wrote in message 
 news:guhel3$18kk$1 digitalmars.com...
 Andrei Alexandrescu Wrote:

 Cool! So then how do I rename find, ifind, rfind, and irfind in 
 std.string?

 Andrei
indexOf(bool ignoreCase = false), lastIndexOf(bool ignoreCase = false).
Please, an enum instead of bool... It's not apparent what indexOf(true) is supposed to do when you encounter it. Yes, named parameters would solve this :)
Yah, I defined enum CaseSensitive { no, yes } Andrei
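
To illustrate why the enum reads better than a bare bool at the call site, here is a minimal sketch. The enum mirrors the one quoted above; `indexOfChar` is a hypothetical function written for this example (not the Phobos implementation), and it assumes `std.uni.toLower` for simple per-code-point case folding.

```d
import std.uni : toLower;

// Mirrors the enum from the post.
enum CaseSensitive { no, yes }

// Hypothetical: index (in code units) of the first occurrence of c in s,
// or -1 if there is none. Case-insensitive matching folds each code point
// with toLower, which ignores locale-specific rules.
ptrdiff_t indexOfChar(string s, dchar c, CaseSensitive cs = CaseSensitive.yes)
{
    foreach (i, dchar d; s) // i is the code-unit offset of each decoded code point
    {
        immutable match = (cs == CaseSensitive.yes)
            ? (d == c)
            : (toLower(d) == toLower(c));
        if (match)
            return cast(ptrdiff_t) i;
    }
    return -1;
}
```

A call such as `indexOfChar(s, 'h', CaseSensitive.no)` states its intent, whereas `indexOfChar(s, 'h', false)` would leave the reader guessing.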
May 14 2009
Benji Smith <dlanguage benjismith.net> writes:
Andrei Alexandrescu wrote:
 Yah, I defined
 
 enum CaseSensitive { no, yes }
Minor nitpick: there are lots of different ways to canonicalize text before performing a comparison. Ascii case conversions are just one way.

Instead of an enum with a yes/no value, what about future-proofing it with something more along the lines of...

    enum CaseSensitivity {
       None, Ascii, UnicodeChar, UnicodeSurrogatePair
    }

...or something like that. The yes/no enum will outlive its usefulness before long.

--benji
May 16 2009
Georg Wrede <georg.wrede iki.fi> writes:
Benji Smith wrote:
 Andrei Alexandrescu wrote:
 Yah, I defined

 enum CaseSensitive { no, yes }
Minor nitpick: there are lots of different ways to canonicalize text before performing a comparison. Ascii case conversions are just one way. Instead of an enum with a yes/no value, what about future-proofing it with something more along the lines of... enum CaseSensitivity { None, Ascii, UnicodeChar, UnicodeSurrogatePair } ...or something like that. The yes/no enum will outlive its usefulness before long.
True, as implemented, case sensitive functions only work on true ASCII strings in D. And I hope we don't even try to fix this universally, because the corner cases involved (including accents in separate entities), simply are too much to handle. And the only way to try to handle them, while keeping the code fast, is to first examine the string for any non-ASCII stuff, and then having two separate case functions for each usage.
May 16 2009
Rainer Deyke <rainerd eldwood.com> writes:
Benji Smith wrote:
    enum CaseSensitivity {
       None, Ascii, UnicodeChar, UnicodeSurrogatePair
    }
D uses unicode, period, so D string functions should use unicode, always. (What's the difference between UnicodeChar and UnicodeSurrogatePair anyway? Unicode is unicode. wchar/UTF-16 uses surrogate pairs, char/UTF-8 and dchar/UTF-32 do not.)

These options don't address the real issue, which is that different languages have different rules. Which is the uppercase version of 'i', 'I' or 'İ'? Which is the lowercase version of 'I', 'i' or 'ı'?

-- 
Rainer Deyke - rainerd eldwood.com
May 16 2009
Steve Teale <steve.teale britseyeview.com> writes:
Andrei Alexandrescu Wrote:

 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 I'm not sure what needs to be done about the combo string + algorithm.
 There's quite some overlap, and also functions that have the same name
 in both modules (e.g. find()), which forces you to disambiguate.
 Should std.algorithm automatically recognize strings and proceed
 accordingly, should it just consider them straight arrays and leave
 everything else to std.string (risky!), or refuse to handle strings?
I like the idea of automatically specializing on strings, at least on the surface. It's less to remember for the programmer, less annoying naming collisions, and stuff "just works". You get your generic std.algorithm stuff when you need it and your more optimized/variable length character encoding stuff when you need it without having to explicitly specify which one you want.
Cool! So then how do I rename find, ifind, rfind, and irfind in std.string? Andrei
Maybe you should rename the new stuff - indexOf, etc. Why break existing code that works with D1 and D2 and tests for -1? I think that find may be a rather widely used function.

Steve
May 14 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steve Teale wrote:
 Andrei Alexandrescu Wrote:
 
 dsimcha wrote:
 == Quote from Andrei Alexandrescu
 (SeeWebsiteForEmail erdani.org)'s article
 I'm not sure what needs to be done about the combo string +
 algorithm. There's quite some overlap, and also functions that
 have the same name in both modules (e.g. find()), which forces
 you to disambiguate. Should std.algorithm automatically
 recognize strings and proceed accordingly, should it just
 consider them straight arrays and leave everything else to
 std.string (risky!), or refuse to handle strings?
I like the idea of automatically specializing on strings, at least on the surface. It's less to remember for the programmer, less annoying naming collisions, and stuff "just works". You get your generic std.algorithm stuff when you need it and your more optimized/variable length character encoding stuff when you need it without having to explicitly specify which one you want.
Cool! So then how do I rename find, ifind, rfind, and irfind in std.string? Andrei
Maybe you should rename the new stuff - indexOf, etc. Why break existing code that works with D1 and D2 and tests for -1. I think that find may be a rather widely used function.
That's the gist of the problem. The new functions don't return an index; the old ones do. I'd rather define good names and break things now than stay with bad names forever.

Andrei
May 14 2009
Jacob Carlborg <doob me.com> writes:
Andrei Alexandrescu wrote:
 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s 
 article
 I'm not sure what needs to be done about the combo string + algorithm.
 There's quite some overlap, and also functions that have the same name
 in both modules (e.g. find()), which forces you to disambiguate.
 Should std.algorithm automatically recognize strings and proceed
 accordingly, should it just consider them straight arrays and leave
 everything else to std.string (risky!), or refuse to handle strings?
I like the idea of automatically specializing on strings, at least on the surface. It's less to remember for the programmer, less annoying naming collisions, and stuff "just works". You get your generic std.algorithm stuff when you need it and your more optimized/variable length character encoding stuff when you need it without having to explicitly specify which one you want.
Cool! So then how do I rename find, ifind, rfind, and irfind in std.string? Andrei
Like others I think the names should be:

    find -> indexOf
    rfind -> lastIndexOf

but I think the declaration should look like this:

    size_t indexOf (string s, dchar d, size_t start = 0)

perhaps with an argument for case sensitivity also:

    size_t indexOf (string s, dchar d, size_t start = 0, bool caseSensitive = true)
May 14 2009
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 14 May 2009 09:55:08 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I'm not sure what needs to be done about the combo string + algorithm.  
 There's quite some overlap, and also functions that have the same name  
 in both modules (e.g. find()), which forces you to disambiguate.

 Should std.algorithm automatically recognize strings and proceed  
 accordingly, should it just consider them straight arrays and leave  
 everything else to std.string (risky!), or refuse to handle strings?

 Also, I dislike the signature int find() that returns -1 if not found.  
 Time and again experience shows that find() returning a range is much  
 better in real code because it works seamlessly when the  
 substring/element was not found (no more need for an extra test!)

 So I want to rename std.string.find() into something like findIndex or  
 indexOf. But then we also have ifind, rfind, irfind. Ideas for renaming?
    find -> indexOf
    rfind -> lastIndexOf
    ifind, irfind -> do we care about renaming? They kind of seem niche anyways.

Another alternative: leave ifind and irfind deprecated for a while and redefine indexOf to be e.g.:

    indexOf(string s, dchar d, bool caseSensitive = true)

-Steve
May 14 2009
Michiel Helvensteijn <m.helvensteijn.remove gmail.com> writes:
Andrei Alexandrescu wrote:

 I'm not sure what needs to be done about the combo string + algorithm.
 There's quite some overlap, and also functions that have the same name
 in both modules (e.g. find()), which forces you to disambiguate.
 
 Should std.algorithm automatically recognize strings and proceed
 accordingly, should it just consider them straight arrays and leave
 everything else to std.string (risky!), or refuse to handle strings?
You gave me an idea there. Perhaps hierarchical modules are a bit outdated? Perhaps a modern programming language should instead work with a system of tags, since a function/class/entity may belong to more than one group. I too hate making a decision like that. So don't. Your std.string.find() may carry the `algorithm' tag and the `string' tag. So perhaps if both (or either?) of those tags are imported into a project, the function would become available. -- Michiel Helvensteijn
May 14 2009
Michiel Helvensteijn <m.helvensteijn.remove gmail.com> writes:
Michiel Helvensteijn wrote:

 Your std.string.find() may carry the `algorithm' tag and the `string' tag.
 So perhaps if both (or either?) of those tags are imported into a project,
 the function would become available.
Hm.. I suppose a project could import any Boolean combination of tags:

    import algorithm & string;
    import algorithm | string;
    import algorithm & !string;

And if modules were gone, I suppose you'd want to tag all your standard-library functions with `std' as well.

A bit too radical a change, I suppose. But I believe I will think about this some more. What do you think?

-- 
Michiel Helvensteijn
May 14 2009
dsimcha <dsimcha yahoo.com> writes:
== Quote from Michiel Helvensteijn (m.helvensteijn.remove gmail.com)'s article
 Michiel Helvensteijn wrote:
 Your std.string.find() may carry the `algorithm' tag and the `string' tag.
 So perhaps if both (or either?) of those tags are imported into a project,
 the function would become available.
Hm.. I suppose a project could import any Boolean combination of tags: import algorithm & string; import algorithm | string; import algorithm & !string; And if modules were gone, I suppose you'd want to tag all your standard-library functions with `std' as well. A bit too radical a change, I suppose. But I believe I will think about this some more. What do you think?
This seems like overkill. Module/package/import/whatever management should not require a Ph.D. in Boolean logic. It should be dead simple (i.e. like the current system is, esp. after 314 gets fixed) and allow people to get on with solving real problems. Yes, namespace pollution is annoying, but so is a ridiculously fine-grained import system. After all, the whole point of a module system is to avoid namespace pollution. Otherwise, it would make sense to just import every darn module in every import path implicitly. Sometimes, when people get ridiculously crazy with hierarchical import structure and stuff, I feel like just having a more polluted namespace is the lesser of two evils, especially in D, where naming collision resolution is well-defined and sane.
May 14 2009
next sibling parent reply Michiel Helvensteijn <m.helvensteijn.remove gmail.com> writes:
dsimcha wrote:

 Hm.. I suppose a project could import any Boolean combination of tags:
 import algorithm & string;
 import algorithm | string;
 import algorithm & !string;
This seems like overkill. Module/package/import/whatever management should not require a Ph.D. in Boolean logic. It should be dead simple (i.e. like the current system is, esp. after 314 gets fixed) and allow people to get on with solving real problems.
Hey, just because the system is available, doesn't mean that every programmer should use it. For most people, imports would become even simpler than they are now: import string; would import every symbol with the `string' tag. If you only want the std.string functions, import std & string; or even import std.string; would do. And no one would notice the difference, except those Boolean PhD types who want to use the system to its fullest. ;-) Mostly it would make life easier for the library writers. Is std.string.find() a string function xor an algorithm? Nope. It's both. -- Michiel Helvensteijn
May 14 2009
parent reply Lutger <lutger.blijdestijn gmail.com> writes:
Michiel Helvensteijn wrote:

 dsimcha wrote:
 
 Hm.. I suppose a project could import any Boolean combination of tags:
 import algorithm & string;
 import algorithm | string;
 import algorithm & !string;
This seems like overkill. Module/package/import/whatever management should not require a Ph.D. in Boolean logic. It should be dead simple (i.e. like the current system is, esp. after 314 gets fixed) and allow people to get on with solving real problems.
Hey, just because the system is available, doesn't mean that every programmer should use it. For most people, imports would become even simpler than they are now: import string; would import every symbol with the `string' tag. If you only want the std.string functions, import std & string; or even import std.string; would do. And no one would notice the difference, except those Boolean PhD types who want to use the system to its fullest. ;-) Mostly it would make life easier for the library writers. Is std.string.find() a string function xor an algorithm? Nope. It's both.
It has its advantages, but misses the simplicity of matching symbols to the filesystem. You will also become more and more reliant on (specialized) tools with these kinds of systems, like .NET.
May 14 2009
parent reply Michiel Helvensteijn <m.helvensteijn.remove gmail.com> writes:
Lutger wrote:

 It has its advantages, but misses the simplicity of matching symbols to
 the filesystem. You will also become more and more reliant on
 (specialized) tools with these kinds of systems, like .NET.
I believe most filesystems are ready for this. You could probably match the tags to the filesystem if libraries install symbolic links. The library itself could have a hard link to the files (own the files) and the tag directories could have symbolic links. I haven't worked it out exactly, but it could be quite elegant. -- Michiel Helvensteijn
May 14 2009
parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Michiel Helvensteijn wrote:
 Lutger wrote:
 
 It has its advantages, but misses the simplicity of matching symbols to
 the filesystem. You will also become more and more reliant on
 (specialized) tools with these kinds of systems, like .NET.
I believe most filesystems are ready for this. You could probably match the tags to the filesystem if libraries install symbolic links. The library itself could have a hard link to the files (own the files) and the tag directories could have symbolic links. I haven't worked it out exactly, but it could be quite elegant.
http://thedailywtf.com/Articles/The_Complicator_0x27_s_Gloves.aspx
May 14 2009
parent reply Michiel Helvensteijn <m.helvensteijn.remove gmail.com> writes:
Daniel Keep wrote:

 http://thedailywtf.com/Articles/The_Complicator_0x27_s_Gloves.aspx
Heh, that's a funny story. I didn't know that one. (I do think the sensible developer was a bit of a killjoy, though.) But how about coming up with some actual arguments here? What are the `gloves' in this situation? Don't say `modules' without explaining how they offer the same advantages as tags. I can imagine a similar discussion having taken place when someone thought of higher-level control-flow constructs (you know, `if' and `while'). I imagine most people were perfectly happy with goto's and conditional jumps. -- Michiel Helvensteijn
May 15 2009
next sibling parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Michiel Helvensteijn wrote:
 Daniel Keep wrote:
 
 http://thedailywtf.com/Articles/The_Complicator_0x27_s_Gloves.aspx
Heh, that's a funny story. I didn't know that one. (I do think the sensible developer was a bit of a killjoy, though.) But how about coming up with some actual arguments here? What are the `gloves' in this situation? Don't say `modules' without explaining how they offer the same advantages as tags.
You're proposing taking a simple, easy to understand module system and replacing it with a poor version of Google.

Your idea requires that all modules be registered with the compiler somehow since there's now no correlation between module and file. Then you go on to suggest that you could use the filesystem, something which is blatantly not possible: WinXP doesn't support symbolic links. And this still means you can't simply drop a folder with source files into a project: you've got to screw around setting up symbolic links.

Even then, what exactly does this get you? The ability to say:

import std & string;

Now you have absolutely NO idea what you've just imported. You can't possibly look it up because anything could silently be included in that. And what happens when two libraries define two conflicting symbols? Now you've got to have some sort of disambiguation system.

The tag system is the complicator's gloves: you're solving a problem which already has a simple working solution with the most complex system possible for no discernible benefit.

Tags are great for when you have only a vague idea of what you're looking for. This might be pretty useful for searching docs, but it's completely unsuitable for actual code: you have to know what a function is called and what its arguments are before you call it. In order to find that out, you look up its documentation which ALSO tells you what module it's in. And once you know what module a function's in, the tag system becomes completely superfluous and counter-productive.
 I can imagine a similar discussion having taken place when someone
 thought of higher-level control-flow constructs (you know, `if' and
 `while'). I imagine most people were perfectly happy with goto's and
 conditional jumps.
You must be joking. Unstructured programming was a complete mess. You're actually proposing taking a structured system and replacing it with an unstructured one... the complete opposite of the introduction of flow control structures. -- Daniel
May 15 2009
parent reply "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
Daniel Keep wrote:

 WinXP doesn't support symbolic links.
This is not true. NTFS 5.0+ (Windows 2000+) has support for symbolic links, they're just not readily available to the average user. They are known as 'junction points' or 'reparse points'. There are certain gotchas, and Explorer has no idea how to handle them intelligently, but they exist and they work. More here: http://www.shell-shocked.org/article.php?id=284 -- Simen
May 16 2009
next sibling parent Daniel Keep <daniel.keep.lists gmail.com> writes:
Simen Kjaeraas wrote:
 Daniel Keep wrote:
 
 WinXP doesn't support symbolic links.
This is not true. NTFS 5.0+ (Windows 2000+) has support for symbolic links, they're just not readily available to the average user. They are known as 'junction points' or 'reparse points'. There are certain gotchas, and Explorer has no idea how to handle them intelligently, but they exist and they work. More here: http://www.shell-shocked.org/article.php?id=284 -- Simen
You can't claim it supports them when the shell itself cannot create them, nor distinguish them from regular folders in any way. That's like saying that D supports returning the overflow flag from addition because it has inline assembler. Besides which, links in Windows tend to be horribly restrictive and/or dangerous, often both. I mean, look under "Deleting Symlinks With Normal Windows Filesystem Tools Is Not Only Dangerous But Also Bizarre" in the linked article. -- Daniel
May 16 2009
prev sibling parent reply Georg Wrede <georg.wrede iki.fi> writes:
Simen Kjaeraas wrote:
 Daniel Keep wrote:
 
 WinXP doesn't support symbolic links.
This is not true. NTFS 5.0+ (Windows 2000+) has support for symbolic links, they're just not readily available to the average user. They are known as 'junction points' or 'reparse points'. There are certain gotchas, and Explorer has no idea how to handle them intelligently, but they exist and they work. More here: http://www.shell-shocked.org/article.php?id=284
"and they work" may be somewhat optimistic. I read 30% of the article. You can expect the symlinks to work, but as with most Windows stuff that's even two inches off the beaten path, it will work until the day you really need it.

The more they seem to work, the more you'd of course use them. But come the day you need to restore your backup, to copy a file hierarchy, or something else, and *poof*, you'll have garbage all over the place. And no warning about it.

Microsoft has this uncanny ability to fix everything with "slap-on code". Now, their real forte is to get it to work in the first place, whereas any normal coder would hopelessly tangle themselves within the first five minutes. But it only works in demos and as long as you follow some existing guidelines without *ever* doing something new, based on a (or actually, *the*) obvious theory of how it works.

As an opposite, the entire *point* of the Unix user experience, is that once you have an understanding of the workings of the system, you can create new idioms, new patterns of work, and new ways to use the entire system -- without ever fearing that this would cause mysterious behavior.

Just an example: MS-DOS had pipes already when regular PCs didn't have hard disks. One could watch grand demos in trade shows, where the guy piped stuff to sort, to find (their sorry version of grep), to more, and to custom made filters.

What nobody told the user (until he had already bought a PC with MSDOS, and he had tried to actually use the feature, unsuccessfully, and then called the $10-a-minute hotline), is that the pipes were implemented so that the first program writes the entire output into a temporary file on the floppy, and once it has finished running, the next program then opens the file as input.

Now, with the 0.00036 GB floppies of the day, it's not hard to see why nobody ever got any real pipe work done.

Pipes were an integral part of Unix way before that time. And still, the Microsoft sales "persons" made all idiots believe Microsoft freaking *invented* the concept. (I've actually witnessed this in trade shows.)
May 16 2009
next sibling parent Daniel Keep <daniel.keep.lists gmail.com> writes:
Georg Wrede wrote:
 ...
 
 What nobody told the user (until he had already bought a PC with MSDOS,
 and he had tried to actually use the feature, unsuccessfully, and then
 called the $10-a-minute hotline), is that the pipes were implemented so
 that the first program writes the entire output into a temporary file on
 the floppy, and once it has finished running, the next program then
 opens the file as input.
 
 Now, with the 0.00036 GB floppies of the day, it's not hard to see why
 nobody ever got any real pipe work done.
 
 Pipes were an integral part of Unix way before that time. And still, the
 Microsoft sales "persons" made all idiots believe Microsoft freaking
 *invented* the concept. (I've actually witnessed this in trade shows.)
I imagine that was because DOS couldn't run more than a single program. -- Daniel
May 17 2009
prev sibling parent "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
Georg Wrede wrote:

 Just an example: MS-DOS had pipes already when regular PCs didn't have  
 hard disks. One could watch grand demos in trade shows, where the guy  
 piped stuff to sort, to find (their sorry version of grep), to more, and  
 to custom made filters.

 What nobody told the user (until he had already bought a PC with MSDOS,  
 and he had tried to actually use the feature, unsuccessfully, and then  
 called the $10-a-minute hotline), is that the pipes were implemented so  
 that the first program writes the entire output into a temporary file on  
 the floppy, and once it has finished running, the next program then  
 opens the file as input.

 Now, with the 0.00036 GB floppies of the day, it's not hard to see why  
 nobody ever got any real pipe work done.
I guess it was just a pipe dream, then. :p -- Simen
May 18 2009
prev sibling parent Georg Wrede <georg.wrede iki.fi> writes:
Michiel Helvensteijn wrote:
 Daniel Keep wrote:
 
 http://thedailywtf.com/Articles/The_Complicator_0x27_s_Gloves.aspx
Heh, that's a funny story. I didn't know that one. (I do think the sensible developer was a bit of a killjoy, though.) But how about coming up with some actual arguments here? What are the `gloves' in this situation? Don't say `modules' without explaining how they offer the same advantages as tags. I can imagine a similar discussion having taken place when someone thought of higher-level control-flow constructs (you know, `if' and `while'). I imagine most people were perfectly happy with goto's and conditional jumps.
I used to work in a software house where half of the projects contained at least one sub project of the Glove kind. The funniest thing is, these folks didn't understand the notion of Gloves, even when it was explained to them. (Of course, at the time I didn't have the word "Gloves", nor Wikipedia or www.c2.com to help me.)
May 15 2009
prev sibling parent Leandro Lucarella <llucax gmail.com> writes:
dsimcha, on May 14 at 19:50 you wrote to me:
 == Quote from Michiel Helvensteijn (m.helvensteijn.remove gmail.com)'s article
 Michiel Helvensteijn wrote:
 Your std.string.find() may carry the `algorithm' tag and the `string' tag.
 So perhaps if both (or either?) of those tags are imported into a project,
 the function would become available.
Hm.. I suppose a project could import any Boolean combination of tags:

import algorithm & string;
import algorithm | string;
import algorithm & !string;

And if modules were gone, I suppose you'd want to tag all your standard-library functions with `std' as well. A bit too radical a change, I suppose. But I believe I will think about this some more. What do you think?
This seems like overkill. Module/package/import/whatever management should not require a Ph.D. in Boolean logic. It should be dead simple (i.e. like the current system is, esp. after 314 gets fixed) and allow people to get on with solving real problems. Yes, namespace pollution is annoying, but so is a ridiculously fine-grained import system. After all, the whole point of a module system is to avoid namespace pollution. Otherwise, it would make sense to just import every darn module in every import path implicitly. Sometimes, when people get ridiculously crazy with hierarchical import structure and stuff, I feel like just having a more polluted namespace is the lesser of two evils, especially in D, where naming collision resolution is well-defined and sane.
I think the module system of D is pretty good, but there are a few things that can be improved (besides bugs =).

As suggested in the path, I think the default should be static imports (just as it's safest to default globals to be threadlocal, it's safer, or clearer for a code reviewer, to use static imports by default). If you want to import all, something like import module: *; can be added (like __gshared ;).

There are a few other shortcomings that I don't remember now that can be fixed (I think modules can't have the same name as the package they are in or something).

It would be nice to be able to put symbols in a package (a la Python). This can be easily done by allowing a .d file and a directory with the same name. For example:

x/y.d
    module x.y;
    void f() {}
    void g() {}

x.d
    module x;
    import x.y: g;

z.d
    import x;
    x.f(); // error
    x.g(); // ok

This allows some kind of "private modules", leaving the public interface in the package itself. Without this you have to use a "nested main module":

z.d
    import x.y;
    x.y.g(); // why does the user have to know about the details of package x?

This last one maybe is more a matter of taste than a real issue...

-- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
May 14 2009