digitalmars.D - std.string and std.algorithm: what to do?
- Andrei Alexandrescu (13/13) May 14 2009 I'm not sure what needs to be done about the combo string + algorithm.
- Steven Schveighoffer (6/10) May 14 2009 Ech, returning -1 is the bane of Java and C#. The return value should b...
- Daniel Keep (6/20) May 14 2009 int idx = find(str);
- Andrei Alexandrescu (8/30) May 14 2009 I think Steve's point is that often you want to do something like:
- Steven Schveighoffer (5/32) May 14 2009 Yes, and to Daniels point, I don't see a drawback. Typing str.length vs...
- grauzone (16/53) May 14 2009 But most time, you want to know both _if_ something was found, and
- Michel Fortin (33/34) May 16 2009 Hum, wouldn't this be more intuitive:
- Denis Koroskin (2/33) May 16 2009 Interesting idea! I like it.
- Walter Bright (3/5) May 14 2009 It's not -1, it's uint.max. The reason for returning that value is that
- Steven Schveighoffer (14/19) May 14 2009 Pretty sure uint.max - 1 is also invalid (as well as a host of other
- Derek Parnell (18/23) May 14 2009 A problem with .length is that it requires the knowledge of the string f...
- Steven Schveighoffer (12/34) May 14 2009 Not really. What could funcA possibly do with the index without the
- Derek Parnell (7/22) May 14 2009 Who said that I had control of how funcA() was implemented?
- Steven Schveighoffer (6/25) May 15 2009 Then I guess the rebuttal to that is, why should we make the design of
- Georg Wrede (7/27) May 15 2009 Well, if you wrote
- Derek Parnell (6/15) May 15 2009 Thank you, Georg. That is a conclusion I would never have come to.
- dsimcha (6/12) May 14 2009 I like the idea of automatically specializing on strings, at least on th...
- Andrei Alexandrescu (3/16) May 14 2009 Cool! So then how do I rename find, ifind, rfind, and irfind in std.stri...
- John C (2/5) May 14 2009 indexOf(bool ignoreCase = false), lastIndexOf(bool ignoreCase = false).
- Lionello Lunesu (6/12) May 14 2009 Please, an enum instead of bool... It's not apparent what indexOf(true) ...
- Andrei Alexandrescu (4/19) May 14 2009 Yah, I defined
- Benji Smith (11/14) May 16 2009 Minor nitpick: there are lots of different ways to canonicalize text
- Georg Wrede (8/26) May 16 2009 True, as implemented, case sensitive functions only work on true ASCII
- Rainer Deyke (10/13) May 16 2009 D uses unicode, period, so D string functions should use unicode, always...
- Steve Teale (3/21) May 14 2009 Maybe you should rename the new stuff - indexOf, etc. Why break existing...
- Andrei Alexandrescu (5/31) May 14 2009 That's the gist of the problem. The new functions don't return indexOf,
- Jacob Carlborg (9/32) May 14 2009 Like others I think the names should be:
- Steven Schveighoffer (11/23) May 14 2009 find -> indexOf
- Michiel Helvensteijn (10/17) May 14 2009 You gave me an idea there. Perhaps hierarchical modules are a bit outdat...
- Michiel Helvensteijn (11/14) May 14 2009 Hm.. I suppose a project could import any Boolean combination of tags:
- dsimcha (11/23) May 14 2009 This seems like overkill. Module/package/import/whatever management sho...
- Michiel Helvensteijn (16/25) May 14 2009 Hey, just because the system is available, doesn't mean that every
- Lutger (3/36) May 14 2009 It has it's advantages, but misses the simplicity of matching symbols to...
- Michiel Helvensteijn (8/11) May 14 2009 I believe most filesystems are ready for this. You could probably match ...
- Daniel Keep (2/14) May 14 2009 http://thedailywtf.com/Articles/The_Complicator_0x27_s_Gloves.aspx
- Michiel Helvensteijn (11/12) May 15 2009 Heh, that's a funny story. I didn't know that one. (I do think the sensi...
- Daniel Keep (31/45) May 15 2009 You're proposing taking a simple, easy to understand module system and
- Simen Kjaeraas (9/10) May 16 2009 This is not true. NTFS 5.0+ (Windows 2000+) has support for symbolic
- Daniel Keep (10/25) May 16 2009 You can't claim it supports them when the shell itself cannot create
- Georg Wrede (35/47) May 16 2009 "and they work" may be somewhat optimistic.
- Daniel Keep (3/18) May 17 2009 I imagine that was because DOS couldn't run more than a single program.
- Simen Kjaeraas (4/16) May 18 2009 I guess it was just a pipe dream, then. :p
- Georg Wrede (6/20) May 15 2009 I used to work in a software house where half of the projects contained
- Leandro Lucarella (37/62) May 14 2009 I think the module system of D is pretty good, but there are a few thing...
I'm not sure what needs to be done about the combo string + algorithm. There's quite some overlap, and also functions that have the same name in both modules (e.g. find()), which forces you to disambiguate. Should std.algorithm automatically recognize strings and proceed accordingly, should it just consider them straight arrays and leave everything else to std.string (risky!), or refuse to handle strings?

Also, I dislike the signature int find() that returns -1 if not found. Time and again experience shows that find() returning a range is much better in real code because it works seamlessly when the substring/element was not found (no more need for an extra test!). So I want to rename std.string.find() into something like findIndex or indexOf. But then we also have ifind, rfind, irfind. Ideas for renaming?

Andrei
May 14 2009
On Thu, 14 May 2009 09:55:08 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> Also, I dislike the signature int find() that returns -1 if not found. Time and again experience shows that find() returning a range is much better in real code because it works seamlessly when the substring/element was not found (no more need for an extra test!)

Ech, returning -1 is the bane of Java and C#. The return value should be the end of the string, regardless of whether it is a range or index. I could never understand the mentality of returning -1 versus end of string.

-Steve
May 14 2009
Steven Schveighoffer wrote:
> The return value should be the end of the string, regardless of whether it is a range or index. I could never understand the mentality of returning -1 versus end of string.

    int idx = find(str);
    if( idx == -1 ) doStuff;

- or -

    if( idx == str.length ) doStuff;

-- Daniel
May 14 2009
Daniel Keep wrote:
> int idx = find(str);
> if( idx == -1 ) doStuff;
> - or -
> if( idx == str.length ) doStuff;

I think Steve's point is that often you want to do something like:

    auto before = str[0 .. find(str, something)];

or

    auto after = str[find(str, something) .. $];

which behave nicely at limit conditions (when something isn't found). In contrast, the version with -1 needs a separate variable and a test - a mess.

Andrei
May 14 2009
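The contrast Andrei describes can be sketched as follows. This is only an illustration under stated assumptions: `findOrLength` is a hypothetical helper (not a Phobos function) that returns the string's length on failure, built here on `std.string.indexOf`, which returns -1 when nothing is found.

```d
import std.string : indexOf;

// Hypothetical helper (not part of Phobos): returns s.length when needle is absent.
size_t findOrLength(string s, string needle)
{
    immutable i = s.indexOf(needle);
    return i < 0 ? s.length : cast(size_t) i;
}

void main()
{
    string str = "key=value";

    // Length-on-failure: the result can be used directly as a slice bound.
    assert(str[0 .. findOrLength(str, "=")] == "key");
    assert(str[findOrLength(str, "=") .. $] == "=value");
    assert(str[0 .. findOrLength(str, "#")] == "key=value"); // not found: whole string
    assert(str[findOrLength(str, "#") .. $] == "");          // not found: empty tail

    // -1-on-failure: a temporary and an explicit test are needed before slicing.
    immutable idx = str.indexOf("#");
    string before = idx == -1 ? str : str[0 .. idx];
    assert(before == "key=value");
}
```

With the length convention both slices degrade gracefully at the limit; with -1 the caller has to branch before slicing.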
On Thu, 14 May 2009 11:06:44 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> I think Steve's point is that often you want to do something like:
>     auto before = str[0 .. find(str, something)];
> or
>     auto after = str[find(str, something) .. $];
> which behave nicely at limit conditions (when something isn't found). In contrast, the version with -1 needs a separate varible and a test - a mess.

Yes, and to Daniel's point, I don't see a drawback. Typing str.length vs -1 is really not that bad :)

-Steve
May 14 2009
Andrei Alexandrescu wrote:
> I think Steve's point is that often you want to do something like:
>     auto before = str[0 .. find(str, something)];
> or
>     auto after = str[find(str, something) .. $];
> which behave nicely at limit conditions (when something isn't found). In contrast, the version with -1 needs a separate varible and a test - a mess.

But most of the time, you want to know both _if_ something was found, and where. Returning the length of the string makes checking if something was found harder. That's also quite a mess.

Maybe a good way would be to return a pair of slices, before and after something was found (ignoring that there are no MRVs):

    (char[], char[]) myfind(char[] str, char[] tofind) {
        int index = find(str, tofind); //returns str.length if not found
        return str[0..index], str[index..$];
    }

This would give the user lots of flexibility, without having to declare useless temporaries, especially not for the first argument passed to myfind().

For example: was something found?

    bool found = myfind(bla, blubb)[1].length;

(That's still more elegant than to compare lengths.)
May 14 2009
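Since D has no multiple return values, grauzone's pseudocode can be approximated with a tuple. The following is a rough sketch, assuming `std.typecons.Tuple` as the pair type and `std.string.indexOf` as the underlying search; the name `myfind` is taken from the post, everything else is illustrative.

```d
import std.string : indexOf;
import std.typecons : Tuple, tuple;

// Sketch of the "pair of slices" idea: (text before the match, match and the rest).
// When tofind is absent, the second slice is empty.
Tuple!(string, string) myfind(string str, string tofind)
{
    immutable i = str.indexOf(tofind);
    immutable index = i < 0 ? str.length : cast(size_t) i;
    return tuple(str[0 .. index], str[index .. $]);
}

void main()
{
    auto hit = myfind("hello world", "wor");
    assert(hit[0] == "hello ");
    assert(hit[1] == "world");

    auto miss = myfind("hello world", "xyz");
    assert(miss[0] == "hello world");
    assert(miss[1].length == 0);

    // "Was something found?" without comparing against the haystack length.
    bool found = myfind("hello world", "wor")[1].length != 0;
    assert(found);
}
```

Note that an empty `tofind` would also yield a non-empty second slice here, so the emptiness test only answers "was it found" for non-empty needles.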
On 2009-05-14 11:06:44 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:
> auto before = str[0 .. find(str, something)];

Hum, wouldn't this be more intuitive:

    auto before = str.before(str.find(something));

That could work by having "find" return a range over the searched string. At first glance having find(str, "def") returning "def" sounds silly and unhelpful, but since the returned "def" is a range over the original, you can do a lot of interesting things with it, such as:

    auto str = "abcdefghi";
    auto strSlice = str.find("def");        // strSlice == "def"
    auto strBefore = str.before(strSlice);  // strBefore == "abc"
    auto strAfter = str.after(strSlice);    // strAfter == "ghi"

An interesting thing is that it makes a lot of sense with a regex argument in find:

    auto str = "abcdefghi";
    auto strSlice = str.find(regex("c.*g")); // strSlice = "cdefg"
    auto strBefore = str.before(strSlice);   // strBefore = "ab"
    auto strAfter = str.after(strSlice);     // strAfter = "hi"

And you can modify the result in place:

    auto str = "hello world";
    str.find("world")[] = "range";
    assert(str == "hello range");

... although that's somewhat limited since slice[] = x can't change the number of characters in the slice.

The "not found" result would be the end of the string: str[$..$].

Getting the position of the found string would be done like this:

    auto str = "abcdefhij";
    auto pos = str.before(str.find("def")).length; // pos == 3

But I guess having "findPos" or "indexOf" for that would be more readable.

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
May 16 2009
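Michel's proposal can be tried out with a few free functions. This is a sketch under assumptions, not an existing API: `findSlice`, `before`, and `after` are hypothetical names (the post calls the first one `find`, renamed here to avoid confusion with `std.algorithm`'s `find`), and `before`/`after` rely on the second argument being a slice of the first, which they locate by pointer offset.

```d
import std.string : indexOf;

// Hypothetical: returns the matching slice of s, or the empty slice s[$ .. $] if absent.
string findSlice(string s, string needle)
{
    immutable i = s.indexOf(needle);
    return i < 0 ? s[$ .. $] : s[i .. i + needle.length];
}

// Hypothetical: the part of s that precedes slice. slice must be a slice of s.
string before(string s, string slice)
{
    assert(slice.ptr >= s.ptr && slice.ptr + slice.length <= s.ptr + s.length);
    immutable offset = cast(size_t)(slice.ptr - s.ptr);
    return s[0 .. offset];
}

// Hypothetical: the part of s that follows slice. slice must be a slice of s.
string after(string s, string slice)
{
    assert(slice.ptr >= s.ptr && slice.ptr + slice.length <= s.ptr + s.length);
    immutable offset = cast(size_t)(slice.ptr - s.ptr);
    return s[offset + slice.length .. $];
}

void main()
{
    string str = "abcdefghi";
    auto strSlice = str.findSlice("def");
    assert(strSlice == "def");
    assert(str.before(strSlice) == "abc");
    assert(str.after(strSlice) == "ghi");
    assert(str.before(strSlice).length == 3); // position of the match

    // Not found: the empty slice at the end, so "before" is the whole string.
    auto miss = str.findSlice("xyz");
    assert(miss.length == 0);
    assert(str.before(miss) == str);
    assert(str.after(miss) == "");
}
```

The in-place assignment from the post would additionally need a mutable `char[]`, since D2 `string` is `immutable(char)[]`.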
On Sat, 16 May 2009 14:31:04 +0400, Michel Fortin <michel.fortin michelf.com> wrote:
> Hum, wouldn't this be more intuitive:
>     auto before = str.before(str.find(something));
> That could work by having "find" return a range over the searched string.
> [...]

Interesting idea! I like it.
May 16 2009
Steven Schveighoffer wrote:
> I could never understand the mentality of returning -1 versus end of string.

It's not -1, it's uint.max. The reason for returning that value is that it is the only integer value that can never be a valid index.
May 14 2009
On Thu, 14 May 2009 16:51:27 -0400, Walter Bright <newshound1 digitalmars.com> wrote:
> It's not -1, it's uint.max. The reason for returning that value is that it is the only integer value that can never be a valid index.

Pretty sure uint.max - 1 is also invalid (as well as a host of other values) ;) but in any case, it's much less useful than str.length, as you can't directly use it, you simply have to check for it. If all you ever do is check to see if it's a certain value, why does it have to be uint.max?

I guess I understand the mentality of picking -1 or uint.max as a reasonable "failure" return, but what I don't understand is why you would do so when you could use the length of the string, which is useful for many other purposes. -1 or uint.max just is a failure, and needs to be checked, it can't be used to do any slicing operations. I always hated doing string processing with Java vs. C++ because of this.

-Steve
May 14 2009
On Thu, 14 May 2009 16:59:32 -0400, Steven Schveighoffer wrote:
> I guess I understand the mentality of picking -1 or uint.max as a reasonable "failure" return, but what I don't understand is why you would do so when you could use the length of the string, which is useful for many other purposes. -1 or uint.max just is a failure, and needs to be checked, it can't be used to do any slicing operations.

A problem with .length is that it requires the knowledge of the string for it to be meaningful. If find() can return any integer, we can only know if that integer represents WasFound or WasNotFound if we have the string in context.

Currently we can do this ...

    funcA( find(needle, haystack), xyzzy);

and 'funcA' doesn't need to know anything more about the 'haystack' to work. If find/indexOf() can return any integer we would have to code ...

    funcA( indexOf(needle, haystack) != haystack.length ? uint.max : 0, xyzzy);

(assuming that funcA is expecting uint.max to mean WasNotFound).

So there are really arguments for employing both methods. It just depends on why you are trying to 'find' stuff.

--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
May 14 2009
On Thu, 14 May 2009 17:21:02 -0400, Derek Parnell <derek psych.ward> wrote:
> A problem with .length is that it requires the knowledge of the string for it to be meaningful. If find() can return any integer, we can only know if that integer represents WasFound or WasNotFound if we have the string in context.
> Currently we can do this ...
>     funcA( find(needle, haystack), xyzzy);
> and 'funcA' doesn't need to know anything more about the 'haystack' to work.

Not really. What could funcA possibly do with the index without the string itself? If it's just a flag (uint.max or not), then funcA should be:

    funcA(bool found, ...)

and you call it with

    funcA(find(needle, haystack) < haystack.length, xyzzy)

This doesn't cause any problems with people who use Tango, which returns the length if not found. In other words, if you find yourself writing code to "morph" the length into uint.max or -1, you are thinking about the problem incorrectly.

-Steve
May 14 2009
On Thu, 14 May 2009 17:33:40 -0400, Steven Schveighoffer wrote:
> Not really. What could funcA possibly do with the index without the string itself? If it's just a flag (uint.max or not), then funcA should be:
>     funcA(bool found, ...)
> and you call it with
>     funcA(find(needle, haystack) < haystack.length, xyzzy)

Who said that I had control of how funcA() was implemented?

--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
May 14 2009
On Thu, 14 May 2009 18:15:01 -0400, Derek Parnell <derek psych.ward> wrote:
> Who said that I had control of how funcA() was implemented?

Then I guess the rebuttal to that is, why should we make the design of std.string suffer to support a poorly designed legacy function? It's easy enough to write a wrapper around the properly designed string function to return what you wish.

-Steve
May 15 2009
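The wrapper Steven has in mind could look roughly like this. Both function names are hypothetical, and the "legacy" convention is assumed to be size_t.max on failure (the thread mentions uint.max; size_t.max is its analogue on 64-bit targets).

```d
import std.string : indexOf;

// Hypothetical "length on failure" search, standing in for the proposed design.
size_t findOrLength(string haystack, string needle)
{
    immutable i = haystack.indexOf(needle);
    return i < 0 ? haystack.length : cast(size_t) i;
}

// Hypothetical wrapper for a legacy callee that expects size_t.max to mean "not found".
size_t legacyIndex(string haystack, string needle)
{
    immutable i = findOrLength(haystack, needle);
    return i == haystack.length ? size_t.max : i;
}

void main()
{
    assert(legacyIndex("haystack", "st") == 3);
    assert(legacyIndex("haystack", "zz") == size_t.max);
}
```

The adaptation lives at the single call site that needs it, which is the trade-off being argued for: the library keeps the sliceable return value and the legacy convention is recovered in one line.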
Derek Parnell wrote:
> Who said that I had control of how funcA() was implemented?

Well, if you wrote

    funcA(find(needle, haystack) < haystack.length, xyzzy)

then it is you, who decided to use funcA. Now, if that wasn't to your liking, you'd of course write your own funcA (or use another function), that works as you want. Therefore, you're "in control of funcA() here".
May 15 2009
On Sat, 16 May 2009 04:36:29 +0300, Georg Wrede wrote:
> Well, if you wrote
>     funcA(find(needle, haystack) < haystack.length, xyzzy)
> then it is you, who decided to use funcA. Now, if that wasn't to your liking, you'd of course write your own funcA (or use another function), that works as you want. Therefore, you're "in control of funcA() here".

Thank you, Georg. That is a conclusion I would never have come to.

--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
May 15 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
> I'm not sure what needs to be done about the combo string + algorithm. There's quite some overlap, and also functions that have the same name in both modules (e.g. find()), which forces you to disambiguate. Should std.algorithm automatically recognize strings and proceed accordingly, should it just consider them straight arrays and leave everything else to std.string (risky!), or refuse to handle strings?

I like the idea of automatically specializing on strings, at least on the surface. It's less to remember for the programmer, less annoying naming collisions, and stuff "just works". You get your generic std.algorithm stuff when you need it and your more optimized/variable length character encoding stuff when you need it without having to explicitly specify which one you want.
May 14 2009
dsimcha wrote:
> I like the idea of automatically specializing on strings, at least on the surface. It's less to remember for the programmer, less annoying naming collisions, and stuff "just works". You get your generic std.algorithm stuff when you need it and your more optimized/variable length character encoding stuff when you need it without having to explicitly specify which one you want.

Cool! So then how do I rename find, ifind, rfind, and irfind in std.string?

Andrei
May 14 2009
Andrei Alexandrescu Wrote:
> Cool! So then how do I rename find, ifind, rfind, and irfind in std.string?

indexOf(bool ignoreCase = false), lastIndexOf(bool ignoreCase = false).
May 14 2009
"John C" <johnch_atms hotmail.com> wrote in message news:guhel3$18kk$1 digitalmars.com...
> indexOf(bool ignoreCase = false), lastIndexOf(bool ignoreCase = false).

Please, an enum instead of bool... It's not apparent what indexOf(true) is supposed to do when you encounter it. Yes, named parameters would solve this :)

L.
May 14 2009
Lionello Lunesu wrote:
> Please, an enum instead of bool... It's not apparent what indexOf(true) is supposed to do when you encounter it. Yes, named parameters would solve this :)

Yah, I defined

    enum CaseSensitive { no, yes }

Andrei
May 14 2009
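To make the readability argument concrete, here is a small sketch of a search function taking the enum Andrei mentions. The enum definition is from his post; the function `indexOfSketch` and its implementation (lower-casing both operands with `std.uni.toLower`) are assumptions for illustration only, not the Phobos implementation.

```d
import std.string : indexOf;
import std.uni : toLower;

enum CaseSensitive { no, yes }

// Hypothetical search function; returns -1 when sub is not found.
// Caveat: lower-casing can change the encoded length of some non-ASCII text,
// so the returned index is only reliable here for ASCII input.
ptrdiff_t indexOfSketch(string s, string sub, CaseSensitive cs = CaseSensitive.yes)
{
    if (cs == CaseSensitive.yes)
        return s.indexOf(sub);
    return s.toLower.indexOf(sub.toLower);
}

void main()
{
    // The call site states its intent, unlike indexOf(s, sub, true).
    assert(indexOfSketch("Hello World", "world", CaseSensitive.no) == 6);
    assert(indexOfSketch("Hello World", "world", CaseSensitive.yes) == -1);
    assert(indexOfSketch("Hello World", "World") == 6);
}
```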
Andrei Alexandrescu wrote:
> Yah, I defined
>     enum CaseSensitive { no, yes }

Minor nitpick: there are lots of different ways to canonicalize text before performing a comparison. Ascii case conversions are just one way. Instead of an enum with a yes/no value, what about future-proofing it with something more along the lines of...

    enum CaseSensitivity { None, Ascii, UnicodeChar, UnicodeSurrogatePair }

...or something like that. The yes/no enum will outlive its usefulness before long.

--benji
May 16 2009
Benji Smith wrote:
> Minor nitpick: there are lots of different ways to canonicalize text before performing a comparison. Ascii case conversions are just one way. Instead of an enum with a yes/no value, what about future-proofing it with something more along the lines of...
>     enum CaseSensitivity { None, Ascii, UnicodeChar, UnicodeSurrogatePair }
> ...or something like that. The yes/no enum will outlive its usefulness before long.

True, as implemented, case sensitive functions only work on true ASCII strings in D. And I hope we don't even try to fix this universally, because the corner cases involved (including accents in separate entities) simply are too much to handle. And the only way to try to handle them, while keeping the code fast, is to first examine the string for any non-ASCII stuff, and then have two separate case functions for each usage.
May 16 2009
Benji Smith wrote:
> enum CaseSensitivity { None, Ascii, UnicodeChar, UnicodeSurrogatePair }

D uses unicode, period, so D string functions should use unicode, always. (What's the difference between UnicodeChar and UnicodeSurrogatePair anyway? Unicode is unicode. wchar/UTF-16 uses surrogate pairs, char/UTF-8 and dchar/UTF-32 do not.)

These options don't address the real issue, which is that different languages have different rules. Which is the uppercase version of 'i', 'I' or 'İ'? Which is the lowercase version of 'I', 'i' or 'ı'?

--
Rainer Deyke - rainerd eldwood.com
May 16 2009
Andrei Alexandrescu Wrote:
> Cool! So then how do I rename find, ifind, rfind, and irfind in std.string?

Maybe you should rename the new stuff - indexOf, etc. Why break existing code that works with D1 and D2 and tests for -1? I think that find may be a rather widely used function.

Steve
May 14 2009
Steve Teale wrote:
> Maybe you should rename the new stuff - indexOf, etc. Why break existing code that works with D1 and D2 and tests for -1. I think that find may be a rather widely used function.

That's the gist of the problem. The new functions don't return an index; the old ones do. I'd rather define good names and break things now, than stay with bad names forever.

Andrei
May 14 2009
Andrei Alexandrescu wrote:
> Cool! So then how do I rename find, ifind, rfind, and irfind in std.string?

Like others I think the names should be:

    find -> indexOf
    rfind -> lastIndexOf

but I think the declaration should look like this:

    size_t indexOf (string s, dchar d, size_t start = 0)

perhaps an argument for case sensitive also:

    size_t indexOf (string s, dchar d, size_t start = 0, bool caseSensitive = true)
May 14 2009
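One use for the `start` parameter in Jacob's proposed signature is walking over every occurrence. A minimal sketch follows, with assumptions spelled out: `indexOfFrom` is a hypothetical name, and because the post does not say what the function returns on failure, this version returns a signed -1 rather than `size_t`.

```d
import std.string : indexOf;

// Hypothetical: index of d in s at or after byte offset start, or -1 if absent.
ptrdiff_t indexOfFrom(string s, dchar d, size_t start = 0)
{
    if (start > s.length)
        return -1;
    immutable i = s[start .. $].indexOf(d);
    return i < 0 ? -1 : cast(ptrdiff_t)(start + i);
}

void main()
{
    string s = "a,b,,c";
    size_t[] hits;

    // Resume each search one position past the previous hit.
    for (auto i = indexOfFrom(s, ','); i != -1; i = indexOfFrom(s, ',', i + 1))
        hits ~= cast(size_t) i;

    size_t[] expected = [1, 3, 4];
    assert(hits == expected);
}
```

Stepping by one is only correct here because ',' is a single code unit; a multi-byte character would need the step to match its encoded length.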
On Thu, 14 May 2009 09:55:08 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> So I want to rename std.string.find() into something like findIndex or indexOf. But then we also have ifind, rfind, irfind. Ideas for renaming?

    find -> indexOf
    rfind -> lastIndexOf
    ifind, irfind -> do we care about renaming? They kind of seem niche anyways.

another alternative, leave ifind and irfind deprecated for a while and redefine indexOf to be e.g.:

    indexOf(string s, dchar d, bool caseSensitive = true)

-Steve
May 14 2009
Andrei Alexandrescu wrote:
> I'm not sure what needs to be done about the combo string + algorithm. There's quite some overlap, and also functions that have the same name in both modules (e.g. find()), which forces you to disambiguate. Should std.algorithm automatically recognize strings and proceed accordingly, should it just consider them straight arrays and leave everything else to std.string (risky!), or refuse to handle strings?

You gave me an idea there. Perhaps hierarchical modules are a bit outdated? Perhaps a modern programming language should instead work with a system of tags, since a function/class/entity may belong to more than one group. I too hate making a decision like that. So don't.

Your std.string.find() may carry the `algorithm' tag and the `string' tag. So perhaps if both (or either?) of those tags are imported into a project, the function would become available.

--
Michiel Helvensteijn
May 14 2009
Michiel Helvensteijn wrote:
> Your std.string.find() may carry the `algorithm' tag and the `string' tag. So perhaps if both (or either?) of those tags are imported into a project, the function would become available.

Hm.. I suppose a project could import any Boolean combination of tags:

    import algorithm & string;
    import algorithm | string;
    import algorithm & !string;

And if modules were gone, I suppose you'd want to tag all your standard-library functions with `std' as well.

A bit too radical a change, I suppose. But I believe I will think about this some more. What do you think?

--
Michiel Helvensteijn
May 14 2009
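The Boolean tag imports proposed above are not valid D, but the selection they describe can be modelled in ordinary code. The following is only a sketch of one possible reading of the proposal: the `Symbol` struct, the `has` and `select` helpers, and the tag assignments are all invented for this illustration.

```d
import std.algorithm.searching : canFind;

// Hypothetical model: every library symbol carries a list of tags.
struct Symbol
{
    string name;
    string[] tags;
}

// True if the symbol carries the given tag.
bool has(Symbol s, string tag)
{
    return s.tags.canFind(tag);
}

// Names of all symbols accepted by a tag predicate, i.e. what a
// tag-based "import" expression would bring into scope.
string[] select(Symbol[] library, bool function(Symbol) accepts)
{
    string[] names;
    foreach (s; library)
        if (accepts(s))
            names ~= s.name;
    return names;
}

// Made-up tag assignments for three familiar function names.
Symbol[] sampleLibrary()
{
    return [
        Symbol("find", ["std", "algorithm", "string"]),
        Symbol("sort", ["std", "algorithm"]),
        Symbol("toUpper", ["std", "string"]),
    ];
}

void main()
{
    auto lib = sampleLibrary();
    // import algorithm & string;
    assert(select(lib, s => s.has("algorithm") && s.has("string")) == ["find"]);
    // import algorithm | string;
    assert(select(lib, s => s.has("algorithm") || s.has("string"))
        == ["find", "sort", "toUpper"]);
    // import algorithm & !string;
    assert(select(lib, s => s.has("algorithm") && !s.has("string")) == ["sort"]);
}
```

Under this reading, a tag expression is just a predicate over symbols, which also shows why the set of imported names is not visible from the import line alone.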
== Quote from Michiel Helvensteijn (m.helvensteijn.remove gmail.com)'s article
Michiel Helvensteijn wrote:
This seems like overkill. Module/package/import/whatever management should not require a Ph.D. in Boolean logic. It should be dead simple (i.e. like the current system is, esp. after 314 gets fixed) and allow people to get on with solving real problems. Yes, namespace pollution is annoying, but so is a ridiculously fine-grained import system. After all, the whole point of a module system is to avoid namespace pollution. Otherwise, it would make sense to just import every darn module in every import path implicitly. Sometimes, when people get ridiculously crazy with hierarchical import structure and stuff, I feel like just having a more polluted namespace is the lesser of two evils, especially in D, where naming collision resolution is well-defined and sane.
Your std.string.find() may carry the `algorithm' tag and the `string' tag. So perhaps if both (or either?) of those tags are imported into a project, the function would become available.
Hm.. I suppose a project could import any Boolean combination of tags: import algorithm & string; import algorithm | string; import algorithm & !string; And if modules were gone, I suppose you'd want to tag all your standard-library functions with `std' as well. A bit too radical a change, I suppose. But I believe I will think about this some more. What do you think?
May 14 2009
dsimcha wrote:
Hey, just because the system is available, doesn't mean that every programmer should use it. For most people, imports would become even simpler than they are now: import string; would import every symbol with the `string' tag. If you only want the std.string functions, import std & string; or even import std.string; would do. And no one would notice the difference, except those Boolean PhD types who want to use the system to its fullest. ;-) Mostly it would make life easier for the library writers. Is std.string.find() a string function xor an algorithm? Nope. It's both. -- Michiel Helvensteijn
Hm.. I suppose a project could import any Boolean combination of tags: import algorithm & string; import algorithm | string; import algorithm & !string;
This seems like overkill. Module/package/import/whatever management should not require a Ph.D. in Boolean logic. It should be dead simple (i.e. like the current system is, esp. after 314 gets fixed) and allow people to get on with solving real problems.
May 14 2009
Michiel Helvensteijn wrote:
dsimcha wrote:
It has its advantages, but misses the simplicity of matching symbols to the filesystem. You will also become more and more reliant on (specialized) tools with these kinds of systems, like .NET.
Hey, just because the system is available, doesn't mean that every programmer should use it. For most people, imports would become even simpler than they are now: import string; would import every symbol with the `string' tag. If you only want the std.string functions, import std & string; or even import std.string; would do. And no one would notice the difference, except those Boolean PhD types who want to use the system to its fullest. ;-) Mostly it would make life easier for the library writers. Is std.string.find() a string function xor an algorithm? Nope. It's both.
Hm.. I suppose a project could import any Boolean combination of tags: import algorithm & string; import algorithm | string; import algorithm & !string;
This seems like overkill. Module/package/import/whatever management should not require a Ph.D. in Boolean logic. It should be dead simple (i.e. like the current system is, esp. after 314 gets fixed) and allow people to get on with solving real problems.
May 14 2009
Lutger wrote: It has its advantages, but misses the simplicity of matching symbols to the filesystem. You will also become more and more reliant on (specialized) tools with these kinds of systems, like .NET.
I believe most filesystems are ready for this. You could probably match the tags to the filesystem if libraries install symbolic links. The library itself could have a hard link to the files (own the files) and the tag directories could have symbolic links. I haven't worked it out exactly, but it could be quite elegant. -- Michiel Helvensteijn
May 14 2009
Michiel Helvensteijn wrote:
Lutger wrote:
http://thedailywtf.com/Articles/The_Complicator_0x27_s_Gloves.aspx
It has its advantages, but misses the simplicity of matching symbols to the filesystem. You will also become more and more reliant on (specialized) tools with these kinds of systems, like .NET.
I believe most filesystems are ready for this. You could probably match the tags to the filesystem if libraries install symbolic links. The library itself could have a hard link to the files (own the files) and the tag directories could have symbolic links. I haven't worked it out exactly, but it could be quite elegant.
May 14 2009
Daniel Keep wrote: http://thedailywtf.com/Articles/The_Complicator_0x27_s_Gloves.aspx
Heh, that's a funny story. I didn't know that one. (I do think the sensible developer was a bit of a killjoy, though.) But how about coming up with some actual arguments here? What are the `gloves' in this situation? Don't say `modules' without explaining how they offer the same advantages as tags. I can imagine a similar discussion having taken place when someone thought of higher-level control-flow constructs (you know, `if' and `while'). I imagine most people were perfectly happy with goto's and conditional jumps. -- Michiel Helvensteijn
May 15 2009
Michiel Helvensteijn wrote:
Daniel Keep wrote:
You're proposing taking a simple, easy to understand module system and replacing it with a poor version of Google. Your idea requires that all modules be registered with the compiler somehow since there's now no correlation between module and file. Then you go on to suggest that you could use the filesystem, something which is blatantly not possible: WinXP doesn't support symbolic links. And this still means you can't simply drop a folder with source files into a project: you've got to screw around setting up symbolic links. Even then, what exactly does this get you? The ability to say: import std & string; Now you have absolutely NO idea what you've just imported. You can't possibly look it up because anything could silently be included in that. And what happens when two libraries define two conflicting symbols? Now you've got to have some sort of disambiguation system. The tag system is the complicator's gloves: you're solving a problem which already has a simple working solution with the most complex system possible for no discernible benefit. Tags are great for when you have only a vague idea of what you're looking for. This might be pretty useful for searching docs, but it's completely unsuitable for actual code: you have to know what a function is called and what its arguments are before you call it. In order to find that out, you look up its documentation which ALSO tells you what module it's in. And once you know what module a function's in, the tag system becomes completely superfluous and counter-productive.
http://thedailywtf.com/Articles/The_Complicator_0x27_s_Gloves.aspx
Heh, that's a funny story. I didn't know that one. (I do think the sensible developer was a bit of a killjoy, though.) But how about coming up with some actual arguments here? What are the `gloves' in this situation? Don't say `modules' without explaining how they offer the same advantages as tags.
I can imagine a similar discussion having taken place when someone thought of higher-level control-flow constructs (you know, `if' and `while'). I imagine most people were perfectly happy with goto's and conditional jumps.
You must be joking. Unstructured programming was a complete mess. You're actually proposing taking a structured system and replacing it with an unstructured one... the complete opposite of the introduction of flow control structures. -- Daniel
May 15 2009
Daniel Keep wrote: WinXP doesn't support symbolic links.
This is not true. NTFS 5.0+ (Windows 2000+) has support for symbolic links, they're just not readily available to the average user. They are known as 'junction points' or 'reparse points'. There are certain gotchas, and Explorer has no idea how to handle them intelligently, but they exist and they work. More here: http://www.shell-shocked.org/article.php?id=284 -- Simen
May 16 2009
Simen Kjaeraas wrote:
Daniel Keep wrote:
You can't claim it supports them when the shell itself cannot create them, nor distinguish them from regular folders in any way. That's like saying that D supports returning the overflow flag from addition because it has inline assembler. Besides which, links in Windows tend to be horribly restrictive and/or dangerous, often both. I mean, look under "Deleting Symlinks With Normal Windows Filesystem Tools Is Not Only Dangerous But Also Bizarre" in the linked article. -- Daniel
WinXP doesn't support symbolic links.
This is not true. NTFS 5.0+ (Windows 2000+) has support for symbolic links, they're just not readily available to the average user. They are known as 'junction points' or 'reparse points'. There are certain gotchas, and Explorer has no idea how to handle them intelligently, but they exist and they work. More here: http://www.shell-shocked.org/article.php?id=284 -- Simen
May 16 2009
Simen Kjaeraas wrote:
Daniel Keep wrote:
"and they work" may be somewhat optimistic. I read 30% of the article. You can expect the symlinks to work, but as with most Windows stuff that's even two inches off the beaten path, it will work until the day you really need it. The more they seem to work, the more you'd of course use them. But come the day you need to restore your backup, to copy a file hierarchy, or something else, and *poof*, you'll have garbage all over the place. And no warning about it. Microsoft has this uncanny ability to fix everything with "slap-on code". Now, their real forte is to get it to work in the first place, whereas any normal coder would hopelessly tangle themselves within the first five minutes. But it only works in demos and as long as you follow some existing guidelines without *ever* doing something new, based on a (or actually, *the*) obvious theory of how it works. As an opposite, the entire *point* of the Unix user experience is that once you have an understanding of the workings of the system, you can create new idioms, new patterns of work, and new ways to use the entire system -- without ever fearing that this would cause mysterious behavior. Just an example: MS-DOS had pipes already when regular PCs didn't have hard disks. One could watch grand demos in trade shows, where the guy piped stuff to sort, to find (their sorry version of grep), to more, and to custom made filters. What nobody told the user (until he had already bought a PC with MSDOS, and he had tried to actually use the feature, unsuccessfully, and then called the $10-a-minute hotline), is that the pipes were implemented so that the first program writes the entire output into a temporary file on the floppy, and once it has finished running, the next program then opens the file as input. Now, with the 0.00036 GB floppies of the day, it's not hard to see why nobody ever got any real pipe work done. Pipes were an integral part of Unix way before that time. And still, the Microsoft sales "persons" made all idiots believe Microsoft freaking *invented* the concept. (I've actually witnessed this in trade shows.)
WinXP doesn't support symbolic links.
This is not true. NTFS 5.0+ (Windows 2000+) has support for symbolic links, they're just not readily available to the average user. They are known as 'junction points' or 'reparse points'. There are certain gotchas, and Explorer has no idea how to handle them intelligently, but they exist and they work. More here: http://www.shell-shocked.org/article.php?id=284
May 16 2009
Georg Wrede wrote: ... What nobody told the user (until he had already bought a PC with MSDOS, and he had tried to actually use the feature, unsuccessfully, and then called the $10-a-minute hotline), is that the pipes were implemented so that the first program writes the entire output into a temporary file on the floppy, and once it has finished running, the next program then opens the file as input. Now, with the 0.00036 GB floppies of the day, it's not hard to see why nobody ever got any real pipe work done. Pipes were an integral part of Unix way before that time. And still, the Microsoft sales "persons" made all idiots believe Microsoft freaking *invented* the concept. (I've actually witnessed this in trade shows.)
I imagine that was because DOS couldn't run more than a single program. -- Daniel
May 17 2009
Georg Wrede wrote: Just an example: MS-DOS had pipes already when regular PCs didn't have hard disks. One could watch grand demos in trade shows, where the guy piped stuff to sort, to find (their sorry version of grep), to more, and to custom made filters. What nobody told the user (until he had already bought a PC with MSDOS, and he had tried to actually use the feature, unsuccessfully, and then called the $10-a-minute hotline), is that the pipes were implemented so that the first program writes the entire output into a temporary file on the floppy, and once it has finished running, the next program then opens the file as input. Now, with the 0.00036 GB floppies of the day, it's not hard to see why nobody ever got any real pipe work done.
I guess it was just a pipe dream, then. :p -- Simen
May 18 2009
Michiel Helvensteijn wrote:
Daniel Keep wrote:
I used to work in a software house where half of the projects contained at least one sub project, of the Glove kind. The funniest thing is, these folks didn't understand the notion of Gloves, even when explained about it. (Of course, at the time I didn't have the word "Gloves", nor Wikipedia or www.c2.com to help me.)
http://thedailywtf.com/Articles/The_Complicator_0x27_s_Gloves.aspx
Heh, that's a funny story. I didn't know that one. (I do think the sensible developer was a bit of a killjoy, though.) But how about coming up with some actual arguments here? What are the `gloves' in this situation? Don't say `modules' without explaining how they offer the same advantages as tags. I can imagine a similar discussion having taken place when someone thought of higher-level control-flow constructs (you know, `if' and `while'). I imagine most people were perfectly happy with goto's and conditional jumps.
May 15 2009
dsimcha, on May 14 at 19:50 you wrote to me:
== Quote from Michiel Helvensteijn (m.helvensteijn.remove gmail.com)'s article
I think the module system of D is pretty good, but there are a few things that can be improved (besides bugs =). As suggested in the past, I think the default should be static imports (like it's safest to default globals to be threadlocal, it's safer, or clearer for a code reviewer to use static imports by default). If you want to import all, something like import module: *; can be added (like __gshared ;). There are a few other shortcomings that I don't remember now that can be fixed (I think modules can't have the same name as the package they are in or something). It would be nice to be able to put symbols in a package (a la Python). This can be easily done by allowing a .d file and a directory with the same name. For example:
x/y.d
    module x.y;
    void f() {}
    void g() {}
x.d
    module x;
    import x.y: g;
z.d
    import x;
    x.f(); // error
    x.g(); // ok
This allows for some kind of "private modules", leaving the public interface in the package itself. Without this you have to use a "nested main module":
z.d
    import x.y;
    x.y.g(); // why does the user have to know about the details of package x?
This last one maybe is more a matter of taste than a real issue...
-- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
Michiel Helvensteijn wrote:
This seems like overkill. Module/package/import/whatever management should not require a Ph.D. in Boolean logic. It should be dead simple (i.e. like the current system is, esp. after 314 gets fixed) and allow people to get on with solving real problems. Yes, namespace pollution is annoying, but so is a ridiculously fine-grained import system. After all, the whole point of a module system is to avoid namespace pollution. Otherwise, it would make sense to just import every darn module in every import path implicitly. Sometimes, when people get ridiculously crazy with hierarchical import structure and stuff, I feel like just having a more polluted namespace is the lesser of two evils, especially in D, where naming collision resolution is well-defined and sane.
Your std.string.find() may carry the `algorithm' tag and the `string' tag. So perhaps if both (or either?) of those tags are imported into a project, the function would become available.
Hm.. I suppose a project could import any Boolean combination of tags: import algorithm & string; import algorithm | string; import algorithm & !string; And if modules were gone, I suppose you'd want to tag all your standard-library functions with `std' as well. A bit too radical a change, I suppose. But I believe I will think about this some more. What do you think?
May 14 2009
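For readers unfamiliar with the static imports mentioned in the post above, here is a small sketch of what D already offers. It assumes present-day Phobos module names (`std.string`, `std.algorithm.searching`); the wrapper functions `firstIndex` and `fromFirst` are made up for this illustration. The `import module: *;` form suggested in the post is a proposal and is not shown because it is not valid D.

```d
// Static import: the module's symbols are reachable only by full name.
static import std.string;
// Selective import: only the listed symbol enters this scope.
import std.algorithm.searching : find;

// Hypothetical wrapper: the fully qualified call is required here.
ptrdiff_t firstIndex(string s, dchar c)
{
    return std.string.indexOf(s, c);
}

// Hypothetical wrapper: find is visible unqualified via the selective import.
string fromFirst(string s, dchar c)
{
    return find(s, c);
}

void main()
{
    assert(firstIndex("abc", 'b') == 1);
    assert(fromFirst("abc", 'b') == "bc");
    // An unqualified indexOf("abc", 'b') would not compile here,
    // because std.string was imported statically.
}
```

With both imports written this way, a reviewer can tell from each call site or import line which module a name comes from, which appears to be the clarity argument made above.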