digitalmars.D - V2 string
- Derek Parnell (12/12) Jul 04 2007 I'm converting Bud to compile using V2 and so far its been a very hard
- Walter Bright (5/15) Jul 04 2007 First of all, if you were returning string literals as char[] and trying...
- Derek Parnell (23/39) Jul 04 2007 But I'm not, and never have been, returning string literals anywhere.
- Vladimir Panteleev (13/21) Jul 04 2007 Is SomeTextFunc allocating a copy of the string which it is returning? I...
- Derek Parnell (28/52) Jul 04 2007 Yes, I realize this and I'm not saying its doing the wrong thing, and
- Walter Bright (5/28) Jul 05 2007 If you're needing to guard against inadvertent modification, that's just...
- Regan Heath (4/36) Jul 05 2007 Aaargh! You're confusing empty and non-existent (null) again!
- Walter Bright (4/6) Jul 05 2007 The only case is when you're extending into a preallocated buffer. Such
- James Dennett (9/16) Jul 05 2007 But a way of emptying something was asked for, and you showed
- Walter Bright (2/6) Jul 05 2007 I'd like to know of such cases.
- Derek Parnell (13/20) Jul 05 2007 char[] Option;
- Derek Parnell (16/36) Jul 05 2007 And if you must nitpick that one can code this a different way then here...
- Bill Baxter (6/41) Jul 05 2007 In databases NULL being different from empty seems to a big deal too.
- Sean Kelly (3/7) Jul 06 2007 Either that or it's important to a non-null set of programmers.
- Walter Bright (6/16) Jul 06 2007 Of course, if a function is documented to behave that way, and you have
- Regan Heath (12/32) Jul 06 2007 The first argument which I think holds water is that it is trivial to
- Bruno Medeiros (16/36) Jul 07 2007 Uh, unlike tab stops, I think it is widely recognized by the developer
- Leandro Lucarella (17/37) Jul 06 2007 Basically is the same issue as NULL and NOT NULL on SQL...
- James Dennett (14/21) Jul 06 2007 Any time you need a difference between "specified, and
- Serg Kovrov (13/20) Jul 07 2007 I used to this pattern:
- Derek Parnell (22/53) Jul 05 2007 There is no issue. I'm not raising an issue. I'm just making some
- Walter Bright (6/20) Jul 05 2007 Such a distinction is critical in C code, but is not of much use in D
- Regan Heath (13/41) Jul 05 2007 Question; Do these functions keep a copy of the returned string? Or, t...
- Bruno Medeiros (9/36) Jul 05 2007 Why is 'text.length = 0;' or 'text = text.init;' better than the idiom:
- Sean Kelly (4/10) Jul 05 2007 So just use char[] instead of 'string'. I don't plan to use the aliases...
- Derek Parnell (26/36) Jul 05 2007 It's not so clear cut. Firstly, a lot of phobos routines now return
- Walter Bright (9/22) Jul 05 2007 If you write it like this:
- Regan Heath (15/41) Jul 05 2007 Because tolower does it for you, but it still returns string and if for ...
- Bruno Medeiros (23/75) Jul 05 2007 Indeed, I think this illustrates that some standard library functions
- Frits van Bommel (11/42) Jul 05 2007 Sorry, but you seem to have missed a bit above: if the string doesn't
- Bruno Medeiros (20/67) Jul 05 2007 Oops, sorry, that's right, I missed that part about tolower not
- Regan Heath (18/42) Jul 06 2007 True.. but it's unfortunate that the most efficient case, where no
- Bruno Medeiros (11/46) Jul 07 2007 Algoritms should care about worst-case performance, or average-case
- Regan Heath (3/6) Jul 05 2007 I was hoping for something clever'er ;)
- Bruno Medeiros (10/47) Jul 05 2007 It doesn't make sense to template it, because you'd still have two
- Regan Heath (17/23) Jul 06 2007 If the template is
- Walter Bright (4/20) Jul 05 2007 tolower only dups the string if it needs to. It won't dup a string that
- Regan Heath (11/33) Jul 06 2007 opCatAssign does. (dup #2)
- Regan Heath (88/88) Jul 06 2007 Proof of concept.
- Derek Parnell (19/45) Jul 05 2007 If you have any failing Walter, its your ability to focus on insignifacn...
- Oskar Linde (8/21) Jul 05 2007 What you are doing there is mixing two styles of functions. Functional
- Walter Bright (9/22) Jul 05 2007 My point is that the way the snippet is written is inside out. Do not
- Derek Parnell (8/32) Jul 05 2007 Thanks. This is what I meant by taking rethinking the design of my
- BCS (29/54) Jul 05 2007 The one issue I can see with this is where an input is const but may be ...
- Walter Bright (9/21) Jul 05 2007 My experience with this is:
- Sean Kelly (9/48) Jul 05 2007 I'd argue that the parameters should be "const char[]" rather than
- Kristian Kilpi (51/61) Jul 05 2007
- Walter Bright (12/25) Jul 05 2007 No, because then they must always dup the string. If they don't need to
- Kristian Kilpi (21/39) Jul 06 2007
I'm converting Bud to compile using V2, and so far it's been a very hard thing to do. I'm finding that I now have to use '.dup' and '.idup' all over the place, which is exactly what I thought would happen. Bud does a lot of text manipulation, so having 'string' as invariant means that the results of functions that return string often need to be .dup'ed, because I need to assign them to a malleable variable. I might have to rethink the design of the application to avoid the performance hit of all these dups.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Jul 04 2007
Derek Parnell wrote:
> I'm converting Bud to compile using V2, and so far it's been a very hard thing to do. I'm finding that I now have to use '.dup' and '.idup' all over the place, which is exactly what I thought would happen. Bud does a lot of text manipulation, so having 'string' as invariant means that the results of functions that return string often need to be .dup'ed, because I need to assign them to a malleable variable. I might have to rethink the design of the application to avoid the performance hit of all these dups.

First of all, if you were returning string literals as char[] and trying to manipulate them, they'd fail on Linux at run time (because string literals are put into read-only segments).

Second, you can use char[] instead of string.
Jul 04 2007
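A minimal sketch in present-day D2 (not code from the thread) of why the .dup calls appear: `string` is an alias for `immutable(char)[]`, so a literal or a `string` result cannot be bound to a `char[]` directly. `.dup` makes a mutable heap copy and `.idup` makes an immutable one.

```d
import std.stdio : writeln;

void main()
{
    static assert(is(string == immutable(char)[])); // string is only an alias

    string lit = "hello";
    // char[] bad = "hello";   // error in D2: string does not convert to char[]

    char[] buf = lit.dup;      // .dup: mutable copy on the GC heap
    buf[0] = 'J';              // fine, buf has its own characters

    string frozen = buf.idup;  // .idup: immutable copy of the mutable data
    buf[0] = 'M';              // does not affect 'frozen'

    writeln(lit, " ", frozen, " ", buf); // prints: hello Jello Mello
}
```

Each .dup or .idup is a fresh allocation, which is the performance cost Derek is worried about.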
On Wed, 04 Jul 2007 15:48:45 -0700, Walter Bright wrote:

> Derek Parnell wrote:
>> I'm converting Bud to compile using V2 [...]
>
> First of all, if you were returning string literals as char[] and trying to manipulate them, they'd fail on Linux at run time (because string literals are put into read-only segments).

But I'm not, and never have been, returning string literals anywhere.

> Second, you can use char[] instead of string.

The idiom I'm using is that functions that receive text declare those parameters as 'string', to guard against the function inadvertently modifying what is passed in, and functions that return text return 'string', to guard against calling functions inadvertently modifying data that they did not create (own). This leads to constructs like ...

    char[] result;
    result = SomeTextFunc(data).dup;

Another commonly used idiom that I had to stop using was ...

    char[] text;
    text = getvalue();
    if (wrongvalue(text))
        text = ""; // Reset to an empty string

I now code ...

    text.length = 0; // Reset to an empty string

which is slightly less readable.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Jul 04 2007
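A small sketch in present-day D2 (not from the thread) of the reset idioms being discussed for a `char[]` variable: why `text = ""` no longer compiles, and how `text.length = 0` differs from `text = null`. The pointer check after shrinking reflects how current D runtimes slice in place; treat it as an observation about the implementation, not a language guarantee.

```d
void main()
{
    char[] text = "some value".dup;

    // text = "";           // D2 error: "" is a string, not a char[]

    text.length = 0;        // empty; shrinking slices in place, keeping the pointer
    assert(text.length == 0 && text.ptr !is null);

    text = "some value".dup;
    text = null;            // empty *and* null: no pointer at all
    assert(text.length == 0 && text.ptr is null);

    text = "".dup;          // an empty char[]; only the length is checked here
    assert(text.length == 0);
}
```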
On Thu, 05 Jul 2007 02:23:11 +0300, Derek Parnell <derek psych.ward> wrote:

> This leads to constructs like ...
>
>     char[] result;
>     result = SomeTextFunc(data).dup;

Is SomeTextFunc allocating a copy of the string which it is returning? If it is, then there's no reason why it should return a "string" type. If it isn't, then modifying the data in the returned char[] could have unforeseen consequences.

> Another commonly used idiom that I had to stop using was ...
>
>     char[] text;
>     text = getvalue();
>     if (wrongvalue(text))
>         text = ""; // Reset to an empty string

Since empty string literals don't really point to data, I'd suggest that empty string and array literals shouldn't be const/invariant, so that the above example keeps working. It breaks some consistency, but "a foolish consistency is the hobgoblin of little minds" ;)

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com
Jul 04 2007
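Vladimir's point can be sketched in present-day D2 with two hypothetical helpers (`makeGreeting` and `firstWord` are invented for illustration): a function that allocates a fresh buffer can safely return `char[]`, so the caller needs no .dup, while a function that returns a slice of its input has to keep the input's read-only type.

```d
import std.stdio : writeln;

// Hypothetical helper: builds a brand-new buffer, so nobody else holds a
// reference to it and it is safe to hand back as mutable char[].
char[] makeGreeting(const(char)[] name)
{
    char[] result = "Hello, ".dup;
    result ~= name;             // appending to our own fresh buffer
    return result;
}

// Hypothetical helper: returns a slice of its input, so it must not claim
// ownership; the caller gets the same read-only view it passed in.
const(char)[] firstWord(const(char)[] s)
{
    foreach (i, c; s)
        if (c == ' ')
            return s[0 .. i];
    return s;
}

void main()
{
    char[] g = makeGreeting("Derek");   // no .dup needed
    g[0] = 'J';                          // the caller may edit its own copy
    writeln(g);                          // prints: Jello, Derek

    auto w = firstWord("Hello world");
    // w[0] = 'J';                       // error: w is const(char)[]
    writeln(w);                          // prints: Hello
}
```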
On Thu, 05 Jul 2007 04:44:41 +0300, Vladimir Panteleev wrote:

> On Thu, 05 Jul 2007 02:23:11 +0300, Derek Parnell <derek psych.ward> wrote:
>
>> This leads to constructs like ...
>>
>>     char[] result;
>>     result = SomeTextFunc(data).dup;
>
> Is SomeTextFunc allocating a copy of the string which it is returning? If it is, then there's no reason why it should return a "string" type. If it isn't, then modifying the data in the returned char[] could have unforeseen consequences.

Yes, I realize this, and I'm not saying it's doing the wrong thing; actually, I'm not even complaining. I'm just letting people know some of the observations I've had in moving to V2.

In this case, someone has to copy the resulting data: either the function that created it or the routine that called the function. If the called function does the duplication, it could be a waste if the calling function is not going to modify it further; that is why I elected to pass a 'const' reference to the new data. The calling function can then decide whether it needs a copy (to modify it) or not.

    string result;
    result = SomeTextFunc(data); // no need to dup if I'm not changing it.

I've got a set of aliases to help me ...

    alias char[] text;
    alias wchar[] wtext;
    alias dchar[] dtext;

so now I see 'text' as mutable and 'string' as immutable.

>> Another commonly used idiom that I had to stop using was ...
>>
>>     char[] txt;
>>     txt = getvalue();
>>     if (wrongvalue(txt))
>>         txt = ""; // Reset to an empty string
>
> Since empty string literals don't really point to data, I'd suggest that empty string and array literals shouldn't be const/invariant, so that the above example keeps working. It breaks some consistency, but "a foolish consistency is the hobgoblin of little minds" ;)

Nice idea, but I can't see it happening because of the inconsistency angle. Instead I've decided to use the idiom ...

    text txt;
    txt = getvalue();
    if (wrongvalue(txt))
        txt = text.init; // Reset to an empty string

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
5/07/2007 3:52:27 PM
Jul 04 2007
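Derek's aliases still work in present-day D2; the sketch below uses today's `alias name = type;` syntax (the older `alias type name;` form from the post is the same idea) and shows that `text`, `wtext` and `dtext` are writable while `string` is not.

```d
// Derek's aliases, written in today's `alias name = type;` syntax.
alias text  = char[];
alias wtext = wchar[];
alias dtext = dchar[];

void main()
{
    text t = "abc".dup;    // mutable: the caller owns these characters
    t[0] = 'x';
    assert(t == "xbc");

    string s = "abc";      // immutable(char)[]: a read-only view
    // s[0] = 'x';         // error: cannot modify immutable data

    wtext w = "abc"w.dup;  // same idea for UTF-16 ...
    dtext d = "abc"d.dup;  // ... and UTF-32
    w[0] = 'x';
    d[0] = 'x';
    assert(w == "xbc"w && d == "xbc"d);
}
```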
Derek Parnell wrote:
> The idiom I'm using is that functions that receive text have those parameters as 'string' to guard against the function inadvertently modifying that which is passed, and functions that return text return 'string' to guard against calling functions inadvertently modifying data that they did not create (own). This leads to constructs like ...
>
>     char[] result;
>     result = SomeTextFunc(data).dup;

If you're needing to guard against inadvertent modification, that's just what const strings are for. I'm not understanding the issue here.

> Another commonly used idiom that I had to stop using was ...
>
>     char[] text;
>     text = getvalue();
>     if (wrongvalue(text))
>         text = ""; // Reset to an empty string
>
> I now code ...
>
>     text.length = 0; // Reset to an empty string
>
> which is slightly less readable.

This should do it nicely:

    text = null;
Jul 05 2007
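The empty-versus-null disagreement that follows can be made concrete with a short present-day D2 sketch (not from the thread): `==` compares contents, so `""` and `null` compare equal, while `is` compares pointer and length, so it still tells them apart. The non-null pointer of `""` relies on string literals having storage, which is how D compilers lay them out.

```d
import std.stdio : writeln;

void main()
{
    string empty = "";      // zero length, but points at literal storage
    string none  = null;    // zero length and no pointer at all

    // == compares contents: neither has any characters, so they are equal.
    assert(empty == none);

    // 'is' compares pointer and length, so it can tell them apart.
    assert(empty !is none);
    assert(none is null);
    assert(empty.ptr !is null);

    writeln(empty.length, " ", none.length); // prints: 0 0
}
```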
Walter Bright Wrote:

> Derek Parnell wrote:
>> [...]
>>     text.length = 0; // Reset to an empty string
>>
>> which is slightly less readable.
>
> This should do it nicely:
>
>     text = null;

Aaargh! You're confusing empty and non-existent (null) again! <g>

In some cases there is an important difference between the two. In this case, maybe not; I don't really know.

Regan
Jul 05 2007
Regan Heath wrote:
> Aaargh! You're confusing empty and non-existent (null) again! <g>

In this case, no.

> In some cases there is an important difference between the two.

The only case is when you're extending into a preallocated buffer. Such cannot be the case with string literals.
Jul 05 2007
Walter Bright wrote:
> Regan Heath wrote:
>> Aaargh! You're confusing empty and non-existent (null) again! <g>
>
> In this case, no.

But a way of emptying something was asked for, and you showed a way to make it null, not empty. Can you explain your "In this case, no"?

>> In some cases there is an important difference between the two.
>
> The only case is when you're extending into a preallocated buffer.

I've found many times when the difference between an empty string and no string was important; they generally have nothing to do with extending at all. I'd be interested to know why you assert that no such cases exist.

-- James
Jul 05 2007
James Dennett wrote:
> I've found many times when the difference between an empty string and no string was important; they generally have nothing to do with extending at all. I'd be interested to know why you assert that no such cases exist.

I'd like to know of such cases.
Jul 05 2007
On Thu, 05 Jul 2007 20:58:11 -0700, Walter Bright wrote:

> James Dennett wrote:
>> I've found many times when the difference between an empty string and no string was important; [...]
>
> I'd like to know of such cases.

    char[] Option;
    Option = getOptionFromUser();
    if (Option is null)
    {
        Option = DefaultOption;
    }

However, if the user sets the option to "" then that is what they want, and not the default one.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Jul 05 2007
On Fri, 6 Jul 2007 14:23:43 +1000, Derek Parnell wrote:

> On Thu, 05 Jul 2007 20:58:11 -0700, Walter Bright wrote:
>> I'd like to know of such cases.
>
>     char[] Option;
>     [...]
>
> However, if the user sets the option to "" then that is what they want and not the default one.

And if you must nitpick that one can code this a different way, then here is another example.

Let's say that there is this library routine, which is closed source (I don't have access to its source), that accepts a string as its argument. Furthermore, if the passed string is null, the routine uses a default value, whatever that is, because I don't know it. Now in my code I call it with ...

    SomeFunc("");   // Use an empty string to do its magic
    SomeFunc(null); // But this time, use the default value

Remember, I have no control over the SomeFunc routine's implementation.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
6/07/2007 2:54:45 PM
Jul 05 2007
Derek Parnell wrote:
> [...]
> Remember, I have no control over the SomeFunc routine's implementation.

In databases, NULL being different from empty seems to be a big deal too.

Anyway, googling for "null versus empty" turns up a bevy of hits, so from that I think we can presume that the distinction is important to a non-empty subset of programmers.

--bb
Jul 05 2007
Bill Baxter wrote:
> Anyway, googling for "null versus empty" turns up a bevy of hits, so from that I think we can presume that the distinction is important to a non-empty subset of programmers.

Either that or it's important to a non-null set of programmers. ;-)

Sean
Jul 06 2007
Derek Parnell wrote:
> Let's say that there is this library routine, which is closed source (I don't have access to its source), that accepts a string as its argument. Furthermore, if the passed string is null, the routine uses a default value, whatever that is, because I don't know it. Now in my code I call it with ...
>
>     SomeFunc("");   // Use an empty string to do its magic
>     SomeFunc(null); // But this time, use the default value
>
> Remember, I have no control over the SomeFunc routine's implementation.

Of course, if a function is documented to behave that way, and you have no control over it, you must adhere to its documentation.

There are other ways to do default arguments. I suspect we could argue about it like we could argue about tab stops, and never reach any sort of resolution <g>.
Jul 06 2007
Walter Bright wrote:
> Derek Parnell wrote:
>> [...]
>
> Of course, if a function is documented to behave that way, and you have no control over it, you must adhere to its documentation.
>
> There are other ways to do default arguments. I suspect we could argue about it like we could argue about tab stops, and never reach any sort of resolution <g>.

The first argument which I think holds water is that it is trivial to represent empty and non-existent in C, e.g.

    char *empty = "";
    char *non_existent = NULL;

The other argument is the one made earlier about databases. In a database, empty and non-existent are important, distinct states a value can have.

Currently, D can model these, but it worries me that you don't seem to think that it's important. So perhaps in future you might decide to get rid of this, or do so accidentally.

Regan
Jul 06 2007
Walter Bright wrote:
> Derek Parnell wrote:
>> [...]
>
> Of course, if a function is documented to behave that way, and you have no control over it, you must adhere to its documentation.
>
> There are other ways to do default arguments. I suspect we could argue about it like we could argue about tab stops, and never reach any sort of resolution <g>.

Uh, unlike tab stops, I think it is widely recognized by the developer community that it is useful to have a distinction between *valid* and *invalid* values of something. Why is there a NAN for floats (and in D NAN is the default value for floats)? What if NAN were equal to zero? Didn't you yourself, Walter, say once that if there were a way to have an actual invalid value for ints (without sacrificing precision) you would like to have that, and you would make it the default value for int, instead of 0 (which is a valid int)? So why shouldn't arrays (which are already reference types) have a value that means "invalid array", especially if we can get that for free (unlike ints)?

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 07 2007
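Bruno's comparison with NaN can be checked against D's actual default initializers. This present-day D2 sketch (not from the thread) shows the `.init` values involved: `float` defaults to NaN, `char` to 0xFF (never valid in UTF-8), `int` to 0 because every bit pattern is a valid int, and a dynamic array to null.

```d
import std.math : isNaN;

void main()
{
    float f;       // float.init is NaN, so "never assigned" is detectable
    assert(isNaN(f));

    int i;         // every int bit pattern is a valid value, so int.init is 0
    assert(i == 0);

    char c;        // char.init is 0xFF, which is never valid in UTF-8
    assert(c == 0xFF);

    char[] s;      // a dynamic array defaults to null (no pointer, zero length)
    assert(s is null && s.length == 0);
}
```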
Derek Parnell, on July 6 at 14:23, you wrote to me:

> On Thu, 05 Jul 2007 20:58:11 -0700, Walter Bright wrote:
>> I'd like to know of such cases.
>
>     char[] Option;
>     [...]
>
> However, if the user sets the option to "" then that is what they want and not the default one.

Basically it's the same issue as NULL and NOT NULL in SQL...

-- 
LUCA - Leandro Lucarella - Using Debian GNU/Linux Sid - GNU Generation
------------------------------------------------------------------------
E-Mail / JID: luca lugmen.org.ar
GPG Fingerprint: D9E1 4545 0F4B 7928 E82C 375D 4B02 0FE0 B08B 4FB2
GPG Key: gpg --keyserver pks.lugmen.org.ar --recv-keys B08B4FB2
------------------------------------------------------------------------
I know you're looking at me, but I would swear that, in those black eyes of yours, there is a sensitive Indian thinking: "How great that this white guy is trying to communicate with me, an inferior being on the homo sapiens scale." That is why, dear Indian, I can't stop looking at you as if you were a damn guinea pig I can step on whenever I like. -- Ricardo Vaporeso. Letter to the Aborigines, ed. Gredos, Barcelona, 1912, page 102.
Jul 06 2007
Walter Bright wrote:
> James Dennett wrote:
>> I've found many times when the difference between an empty string and no string was important; [...]
>
> I'd like to know of such cases.

Any time you need a difference between "specified, and known to be empty" and "unspecified or unknown", which is very common. The alternative is to carry a boolean around to say whether the string is in use. Others have raised the case of null meaning "use default" (but let's not spend too much time on that specific case), and the fact that the database world often (though not always) distinguishes null from empty. Many people have found good reason to do this. The "Maybe" or "Fallible" type constructors used in other languages also cover cases where "absent" can usefully be handled separately from "empty" (in more general cases than just strings).

-- James
Jul 06 2007
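One way to get James's "Maybe"-style distinction in present-day D2 without relying on null arrays is Phobos's `std.typecons.Nullable`, which carries the boolean for you. The `lookup` function and its keys below are hypothetical, invented only to show the three outcomes.

```d
import std.typecons : Nullable;

// Hypothetical lookup: "absent" is reported separately from "empty".
Nullable!string lookup(string key)
{
    if (key == "empty")
        return Nullable!string("");    // specified, and known to be empty
    if (key == "name")
        return Nullable!string("Bud");
    return Nullable!string.init;       // unspecified / unknown
}

void main()
{
    auto a = lookup("empty");
    auto b = lookup("missing");
    assert(!a.isNull && a.get == "");  // present, with an empty value
    assert(b.isNull);                  // absent altogether
}
```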
Walter Bright wrote:
> James Dennett wrote:
>> I've found many times when the difference between an empty string and no string was important; [...]
>
> I'd like to know of such cases.

I use this pattern:

    void foo(char[] bar = null)
    {
        if (bar is null)
            m_bar = "default_value";
        else
            m_bar = bar; // even if it's empty
    }

often as a one-liner:

    m_bar = (bar is null) ? "default_value" : bar;

This is the one I use most (at least), but of course there are more.

-- 
serg.
Jul 07 2007
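A runnable present-day D2 version of Serg's pattern, under one assumption: the parameter and the field are declared as `string` rather than `char[]`, because a literal such as "default_value" cannot be assigned to `char[]` in D2. `m_bar` stands in for his member field.

```d
import std.stdio : writeln;

string m_bar;   // stands in for Serg's member field

// null means "caller gave nothing, use the default";
// "" is a real, deliberately empty value and is kept.
void foo(string bar = null)
{
    m_bar = (bar is null) ? "default_value" : bar;
}

void main()
{
    foo();
    writeln(m_bar);          // prints: default_value
    foo("");
    writeln(m_bar.length);   // prints: 0
    foo("custom");
    writeln(m_bar);          // prints: custom
}
```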
On Thu, 05 Jul 2007 00:42:25 -0700, Walter Bright wrote:

> Derek Parnell wrote:
>> The idiom I'm using is [...] This leads to constructs like ...
>>
>>     char[] result;
>>     result = SomeTextFunc(data).dup;
>
> If you're needing to guard against inadvertent modification, that's just what const strings are for. I'm not understanding the issue here.

There is no issue. I'm not raising an issue. I'm just making some observations about my experience so far in moving to V2. I'm not surprised by the effort that I'm having. I expected it. Why? Because I knew that most of the strings I work with are text (mutable things), and using the D 'string', an immutable thing, in function signatures was going to mean I'd have to change things to suit. I chose to use 'string' to safeguard myself from making stupid errors in coding. And it's working. My next pass through the application code will be to find places where I can safely return a 'text' thing instead of a 'string' thing, which is a performance-tuning exercise.

>> Another commonly used idiom that I had to stop using was ...
>> [...]
>>     text.length = 0; // Reset to an empty string
>>
>> which is slightly less readable.
>
> This should do it nicely:
>
>     text = null;

Not really. I want an empty text and not a non-text. Also, it doesn't fit right with other data types (the consistency thing again).

    text = typeof(text).init;

works better for me because I can also use this construct in templates without problems.

But really, this thread can die now. I didn't mean to go off into weird tangential subjects.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Jul 05 2007
Derek Parnell wrote:
>> This should do it nicely:
>>
>>     text = null;
>
> Not really. I want an empty text and not a non-text.

Such a distinction is critical in C code, but is not of much use in D code. What do you need the distinction for?

> Also, it doesn't fit right with other data types - the consistency thing again.
>
>     text = typeof(text).init;
>
> works better for me because I can also use this construct in templates without problems.

The .init for char[] is null, not "".

> But really, this thread can die now. I didn't mean to go off into weird tangential subjects.

I think you've raised a couple of very important stylistic issues, and it is worth pursuing.
Jul 05 2007
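Derek's template-friendly reset and Walter's correction can both be seen in a small present-day D2 sketch; `reset` is a hypothetical helper name. Assigning `T.init` works for any type, but for `char[]` the result is null rather than a non-null empty string.

```d
import std.math : isNaN;

// Hypothetical generic reset: every D type has .init, so this works in templates.
void reset(T)(ref T value)
{
    value = T.init;
}

void main()
{
    char[] text = "abc".dup;
    int count = 42;
    double ratio = 0.5;

    reset(text);
    reset(count);
    reset(ratio);

    assert(text is null);     // char[].init is null, not ""
    assert(count == 0);
    assert(isNaN(ratio));     // double.init is NaN
}
```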
Derek Parnell Wrote:

> On Wed, 04 Jul 2007 15:48:45 -0700, Walter Bright wrote:
>> First of all, if you were returning string literals as char[] and trying to manipulate them, they'd fail on Linux at run time (because string literals are put into read-only segments).
>
> But I'm not, and never have been, returning string literals anywhere.

Yep, makes sense.

>> Second, you can use char[] instead of string.
>
> The idiom I'm using is that functions that receive text have those parameters as 'string' to guard against the function inadvertently modifying that which is passed, and functions that return text return 'string' to guard against calling functions inadvertently modifying data that they did not create (own).

A question: do these functions keep a copy of the returned string? Or, to rephrase: after returning the string, do they still 'own' it, or have they washed their hands of it? Are they, in a sense, passing ownership to the calling function?

If they no longer 'own' the string, then they can return it as a char[] instead of string, and all your problems are solved, right?

I imagine that if they return a slice of the input string, and that input was 'string' rather than char[], then they would also have to return string (because doing otherwise would be claiming ownership of the input string and giving it away to the caller, which may not be valid).

Maybe you have a lot of functions returning slices of the input string? Maybe you need to template them? i.e.

    T func(T)(T param) { ... }

so if you pass string you get string, and if you pass char[] you get char[]. Maybe all string routines which return slices of the input should be templated this way?

Regan
Jul 05 2007
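Regan's templating suggestion, sketched in present-day D2 with a hypothetical slicing helper `stripLeading`: because the function is templated on the array type, a `char[]` argument yields a `char[]` slice and a `string` argument yields a `string` slice, with no copying in either case.

```d
// Hypothetical helper: drops leading spaces by slicing, never copying.
// Because it is templated on T, the result keeps the caller's mutability.
T stripLeading(T)(T s)
{
    size_t i = 0;
    while (i < s.length && s[i] == ' ')
        ++i;
    return s[i .. $];
}

void main()
{
    char[] buf = "  mutable".dup;
    char[] m = stripLeading(buf);          // char[] in, char[] out
    m[0] = 'M';                            // writes into buf's storage
    assert(buf == "  Mutable");

    string s = stripLeading("  frozen");   // string in, string out
    static assert(is(typeof(stripLeading("  x")) == string));
    assert(s == "frozen");
}
```

Note that the mutable result aliases the caller's buffer, which is exactly the ownership question raised above: the slice is only writable because the caller's data already was.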
Derek Parnell wrote:
> On Wed, 04 Jul 2007 15:48:45 -0700, Walter Bright wrote:
> [...]
> Another commonly used idiom that I had to stop using was ...
>
>     char[] text;
>     text = getvalue();
>     if (wrongvalue(text))
>         text = ""; // Reset to an empty string
>
> I now code ...
>
>     text.length = 0; // Reset to an empty string
>
> which is slightly less readable.

Why is 'text.length = 0;' or 'text = text.init;' better than the idiom:

    str = "".dup;

which also works for any kind of string, not just empty strings? I found, however, that there is a bug with that code:
http://d.puremagic.com/issues/show_bug.cgi?id=1314

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 05 2007
Derek Parnell wrote:
> I'm converting Bud to compile using V2, and so far it's been a very hard thing to do. I'm finding that I now have to use '.dup' and '.idup' all over the place, which is exactly what I thought would happen. Bud does a lot of text manipulation, so having 'string' as invariant means that the results of functions that return string often need to be .dup'ed, because I need to assign them to a malleable variable.

So just use char[] instead of 'string'. I don't plan to use the aliases much either.

Sean
Jul 05 2007
On Thu, 05 Jul 2007 00:15:41 -0700, Sean Kelly wrote:

> Derek Parnell wrote:
>> I'm converting Bud to compile using V2 [...]
>
> So just use char[] instead of 'string'. I don't plan to use the aliases much either.

It's not so clear-cut. Firstly, a lot of Phobos routines now return 'string' results and expect 'string' inputs. Secondly, I like the idea of general-purpose functions returning 'const' data, because it helps guard against inadvertent modifications by the calling routines. It is up to the calling function to decide explicitly whether it is going to modify returned stuff or not.

For example, if I know that I'll not need to modify the 'fullpath' then I might do this ...

    string fullpath;
    fullpath = CanonicalPath(shortname);

However, if I might need to update it ...

    char[] fullpath;
    fullpath = CanonicalPath(shortname).dup;
    version(Windows)
    {
        setLowerCase(fullpath);
    }

The point is that the 'CanonicalPath' function hasn't got a clue what the calling function intends to do with the result, so it is trying to be responsible by guarding it against mistakes by the caller.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
5/07/2007 5:17:33 PM
Jul 05 2007
Derek Parnell wrote:
> However, if I might need to update it ...
>
>     char[] fullpath;
>     fullpath = CanonicalPath(shortname).dup;
>     version(Windows)
>     {
>         setLowerCase(fullpath);
>     }
>
> The point is that the 'CanonicalPath' function hasn't got a clue what the calling function intends to do with the result, so it is trying to be responsible by guarding it against mistakes by the caller.

If you write it like this:

    string fullpath;
    fullpath = CanonicalPath(shortname);
    version(Windows)
    {
        fullpath = std.string.tolower(fullpath);
    }

you won't need to do the .dup.
Jul 05 2007
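A minimal sketch of the functional style Walter describes, written in modern D. `canonicalPath` is a hypothetical stand-in for Bud's `CanonicalPath`, and `lowerCopy` is an ASCII-only stand-in for Phobos's `tolower` (not the library code). Because the lowering step returns a new `string` rather than mutating its argument, `fullpath` can stay a `string` and no `.dup` is needed anywhere.

```d
// Hypothetical stand-in for Bud's CanonicalPath(); it hands back
// immutable data, so callers cannot accidentally modify it.
string canonicalPath(string shortName)
{
    return "C:/Projects/" ~ shortName;
}

// Functional (non-mutating) ASCII lowercasing: never touches its
// argument and always builds a new string.
string lowerCopy(string s)
{
    string result;
    foreach (char c; s)
        result ~= (c >= 'A' && c <= 'Z') ? cast(char)(c + ('a' - 'A')) : c;
    return result;
}

// Walter's version of the snippet: fullpath stays a string, and the
// "modification" simply rebinds it to a new value.
string buildFullPath(string shortName)
{
    string fullpath = canonicalPath(shortName);
    version (Windows)
    {
        fullpath = lowerCopy(fullpath);
    }
    return fullpath;
}
```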
Walter Bright Wrote:
> Derek Parnell wrote:
>> However, if I might need to update it ...
>>
>>     char[] fullpath;
>>     fullpath = CanonicalPath(shortname).dup;
>>     version(Windows)
>>     {
>>         setLowerCase(fullpath);
>>     }
>>
>> The point is that the 'CanonicalPath' function hasn't got a clue what the calling function is intending to do with the result so it is trying to be responsible by guarding it against mistakes by the caller.
>
> If you write it like this:
>
>     string fullpath;
>     fullpath = CanonicalPath(shortname);
>     version(Windows)
>     {
>         fullpath = std.string.tolower(fullpath);
>     }
>
> you won't need to do the .dup .

Because tolower does it for you, but it still returns string and if for example you need to add something to the end of the path, like a filename you will end up doing yet another dup somewhere.

I think the solution may be to template all functions which return the input string, or part of the input string, eg.

    T tolower(T)(T input)
    {
    }

That way if you call it with char[] you get a char[] back, if you call it with string you get a string back.

However... tolower is an interesting case. As a caller I expect it to modify the string, or perhaps give a modified copy back (both options are valid and should perhaps be supported?).

So, the 'string tolower(string)' version has 2 cases, the first case where it doesn't need to modify the input and can simply return it, no problem. But case 2, where it does modify it should dup and return char[]. My reasoning being that after it has completed and returned the copy, the caller now 'owns' the string (as it's the only copy in existence and no-one else has a reference to it).

To achieve that we'd need to overload on return type, or something clever... but then, how do we call it?

    auto s = tolower(input);

tolower cannot be selected at compile time, and the type of s cannot be known either, so that's an impossible situation, yes?

Regan
Jul 05 2007
Regan Heath wrote:
> Walter Bright Wrote:
>> Derek Parnell wrote:
>>> However, if I might need to update it ...
>>>
>>>     char[] fullpath;
>>>     fullpath = CanonicalPath(shortname).dup;
>>>     version(Windows)
>>>     {
>>>         setLowerCase(fullpath);
>>>     }
>>>
>>> The point is that the 'CanonicalPath' function hasn't got a clue what the calling function is intending to do with the result so it is trying to be responsible by guarding it against mistakes by the caller.
>>
>> If you write it like this:
>>
>>     string fullpath;
>>     fullpath = CanonicalPath(shortname);
>>     version(Windows)
>>     {
>>         fullpath = std.string.tolower(fullpath);
>>     }
>>
>> you won't need to do the .dup .
>
> Because tolower does it for you, but it still returns string and if for example you need to add something to the end of the path, like a filename you will end up doing yet another dup somewhere.
>
> I think the solution may be to template all functions which return the input string, or part of the input string, eg.
>
>     T tolower(T)(T input)
>     {
>     }
>
> That way if you call it with char[] you get a char[] back, if you call it with string you get a string back.
>
> However... tolower is an interesting case. As a caller I expect it to modify the string, or perhaps give a modified copy back (both options are valid and should perhaps be supported?).
>
> So, the 'string tolower(string)' version has 2 cases, the first case where it doesn't need to modify the input and can simply return it, no problem. But case 2, where it does modify it should dup and return char[]. My reasoning being that after it has completed and returned the copy, the caller now 'owns' the string (as it's the only copy in existence and no-one else has a reference to it).

Indeed, I think this illustrates that some standard library functions may not have the correct signature, and I think tolower is likely one of them.

The most general case for tolower is:

    char[] tolower(const(char)[] s);

Since tolower creates a new array, but does not keep it, it can give away its ownership of the array (ie, return a mutable).

The second case, more specific, is simply syntactic sugar for making that array invariant:

    invariant(char)[] tolowerinv(const(char)[] str)
    {
        return cast(invariant) tolower(str);
    }

The current signature:

    const(char)[] tolower(const(char)[] str)

is kinda incorrect, because it returns a const reference for an array that has no mutable references, and that is the same as an invariant reference, so tolower might as well return invariant(char)[].

> To achieve that we'd need to overload on return type, or something clever... but then, how do we call it?
>
>     auto s = tolower(input);
>
> tolower cannot be selected at compile time, and the type of s cannot be known either, so that's an impossible situation, yes?
>
> Regan

The 'something clever' to distinguish both cases is simply naming two different functions, like tolower or tolowerinv (if the second function is needed at all).

--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 05 2007
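A minimal sketch of the two signatures Bruno proposes, assuming an *always-copy* lowering (which, as the next reply points out, is not how Phobos's tolower behaved). It is written in modern D, where the 2007 keyword `invariant` has become `immutable`; `lowerFresh` and `lowerFreshImmutable` are hypothetical names and the conversion is ASCII-only. The cast to `string` is sound only because the freshly allocated array has no other references.

```d
// Always-copy ASCII lowering: the new array is referenced by nobody
// else, so it can be handed to the caller as mutable char[].
char[] lowerFresh(const(char)[] s)
{
    auto result = new char[s.length];
    foreach (i, char c; s)
        result[i] = (c >= 'A' && c <= 'Z') ? cast(char)(c + ('a' - 'A')) : c;
    return result;
}

// Bruno's "tolowerinv": the same unique result, viewed as immutable.
// The cast is only safe because lowerFresh never returns its input.
string lowerFreshImmutable(const(char)[] s)
{
    return cast(string) lowerFresh(s);
}
```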
Bruno Medeiros wrote:
> Regan Heath wrote:
>> tolower is an interesting case. As a caller I expect it to modify the string, or perhaps give a modified copy back (both options are valid and should perhaps be supported?).
>>
>> So, the 'string tolower(string)' version has 2 cases, the first case where it doesn't need to modify the input and can simply return it, no problem. But case 2, where it does modify it should dup and return char[]. My reasoning being that after it has completed and returned the copy, the caller now 'owns' the string (as it's the only copy in existence and no-one else has a reference to it).
>
> Indeed, I think this illustrates that some standard library functions may not have the correct signature, and I think tolower is likely one of them.
>
> The most general case for tolower is:
>
>     char[] tolower(const(char)[] s);
>
> Since tolower creates a new array, but does not keep it, it can give away its ownership of the array (ie, return a mutable).

Sorry, but you seem to have missed a bit above: if the string doesn't contain any uppercase characters tolower returns the input without .dup-ing it (aka copy-on-write).

> The second case, more specific, is simply syntactic sugar for making that array invariant:
>
>     invariant(char)[] tolowerinv(const(char)[] str)
>     {
>         return cast(invariant) tolower(str);
>     }

Yes, but only if it actually needs to modify the string. You seem to have missed that the two cases can't (in general) be distinguished at compile time; it's only at run time when a choice is made between a copy and no copy.

> The current signature:
>
>     const(char)[] tolower(const(char)[] str)
>
> is kinda incorrect, because it returns a const reference for an array that has no mutable references, and that is the same as an invariant reference, so tolower might as well return invariant(char)[].

Again, that only holds if a copy was actually made at run time. If no copy was made the original input is returned, to which there may be mutable references.
Jul 05 2007
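To make Frits's objection concrete, here is a minimal copy-on-write sketch in modern D. `lowerCow` is a hypothetical ASCII-only function that imitates the behaviour described for tolower; it is not the Phobos source. Whether it copies is decided at run time, and in the no-copy case the returned slice aliases the caller's buffer, so the result cannot honestly be typed as immutable.

```d
// Copy-on-write ASCII lowering: if nothing needs changing, the
// argument itself is returned; otherwise a lowered copy is returned.
const(char)[] lowerCow(const(char)[] s)
{
    foreach (i, char c; s)
    {
        if (c >= 'A' && c <= 'Z')
        {
            // First uppercase character found: copy, then lower the rest.
            auto copy = s.dup;
            foreach (ref char d; copy[i .. $])
                if (d >= 'A' && d <= 'Z')
                    d = cast(char)(d + ('a' - 'A'));
            return copy;
        }
    }
    // No copy made: the result may alias a buffer someone can still mutate.
    return s;
}
```

Usage: lowering an already-lowercase mutable buffer returns the same memory, so a later write through the buffer is visible through the "const" result.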
Frits van Bommel wrote:
> Bruno Medeiros wrote:
>> Regan Heath wrote:
>>> tolower is an interesting case. As a caller I expect it to modify the string, or perhaps give a modified copy back (both options are valid and should perhaps be supported?).
>>>
>>> So, the 'string tolower(string)' version has 2 cases, the first case where it doesn't need to modify the input and can simply return it, no problem. But case 2, where it does modify it should dup and return char[]. My reasoning being that after it has completed and returned the copy, the caller now 'owns' the string (as it's the only copy in existence and no-one else has a reference to it).
>>
>> Indeed, I think this illustrates that some standard library functions may not have the correct signature, and I think tolower is likely one of them.
>>
>> The most general case for tolower is:
>>
>>     char[] tolower(const(char)[] s);
>>
>> Since tolower creates a new array, but does not keep it, it can give away its ownership of the array (ie, return a mutable).
>
> Sorry, but you seem to have missed a bit above: if the string doesn't contain any uppercase characters tolower returns the input without .dup-ing it (aka copy-on-write).

Oops, sorry, that's right, I missed that part about tolower not modifying the string if it wasn't necessary. :(

>> The second case, more specific, is simply syntactic sugar for making that array invariant:
>>
>>     invariant(char)[] tolowerinv(const(char)[] str)
>>     {
>>         return cast(invariant) tolower(str);
>>     }
>
> Yes, but only if it actually needs to modify the string. You seem to have missed that the two cases can't (in general) be distinguished at compile time; it's only at run time when a choice is made between a copy and no copy.
>
>> The current signature:
>>
>>     const(char)[] tolower(const(char)[] str)
>>
>> is kinda incorrect, because it returns a const reference for an array that has no mutable references, and that is the same as an invariant reference, so tolower might as well return invariant(char)[].
>
> Again, that only holds if a copy was actually made at run time. If no copy was made the original input is returned, to which there may be mutable references.

You're right, if a copy is not made *every* time (which is the case after all), then the above doesn't hold. But then, what I think is happening is that Phobos's current tolower is suboptimal in terms of usefulness, because we don't know whether a new copy is made or not. I'm wondering now what would be the more useful form, or forms, of tolower (and similar functions) to have. Now that I think of it again (admittedly I haven't got much experience with string manipulation in C++ or D, though), perhaps the best form is an in-place mutable version:

    char[] tolower(char[] str);

And it's this one after all that is the most general form. If you want to call tolower on a const or invariant array you dup it yourself on the call:

    char[] str = tolower("FOO".dup);

--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 05 2007
Bruno Medeiros wrote:
> Frits van Bommel wrote:
>> Bruno Medeiros wrote:
>>> The current signature:
>>>
>>>     const(char)[] tolower(const(char)[] str)
>>>
>>> is kinda incorrect, because it returns a const reference for an array that has no mutable references, and that is the same as an invariant reference, so tolower might as well return invariant(char)[].
>>
>> Again, that only holds if a copy was actually made at run time. If no copy was made the original input is returned, to which there may be mutable references.
>
> You're right, if a copy is not made *every* time (which is the case after all), then the above doesn't hold. But then, what I think is happening is that Phobos's current tolower is suboptimal in terms of usefulness, because we don't know whether a new copy is made or not. I'm wondering now what would be the more useful form, or forms, of tolower (and similar functions) to have. Now that I think of it again (admittedly I haven't got much experience with string manipulation in C++ or D, though), perhaps the best form is an in-place mutable version:
>
>     char[] tolower(char[] str);
>
> And it's this one after all that is the most general form. If you want to call tolower on a const or invariant array you dup it yourself on the call:
>
>     char[] str = tolower("FOO".dup);

True.. but it's unfortunate that the most efficient case, where no duplication is needed, is no longer possible :(

If we template the function, eg.

    T tolower(T)(T input)
    {
    }

and we have some way to check whether the input is const or not (at runtime is(string) or something?) perhaps we can code the existing efficient solution (no dup of const data) as well as the general case where it mutates. In the mutate case it can dup if the input is const and not dup if it isn't (adding an efficient solution which doesn't currently exist).

The only problem is that the case where you pass const data and it has to dup, you get back a const reference to a piece of data with no other owner (meaning it doesn't need to be const) which might cause another dup in your code at a later point.

Regan
Jul 06 2007
Regan Heath wrote:
> Bruno Medeiros wrote:
>> You're right, if a copy is not made *every* time (which is the case after all), then the above doesn't hold. But then, what I think is happening is that Phobos's current tolower is suboptimal in terms of usefulness, because we don't know whether a new copy is made or not. I'm wondering now what would be the more useful form, or forms, of tolower (and similar functions) to have. Now that I think of it again (admittedly I haven't got much experience with string manipulation in C++ or D, though), perhaps the best form is an in-place mutable version:
>>
>>     char[] tolower(char[] str);
>>
>> And it's this one after all that is the most general form. If you want to call tolower on a const or invariant array you dup it yourself on the call:
>>
>>     char[] str = tolower("FOO".dup);
>
> True.. but it's unfortunate that the most efficient case, where no duplication is needed, is no longer possible :(

Algorithms should care about worst-case performance, or average-case performance. That most efficient "case", where a string is already lowercase, is a minority case in most applications, and is never a worst-case scenario. So why bother? Also, doing tolower like that would give other performance problems like these:

> The only problem is that the case where you pass const data and it has to dup, you get back a const reference to a piece of data with no other owner (meaning it doesn't need to be const) which might cause another dup in your code at a later point.
>
> Regan

Indeed, with such a scenario, you would end up with worse performance overall.

--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 07 2007
Bruno Medeiros wrote:
> The 'something clever' to distinguish both cases is simply naming two different functions, like tolower or tolowerinv (if the second function is needed at all).

I was hoping for something clever'er ;)

Regan
Jul 05 2007
Regan Heath wrote:
> Walter Bright Wrote:
>> Derek Parnell wrote:
>>> However, if I might need to update it ...
>>>
>>>     char[] fullpath;
>>>     fullpath = CanonicalPath(shortname).dup;
>>>     version(Windows)
>>>     {
>>>         setLowerCase(fullpath);
>>>     }
>>>
>>> The point is that the 'CanonicalPath' function hasn't got a clue what the calling function is intending to do with the result so it is trying to be responsible by guarding it against mistakes by the caller.
>>
>> If you write it like this:
>>
>>     string fullpath;
>>     fullpath = CanonicalPath(shortname);
>>     version(Windows)
>>     {
>>         fullpath = std.string.tolower(fullpath);
>>     }
>>
>> you won't need to do the .dup .
>
> Because tolower does it for you, but it still returns string and if for example you need to add something to the end of the path, like a filename you will end up doing yet another dup somewhere.
>
> I think the solution may be to template all functions which return the input string, or part of the input string, eg.
>
>     T tolower(T)(T input)
>     {
>     }
>
> That way if you call it with char[] you get a char[] back, if you call it with string you get a string back.

It doesn't make sense to template it, because you'd still have two different function versions that would work differently: the one that receives a string does a dup, the one that receives a char[] does not dup. The return type of tolower(string str) might also be char[] and not string, if tolower(string str) always did a dup, even if no character modifications are necessary.

--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 05 2007
Bruno Medeiros wrote:
> It doesn't make sense to template it, because you'd still have two different function versions that would work differently: the one that receives a string does a dup, the one that receives a char[] does not dup. The return type of tolower(string str) might also be char[] and not string, if tolower(string str) always did a dup, even if no character modifications are necessary.

If the template is

    T tolower(T)(T input) {}

then you have

    string tolower(string input) {}
    char[] tolower(char[] input) {}

and your cases are:

1. input string, output same string (no dup)
2. input string, output string (dup)
3. input char[], output same char[] (no dup)

call to dup. I think the above is better than the current implementation as it avoids

Regan
Jul 06 2007
Regan Heath wrote:
> Walter Bright Wrote:
>>     string fullpath;
>>     fullpath = CanonicalPath(shortname);
>>     version(Windows)
>>     {
>>         fullpath = std.string.tolower(fullpath);
>>     }
>>
>> you won't need to do the .dup .
>
> Because tolower does it for you, but it still returns string

tolower only dups the string if it needs to. It won't dup a string that is already in lower case.

> and if for example you need to add something to the end of the path, like a filename you will end up doing yet another dup somewhere.

Concatenating strings does not require a .dup.
Jul 05 2007
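A short modern-D sketch of Walter's last point, using a hypothetical `withFile` helper: `~` and `~=` work directly on `string` because they only add characters or allocate a new array; they never overwrite existing immutable characters. As Regan notes in the next reply, the copying still happens, but it is done by the concatenation itself rather than by an explicit `.dup`.

```d
// Appending to a string never needs an explicit .dup: ~= either
// extends the array or reallocates, leaving existing characters intact.
string withFile(string dir, string file)
{
    string path = dir;   // shares dir's characters; nothing copied yet
    path ~= "/";         // may reallocate; dir's contents are untouched
    path ~= file;
    return path;
}
```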
Walter Bright wrote:
> Regan Heath wrote:
>> Walter Bright Wrote:
>>>     string fullpath;
>>>     fullpath = CanonicalPath(shortname);
>>>     version(Windows)
>>>     {
>>>         fullpath = std.string.tolower(fullpath);
>>>     }
>>>
>>> you won't need to do the .dup .
>>
>> Because tolower does it for you, but it still returns string
>
> tolower only dups the string if it needs to. It won't dup a string that is already in lower case.
>
>> and if for example you need to add something to the end of the path, like a filename you will end up doing yet another dup somewhere.
>
> Concatenating strings does not require a .dup.

OR

    newString = constString ~ bitToAdd;

(is a copy of constString to

So, the worst case scenario is that 2 dups are done. Further, if the input is char[] you can still get this worst case scenario because tolower returns string instead of char[]. With a templated version you get a much more efficient tolower for char[].

Regan
Jul 06 2007
Proof of concept. Only duplicate when the input is 'string', allowing for more efficient handling of char[] parameters and allowing callers to pass a mutable char[] parameter, receive the result as a mutable char[] and avoid future dup calls on the returned data.

Output:

    sStringM: 0x 416080 becomes 0x 880FD0 DUP
    sCharM  : 0x 880FE0 becomes 0x 880FE0 SAME
    sString : 0x 416110 becomes 0x 416110 SAME
    sChar   : 0x 880FC0 becomes 0x 880FC0 SAME

Code:

    rStringM.ptr, (sStringM.ptr!=rStringM.ptr)?"DUP":"SAME");
    rCharM.ptr, (sCharM.ptr!=rCharM.ptr)?"DUP":"SAME");
    rString.ptr, (sString.ptr!=rString.ptr)?"DUP":"SAME");
    (sChar.ptr!=rChar.ptr)?"DUP":"SAME");
Jul 06 2007
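Regan's proof-of-concept code survives only in fragments, so the following is a separate modern-D sketch of the same idea, not his original. `lowerT` is a hypothetical ASCII-only template; note that the const/mutable distinction is made at compile time with `static if`, not at run time. A mutable `char[]` is lowered in place and returned as `char[]`, while `string`/`const` input is handled copy-on-write, which reproduces the DUP/SAME pattern in the output above.

```d
// Templated ASCII lowering that preserves the argument's type:
//  - char[] input: lowered in place, same array returned (never copies);
//  - string / const(char)[] input: copy-on-write, copied only if needed.
T lowerT(T)(T s)
    if (is(T : const(char)[]))
{
    static if (is(T == char[]))
    {
        foreach (ref c; s)
            if (c >= 'A' && c <= 'Z')
                c = cast(char)(c + ('a' - 'A'));
        return s;
    }
    else
    {
        foreach (i, char c; s)
        {
            if (c >= 'A' && c <= 'Z')
            {
                char[] copy = s.dup;
                foreach (ref d; copy[i .. $])
                    if (d >= 'A' && d <= 'Z')
                        d = cast(char)(d + ('a' - 'A'));
                // Sound only because copy has no other references.
                return cast(T) copy;
            }
        }
        return s; // already lowercase: no copy
    }
}
```

The remaining cost is the one Regan names: a `string` that had to be copied comes back typed as `string`, even though the caller is its only owner.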
On Thu, 05 Jul 2007 01:06:45 -0700, Walter Bright wrote:

> Derek Parnell wrote:
>> However, if I might need to update it ...
>>
>>     char[] fullpath;
>>     fullpath = CanonicalPath(shortname).dup;
>>     version(Windows)
>>     {
>>         setLowerCase(fullpath);
>>     }
>>
>> The point is that the 'CanonicalPath' function hasn't got a clue what the calling function is intending to do with the result so it is trying to be responsible by guarding it against mistakes by the caller.
>
> If you write it like this:
>
>     string fullpath;
>     fullpath = CanonicalPath(shortname);
>     version(Windows)
>     {
>         fullpath = std.string.tolower(fullpath);
>     }
>
> you won't need to do the .dup .

If you have any failing Walter, it's your ability to focus on insignificant minutiae as a form of distraction from the point that people are really trying to make.

I was not talking about how to do efficient lower case conversion. I'll make my code example more free from assumed functionality.

    char[] qwerty;
    qwerty = KJHGF(poiuy).dup;
    version(xyzzy)
    {
        MNBVC(qwerty);
    }

As you can see, my point is made without regard to converting stuff to lower case.

--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Jul 05 2007
Derek Parnell wrote:
> I'll make my code example more free from assumed functionality.
>
>     char[] qwerty;
>     qwerty = KJHGF(poiuy).dup;
>     version(xyzzy)
>     {
>         MNBVC(qwerty);
>     }
>
> As you can see, my point is made without regard to converting stuff to lower case.

What you are doing there is mixing two styles of functions: functional (KJHGF) and in-place modifying functions (MNBVC). Walter's modification was making both use a common style (functional). Mixing those two function styles will naturally require different types of constness.

--
Oskar
Jul 05 2007
Derek Parnell wrote:
> I'll make my code example more free from assumed functionality.
>
>     char[] qwerty;
>     qwerty = KJHGF(poiuy).dup;
>     version(xyzzy)
>     {
>         MNBVC(qwerty);
>     }
>
> As you can see, my point is made without regard to converting stuff to lower case.

My point is that the way the snippet is written is inside out. Do not use .dup to preemptively make a copy in case it gets changed somewhere later on. The style is to make a .dup *only if* the contents will be changed and do the .dup *at the site* of the modification.

In other words, dups should be done from the bottom up, not from the top down.

I think such a style helps fit things together nicely and avoids strange .dups appearing in inexplicable places.
Jul 05 2007
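A minimal modern-D sketch of the "dup at the site of the modification" style Walter describes, using a hypothetical `normalizeSeparators` helper (not from Bud or Phobos). The input stays const, the unchanged path returns it without copying, and the single `.dup` sits directly above the loop that writes to the copy.

```d
// "Bottom-up" duplication: copy only on the branch that actually
// changes the characters, and copy right where the writing happens.
const(char)[] normalizeSeparators(const(char)[] path, bool windows)
{
    if (!windows)
        return path;              // unchanged: no copy at all
    char[] copy = path.dup;       // copy at the site of modification
    foreach (ref c; copy)
        if (c == '\\')
            c = '/';
    return copy;
}
```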
On Thu, 05 Jul 2007 11:51:30 -0700, Walter Bright wrote:

> Derek Parnell wrote:
>> I'll make my code example more free from assumed functionality.
>>
>>     char[] qwerty;
>>     qwerty = KJHGF(poiuy).dup;
>>     version(xyzzy)
>>     {
>>         MNBVC(qwerty);
>>     }
>>
>> As you can see, my point is made without regard to converting stuff to lower case.
>
> My point is that the way the snippet is written is inside out. Do not use .dup to preemptively make a copy in case it gets changed somewhere later on. The style is to make a .dup *only if* the contents will be changed and do the .dup *at the site* of the modification.
>
> In other words, dups should be done from the bottom up, not from the top down.
>
> I think such a style helps fit things together nicely and avoids strange .dups appearing in inexplicable places.

Thanks. This is what I meant by rethinking the design of my routines. I'll strongly consider your suggestion even though it does complicate the algorithm for readers of the code.

--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Jul 05 2007
Reply to Walter,

> Derek Parnell wrote:
>> I'll make my code example more free from assumed functionality.
>>
>>     char[] qwerty;
>>     qwerty = KJHGF(poiuy).dup;
>>     version(xyzzy)
>>     {
>>         MNBVC(qwerty);
>>     }
>>
>> As you can see, my point is made without regard to converting stuff to lower case.
>
> My point is that the way the snippet is written is inside out. Do not use .dup to preemptively make a copy in case it gets changed somewhere later on. The style is to make a .dup *only if* the contents will be changed and do the .dup *at the site* of the modification.
>
> In other words, dups should be done from the bottom up, not from the top down.
>
> I think such a style helps fit things together nicely and avoids strange .dups appearing in inexplicable places.

The one issue I can see with this is where an input is const but may be changed (and .duped) at any of a number of points. The data, though, only needs to be .duped once.

|char[] Whatever(const char[] str)
|{
|  if(c1) str = Mod1(str.dup);
|  if(c2) str = Mod2(str.dup);
|  if(c3) str = Mod3(str.dup);
|  return str;
|}
// causes excess duping

I can't think of a better solution than this (and this is BAD):

|char[] Whatever(const char[] str)
|{
|  sw: switch(-1)
|  {
|    foreach(bool b; T!(true, false))
|    {
|      if(c1) {static if(b){str = str.dup; goto case 1;} else {case 1: str = Mod1(str.dup);}}
|      if(c2) {static if(b){str = str.dup; goto case 2;} else {case 2: str = Mod2(str.dup);}}
|      if(c3) {static if(b){str = str.dup; goto case 3;} else {case 3: str = Mod3(str.dup);}}
|      return str;
|    }
|  }
|}
Jul 05 2007
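A less contorted way to get at most one copy, sketched in modern D: keep the buffer `null` until the first step needs to write, and route every step through a nested helper that duplicates on first use. The three conditions stand in for BCS's `c1`..`c3`, and the in-place character replacements are hypothetical stand-ins for `Mod1`..`Mod3`.

```d
// Hypothetical in-place step: replaces every `from` with `to`.
void replaceChar(char[] buf, char from, char to)
{
    foreach (ref c; buf)
        if (c == from)
            c = to;
}

// Copies str at most once, however many of the optional steps run.
char[] whatever(const(char)[] str, bool c1, bool c2, bool c3)
{
    char[] buf;   // stays null until the first step needs to write

    // Nested helper: hands out a writable buffer, copying only once.
    char[] writable()
    {
        if (buf is null)
            buf = str.dup;   // the single copy
        return buf;
    }

    if (c1) replaceChar(writable(), ' ', '_');    // stands in for Mod1
    if (c2) replaceChar(writable(), '\\', '/');   // stands in for Mod2
    if (c3) replaceChar(writable(), '\t', ' ');   // stands in for Mod3

    // BCS's signature returns char[], so an untouched input is copied here.
    return buf is null ? str.dup : buf;
}
```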
BCS wrote:
> The one issue I can see with this is where an input is const but may be changed (and .duped) at any of a number of points. The data, though, only needs to be .duped once.
>
> |char[] Whatever(const char[] str)
> |{
> |  if(c1) str = Mod1(str.dup);
> |  if(c2) str = Mod2(str.dup);
> |  if(c3) str = Mod3(str.dup);
> |  return str;
> |}
> // causes excess duping

My experience with this is:

1) Such cases are unusual.
2) The few cases where they do happen, they are not in that 5% of the code that is a bottleneck.
3) If such code is performance critical, there's usually a better way to write it that will yield even better performance than taking repeated passes over the same string. Best performance usually comes by merging all the operations into one pass.
Jul 05 2007
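Walter's third point can be illustrated with a small modern-D sketch: instead of a lowering pass, then a separator pass, each with its own copy, one loop writes the final characters into a single new buffer. `canonicalizeOnePass` is a hypothetical, ASCII-only example, not a function from Bud or Phobos.

```d
// One pass, one allocation: lowercase ASCII letters and convert
// backslashes to forward slashes in the same loop.
char[] canonicalizeOnePass(const(char)[] path)
{
    auto result = new char[path.length];
    foreach (i, char c; path)
    {
        char d = c;
        if (d == '\\')
            d = '/';
        else if (d >= 'A' && d <= 'Z')
            d = cast(char)(d + ('a' - 'A'));
        result[i] = d;
    }
    return result;
}
```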
Derek Parnell wrote:
> On Thu, 05 Jul 2007 00:15:41 -0700, Sean Kelly wrote:
>> Derek Parnell wrote:
>>> I'm converting Bud to compile using V2 and so far it's been a very hard thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all over the place, which is exactly what I thought would happen. Bud does a lot of text manipulation so having 'string' as invariant means that calls to functions that return string need to often be .dup'ed because I need to assign the result to a malleable variable.
>>
>> So just use char[] instead of 'string'. I don't plan to use the aliases much either.
>
> It's not so clear cut. Firstly, a lot of phobos routines now return 'string' results and expect 'string' inputs.

I'd argue that the parameters should be "const char[]" rather than "string", and it's hard to say for the return values.

> Secondly, I like the idea of general purpose functions returning 'const' data, because it helps guard against inadvertent modifications by the calling routines. It is up to the calling function to explicitly decide if it is going to modify returned stuff or not.
>
> For example, if I know that I'll not need to modify the 'fullpath' then I might do this ...
>
>     string fullpath;
>     fullpath = CanonicalPath(shortname);

I would say that whether the return value is const/invariant indicates ownership. If the called function/class owns the data then it is const or invariant. If it does not then it is not const/invariant. This seems to largely limit "string" as a return value to property methods.

> However, if I might need to update it ...
>
>     char[] fullpath;
>     fullpath = CanonicalPath(shortname).dup;
>     version(Windows)
>     {
>         setLowerCase(fullpath);
>     }
>
> The point is that the 'CanonicalPath' function hasn't got a clue what the calling function is intending to do with the result so it is trying to be responsible by guarding it against mistakes by the caller.

Right. See above.

Sean
Jul 05 2007
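One way to read Sean's ownership rule, sketched in modern D with hypothetical names (`Project`, `withExtension`): an object that owns its data only lends out a read-only view through a property, while a function that builds new data returns a mutable `char[]`, since the caller becomes the sole owner.

```d
// The object owns its name, so the property lends out a const view.
class Project
{
    private char[] name_;

    this(const(char)[] initial) { name_ = initial.dup; }

    // Callee owns the data: hand out a read-only view.
    const(char)[] name() const { return name_; }
}

// Caller will own the result: return mutable data.
char[] withExtension(const(char)[] base, const(char)[] ext)
{
    char[] result = base.dup;
    result ~= '.';
    result ~= ext;
    return result;
}
```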
On Thu, 05 Jul 2007 01:18:28 +0300, Derek Parnell <derek psych.ward> wrote:

> I'm converting Bud to compile using V2 and so far it's been a very hard thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all over the place, which is exactly what I thought would happen. Bud does a lot of text manipulation so having 'string' as invariant means that calls to functions that return string need to often be .dup'ed because I need to assign the result to a malleable variable. I might have to rethink the design of the application to avoid the performance hit of all these dups.

That got me thinking about string functions in general.

First, I am wondering why some functions are formed as follows (but I'm sure someone will (hopefully) enlighten me about that ;) ):

    string foo(string bar);

That is, if they return something other than 'bar' (they do some string manipulation). Shouldn't they return char[] instead? For example:

    char[] foo(string bar) {
        return bar ~ "blah";
    }

And this brings us to the 'tolower()' function (for instance). Sometimes it .dups and sometimes it doesn't. So, if I don't know whether the input string contains uppercase chars, I have to .dup the return value, even if it may already have been .dupped by 'tolower()'...

    char[] a = "abc".dup;
    char[] b = tolower(a).dup;  //.dupped once ('tolower()' returns plain 'a')

    char[] a = "ABC".dup;
    char[] b = tolower(a).dup;  //.dupped twice!

So 'tolower()' is a hybrid of two function groups:

(1) functions that modify the input string,
(2) functions that return a (modified) copy of the input string.

(If the input string doesn't contain uppercase chars it behaves like (1) (even if it doesn't actually modify the input string), otherwise it behaves like (2).)

I don't think this is a good thing. There should be two different functions, one for each group:

    char[] tolower(char[] str);   //modifies and returns 'str'
    char[] getlower(string str);  //returns a copy

If one likes the copy-on-write behaviour of 'tolower()', I think it would work only by using reference counting. For example (the 'String' class uses reference counting):

    String a, b;
    a = "abc";
    b = tolower(a);  //'b' points to 'a' ('tolower()' simply returns 'a')
    b[0] = 'x';      //'b' .dups its contents before modification, so 'a' is not changed
Jul 05 2007
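Kristian's proposed split can be sketched in modern D as follows; the names `tolower` and `getlower` follow his post, but the bodies are an ASCII-only illustration, not Phobos code. One function always mutates its argument and returns the same memory; the other never touches its argument and always makes exactly one copy, so the caller can tell statically which case applies.

```d
// Group (1): modifies and returns its argument (same memory).
char[] tolower(char[] str)
{
    foreach (ref c; str)
        if (c >= 'A' && c <= 'Z')
            c = cast(char)(c + ('a' - 'A'));
    return str;
}

// Group (2): never modifies its argument; always exactly one copy.
char[] getlower(string str)
{
    return tolower(str.dup);
}
```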
Kristian Kilpi wrote:
> First, I am wondering why some functions are formed as follows (but I'm sure someone will (hopefully) enlighten me about that ;) ):
>
>     string foo(string bar);
>
> That is, if they return something other than 'bar' (they do some string manipulation). Shouldn't they return char[] instead?

No, because then they must always dup the string. If they don't need to dup the string, they can return a reference to the parameter, and if so, it must be const.

> There should be two different functions, one for each group:
>
>     char[] tolower(char[] str);   //modifies and returns 'str'
>     char[] getlower(string str);  //returns a copy

When one would use a mutating tolower, one is already manipulating the contents of a string character by character. In such cases, one can tolower the characters in that process, instead of doing it later (the former will be more efficient anyway, and the only advantage to a mutating tolower is an efficiency improvement).

Using the functional-style copy-on-write string functions will result in easy to understand, less buggy programs. Doing strings in this manner is a proven success in just about every programming language.
Jul 05 2007
On Thu, 05 Jul 2007 22:11:37 +0300, Walter Bright <newshound1 digitalmars.com> wrote:

> Kristian Kilpi wrote:
>> First, I am wondering why some functions are formed as follows (but I'm sure someone will (hopefully) enlighten me about that ;) ):
>>
>>     string foo(string bar);
>>
>> That is, if they return something other than 'bar' (they do some string manipulation). Shouldn't they return char[] instead?
>
> No, because then they must always dup the string. If they don't need to dup the string, they can return a reference to the parameter, and if so, it must be const.
>
>> There should be two different functions, one for each group:
>>
>>     char[] tolower(char[] str);   //modifies and returns 'str'
>>     char[] getlower(string str);  //returns a copy
>
> When one would use a mutating tolower, one is already manipulating the contents of a string character by character. In such cases, one can tolower the characters in that process, instead of doing it later (the former will be more efficient anyway, and the only advantage to a mutating tolower is an efficiency improvement).

That makes sense (especially with strings). Of course, as said, it's not a perfect solution because unnecessary .dupping can occur. For example:

    s = "blah " ~ foo(tolower(str).dup);

'foo()' modifies its input string and returns it. If 'foo' were a copy-on-write function, you could just do:

    s = "blah " ~ foo(tolower(str));

That's much nicer, but 'str' could be copied twice in both the cases above. If both 'foo()' and 'tolower()' modified 'str', no copying would have been done (by these functions).

Well, it's just how you like to code and build things. Both ways have their own pros and cons.
Jul 06 2007