digitalmars.D - Newbie Question about strings
- hellcatv hotmail.com (17/17) May 10 2004 does the following result in undefined behavior (as if I had realloc'd t...
- Ben Hinkle (6/23) May 10 2004 The "~=" operator will reallocate if there isn't space already. That is ...
- Sean Kelly (4/7) May 10 2004 But D is a GC language. Would there even be a dangling reference in
- Ben Hinkle (7/13) May 10 2004 why
- Daniel Horn (12/37) May 10 2004 right the docs say "you" but I wasn't sure if it means I must do it or
- Ben Hinkle (16/28) May 10 2004 copy-on-write is not enforced by the compiler but it is a technique used...
- hellcatv hotmail.com (19/47) May 10 2004 you have some good points
- Norbert Nemec (13/24) May 10 2004 You should always assume that it may do it, unless it is explicitely
- Walter (7/13) May 11 2004 COW coupled with gc enables D string processing programs to smoke C++
- Sean Kelly (5/11) May 10 2004 COW is great in many cases but it can be a nightmare with multithreaded
- Norbert Nemec (7/10) May 10 2004 Why is that? If you know, that no other part of the program may have a
- Sean Kelly (7/16) May 10 2004 By GC I meant that the string is effectively passed by reference, so a
does the following result in undefined behavior (as if I had realloc'd the char * inp in C?)
i.e. could the inp[0]='A' also affect the char[] s;

    import std.string;

    void mod (char [] inp) {
        inp~="8";
        inp[0]='A';
        printf ("\n%s ",std.string.toStringz(inp));
        printf ("%d ",inp.length);
    }

    int main () {
        char [] s = "1234567";
        printf ("%s\n",std.string.toStringz(s));
        mod(s);
        printf ("%s\n",std.string.toStringz(s));
        return 0;
    }
May 10 2004
The "~=" operator will reallocate if there isn't space already. That is why the std.string uses "copy-on-write" semantics - meaning if you don't "own" an array you make a copy before changing it.

<hellcatv hotmail.com> wrote in message news:c7of0m$c15$1 digitaldaemon.com...
> does the following result in undefined behavior (as if I had realloc'd the char * inp in C?)
> i.e. could the inp[0]='A' also affect the char[] s;
>
>     import std.string;
>
>     void mod (char [] inp) {
>         inp~="8";
>         inp[0]='A';
>         printf ("\n%s ",std.string.toStringz(inp));
>         printf ("%d ",inp.length);
>     }
>
>     int main () {
>         char [] s = "1234567";
>         printf ("%s\n",std.string.toStringz(s));
>         mod(s);
>         printf ("%s\n",std.string.toStringz(s));
>         return 0;
>     }
May 10 2004
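A minimal sketch of the situation asked about above, rewritten for a current D2 compiler. That is an assumption: the thread predates D2, where string literals became immutable, hence the `.dup` and `writeln` in place of the original literal and `printf`. It illustrates that the callee's append never changes the caller's length, while whether the write to `inp[0]` reaches the caller's data depends on whether `~=` had to reallocate.

```d
import std.stdio;

// `inp` is the callee's own copy of the (pointer, length) pair,
// but it may still point at the caller's characters.
char[] mod(char[] inp)
{
    inp ~= '8';     // may or may not reallocate; only the local slice grows
    inp[0] = 'A';   // reaches the caller's data only if storage is still shared
    return inp;
}

void main()
{
    char[] s = "1234567".dup;   // mutable copy of the literal
    char[] r = mod(s);

    writeln(r);                 // A2345678
    writeln(s.length);          // 7: the caller's slice is never lengthened
    // Whether s[0] changed is not something to rely on; report what happened.
    writeln(s.ptr is r.ptr ? "storage shared" : "storage reallocated");
}
```

Either outcome is memory-safe, since the old block stays alive for as long as something refers to it. The risk is an unintended side effect, not a dangling pointer.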
Ben Hinkle wrote:
> The "~=" operator will reallocate if there isn't space already. That is why the std.string uses "copy-on-write" semantics - meaning if you don't "own" an array you make a copy before changing it.

But D is a GC language. Would there even be a dangling reference in this case? I assumed that this would just result in a side-effect.

Sean
May 10 2004
"Sean Kelly" <sean f4.ca> wrote in message news:c7osgt$10f4$1 digitaldaemon.com...
> Ben Hinkle wrote:
>> The "~=" operator will reallocate if there isn't space already. That is why the std.string uses "copy-on-write" semantics - meaning if you don't "own" an array you make a copy before changing it.
>
> But D is a GC language. Would there even be a dangling reference in this case? I assumed that this would just result in a side-effect.

umm, I'm not sure what the GC has to do with it, but yeah, the GC will collect the copy if all the references go away. COW is to prevent side-effects.
May 10 2004
right the docs say "you" but I wasn't sure if it means I must do it or by modifying it, the lib does a copy-on-write.

so I must specifically make a copy of it in order to guarantee that my function will not result in side effects?

could I make a wrapper struct that guaranteed it would copy when passed into a function (like C++ strings)?
in C I could wrap a static array into a struct in order to get pass-by-value semantics (of course the size of this array was known) and in C++ I could make a copy constructor that would implicitly get called when I passed the string into a wrapper function.

is there anything similar in D for char[] arrays?

Ben Hinkle wrote:
> "Sean Kelly" <sean f4.ca> wrote in message news:c7osgt$10f4$1 digitaldaemon.com...
>> Ben Hinkle wrote:
>>> The "~=" operator will reallocate if there isn't space already. That is why the std.string uses "copy-on-write" semantics - meaning if you don't "own" an array you make a copy before changing it.
>>
>> But D is a GC language. Would there even be a dangling reference in this case? I assumed that this would just result in a side-effect.
>
> umm, I'm not sure what the GC has to do with it, but yeah, the GC will collect the copy if all the references go away. COW is to prevent side-effects.
May 10 2004
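One possible shape for such a wrapper, sketched for a current D2 compiler. That is an assumption: the feature used here was not part of the language when this thread was written. A struct postblit, `this(this)`, runs whenever the struct is copied, so it can duplicate the array on pass-by-value much as a C++ copy constructor would. `ValueString` and `mod` are hypothetical names for illustration only.

```d
import std.stdio;

// Hypothetical wrapper that gives a char[] value semantics.
struct ValueString
{
    char[] data;

    // Postblit: runs on the new copy after the fields have been copied,
    // so the copy gets its own storage instead of sharing the original's.
    this(this)
    {
        data = data.dup;
    }
}

void mod(ValueString v)
{
    v.data[0] = 'A';    // changes only the callee's private copy
}

void main()
{
    auto s = ValueString("1234567".dup);
    mod(s);             // passing by value triggers the postblit
    writeln(s.data);    // 1234567
}
```

The trade-off is that every pass by value now pays for a full copy, which is exactly the cost the copy-on-write convention tries to avoid.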
Daniel Horn wrote:
> right the docs say "you" but I wasn't sure if it means I must do it or by modifying it, the lib does a copy-on-write.

copy-on-write is not enforced by the compiler but it is a technique used by std.string (and probably the rest of phobos). If you look at std.string.tolower, for example, you will see how it delays making a copy until it absolutely has to. I'm not sure which part of the doc you are looking at.

> so I must specifically make a copy of it in order to guarantee that my function will not result in side effects?

If you write a statement like
    str[3] = 'a';
then you should think about using COW. If you want to guarantee your function has no side effect then you should make a copy. If you write
    str = tolower(str);
in your function then you don't have to make a copy since tolower uses COW already and it will make a copy if it needs to.

> could I make a wrapper struct that guaranteed it would copy when passed into a function (like C++ strings)?
> in C I could wrap a static array into a struct in order to get pass-by-value semantics (of course the size of this array was known) and in C++ I could make a copy constructor that would implicitly get called when I passed the string into a wrapper function.
> is there anything similar in D for char[] arrays?

I suppose you could wrap the array in a struct that overloads opIndex assignment and do something funky, but I haven't really thought about it. Seems like a lot of trouble to avoid using strings.
May 10 2004
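The delayed copy described above for std.string.tolower can be sketched as follows. This is not the Phobos source, only an illustration of the same idea for a current D2 compiler (an assumption), and `underscoreSpaces` is a hypothetical function: it returns its argument untouched unless a change is actually needed, and copies once on the first write.

```d
import std.stdio;

// Hypothetical copy-on-write routine: replaces spaces with underscores.
const(char)[] underscoreSpaces(const(char)[] s)
{
    char[] result = null;           // no copy has been made yet
    foreach (i, c; s)
    {
        if (c == ' ')
        {
            if (result is null)
                result = s.dup;     // first change needed: copy exactly once
            result[i] = '_';
        }
    }
    // Nothing changed: hand back the caller's own slice.
    return result is null ? s : result;
}

void main()
{
    string a = "no_spaces";
    string b = "has a space";

    writeln(underscoreSpaces(a).ptr is a.ptr);  // true: no copy was made
    writeln(underscoreSpaces(b));               // has_a_space
    writeln(b);                                 // has a space
}
```

A caller that goes on to modify the result in place cannot tell which case occurred, which is why the convention asks callers to copy before writing to anything they do not own.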
you have some good points
but what is a good way for a new person (or someone reading someone else's code) to know if the function is exhibiting copy on write
if C++ had the same feature for strings then I would assume that a const string would not be modified and a non const string would be...

can I assume all phobos-related string functions that need to perform copy on write do so, then?

it's a potential pitfall for new programmers to have the ~= operator (opCatAssign) sometimes copy on write yet the tolower function copies on write
perhaps this just needs to be mentioned carefully in the documentation...preferably in a consistent manner

I also noticed that
    char [] blah="1234567";
    char [] bleh=blah;
    bleh~="";
    bleh[0]='A';
blah[0] is still '1'

perhaps ~= also guarantees copy-on-write semantics? :-) that would make phobos a quite consistent library then

In article <c7pat1$1kk4$1 digitaldaemon.com>, Ben Hinkle says...
> Daniel Horn wrote:
>> right the docs say "you" but I wasn't sure if it means I must do it or by modifying it, the lib does a copy-on-write.
>
> copy-on-write is not enforced by the compiler but it is a technique used by std.string (and probably the rest of phobos). If you look at std.string.tolower, for example, you will see how it delays making a copy until it absolutely has to. I'm not sure which part of the doc you are looking at.
>
>> so I must specifically make a copy of it in order to guarantee that my function will not result in side effects?
>
> If you write a statement like str[3] = 'a'; then you should think about using COW. If you want to guarantee your function has no side effect then you should make a copy. If you write str = tolower(str); in your function then you don't have to make a copy since tolower uses COW already and it will make a copy if it needs to.
>
>> could I make a wrapper struct that guaranteed it would copy when passed into a function (like C++ strings)?
>> in C I could wrap a static array into a struct in order to get pass-by-value semantics (of course the size of this array was known) and in C++ I could make a copy constructor that would implicitly get called when I passed the string into a wrapper function.
>> is there anything similar in D for char[] arrays?
>
> I suppose you could wrap the array in a struct that overloads opIndex assignment and do something funky, but I haven't really thought about it. Seems like a lot of trouble to avoid using strings.
May 10 2004
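Whether `blah` is affected in an experiment like the one above depends on whether `~=` happened to reallocate. A sketch of the explicit alternative, assuming a current D2 compiler (hence the `.dup` on the literal, which is immutable there): plain assignment shares storage, while `.dup` always produces an independent copy.

```d
import std.stdio;

void main()
{
    char[] blah = "1234567".dup;  // mutable array built from the literal
    char[] bleh = blah;           // plain assignment: both slices share storage
    bleh[1] = 'X';                // so this write is visible through blah too

    char[] own = blah.dup;        // explicit copy: independent storage
    own[0] = 'A';                 // does not touch blah

    writeln(blah);                // 1X34567
    writeln(own);                 // AX34567
}
```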
hellcatv hotmail.com wrote:
> you have some good points but what is a good way for a new person (or someone reading someone else's code) to know if the function is exhibiting copy on write

You should always assume that it may do it, unless it is explicitly documented as doing something "in place". You should be careful about expecting a routine to guarantee a copy. Like the tolower example (as I understand from Ben's post): it will make a copy if there were any uppercase letters in the original. Otherwise, there is no reason to do so, and it will just return a reference to the original string.

If you want to make sure to have a unique copy, you have to call .dup yourself. I hope the compiler is intelligent enough to detect and drop unnecessary .dups.

> I also noticed that char [] blah="1234567"; char [] bleh=blah; bleh~=""; bleh[0]='A'; blah[0] is still '1'
> perhaps ~= also guarantees copy-on-write semantics? :-) that would make phobos a quite consistent library then

The implementation is likely to do so, but currently, the language spec does not guarantee it. Actually, in your example, bleh~="" might be optimized away completely. (At least, that's how I understand the specs.)
May 10 2004
"Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message news:c7psqp$2f7r$1 digitaldaemon.com...
> You should always assume that it may do it, unless it is explicitly documented as doing something "in place". You should be careful about expecting a routine to guarantee a copy. Like the tolower example (as I understand from Ben's post): it will make a copy if there were any uppercase letters in the original. Otherwise, there is no reason to do so, and it will just return a reference to the original string.

COW coupled with gc enables D string processing programs to smoke C++ std::string ones in the performance department. The combination of the two enables one to do things like mix slices, static data, and gc'd data without worrying about which is which. C++ std::string has to worry, and the implementations I've looked at resolve the problem by always copying.
May 11 2004
Ben Hinkle wrote:
> copy-on-write is not enforced by the compiler but it is a technique used by std.string (and probably the rest of phobos). If you look at std.string.tolower, for example, you will see how it delays making a copy until it absolutely has to. I'm not sure which part of the doc you are looking at.

COW is great in many cases but it can be a nightmare with multithreaded programming. It almost makes me wish that we could specify the behavior with a template parameter.

Sean
May 10 2004
Sean Kelly wrote:
> COW is great in many cases but it can be a nightmare with multithreaded programming. It almost makes me wish that we could specify the behavior with a template parameter.

Why is that? If you know that no other part of the program may have a reference to some string, then you may write to it. Otherwise, you just have to copy the string first. I see no difference whether "other part" is a local variable in the same routine or some part in another thread.

Of course, if the reference itself is shared between threads, you have to lock it before writing anything, but that is the same with any variable.
May 10 2004
Ben Hinkle wrote:
> "Sean Kelly" <sean f4.ca> wrote in message news:c7osgt$10f4$1 digitaldaemon.com...
>> But D is a GC language. Would there even be a dangling reference in this case? I assumed that this would just result in a side-effect.
>
> umm, I'm not sure what the GC has to do with it, but yeah, the GC will collect the copy if all the references go away. COW is to prevent side-effects.

By GC I meant that the string is effectively passed by reference, so a reallocation would not leave the passed variable pointing to bad memory as may happen in C using pointers. I just wanted to clarify the semantics that the result is not "undefined" but rather merely that the function has a side-effect.

Sean
May 10 2004