digitalmars.D - Const, invariant, strings and "constant data need never be copied"


"Stewart Gordon" <smjg_1998 yahoo.com> writes:

In DMD 2.006, the definition of string was changed from const(char)[] to 
invariant(char)[] (and similarly wstring and dstring).  This change has no 
doubt broken a fair amount of D 2.x code.  While at it, I've found that the 
functions in std.string use a mix of char[] and string, but only one uses 
const(char)[].

It's worth thinking about what's actually best, from both coding and runtime 
efficiency POVs.

Let's first look at string manipulation functions such as those in 
std.string.  These merely look at a passed-in string and return something - 
they don't keep the string for later.  They therefore need only a read-only 
view of a string - they don't need to know that the string is never going to 
change.  Declaring these with invariant parameters therefore means that it 
is often necessary to .idup a string just to pass it to one of these 
functions.  Moreover, if a piece of code manipulates strings with a mixture 
of direct modification and calls to std.string functions, it necessitates 
quite a bit of copying of strings.

Consider, for example, a program that reads in a text file, normalises the 
line breaks and then ROT13 encodes the result.  Under D 1.x, this works:

----------
import std.file, std.string, std.cstream;

void main(string[] a) {
    char[] text = cast(char[]) read(a[1]);

    text = text.replace("\r\n", "\n").replace("\r", "\n");

    foreach (ref char c; text) {
        if ((c >= 'A' && c <= 'M') || (c >= 'a' && c <= 'm')) {
            c += 13;
        } else if ((c >= 'N' && c <= 'Z') || (c >= 'n' && c <= 'z')) {
            c -= 13;
        }
    }

    dout.writeString(text);
}
----------

Note that the text is never copied after it is read in, except by 
std.string.replace if it actually makes any change.  In D 2.x, it's 
necessary to change one line, to something like

    text = text.idup.replace("\r\n", "\n").replace("\r", "\n").dup;

which consequently adds two copy operations.

There are a few caveats to this example:
- the .idup is only because std.file.read currently returns a mutable 
void[] - we could actually cast it to an invariant as nothing else is going 
to use it
- there's no significant reason to normalise the line breaks before, rather 
than after, ROT13ing it
- if all we're going to do is output it, we could output on the fly rather 
than trying to modify the text in memory

but these won't be true in the more general case.  There are probably plenty 
of more involved examples in which there's more difference than this between 
the 1.x and 2.x code.

A fairly recent discussion
http://tinyurl.com/2kgpqg
touched on the question of whether functions that generate a string and 
return it should return mutable or immutable references.  And the discussion 
was by no means conclusive.

If a public library function returns a mutable array reference, it can mean 
either:
(a) it is giving up ownership of the memory the array occupies
(b) it is giving the caller direct access to data it holds for some purpose 
(std.mmfile is an example of this)

In case (a), if there's no risk of there being other references to the same 
memory (as is the case if the function always allocates it) then it would 
make sense to give the caller the choice of whether it should be mutable. 
Indeed, the choice is already there, but not in a way that protects against 
inadvertently exercising it on something that falls under case (b).

Unfortunately, constness doesn't play well with copy-on-write.  Needless to 
reiterate, std.string's functions want at least a read-only view of the 
array.  But the caller might still want what's returned to be mutable.  If 
the function is going to return the passed-in string, it can only (sensibly) 
return a read-only view since that's what it received.  While the caller 
could try the down-and-dirty (D&D) trick of casting away the constness, there is a risk of 
catastrophic failure if some std.string implementation caches strings for 
reuse.

But invariant clearly has some use.  It enables the claim that "constant 
data need never be copied" to work, as long as invariant is used well.  So 
if something receives a string from the caller and wants to save it for 
later use/retrieval, declaring the parameter as invariant will mean that the 
callee won't need to copy the data.  So there's the benefit that, by making 
it the caller's responsibility to .idup the data if necessary, it'll save 
the overhead of unnecessary copying.  Similarly, when something later wants 
to retrieve the data, if it is invariant then there's no need to copy it.  A 
library and an application that uses it can share one copy of the data, and 
believe that the data will never change.
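A minimal sketch of the "no copy needed" point, written in present-day D2, where `immutable` is the later spelling of this thread's `invariant` and `string` is an alias for `immutable(char)[]`. The `Label` class is a hypothetical stand-in for any library object that keeps a string for later retrieval: it stores and hands back the caller's data without copying, while a caller holding a mutable buffer pays for one explicit `.idup`.

----------
import std.stdio : writeln;

/// Hypothetical library class that keeps a string for later retrieval.
class Label
{
    private string text_; // immutable data: nobody can change it behind our back

    this(string value)
    {
        text_ = value;    // no defensive copy: the characters can never change
    }

    /// Hands the same data back, again without copying.
    string text() const
    {
        return text_;
    }
}

void main()
{
    string fixed = "hello";
    auto a = new Label(fixed);     // shares the caller's memory

    char[] buf = "world".dup;
    auto b = new Label(buf.idup);  // the caller pays for one explicit copy
    buf[0] = 'W';                  // later edits to buf do not affect b

    writeln(a.text, " ", b.text);  // hello world
}
----------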


To round things off, the D 2.x const/final/invariant thing is certainly 
useful, but not perfect.  The different storage classes/type modifiers are 
good for different things.  It might be worth a good think about how Phobos 
uses them (or in some cases doesn't), and what is best practically for the 
definitions of string/wstring/dstring.  (Was there a discussion I missed?)

Of course, it would also be worth a good think about whether anything can be 
added or changed in the language to improve matters.  Ideas that come to my 
mind are:

1. A property .invariant, which just returns the reference as an invariant 
if it's already invariant, otherwise does the same as .idup.  For this to 
work, the runtime would have to keep a record of whether each piece of 
allocated memory is invariant, which would interfere with the current 
ability to cast invariance in or out - but at what cost?

2. Some type modifier such as 'unique' that would indicate that only one 
reference to the data exists.  I'm not sure what rules there should be to 
enforce this, or if we should just go on trust.  But it would be implicitly 
convertible to mutable, const or invariant, enabling something like
----------
unique(int)[] rep(int i) {
    unique(int)[] result;
    result.length = i % 10;
    result[] = i;
    return result;
}

int[] twos = rep(2);
const(int)[] fives = rep(5);
invariant(int)[] twelves = rep(12);
----------

3. Some concept of const-transparent functions.  One approach would be to 
enable a type modifier (or the lack thereof) to be used as a template 
parameter, with something like
----------
T(char)[] doSomethingWith(typemod T)(T(char)[] param) {
    ...
}

char[] str;
const(char)[] cstr;
invariant(char)[] istr;

str = doSomethingWith(str);
cstr = doSomethingWith(cstr);
istr = doSomethingWith(istr);
----------

This would enable copy on write to work well.  As long as nothing _within_ 
the function so templated relies on the distinction, the compiler could 
optimise by generating only one instance, since it affects only compile-time 
type checking and not code generation.
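Present-day D2 has no `unique` qualifier, but it covers part of idea 2 another way: the result of a strongly pure function (one whose parameters carry no mutable indirections) may be implicitly converted to `immutable`, because no other reference to the result can exist when the call returns. A sketch of the `rep` example under that rule, assuming a current D2 compiler:

----------
import std.stdio : writeln;

// Strongly pure: only a value parameter, so the returned array cannot be
// referenced from anywhere else once the call completes.
int[] rep(int i) pure nothrow
{
    auto result = new int[](i % 10);
    result[] = i;
    return result;
}

void main()
{
    int[] twos = rep(2);                // mutable
    const(int)[] fives = rep(5);        // read-only view
    immutable(int)[] twelves = rep(12); // accepted because rep is pure
    twos[0] = 0;                        // the mutable result is still ours to change
    writeln(twos);
    writeln(fives);
    writeln(twelves);
}
----------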


Comments?

Stewart.

-- 
My e-mail address is valid but not my primary mailbox.  Please keep replies
on the 'group where everybody may benefit.

Nov 01 2007

Nathan Reed <nathaniel.reed gmail.com> writes:

Stewart Gordon wrote:
 Let's first look at string manipulation functions such as those in 
 std.string.  These merely look at a passed-in string and return 
 something - they don't keep the string for later.  They therefore need 
 only a read-only view of a string - they don't need to know that the 
 string is never going to change.  Declaring these with invariant 
 parameters therefore means that it is often necessary to .idup a string 
 just to pass it to one of these functions.  Moreover, if a piece of code 
 manipulates strings with a mixture of direct modification and calls to 
 std.string functions, it necessitates quite a bit of copying of strings.

As I've remarked in another thread, it makes absolutely no sense to me 
to use invariant for the library string functions, for exactly this 
reason.  They used to be const, which enables them to work on any kind 
of string the user might want to call them on (mutable, const, or 
invariant).

 1. A property .invariant, which just returns the reference as an 
 invariant if it's already invariant, otherwise does the same as .idup.  
 For this to work, the runtime would have to keep a record of whether 
 each piece of allocated memory is invariant, which would interfere with 
 the current ability to cast invariance in or out - but at what cost?

Actually, this wouldn't need to have any runtime consequences, as the 
invariantness-or-not of a thing is part of its static typing, so could 
be determined at compile time.  Of course, something can be invariant 
even if it's not typed as invariant (and undecidably so), but do we 
really need to worry about that?  The .invariant property could simply 
return the array if the array is typed as invariant, and return .idup 
otherwise.
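Nathan's compile-time version can be sketched as a template in present-day D2 (`immutable` for `invariant`). `asImmutable` is a hypothetical name, not a Phobos function. It inspects only the static type, so, as he notes, data that is immutable in fact but typed `const` still gets copied.

----------
import std.stdio : writeln;

/// Hypothetical helper: returns `arr` itself when its element type is
/// already immutable, otherwise an immutable copy.
/// The choice is made at compile time from the static type alone.
immutable(T)[] asImmutable(T)(T[] arr)
{
    static if (is(T == immutable))
        return arr;        // already immutable: no copy
    else
        return arr.idup;   // mutable or const: copy once
}

void main()
{
    string s = "already immutable";
    writeln(asImmutable(s) is s);  // true: same memory, no copy

    char[] m = "mutable".dup;
    string c = asImmutable(m);     // a copy
    m[0] = 'M';
    writeln(c);                    // mutable (the copy is unaffected)
}
----------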

 3. Some concept of const-transparent functions.  One approach would be 
 to enable a type modifier (or the lack thereof) to be used as a template 
 parameter, with something like
 ----------
 T(char)[] doSomethingWith(typemod T)(T(char)[] param) {
    ...
 }

A proposal for doing something very much like this is already planned 
for D 2.0 (it's in the WalterAndrei.pdf from the D conference a couple 
months back).  It's called the 'return' storage class, which would make 
the return value of a function take on the same constness or 
invariantness as a parameter:

const(char)[] doSomethingWith(return const (char)[] param) {
     ...
}

What this does is make the constness of the return value the same as 
the constness of the argument passed to 'param', each time the function 
is called.  You can do something similar with templates already, but you 
can't make template functions virtual.  The function is type-checked 
with the declared types for the parameter and return (in this case, 
const(char)[]).
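For comparison, something very close to this later shipped in D2 as the `inout` qualifier. A hedged sketch in present-day D2 (`skipSpaces` is a hypothetical example function): one non-template body is type-checked once, and each call returns the same qualifier the caller passed in. Unlike a template, such a function can also be a virtual class member.

----------
import std.stdio : writeln;

/// Hypothetical example: skips leading spaces and returns a slice of the
/// same array. `inout` stands for whatever qualifier the argument had.
inout(char)[] skipSpaces(inout(char)[] s)
{
    size_t i = 0;
    while (i < s.length && s[i] == ' ')
        ++i;
    return s[i .. $];   // a slice, never a copy
}

void main()
{
    char[] m = "  abc".dup;
    char[] r1 = skipSpaces(m);       // mutable in, mutable out
    r1[0] = 'A';                     // so writing through the result is allowed
    string r2 = skipSpaces("  xyz"); // immutable in, immutable out
    writeln(m, "|", r2);
}
----------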

Thanks,
Nathan Reed

Nov 02 2007

"Stewart Gordon" <smjg_1998 yahoo.com> writes:

"Nathan Reed" <nathaniel.reed gmail.com> wrote in message 
news:fgg8hs$hon$1 digitalmars.com...
<snip>
 As I've remarked in another thread, it makes absolutely no sense to me to 
 use invariant for the library string functions, for exactly this reason. 
 They used to be const, which enables them to work on any kind of string 
 the user might want to call them on (mutable, const, or invariant).

Exactly what I was thinking.

 1. A property .invariant, which just returns the reference as an 
 invariant if it's already invariant, otherwise does the same as .idup. 
 For this to work, the runtime would have to keep a record of whether each 
 piece of allocated memory is invariant, which would interfere with the 
 current ability to cast invariance in or out - but at what cost?

 Actually, this wouldn't need to have any runtime consequences, as the 
 invariantness-or-not of a thing is part of its static typing, so could be 
 determined at compile time.  Of course, something can be invariant even if 
 it's not typed as invariant (and undecidably so), but do we really need to 
 worry about that?  The .invariant property could simply return the array 
 if the array is typed as invariant, and return .idup otherwise.

I was thinking about the possibility of .invariant being able to detect 
whether a pointer or array reference typed as const refers to data that was 
created as invariant or not.

 3. Some concept of const-transparent functions.  One approach would be to 
 enable a type modifier (or the lack thereof) to be used as a template 
 parameter, with something like
 ----------
 T(char)[] doSomethingWith(typemod T)(T(char)[] param) {
    ...
 }

 A proposal for doing something very much like this is already planned for 
 D 2.0 (it's in the WalterAndrei.pdf from the D conference a couple months 
 back).  It's called the 'return' storage class, which would make the 
 return value of a function take on the same constness or invariantness as 
 a parameter:

 const(char)[] doSomethingWith(return const (char)[] param) {
     ...
 }

 What this does is makes the constness of the return value the same as the 
 constness of the argument passed to 'param', each time the function is 
 called.  You can do something similar with templates already, but you 
 can't make template functions virtual.  The function is type-checked with 
 the declared types for the parameter and return (in this case, 
 const(char)[]).

This looks odd to me - you change the code declaring the _parameter_ type in 
order to effect a variation in the _return_ type?  And how would you use the 
type of parameterised constness within the body of the function?

Stewart.


Nov 02 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 11/3/07, Stewart Gordon <smjg_1998 yahoo.com> wrote:
 T(char)[] doSomethingWith(typemod T)(T(char)[] param) {
    ...
 }

 A proposal for doing something very much like this is already planned for
 D 2.0 (it's in the WalterAndrei.pdf from the D conference a couple months
 back).  It's called the 'return' storage class, which would make the
 return value of a function take on the same constness or invariantness as
 a parameter:

 const(char)[] doSomethingWith(return const (char)[] param) {
     ...
 }

 What this does is makes the constness of the return value the same as the
 constness of the argument passed to 'param', each time the function is
 called.  You can do something similar with templates already, but you
 can't make template functions virtual.  The function is type-checked with
 the declared types for the parameter and return (in this case,
 const(char)[]).

 This looks odd to me - you change the code declaring the _parameter_ type in
 order to effect a variation in the _return_ type?  And how would you use the
 type of parameterised constness within the body of the function?

I agree with Stewart here. (Actually, reading what's been written, I
agree with everybody, except possibly Walter and Andrei).

It makes much more sense to me to allow some kind of template-like
parameter whose value can be const, invariant or mutable.

Also useful would be stuff like
   is(a : const)
   is(a : invariant)
   is(a : mutable)
for compile-time decision-making.
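Present-day D2 can make this kind of decision with `is` expressions such as `is(T == immutable)` and `is(T == const)`, though the spelling differs from the hypothetical `is(a : const)` above. A small sketch, with `elementQualifier` as a made-up helper:

----------
import std.stdio : writeln;

/// Hypothetical helper: names the qualifier on an array's element type.
/// `immutable` is checked first, then `const`; anything else is mutable.
string elementQualifier(T)(T[] arr)
{
    static if (is(T == immutable))
        return "immutable";
    else static if (is(T == const))
        return "const";
    else
        return "mutable";
}

void main()
{
    const(char)[] c = "view";
    writeln(elementQualifier("text"));     // immutable
    writeln(elementQualifier("text".dup)); // mutable
    writeln(elementQualifier(c));          // const
}
----------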

Nov 03 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 11/1/07, Stewart Gordon <smjg_1998 yahoo.com> wrote:
 In DMD 2.006, the definition of string was changed from const(char)[] to
 invariant(char)[] (and similarly wstring and dstring).  This change has no
 doubt broken a fair amount of D 2.x code.

All of my D2 code compiled without change.


 Declaring these with invariant parameters therefore means that it
 is often necessary to .idup a string just to pass it to one of these
 functions.

That's not true. If /all/ strings are invariant, throughout, then
everything works.


 Moreover, if a piece of code manipulates strings with a mixture
 of direct modification and calls to std.string functions, it necessitates
 quite a bit of copying of strings.

No it doesn't, it merely means ensuring that the reference is unique
and then calling assumeUnique().


 void main(string[] a) {
     char[] text = cast(char[]) read(a[1]);

Well that line's wrong for a start. It should be
    string text = cast(string)read(a[1]);

There's your problem right there.


     foreach (ref char c; text) {
         if ((c >= 'A' && c <= 'M') || (c >= 'a' && c <= 'm')) {
             c += 13;
         } else if ((c >= 'N' && c <= 'Z') || (c >= 'n' && c <= 'z')) {
             c -= 13;
         }
     }

I believe that should be

    bool willChange = false;
    foreach (char c; text) if (inPattern(c, "A-Za-z")) { willChange = true; break; }
    if (willChange)
    {
        char[] s = text.dup;
        foreach (ref char c; s) {
           if ((c >= 'A' && c <= 'M') || (c >= 'a' && c <= 'm')) {
               c += 13;
           } else if ((c >= 'N' && c <= 'Z') || (c >= 'n' && c <= 'z')) {
               c -= 13;
           }
        }
        text = assumeUnique(s);
    }


(Before this release, I would have written
    text = cast(string)s;
That still compiles without complaint, but assumeUnique() is better).

The test to see if the string will change is good copy-on-write
behavior. The rest is your code, adapted to how you're supposed to do
things in D2.006. First you dup text, because that string /might/ be
in ROM. Then you make your changes. When you've got what you want, you
use assumeUnique() to turn it back into a string. This does /not/ make
a copy.
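The copy-on-write pattern described above, sketched as a self-contained function in present-day D2. `assumeUnique` now lives in `std.exception`, and `any` from `std.algorithm` with `std.ascii.isAlpha` stands in for the `inPattern` check; `rot13Cow` is a hypothetical name. The check still costs a separate pass, though `any` stops at the first letter it finds.

----------
import std.algorithm.searching : any;
import std.ascii : isAlpha;
import std.exception : assumeUnique;
import std.stdio : writeln;

/// Hypothetical copy-on-write ROT13: returns `text` itself when it contains
/// no ASCII letters, otherwise a newly built string.
string rot13Cow(string text)
{
    // First pass stops at the first letter; no letters means no copy at all.
    if (!any!isAlpha(text))
        return text;

    char[] s = text.dup;        // private mutable copy (the input might be in ROM)
    foreach (ref c; s)
    {
        if ((c >= 'A' && c <= 'M') || (c >= 'a' && c <= 'm'))
            c += 13;
        else if ((c >= 'N' && c <= 'Z') || (c >= 'n' && c <= 'z'))
            c -= 13;
    }
    return assumeUnique(s);     // s is the only reference: converted, not copied
}

void main()
{
    writeln(rot13Cow("Hello, world")); // Uryyb, jbeyq
}
----------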



 In D 2.x, it's
 necessary to change one line, to something like

     text = text.idup.replace("\r\n", "\n").replace("\r", "\n").dup;

I don't think that's right. You just declare text to be string instead
of char[] and those dups become unnecessary.


 There are a few caveats to this example:
 - the .idup is only because std.file.read currently returns a mutable
 void[] - we could actually cast it to an invariant as nothing else is going
 to use it

Not could. Should.


 There are probably plenty
 of more involved examples in which there's more difference than this between
 the 1.x and 2.x code.

If every string function you write obeys the copy-(only)-on-write
protocol, then I don't see that.


 3. Some concept of const-transparent functions.

I believe that's in the planning stage.

Nov 02 2007

"Stewart Gordon" <smjg_1998 yahoo.com> writes:

"Janice Caron" <caron800 googlemail.com> wrote in message 
news:mailman.521.1194042761.16939.digitalmars-d puremagic.com...
 On 11/1/07, Stewart Gordon <smjg_1998 yahoo.com> wrote:
 In DMD 2.006, the definition of string was changed from const(char)[] to
 invariant(char)[] (and similarly wstring and dstring).  This change has no
 doubt broken a fair amount of D 2.x code.

 All of my D2 code compiled without change.

Because your D2 code doesn't manipulate strings?

 Declaring these with invariant parameters therefore means that it
 is often necessary to .idup a string just to pass it to one of these
 functions.

 That's not true. If /all/ strings are invariant, throughout, then
 everything works.

If /all/ strings are invariant, then you're very limited in what 
manipulations you can perform.

 Moreover, if a piece of code manipulates strings with a mixture
 of direct modification and calls to std.string functions, it necessitates
 quite a bit of copying of strings.

 No it doesn't, it merely means ensuring that the reference is unique
 and then calling assumeUnique().

Only in the cases where ensuring that the reference is unique is possible.

 void main(string[] a) {
     char[] text = cast(char[]) read(a[1]);

 Well that line's wrong for a start. It should be
    string text = cast(string)read(a[1]);

 There's your problem right there.

Firstly, that was D1 code.  In D1, string is simply an alias of char[].

Secondly, if it were string, the rest of my code wouldn't work under D2, 
because there the string type denotes immutable data.

     foreach (ref char c; text) {
         if ((c >= 'A' && c <= 'M') || (c >= 'a' && c <= 'm')) {
             c += 13;
         } else if ((c >= 'N' && c <= 'Z') || (c >= 'n' && c <= 'z')) {
             c -= 13;
         }
     }

 I believe that should be

    bool willChange = false;
    foreach (char c; text) if (inPattern(c, "A-Za-z")) { willChange = true; break; }
    if (willChange)
    {
        char[] s = text.dup;
        foreach (ref char c; s) {
           if ((c >= 'A' && c <= 'M') || (c >= 'a' && c <= 'm')) {
               c += 13;
           } else if ((c >= 'N' && c <= 'Z') || (c >= 'n' && c <= 'z')) {
               c -= 13;
           }
        }
        text = assumeUnique(s);
    }

There's the problem.  You've made the code more complicated to make the 
final copy conditional on something actually changing.  In an ideal world, 
it would be unnecessary to make that final copy at all (as far as the way my 
example uses it is concerned).

Moreover, your code loops twice, first to see if there's anything to change 
and then to perform the conversion.  This in itself would take a performance 
hit.

 (Before this release, I would have written
    text = cast(string)s;
 That still compiles without complaint, but assumeUnique() is better).

 The test to see if the string will change is good copy-on-write
 behavior. The rest is your code, adapted to how you're supposed to do
 things in D2.006. First you dup text, because that string /might/ be
 in ROM. Then you make your changes. When you've got what you want, you
 use assumeUnique() to turn it back into a string. This does /not/ make
 a copy.

You miss the point.  My example is of ad-hoc code to perform the conversion 
in place, because it is the most efficient mechanism with the constraints 
under which the application will ever perform it.  Data always loaded into 
RAM immediately before the conversion, and no desire to keep the 'before' 
data once the conversion has happened.

<snip>
 There are probably plenty
 of more involved examples in which there's more difference than this between
 the 1.x and 2.x code.

 If every string function you write obeys the copy-(only)-on-write
 protocol, then I don't see that.

<snip>

Well, I wasn't writing a string function there, so that's beside the point. 
If you're implementing a complicated string-manipulating algorithm, you're 
not necessarily going to separate every little step of the algorithm into a 
separate function.

Stewart.


Nov 02 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 11/3/07, Stewart Gordon <smjg_1998 yahoo.com> wrote:
 All of my D2 code compiled without change.

 Because your D2 code doesn't manipulate strings?

No, because I declared all my strings as string or wstring, not char[]
or wchar[]. And because I use, and assume, D's copy-on-write protocol.


 If /all/ strings are invariant, then you're very limited in what
 manipulations you can perform.

That's not true. You can do as much manipulation as you want at
creation-time. Only once the manipulation is "finished" do you cast
the result to string. (And that behavior is identical between const()
and invariant(), by the way, so the change to invariant() makes zero
difference to the source code at this point, except that you now have
the option of using the assumeUnique() function).


 Only in the cases where ensuring that the reference is unique is possible.

It's always possible. Just write an opening brace, do all your string
creation, assign the string to a string variable declared outside the
scope, then write a closing brace. Voilà - invariance guaranteed,
because all the other references used in creation just went out of
scope.

Note that it is perfectly permissible to have multiple references to
an invariant string anyway - providing that all of those references
are themselves declared invariant. It's only non-invariant references
which are prohibited, which is why they're the ones you have to lose
at the scope boundary.
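A sketch of the brace-scope technique in present-day D2, assuming `std.exception.assumeUnique` (whose lvalue overload also sets the mutable reference to null). `makeLabel` is a hypothetical example: all mutation happens on `buf` inside the inner block, and only the immutable `result` leaves it.

----------
import std.exception : assumeUnique;
import std.stdio : writeln;

/// Hypothetical example: builds "Abcd..." of length n.
string makeLabel(size_t n)
{
    string result;                  // outlives the construction scope
    {
        char[] buf = new char[](n); // mutable while "under construction"
        foreach (i, ref c; buf)
            c = cast(char)('a' + i % 26);
        if (n > 0)
            buf[0] = 'A';           // edit in place freely
        result = assumeUnique(buf); // lvalue overload also sets buf to null
    }                               // no mutable reference survives this brace
    return result;
}

void main()
{
    writeln(makeLabel(5)); // Abcde
}
----------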


 There's the problem.  You've made the code more complicated to make the
 final copy conditional on something actually changing.

Of course. That /is/ the copy-on-write protocol. If nothing changes,
return the original.


 Moreover, your code loops twice, first to see if there's anything to change
 and then to perform the conversion.  This in itself would take a performance
 hit.

I could have used the new munch() function instead of the first loop,
but I didn't think of it at the time I wrote the example.


 You miss the point.  My example is of ad-hoc code to perform the conversion
 in place, because it is the most efficient mechanism with the constraints
 under which the application will ever perform it.  Data always loaded into
 RAM immediately before the conversion, and no desire to keep the 'before'
 data once the conversion has happened.

Well then there's no problem anyway. For the whole time that your
string is "under construction", then it's not a string, it's a
(mutable) array of chars. Just keep it as such, until you've finished
building the string. Then you can do everything in place.

But note that if you want to do in-place manipulation of chars, then
std.string.replace() is NOT the function to use, because that
(possibly) makes a copy. Instead, you would have a replacing loop, or
write your own in-place-replace function which operates on char arrays
(or templatized for arrays in general).
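An in-place replacing function of the kind suggested here might look like the following sketch. `replaceShrinking` is hypothetical, not a Phobos function, and it is restricted to replacements no longer than what they replace, so the result always fits in the original buffer. It is written for `char[]` only to keep it short; a version templated over the element type could follow the same logic.

----------
import std.stdio : writeln;

/// Hypothetical in-place array function: replaces every occurrence of
/// `from` in `buf` with `to`, reusing buf's own memory.
/// Requires to.length <= from.length, so the result always fits.
char[] replaceShrinking(char[] buf, const(char)[] from, const(char)[] to)
{
    assert(from.length > 0 && to.length <= from.length);
    size_t r = 0, w = 0;                 // read and write positions, w <= r
    while (r < buf.length)
    {
        if (r + from.length <= buf.length && buf[r .. r + from.length] == from)
        {
            buf[w .. w + to.length] = to[]; // fits: w + to.length <= r + from.length
            w += to.length;
            r += from.length;
        }
        else
            buf[w++] = buf[r++];
    }
    return buf[0 .. w];                  // shorter slice of the same memory
}

void main()
{
    char[] text = "one\r\ntwo\rthree\n".dup;
    text = replaceShrinking(text, "\r\n", "\n");
    text = replaceShrinking(text, "\r", "\n");
    writeln(text == "one\ntwo\nthree\n"); // true
}
----------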


 Well, I wasn't writing a string function there.

Then you shouldn't be calling std.string functions. What you need are
array functions.

Nov 03 2007

"Stewart Gordon" <smjg_1998 yahoo.com> writes:

"Janice Caron" <caron800 googlemail.com> wrote in message 
news:mailman.522.1194074475.16939.digitalmars-d puremagic.com...
 On 11/3/07, Stewart Gordon <smjg_1998 yahoo.com> wrote:

<snip>
 If /all/ strings are invariant, then you're very limited in what
 manipulations you can perform.

 That's not true. You can do as much manipulation as you want at
 creation-time. Only once the manipulation is "finished" do you cast
 the result to string. (And that behavior is identical between const()
 and invariant(), by the way, so the change to invariant() makes zero
 difference to the source code at this point, except that you now have
 the option of using the assumeUnique() function).

So effectively, you're using the word "string" to refer specifically to the 
invariant kind, making "if all strings are invariant" a null condition.

 Only in the cases where ensuring that the reference is unique is possible.

 It's always possible. Just write an opening brace, do all your string
 creation, assign the string to a string variable declared outside the
 scope, then write a closing brace. Voilà - invariance guaranteed,
 because all the other references used in creation just went out of
 scope.

Maybe you're right ... but I'll have to see.

<snip>
 You miss the point.  My example is of ad-hoc code to perform the conversion
 in place, because it is the most efficient mechanism with the constraints
 under which the application will ever perform it.  Data always loaded into
 RAM immediately before the conversion, and no desire to keep the 'before'
 data once the conversion has happened.

 Well then there's no problem anyway. For the whole time that your
 string is "under construction", then it's not a string, it's a
 (mutable) array of chars. Just keep it as such, until you've finished
 building the string. Then you can do everything in place.

So in other words, my code was more or less right in the first place.

 But note that if you want to do in-place manipulation of chars, then
 std.string.replace() is NOT the function to use, because that
 (possibly) makes a copy. Instead, you would have a replacing loop, or
 write your own in-place-replace function which operates on char arrays
 (or templatized for arrays in general).

Having the std.string functions is useful even if they create copies.  A 
little bit of copying where it makes coding easier is OK for apps that 
aren't performance-critical, but it's still nice not to be made to do even 
more copying or down-and-dirty casting away invariant.

 Well, I wasn't writing a string function there.

 Then you shouldn't be calling std.string functions. What you need are
 array functions.

I'm not sure what you mean....

Stewart.


Nov 03 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 11/3/07, Stewart Gordon <smjg_1998 yahoo.com> wrote:
 So effectively, you're using the word "string" to refer specifically to the
 invariant kind, making "if all strings are invariant" a null condition.

Well, it is now. :-)

I guess I should have clarified that all my strings were immutable
even back when string was const(char)[].


 So in other words, my code was more or less right in the first place.

Basically, yes. /Except/ in your expectations of std.string. The
functions in std.string are for fully constructed strings, not for
strings-under-construction. I'll get back to that in a minute.

 Then you shouldn't be calling std.string functions. What you need are
 array functions.

 I'm not sure what you mean....

I suppose I'm saying we need some extra functions. Maybe even a
library std.array. (It's not a big deal as it's relatively easy to
write these things yourself). But for functionality like in-place
replace, we should be using some specialized function like (if it
existed) std.array.replace - but definitely not std.string.replace.
Have you ever programmed in PHP? I'd like to see (almost) all the PHP
array functions available as standard in D. That way, it would be so
much easier to build char arrays in the manner that you suggest, and
then turn them into strings when they're fully built.

See http://uk2.php.net/manual/en/ref.array.php

Nov 03 2007

Walter Bright <newshound1 digitalmars.com> writes:

Stewart Gordon wrote:
 Note that the text is never copied after it is read in, except by 
 std.string.replace if it actually makes any change.  In D 2.x, it's 
 necessary to change one line, to something like
 
    text = text.idup.replace("\r\n", "\n").replace("\r", "\n").dup;
 
 which consequently adds two copy operations.

Well, no. Take a look at the source to std.string.replace(). It does not 
modify the input in place - it returns the input if there are no 
changes, if there are changes, it returns a *copy*. Second, text should 
be declared as a string, so you do not need either of the dup's. Two 
copies are made, just as with the 1.0 version, in that line of code.

You will need a third copy to do the loop which modifies the string in 
place. I feel that, with strings, the advantages of invariant strings 
outweigh the disadvantages.

Note that one can still do the modify-in-place D 1.0 code, and do it 
very fast, by putting the tests for \r inside the loop rather than as 
separate loops. The D 1.0 version isn't what you'd write if you wanted 
speed, anyway.
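Walter's single-loop suggestion, sketched in present-day D2: line-break normalisation and ROT13 in one pass over the buffer, writing the result back into the same memory. Because CR LF shrinks to LF, the write index never passes the read index. `normaliseAndRot13` is a hypothetical name; as in Stewart's original, the sketch relies on the buffer returned by `std.file.read` being referenced by nothing else.

----------
import std.file : read;
import std.stdio : write, stderr;

/// Normalises CR LF and lone CR to LF and ROT13-encodes ASCII letters in a
/// single pass, writing back into `buf`. Returns the (possibly shorter) result.
char[] normaliseAndRot13(char[] buf)
{
    size_t w = 0;
    for (size_t r = 0; r < buf.length; ++r)
    {
        char c = buf[r];
        if (c == '\r')
        {
            c = '\n';
            if (r + 1 < buf.length && buf[r + 1] == '\n')
                ++r;                  // swallow the LF of a CR LF pair
        }
        else if ((c >= 'A' && c <= 'M') || (c >= 'a' && c <= 'm'))
            c += 13;
        else if ((c >= 'N' && c <= 'Z') || (c >= 'n' && c <= 'z'))
            c -= 13;
        buf[w++] = c;                 // w <= r, so writes never overtake reads
    }
    return buf[0 .. w];
}

void main(string[] args)
{
    if (args.length < 2)
    {
        stderr.writeln("usage: rot13 FILE");
        return;
    }
    // read() returns a fresh void[] that nothing else references,
    // so treating it as mutable char[] is safe here.
    char[] text = cast(char[]) read(args[1]);
    write(normaliseAndRot13(text));
}
----------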

Nov 02 2007

"Stewart Gordon" <smjg_1998 yahoo.com> writes:

"Walter Bright" <newshound1 digitalmars.com> wrote in message 
news:fgguis$1hsk$1 digitalmars.com...
 Stewart Gordon wrote:
 Note that the text is never copied after it is read in, except by 
 std.string.replace if it actually makes any change.  In D 2.x, it's 
 necessary to change one line, to something like

    text = text.idup.replace("\r\n", "\n").replace("\r", "\n").dup;

 which consequently adds two copy operations.

 Well, no. Take a look at the source to std.string.replace(). It does not 
 modify the input in place - it returns the input if there are no changes, 
 if there are changes, it returns a *copy*.

That's basically what I said.

 Second, text should be declared as a string, so you do not need either of 
 the dup's.  Two copies are made, just as with the 1.0 version, in that 
 line of code.

 You will need a third copy to do the loop which modifies the string in 
 place.

In the first of these two paragraphs, you suggest omitting the final .dup, 
and then in the next, you effectively tell me to put it back in.  Therein 
lies my point - the "third copy" ought not to be necessary for my ad hoc 
code.  (I know I could cast away the invariant, but that's a rather down and 
dirty trick.)

 I feel that, with strings, the advantages of invariant strings outweigh 
 the disadvantages.

When it comes to string manipulation functions, giving the programmer the 
choice, with my proposal of const-transparency, would AISI bring even more 
advantages and alleviate some of the disadvantages.

 Note that one can still do the modify-in-place D 1.0 code, and do it very 
 fast, by putting the tests for \r inside the loop rather than as separate 
 loops. The D 1.0 version isn't what you'd write if you wanted speed, 
 anyway.

Yes, that's another way to do it....

Stewart.


Nov 03 2007
