digitalmars.D.bugs - [Issue 2093] New: string concatenation modifies original
- d-bugmail puremagic.com (34/34) May 10 2008 http://d.puremagic.com/issues/show_bug.cgi?id=2093
- d-bugmail puremagic.com (6/6) May 10 2008 http://d.puremagic.com/issues/show_bug.cgi?id=2093
- Jarrett Billingsley (5/9) May 10 2008 ~ always creates a copy, but ~= will attempt to expand the array in-plac...
- d-bugmail puremagic.com (17/17) Nov 21 2008 http://d.puremagic.com/issues/show_bug.cgi?id=2093
- d-bugmail puremagic.com (30/30) Nov 21 2008 http://d.puremagic.com/issues/show_bug.cgi?id=2093
- d-bugmail puremagic.com (32/32) Nov 21 2008 http://d.puremagic.com/issues/show_bug.cgi?id=2093
- d-bugmail puremagic.com (6/6) Nov 22 2008 http://d.puremagic.com/issues/show_bug.cgi?id=2093
- Denis Koroskin (3/8) Nov 22 2008 No, string is a mutable array of immutable chars:
- d-bugmail puremagic.com (5/5) Nov 22 2008 http://d.puremagic.com/issues/show_bug.cgi?id=2093
- d-bugmail puremagic.com (14/20) Nov 22 2008 http://d.puremagic.com/issues/show_bug.cgi?id=2093
- d-bugmail puremagic.com (33/33) Nov 22 2008 http://d.puremagic.com/issues/show_bug.cgi?id=2093
- d-bugmail puremagic.com (9/9) Nov 22 2008 http://d.puremagic.com/issues/show_bug.cgi?id=2093
- d-bugmail puremagic.com (12/12) Nov 22 2008 http://d.puremagic.com/issues/show_bug.cgi?id=2093
- d-bugmail puremagic.com (13/13) Nov 22 2008 http://d.puremagic.com/issues/show_bug.cgi?id=2093
- d-bugmail puremagic.com (13/13) Nov 22 2008 http://d.puremagic.com/issues/show_bug.cgi?id=2093
- d-bugmail puremagic.com (10/20) Nov 22 2008 http://d.puremagic.com/issues/show_bug.cgi?id=2093
- d-bugmail puremagic.com (4/4) Feb 19 2009 http://d.puremagic.com/issues/show_bug.cgi?id=2093
- d-bugmail puremagic.com (12/12) Mar 16 2010 http://d.puremagic.com/issues/show_bug.cgi?id=2093
http://d.puremagic.com/issues/show_bug.cgi?id=2093
           Summary: string concatenation modifies original
           Product: D
           Version: 2.014
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: bartosz relisoft.com
I will attach source code for this example. It's an XML parser. It should
produce the following output:
c:\D\Work>xml
root
    child
     color=red
         Text=foo bar baz
Instead it produces this:
c:\D\Work>xml
root
    rootd
     rootd=red
         Text=rootdar baz
The problem is that strings are modified after being copied, when the original
is concatenated upon. The problem goes away if I idup strings:
  _name = name.idup;
  _value = value.idup;
or when I replace 
  a ~= b;
with
  a = a ~ b;
-- 
 May 10 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093 Created an attachment (id=256) --> (http://d.puremagic.com/issues/attachment.cgi?id=256&action=view) Test case --
 May 10 2008
<d-bugmail puremagic.com> wrote in message news:bug-2093-3 http.d.puremagic.com/issues/...or when I replace a ~= b; with a = a ~ b;~ always creates a copy, but ~= will attempt to expand the array in-place. Now, if this is D2, and ~= is expanding an invariant(char)[] in-place, then _that_ is definitely an issue.
 May 10 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093
smjg iname.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |smjg iname.com
Welcome to the world of bug reporting.
The way to report a bug isn't to attach a 695-line program that contains some
functionality somewhere that exhibits the problem.
The correct manner is to post a small example that illustrates the problem,
typically either by writing a test program from scratch or by simplifying
little by little the program in which you found it.
If done well, the result will be small enough to post straight into the bug
report rather than attaching it.  DMD's code coverage analysis is a useful tool
for identifying unused parts of a program in order to cut them out, among other
things.
-- 
 Nov 21 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093
smjg iname.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code
I think I've finally managed to figure out what was going on.
----------
import std.stdio;
void main() {
    string s1, s2;
    s1 ~= "hello";
    s2 = s1;
    writefln(s1);
    writefln(s2);
    s1.length = 0;
    s1 ~= "Hi";
    writefln(s1);
    writefln(s2);
}
----------
hello
hello
Hi
Hillo
----------
This is the kind of testcase we like here.  Walter is more likely to fix a bug
if you make life easier for him by supplying something on which the cause can
easily be seen.
-- 
 Nov 21 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093
This is a known bug and is a major array design flow. Arrays has no determined
owner (the only one who can grow without a reallocation if capacity permits):
import std.stdio;
void main()
{
    char[] s1, s2;
    s1.length = 100; // reserve the capacity
    s1.length = 0;
    s2 = s1; // both are pointing to an empty string with the capacity of 100
    s1 ~= "Hello"; // array is not reallocated, it is grown in-place
    writefln(s1);
    writefln(s2); // prints empty string. s2 still points to the same string
(which is now "Hello") and carries length of 0
    s2 ~= "Hi"; // overwrites s1
    writefln(s2); // "Hi"
    writefln(s1); // "Hillo"
}
s1 is the array owner and s2 is a slice (even though it really points to the
entire array), i.e. it should reallocate and take the ownership of the
reallocated array on append, but it doesn't happen.
Currently an 'owner' is anyone who has a pointer to array's beginning:
char[] s = "hello".dup;
char[] s1 = s[0..4];
s1 ~= "!";
assert(s != s1); // fails, both are "hell!", s is overwritten
s = "_hello".dup;
char[] s2 = s[1..5];
s2 ~= "!";
assert(s != s1); // succeeds, s1 is not changed
-- 
 Nov 21 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093 I thought 'string' types were immutable and thus ... s1.length = 0; should fail as it updates the string (trucates it to zero characters). --
 Nov 22 2008
22.11.08 в 12:58 в своём письме писал(а):http://d.puremagic.com/issues/show_bug.cgi?id=2093 I thought 'string' types were immutable and thus ... s1.length = 0; should fail as it updates the string (trucates it to zero characters).No, string is a mutable array of immutable chars: string == const(char)[]
 Nov 22 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093 No, string is aliased to invariant(char)[], i.e. an array of invariant characters. You can change its length (usually, decreasing) but not contents. --
 Nov 22 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093Currently an 'owner' is anyone who has a pointer to array's beginning: char[] s = "hello".dup; char[] s1 = s[0..4]; s1 ~= "!"; assert(s != s1); // fails, both are "hell!", s is overwrittenA simple char[] is fully mutable, so that doesn't violate any established rule, but whether it's desirable is another matter. With const(char)[] or invariant(char)[], obviously this isn't going to work, so ~= should always reallocate (unless the optimiser can be sure that no other reference to the data can possibly exist). Alternatively, the GC could maintain a note of the actual length of every heap-allocated array. Ownership would be determined by matching in both start pointer and length. When the length is increased, whether by .length or ~=, either update this actual length (if it's the owner that we're extending, IWC all other references to the same data lose ownership) or reallocate the array. --
 Nov 22 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093
schveiguy yahoo.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX
Note that this behavior is defined in the spec.  See
http://www.digitalmars.com/d/2.0/arrays.html#resize
"To maximize efficiency, the runtime always tries to resize the array in place
to avoid extra copying. It will always do a copy if the new size is larger and
the array was not allocated via the new operator or a previous resize
operation.
This means that if there is an array slice immediately following the array
being resized, the resized array could overlap the slice"
The fact that it violates invariantness is a side effect that Walter has not
yet dealt with.  There have been proposals to fix this, two of which I have
proposed:
1. As you said, store the requested length along with the block length in the
GC.  Only appending to an array that ends at the end of the allocated memory
will realloc in place.  Original proposal:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=63146
Note nobody responded to this one
2. Store the length of the allocated array in the first element of the array. 
Then modify the meaning of the length member of the array struct to flag
whether it is pointing to the beginning of the array or not.  Original
proposal:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=77437
Some people had questions, but nobody proved the proposal wouldn't work.
I don't think Walter is interested in fixing this issue, as it has been a
'feature' for a while, and he never has responded positively to any decent
proposals to fix this.
-- 
 Nov 22 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093
smjg iname.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|WONTFIX                     |
Steven, you have no authority to mark this WONTFIX.
-- 
 Nov 22 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093
schveiguy yahoo.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |schveiguy yahoo.com
           Severity|normal                      |enhancement
           Keywords|wrong-code                  |
Sorry, I was thinking wontfix because the compiler functions as designed.  I
marked it as an enhancment instead.  Removing wrong-code keyword also, as this
is intended behavior.
-- 
 Nov 22 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093
smjg iname.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|enhancement                 |normal
           Keywords|                            |accepts-invalid, spec
So the problem is that it _always_ leaves the decision to resize in place or
reallocate to the runtime.  The only way in which this can coexist with the
principle of invariant is that it is illegal to increase the length of a
const/invariant array.  Therefore, going by the current spec, the bug is that
DMD accepts the code.
-- 
 Nov 22 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093 It seems to me then that this is a design choice - does the string length belong to the string or to the reference? For slices it must be the reference but for arrays? hmmm... Curently in D, a dynamic array and a slice are indistinguishable and I'm not so sure that should be the case. There are good arguments for the current design and also for the separation of slices and dynamic arrays. Common sense seems to say that if I change the length of a string that therefore every other reference to the same string should also honour the new length, and that this should also have no effect on previously captured slices of the string. --
 Nov 22 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093It seems to me then that this is a design choice - does the string length belong to the string or to the reference? For slices it must be the reference but for arrays? hmmm... Curently in D, a dynamic array and a slice are indistinguishable and I'm not so sure that should be the case. There are good arguments for the current design and also for the separation of slices and dynamic arrays. Common sense seems to say that if I change the length of a string that therefore every other reference to the same string should also honour the new length, and that this should also have no effect on previously captured slices of the string.Arrays should not be typed differently than slices IMO, they should be able to be passed to the same functions. I think one of the two solutions I proposed would place the 'allocated length' of an array on the heap with the array data, thereby having the length stored in a shared location. Slices should respect this length, and if they cannot see the length, they should be reallocated as a full-blown array. --
 Nov 22 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2093 see also bug 2095 comment 6 --
 Feb 19 2009
http://d.puremagic.com/issues/show_bug.cgi?id=2093
Steven Schveighoffer <schveiguy yahoo.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED
19:55:38 PDT ---
This is fixed by the patch in bug 3637.  It is in dmd 2.041.
Compiling the attached file results in the desired output.
-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
 Mar 16 2010








 
  
  
 
 d-bugmail puremagic.com
 d-bugmail puremagic.com 