digitalmars.D.bugs - String Slice/Concatenate bug

Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
DMD 0.129, Linux (Fedora Core 3)

Notice that in the output, the 2nd line starts with a 4 instead of a 5.
As it says in the comment, it will work if you .dup the slice.

 [russ russ dmd_bugs]$ cat slice.d
 import std.stdio;
 
 void main() {
   char[] foo = "1234567890".dup;
   char[] bar;
   bar = foo[0..1];  // works if you append .dup here
   foo = foo[1..$];
   writefln(foo," ",bar);
   bar ~= ","~foo[0..3];
   foo  = foo[3..$];
   writefln(foo," ",bar);
 }
 
 [russ russ dmd_bugs]$ dmd slice.d
 gcc slice.o -o slice -lphobos -lpthread -lm
 [russ russ dmd_bugs]$ ./slice
 234567890 1
 467890 1,234
 [russ russ dmd_bugs]$
Aug 18 2005
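A minimal sketch of the .dup workaround mentioned in the post above, written for a current D compiler (so `writeln` stands in for the old `writefln(foo," ",bar)` form, which no longer compiles as written). The `Result` struct and `runWithDup` function are hypothetical names used only for illustration. Because `bar` gets its own copy, the later `~=` has no way to write into the buffer that `foo` still refers to.

```d
import std.stdio : writeln;

// Hypothetical holder for the two strings the original program prints.
struct Result
{
    string foo;
    string bar;
}

Result runWithDup()
{
    char[] foo = "1234567890".dup;
    char[] bar = foo[0 .. 1].dup;   // bar owns a private one-character copy
    foo = foo[1 .. $];
    bar ~= "," ~ foo[0 .. 3];       // grows bar's own buffer, not foo's
    foo = foo[3 .. $];
    return Result(foo.idup, bar.idup);
}

void main()
{
    auto r = runWithDup();
    writeln(r.foo, " ", r.bar);     // 567890 1,234
}
```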
"Ben Hinkle" <ben.hinkle gmail.com> writes:
I don't think that's a bug. ~= is behaving as expected (though I don't know if it's actually documented when ~= dups and when it doesn't). Are you suggesting ~= always dup? I'm not sure what you are expecting.
Aug 18 2005
Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
I don't remember whether this is documented, either. My assumption was that ~= would do something analogous to realloc(): if the memory immediately following the buffer is unallocated, just extend the buffer; otherwise, copy it to a new location where there is space.
Aug 18 2005
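The realloc()-like behaviour described above can be observed from user code in a current D runtime. The sketch below assumes the `reserve` and `capacity` helpers from druntime's `object` module, which postdate this thread and did not exist in DMD 0.129; `movedWhileAppending` is a hypothetical helper. Once room has been reserved, appends that fit within the reported capacity extend the array in place instead of copying it elsewhere.

```d
import std.stdio : writeln;

// Appends `count` values and reports whether the data pointer moved.
bool movedWhileAppending(ref int[] arr, size_t count)
{
    auto start = arr.ptr;
    foreach (i; 0 .. count)
        arr ~= cast(int) i;
    return arr.ptr !is start;
}

void main()
{
    int[] a = [1, 2, 3, 4];
    a.reserve(8);                        // may relocate a once, up front
    assert(a.capacity >= 8);
    writeln(movedWhileAppending(a, 4));  // false: all four appends fit in place
}
```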
"Ben Hinkle" <ben.hinkle gmail.com> writes:
I think the problem is that ~= can't tell that foo is using the memory following bar. All it knows is that bar is a pointer to the start of an allocation block that can hold the requested addition. Thinking about it some more, it seems users of ~= must know whether the memory following the array is "in use". If it is (or could be), the user must dup explicitly. That means the +1 that Walter had to add to all memory allocations can go away, because a slice off the end of an array shouldn't be extended using ~=. So your bug actually has a silver lining, since I've never liked that +1. Put the burden on the user to know if they can extend safely: just as with COW, the rule is "don't ~= in memory you don't own".
Aug 18 2005
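To make the aliasing concrete: a D slice is only a pointer and a length, with no record of who else refers to the same block. The sketch below uses a hypothetical helper, `directlyFollows`, to show that after the two slicing steps from the original program, the memory immediately after `bar` is exactly where `foo` begins.

```d
import std.stdio : writeln;

// True when the byte right after `head` is the first byte of `tail`.
bool directlyFollows(const(char)[] head, const(char)[] tail)
{
    return (head.ptr + head.length) is tail.ptr;
}

void main()
{
    char[] foo = "1234567890".dup;
    char[] bar = foo[0 .. 1];            // no copy: bar points at foo's first byte
    assert(bar.ptr is foo.ptr);

    foo = foo[1 .. $];                   // foo now starts one byte later
    writeln(directlyFollows(bar, foo));  // true
}
```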
Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
I hear you, but it seems to me that if you are in the middle of an allocation, and you're only using part of it, then you should *assume* that the rest of the string is being used by somebody else. It's not always true, but it often will be. Just my opinion, though. I'd love to hear what the official word is.
Aug 18 2005
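For comparison, current D runtimes behave much like the rule proposed in the post above: a slice that does not end where the block's used data ends reports a `capacity` of 0, and appending to it reallocates instead of overwriting its neighbours. This is a sketch of present-day behaviour, not of DMD 0.129, and `prefixAppendLeavesRestIntact` is a hypothetical name.

```d
import std.stdio : writeln;

// Appends to a prefix slice and reports whether the original data survived.
bool prefixAppendLeavesRestIntact()
{
    char[] whole = "1234567890".dup;
    char[] head = whole[0 .. 1];

    // head stops in the middle of the block, so it cannot grow in place.
    assert(head.capacity == 0);

    head ~= ",234";                      // reallocates head; whole is not touched
    return whole == "1234567890" && head == "1,234";
}

void main()
{
    writeln(prefixAppendLeavesRestIntact()); // true
}
```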
"Ben Hinkle" <ben.hinkle gmail.com> writes:
I don't follow who "you" is. Are you saying the compiler should do something different for your example than what it does now? Or by "you" do you mean the programmer? Just to be clear, what I'm saying is:
1) ~= should keep doing what it does today, but the duping behavior should be documented;
2) functions that use ~= on their inputs should document it so callers can take action to avoid extending into live memory;
3) the +1 that is added to gc allocations should be removed, since it is the user's responsibility to manage safe extensions.
The alternative of having ~= dup every time would mean that building arrays in a loop using ~= wastes lots of memory (and time spent duping). The doc http://www.digitalmars.com/d/arrays.html#resize does say that you should avoid setting length or cat'ing with slices.
Aug 19 2005
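The cost argument about building arrays in a loop can be sketched by counting how often the data pointer changes. This assumes a current D runtime; both function names are hypothetical. `a = a ~ x` stands in for an "always dup" policy, since the `~` operator always builds a fresh array, while `~=` is free to extend in place when the block has room.

```d
import std.stdio : writeln;

// Counts pointer changes while building n elements with ~=.
size_t relocationsWithAppend(size_t n)
{
    int[] a;
    size_t moves;
    foreach (i; 0 .. n)
    {
        auto before = a.ptr;
        a ~= cast(int) i;            // may extend the existing block
        if (a.ptr !is before)
            ++moves;
    }
    return moves;
}

// Counts pointer changes while building n elements with a = a ~ x.
size_t relocationsWithConcat(size_t n)
{
    int[] a;
    size_t moves;
    foreach (i; 0 .. n)
    {
        auto before = a.ptr;
        a = a ~ cast(int) i;         // always copies into a new array
        if (a.ptr !is before)
            ++moves;
    }
    return moves;
}

void main()
{
    // The exact append count depends on the allocator, so only compare.
    writeln(relocationsWithAppend(1000) < relocationsWithConcat(1000)); // true
}
```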
Derek Parnell <derek psych.ward> writes:
On Thu, 18 Aug 2005 16:56:09 -0700, Russ Lewis wrote:

 DMD 0.129, Linux (Fedora Core 3)
 
 Notice that in the output, the 2nd line starts with a 4 instead of a 5. 
   As it says in the comment, it will work if you .dup the slice.
 
This is a surprise. I was under the impression that *all* concatenations caused an automatic dup operation. It will also work if you have ...

   bar = bar~","~foo[0..3];

Damn ... now I have to go back and check my existing code to make sure I didn't use this construct anywhere.

--
Derek (skype: derek.j.parnell)
Melbourne, Australia
19/08/2005 1:48:42 PM
Aug 18 2005
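A sketch of the alternative from the post above, applied to the original program and written for a current D compiler. The `Result` struct and `runWithConcat` function are hypothetical names, and the separator is written as the character literal `','` instead of `","` so that every operand is plainly `char` or `char[]`. Unlike `~=`, the binary `~` operator always produces a new array, so `bar` stops sharing memory with `foo` at that point even without an explicit .dup.

```d
import std.stdio : writeln;

// Hypothetical holder for the two strings the original program prints.
struct Result
{
    string foo;
    string bar;
}

Result runWithConcat()
{
    char[] foo = "1234567890".dup;
    char[] bar = foo[0 .. 1];            // plain slice, no .dup
    foo = foo[1 .. $];
    bar = bar ~ ',' ~ foo[0 .. 3];       // ~ allocates a new array for bar
    foo = foo[3 .. $];
    return Result(foo.idup, bar.idup);
}

void main()
{
    auto r = runWithConcat();
    writeln(r.foo, " ", r.bar);          // 567890 1,234
}
```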