D - Arrays as references, or by value, or copy on write?! (BUG)

davepermen <davepermen_member pathlink.com> writes:
import core.stdc.stdio;

class A {
    this() { myArray.length = 10; myArray[0] = 100; }
    // hands out a slice (pointer + length) that aliases A's data
    ubyte[] get() { return myArray; }
    void print() {
        printf("class A:\n");
        printf("myArray.length = %zu\n", myArray.length);
        printf("myArray[0] = %d\n", myArray[0]);
    }
    private ubyte[] myArray;
}
class B {
    void set(ubyte[] x) { myArray = x; }
    void print() {
        printf("class B:\n");
        printf("myArray.length = %zu\n", myArray.length);
        printf("myArray[0] = %d\n", myArray[0]);
    }
    // remove the /+ +/ to resize before writing
    void reset() { /+myArray.length = 20;+/ myArray[0] = 200; }
    private ubyte[] myArray;
}

void test() {
    A a = new A;
    B b = new B;
    b.set(a.get());
    a.print();
    b.print();
    b.reset();
    a.print();
    b.print();
}

void main() { test(); }

if i don't resize B.myArray, and write to it, it manipulates A.myArray, too =>
it gets returned by A.get() as a reference, passed into B.set() as a reference, and
B.myArray gets set as a reference to A.myArray. i can write to it, and
manipulate both.

if i change the length of B.myArray (by removing the /+ +/ comments), B.myArray
gets resized, and the reference to A.myArray is lost. from this moment on,
B.myArray is a copy of A.myArray. they are not in sync anymore.
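
The two cases described above can be reproduced without the classes. This is only a sketch: the function names are made up for illustration, and the resize uses 1000 rather than 20 so that the reallocation does not depend on how much spare room the allocator happened to leave.

```d
// Sketch only: plain functions standing in for the A/B classes above.

/// Writing through a second slice of the same data is visible via the first.
bool writeIsShared()
{
    ubyte[] a = new ubyte[10];
    a[0] = 100;
    ubyte[] b = a;      // copies pointer + length, not the elements
    b[0] = 200;
    return a[0] == 200; // both slices still view the same memory
}

/// Growing one slice beyond what its block can hold moves it elsewhere.
bool resizeBreaksSharing()
{
    ubyte[] a = new ubyte[10];
    ubyte[] b = a;
    b.length = 1000;    // too large to grow in place: b gets a new block
    b[0] = 200;         // lands in b's new block only
    return a[0] == 0 && a.length == 10 && b.ptr !is a.ptr;
}
```

Both functions should return true. A small resize may be satisfied in place, in which case the two slices keep sharing their first elements, so the split is not guaranteed on every resize.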


this behaviour is highly confusing. looks like i have to do it like this in the
end to be sure it is a reference:

class ArrayRef { ubyte[] data; }

and have an ArrayRef in A and B, and pass that around..
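
A rough sketch of how that wrapper would behave, assuming both owners hold the same ArrayRef object (the function name and the variables boxA and boxB are hypothetical):

```d
// The wrapper from the post: a class, so variables of this type are references.
class ArrayRef
{
    ubyte[] data;
}

/// A resize made through one reference is seen through the other,
/// because there is only one slice, stored inside the shared object.
bool resizeStaysShared()
{
    auto boxA = new ArrayRef;
    boxA.data.length = 10;
    ArrayRef boxB = boxA;      // aliases the same object, no copy
    boxB.data.length = 20;     // reseats the single slice both sides use
    boxB.data[0] = 200;
    return boxA.data.length == 20 && boxA.data[0] == 200;
}
```

The cost is one extra indirection on every access, which is the trade-off the rest of the thread argues about.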


as far as i know, it should always be copy on write. but it isn't. it's copy on
resize. this is highly unintuitive, a.k.a. buggy.

and actually.. how to return by reference?
Dec 02 2003
J Anderson <REMOVEanderson badmama.com.au> writes:
davepermen wrote:
> this behaviour is highly confusing. looks like i have to do it like this in the
> end to be sure it is a reference:
>
> class ArrayRef { ubyte[] data; }

Or you could use the ancient art of pointers.
Dec 02 2003
davepermen <davepermen_member pathlink.com> writes:
> Or you could use the ancient art of pointers.

never. i want a shared dynamic array, with full array functionality in both shares, without any pointer mess.

why can't arrays be by reference by default? they have .dup to copy. either make them behave as copies, or not. but this is a buggy copy-on-write that doesn't work out right.
Dec 02 2003
Ilya Minkov <minkov cs.tum.edu> writes:
davepermen wrote:
>> Or you could use the ancient art of pointers.
> never. i want a shared dynamic array. with full array functionality in both
> shares. without any pointer mess

How about the inout specifier?

> why can't array be by reference by default. they have the .dup to copy..

Let's get down to it. An array is a struct of a length and a pointer. If you change the contents, the change is safely propagated back. However, if you reseat the array, the change doesn't propagate back, since all you have is a pointer and a length by *value*.

One possible solution would be to make an array simply a pointer to length + data. But then, if you slice into an array, a copy of the contents must be made so that the length can be stored just before it. Slow.

Another solution would be to make arrays behave *really* by value, that is, always copy. BTW, this should then also apply to objects for symmetry reasons, and Java programmers won't like that...

Yet another solution would be double indirection: a pointer to the current array struct. I agree that's ugly and slow, but if you need it, it's there in the form of the inout specifier.

Big idea: I think every "in" parameter should give a compiler error if someone tries to make a change that doesn't propagate to the caller. That is, modifying array elements is not an error, but reseating an array is. Modifying a by-value integer, float, or struct parameter would also be an error. Thus people would be made to think about what they really need, and copy into locals as desired.

> either make them copy behaving, or not. but this is a buggy copy-on-write.
> that doesn't work out right

It's not copy on write by semantics, but by convention. BTW, do you recall that you cannot resize an array from a function in C or C++ either?

-eye
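
The length-plus-pointer model and the by-reference parameter can be sketched as follows. One assumption to flag: the parameter storage class called inout in this thread is spelled ref in current D, so the sketch uses ref. The function names are hypothetical.

```d
/// Element writes go through the copied pointer, so the caller sees them.
void setFirst(ubyte[] arr, ubyte value)
{
    arr[0] = value;
}

/// The parameter is a copy of (length, pointer); reseating it is local.
void growByValue(ubyte[] arr)
{
    arr.length = arr.length + 1000; // local copy now points at a new block
    arr[0] = 1;                     // written to the new block only
}

/// With ref (inout in the D of this thread) the caller's slice is updated.
void growByRef(ref ubyte[] arr)
{
    arr.length = arr.length + 1000;
    arr[0] = 1;
}

bool parameterDemo()
{
    ubyte[] a = new ubyte[10];

    setFirst(a, 7);
    bool elementSeen = a[0] == 7;

    growByValue(a);
    bool untouched = a.length == 10 && a[0] == 7;

    growByRef(a);
    bool grown = a.length == 1010 && a[0] == 1;

    return elementSeen && untouched && grown;
}
```

parameterDemo() should return true: only the ref version changes the caller's length and data pointer.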
Dec 03 2003
"Ben Hinkle" <bhinkle4 juno.com> writes:
I was surprised to learn that D arrays are pretty different from C arrays,
but once I learned they were just structs with a length and a pointer to the
data it all made sense, and I use that mental model to figure stuff out. The
length of the array isn't *really* part of the array in the C sense. In Java
the length is read-only, so it doesn't run into problems with resizing the
array, since it never needs to re-allocate the data pointer.

-Ben
Dec 02 2003
davepermen <davepermen_member pathlink.com> writes:
this behaviour is simply not simple. it's bad. either copy all around, or
reference all around. but this way is very unintuitive imho. teach THAT to a
newbie and then tell them D is a great language because it works simply and logically.

Dec 03 2003
Brad Beveridge <brad clear.net.nz> writes:
I agree, Dave.
I think that D should always handle arrays by reference, except where
explicitly dup'ed.

That's easy to understand and consistent.  What about slicing, though?
Even the current method confuses me a little.
So a = b[2..10] is a reference, but what happens if you resize a?
You can resize in place, but that may mess with elements within
b.  I would submit that if you are slicing out part of an array, the new
slice size is now fixed.  If you need to resize a, you will need to dup
it first.
So a.length = x throws an exception (or a compile-time check may pick it
up?).  Actually, it should be easy enough to make a a static array.

My major gripe is with being able to slice out an array and then resize that
new slice: that sounds like it would cause seriously subtle bugs.
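
For reference, here is a small sketch of the slice-then-resize case. It assumes a current D runtime, which, as far as I know, relocates a slice rather than growing it over elements that still belong to the parent array. The compiler discussed in this thread may well have behaved differently, and the function name is made up.

```d
/// Grow a slice taken from the middle of another array and check
/// that the parent's elements are left alone.
bool growingSliceLeavesParentAlone()
{
    int[] b = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11];
    int[] a = b[2 .. 10];     // a views b[2] through b[9]

    a[0] = 42;                // still shared: this is b[2]
    bool sharedBefore = b[2] == 42;

    a.length = a.length + 1;  // growing in place would overlap b[10]
    a[0] = 99;                // a has moved, so b[2] keeps its value

    return sharedBefore && b[2] == 42 && b[10] == 10 && a[0] == 99;
}
```

So the parent is not stomped on, but the slice silently stops being a view of it, which is the same copy-on-resize surprise the original post describes.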

Cheers
Brad

