digitalmars.D - Arrays passed by almost reference?

Ali Cehreli (21/21) Nov 05 2009 I haven't started reading Andrei's chapter on arrays yet. I hope I won't...

dsimcha (20/41) Nov 05 2009 This is one of those areas where the low-level details of how arrays are

Travis Boucher (18/66) Nov 05 2009 main.a starts as:
Andrei Alexandrescu (13/61) Nov 05 2009 I don't think it's that bad. Bartosz tried to get me into a diatribe

Frank Benoit (9/40) Nov 05 2009 int[] a;

Nick Sabalausky (11/52) Nov 05 2009 Or you could force totally-by-value semantics with:

Ali Cehreli (27/27) Nov 05 2009 Thanks for all the responses.

Andrei Alexandrescu (3/44) Nov 05 2009 The ball is in your court to define better semantics.

Leandro Lucarella (13/57) Nov 05 2009 Just make arrays a reference value, like classes!

Travis Boucher (8/55) Nov 05 2009 You mean dynamic arrays, but what about static arrays? Sometimes it

Leandro Lucarella (15/27) Nov 05 2009 That's why they already are value types.

Travis Boucher (33/42) Nov 05 2009 I just wonder if that would be confusing.
Bob Jones (13/20) Nov 05 2009 Thats the whole problem. Dynamic arrays and slices are not the same thin...

Yigal Chripun (12/34) Nov 06 2009 I agree with the above.

Travis Boucher (14/65) Nov 05 2009 You can create DynamicArray and RandomAccessRange already now.

gzp (67/86) Nov 12 2009 I think problem is that, dynamic arrays and slices are NOT the same.

Ali Cehreli (37/107) Nov 12 2009 I don't think so: that is a reference to a slice.

gzp (56/75) Nov 13 2009 It's okay to change/create new semantics for new languages. It's a must...

Ali Cehreli (25/31) Nov 08 2009 I thought I passed the ball back to you in this thread:

Saaa (3/3) Nov 05 2009 Ali Cehreli wrote...

Ali Cehreli <acehreli yahoo.com> writes:

I haven't started reading Andrei's chapter on arrays yet. I hope I won't find
out that the following behavior is expected. :)

import std.cstream;

void modify(int[] a)
{
    a[0] = 1;
    a ~= 2;

    dout.writefln("During: ", a);
}

void main()
{
    int[] a = [ 0 ];

    dout.writefln("Before: ", a);
    modify(a);
    dout.writefln("After : ", a);
}

The output with dmd 2.035 is

Before: [0]
During: [1,2]
After : [1]

I don't understand arrays. :D

Ali

Nov 05 2009

dsimcha <dsimcha yahoo.com> writes:

This is one of those areas where the low-level details of how arrays are
implemented leak out.  This is unfortunate, but in a close-to-the-metal
language it's sometimes a necessary evil.

(Dynamic) arrays are structs that consist of a pointer to the first element and
a length.  Essentially, the memory being pointed to by the array is passed by
reference, but the pointer to the memory and the length of the array are passed
by value.  While this may seem ridiculous at first, it's a tradeoff that allows
for the extremely convenient slicing syntax we have to be implemented efficiently.

When you do the a[0] = 1, what you're really doing is:

*(a.ptr) = 1;

When you do the a ~= 2, what you're really doing is:

// Make sure the block of memory pointed to by a.ptr
// has enough capacity to be appended to.
a.length += 1;
*(a.ptr + 1) = 2;
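
The same experiment can be sketched for a current compiler, assuming `std.stdio` in place of the `std.cstream` module used in the original post. The helper name `lengthSeenByCallee` is made up for illustration; the point is only that the element write is shared while the length change is not.

```d
import std.stdio;

// Takes the slice by value: the (pointer, length) pair is copied,
// but the elements it points to are shared with the caller.
size_t lengthSeenByCallee(int[] a)
{
    a[0] = 1;   // writes through the shared pointer, visible to the caller
    a ~= 2;     // updates only this function's copy of the pair
    return a.length;
}

void main()
{
    int[] a = [0];
    immutable inside = lengthSeenByCallee(a);

    writeln("Inside: length ", inside);
    writeln("After : ", a);

    assert(inside == 2);      // the callee saw two elements
    assert(a.length == 1);    // the caller's length is untouched
    assert(a[0] == 1);        // but the element write went through
}
```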

Realistically, the only way to understand D arrays and use them effectively is
to understand the basics of how they work under the hood.  If you try to
memorize a bunch of abstract rules, it will seem absurdly confusing.

Nov 05 2009

Travis Boucher <boucher.travis gmail.com> writes:

main.a starts as:
struct {
   int length = 1;
   int *data = 0x12345; // some address pointing to [ 0 ]
}

inside of modify, a is:
struct { // different from main.a
    int length = 2;
    int *data = 0x12345; // same as main.a data [ 1, 2]
}

back in main:
struct { // same as original main.a
   int length = 1;
   int *data = 0x12345; // address unchanged, but the data is now [ 1 ]
}


To get the expected results, pass a as a reference:

void modify(ref int[] a);
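
A minimal sketch of the `ref` version, assuming `std.stdio` for output on a current compiler. With `ref`, the callee works on the caller's own pointer/length pair, so the append is visible after the call.

```d
import std.stdio;

// `ref` passes the caller's slice itself rather than a copy of it.
void modify(ref int[] a)
{
    a[0] = 1;
    a ~= 2;   // grows the caller's slice, even if the data has to move
}

void main()
{
    int[] a = [0];
    modify(a);
    writeln("After : ", a);
    assert(a == [1, 2]);
}
```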

Nov 05 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

dsimcha wrote:
 Realistically, the only way to understand D arrays and use them effectively is
 to understand the basics of how they work under the hood.  If you try to
 memorize a bunch of abstract rules, it will seem absurdly confusing.

I don't think it's that bad. Bartosz tried to get me into a diatribe 
about how array behavior can't be defined formally. Of course it can.

The chunk and the limits of the chunk are part of D's array abstraction. 
The limits are passed by value. The ~= operation may 
nondeterministically choose to bind the limits to a different chunk. The 
right to modify members as they want is a fundamental right of any 
non-const member, so no confusion there. The decision is encapsulated. 
User code must be written to work according to that specification.

Could there be a better array specification? No doubt. But much as he 
tried, Bartosz couldn't come up with one. We couldn't come up with one. 
So if you could come up with one, speak up or forever use the existing one.


Andrei

Nov 05 2009

Frank Benoit <keinfarbton googlemail.com> writes:

int[] a;
a is kind of a pointer, one with the extra length information.
When passed to modify(), a is passed by-value, the contained data is
certainly passed by-reference since a points to the data.

This is why the a.length was not updated.

If you change "modify" to :
void modify(ref int[] a){...

it should work as you expected.

Nov 05 2009

"Nick Sabalausky" <a a.a> writes:

"Frank Benoit" <keinfarbton googlemail.com> wrote in message 
news:hcvff9$9cr$1 digitalmars.com...
 int[] a;
 a is kind of a pointer, one with the extra length information.
 When passed to modify(), a is passed by-value, the contained data is
 certainly passed by-reference since a points to the data.

 This is why the a.length was not updated.

 If you change "modify" to :
 void modify(ref int[] a){...

 it should work as you expected.

Or you could force totally-by-value semantics with:

void modify(const(int)[] a){
int[] _a = a.dup;
...

(Anyone know if scope can be used on that to allocate it on the stack, or is 
that just for classes?)

I do agree it can sometimes be a bit weird though. But like others 
mentioned, it's kind of a necessary evil, and once you understand how the 
arrays work under-the-hood, it becomes a bit easier.
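
A small sketch of the `.dup` approach, with the hypothetical function name `modifiedCopy`. Taking `const(int)[]` documents that the caller's elements are not written, and `.dup` gives the function a private mutable copy.

```d
// Works on a private copy, so the caller's array is never affected.
int[] modifiedCopy(const(int)[] a)
{
    int[] copy = a.dup;   // fresh allocation with the same contents
    copy[0] = 1;
    copy ~= 2;
    return copy;
}

void main()
{
    int[] a = [0];
    int[] result = modifiedCopy(a);

    assert(a == [0]);          // original untouched
    assert(result == [1, 2]);  // all changes live in the copy
}
```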

Nov 05 2009

Ali Cehreli <acehreli yahoo.com> writes:

Thanks for all the responses.

And yes, I know that 'ref' is what works for me here. I am trying to figure out
whether I should develop a guideline like "always pass arrays with 'ref', or
you may face surprises."

I understand it very well now and was able to figure out a way to cause some
bugs. :)

What can be said about the output of the following program? Will main.a[0] be
printed as 1 or 111?

import std.cstream;

void modify(int[] a)
{
    a[0] = 1;

    // ... more operations ...

    a[0] = 111;
}

void main()
{
    int[] a;
    a ~= 0;
    modify(a);

    dout.writefln(a[0]);
}

It depends on the operations in between the two assignments to a[0] in 'modify':

- if we leave the comment in place, main.a[0] is 111

- if we replace the comment with this code

    foreach (i; 0 .. 10) {
        a ~= 2;
    }

then main.a[0] is 1. In a sense, modify.a caused only "some" side effects in
main.a. If we shorten the foreach, then main.a[0] is again 111. To me, this is
at an unmanageable level, unless we always pass with 'ref'.

I don't think that this is easy to explain to a learner; and I think that is a
good indicator that there is a problem with these semantics.
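
One way to observe this is to compare the slice's pointer before and after the appends. The sketch below assumes a current compiler with `std.stdio`, and the extra `extra` parameter and `bool` return value are additions for illustration. Which append counts cause a move depends on the runtime, so the code only checks the relationship between "moved" and the value the caller sees.

```d
import std.stdio;

// Returns true if the appends moved the callee's slice to a new block.
bool modify(int[] a, size_t extra)
{
    const before = a.ptr;

    a[0] = 1;               // always lands in the caller's block
    foreach (i; 0 .. extra)
        a ~= 2;             // may or may not relocate the data
    a[0] = 111;             // lands in whichever block `a` now points to

    return a.ptr !is before;
}

void main()
{
    immutable size_t[] counts = [0, 1, 10, 1000];

    foreach (extra; counts)
    {
        int[] a;
        a ~= 0;
        immutable moved = modify(a, extra);
        writefln("extra=%s moved=%s a[0]=%s", extra, moved, a[0]);

        // The caller sees 111 only if the callee never left the block.
        assert(a[0] == (moved ? 1 : 111));
        assert(a.length == 1);
    }
}
```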

Ali

Nov 05 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

The ball is in your court to define better semantics.

Andrei

Nov 05 2009

Leandro Lucarella <llucax gmail.com> writes:

Andrei Alexandrescu, on 5 November at 16:10, you wrote:
 The ball is in your court to define better semantics.

Just make arrays a reference value, like classes!

Nov 05 2009

Travis Boucher <boucher.travis gmail.com> writes:

Leandro Lucarella wrote:
 Just make arrays a reference value, like classes!
 

You mean dynamic arrays, but what about static arrays?  Sometimes it 
makes more sense to send a static array as a value rather than a 
reference (think in the case of small vectors).

Then we'd have 2 semantics for arrays, one for static arrays and one for 
dynamic arrays.

I am not fully against pass-by-ref arrays, I just think passing by 
reference all of the time could have some performance implications.

Nov 05 2009

Leandro Lucarella <llucax gmail.com> writes:

Travis Boucher, on 5 November at 20:44, you wrote:
I don't think that this is easy to explain to a learner; and I think that is a
good indicator that there is a problem with these semantics.

The ball is in your court to define better semantics.

Just make arrays a reference value, like classes!

 
 You mean dynamic arrays, but what about static arrays?

I would say "make them value types", but they already are ;)

 Sometimes it makes more sense to send a static array as a value rather
 then a reference (think in the case of small vectors).

That's why they already are value types.

 Then we'd have 2 semantics for arrays, one for static arrays and one
 for dynamic arrays.

Yes. They should have different semantics because they are different.
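
The difference can be sketched as follows, assuming a current D2 compiler; the function names `bumpStatic` and `bumpSlice` are made up for illustration.

```d
// A static array parameter receives a full copy of the elements.
void bumpStatic(int[3] v)
{
    v[0] = 99;   // changes the local copy only
}

// A slice parameter shares the elements with the caller.
void bumpSlice(int[] v)
{
    v[0] = 99;   // visible to the caller
}

void main()
{
    int[3] s = [1, 2, 3];

    bumpStatic(s);
    assert(s[0] == 1);    // value semantics: unchanged

    bumpSlice(s[]);       // s[] is a slice over the same storage
    assert(s[0] == 99);   // the write went through
}
```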

 I am not fully against pass-by-ref arrays, I just think in passing by
 reference all of the time could have some performance implications.

OK, make 2 different types then: slices (value types, can't append, they
are only a view on other's data) and dynamic arrays (reference type, can
append, but a little slower to manipulate).

It's a shame this idea didn't come true after all...

Nov 05 2009

Travis Boucher <boucher.travis gmail.com> writes:

 I am not fully against pass-by-ref arrays, I just think in passing by
 reference all of the time could have some performance implications.

 
 OK, make 2 different types then: slices (value types, can't append, they
 are only a view on other's data) and dynamic arrays (reference type, can
 append, but a little slower to manipulate).
 
 It's a shame this idea didn't came true after all...
 

I just wonder if that would be confusing.

Static arrays of 2 different sizes are 2 different types.

Another example of how it is already confusing:

--
int[2] a = [1, 2];
int[] b = [11, 22, 33];

b = a;
a[0] = 111;

/*
  Now both a and b == [111, 2], instead of the intuitive b == [1, 2],
  a == [111, 2].  They point at the same data.
*/
b.length = b.length + 1; // now at different data.

a[1] = 222;
/* a == [111, 222], b == [111,2,0] as expected */
--

Something that is nice about dynamic arrays is how they can intermix 
with static arrays (int[] b = int[2]) in an efficient (and lazy copying) 
manner.  It makes functions like this fast and efficient:

int addThemAll(int[] data) {
     int rv = 0;
     foreach (i,v; data) rv += v;
     return rv;
}

This works because an implicit cast from a static array to a dynamic array 
is cheap, and slicing an array to a dynamic array is cheap (as long as you 
are only reading from the array).
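
As a sketch of that usage on a current compiler, the same function can take a static array, a sub-slice, or a dynamic array without copying any elements. The `const(int)[]` parameter type is an assumption added here to state the read-only intent.

```d
// Reads the elements only, so no copy is ever needed.
int addThemAll(const(int)[] data)
{
    int rv = 0;
    foreach (v; data)
        rv += v;
    return rv;
}

void main()
{
    int[4] fixed = [1, 2, 3, 4];
    int[] dynamic = [10, 20, 30];

    assert(addThemAll(fixed) == 10);          // static array sliced implicitly
    assert(addThemAll(fixed[1 .. 3]) == 5);   // view of elements 2 and 3
    assert(addThemAll(dynamic) == 60);
}
```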

I don't see how separating them to have different call semantics solves 
the problem.  However making a clearer definition of each (in 
documentation for example) might be helpful.

Being new to D, I am glad this thread exists because I can see how I 
could have shot myself in the foot in the future without playing around 
and learning the difference.

Nov 05 2009

"Bob Jones" <me not.com> writes:

"Leandro Lucarella" <llucax gmail.com> wrote in message 
news:20091106035612.GI3748 llucax.com.ar...
 I am not fully against pass-by-ref arrays, I just think in passing by
 reference all of the time could have some performance implications.

 OK, make 2 different types then: slices (value types, can't append, they
 are only a view on other's data) and dynamic arrays (reference type, can
 append, but a little slower to manipulate).

 It's a shame this idea didn't came true after all...

That's the whole problem. Dynamic arrays and slices are not the same thing, 
and having a syntax that allows code to be ignorant of which it is dealing 
with is always going to have problems imo. Being able to resize or append to 
slices is fubar imo.

I'd go with slices being value types, no concatenation, or resizing / 
reallocating, etc..

Dynamic arrays could be a library type. A templated struct that has a 
pointer, length, or whatever. They can have operator overloads for implicit 
conversion to slices, so any code that accepts a slice can take dynamic 
arrays, and prevent side effects. Code that is going to reallocate has to 
take a dynamic array. So at least what's happening is more obvious/explicit.
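
A rough sketch of what such a library type might look like in current D. The names `DynamicArray`, `store`, `view`, `sum`, and `fill` are all hypothetical, and this is only one possible reading of the proposal. Note that built-in slices still accept `~=` today, so the sketch illustrates the ownership split rather than enforcing it.

```d
// Hypothetical owning array type: appending goes through the owner,
// while read-only code receives a plain const slice.
struct DynamicArray(T)
{
    private T[] store;

    // Appending is defined on the owning type.
    void opOpAssign(string op : "~")(T value)
    {
        store ~= value;
    }

    // Read-only view of the current elements.
    @property const(T)[] view() const
    {
        return store;
    }

    // Lets a DynamicArray be passed wherever a const slice is expected.
    alias view this;
}

// Accepts any slice; cannot resize or modify the owner.
int sum(const(int)[] xs)
{
    int total = 0;
    foreach (x; xs)
        total += x;
    return total;
}

// Code that grows the array must ask for the owning type explicitly.
void fill(ref DynamicArray!int arr)
{
    arr ~= 1;
    arr ~= 2;
}

void main()
{
    DynamicArray!int arr;
    fill(arr);
    assert(arr.length == 2);   // forwarded to the view
    assert(sum(arr) == 3);     // implicit conversion to const(int)[]
}
```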

Nov 05 2009

Yigal Chripun <yigal100 gmail.com> writes:

I agree with the above.

the semantics should be:
DynamicArray!(T) as a dynamic array
int[x] is a static array
RandomAccessRange!(T) is a slice

int[] a; // compile error

(names are not important ATM)

I don't think there's a need for a dedicated array slice type and 
instead they should be range types.
It should be easy to change underlying containers with compatible range 
types.

Nov 06 2009

Travis Boucher <boucher.travis gmail.com> writes:

Yigal Chripun wrote:
 I agree with the above.
 
 the semantics should be:
 DynamicArray!(T) as a dynamic array
 int[x] is a static array
 RandomAccessRange!(T) is a slice
 
 int[] a; // compile error
 
 (names are not important ATM)
 
 I don't think there's a need for a dedicated array slice type and 
 instead they should be range types.
 It should be easy to change underlying containers with compatible range 
 types.
 

You can create DynamicArray and RandomAccessRange already now.

Currently int[] a is very intuitive in its purpose; it's just some of the 
implementation details that get confusing.

int doSomething(in int[] a)
tells me doSomething is going to process an int array of any size and 
not modify it.

int doSomething(int[] a)
tells me doSomething is going to process an int array of any size and 
possibly modify it.

An explicit 'out int[] a' would make it even more obvious what the 
function is going to do.

The thing is, dynamic arrays and slices are pretty much the same thing; 
it's just hard to track what the underlying store points to.
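
The two signatures can be sketched like this on a current compiler. The function names `total` and `zeroFirst` are invented for the example, and the `static assert` is an added check showing that writing through an `in` parameter is rejected at compile time.

```d
// `in` promises the function will not modify the elements.
int total(in int[] a)
{
    int sum = 0;
    foreach (v; a)
        sum += v;
    return sum;
}

// A plain int[] parameter may modify the caller's elements.
void zeroFirst(int[] a)
{
    if (a.length > 0)
        a[0] = 0;
}

// Assigning to an element of an `in` parameter does not compile.
static assert(!__traits(compiles, (in int[] a) { a[0] = 1; }));

void main()
{
    int[] data = [1, 2, 3];

    assert(total(data) == 6);
    assert(data == [1, 2, 3]);   // untouched by the `in` version

    zeroFirst(data);
    assert(data == [0, 2, 3]);   // element write is visible here
}
```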

Nov 05 2009

gzp <galap freemail.hu> writes:

 
 The thing is, dynamic arrays and slices are pretty much the same thing; 
 it's just hard to track what the underlying store points to.

I think the problem is that dynamic arrays and slices are NOT the same. 
They have a common subset of interfaces (length, at, slice (maybe)), but 
they are just different. An array owns its elements; it can 
resize/remove, etc. the underlying structure. E.g., a special array that 
stores the elements in a tree can add or remove nodes at any time, but a 
slice of this "array" cannot alter the tree, only the elements. (For 
example, an AVL tree cannot have a slice that modifies the elements, as it 
would require a restructuring of the tree; element modification can be 
performed only through the array itself.)
I know int[] is a much simpler storage, but it makes my point more 
understandable.

So now back to int[]
According to the current implementation

foo(ref int[]) is an array: It can add/remove elements (restructure the 
tree)

foo(int[]) is a mixed thing. It is a slice with an automatic copy 
feature. It is neither an array nor a slice.
If it were a slice, then the underlying structure could not be altered, 
but here it can be.
It does not own its elements, as it cannot resize the underlying 
structure without penalty.
Actually it is a slice + copy_on_resize. When you modify the elements, 
it alters the elements of the referred array, but when you resize it, it 
copies (or not, depending on the DMD implementation!!!) the elements 
into another array and creates a new slice+copy_on_resize object for 
this array.

Thus foo(int[]) is worse than the dangling pointers or buffer overwrite 
errors from C (C++). Semantically they are correct, so bugs produced by a 
resized int[] cannot be detected using tricks like patterns in the 
memory. It is something that can cause really, really nasty bugs, 
especially when whether the original array is copied on resizing depends 
on the dmd implementation (it depends on how much extra data is allocated 
for an array before it is really reallocated).
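
In current D runtimes the amount of spare room can be inspected with the built-in `capacity` property and requested with `reserve`. The sketch below assumes such a runtime, which may behave differently from the dmd release discussed in this thread, and the helper names `prefixAppendRelocates` and `reservedAppendStays` are made up for illustration.

```d
// Appending to a prefix slice must copy, because the block
// already holds a live element right after the slice's end.
bool prefixAppendRelocates()
{
    int[] a = [1, 2, 3];
    int[] prefix = a[0 .. 2];

    assert(prefix.capacity == 0);   // no room to grow in place
    prefix ~= 99;

    assert(a == [1, 2, 3]);         // a[2] was not overwritten
    assert(prefix == [1, 2, 99]);
    return prefix.ptr !is a.ptr;
}

// After reserve, appends within the reserved capacity stay in place.
bool reservedAppendStays()
{
    int[] c;
    c.reserve(100);
    assert(c.capacity >= 100);

    const start = c.ptr;
    foreach (i; 0 .. 100)
        c ~= i;
    return c.ptr is start;
}

void main()
{
    assert(prefixAppendRelocates());
    assert(reservedAppendStays());
}
```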


 From my point of view it's quite rare to have an array that is 
temporarily extended with elements (partially enabling modification of the 
old ones), and then to forget the new ones. So please don't favor a feature 
in the core language that is both hardly used and a cause of bugs that can 
hardly be detected.

So my proposal is (as has already been mentioned by others):
  - have arrays that contain the elements + structure,
	e.g. RandomAccessArray!(int), AVLTree
  - have slices (ranges) of arrays, where the structure cannot be altered
  - decide whether int[] is shorthand for either 
RandomAccessArray!(int) or Range!(int), but DO NOT have a mixed meaning 
that can be altered/modified with const/immutable/ref qualifiers.



Some random thoughts:
[1,2,3,4] literal is the array itself. It could alter the structure if 
it were not an immutable object.

int[] a = [1,2,3,4] a is a slice of the array, cannot be resized.

int[] a = new int[100]; is a slice too,
new int[100] is a short version for new RandomAccessArray!(int)(100)

int[100] b;
int[] a = b; is a short version for a copy: a = RandomAccessArray!(int)( 
b ).opRange() or a much better solution would be a slice to a static array 
if that's possible.

a.array is the referred array, thus a.array ~= 2 could resize the array, 
1. the slice a itself is not modified, it still points to the original 
subset - but then what about the removed elements ???
2. the slice a is automatically resized to point to the altered structure
3. slices of static arrays cannot be resized

const int[] cannot resize the underlying array
int[] can

int[new] is the array itself (I'm not sure)
int[new] a = new int[100];
int[] slice_a = a;
assert( slice_a.array.ptr == a.ptr );

Gzp.

Nov 12 2009

Ali Cehreli <acehreli yahoo.com> writes:

gzp Wrote:

 I think the problem is that dynamic arrays and slices are NOT the same. 

I agree with most of what you wrote, but I can't see that in the current
implementation.

 They have a common subset of interfaces (length, at, slice (maybe)), but 
 they are just different. An array owns its elements; it can 
 resize/remove, etc. the underlying structure. E.g., a special array that 
 stores the elements in a tree can add or remove nodes at any time, but a 
 slice of this "array" cannot alter the tree, only the elements. (For 
 example, an AVL tree cannot have a slice that modifies the elements, as it 
 would require a restructuring of the tree; element modification can be 
 performed only through the array itself.)
 I know int[] is a much simpler storage, but it makes my point more 
 understandable.
 
 So now back to int[]
 According to the current implementation
 
 foo(ref int[]) is an array: It can add/remove elements (restructure the 
 tree)

I don't think so: that is a reference to a slice.

 foo(int[]) is a mixed thing. It is a slice with an automatic copy 
 feature. It is not an array nor a slice.

It is a pass-by-value slice. Now there is one more slice that provides access
to what the argument has been providing access to.

 If it were a slice, then the underlying structure could not be altered, 
 but here it can be.

D2's "slice" is different than some other languages'. :)

 It does not own its elements as it cannot resize the underlying 
 structure without penalty.

Agreed.

 Actually it is a slice + copy_on_resize.

Yes.

 When you modify the elements, 
 it alters the element of the referred array, but when you resize it, it 
 copies (or not, depending on the DMD implementation!!!) the elements 
 into another array and creates a new slice+copy_on_resize object for 
 this array.

That is the "discretionary" part in the semantics.

But we must find a different entity (GC? druntime?) that makes the copies,
because that entity is the owner of the elements.

 Thus foo(int[]) is worse than the dangling pointers or buffer overwrite 
 errors from C(C++). Semantically they are correct, so bugs produced by a 
 resized int[] cannot be detected using tricks like patterns in the 
 memory. It is something that can cause really-really nasty bugs, 
 especially when the copy of the original array on resizing depends on the 
 dmd implementation (it depends on how much extra data is allocated for 
 an array before it is really reallocated).
 
 
  From my point of view it's quite rare to have an array that is 
 temporarily extended with elements (partially enabling modification of the old 
 ones), and then to forget the new ones.

I don't understand the use either. But I can see how it is important for
performance.

But we may not know at that time whether the new ones will not be used. The new
slice may be copied to another one.

I agree though that I can't see any use case.

 So please don't favor a feature in the 
 core language that is hardly used and causes bugs that can 
 hardly be detected.
 
 So my proposal is that (as has already been mentioned by others):
   - have arrays that contain the elements + structure,
 	ex RandomAccessArray!(int) AVLTree
   - have slices (ranges) of arrays, where the structure cannot be altered
   - decide if int[] stands for a short version of either 
 RandomAccessArray!(int) or Range!(int), but DO NOT have a mixed meaning 
 that can be altered/modified with const/immutable/ref qualifiers.

Good proposals.

My views on the current semantics:

I recently took it as a challenge to define the semantics of the current dmd
implementation of "dynamic consecutive objects" (I don't want to call them
slices or dynamic arrays). I've posted my views this week...

First two objections (not to you, but to the current nomenclature):

1) I disagree that D2 provides dynamic arrays to the programmer. The dynamic
nature of the elements is maintained in the background, but programmers
never lay their hands on dynamic arrays.

2) D2's slices are not the same thing as in other languages. Still, I will call
what the programmer receives "slices" below.

To illustrate, let's have a look at the following definition:

  int[] slice = new int[10];

- side effect: 10 objects are created
- returned value: a slice to all of those objects

It gets interesting:

  int[] slice2 = slice[1..$-1];

Now we have two entities that provide access to the underlying objects. This is
a "sharing relationship." In this sense, the two share the access to those
objects.

The interesting part is that either party can leave this relationship at
will... As soon as they see fit, they will go elsewhere and start providing
access to copies of these objects.

Neither party owns these objects. The garbage collector does.

Because, if we say that 'slice' was the owner and now went away, then is
'slice2' owning the objects? Has it been promoted to a "dynamic array?"

I think not. For that reason, I see no difference between "dynamic arrays" and
"slices" in D2. Neither owns the objects; they provide access.

I describe this as "discretionary sharing semantics."
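
To make the sharing and the leaving concrete, here is a small sketch. It assumes a current D runtime, where appending to a slice that does not end at the end of the used part of its block reallocates instead of overwriting the neighbouring element:

```d
void main()
{
    int[] slice = new int[10];
    int[] slice2 = slice[1 .. $ - 1];

    // Both slices provide access to the same elements.
    assert(slice2.length == 8);
    assert(slice2.ptr == slice.ptr + 1);
    slice2[0] = 42;
    assert(slice[1] == 42);

    // Appending in place would overwrite slice[9], which 'slice' still
    // provides access to, so slice2 leaves the sharing relationship here.
    slice2 ~= 7;
    assert(slice[9] == 0);
    assert(slice2.ptr != slice.ptr + 1);

    // From now on slice2 works on its own copies.
    slice2[0] = 1;
    assert(slice[1] == 42);
}
```

Neither variable ever owned the ten elements; after the append the garbage collector simply keeps two blocks alive instead of one.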

 Some random thoughts:

I am not sure whether the following are your proposals for change, but I tested
them with the current implementation and they fit in the semantics as I
understand them.

 [1,2,3,4] literal is the array itself. It could alter the structure, if
 it would not be an immutable object.
 
 int[] a = [1,2,3,4] a is a slice of the array, cannot be resized.

It can be resized with 2.036 and fits my definition. As we append objects to
'a', it may get new copies to provide access to.

 int[] a = new int[100]; is a slice too,

Agreed: side effect is 100 element creation, return value is a slice.

 int[100] b;
 int[] a = b; is a short version for a copy: a = RandomAccassArray!(int)(

Disagreed: b is a fixed-size array and 'a' is a slice that provides access to
its objects. 'a' may terminate this sharing contract at will, as it sees fit.
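
A short sketch of that case, assuming a current D compiler (the values are made up for illustration). Since b cannot grow, an append to 'a' has to move the elements into a new block:

```d
void main()
{
    int[100] b;      // fixed-size array, stored in place
    int[] a = b;     // same as b[]: a slice of b, no elements are copied

    assert(a.ptr == b.ptr);
    assert(a.length == 100);
    a[0] = 5;
    assert(b[0] == 5);       // 'a' provides access to b's elements

    a ~= 1;                  // 'a' terminates the sharing contract here
    assert(a.length == 101);
    assert(a.ptr != b.ptr);

    a[0] = 6;
    assert(b[0] == 5);       // b no longer sees writes made through 'a'
}
```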

 a.array is the referred array, thus a.array ~= 2 could resize the array,
 1. the slice a itself is not modified, it still points to the original
 subset - but then what about the removed elements ???
 2. the slice a is automatically resized to point to the altered structure
 3. slices of static arrays cannot be resized

Reading those, I think you've been proposing changes. My attempt is to define the
*current* semantics as of 2.036.

 const int[] cannot resize the underlying array
 int[] can
 
 int[new] is the array itself (i'm not sure)
 int[new] a = new int[100];

I haven't learned about T[new] yet, but I think it is discontinued. (?)

Ali

Nov 12 2009

gzp <galap freemail.hu> writes:

 
 D2's "slice" is different than some other languages'. :)
 

It's okay to change/create new semantics for new languages. It's a must 
to develop new features as long as they make sense.
But think as a newbie to programming for a while.
If you've just learned of arrays and slices and hardly know anything 
about memory layouts and pointers, would you understand these strange 
behaviours?

OT: Actually I wouldn't even let a newbie use a GC either, as it hides 
the ownership questions (as is the case here with slices).
During program design one of the most crucial questions is the role of 
each module/class. When you have a clear view of ownerships and roles, the 
design gets much better. With a GC this ownership question is postponed 
or even omitted and the barriers between the modules may become very thin; 
modularity/code reuse is gone. (I'm not saying that GC is bad! But more 
attention must be paid during program design.)

 But, we must find a different entity (GC? druntime?) that makes the copies;
because that entity is the owner of the elements.

Exactly. And I'd prefer to distinguish the owner and the view of the array 
more. Don't let the view (int[]) alter the structure, and allow the 
programmer to have access to the actual array (that's hidden now by the 
GC/druntime).

 1) I disagree that D2 provides dynamic arrays to the programmer. The dynamic
nature of the elements is maintained in the background; but the programmers
never lay their hands on dynamic arrays.

Just as my comment above.

 It gets interesting:
 
   int[] slice2 = slice[1..$-1];
 
 Now we have two entities that provide access to the underlying objects. This
is a "sharing relationship." In this sense, the two share the access to those
objects.
 
 The interesting part is that either party can leave this relationship at
will... As soon as they see fit, they will go elsewhere and start providing
access to copies of these objects.

If I want to leave the ownership let's make it explicit and write it 
down in the code. Don't let the program reviewer think for hours whether 
it's still the original array or some other different copy and view of it.
int[] slice2 = 
slice.I_really_want_to_create_a_copy_with_2_additional_elements;
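
For comparison, a sketch of how an explicit copy can be spelled in D as it stands, assuming a current compiler: the binary ~ operator and .dup always produce a new array, unlike ~= which may or may not. The variable names are illustrative only.

```d
void main()
{
    int[] slice = [1, 2, 3];

    // Binary ~ always allocates a new array, so the copy is visible in the code.
    int[] slice2 = slice ~ [4, 5];
    assert(slice2 == [1, 2, 3, 4, 5]);
    assert(slice2.ptr != slice.ptr);

    // .dup makes an explicit copy without adding elements.
    int[] copy = slice.dup;
    copy[0] = 99;
    slice2[0] = 42;
    assert(slice == [1, 2, 3]);   // the original is untouched by both
}
```

A reviewer who sees slice ~ [4, 5] or slice.dup knows a different array is in play from that line on.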


 Some random thoughts:

 
 I am not sure whether the following are your proposals for change, but I
tested them with the current implementation and they fit in the semantics as I
understand.

Actually they were a kind of proposal: suggestions that fit the current 
syntax. I'm not sure which of them have been implemented. They were just 
some ideas I'd like to see in D2.

 int[new] is the array itself (i'm not sure)
 int[new] a = new int[100];

 
 I haven't learned about T[new] yet, but I think it is discontinued. (?)

I haven't learned of T[new] either, maybe it's a different thing. I 
don't know, I was not following that thread. It just simply seemed 
natural after I've seen the syntax in the newsgroup.


One final comment why I really don't like the current slice implementation.
What is undefined behaviour of a program? It is when you cannot tell the 
outcome of a function knowing all the inputs. (The random seed 
is an input too :) )
And foo(int[]) is undefined, since it depends on the state of memory 
fragmentation, the state of the moon, etc.
The outcome depends on whether the GC can resize the array in place or 
not. Thus D has a built-in feature that's undefined by nature, thus D is 
undefined.
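
A sketch of what can and cannot be relied on here, assuming a current D compiler. It does not predict whether the append reallocates; it only checks the consequence of whichever choice the runtime made (the variable names are illustrative):

```d
void main()
{
    int[] a = new int[4];
    int[] b = a;

    b ~= 5;   // extended in place or moved: the runtime decides

    // Comparing pointers reveals which of the two happened.
    const stillShared = (b.ptr is a.ptr);

    b[0] = 9;
    // The write reaches 'a' only if the append did not reallocate.
    assert(a[0] == (stillShared ? 9 : 0));
    assert(a.length == 4);   // a's length never changes either way
}
```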

Or does slice resizing create a copy of the underlying array all the time? 
Then why can't we access this array directly? It'd be much clearer and 
more readable what the goal of the programmer was.

E.g. foo( const C classRef) tells that the programmer won't change the class.
Then why can't we have:
foo( slice ) to indicate I want to change the items in the array
foo( const slice ) for reading the items only
foo( array ) I want to alter the structure of the elements
foo( const array ) just for completeness, as it should have the same 
effect as the const slice.
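
The closest approximation in D as it stands is probably the following sketch. It is not the proposed design: the function names are hypothetical, a current compiler is assumed, and a plain int[] parameter can still be appended to locally.

```d
// Read-only view: elements cannot be modified through 'a'.
int sum(const(int)[] a)
{
    int total;
    foreach (x; a)
        total += x;
    return total;
}

// Element access: items may change; length changes would stay local.
void zero(int[] a)
{
    a[] = 0;
}

// Structural access: the caller's slice itself is updated.
void push(ref int[] a, int v)
{
    a ~= v;
}

void main()
{
    int[] arr = [1, 2, 3];
    assert(sum(arr) == 6);
    push(arr, 4);
    assert(arr == [1, 2, 3, 4]);
    zero(arr);
    assert(arr == [0, 0, 0, 0]);
}
```

Here ref int[] plays the role of foo( array ) and const(int)[] that of foo( const slice ); the plain int[] case is the mixed one this thread is about.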

And for parallel programming they might also help, since for slice 
access the array structure cannot change, so read access is sufficient 
for the array structure (sometimes it's better to calculate something 
twice, and write it twice from different threads).
And for array use, we know the structure might also be changed, 
thus it has to be guarded by critical sections as well.

Gzp

Nov 13 2009

Ali Cehreli <acehreli yahoo.com> writes:

Andrei Alexandrescu Wrote:

 Ali Cehreli wrote:

 I don't think that this is easy to explain to a learner; and I think that is a
good indicator that there is a problem with these semantics.

 
 The ball is in your court to define better semantics.
 
 Andrei

I thought I passed the ball back to you in this thread:

  http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=100318

It didn't attract any interest... :)

Here are some points as teasers:

1) The term "dynamic array" and its distinction from "slice" is detrimental to
understanding D's arrays, and is against their nature.

For example, is 's2' valid below?

void main()
{
    int[] a = new int[11];
    int[] s = a[2..8];

    a = new int[55];

    int[] s2 = s[2..6];
}

If so, has 's' been "promoted" to a dynamic array? What happens is, even 'a'
was a slice to begin with. It was providing access to the 11 consecutive
objects that are owned by the garbage collector.
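
The example above can be checked with a few assertions. This is only a sketch, assuming a current D compiler; the value 7 is an arbitrary marker:

```d
void main()
{
    int[] a = new int[11];
    int[] s = a[2 .. 8];
    s[2] = 7;               // marker written through 's'
    assert(a[4] == 7);      // 'a' and 's' share the same elements

    a = new int[55];        // 'a' now provides access to other elements

    int[] s2 = s[2 .. 6];   // still valid: the GC keeps the first block alive
    assert(s.length == 6);
    assert(s2.length == 4);
    assert(s2[0] == 7);     // the marker is still reachable through 's'
    assert(a[4] == 0);      // the new elements never saw it
}
```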

My point is, even the left-hand side of the first initialization is a slice:

    int[] slice = new int[11];

The expression 'new int[11]' creates 11 elements as a side effect, and returns
"a slice to all of those elements."


2) The best that I can describe the semantics of slices is "discretionary
share." It is like sharing a number of resources by a number of entities. Like,
two companies sharing a cubicle space, where both are free to change the
contents of cubicles.

They can both add a new cubicle that is not shared by the other (s~=1). They
can both leave the "sharing" of cubicles at any time, as soon as they see fit.

This is a totally at-will arrangement between the two parties.

3) By accepting the above view, the exception of "slices are passed by
reference" disappears too: Slices are passed by value as well. What happens is,
the slice parameter starts "sharing" (or providing access to) all of the
elements that the original slice is sharing.

Same with assignment: It creates a sharing contract.

I appreciate any comments.

Ali

Nov 08 2009

"Saaa" <empty needmail.com> writes:

Ali Cehreli wrote...

This helps me with the meaning of in, out & ref.
http://bayimg.com/NaeOgaaCC

Nov 05 2009
