digitalmars.D.bugs - [Issue 5603] New: Initialization syntax for dynamic arrays

d-bugmail puremagic.com (102/102) Feb 16 2011 http://d.puremagic.com/issues/show_bug.cgi?id=5603

d-bugmail puremagic.com (39/39) Feb 17 2011 http://d.puremagic.com/issues/show_bug.cgi?id=5603
d-bugmail puremagic.com (32/64) Feb 17 2011 http://d.puremagic.com/issues/show_bug.cgi?id=5603
d-bugmail puremagic.com (30/70) Feb 17 2011 http://d.puremagic.com/issues/show_bug.cgi?id=5603
d-bugmail puremagic.com (14/34) Feb 17 2011 http://d.puremagic.com/issues/show_bug.cgi?id=5603

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5603

           Summary: Initialization syntax for dynamic arrays
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: bearophile_hugs eml.cc



Fixed-sized arrays let you specify an initialization value, omit it, or leave
the stack memory untouched (with = void) for special situations where
performance matters a lot:


void main() {
    int[5] a2 = void;

    int[5] a1 = 1;

    int[5][5] m2 = void;

    int[5][5] m1 = 1;
}


Dynamic arrays don't allow specifying an initialization value (especially after
the deprecation of 'typedef', which used to allow defining a new int type with
a different init value). D has no syntax to allocate an uninitialized array,
and DMD currently is not able to avoid the double initialization of dynamic
arrays, for example:



void main() {
    auto a1 = new int[5];
    a1[] = 1;
}



In the code generated by DMD, __d_newarrayT performs the initialization to
zero, then __memset32 performs a second initialization:

__Dmain comdat
L0:     sub ESP,01Ch
        mov EAX,offset FLAT:_D11TypeInfo_Ai6__initZ
        push 5
        push EAX
        call near ptr __d_newarrayT
        mov 0Ch[ESP],EAX
        mov 010h[ESP],EDX
        push dword ptr 0Ch[ESP]
        push 1
        push EDX
        call near ptr __memset32
        add ESP,014h
        add ESP,01Ch
        xor EAX,EAX
        ret



Today the alternative is to call the GC manually, as below (I am not sure this
is fully correct; GC.disable() is used because m1 and m2 temporarily contain
uninitialized pointers):



import core.memory: GC;

void main() {
    uint ba1 = GC.BlkAttr.NO_SCAN | GC.BlkAttr.APPENDABLE;
    int n1 = 5;
    int[] a1 = (cast(int*)GC.malloc(int.sizeof * n1, ba1))[0 .. n1];

    uint ba2 = GC.BlkAttr.NO_SCAN | GC.BlkAttr.APPENDABLE;
    int n2 = 5;
    int[] a2 = (cast(int*)GC.malloc(int.sizeof * n2, ba2))[0 .. n2];
    a2[] = 1;

    uint ba3a = GC.BlkAttr.APPENDABLE;
    uint ba3b = GC.BlkAttr.NO_SCAN | GC.BlkAttr.APPENDABLE;
    int n3 = 5;
    GC.disable();
    int[][] m1 = (cast(int[]*)GC.malloc((int[]).sizeof * n3, ba3a))[0 .. n3];
    foreach (ref row; m1)
        row = (cast(int*)GC.malloc(int.sizeof * n3, ba3b))[0 .. n3];
    GC.enable();

    uint ba4a = GC.BlkAttr.APPENDABLE;
    uint ba4b = GC.BlkAttr.NO_SCAN | GC.BlkAttr.APPENDABLE;
    int n4 = 5;
    GC.disable();
    int[][] m2 = (cast(int[]*)GC.malloc((int[]).sizeof * n4, ba4a))[0 .. n4];
    foreach (ref row; m2) {
        row = (cast(int*)GC.malloc(int.sizeof * n4, ba4b))[0 .. n4];
        row[] = 1;
    }
    GC.enable();
}
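For comparison, current Phobos provides std.array.uninitializedArray, which
allocates a GC array without initializing its elements and accepts one size per
dimension. A sketch of the four arrays above written with it (the function
names are made up for illustration):

```d
import std.array : uninitializedArray;

// Same four allocations as above, without raw GC.malloc calls.

int[] uninitRow(size_t n)
{
    return uninitializedArray!(int[])(n); // like a1: contents are garbage
}

int[] onesRow(size_t n)
{
    auto a = uninitializedArray!(int[])(n);
    a[] = 1; // like a2: a single initialization pass
    return a;
}

int[][] uninitMatrix(size_t rows, size_t cols)
{
    // like m1: Phobos sets up the row slices, the cells stay garbage
    return uninitializedArray!(int[][])(rows, cols);
}

int[][] onesMatrix(size_t rows, size_t cols)
{
    auto m = uninitializedArray!(int[][])(rows, cols);
    foreach (row; m)
        row[] = 1; // like m2: each cell is written exactly once
    return m;
}
```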


So, to avoid all that bug-prone mess, I suggest allowing the fixed-sized array
syntax for dynamic arrays too:


void main() {
    auto a2 = new int[5] = void;

    auto a1 = new int[5] = 1;

    auto m2 = new int[][](5, 5) = void;

    auto m1 = new int[][](5, 5) = 1;
}



A use of uninitialized memory:
http://research.swtch.com/2008/03/using-uninitialized-memory-for-fun-and.html

"An Efficient Representation for Sparse Sets" (1993), by Preston Briggs, Linda
Torczon:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.7319

From the book Programming Pearls:
http://books.google.it/books?id=kse_7qbWbjsC&pg=PA207&lpg=PA207&dq=programming+pearls+uninitialized&source=bl&ots=DfAXDLwT5z&sig=X53xYgD0wdn_Rwl7tFNeCiRt4No&hl=en&ei=HWVcTa35EYOdOsLI5OYL&sa=X&oi=book_result&ct=result&resnum=1&ved=0CBUQ6AEwAA

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Feb 16 2011

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5603


Steven Schveighoffer <schveiguy yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |schveiguy yahoo.com



04:30:23 PST ---
This does not need to be a language thing, library could suffice:

auto a = createArray!(int[][])(5, 5, 1); // initialize 5x5 array with the value 1 in each cell.

auto a = createUninitArray!(int[][])(5, 5); // name needs work...

Such functions could hide the implementation you showed (which can be
factored out a bit).
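A minimal sketch of this library route, assuming current Phobos
(std.array.uninitializedArray) and keeping the hypothetical name createArray
from the comment: all arguments but the last are dimension sizes, the last is
the fill value. Only the innermost dimension is allocated uninitialized and then
filled once; the outer levels hold only slices, so they are allocated normally.
The sketch rejects element types with elaborate assignment or destruction,
because assigning into uninitialized memory would run that code on garbage.

```d
import std.array : uninitializedArray;
import std.traits : hasElaborateAssign, hasElaborateDestructor, isDynamicArray;

// Hypothetical helper: createArray!(int[][])(5, 5, 1) gives a 5x5 array of 1s.
T createArray(T, Args...)(Args args)
if (isDynamicArray!T && Args.length >= 2)
{
    alias E = typeof(T.init[0]); // element type of this dimension

    static if (Args.length == 2)
    {
        // Innermost dimension: writing into garbage is only safe for
        // element types with trivial assignment and destruction.
        static assert(!hasElaborateAssign!E && !hasElaborateDestructor!E,
                      "createArray sketch only supports plain element types");
        auto row = uninitializedArray!T(args[0]); // no T.init pass
        row[] = args[1];                          // the single fill pass
        return row;
    }
    else
    {
        // Outer dimensions only hold slices; zeroing them with new is cheap.
        auto outer = new T(args[0]);
        foreach (ref sub; outer)
            sub = createArray!E(args[1 .. $]);
        return outer;
    }
}
```

A call with too many sizes for the type, such as createArray!(int[])(5, 5, 1),
is rejected at compile time, because the recursion reaches the non-array type
int.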

BTW, your code does not work properly for array appending.  It does not
initialize the hidden "allocated length" field, which would likely result in
reallocation on append.

Some other functions probably needed:

a.extendUninit(size_t newlength);

which is like a.length = newlength but does not initialize the new area.

----------------

I agree a syntax change would be more in line with current array allocation
operations (which are currently all syntax based), but I don't really like your
proposed syntax.

If we wanted a syntax change, I would propose:

auto a = new int[][](5, 5, 1);
auto a = new int[][](5, 5, void);

Where the optional final argument determines the initial value.

This fits perfectly with the current array creation syntax:

new T(dim1, dim2, ..., dimN)

where T is an N-dimensional array. We can just add an extra parameter for the
value.

------------------

One problem with this whole proposal is the issue with struct semantics.  That
is, let's say a struct has a postblit, and you wanted to create an array of
those structs with a default value.  Should the runtime call the postblit for
each element?

I'd say it should.


Feb 17 2011

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5603





 This does not need to be a language thing, library could suffice:

Of course, you may create template functions that do what I have shown (and
better).


 BTW, your code does not work properly for array appending.  It does not
 initialize the hidden "allocated length" field, which would likely result in
 reallocation on append.

I see, thank you. The code I wrote is clearly bug-prone; that's why a built-in
syntax (or functions in Phobos) would be useful.



 Some other functions probably needed:
 
 a.extendUninit(size_t newlength);
 
 which is like a.length = newlength but does not initialize the new area.

This is a possible addition, but it looks less useful: when you want
uninitialized memory you want maximum performance, so you probably don't want
to change the array length.


 I agree a syntax change would be more in line with current array allocation
 operations (which are currently all syntax based), but I don't really like your
 proposed syntax.
 
 I would propose if we wanted to do a syntax change to do:
 
 auto a = new int[][](5, 5, 1);
 auto a = new int[][](5, 5, void);
 
 Where the optional final argument determines the initial value.
 
 This fits perfectly with the current array creation syntax:
 
 new T(dim1, dim2, ..., dimN)
 
 where T is a N dimensional array.  We can just add an extra parameter for the
 value.

I am strongly against this idea because it's too bug-prone. It's too easy to
add or remove a [] by mistake, or to add or remove the initialization value by
mistake, so you may end up with the wrong number of dimensions, etc.:
auto a = new int[][](5, 5, 2);
auto a = new int[][][](5, 5, 2);
auto a = new int[][][](5, 5, 5, 2);
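To make the ambiguity concrete: the second line above is already valid D
today, where every argument of new T(...) is a dimension size, so it allocates
a 5 x 5 x 2 array of zeros. Under the final-argument proposal, dropping one []
turns it into the first line, a 5 x 5 array of 2s. The function name below is
made up for illustration.

```d
// Valid D today: each argument is one dimension size (5 x 5 x 2 of zeros).
int[][][] fiveByFiveByTwo()
{
    return new int[][][](5, 5, 2);
}
```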



 One problem with this whole proposal is the issue with struct semantics.  That
 is, let's say a struct has a postblit, and you wanted to create an array of
 those structs with a default value.  Should the runtime call the postblit for
 each element?
 
 I'd say it should.

This exact problem is present in the initialization syntax for fixed-sized
arrays too, so the best solution is to copy those semantics:

struct Foo {
    int x;
    this(this) {
        x++;
    }
}
void main() {
    Foo[2] foos = Foo(1);
    assert(foos[0].x == 2);
    assert(foos[1].x == 2);
}
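For comparison, the closest equivalent available today for a dynamic array is
to allocate with new and then assign the value to the whole slice. The sketch
repeats Foo from above and uses a hypothetical helper name, filledFoos. Slice
assignment copies the value into each element and runs the postblit once per
copy, so the result matches the fixed-size case, at the cost of the extra
Foo.init pass that the proposed syntax would remove.

```d
struct Foo
{
    int x;
    this(this) { x++; } // postblit: runs on every copy
}

// Hypothetical helper: dynamic-array counterpart of `Foo[2] foos = Foo(1);`.
Foo[] filledFoos(size_t n, Foo value)
{
    auto foos = new Foo[n]; // pass 1: every element becomes Foo.init
    foos[] = value;         // pass 2: each element is a copy of value (postblit runs)
    return foos;
}
```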


Feb 17 2011

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5603




05:36:45 PST ---


 BTW, your code does not work properly for array appending.  It does not
 initialize the hidden "allocated length" field, which would likely result in
 reallocation on append.

 
 I see, thank you. That code I have written is clearly bug-prone, that's why a
 built-in syntax (or functions in Phobos) are useful.

I agree, a method to do this correctly would be good to have to avoid people
doing it incorrectly.

 Some other functions probably needed:
 
 a.extendUninit(size_t newlength);
 
 which is like a.length = newlength but does not initialize the new area.

 
 This is a possible thing to add. But it looks less useful because when you want
 uninitialized memory, you want max performance, so you probably don't want to
 change the array length.

I'm thinking of the case where I want to add N elements, but I'm going to
assign them one at a time.  This saves the initialization of the N elements
before I write them (a useless operation).
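Without an extendUninit primitive, one existing way to handle this case is to
reserve the capacity and then append: reserve grows the capacity without
changing the length, so no default-initialized tail is created as with
a.length += n, although each ~= still performs a capacity check. A sketch with
a hypothetical helper name, appendGenerated, where gen computes the i-th new
element:

```d
// Hypothetical helper: append n generated elements, writing each slot once.
void appendGenerated(alias gen, T)(ref T[] a, size_t n)
{
    a.reserve(a.length + n); // grow the capacity once, up front
    foreach (i; 0 .. n)
        a ~= gen(i);         // each new element is written exactly once
}
```

For example, appendGenerated!(i => cast(int)(i * 10))(a, 3) appends 0, 10 and
20 to an int[] a.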

 I agree a syntax change would be more in line with current array allocation
 operations (which are currently all syntax based), but I don't really like your
 proposed syntax.
 
 I would propose if we wanted to do a syntax change to do:
 
 auto a = new int[][](5, 5, 1);
 auto a = new int[][](5, 5, void);
 
 Where the optional final argument determines the initial value.
 
 This fits perfectly with the current array creation syntax:
 
 new T(dim1, dim2, ..., dimN)
 
 where T is a N dimensional array.  We can just add an extra parameter for the
 value.

 
 I am strongly against this idea because it's too much bug-prone. It's too much
 easy to add or remove a [] by mistake, or add or remove the initialization
 value by mistake, so you may end with the wrong number of dimensions, etc.
 auto a = new int[][](5, 5, 2);
 auto a = new int[][][](5, 5, 2);
 auto a = new int[][][](5, 5, 5, 2);

First, this only really happens when the type is numerical.  For example, a
string array would fail to compile with an integral initializer.  Also, a void
initializer cannot be mistaken for a dimension size.

Second, I see your point, but I don't think this error will matter much in
practice. It isn't often that one changes the number of dimensions.
Readability-wise, however, it's not obvious whether the last argument is an
initializer (an IDE might make this clearer with syntax coloring).

Your proposal clearly separates the value from the dimensions, but it is
probably unacceptable due to parsing requirements. Plus, it looks very bizarre.

If we are doing syntax changes, I think we need something unorthodox if we want
to make this clear.  What about:

auto a = new int[][](5, 5; 2);
auto a = new int[][](5, 5, =2);
auto a = new int[][](5, 5 : 2);
auto a = new int[][](5, 5) : 2;

I still think the syntax I originally proposed is no different from functions
that have default parameters; most people should be able to deal with it. It
is also worthwhile to come up with a reasonable solution that the language
author would accept.


Feb 17 2011

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=5603






 First, this only really happens when the type is numerical.  For example, a
 string array would fail to compile with an integral initializer.  Also, a void
 initializer cannot be mistaken for a dimension size.
 
 Second, I can see what you are saying, but I don't think this error will affect
 much in practice.  It isn't often that one changes the number of dimensions. 
 Readability-wise, however, it's not obvious whether the last element is an
 initializer (an IDE might make this clearer with syntax coloring).
 
 Your proposal clearly separates the value from the dimensions, but it probably
 is unacceptable due to parsing requirements.  Plus it looks very bizarre.

It looks somewhat like the fixed-sized initialization syntax.


 If we are doing syntax changes, I think we need something unorthodox if we want
 to make this clear.  What about:
 
 auto a = new int[][](5, 5; 2);
 auto a = new int[][](5, 5, =2);
 auto a = new int[][](5, 5 : 2);
 auto a = new int[][](5, 5) : 2;

The last line is very close to my suggested syntax, and it has the advantage of
being intuitive for D programmers, because it copies the fixed-sized
initialization syntax:

auto a = new int[][](5, 5) = void;
instead of:
int[5][5] a = void;


 I still think the original syntax I proposed is not different from functions
 that contain default parameters,

Named arguments (like Python's) are safer than default ones.


Feb 17 2011
