digitalmars.D - "The total size of a static array cannot exceed 16Mb."

Vladimir Panteleev (8/8) Oct 01 2007 Why?

Oskar Linde (5/13) Oct 01 2007 This is a reply from Walter last year:

Vladimir Panteleev (17/19) Oct 02 2007 If you ask me, most of those reasons are excuses for 6). Putting restric...

Vladimir Panteleev (9/14) Oct 01 2007 I think I found the cause of this limitation - OPTLINK hangs, crashes an...

Christopher Wright (16/35) Oct 02 2007 T[][] SquareArray (T) (int x, int y) {

Vladimir Panteleev (22/37) Oct 02 2007 ne

Janice Caron (22/27) Oct 02 2007 For that matter, I think you could also do:

Jarrett Billingsley (7/38) Oct 02 2007 You've basically implemented, in templates, what you can already do with...

Janice Caron (3/5) Oct 02 2007 Ooh! I didn't know you could do that!

Jarrett Billingsley (6/12) Oct 02 2007 :)

Walter Bright (4/10) Oct 02 2007 If you define an array as:

Vladimir Panteleev (14/24) Oct 02 2007 Yes, only one - when there should be none at all. *sigh*

BCS (3/24) Oct 02 2007 Sorry, no can do. Even with stack based static arrays there is a single ...

Vladimir Panteleev (8/10) Oct 02 2007 If you're talking about the current situation in D with huge arrays, the...

BCS (13/33) Oct 02 2007 Unless I'm smoking something, the code to access any array will look the...

Vladimir Panteleev (43/45) Oct 02 2007 Well, with how I understand this terminology, "dereferencing" means read...

Walter Bright (15/32) Oct 02 2007 You might be surprised. Intel has done a great deal of work optimizing

Vladimir Panteleev (19/51) Oct 02 2007 s

Bill Baxter (4/11) Oct 02 2007 For anyone whose first inclination is to hex-edit binaries when

BCS (20/54) Oct 02 2007 [...]

Vladimir Panteleev (11/48) Oct 02 2007 No, that gets the value directly and puts it in eax. To get the address ...

BCS (18/59) Oct 02 2007 yes, assuming that you have to do some pointer work to get the array (if...
BCS (3/6) Oct 02 2007

Vladimir Panteleev (5/12) Oct 03 2007 --

Walter Bright (4/14) Oct 02 2007 For accessing a big array like that, it is hard to see how a single

Jarrett Billingsley (9/12) Oct 02 2007 Then Make That Backend Generate Some Other Format Besides OMF (Like ELF,...

Christopher Wright (5/18) Oct 02 2007 There's GDC. Walter could make that the default branch (and actually

=?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= (6/14) Feb 20 2008 It usually clocks in somewhere inbetween gcc and g++, but I'm sure

naryl (5/18) Feb 20 2008 It is somewhat outdated but still:

Walter Bright (3/15) Oct 02 2007 It's not that bad, especially not this particular issue, which has a

BCS (2/18) Oct 02 2007 just new the array.

Vladimir Panteleev (12/30) Oct 02 2007 n

Janice Caron (22/24) Oct 02 2007 Why not just
Janice Caron (4/6) Oct 02 2007 Sorry. That should be
Janice Caron (5/7) Oct 02 2007 Gaah! That's too many typos for one morning!

Vladimir Panteleev (3/5) Oct 02 2007 Thanks - this indeed makes the code more readable, however now accessing...

Janice Caron (5/10) Oct 02 2007 It doesn't have to be a class. It could just as easily be a struct.

BCS (5/36) Oct 02 2007 I was thinking of this:

Vladimir Panteleev (6/9) Oct 02 2007 Why haven't I thought of that... :O

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

Why?

I posted a bug report yesterday, thinking it was just some accidentally left
over code.
But I noticed this statement in the documentation today. And, frankly, I don't
understand why that limitation is there.

The problem with this restriction comes when using multi-dimensional arrays.
Branched dynamic arrays are MUCH slower than rectangular/cubic/etc. static
arrays - and this is a severe limitation for anyone wanting to work with large
amounts of data (e.g. game maps). Of course I could work around this by
calculating the positions manually using x*width+y, but I'd like to know the
reason why this limitation was put there in the first place, and what's the
reasoning behind it. No other compiled language that I used (C/C++/Delphi) has
such a limitation.

The check is in mtype.c, line 1835 (DMD 1.021).

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

Oct 01 2007

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Vladimir Panteleev wrote:
 Why?
 
 I posted a bug report yesterday, thinking it was just some accidentally left
over code.
 But I noticed this statement in the documentation today. And, frankly, I don't
understand why that limitation is there.
 
 The problem with this restriction comes when using multi-dimensional arrays.
Branched dynamic arrays are MUCH slower than rectangular/cubic/etc. static
arrays - and this is a severe limitation for anyone wanting to work with large
amounts of data (e.g. game maps). Of course I could work around this by
calculating the positions manually using x*width+y, but I'd like to know the
reason why this limitation was put there in the first place, and what's the
reasoning behind it. No other compiled language that I used (C/C++/Delphi) has
such a limitation.
 
 The check is in mtype.c, line 1835 (DMD 1.021).

This is a reply from Walter last year:

http://www.digitalmars.com/d/archives/digitalmars/D/37038.html#N37071


-- 
Oskar

Oct 01 2007

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Tue, 02 Oct 2007 09:44:22 +0300, Oskar Linde <oskar.lindeREM ovegmail=
.com> wrote:

 This is a reply from Walter last year:

 http://www.digitalmars.com/d/archives/digitalmars/D/37038.html#N37071

If you ask me, most of those reasons are excuses for 6). Putting restric=
tions on things which MAY hurt you IF you do things you're never going t=
o do is, to put it softly, not a very good idea. A static array in the u=
ninitialized part of the data segment allows me to randomly access any p=
art of it very quickly, because its address and the address of every of =
its elements is predetermined at compile time, so - unlike with the case=
 of dynamic arrays or pointers to static arrays - you do not need to der=
eference pointers every time. I'll say it again, D is the only compiled =
language that has such a limitation. =


As for point 5) - well, the executable grows by the size of the array re=
gardless if the array has any initialization. Heck, it adds initializati=
on data even if I use "=3Dvoid".

-- =

Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

Oct 02 2007

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Tue, 02 Oct 2007 09:20:29 +0300, Vladimir Panteleev
<thecybershadow gmail.com> wrote:

 Why?

 I posted a bug report yesterday, thinking it was just some accidentally left
over code.
 But I noticed this statement in the documentation today. And, frankly, I don't
understand why that limitation is there.

 The problem with this restriction comes when using multi-dimensional arrays.
Branched dynamic arrays are MUCH slower than rectangular/cubic/etc. static
arrays - and this is a severe limitation for anyone wanting to work with large
amounts of data (e.g. game maps). Of course I could work around this by
calculating the positions manually using x*width+y, but I'd like to know the
reason why this limitation was put there in the first place, and what's the
reasoning behind it. No other compiled language that I used (C/C++/Delphi) has
such a limitation.

 The check is in mtype.c, line 1835 (DMD 1.021).

I think I found the cause of this limitation - OPTLINK hangs, crashes and/or
generates corrupt data if I try to force it such input. It works if I use a
pointer to a huge static array, but DMD wouldn't accept that as input anyway. I
thought that putting the array in a struct or class would prevent the linker
from generating a >16MB data segment, but I forgot that D needs(!) to have an
initial value for every type, even if it's completely null. Once again, I'll
mention that Borland and Microsoft's linkers don't have such a limitation, and
they can generate data segments over 16 MB with no problems.

So, suggestions for fixing this:

1) fix or rewrite OPTLINK
   ("rewrite" because I keep hearing about how OPTLINK holds D back for one
reason or another; I know it's no small task, however D doesn't need a
multi-purpose linker, just one that can handle what the DMD compiler generates)
2) if a type doesn't contain any non-0 initial values, instead of generating a
big block of zeroes for the initializer, initialize it with a memset (rep
stosd/stosb).

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

Oct 01 2007

Christopher Wright <dhasenan gmail.com> writes:

Vladimir Panteleev wrote:
 On Tue, 02 Oct 2007 09:20:29 +0300, Vladimir Panteleev
<thecybershadow gmail.com> wrote:
 
 Why?

 I posted a bug report yesterday, thinking it was just some accidentally left
over code.
 But I noticed this statement in the documentation today. And, frankly, I don't
understand why that limitation is there.

 The problem with this restriction comes when using multi-dimensional arrays.
Branched dynamic arrays are MUCH slower than rectangular/cubic/etc. static
arrays - and this is a severe limitation for anyone wanting to work with large
amounts of data (e.g. game maps). Of course I could work around this by
calculating the positions manually using x*width+y, but I'd like to know the
reason why this limitation was put there in the first place, and what's the
reasoning behind it. No other compiled language that I used (C/C++/Delphi) has
such a limitation.

 The check is in mtype.c, line 1835 (DMD 1.021).

 
 I think I found the cause of this limitation - OPTLINK hangs, crashes and/or
generates corrupt data if I try to force it such input. It works if I use a
pointer to a huge static array, but DMD wouldn't accept that as input anyway. I
thought that putting the array in a struct or class would prevent the linker
from generating a >16MB data segment, but I forgot that D needs(!) to have an
initial value for every type, even if it's completely null. Once again, I'll
mention that Borland and Microsoft's linkers don't have such a limitation, and
they can generate data segments over 16 MB with no problems.
 
 So, suggestions for fixing this:
 
 1) fix or rewrite OPTLINK
    ("rewrite" because I keep hearing about how OPTLINK holds D back for one
reason or another; I know it's no small task, however D doesn't need a
multi-purpose linker, just one that can handle what the DMD compiler generates)
 2) if a type doesn't contain any non-0 initial values, instead of generating a
big block of zeroes for the initializer, initialize it with a memset (rep
stosd/stosb).
 

T[][] SquareArray (T) (int x, int y) {
    T[] block = new T[x * y];
    T[][] ret = [];
    ret.length = x;
    for (int i = 0; i < x; i++) {
       ret[i] = block[i * y .. (i + 1) * y];
    }

    return ret;
}

void main () {
    int[][] array = SquareArray!(int)(9000, 9000);
}

Slightly tedious, and it gives you reference semantics and places the 
array on the heap (but do you really *want* 16MB stack frames? I imagine 
the OS might have something to say about that).

Oct 02 2007

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Tue, 02 Oct 2007 16:18:26 +0300, Christopher Wright <dhasenan gmail.c=
om> wrote:

 T[][] SquareArray (T) (int x, int y) {
     T[] block =3D new T[x * y];
     T[][] ret =3D [];
     ret.length =3D x;
     for (int i =3D 0; i < x; i++) {
        ret[i] =3D block[i * y .. (i + 1) * y];
     }

     return ret;
 }

 void main () {
     int[][] array =3D SquareArray!(int)(9000, 9000);
 }

 Slightly tedious, and it gives you reference semantics and places the
 array on the heap (but do you really *want* 16MB stack frames? I imagi=

ne
 the OS might have something to say about that).

Thanks, however dynamic arrays are unfortunately slow. To access a rando=
m element, you need a dereference, a multiplication+addition, another de=
reference, another multiplication+addition. Janice Caron's workaround, w=
hen modified to use a struct placed in the data segment, requires only o=
ne dereferencing/multiply+adding - while a static array placed in the da=
ta segment requires only arithmetics to access.

Also, I was planning to place the array in the uninitialized data segmen=
t (I believe it is called BSS in ELF and object files), not the stack - =
I believe the default stack size for Windows executables to be 256 kB an=
yway.

P.S. I believe your function can be reduced to this without any signific=
ant changes in behavior:

     T[][] ret =3D new T[][x];
     foreach(ref row;ret)
         row.length =3D y;

     return ret;

-- =

Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

Oct 02 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/2/07, Vladimir Panteleev <thecybershadow gmail.com> wrote:
 P.S. I believe your function can be reduced to this without any significant
changes in behavior:

      T[][] ret = new T[][x];
      foreach(ref row;ret)
          row.length = y;

      return ret;

For that matter, I think you could also do:

    template MultiDimArray(T,int N)
    {
        static if (N == 1)
            typedef T[] MultiDimArray;
        else
            typedef MultiDimArray!(T,N-1)[] MultiDimArray;
    }

    MultiDimArray!(T,N) MakeMultiDimArray(T,int N)(int dim, int[] others...)
    {
        static if (N == 1)
            return new T[dim];
        else
        {
            MultiDimArray!(T,N) t;
            t.length = dim;
            foreach(ref a;t) a =
MakeMultiDimArray!(T,N-1)(others[0],others[1..$]);
            return t;
        }
    }

Oct 02 2007

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Janice Caron" <caron800 googlemail.com> wrote in message 
news:mailman.353.1191333299.16939.digitalmars-d puremagic.com...
 On 10/2/07, Vladimir Panteleev <thecybershadow gmail.com> wrote:
 P.S. I believe your function can be reduced to this without any 
 significant changes in behavior:

      T[][] ret = new T[][x];
      foreach(ref row;ret)
          row.length = y;

      return ret;

 For that matter, I think you could also do:

    template MultiDimArray(T,int N)
    {
        static if (N == 1)
            typedef T[] MultiDimArray;
        else
            typedef MultiDimArray!(T,N-1)[] MultiDimArray;
    }

    MultiDimArray!(T,N) MakeMultiDimArray(T,int N)(int dim, int[] 
 others...)
    {
        static if (N == 1)
            return new T[dim];
        else
        {
            MultiDimArray!(T,N) t;
            t.length = dim;
            foreach(ref a;t) a =
 MakeMultiDimArray!(T,N-1)(others[0],others[1..$]);
            return t;
        }
    }

You've basically implemented, in templates, what you can already do with the 
language syntax.  MultiDimArray!(int, 3) will turn into int[][][], and the 
functionality of MakeMultiDimArray is already covered by array new-ing, i.e. 
new int[][][](10, 20, 30).

And it's still an array-of-arrays.

Oct 02 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/2/07, Jarrett Billingsley <kb3ctd2 yahoo.com> wrote:
 functionality of MakeMultiDimArray is already covered by array new-ing, i.e.
 new int[][][](10, 20, 30).

Ooh! I didn't know you could do that!
Nice!

Oct 02 2007

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Janice Caron" <caron800 googlemail.com> wrote in message 
news:mailman.362.1191349640.16939.digitalmars-d puremagic.com...
 On 10/2/07, Jarrett Billingsley <kb3ctd2 yahoo.com> wrote:
 functionality of MakeMultiDimArray is already covered by array new-ing, 
 i.e.
 new int[][][](10, 20, 30).

 Ooh! I didn't know you could do that!
 Nice!

:)

Actually 'new int[5]' is exactly equivalent to 'new int[](5)' -- the former 
is sugar for the latter.  It's just most people are used to writing it like 
the latter.

Oct 02 2007

Walter Bright <newshound1 digitalmars.com> writes:

Vladimir Panteleev wrote:
 Thanks, however dynamic arrays are unfortunately slow. To access a
 random element, you need a dereference, a multiplication+addition,
 another dereference, another multiplication+addition. Janice Caron's
 workaround, when modified to use a struct placed in the data segment,
 requires only one dereferencing/multiply+adding - while a static
 array placed in the data segment requires only arithmetics to access.

If you define an array as:

int[10000][] a = new int[10000][100];

only one dereference is required to access the data.

Oct 02 2007

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Tue, 02 Oct 2007 21:48:34 +0300, Walter Bright <newshound1 digitalmar=
s.com> wrote:

 Vladimir Panteleev wrote:
 Thanks, however dynamic arrays are unfortunately slow. To access a
 random element, you need a dereference, a multiplication+addition,
 another dereference, another multiplication+addition. Janice Caron's
 workaround, when modified to use a struct placed in the data segment,=


 requires only one dereferencing/multiply+adding - while a static
 array placed in the data segment requires only arithmetics to access.=


 If you define an array as:

 int[10000][] a =3D new int[10000][100];

 only one dereference is required to access the data.

Yes, only one - when there should be none at all. *sigh*

Note that one thing which the compiler fails to prevent but still chokes=
 the linker are having more than 16 Mb of data or even distinct types (w=
hich need initial values, such as structs/classes). Example:

uint[0x800000] a,b,c,d;

void main()
{
}

So - please, Just Fix That Linker ;)

-- =

Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

Oct 02 2007

BCS <ao pathlink.com> writes:

Reply to Vladimir,

 On Tue, 02 Oct 2007 21:48:34 +0300, Walter Bright
 <newshound1 digitalmars.com> wrote:
 
 Vladimir Panteleev wrote:
 
 Thanks, however dynamic arrays are unfortunately slow. To access a
 random element, you need a dereference, a multiplication+addition,
 another dereference, another multiplication+addition. Janice Caron's
 workaround, when modified to use a struct placed in the data
 segment, requires only one dereferencing/multiply+adding - while a
 static array placed in the data segment requires only arithmetics to
 access.
 

 If you define an array as:
 
 int[10000][] a = new int[10000][100];
 
 only one dereference is required to access the data.
 

 Yes, only one - when there should be none at all. *sigh*
 

Sorry, no can do. Even with stack based static arrays there is a single
dereference. 
An array access always does a dereference. (might you be thinking about
indirects?)

Oct 02 2007

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Tue, 02 Oct 2007 22:30:25 +0300, BCS <ao pathlink.com> wrote:

 Sorry, no can do. Even with stack based static arrays there is a single
dereference.
 An array access always does a dereference. (might you be thinking about
indirects?)

If you're talking about the current situation in D with huge arrays, then it's
so (and those don't fit in the stack anyway).
If not, well, it's certainly very possible if you just declare the array in the
data segment (or with "static" anywhere else). Then, like I said earlier, "its
address and the address of every of its elements is predetermined at compile
time" - so "a static array placed in the data segment requires only arithmetics
to access" ((base address+offset into element) + index*element size).

OPTLINK's limitation is around 16 MB of data for the data segment. This also
includes initializations for structs, so you can't even have struct types with
the gross size of more than 16 MB.

Other languages certainly would allow it, and I'm sure that if I would use a
hex-edited dmd.exe with another linker - had I known a better DMD-compatible
one - everything would work.

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

Oct 02 2007

BCS <ao pathlink.com> writes:

Reply to Vladimir,

 On Tue, 02 Oct 2007 22:30:25 +0300, BCS <ao pathlink.com> wrote:
 
 If you're talking about the current situation in D with huge arrays,
 then it's so (and those don't fit in the stack anyway).
 
 If not, well, it's certainly very possible if you just declare the
 array in the data segment (or with "static" anywhere else). Then, like
 I said earlier, "its address and the address of every of its elements
 is predetermined at compile time" - so "a static array placed in the
 data segment requires only arithmetics to access" ((base
 address+offset into element) + index*element size).
 
 OPTLINK's limitation is around 16 MB of data for the data segment.
 This also includes initializations for structs, so you can't even have
 struct types with the gross size of more than 16 MB.
 
 Other languages certainly would allow it, and I'm sure that if I would
 use a hex-edited dmd.exe with another linker - had I known a better
 DMD-compatible one - everything would work.
 

Unless I'm smoking something, the code to access any array will look the 
same regardless of where the array is.

Get base aggress
compute offset
add them
*dereference* that address

the only difference is how the base address is found. with static arrays 
it can be coded as a part of the opcode (I think), with general arrays, it 
is loaded from a variable, and with an array in the stack frame, it is pulled 
from the  frame pointer*. But in all cases, their is still a dereference.


* BTW, accessing a local variable also does a dereference, just where the 
offset is known at compile time.

Oct 02 2007

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Wed, 03 Oct 2007 00:51:46 +0300, BCS <ao pathlink.com> wrote:

 Unless I'm smoking something, the code to access any array will look the
 same regardless of where the array is.

Well, with how I understand this terminology, "dereferencing" means reading an
offset from a memory location, other than an immediate instruction operand.
Consider:

int i;
i++;

This assembles as:

i dd ?
...
inc i

Assuming i is in the data segment (it has a fixed address), I do not consider
this dereferencing, nor would I think anyone should.

Then we have:

int* i;
...
(*i)++;

Assuming i is in the data segment (it has a fixed address), this is a "single"
dereference. To get the address of the integer we need to read it from a fixed
memory location.

The same with arrays - except, for the first case, it's a static array, and for
the second - it is either a dynamic array, a new-allocated array, etc.

So, how does this boil down in performance? Let me show you:

int[0x4000000] bigArr;
...
int readValue(int index)
{
	return bigArr[index];
}

With -release -O, this assembles to:

readValue proc near
        mov     eax, ds:bigArr[eax*4]
        retn
readValue endp

Now, if bigArr would be a dynamic array or referenced through a pointer, the
resulting code would look like this:

readValue proc near
        mov     ecx, ds:bigArr
        mov     eax, [ecx+eax*4]
        retn
readValue endp

It gets worse if there are further nests, e.g. when using multidimensional
arrays.

That doesn't sound like a big deal, however:

1) when the array is in the data segment, we know the address of every one of
its elements - so random access is faster
2) when using multi-dimensional dynamic arrays, there are more casts
3) rectangular static arrays are much faster when you want to access it by
column - you just add (width*element_size) to the pointer, and go to the
element on the next row
4) when you want to access an address inside the element (you have an array of
structs), you must perform an addition after the multiplication (I think it can
be part of the instruction in the newer instruction sets though) - while, when
the array base address is predetermined, you can just use base_address +
element_offset as the base address, then add index*element_size to that
etc.

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

Oct 02 2007

Walter Bright <newshound1 digitalmars.com> writes:

Vladimir Panteleev wrote:
 That doesn't sound like a big deal, however:
 
 1) when the array is in the data segment, we know the address of
 every one of its elements - so random access is faster

You might be surprised. Intel has done a great deal of work optimizing 
and pipelining addressing modes. Furthermore, one would probably access 
such an array repeatedly, meaning the pointer to it would be cached in a 
register. I doubt you'll see a dime's worth of difference.

In any case, as has been shown time and again, where one thinks the time 
is spent in a program is always wrong. Even the experts are always 
wrong. Use a profiler!

 2) when using
 multi-dimensional dynamic arrays, there are more casts

int[1000][] a;
...
a[3][4] = 2;

involves no casts.


 3) rectangular
 static arrays are much faster when you want to access it by column -
 you just add (width*element_size) to the pointer, and go to the
 element on the next row

It's the same for int[1000][]

 4) when you want to access an address inside
 the element (you have an array of structs), you must perform an
 addition after the multiplication (I think it can be part of the
 instruction in the newer instruction sets though) - while, when the
 array base address is predetermined, you can just use base_address +
 element_offset as the base address, then add index*element_size to
 that etc.

Using the address mode offset[reg1][reg2*size] mode is not any slower 
than using offset[reg2*size].

Oct 02 2007

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Wed, 03 Oct 2007 02:09:56 +0300, Walter Bright <newshound1 digitalmar=
s.com> wrote:

 Vladimir Panteleev wrote:
 That doesn't sound like a big deal, however:

 1) when the array is in the data segment, we know the address of
 every one of its elements - so random access is faster

 You might be surprised. Intel has done a great deal of work optimizing=

 and pipelining addressing modes. Furthermore, one would probably acces=

s
 such an array repeatedly, meaning the pointer to it would be cached in=

 a
 register. I doubt you'll see a dime's worth of difference.

 In any case, as has been shown time and again, where one thinks the ti=

me
 is spent in a program is always wrong. Even the experts are always
 wrong. Use a profiler!

I will, if I won't be satisfied by the performance :)

 2) when using
 multi-dimensional dynamic arrays, there are more casts

 int[1000][] a;
 ...
 a[3][4] =3D 2;

 involves no casts.

Erm... I'm sure I wanted to write something other than "casts" there, bu=
t now I even forgot what I wanted to say originally. I think I meant tha=
t when you have imbricated dynamic arrays, there are even more dereferen=
ces.

 3) rectangular
 static arrays are much faster when you want to access it by column -
 you just add (width*element_size) to the pointer, and go to the
 element on the next row

 It's the same for int[1000][]

Right. I always overlook that case!

 4) when you want to access an address inside
 the element (you have an array of structs), you must perform an
 addition after the multiplication (I think it can be part of the
 instruction in the newer instruction sets though) - while, when the
 array base address is predetermined, you can just use base_address +
 element_offset as the base address, then add index*element_size to
 that etc.

 Using the address mode offset[reg1][reg2*size] mode is not any slower
 than using offset[reg2*size].

Yes... that's the one I mentioned (though I'm used to see it as [reg+reg=
2*size+offset]).

Thanks for replying. I wonder how hard would it be to write - or, at lea=
st, start a community project of a Windows PE linker written from scratc=
h - in D, of course :)

-- =

Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

Oct 02 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Vladimir Panteleev wrote:
 On Wed, 03 Oct 2007 02:09:56 +0300, Walter Bright <newshound1 digitalmars.com>
wrote:
 
 Yes... that's the one I mentioned (though I'm used to see it as
[reg+reg2*size+offset]).
 

 Thanks for replying. I wonder how hard would it be to write - or, at
 least, start a community project of a Windows PE linker written from
 scratch - in D, of course :)

For anyone whose first inclination is to hex-edit binaries when 
confronted with a minor annoyance, I'm guessing probably not that hard.  ;-)

--bb

Oct 02 2007

BCS <ao pathlink.com> writes:

Reply to Vladimir,

 On Wed, 03 Oct 2007 00:51:46 +0300, BCS <ao pathlink.com> wrote:
 
 Unless I'm smoking something, the code to access any array will look
 the same regardless of where the array is.
 

 Well, with how I understand this terminology, "dereferencing" means
 reading an offset from a memory location, other than an immediate
 instruction operand. Consider:
 

[...]

I think we are referring to the same thing, but counting different parts 
of it. What I'm counting is the times that a load is done from an address 
that is not an inline value, but comes from a register.

 readValue proc near
 mov     eax, ds:bigArr[eax*4]

If I'm reading this correctly what is happening here is "load address (bigArr 
+ eax*4)

unless IA32 has a special case op for that, your going to have to compute 
the address in another op and even if there is a special case, I rather suspect 
that it will be exactly as fast either way. Furthermore, if more than one 
detention is used you will have to do the computation in other ops no mater 
what.

 retn
 readValue endp

BTW where did you get that ASM dump? I don't recognize the format.

 
 readValue proc near
 mov     ecx, ds:bigArr
 mov     eax, [ecx+eax*4]
 retn
 readValue endp
 It gets worse if there are further nests, e.g. when using
 multidimensional arrays.

unless you are nesting dynamic arrays it's just math, not dereferences (i1 
+ d1*i2 + d2*d1*i3 ...)


But it looks like we agree on that point.

 3) rectangular static arrays are much faster when you want to access
 it by column - you just add (width*element_size) to the pointer, and
 go to the element on the next row

this can be done with a dynamic array of static sized arrays.

 
 4) when you want to access an address inside the element (you have an
 array of structs), you must perform an addition after the
 multiplication (I think it can be part of the instruction in the newer
 instruction sets though) - while, when the array base address is
 predetermined, you can just use base_address + element_offset as the
 base address, then add index*element_size to that

It looks like you are using a (constA + Constb * reg) address mode. I just 
look in the "Intel Architecture Software Developer's Manual" and didn't find 
any reference to it. Am I missing something? (I didn't actual look real hard)

 
 etc.

Oct 02 2007

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Wed, 03 Oct 2007 02:31:47 +0300, BCS <ao pathlink.com> wrote:

 I think we are referring to the same thing, but counting different parts
 of it. What I'm counting is the times that a load is done from an address
 that is not an inline value, but comes from a register.

OK, so in the first array example that makes one load (only the value we need),
and in the second - two (one to get the address of the array data, and another
to get the value).

 readValue proc near
 mov     eax, ds:bigArr[eax*4]

 If I'm reading this correctly what is happening here is "load address (bigArr
 + eax*4)

No, that gets the value directly and puts it in eax. To get the address I would
use the "lea" instruction instead of "mov" (which is useful if you want to read
and then write to the address).

 unless IA32 has a special case op for that, your going to have to compute
 the address in another op and even if there is a special case, I rather suspect
 that it will be exactly as fast either way. Furthermore, if more than one
 detention is used you will have to do the computation in other ops no mater
 what.

I posted the disassembly listing of the D code I supplied, so this code works
(there are no missing instructions).

 retn
 readValue endp

 BTW where did you get that ASM dump? I don't recognize the format.

The demo version of Interactive Disassembler[1]. It's quite useful for a
Windows D programmer's arsenal, since it understands D symbolic information.
(They're loaded mangled, so I "demangled" them manually.)

 readValue proc near
 mov     ecx, ds:bigArr
 mov     eax, [ecx+eax*4]
 retn
 readValue endp
 It gets worse if there are further nests, e.g. when using
 multidimensional arrays.

 unless you are nesting dynamic arrays it's just math, not dereferences (i1
 + d1*i2 + d2*d1*i3 ...)

Yes, I meant dynamic arrays (or pointers, for the first "link")...

 3) rectangular static arrays are much faster when you want to access
 it by column - you just add (width*element_size) to the pointer, and
 go to the element on the next row

 this can be done with a dynamic array of static sized arrays.

Yeah. Forgot about those (again).

 4) when you want to access an address inside the element (you have an
 array of structs), you must perform an addition after the
 multiplication (I think it can be part of the instruction in the newer
 instruction sets though) - while, when the array base address is
 predetermined, you can just use base_address + element_offset as the
 base address, then add index*element_size to that

 It looks like you are using a (constA + Constb * reg) address mode. I just
 look in the "Intel Architecture Software Developer's Manual" and didn't find
 any reference to it. Am I missing something? (I didn't actual look real hard)

ConstB must be a power of 2 - and I don't think I can find a link more easily
than you (since I don't use any online - or any other, actually - reference).
As WB mentioned in another reply, there's also a [reg1 + reg2*powerOfTwo +
offset] addressing mode.

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

Oct 02 2007

BCS <ao pathlink.com> writes:

Reply to Vladimir,

 On Wed, 03 Oct 2007 02:31:47 +0300, BCS <ao pathlink.com> wrote:
 
 I think we are referring to the same thing, but counting different
 parts of it. What I'm counting is the times that a load is done from
 an address that is not an inline value, but comes from a register.
 

 OK, so in the first array example that makes one load (only the value
 we need), and in the second - two (one to get the address of the array
 data, and another to get the value).
 

yes, assuming that you have to do some pointer work to get the array (if 
it's a static variable than you can do a "move from inline defined address")

 readValue proc near
 mov     eax, ds:bigArr[eax*4]

 If I'm reading this correctly what is happening here is "load address
 (bigArr + eax*4)
 

 No, that gets the value directly and puts it in eax. To get the
 address I would use the "lea" instruction instead of "mov" (which is
 useful if you want to read and then write to the address).
 

Baahhh. I need to be more careful with what I actually say

that should have been "load from address"

 unless IA32 has a special case op for that, your going to have to
 compute
 the address in another op and even if there is a special case, I
 rather suspect
 that it will be exactly as fast either way. Furthermore, if more than
 one
 detention is used you will have to do the computation in other ops no
 mater
 what.

 I posted the disassembly listing of the D code I supplied, so this
 code works (there are no missing instructions).
 

[...]
 It looks like you are using a (constA + Constb * reg) address mode. I
 just look in the "Intel Architecture Software Developer's Manual" and
 didn't find any reference to it. Am I missing something? (I didn't
 actual look real hard)
 

 ConstB must be a power of 2 - and I don't think I can find a link more
 easily than you (since I don't use any online - or any other, actually
 - reference). As WB mentioned in another reply, there's also a [reg1 +
 reg2*powerOfTwo + offset] addressing mode.
 

sweet

BTW I find these kida handy

http://www.intel.com/design/processor/manuals/253665.pdf    Volume 1: Basic 
Architecture
http://www.intel.com/design/processor/manuals/253666.pdf    Volume 2A:
Instruction 
Set Reference, A-M
http://www.intel.com/design/processor/manuals/253667.pdf    Volume 2B:
Instruction 
Set Reference, N-Z


foud here
http://www.intel.com/products/processor/manuals/index.htm

In the end I rather suspect that what really matters is how many times you 
read from memory, not how you got the address to read from.

Oct 02 2007

BCS <ao pathlink.com> writes:

Reply to Vladimir,

 The demo version of Interactive Disassembler[1]. It's quite useful for
 a Windows D programmer's arsenal, since it understands D symbolic
 information. (They're loaded mangled, so I "demangled" them manually.)

 
link?

Oct 02 2007

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Wed, 03 Oct 2007 03:33:28 +0300, BCS <ao pathlink.com> wrote:

 Reply to Vladimir,

 The demo version of Interactive Disassembler[1]. It's quite useful for
 a Windows D programmer's arsenal, since it understands D symbolic
 information. (They're loaded mangled, so I "demangled" them manually.)

 
 link?

Oops.

 [1] http://datarescue.com/idabase/idadowndemo.htm


-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

Oct 03 2007

Walter Bright <newshound1 digitalmars.com> writes:

Vladimir Panteleev wrote:
 Yes, only one - when there should be none at all. *sigh*

For accessing a big array like that, it is hard to see how a single 
dereference through a register is going to affect the run time at all.


 Note that one thing which the compiler fails to prevent but still
 chokes the linker are having more than 16 Mb of data or even distinct
 types (which need initial values, such as structs/classes). Example:
 
 uint[0x800000] a,b,c,d;
 
 void main() { }
 
 So - please, Just Fix That Linker ;)

Easier said than done :-(

Oct 02 2007

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Walter Bright" <newshound1 digitalmars.com> wrote in message 
news:fduing$trk$1 digitalmars.com...
 So - please, Just Fix That Linker ;)

 Easier said than done :-(

Then Make That Backend Generate Some Other Format Besides OMF (Like ELF, Oh 
Wait Doesn't DMD On Linux Already Do That?), or something, _anything_, since 
this whole OPTLINK thing is having one suboptimal consequence after another. 
I think it's time to let the poor thing die.

Yes, I am _actually_ advocating the development of a different backend, or 
linker, or object format etc. over the release of D2.0.  Yes, I am actually 
more interested in that.

Oct 02 2007

Christopher Wright <dhasenan gmail.com> writes:

Jarrett Billingsley wrote:
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:fduing$trk$1 digitalmars.com...
 So - please, Just Fix That Linker ;)

 Easier said than done :-(

 
 Then Make That Backend Generate Some Other Format Besides OMF (Like ELF, Oh 
 Wait Doesn't DMD On Linux Already Do That?), or something, _anything_, since 
 this whole OPTLINK thing is having one suboptimal consequence after another. 
 I think it's time to let the poor thing die.
 
 Yes, I am _actually_ advocating the development of a different backend, or 
 linker, or object format etc. over the release of D2.0.  Yes, I am actually 
 more interested in that. 

There's GDC. Walter could make that the default branch (and actually 
mentioned that as a possibility at the conference). Though one of the 
benefits of D is a fast compiler, and I don't know how much faster gdc 
is compared to, say, gcc.

Oct 02 2007

=?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:

Christopher Wright wrote:

 Yes, I am _actually_ advocating the development of a different 
 backend, or linker, or object format etc. over the release of D2.0.  
 Yes, I am actually more interested in that. 

 
 There's GDC. Walter could make that the default branch (and actually 
 mentioned that as a possibility at the conference). Though one of the 
 benefits of D is a fast compiler, and I don't know how much faster gdc 
 is compared to, say, gcc.

It usually clocks in somewhere inbetween gcc and g++, but I'm sure
that says more about the language complexity than about the compilers...

Not sure how LLVM does for speed, as I've only used it through GCC.
But I think the speed-above-everything DMD and OPTLINK still "win".

--anders

Feb 20 2008

naryl <cy ngs.ru> writes:

On Wed, 20 Feb 2008 11:30:40 +0300, Anders F Björklund <afb algonet.se>  
wrote:

 Christopher Wright wrote:

 Yes, I am _actually_ advocating the development of a different  
 backend, or linker, or object format etc. over the release of D2.0.   
 Yes, I am actually more interested in that.

  There's GDC. Walter could make that the default branch (and actually  
 mentioned that as a possibility at the conference). Though one of the  
 benefits of D is a fast compiler, and I don't know how much faster gdc  
 is compared to, say, gcc.

 It usually clocks in somewhere inbetween gcc and g++, but I'm sure
 that says more about the language complexity than about the compilers...

 Not sure how LLVM does for speed, as I've only used it through GCC.
 But I think the speed-above-everything DMD and OPTLINK still "win".

 --anders

It is somewhat outdated but still:  
http://www.digitalmars.com/d/1.0/cppstrings.html

There is a comparison at the end.

Feb 20 2008

Walter Bright <newshound1 digitalmars.com> writes:

Jarrett Billingsley wrote:
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:fduing$trk$1 digitalmars.com...
 So - please, Just Fix That Linker ;)

 Easier said than done :-(

 
 Then Make That Backend Generate Some Other Format Besides OMF (Like ELF, Oh 
 Wait Doesn't DMD On Linux Already Do That?), or something, _anything_, since 
 this whole OPTLINK thing is having one suboptimal consequence after another. 
 I think it's time to let the poor thing die.

It's not that bad, especially not this particular issue, which has a 
painless workaround.


 Yes, I am _actually_ advocating the development of a different backend, or 
 linker, or object format etc. over the release of D2.0.  Yes, I am actually 
 more interested in that.

Oct 02 2007

BCS <ao pathlink.com> writes:

Reply to Vladimir,

 Why?
 
 I posted a bug report yesterday, thinking it was just some
 accidentally left over code.
 
 But I noticed this statement in the documentation today. And, frankly,
 I don't understand why that limitation is there.
 
 The problem with this restriction comes when using multi-dimensional
 arrays. Branched dynamic arrays are MUCH slower than
 rectangular/cubic/etc. static arrays - and this is a severe limitation
 for anyone wanting to work with large amounts of data (e.g. game
 maps). Of course I could work around this by calculating the positions
 manually using x*width+y, but I'd like to know the reason why this
 limitation was put there in the first place, and what's the reasoning
 behind it.

just new the array.

Oct 02 2007

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Tue, 02 Oct 2007 10:16:25 +0300, BCS <ao pathlink.com> wrote:

 Reply to Vladimir,

 Why?

 I posted a bug report yesterday, thinking it was just some
 accidentally left over code.

 But I noticed this statement in the documentation today. And, frankly=


,
 I don't understand why that limitation is there.

 The problem with this restriction comes when using multi-dimensional
 arrays. Branched dynamic arrays are MUCH slower than
 rectangular/cubic/etc. static arrays - and this is a severe limitatio=


n
 for anyone wanting to work with large amounts of data (e.g. game
 maps). Of course I could work around this by calculating the position=


s
 manually using x*width+y, but I'd like to know the reason why this
 limitation was put there in the first place, and what's the reasoning=


 behind it.

 just new the array.

If you mean that I do something like:
	int[4096][4096]* arr =3D (new int[4096][4096][1]).ptr;
1) the compiler still (normally) doesn't let me do it
2) it's slower for random access due to having to dereference a pointer.=


But, yes, this is the sanest workaround so far.

-- =

Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

Oct 02 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/2/07, Vladimir Panteleev <thecybershadow gmail.com> wrote:
 If you mean that I do something like:
         int[4096][4096]* arr = (new int[4096][4096][1]).ptr;

Why not just
class Array2D(T)
{
    this(int width, int height)
    {
        this.width = width;
        a = new T(width * height);
    }

    T opIndex(int x, int y)
    {
        return a[x * width + y];
    }

    void opIndexAssign(T n, int x, int y)
    {
        a[x * width + y] = n;
    }

    T[] a;
    int width;
}

Then
    arr = new Array2D(4096,4096);

Oct 02 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/2/07, Janice Caron <caron800 googlemail.com> wrote:
 Then
     arr = new Array2D(4096,4096);

Sorry. That should be

    arr = new Array!(int)(4096,4096);

then you just refer to the elements as arr[x,y] as you'd expect.

Oct 02 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/2/07, Janice Caron <caron800 googlemail.com> wrote:
 Sorry. That should be

     arr = new Array!(int)(4096,4096);

Gaah! That's too many typos for one morning!

arr = new Array2D!(int)(4096,4096);

Anyway, hopefully you get the idea. I'm going for more coffee before I
mistype anything else!

Oct 02 2007

Vladimir Panteleev <thecybershadow gmail.com> writes:

Janice Caron Wrote:
 Anyway, hopefully you get the idea. I'm going for more coffee before I
 mistype anything else!

Thanks - this indeed makes the code more readable, however now accessing an
element involves two dereferences (the class, and the array pointer) - and,
possibly, a function call if the compiler won't inline it. My main concern is
the speed, otherwise I wouldn't be complaining and just use dynamic arrays :P

-- Vladimir

Oct 02 2007

"Janice Caron" <caron800 googlemail.com> writes:

On 10/2/07, Vladimir Panteleev <thecybershadow gmail.com> wrote:
 Janice Caron Wrote:
 Anyway, hopefully you get the idea. I'm going for more coffee before I
 mistype anything else!

 Thanks - this indeed makes the code more readable, however now accessing an
element involves two dereferences (the class, and the array pointer)

It doesn't have to be a class. It could just as easily be a struct.
Then you've only got one dereference.

 - and, possibly, a function call if the compiler won't inline it.

I would hope that the compiler would inline that. Maybe not in debug
mode, but certainly in release mode.

Oct 02 2007

BCS <ao pathlink.com> writes:

Reply to Vladimir,

 On Tue, 02 Oct 2007 10:16:25 +0300, BCS <ao pathlink.com> wrote:
 
 Reply to Vladimir,
 
 Why?
 
 I posted a bug report yesterday, thinking it was just some
 accidentally left over code.
 
 But I noticed this statement in the documentation today. And,
 frankly, I don't understand why that limitation is there.
 
 The problem with this restriction comes when using multi-dimensional
 arrays. Branched dynamic arrays are MUCH slower than
 rectangular/cubic/etc. static arrays - and this is a severe
 limitation for anyone wanting to work with large amounts of data
 (e.g. game maps). Of course I could work around this by calculating
 the positions manually using x*width+y, but I'd like to know the
 reason why this limitation was put there in the first place, and
 what's the reasoning behind it.
 

 just new the array.
 

 If you mean that I do something like:
 int[4096][4096]* arr = (new int[4096][4096][1]).ptr;
 1) the compiler still (normally) doesn't let me do it
 2) it's slower for random access due to having to dereference a
 pointer.
 
 But, yes, this is the sanest workaround so far.
 

I was thinking of this:

alias int[4096] block;   // alias to be shure I get dynamic array of static 
arrays
block[] arr = new block[4096];

Oct 02 2007

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Tue, 02 Oct 2007 18:12:41 +0300, BCS <ao pathlink.com> wrote:

 I was thinking of this:

 alias int[4096] block;   // alias to be shure I get dynamic array of s=

tatic arrays
 block[] arr =3D new block[4096];

Why haven't I thought of that... :O

-- =

Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

Oct 02 2007

D Programming

C/C++ Programming

Other

digitalmars.D - "The total size of a static array cannot exceed 16Mb."