www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - "The total size of a static array cannot exceed 16Mb."

reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
Why?

I posted a bug report yesterday, thinking it was just some accidentally left
over code.
But I noticed this statement in the documentation today. And, frankly, I don't
understand why that limitation is there.

The problem with this restriction comes when using multi-dimensional arrays.
Branched dynamic arrays are MUCH slower than rectangular/cubic/etc. static
arrays - and this is a severe limitation for anyone wanting to work with large
amounts of data (e.g. game maps). Of course I could work around this by
calculating the positions manually using x*width+y, but I'd like to know the
reason why this limitation was put there in the first place, and what's the
reasoning behind it. No other compiled language that I used (C/C++/Delphi) has
such a limitation.

The check is in mtype.c, line 1835 (DMD 1.021).

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com
Oct 01 2007
next sibling parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Vladimir Panteleev wrote:
 Why?
 
 I posted a bug report yesterday, thinking it was just some accidentally left
over code.
 But I noticed this statement in the documentation today. And, frankly, I don't
understand why that limitation is there.
 
 The problem with this restriction comes when using multi-dimensional arrays.
Branched dynamic arrays are MUCH slower than rectangular/cubic/etc. static
arrays - and this is a severe limitation for anyone wanting to work with large
amounts of data (e.g. game maps). Of course I could work around this by
calculating the positions manually using x*width+y, but I'd like to know the
reason why this limitation was put there in the first place, and what's the
reasoning behind it. No other compiled language that I used (C/C++/Delphi) has
such a limitation.
 
 The check is in mtype.c, line 1835 (DMD 1.021).
This is a reply from Walter last year: http://www.digitalmars.com/d/archives/digitalmars/D/37038.html#N37071 -- Oskar
Oct 01 2007
parent "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Tue, 02 Oct 2007 09:44:22 +0300, Oskar Linde <oskar.lindeREM ovegmail=
.com> wrote:

 This is a reply from Walter last year:

 http://www.digitalmars.com/d/archives/digitalmars/D/37038.html#N37071
If you ask me, most of those reasons are excuses for 6). Putting restric= tions on things which MAY hurt you IF you do things you're never going t= o do is, to put it softly, not a very good idea. A static array in the u= ninitialized part of the data segment allows me to randomly access any p= art of it very quickly, because its address and the address of every of = its elements is predetermined at compile time, so - unlike with the case= of dynamic arrays or pointers to static arrays - you do not need to der= eference pointers every time. I'll say it again, D is the only compiled = language that has such a limitation. = As for point 5) - well, the executable grows by the size of the array re= gardless if the array has any initialization. Heck, it adds initializati= on data even if I use "=3Dvoid". -- = Best regards, Vladimir mailto:thecybershadow gmail.com
Oct 02 2007
prev sibling next sibling parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Tue, 02 Oct 2007 09:20:29 +0300, Vladimir Panteleev
<thecybershadow gmail.com> wrote:

 Why?

 I posted a bug report yesterday, thinking it was just some accidentally left
over code.
 But I noticed this statement in the documentation today. And, frankly, I don't
understand why that limitation is there.

 The problem with this restriction comes when using multi-dimensional arrays.
Branched dynamic arrays are MUCH slower than rectangular/cubic/etc. static
arrays - and this is a severe limitation for anyone wanting to work with large
amounts of data (e.g. game maps). Of course I could work around this by
calculating the positions manually using x*width+y, but I'd like to know the
reason why this limitation was put there in the first place, and what's the
reasoning behind it. No other compiled language that I used (C/C++/Delphi) has
such a limitation.

 The check is in mtype.c, line 1835 (DMD 1.021).
I think I found the cause of this limitation - OPTLINK hangs, crashes and/or generates corrupt data if I try to force it such input. It works if I use a pointer to a huge static array, but DMD wouldn't accept that as input anyway. I thought that putting the array in a struct or class would prevent the linker from generating a >16MB data segment, but I forgot that D needs(!) to have an initial value for every type, even if it's completely null. Once again, I'll mention that Borland and Microsoft's linkers don't have such a limitation, and they can generate data segments over 16 MB with no problems. So, suggestions for fixing this: 1) fix or rewrite OPTLINK ("rewrite" because I keep hearing about how OPTLINK holds D back for one reason or another; I know it's no small task, however D doesn't need a multi-purpose linker, just one that can handle what the DMD compiler generates) 2) if a type doesn't contain any non-0 initial values, instead of generating a big block of zeroes for the initializer, initialize it with a memset (rep stosd/stosb). -- Best regards, Vladimir mailto:thecybershadow gmail.com
Oct 01 2007
parent reply Christopher Wright <dhasenan gmail.com> writes:
Vladimir Panteleev wrote:
 On Tue, 02 Oct 2007 09:20:29 +0300, Vladimir Panteleev
<thecybershadow gmail.com> wrote:
 
 Why?

 I posted a bug report yesterday, thinking it was just some accidentally left
over code.
 But I noticed this statement in the documentation today. And, frankly, I don't
understand why that limitation is there.

 The problem with this restriction comes when using multi-dimensional arrays.
Branched dynamic arrays are MUCH slower than rectangular/cubic/etc. static
arrays - and this is a severe limitation for anyone wanting to work with large
amounts of data (e.g. game maps). Of course I could work around this by
calculating the positions manually using x*width+y, but I'd like to know the
reason why this limitation was put there in the first place, and what's the
reasoning behind it. No other compiled language that I used (C/C++/Delphi) has
such a limitation.

 The check is in mtype.c, line 1835 (DMD 1.021).
I think I found the cause of this limitation - OPTLINK hangs, crashes and/or generates corrupt data if I try to force it such input. It works if I use a pointer to a huge static array, but DMD wouldn't accept that as input anyway. I thought that putting the array in a struct or class would prevent the linker from generating a >16MB data segment, but I forgot that D needs(!) to have an initial value for every type, even if it's completely null. Once again, I'll mention that Borland and Microsoft's linkers don't have such a limitation, and they can generate data segments over 16 MB with no problems. So, suggestions for fixing this: 1) fix or rewrite OPTLINK ("rewrite" because I keep hearing about how OPTLINK holds D back for one reason or another; I know it's no small task, however D doesn't need a multi-purpose linker, just one that can handle what the DMD compiler generates) 2) if a type doesn't contain any non-0 initial values, instead of generating a big block of zeroes for the initializer, initialize it with a memset (rep stosd/stosb).
T[][] SquareArray (T) (int x, int y) { T[] block = new T[x * y]; T[][] ret = []; ret.length = x; for (int i = 0; i < x; i++) { ret[i] = block[i * y .. (i + 1) * y]; } return ret; } void main () { int[][] array = SquareArray!(int)(9000, 9000); } Slightly tedious, and it gives you reference semantics and places the array on the heap (but do you really *want* 16MB stack frames? I imagine the OS might have something to say about that).
Oct 02 2007
parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Tue, 02 Oct 2007 16:18:26 +0300, Christopher Wright <dhasenan gmail.c=
om> wrote:

 T[][] SquareArray (T) (int x, int y) {
     T[] block =3D new T[x * y];
     T[][] ret =3D [];
     ret.length =3D x;
     for (int i =3D 0; i < x; i++) {
        ret[i] =3D block[i * y .. (i + 1) * y];
     }

     return ret;
 }

 void main () {
     int[][] array =3D SquareArray!(int)(9000, 9000);
 }

 Slightly tedious, and it gives you reference semantics and places the
 array on the heap (but do you really *want* 16MB stack frames? I imagi=
ne
 the OS might have something to say about that).
Thanks, however dynamic arrays are unfortunately slow. To access a rando= m element, you need a dereference, a multiplication+addition, another de= reference, another multiplication+addition. Janice Caron's workaround, w= hen modified to use a struct placed in the data segment, requires only o= ne dereferencing/multiply+adding - while a static array placed in the da= ta segment requires only arithmetics to access. Also, I was planning to place the array in the uninitialized data segmen= t (I believe it is called BSS in ELF and object files), not the stack - = I believe the default stack size for Windows executables to be 256 kB an= yway. P.S. I believe your function can be reduced to this without any signific= ant changes in behavior: T[][] ret =3D new T[][x]; foreach(ref row;ret) row.length =3D y; return ret; -- = Best regards, Vladimir mailto:thecybershadow gmail.com
Oct 02 2007
next sibling parent reply "Janice Caron" <caron800 googlemail.com> writes:
On 10/2/07, Vladimir Panteleev <thecybershadow gmail.com> wrote:
 P.S. I believe your function can be reduced to this without any significant
changes in behavior:

      T[][] ret = new T[][x];
      foreach(ref row;ret)
          row.length = y;

      return ret;
For that matter, I think you could also do: template MultiDimArray(T,int N) { static if (N == 1) typedef T[] MultiDimArray; else typedef MultiDimArray!(T,N-1)[] MultiDimArray; } MultiDimArray!(T,N) MakeMultiDimArray(T,int N)(int dim, int[] others...) { static if (N == 1) return new T[dim]; else { MultiDimArray!(T,N) t; t.length = dim; foreach(ref a;t) a = MakeMultiDimArray!(T,N-1)(others[0],others[1..$]); return t; } }
Oct 02 2007
parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Janice Caron" <caron800 googlemail.com> wrote in message 
news:mailman.353.1191333299.16939.digitalmars-d puremagic.com...
 On 10/2/07, Vladimir Panteleev <thecybershadow gmail.com> wrote:
 P.S. I believe your function can be reduced to this without any 
 significant changes in behavior:

      T[][] ret = new T[][x];
      foreach(ref row;ret)
          row.length = y;

      return ret;
For that matter, I think you could also do: template MultiDimArray(T,int N) { static if (N == 1) typedef T[] MultiDimArray; else typedef MultiDimArray!(T,N-1)[] MultiDimArray; } MultiDimArray!(T,N) MakeMultiDimArray(T,int N)(int dim, int[] others...) { static if (N == 1) return new T[dim]; else { MultiDimArray!(T,N) t; t.length = dim; foreach(ref a;t) a = MakeMultiDimArray!(T,N-1)(others[0],others[1..$]); return t; } }
You've basically implemented, in templates, what you can already do with the language syntax. MultiDimArray!(int, 3) will turn into int[][][], and the functionality of MakeMultiDimArray is already covered by array new-ing, i.e. new int[][][](10, 20, 30). And it's still an array-of-arrays.
Oct 02 2007
parent reply "Janice Caron" <caron800 googlemail.com> writes:
On 10/2/07, Jarrett Billingsley <kb3ctd2 yahoo.com> wrote:
 functionality of MakeMultiDimArray is already covered by array new-ing, i.e.
 new int[][][](10, 20, 30).
Ooh! I didn't know you could do that! Nice!
Oct 02 2007
parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Janice Caron" <caron800 googlemail.com> wrote in message 
news:mailman.362.1191349640.16939.digitalmars-d puremagic.com...
 On 10/2/07, Jarrett Billingsley <kb3ctd2 yahoo.com> wrote:
 functionality of MakeMultiDimArray is already covered by array new-ing, 
 i.e.
 new int[][][](10, 20, 30).
Ooh! I didn't know you could do that! Nice!
:) Actually 'new int[5]' is exactly equivalent to 'new int[](5)' -- the former is sugar for the latter. It's just most people are used to writing it like the latter.
Oct 02 2007
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Vladimir Panteleev wrote:
 Thanks, however dynamic arrays are unfortunately slow. To access a
 random element, you need a dereference, a multiplication+addition,
 another dereference, another multiplication+addition. Janice Caron's
 workaround, when modified to use a struct placed in the data segment,
 requires only one dereferencing/multiply+adding - while a static
 array placed in the data segment requires only arithmetics to access.
If you define an array as: int[10000][] a = new int[10000][100]; only one dereference is required to access the data.
Oct 02 2007
parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Tue, 02 Oct 2007 21:48:34 +0300, Walter Bright <newshound1 digitalmar=
s.com> wrote:

 Vladimir Panteleev wrote:
 Thanks, however dynamic arrays are unfortunately slow. To access a
 random element, you need a dereference, a multiplication+addition,
 another dereference, another multiplication+addition. Janice Caron's
 workaround, when modified to use a struct placed in the data segment,=
 requires only one dereferencing/multiply+adding - while a static
 array placed in the data segment requires only arithmetics to access.=
 If you define an array as:

 int[10000][] a =3D new int[10000][100];

 only one dereference is required to access the data.
Yes, only one - when there should be none at all. *sigh* Note that one thing which the compiler fails to prevent but still chokes= the linker are having more than 16 Mb of data or even distinct types (w= hich need initial values, such as structs/classes). Example: uint[0x800000] a,b,c,d; void main() { } So - please, Just Fix That Linker ;) -- = Best regards, Vladimir mailto:thecybershadow gmail.com
Oct 02 2007
next sibling parent reply BCS <ao pathlink.com> writes:
Reply to Vladimir,

 On Tue, 02 Oct 2007 21:48:34 +0300, Walter Bright
 <newshound1 digitalmars.com> wrote:
 
 Vladimir Panteleev wrote:
 
 Thanks, however dynamic arrays are unfortunately slow. To access a
 random element, you need a dereference, a multiplication+addition,
 another dereference, another multiplication+addition. Janice Caron's
 workaround, when modified to use a struct placed in the data
 segment, requires only one dereferencing/multiply+adding - while a
 static array placed in the data segment requires only arithmetics to
 access.
 
If you define an array as: int[10000][] a = new int[10000][100]; only one dereference is required to access the data.
Yes, only one - when there should be none at all. *sigh*
Sorry, no can do. Even with stack based static arrays there is a single dereference. An array access always does a dereference. (might you be thinking about indirects?)
Oct 02 2007
parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Tue, 02 Oct 2007 22:30:25 +0300, BCS <ao pathlink.com> wrote:

 Sorry, no can do. Even with stack based static arrays there is a single
dereference.
 An array access always does a dereference. (might you be thinking about
indirects?)
If you're talking about the current situation in D with huge arrays, then it's so (and those don't fit in the stack anyway). If not, well, it's certainly very possible if you just declare the array in the data segment (or with "static" anywhere else). Then, like I said earlier, "its address and the address of every of its elements is predetermined at compile time" - so "a static array placed in the data segment requires only arithmetics to access" ((base address+offset into element) + index*element size). OPTLINK's limitation is around 16 MB of data for the data segment. This also includes initializations for structs, so you can't even have struct types with the gross size of more than 16 MB. Other languages certainly would allow it, and I'm sure that if I would use a hex-edited dmd.exe with another linker - had I known a better DMD-compatible one - everything would work. -- Best regards, Vladimir mailto:thecybershadow gmail.com
Oct 02 2007
parent reply BCS <ao pathlink.com> writes:
Reply to Vladimir,

 On Tue, 02 Oct 2007 22:30:25 +0300, BCS <ao pathlink.com> wrote:
 
 If you're talking about the current situation in D with huge arrays,
 then it's so (and those don't fit in the stack anyway).
 
 If not, well, it's certainly very possible if you just declare the
 array in the data segment (or with "static" anywhere else). Then, like
 I said earlier, "its address and the address of every of its elements
 is predetermined at compile time" - so "a static array placed in the
 data segment requires only arithmetics to access" ((base
 address+offset into element) + index*element size).
 
 OPTLINK's limitation is around 16 MB of data for the data segment.
 This also includes initializations for structs, so you can't even have
 struct types with the gross size of more than 16 MB.
 
 Other languages certainly would allow it, and I'm sure that if I would
 use a hex-edited dmd.exe with another linker - had I known a better
 DMD-compatible one - everything would work.
 
Unless I'm smoking something, the code to access any array will look the same regardless of where the array is. Get base aggress compute offset add them *dereference* that address the only difference is how the base address is found. with static arrays it can be coded as a part of the opcode (I think), with general arrays, it is loaded from a variable, and with an array in the stack frame, it is pulled from the frame pointer*. But in all cases, their is still a dereference. * BTW, accessing a local variable also does a dereference, just where the offset is known at compile time.
Oct 02 2007
parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Wed, 03 Oct 2007 00:51:46 +0300, BCS <ao pathlink.com> wrote:

 Unless I'm smoking something, the code to access any array will look the
 same regardless of where the array is.
Well, with how I understand this terminology, "dereferencing" means reading an offset from a memory location, other than an immediate instruction operand. Consider: int i; i++; This assembles as: i dd ? ... inc i Assuming i is in the data segment (it has a fixed address), I do not consider this dereferencing, nor would I think anyone should. Then we have: int* i; ... (*i)++; Assuming i is in the data segment (it has a fixed address), this is a "single" dereference. To get the address of the integer we need to read it from a fixed memory location. The same with arrays - except, for the first case, it's a static array, and for the second - it is either a dynamic array, a new-allocated array, etc. So, how does this boil down in performance? Let me show you: int[0x4000000] bigArr; ... int readValue(int index) { return bigArr[index]; } With -release -O, this assembles to: readValue proc near mov eax, ds:bigArr[eax*4] retn readValue endp Now, if bigArr would be a dynamic array or referenced through a pointer, the resulting code would look like this: readValue proc near mov ecx, ds:bigArr mov eax, [ecx+eax*4] retn readValue endp It gets worse if there are further nests, e.g. when using multidimensional arrays. That doesn't sound like a big deal, however: 1) when the array is in the data segment, we know the address of every one of its elements - so random access is faster 2) when using multi-dimensional dynamic arrays, there are more casts 3) rectangular static arrays are much faster when you want to access it by column - you just add (width*element_size) to the pointer, and go to the element on the next row 4) when you want to access an address inside the element (you have an array of structs), you must perform an addition after the multiplication (I think it can be part of the instruction in the newer instruction sets though) - while, when the array base address is predetermined, you can just use base_address + element_offset as the base address, then add index*element_size to that etc. -- Best regards, Vladimir mailto:thecybershadow gmail.com
Oct 02 2007
next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Vladimir Panteleev wrote:
 That doesn't sound like a big deal, however:
 
 1) when the array is in the data segment, we know the address of
 every one of its elements - so random access is faster
You might be surprised. Intel has done a great deal of work optimizing and pipelining addressing modes. Furthermore, one would probably access such an array repeatedly, meaning the pointer to it would be cached in a register. I doubt you'll see a dime's worth of difference. In any case, as has been shown time and again, where one thinks the time is spent in a program is always wrong. Even the experts are always wrong. Use a profiler!
 2) when using
 multi-dimensional dynamic arrays, there are more casts
int[1000][] a; ... a[3][4] = 2; involves no casts.
 3) rectangular
 static arrays are much faster when you want to access it by column -
 you just add (width*element_size) to the pointer, and go to the
 element on the next row
It's the same for int[1000][]
 4) when you want to access an address inside
 the element (you have an array of structs), you must perform an
 addition after the multiplication (I think it can be part of the
 instruction in the newer instruction sets though) - while, when the
 array base address is predetermined, you can just use base_address +
 element_offset as the base address, then add index*element_size to
 that etc.
Using the address mode offset[reg1][reg2*size] mode is not any slower than using offset[reg2*size].
Oct 02 2007
parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Wed, 03 Oct 2007 02:09:56 +0300, Walter Bright <newshound1 digitalmar=
s.com> wrote:

 Vladimir Panteleev wrote:
 That doesn't sound like a big deal, however:

 1) when the array is in the data segment, we know the address of
 every one of its elements - so random access is faster
You might be surprised. Intel has done a great deal of work optimizing=
 and pipelining addressing modes. Furthermore, one would probably acces=
s
 such an array repeatedly, meaning the pointer to it would be cached in=
a
 register. I doubt you'll see a dime's worth of difference.

 In any case, as has been shown time and again, where one thinks the ti=
me
 is spent in a program is always wrong. Even the experts are always
 wrong. Use a profiler!
I will, if I won't be satisfied by the performance :)
 2) when using
 multi-dimensional dynamic arrays, there are more casts
int[1000][] a; ... a[3][4] =3D 2; involves no casts.
Erm... I'm sure I wanted to write something other than "casts" there, bu= t now I even forgot what I wanted to say originally. I think I meant tha= t when you have imbricated dynamic arrays, there are even more dereferen= ces.
 3) rectangular
 static arrays are much faster when you want to access it by column -
 you just add (width*element_size) to the pointer, and go to the
 element on the next row
It's the same for int[1000][]
Right. I always overlook that case!
 4) when you want to access an address inside
 the element (you have an array of structs), you must perform an
 addition after the multiplication (I think it can be part of the
 instruction in the newer instruction sets though) - while, when the
 array base address is predetermined, you can just use base_address +
 element_offset as the base address, then add index*element_size to
 that etc.
Using the address mode offset[reg1][reg2*size] mode is not any slower than using offset[reg2*size].
Yes... that's the one I mentioned (though I'm used to see it as [reg+reg= 2*size+offset]). Thanks for replying. I wonder how hard would it be to write - or, at lea= st, start a community project of a Windows PE linker written from scratc= h - in D, of course :) -- = Best regards, Vladimir mailto:thecybershadow gmail.com
Oct 02 2007
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Vladimir Panteleev wrote:
 On Wed, 03 Oct 2007 02:09:56 +0300, Walter Bright <newshound1 digitalmars.com>
wrote:
 
 Yes... that's the one I mentioned (though I'm used to see it as
[reg+reg2*size+offset]).
 
 Thanks for replying. I wonder how hard would it be to write - or, at
 least, start a community project of a Windows PE linker written from
 scratch - in D, of course :)
For anyone whose first inclination is to hex-edit binaries when confronted with a minor annoyance, I'm guessing probably not that hard. ;-) --bb
Oct 02 2007
prev sibling parent reply BCS <ao pathlink.com> writes:
Reply to Vladimir,

 On Wed, 03 Oct 2007 00:51:46 +0300, BCS <ao pathlink.com> wrote:
 
 Unless I'm smoking something, the code to access any array will look
 the same regardless of where the array is.
 
Well, with how I understand this terminology, "dereferencing" means reading an offset from a memory location, other than an immediate instruction operand. Consider:
[...] I think we are referring to the same thing, but counting different parts of it. What I'm counting is the times that a load is done from an address that is not an inline value, but comes from a register.
 readValue proc near
 mov     eax, ds:bigArr[eax*4]
If I'm reading this correctly what is happening here is "load address (bigArr + eax*4) unless IA32 has a special case op for that, your going to have to compute the address in another op and even if there is a special case, I rather suspect that it will be exactly as fast either way. Furthermore, if more than one detention is used you will have to do the computation in other ops no mater what.
 retn
 readValue endp
BTW where did you get that ASM dump? I don't recognize the format.
 
 readValue proc near
 mov     ecx, ds:bigArr
 mov     eax, [ecx+eax*4]
 retn
 readValue endp
 It gets worse if there are further nests, e.g. when using
 multidimensional arrays.
unless you are nesting dynamic arrays it's just math, not dereferences (i1 + d1*i2 + d2*d1*i3 ...) But it looks like we agree on that point.
 3) rectangular static arrays are much faster when you want to access
 it by column - you just add (width*element_size) to the pointer, and
 go to the element on the next row
this can be done with a dynamic array of static sized arrays.
 
 4) when you want to access an address inside the element (you have an
 array of structs), you must perform an addition after the
 multiplication (I think it can be part of the instruction in the newer
 instruction sets though) - while, when the array base address is
 predetermined, you can just use base_address + element_offset as the
 base address, then add index*element_size to that
It looks like you are using a (constA + Constb * reg) address mode. I just look in the "Intel Architecture Software Developer's Manual" and didn't find any reference to it. Am I missing something? (I didn't actual look real hard)
 
 etc.
 
Oct 02 2007
parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Wed, 03 Oct 2007 02:31:47 +0300, BCS <ao pathlink.com> wrote:

 I think we are referring to the same thing, but counting different parts
 of it. What I'm counting is the times that a load is done from an address
 that is not an inline value, but comes from a register.
OK, so in the first array example that makes one load (only the value we need), and in the second - two (one to get the address of the array data, and another to get the value).
 readValue proc near
 mov     eax, ds:bigArr[eax*4]
If I'm reading this correctly what is happening here is "load address (bigArr + eax*4)
No, that gets the value directly and puts it in eax. To get the address I would use the "lea" instruction instead of "mov" (which is useful if you want to read and then write to the address).
 unless IA32 has a special case op for that, your going to have to compute
 the address in another op and even if there is a special case, I rather suspect
 that it will be exactly as fast either way. Furthermore, if more than one
 detention is used you will have to do the computation in other ops no mater
 what.
I posted the disassembly listing of the D code I supplied, so this code works (there are no missing instructions).
 retn
 readValue endp
BTW where did you get that ASM dump? I don't recognize the format.
The demo version of Interactive Disassembler[1]. It's quite useful for a Windows D programmer's arsenal, since it understands D symbolic information. (They're loaded mangled, so I "demangled" them manually.)
 readValue proc near
 mov     ecx, ds:bigArr
 mov     eax, [ecx+eax*4]
 retn
 readValue endp
 It gets worse if there are further nests, e.g. when using
 multidimensional arrays.
unless you are nesting dynamic arrays it's just math, not dereferences (i1 + d1*i2 + d2*d1*i3 ...)
Yes, I meant dynamic arrays (or pointers, for the first "link")...
 3) rectangular static arrays are much faster when you want to access
 it by column - you just add (width*element_size) to the pointer, and
 go to the element on the next row
this can be done with a dynamic array of static sized arrays.
Yeah. Forgot about those (again).
 4) when you want to access an address inside the element (you have an
 array of structs), you must perform an addition after the
 multiplication (I think it can be part of the instruction in the newer
 instruction sets though) - while, when the array base address is
 predetermined, you can just use base_address + element_offset as the
 base address, then add index*element_size to that
It looks like you are using a (constA + Constb * reg) address mode. I just look in the "Intel Architecture Software Developer's Manual" and didn't find any reference to it. Am I missing something? (I didn't actual look real hard)
ConstB must be a power of 2 - and I don't think I can find a link more easily than you (since I don't use any online - or any other, actually - reference). As WB mentioned in another reply, there's also a [reg1 + reg2*powerOfTwo + offset] addressing mode. -- Best regards, Vladimir mailto:thecybershadow gmail.com
Oct 02 2007
next sibling parent BCS <ao pathlink.com> writes:
Reply to Vladimir,

 On Wed, 03 Oct 2007 02:31:47 +0300, BCS <ao pathlink.com> wrote:
 
 I think we are referring to the same thing, but counting different
 parts of it. What I'm counting is the times that a load is done from
 an address that is not an inline value, but comes from a register.
 
OK, so in the first array example that makes one load (only the value we need), and in the second - two (one to get the address of the array data, and another to get the value).
yes, assuming that you have to do some pointer work to get the array (if it's a static variable than you can do a "move from inline defined address")
 readValue proc near
 mov     eax, ds:bigArr[eax*4]
If I'm reading this correctly what is happening here is "load address (bigArr + eax*4)
No, that gets the value directly and puts it in eax. To get the address I would use the "lea" instruction instead of "mov" (which is useful if you want to read and then write to the address).
Baahhh. I need to be more careful with what I actually say that should have been "load from address"
 unless IA32 has a special case op for that, your going to have to
 compute
 the address in another op and even if there is a special case, I
 rather suspect
 that it will be exactly as fast either way. Furthermore, if more than
 one
 detention is used you will have to do the computation in other ops no
 mater
 what.
I posted the disassembly listing of the D code I supplied, so this code works (there are no missing instructions).
[...]
 It looks like you are using a (constA + Constb * reg) address mode. I
 just look in the "Intel Architecture Software Developer's Manual" and
 didn't find any reference to it. Am I missing something? (I didn't
 actual look real hard)
 
ConstB must be a power of 2 - and I don't think I can find a link more easily than you (since I don't use any online - or any other, actually - reference). As WB mentioned in another reply, there's also a [reg1 + reg2*powerOfTwo + offset] addressing mode.
sweet BTW I find these kida handy http://www.intel.com/design/processor/manuals/253665.pdf Volume 1: Basic Architecture http://www.intel.com/design/processor/manuals/253666.pdf Volume 2A: Instruction Set Reference, A-M http://www.intel.com/design/processor/manuals/253667.pdf Volume 2B: Instruction Set Reference, N-Z foud here http://www.intel.com/products/processor/manuals/index.htm In the end I rather suspect that what really matters is how many times you read from memory, not how you got the address to read from.
Oct 02 2007
prev sibling parent reply BCS <ao pathlink.com> writes:
Reply to Vladimir,

 The demo version of Interactive Disassembler[1]. It's quite useful for
 a Windows D programmer's arsenal, since it understands D symbolic
 information. (They're loaded mangled, so I "demangled" them manually.)
link?
Oct 02 2007
parent "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Wed, 03 Oct 2007 03:33:28 +0300, BCS <ao pathlink.com> wrote:

 Reply to Vladimir,

 The demo version of Interactive Disassembler[1]. It's quite useful for
 a Windows D programmer's arsenal, since it understands D symbolic
 information. (They're loaded mangled, so I "demangled" them manually.)
link?
Oops.
 [1] http://datarescue.com/idabase/idadowndemo.htm
-- Best regards, Vladimir mailto:thecybershadow gmail.com
Oct 03 2007
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Vladimir Panteleev wrote:
 Yes, only one - when there should be none at all. *sigh*
For accessing a big array like that, it is hard to see how a single dereference through a register is going to affect the run time at all.
 Note that one thing which the compiler fails to prevent but still
 chokes the linker are having more than 16 Mb of data or even distinct
 types (which need initial values, such as structs/classes). Example:
 
 uint[0x800000] a,b,c,d;
 
 void main() { }
 
 So - please, Just Fix That Linker ;)
Easier said than done :-(
Oct 02 2007
parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Walter Bright" <newshound1 digitalmars.com> wrote in message 
news:fduing$trk$1 digitalmars.com...
 So - please, Just Fix That Linker ;)
Easier said than done :-(
Then Make That Backend Generate Some Other Format Besides OMF (Like ELF, Oh Wait Doesn't DMD On Linux Already Do That?), or something, _anything_, since this whole OPTLINK thing is having one suboptimal consequence after another. I think it's time to let the poor thing die. Yes, I am _actually_ advocating the development of a different backend, or linker, or object format etc. over the release of D2.0. Yes, I am actually more interested in that.
Oct 02 2007
next sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
Jarrett Billingsley wrote:
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:fduing$trk$1 digitalmars.com...
 So - please, Just Fix That Linker ;)
Easier said than done :-(
Then Make That Backend Generate Some Other Format Besides OMF (Like ELF, Oh Wait Doesn't DMD On Linux Already Do That?), or something, _anything_, since this whole OPTLINK thing is having one suboptimal consequence after another. I think it's time to let the poor thing die. Yes, I am _actually_ advocating the development of a different backend, or linker, or object format etc. over the release of D2.0. Yes, I am actually more interested in that.
There's GDC. Walter could make that the default branch (and actually mentioned that as a possibility at the conference). Though one of the benefits of D is a fast compiler, and I don't know how much faster gdc is compared to, say, gcc.
Oct 02 2007
parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Christopher Wright wrote:

 Yes, I am _actually_ advocating the development of a different 
 backend, or linker, or object format etc. over the release of D2.0.  
 Yes, I am actually more interested in that. 
There's GDC. Walter could make that the default branch (and actually mentioned that as a possibility at the conference). Though one of the benefits of D is a fast compiler, and I don't know how much faster gdc is compared to, say, gcc.
It usually clocks in somewhere inbetween gcc and g++, but I'm sure that says more about the language complexity than about the compilers... Not sure how LLVM does for speed, as I've only used it through GCC. But I think the speed-above-everything DMD and OPTLINK still "win". --anders
Feb 20 2008
parent naryl <cy ngs.ru> writes:
On Wed, 20 Feb 2008 11:30:40 +0300, Anders F Björklund <afb algonet.se>  
wrote:

 Christopher Wright wrote:

 Yes, I am _actually_ advocating the development of a different  
 backend, or linker, or object format etc. over the release of D2.0.   
 Yes, I am actually more interested in that.
There's GDC. Walter could make that the default branch (and actually mentioned that as a possibility at the conference). Though one of the benefits of D is a fast compiler, and I don't know how much faster gdc is compared to, say, gcc.
It usually clocks in somewhere inbetween gcc and g++, but I'm sure that says more about the language complexity than about the compilers... Not sure how LLVM does for speed, as I've only used it through GCC. But I think the speed-above-everything DMD and OPTLINK still "win". --anders
It is somewhat outdated but still: http://www.digitalmars.com/d/1.0/cppstrings.html There is a comparison at the end.
Feb 20 2008
prev sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Jarrett Billingsley wrote:
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:fduing$trk$1 digitalmars.com...
 So - please, Just Fix That Linker ;)
Easier said than done :-(
Then Make That Backend Generate Some Other Format Besides OMF (Like ELF, Oh Wait Doesn't DMD On Linux Already Do That?), or something, _anything_, since this whole OPTLINK thing is having one suboptimal consequence after another. I think it's time to let the poor thing die.
It's not that bad, especially not this particular issue, which has a painless workaround.
 Yes, I am _actually_ advocating the development of a different backend, or 
 linker, or object format etc. over the release of D2.0.  Yes, I am actually 
 more interested in that. 
Oct 02 2007
prev sibling parent reply BCS <ao pathlink.com> writes:
Reply to Vladimir,

 Why?
 
 I posted a bug report yesterday, thinking it was just some
 accidentally left over code.
 
 But I noticed this statement in the documentation today. And, frankly,
 I don't understand why that limitation is there.
 
 The problem with this restriction comes when using multi-dimensional
 arrays. Branched dynamic arrays are MUCH slower than
 rectangular/cubic/etc. static arrays - and this is a severe limitation
 for anyone wanting to work with large amounts of data (e.g. game
 maps). Of course I could work around this by calculating the positions
 manually using x*width+y, but I'd like to know the reason why this
 limitation was put there in the first place, and what's the reasoning
 behind it.
just new the array.
Oct 02 2007
parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Tue, 02 Oct 2007 10:16:25 +0300, BCS <ao pathlink.com> wrote:

 Reply to Vladimir,

 Why?

 I posted a bug report yesterday, thinking it was just some
 accidentally left over code.

 But I noticed this statement in the documentation today. And, frankly=
,
 I don't understand why that limitation is there.

 The problem with this restriction comes when using multi-dimensional
 arrays. Branched dynamic arrays are MUCH slower than
 rectangular/cubic/etc. static arrays - and this is a severe limitatio=
n
 for anyone wanting to work with large amounts of data (e.g. game
 maps). Of course I could work around this by calculating the position=
s
 manually using x*width+y, but I'd like to know the reason why this
 limitation was put there in the first place, and what's the reasoning=
 behind it.
just new the array.
If you mean that I do something like: int[4096][4096]* arr =3D (new int[4096][4096][1]).ptr; 1) the compiler still (normally) doesn't let me do it 2) it's slower for random access due to having to dereference a pointer.= But, yes, this is the sanest workaround so far. -- = Best regards, Vladimir mailto:thecybershadow gmail.com
Oct 02 2007
next sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 10/2/07, Vladimir Panteleev <thecybershadow gmail.com> wrote:
 If you mean that I do something like:
         int[4096][4096]* arr = (new int[4096][4096][1]).ptr;
Why not just class Array2D(T) { this(int width, int height) { this.width = width; a = new T(width * height); } T opIndex(int x, int y) { return a[x * width + y]; } void opIndexAssign(T n, int x, int y) { a[x * width + y] = n; } T[] a; int width; } Then arr = new Array2D(4096,4096);
Oct 02 2007
prev sibling next sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 10/2/07, Janice Caron <caron800 googlemail.com> wrote:
 Then
     arr = new Array2D(4096,4096);
Sorry. That should be arr = new Array!(int)(4096,4096); then you just refer to the elements as arr[x,y] as you'd expect.
Oct 02 2007
prev sibling next sibling parent reply "Janice Caron" <caron800 googlemail.com> writes:
On 10/2/07, Janice Caron <caron800 googlemail.com> wrote:
 Sorry. That should be

     arr = new Array!(int)(4096,4096);
Gaah! That's too many typos for one morning! arr = new Array2D!(int)(4096,4096); Anyway, hopefully you get the idea. I'm going for more coffee before I mistype anything else!
Oct 02 2007
parent reply Vladimir Panteleev <thecybershadow gmail.com> writes:
Janice Caron Wrote:
 Anyway, hopefully you get the idea. I'm going for more coffee before I
 mistype anything else!
Thanks - this indeed makes the code more readable, however now accessing an element involves two dereferences (the class, and the array pointer) - and, possibly, a function call if the compiler won't inline it. My main concern is the speed, otherwise I wouldn't be complaining and just use dynamic arrays :P -- Vladimir
Oct 02 2007
parent "Janice Caron" <caron800 googlemail.com> writes:
On 10/2/07, Vladimir Panteleev <thecybershadow gmail.com> wrote:
 Janice Caron Wrote:
 Anyway, hopefully you get the idea. I'm going for more coffee before I
 mistype anything else!
Thanks - this indeed makes the code more readable, however now accessing an element involves two dereferences (the class, and the array pointer)
It doesn't have to be a class. It could just as easily be a struct. Then you've only got one dereference.
 - and, possibly, a function call if the compiler won't inline it.
I would hope that the compiler would inline that. Maybe not in debug mode, but certainly in release mode.
Oct 02 2007
prev sibling parent reply BCS <ao pathlink.com> writes:
Reply to Vladimir,

 On Tue, 02 Oct 2007 10:16:25 +0300, BCS <ao pathlink.com> wrote:
 
 Reply to Vladimir,
 
 Why?
 
 I posted a bug report yesterday, thinking it was just some
 accidentally left over code.
 
 But I noticed this statement in the documentation today. And,
 frankly, I don't understand why that limitation is there.
 
 The problem with this restriction comes when using multi-dimensional
 arrays. Branched dynamic arrays are MUCH slower than
 rectangular/cubic/etc. static arrays - and this is a severe
 limitation for anyone wanting to work with large amounts of data
 (e.g. game maps). Of course I could work around this by calculating
 the positions manually using x*width+y, but I'd like to know the
 reason why this limitation was put there in the first place, and
 what's the reasoning behind it.
 
just new the array.
If you mean that I do something like: int[4096][4096]* arr = (new int[4096][4096][1]).ptr; 1) the compiler still (normally) doesn't let me do it 2) it's slower for random access due to having to dereference a pointer. But, yes, this is the sanest workaround so far.
I was thinking of this: alias int[4096] block; // alias to be shure I get dynamic array of static arrays block[] arr = new block[4096];
Oct 02 2007
parent "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Tue, 02 Oct 2007 18:12:41 +0300, BCS <ao pathlink.com> wrote:

 I was thinking of this:

 alias int[4096] block;   // alias to be shure I get dynamic array of s=
tatic arrays
 block[] arr =3D new block[4096];
Why haven't I thought of that... :O -- = Best regards, Vladimir mailto:thecybershadow gmail.com
Oct 02 2007