digitalmars.D - "The total size of a static array cannot exceed 16Mb."
- Vladimir Panteleev (8/8) Oct 01 2007 Why?
- Oskar Linde (5/13) Oct 01 2007 This is a reply from Walter last year:
- Vladimir Panteleev (17/19) Oct 02 2007 If you ask me, most of those reasons are excuses for 6). Putting restric...
- Vladimir Panteleev (9/14) Oct 01 2007 I think I found the cause of this limitation - OPTLINK hangs, crashes an...
- Christopher Wright (16/35) Oct 02 2007 T[][] SquareArray (T) (int x, int y) {
- Vladimir Panteleev (22/37) Oct 02 2007
- Janice Caron (22/27) Oct 02 2007 For that matter, I think you could also do:
- Jarrett Billingsley (7/38) Oct 02 2007 You've basically implemented, in templates, what you can already do with...
- Janice Caron (3/5) Oct 02 2007 Ooh! I didn't know you could do that!
- Jarrett Billingsley (6/12) Oct 02 2007 :)
- Walter Bright (4/10) Oct 02 2007 If you define an array as:
- Vladimir Panteleev (14/24) Oct 02 2007 Yes, only one - when there should be none at all. *sigh*
- BCS (3/24) Oct 02 2007 Sorry, no can do. Even with stack based static arrays there is a single ...
- Vladimir Panteleev (8/10) Oct 02 2007 If you're talking about the current situation in D with huge arrays, the...
- BCS (13/33) Oct 02 2007 Unless I'm smoking something, the code to access any array will look the...
- Vladimir Panteleev (43/45) Oct 02 2007 Well, with how I understand this terminology, "dereferencing" means read...
- Walter Bright (15/32) Oct 02 2007 You might be surprised. Intel has done a great deal of work optimizing
- Vladimir Panteleev (19/51) Oct 02 2007
- Bill Baxter (4/11) Oct 02 2007 For anyone whose first inclination is to hex-edit binaries when
- BCS (20/54) Oct 02 2007 [...]
- Vladimir Panteleev (11/48) Oct 02 2007 No, that gets the value directly and puts it in eax. To get the address ...
- BCS (18/59) Oct 02 2007 yes, assuming that you have to do some pointer work to get the array (if...
- BCS (3/6) Oct 02 2007
- Vladimir Panteleev (5/12) Oct 03 2007
- Walter Bright (4/14) Oct 02 2007 For accessing a big array like that, it is hard to see how a single
- Jarrett Billingsley (9/12) Oct 02 2007 Then Make That Backend Generate Some Other Format Besides OMF (Like ELF,...
- Christopher Wright (5/18) Oct 02 2007 There's GDC. Walter could make that the default branch (and actually
- Anders F Björklund (6/14) Feb 20 2008 It usually clocks in somewhere inbetween gcc and g++, but I'm sure
- naryl (5/18) Feb 20 2008 It is somewhat outdated but still:
- Walter Bright (3/15) Oct 02 2007 It's not that bad, especially not this particular issue, which has a
- BCS (2/18) Oct 02 2007 just new the array.
- Vladimir Panteleev (12/30) Oct 02 2007
- Janice Caron (22/24) Oct 02 2007 Why not just
- Janice Caron (4/6) Oct 02 2007 Sorry. That should be
- Janice Caron (5/7) Oct 02 2007 Gaah! That's too many typos for one morning!
- Vladimir Panteleev (3/5) Oct 02 2007 Thanks - this indeed makes the code more readable, however now accessing...
- Janice Caron (5/10) Oct 02 2007 It doesn't have to be a class. It could just as easily be a struct.
- BCS (5/36) Oct 02 2007 I was thinking of this:
- Vladimir Panteleev (6/9) Oct 02 2007 Why haven't I thought of that... :O
Why? I posted a bug report yesterday, thinking it was just some accidentally left-over code. But I noticed this statement in the documentation today. And, frankly, I don't understand why that limitation is there.

The problem with this restriction comes when using multi-dimensional arrays. Branched dynamic arrays are MUCH slower than rectangular/cubic/etc. static arrays - and this is a severe limitation for anyone wanting to work with large amounts of data (e.g. game maps). Of course I could work around this by calculating the positions manually using x*width+y, but I'd like to know why this limitation was put there in the first place, and what the reasoning behind it is. No other compiled language that I have used (C/C++/Delphi) has such a limitation.

The check is in mtype.c, line 1835 (DMD 1.021).

--
Best regards,
Vladimir
mailto:thecybershadow gmail.com
Oct 01 2007
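[The x*width+y workaround mentioned in the post can be sketched in D. This is a minimal editorial illustration, not code from the thread; the names (map, get, set) and the dimensions are invented for the example.]

```d
// Minimal sketch of the x*width+y workaround: one flat heap array
// indexed manually as a rectangle, avoiding both the 16 MB static
// array limit and the extra dereference of nested dynamic arrays.
const int width  = 4096;
const int height = 4096;

int[] map;

int get(int x, int y)
{
    return map[x * width + y]; // one multiply+add, one memory read
}

void set(int x, int y, int value)
{
    map[x * width + y] = value;
}

void main()
{
    map = new int[width * height]; // 64 MB on the heap - no linker involvement
    set(3, 5, 42);
    assert(get(3, 5) == 42);
}
```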
Vladimir Panteleev wrote:Why? I posted a bug report yesterday, thinking it was just some accidentally left over code. But I noticed this statement in the documentation today. And, frankly, I don't understand why that limitation is there. The problem with this restriction comes when using multi-dimensional arrays. Branched dynamic arrays are MUCH slower than rectangular/cubic/etc. static arrays - and this is a severe limitation for anyone wanting to work with large amounts of data (e.g. game maps). Of course I could work around this by calculating the positions manually using x*width+y, but I'd like to know the reason why this limitation was put there in the first place, and what's the reasoning behind it. No other compiled language that I used (C/C++/Delphi) has such a limitation. The check is in mtype.c, line 1835 (DMD 1.021).

This is a reply from Walter last year:
http://www.digitalmars.com/d/archives/digitalmars/D/37038.html#N37071

--
Oskar
Oct 01 2007
On Tue, 02 Oct 2007 09:44:22 +0300, Oskar Linde <oskar.lindeREMovegmail.com> wrote:This is a reply from Walter last year: http://www.digitalmars.com/d/archives/digitalmars/D/37038.html#N37071If you ask me, most of those reasons are excuses for 6). Putting restrictions on things which MAY hurt you IF you do things you're never going to do is, to put it softly, not a very good idea. A static array in the uninitialized part of the data segment allows me to randomly access any part of it very quickly, because its address and the address of every one of its elements is predetermined at compile time, so - unlike with the case of dynamic arrays or pointers to static arrays - you do not need to dereference pointers every time. I'll say it again, D is the only compiled language that has such a limitation.

As for point 5) - well, the executable grows by the size of the array regardless of whether the array has any initialization. Heck, it adds initialization data even if I use "=void".

--
Best regards,
Vladimir
mailto:thecybershadow gmail.com
Oct 02 2007
On Tue, 02 Oct 2007 09:20:29 +0300, Vladimir Panteleev <thecybershadow gmail.com> wrote:Why? I posted a bug report yesterday, thinking it was just some accidentally left over code. But I noticed this statement in the documentation today. And, frankly, I don't understand why that limitation is there. The problem with this restriction comes when using multi-dimensional arrays. Branched dynamic arrays are MUCH slower than rectangular/cubic/etc. static arrays - and this is a severe limitation for anyone wanting to work with large amounts of data (e.g. game maps). Of course I could work around this by calculating the positions manually using x*width+y, but I'd like to know the reason why this limitation was put there in the first place, and what's the reasoning behind it. No other compiled language that I used (C/C++/Delphi) has such a limitation. The check is in mtype.c, line 1835 (DMD 1.021).I think I found the cause of this limitation - OPTLINK hangs, crashes and/or generates corrupt data if I try to feed it such input. It works if I use a pointer to a huge static array, but DMD wouldn't accept that as input anyway. I thought that putting the array in a struct or class would prevent the linker from generating a >16MB data segment, but I forgot that D needs(!) to have an initial value for every type, even if it's completely null. Once again, I'll mention that Borland's and Microsoft's linkers don't have such a limitation, and they can generate data segments over 16 MB with no problems.

So, suggestions for fixing this:
1) fix or rewrite OPTLINK ("rewrite" because I keep hearing about how OPTLINK holds D back for one reason or another; I know it's no small task, however D doesn't need a multi-purpose linker, just one that can handle what the DMD compiler generates)
2) if a type doesn't contain any non-0 initial values, instead of generating a big block of zeroes for the initializer, initialize it with a memset (rep stosd/stosb).
-- Best regards, Vladimir mailto:thecybershadow gmail.com
Oct 01 2007
Vladimir Panteleev wrote:On Tue, 02 Oct 2007 09:20:29 +0300, Vladimir Panteleev <thecybershadow gmail.com> wrote:

T[][] SquareArray (T) (int x, int y) {
    T[] block = new T[x * y];
    T[][] ret = [];
    ret.length = x;
    for (int i = 0; i < x; i++) {
        ret[i] = block[i * y .. (i + 1) * y];
    }
    return ret;
}

void main () {
    int[][] array = SquareArray!(int)(9000, 9000);
}

Slightly tedious, and it gives you reference semantics and places the array on the heap (but do you really *want* 16MB stack frames? I imagine the OS might have something to say about that).Why? I posted a bug report yesterday, thinking it was just some accidentally left over code. But I noticed this statement in the documentation today. And, frankly, I don't understand why that limitation is there. The problem with this restriction comes when using multi-dimensional arrays. Branched dynamic arrays are MUCH slower than rectangular/cubic/etc. static arrays - and this is a severe limitation for anyone wanting to work with large amounts of data (e.g. game maps). Of course I could work around this by calculating the positions manually using x*width+y, but I'd like to know the reason why this limitation was put there in the first place, and what's the reasoning behind it. No other compiled language that I used (C/C++/Delphi) has such a limitation. The check is in mtype.c, line 1835 (DMD 1.021).I think I found the cause of this limitation - OPTLINK hangs, crashes and/or generates corrupt data if I try to force it such input. It works if I use a pointer to a huge static array, but DMD wouldn't accept that as input anyway. I thought that putting the array in a struct or class would prevent the linker from generating a >16MB data segment, but I forgot that D needs(!) to have an initial value for every type, even if it's completely null. Once again, I'll mention that Borland and Microsoft's linkers don't have such a limitation, and they can generate data segments over 16 MB with no problems.
So, suggestions for fixing this: 1) fix or rewrite OPTLINK ("rewrite" because I keep hearing about how OPTLINK holds D back for one reason or another; I know it's no small task, however D doesn't need a multi-purpose linker, just one that can handle what the DMD compiler generates) 2) if a type doesn't contain any non-0 initial values, instead of generating a big block of zeroes for the initializer, initialize it with a memset (rep stosd/stosb).
Oct 02 2007
On Tue, 02 Oct 2007 16:18:26 +0300, Christopher Wright <dhasenan gmail.com> wrote:

T[][] SquareArray (T) (int x, int y) {
    T[] block = new T[x * y];
    T[][] ret = [];
    ret.length = x;
    for (int i = 0; i < x; i++) {
        ret[i] = block[i * y .. (i + 1) * y];
    }
    return ret;
}

void main () {
    int[][] array = SquareArray!(int)(9000, 9000);
}

Slightly tedious, and it gives you reference semantics and places the array on the heap (but do you really *want* 16MB stack frames? I imagine the OS might have something to say about that).

Thanks, however dynamic arrays are unfortunately slow. To access a random element, you need a dereference, a multiplication+addition, another dereference, another multiplication+addition. Janice Caron's workaround, when modified to use a struct placed in the data segment, requires only one dereferencing/multiply+adding - while a static array placed in the data segment requires only arithmetics to access.

Also, I was planning to place the array in the uninitialized data segment (I believe it is called BSS in ELF and object files), not the stack - I believe the default stack size for Windows executables to be 256 kB anyway.

P.S. I believe your function can be reduced to this without any significant changes in behavior:

T[][] ret = new T[][x];
foreach(ref row;ret)
    row.length = y;
return ret;

--
Best regards,
Vladimir
mailto:thecybershadow gmail.com
Oct 02 2007
On 10/2/07, Vladimir Panteleev <thecybershadow gmail.com> wrote:P.S. I believe your function can be reduced to this without any significant changes in behavior: T[][] ret = new T[][x]; foreach(ref row;ret) row.length = y; return ret;For that matter, I think you could also do:

template MultiDimArray(T,int N)
{
    static if (N == 1) typedef T[] MultiDimArray;
    else typedef MultiDimArray!(T,N-1)[] MultiDimArray;
}

MultiDimArray!(T,N) MakeMultiDimArray(T,int N)(int dim, int[] others...)
{
    static if (N == 1) return new T[dim];
    else
    {
        MultiDimArray!(T,N) t;
        t.length = dim;
        foreach(ref a;t)
            a = MakeMultiDimArray!(T,N-1)(others[0],others[1..$]);
        return t;
    }
}
Oct 02 2007
"Janice Caron" <caron800 googlemail.com> wrote in message news:mailman.353.1191333299.16939.digitalmars-d puremagic.com...On 10/2/07, Vladimir Panteleev <thecybershadow gmail.com> wrote:You've basically implemented, in templates, what you can already do with the language syntax. MultiDimArray!(int, 3) will turn into int[][][], and the functionality of MakeMultiDimArray is already covered by array new-ing, i.e. new int[][][](10, 20, 30). And it's still an array-of-arrays.P.S. I believe your function can be reduced to this without any significant changes in behavior: T[][] ret = new T[][x]; foreach(ref row;ret) row.length = y; return ret;For that matter, I think you could also do: template MultiDimArray(T,int N) { static if (N == 1) typedef T[] MultiDimArray; else typedef MultiDimArray!(T,N-1)[] MultiDimArray; } MultiDimArray!(T,N) MakeMultiDimArray(T,int N)(int dim, int[] others...) { static if (N == 1) return new T[dim]; else { MultiDimArray!(T,N) t; t.length = dim; foreach(ref a;t) a = MakeMultiDimArray!(T,N-1)(others[0],others[1..$]); return t; } }
Oct 02 2007
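[Jarrett's point - that new can build a whole jagged array when given all the dimensions - can be sketched like this. An editorial illustration, not code from the thread:]

```d
// Sketch of built-in array new-ing: passing the dimensions to new
// allocates the entire jagged int[][][] structure in one expression.
void main()
{
    int[][][] a = new int[][][](10, 20, 30);
    assert(a.length == 10);
    assert(a[0].length == 20);
    assert(a[0][0].length == 30);
    a[9][19][29] = 1;
    assert(a[9][19][29] == 1);
}
```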
On 10/2/07, Jarrett Billingsley <kb3ctd2 yahoo.com> wrote:functionality of MakeMultiDimArray is already covered by array new-ing, i.e. new int[][][](10, 20, 30).Ooh! I didn't know you could do that! Nice!
Oct 02 2007
"Janice Caron" <caron800 googlemail.com> wrote in message news:mailman.362.1191349640.16939.digitalmars-d puremagic.com...On 10/2/07, Jarrett Billingsley <kb3ctd2 yahoo.com> wrote::) Actually 'new int[5]' is exactly equivalent to 'new int[](5)' -- the former is sugar for the latter. It's just most people are used to writing it like the latter.functionality of MakeMultiDimArray is already covered by array new-ing, i.e. new int[][][](10, 20, 30).Ooh! I didn't know you could do that! Nice!
Oct 02 2007
Vladimir Panteleev wrote:Thanks, however dynamic arrays are unfortunately slow. To access a random element, you need a dereference, a multiplication+addition, another dereference, another multiplication+addition. Janice Caron's workaround, when modified to use a struct placed in the data segment, requires only one dereferencing/multiply+adding - while a static array placed in the data segment requires only arithmetics to access.If you define an array as: int[10000][] a = new int[10000][100]; only one dereference is required to access the data.
Oct 02 2007
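[Walter's suggestion - a dynamic array whose element type is a fixed-size row - can be sketched as follows. This is an editorial illustration using the allocation syntax from his post, assuming a DMD of that era accepts it; the sizes are his:]

```d
// Sketch of a dynamic array of static rows: the outer array costs one
// dereference, after which a[i][j] is pure address arithmetic because
// every row is a fixed-size int[10000] block.
void main()
{
    int[10000][] a = new int[10000][100]; // 100 rows of 10000 ints, on the heap

    a[3][4] = 2; // one load of the base pointer, then (3*10000 + 4)*4
    assert(a[3][4] == 2);
}
```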
On Tue, 02 Oct 2007 21:48:34 +0300, Walter Bright <newshound1 digitalmars.com> wrote:Vladimir Panteleev wrote:Thanks, however dynamic arrays are unfortunately slow. To access a random element, you need a dereference, a multiplication+addition, another dereference, another multiplication+addition. Janice Caron's workaround, when modified to use a struct placed in the data segment, requires only one dereferencing/multiply+adding - while a static array placed in the data segment requires only arithmetics to access.If you define an array as: int[10000][] a = new int[10000][100]; only one dereference is required to access the data.Yes, only one - when there should be none at all. *sigh*

Note that one thing which the compiler fails to prevent, but which still chokes the linker, is having more than 16 Mb of data or even distinct types (which need initial values, such as structs/classes). Example:

uint[0x800000] a,b,c,d;
void main()
{
}

So - please, Just Fix That Linker ;)

--
Best regards,
Vladimir
mailto:thecybershadow gmail.com
Oct 02 2007
Reply to Vladimir,On Tue, 02 Oct 2007 21:48:34 +0300, Walter Bright <newshound1 digitalmars.com> wrote:Sorry, no can do. Even with stack based static arrays there is a single dereference. An array access always does a dereference. (might you be thinking about indirects?)Vladimir Panteleev wrote:Yes, only one - when there should be none at all. *sigh*Thanks, however dynamic arrays are unfortunately slow. To access a random element, you need a dereference, a multiplication+addition, another dereference, another multiplication+addition. Janice Caron's workaround, when modified to use a struct placed in the data segment, requires only one dereferencing/multiply+adding - while a static array placed in the data segment requires only arithmetics to access.If you define an array as: int[10000][] a = new int[10000][100]; only one dereference is required to access the data.
Oct 02 2007
On Tue, 02 Oct 2007 22:30:25 +0300, BCS <ao pathlink.com> wrote:Sorry, no can do. Even with stack based static arrays there is a single dereference. An array access always does a dereference. (might you be thinking about indirects?)If you're talking about the current situation in D with huge arrays, then it's so (and those don't fit in the stack anyway). If not, well, it's certainly very possible if you just declare the array in the data segment (or with "static" anywhere else). Then, like I said earlier, "its address and the address of every of its elements is predetermined at compile time" - so "a static array placed in the data segment requires only arithmetics to access" ((base address+offset into element) + index*element size). OPTLINK's limitation is around 16 MB of data for the data segment. This also includes initializations for structs, so you can't even have struct types with the gross size of more than 16 MB. Other languages certainly would allow it, and I'm sure that if I would use a hex-edited dmd.exe with another linker - had I known a better DMD-compatible one - everything would work. -- Best regards, Vladimir mailto:thecybershadow gmail.com
Oct 02 2007
Reply to Vladimir,On Tue, 02 Oct 2007 22:30:25 +0300, BCS <ao pathlink.com> wrote: If you're talking about the current situation in D with huge arrays, then it's so (and those don't fit in the stack anyway). If not, well, it's certainly very possible if you just declare the array in the data segment (or with "static" anywhere else). Then, like I said earlier, "its address and the address of every of its elements is predetermined at compile time" - so "a static array placed in the data segment requires only arithmetics to access" ((base address+offset into element) + index*element size). OPTLINK's limitation is around 16 MB of data for the data segment. This also includes initializations for structs, so you can't even have struct types with the gross size of more than 16 MB. Other languages certainly would allow it, and I'm sure that if I would use a hex-edited dmd.exe with another linker - had I known a better DMD-compatible one - everything would work.Unless I'm smoking something, the code to access any array will look the same regardless of where the array is:

get base address
compute offset
add them
*dereference* that address

The only difference is how the base address is found. With static arrays it can be coded as a part of the opcode (I think), with general arrays, it is loaded from a variable, and with an array in the stack frame, it is pulled from the frame pointer*. But in all cases, there is still a dereference.

* BTW, accessing a local variable also does a dereference, just where the offset is known at compile time.
Oct 02 2007
On Wed, 03 Oct 2007 00:51:46 +0300, BCS <ao pathlink.com> wrote:Unless I'm smoking something, the code to access any array will look the same regardless of where the array is.Well, with how I understand this terminology, "dereferencing" means reading an offset from a memory location, other than an immediate instruction operand. Consider:

int i;
i++;

This assembles as:

i dd ?
...
inc i

Assuming i is in the data segment (it has a fixed address), I do not consider this dereferencing, nor would I think anyone should. Then we have:

int* i;
...
(*i)++;

Assuming i is in the data segment (it has a fixed address), this is a "single" dereference. To get the address of the integer we need to read it from a fixed memory location. The same with arrays - except, for the first case, it's a static array, and for the second - it is either a dynamic array, a new-allocated array, etc.

So, how does this boil down in performance? Let me show you:

int[0x4000000] bigArr;
...
int readValue(int index)
{
    return bigArr[index];
}

With -release -O, this assembles to:

readValue proc near
    mov eax, ds:bigArr[eax*4]
    retn
readValue endp

Now, if bigArr would be a dynamic array or referenced through a pointer, the resulting code would look like this:

readValue proc near
    mov ecx, ds:bigArr
    mov eax, [ecx+eax*4]
    retn
readValue endp

It gets worse if there are further nests, e.g. when using multidimensional arrays.

That doesn't sound like a big deal, however:

1) when the array is in the data segment, we know the address of every one of its elements - so random access is faster
2) when using multi-dimensional dynamic arrays, there are more casts
3) rectangular static arrays are much faster when you want to access it by column - you just add (width*element_size) to the pointer, and go to the element on the next row
4) when you want to access an address inside the element (you have an array of structs), you must perform an addition after the multiplication (I think it can be part of the instruction in the newer instruction sets though) - while, when the array base address is predetermined, you can just use base_address + element_offset as the base address, then add index*element_size to that

etc.

--
Best regards,
Vladimir
mailto:thecybershadow gmail.com
Oct 02 2007
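[Point 3) of Vladimir's list - walking a rectangular array by column using only pointer arithmetic - can be sketched like this. An editorial illustration, not code from the thread; the function name and sizes are invented:]

```d
// Sketch of column access by stride: starting from the column's first
// element, adding the row stride (width elements) steps straight down
// the column with no per-element multiplication.
const int width  = 1000;
const int height = 1000;

int columnSum(int[] flat, int column)
{
    int sum = 0;
    int* p = &flat[column];
    for (int row = 0; row < height; row++)
    {
        sum += *p;
        p += width; // advance one full row
    }
    return sum;
}

void main()
{
    int[] flat = new int[width * height];
    flat[2] = 7;          // row 0, column 2
    flat[width + 2] = 5;  // row 1, column 2
    assert(columnSum(flat, 2) == 12);
}
```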
Vladimir Panteleev wrote:That doesn't sound like a big deal, however: 1) when the array is in the data segment, we know the address of every one of its elements - so random access is fasterYou might be surprised. Intel has done a great deal of work optimizing and pipelining addressing modes. Furthermore, one would probably access such an array repeatedly, meaning the pointer to it would be cached in a register. I doubt you'll see a dime's worth of difference. In any case, as has been shown time and again, where one thinks the time is spent in a program is always wrong. Even the experts are always wrong. Use a profiler!2) when using multi-dimensional dynamic arrays, there are more castsint[1000][] a; ... a[3][4] = 2; involves no casts.3) rectangular static arrays are much faster when you want to access it by column - you just add (width*element_size) to the pointer, and go to the element on the next rowIt's the same for int[1000][]4) when you want to access an address inside the element (you have an array of structs), you must perform an addition after the multiplication (I think it can be part of the instruction in the newer instruction sets though) - while, when the array base address is predetermined, you can just use base_address + element_offset as the base address, then add index*element_size to that etc.Using the address mode offset[reg1][reg2*size] mode is not any slower than using offset[reg2*size].
Oct 02 2007
On Wed, 03 Oct 2007 02:09:56 +0300, Walter Bright <newshound1 digitalmars.com> wrote:Vladimir Panteleev wrote:That doesn't sound like a big deal, however: 1) when the array is in the data segment, we know the address of every one of its elements - so random access is fasterYou might be surprised. Intel has done a great deal of work optimizing and pipelining addressing modes. Furthermore, one would probably access such an array repeatedly, meaning the pointer to it would be cached in a register. I doubt you'll see a dime's worth of difference. In any case, as has been shown time and again, where one thinks the time is spent in a program is always wrong. Even the experts are always wrong. Use a profiler!I will, if I won't be satisfied by the performance :)

Erm... I'm sure I wanted to write something other than "casts" there, but now I even forgot what I wanted to say originally. I think I meant that when you have imbricated dynamic arrays, there are even more dereferences.2) when using multi-dimensional dynamic arrays, there are more castsint[1000][] a; ... a[3][4] = 2; involves no casts.Right. I always overlook that case!3) rectangular static arrays are much faster when you want to access it by column - you just add (width*element_size) to the pointer, and go to the element on the next rowIt's the same for int[1000][]Yes... that's the one I mentioned (though I'm used to see it as [reg+reg2*size+offset]). Thanks for replying.

I wonder how hard it would be to write - or, at least, start a community project of - a Windows PE linker written from scratch - in D, of course :)

--
Best regards,
Vladimir
mailto:thecybershadow gmail.com4) when you want to access an address inside the element (you have an array of structs), you must perform an addition after the multiplication (I think it can be part of the instruction in the newer instruction sets though) - while, when the array base address is predetermined, you can just use base_address + element_offset as the base address, then add index*element_size to that etc.Using the address mode offset[reg1][reg2*size] mode is not any slower than using offset[reg2*size].
Oct 02 2007
Vladimir Panteleev wrote:On Wed, 03 Oct 2007 02:09:56 +0300, Walter Bright <newshound1 digitalmars.com> wrote: Yes... that's the one I mentioned (though I'm used to see it as [reg+reg2*size+offset]).Thanks for replying. I wonder how hard would it be to write - or, at least, start a community project of a Windows PE linker written from scratch - in D, of course :)For anyone whose first inclination is to hex-edit binaries when confronted with a minor annoyance, I'm guessing probably not that hard. ;-) --bb
Oct 02 2007
Reply to Vladimir,On Wed, 03 Oct 2007 00:51:46 +0300, BCS <ao pathlink.com> wrote:[...] I think we are referring to the same thing, but counting different parts of it. What I'm counting is the times that a load is done from an address that is not an inline value, but comes from a register.Unless I'm smoking something, the code to access any array will look the same regardless of where the array is.Well, with how I understand this terminology, "dereferencing" means reading an offset from a memory location, other than an immediate instruction operand. Consider:readValue proc near mov eax, ds:bigArr[eax*4]If I'm reading this correctly, what is happening here is "load address (bigArr + eax*4)". Unless IA32 has a special-case op for that, you're going to have to compute the address in another op, and even if there is a special case, I rather suspect that it will be exactly as fast either way. Furthermore, if more than one dimension is used you will have to do the computation in other ops no matter what.retn readValue endpBTW where did you get that ASM dump? I don't recognize the format.readValue proc near mov ecx, ds:bigArr mov eax, [ecx+eax*4] retn readValue endp It gets worse if there are further nests, e.g. when using multidimensional arrays.unless you are nesting dynamic arrays it's just math, not dereferences (i1 + d1*i2 + d2*d1*i3 ...)

But it looks like we agree on that point.3) rectangular static arrays are much faster when you want to access it by column - you just add (width*element_size) to the pointer, and go to the element on the next row this can be done with a dynamic array of static sized arrays.4) when you want to access an address inside the element (you have an array of structs), you must perform an addition after the multiplication (I think it can be part of the instruction in the newer instruction sets though) - while, when the array base address is predetermined, you can just use base_address + element_offset as the base address, then add index*element_size to thatIt looks like you are using a (constA + constB * reg) address mode. I just looked in the "Intel Architecture Software Developer's Manual" and didn't find any reference to it. Am I missing something? (I didn't actually look real hard)etc.
Oct 02 2007
On Wed, 03 Oct 2007 02:31:47 +0300, BCS <ao pathlink.com> wrote:I think we are referring to the same thing, but counting different parts of it. What I'm counting is the times that a load is done from an address that is not an inline value, but comes from a register.OK, so in the first array example that makes one load (only the value we need), and in the second - two (one to get the address of the array data, and another to get the value).No, that gets the value directly and puts it in eax. To get the address I would use the "lea" instruction instead of "mov" (which is useful if you want to read and then write to the address).readValue proc near mov eax, ds:bigArr[eax*4]If I'm reading this correctly what is happening here is "load address (bigArr + eax*4)unless IA32 has a special case op for that, your going to have to compute the address in another op and even if there is a special case, I rather suspect that it will be exactly as fast either way. Furthermore, if more than one detention is used you will have to do the computation in other ops no mater what.I posted the disassembly listing of the D code I supplied, so this code works (there are no missing instructions).The demo version of Interactive Disassembler[1]. It's quite useful for a Windows D programmer's arsenal, since it understands D symbolic information. (They're loaded mangled, so I "demangled" them manually.)retn readValue endpBTW where did you get that ASM dump? I don't recognize the format.Yes, I meant dynamic arrays (or pointers, for the first "link")...readValue proc near mov ecx, ds:bigArr mov eax, [ecx+eax*4] retn readValue endp It gets worse if there are further nests, e.g. when using multidimensional arrays.unless you are nesting dynamic arrays it's just math, not dereferences (i1 + d1*i2 + d2*d1*i3 ...)Yeah. 
Forgot about those (again).3) rectangular static arrays are much faster when you want to access it by column - you just add (width*element_size) to the pointer, and go to the element on the next rowthis can be done with a dynamic array of static sized arrays.ConstB must be a power of 2 - and I don't think I can find a link more easily than you (since I don't use any online - or any other, actually - reference). As WB mentioned in another reply, there's also a [reg1 + reg2*powerOfTwo + offset] addressing mode. -- Best regards, Vladimir mailto:thecybershadow gmail.com4) when you want to access an address inside the element (you have an array of structs), you must perform an addition after the multiplication (I think it can be part of the instruction in the newer instruction sets though) - while, when the array base address is predetermined, you can just use base_address + element_offset as the base address, then add index*element_size to thatIt looks like you are using a (constA + Constb * reg) address mode. I just look in the "Intel Architecture Software Developer's Manual" and didn't find any reference to it. Am I missing something? (I didn't actual look real hard)
Oct 02 2007
Reply to Vladimir,On Wed, 03 Oct 2007 02:31:47 +0300, BCS <ao pathlink.com> wrote:yes, assuming that you have to do some pointer work to get the array (if it's a static variable then you can do a "move from inline defined address")I think we are referring to the same thing, but counting different parts of it. What I'm counting is the times that a load is done from an address that is not an inline value, but comes from a register.OK, so in the first array example that makes one load (only the value we need), and in the second - two (one to get the address of the array data, and another to get the value).Baahhh. I need to be more careful with what I actually say; that should have been "load from address"No, that gets the value directly and puts it in eax. To get the address I would use the "lea" instruction instead of "mov" (which is useful if you want to read and then write to the address).readValue proc near mov eax, ds:bigArr[eax*4]If I'm reading this correctly what is happening here is "load address (bigArr + eax*4)[...]unless IA32 has a special case op for that, you're going to have to compute the address in another op and even if there is a special case, I rather suspect that it will be exactly as fast either way.
Furthermore, if more than one dimension is used you will have to do the computation in other ops no matter what.I posted the disassembly listing of the D code I supplied, so this code works (there are no missing instructions).sweet

BTW I find these kinda handy:
http://www.intel.com/design/processor/manuals/253665.pdf Volume 1: Basic Architecture
http://www.intel.com/design/processor/manuals/253666.pdf Volume 2A: Instruction Set Reference, A-M
http://www.intel.com/design/processor/manuals/253667.pdf Volume 2B: Instruction Set Reference, N-Z
found here http://www.intel.com/products/processor/manuals/index.htm

In the end I rather suspect that what really matters is how many times you read from memory, not how you got the address to read from.It looks like you are using a (constA + constB * reg) address mode. I just looked in the "Intel Architecture Software Developer's Manual" and didn't find any reference to it. Am I missing something? (I didn't actually look real hard)ConstB must be a power of 2 - and I don't think I can find a link more easily than you (since I don't use any online - or any other, actually - reference). As WB mentioned in another reply, there's also a [reg1 + reg2*powerOfTwo + offset] addressing mode.
Oct 02 2007
Reply to Vladimir,

> The demo version of Interactive Disassembler[1]. It's quite useful for
> a Windows D programmer's arsenal, since it understands D symbolic
> information. (They're loaded mangled, so I "demangled" them manually.)

link?
Oct 02 2007
On Wed, 03 Oct 2007 03:33:28 +0300, BCS <ao pathlink.com> wrote:

> Reply to Vladimir,
>
>> The demo version of Interactive Disassembler[1]. It's quite useful for
>> a Windows D programmer's arsenal, since it understands D symbolic
>> information. (They're loaded mangled, so I "demangled" them manually.)
>
> link?

Oops.

[1] http://datarescue.com/idabase/idadowndemo.htm

--
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com
Oct 03 2007
Vladimir Panteleev wrote:

>> For accessing a big array like that, it is hard to see how a single
>> dereference through a register is going to affect the run time at all.
>
> Yes, only one - when there should be none at all. *sigh*
>
> Note that one thing which the compiler fails to prevent but still
> chokes the linker is having more than 16 Mb of data, or even distinct
> types (which need initial values, such as structs/classes). Example:
>
>   uint[0x800000] a,b,c,d;
>   void main() { }
>
> So - please, Just Fix That Linker ;)

Easier said than done :-(
Oct 02 2007
"Walter Bright" <newshound1 digitalmars.com> wrote in message
news:fduing$trk$1 digitalmars.com...

>> So - please, Just Fix That Linker ;)
>
> Easier said than done :-(

Then Make That Backend Generate Some Other Format Besides OMF (Like ELF,
Oh Wait, Doesn't DMD On Linux Already Do That?), or something,
_anything_, since this whole OPTLINK thing is having one suboptimal
consequence after another. I think it's time to let the poor thing die.

Yes, I am _actually_ advocating the development of a different backend,
or linker, or object format etc. over the release of D2.0. Yes, I am
actually more interested in that.
Oct 02 2007
Jarrett Billingsley wrote:

> Then Make That Backend Generate Some Other Format Besides OMF (Like
> ELF, Oh Wait, Doesn't DMD On Linux Already Do That?), or something,
> _anything_, since this whole OPTLINK thing is having one suboptimal
> consequence after another. I think it's time to let the poor thing die.
>
> Yes, I am _actually_ advocating the development of a different backend,
> or linker, or object format etc. over the release of D2.0. Yes, I am
> actually more interested in that.

There's GDC. Walter could make that the default branch (and actually
mentioned that as a possibility at the conference). Though one of the
benefits of D is a fast compiler, and I don't know how much faster gdc
is compared to, say, gcc.
Oct 02 2007
Christopher Wright wrote:

> There's GDC. Walter could make that the default branch (and actually
> mentioned that as a possibility at the conference). Though one of the
> benefits of D is a fast compiler, and I don't know how much faster gdc
> is compared to, say, gcc.

It usually clocks in somewhere in between gcc and g++, but I'm sure that
says more about the language complexity than about the compilers...

Not sure how LLVM does for speed, as I've only used it through GCC. But
I think the speed-above-everything DMD and OPTLINK still "win".

--anders
Feb 20 2008
On Wed, 20 Feb 2008 11:30:40 +0300, Anders F Björklund
<afb algonet.se> wrote:

> It usually clocks in somewhere in between gcc and g++, but I'm sure
> that says more about the language complexity than about the
> compilers...
>
> Not sure how LLVM does for speed, as I've only used it through GCC. But
> I think the speed-above-everything DMD and OPTLINK still "win".

It is somewhat outdated, but still:
http://www.digitalmars.com/d/1.0/cppstrings.html

There is a comparison at the end.
Feb 20 2008
Jarrett Billingsley wrote:

> Then Make That Backend Generate Some Other Format Besides OMF (Like
> ELF, Oh Wait, Doesn't DMD On Linux Already Do That?), or something,
> _anything_, since this whole OPTLINK thing is having one suboptimal
> consequence after another. I think it's time to let the poor thing die.
>
> Yes, I am _actually_ advocating the development of a different backend,
> or linker, or object format etc. over the release of D2.0. Yes, I am
> actually more interested in that.

It's not that bad, especially not this particular issue, which has a
painless workaround.
Oct 02 2007
Reply to Vladimir,

> Why?
>
> I posted a bug report yesterday, thinking it was just some accidentally
> left over code. But I noticed this statement in the documentation
> today. And, frankly, I don't understand why that limitation is there.
>
> The problem with this restriction comes when using multi-dimensional
> arrays. Branched dynamic arrays are MUCH slower than
> rectangular/cubic/etc. static arrays - and this is a severe limitation
> for anyone wanting to work with large amounts of data (e.g. game maps).
> Of course I could work around this by calculating the positions
> manually using x*width+y, but I'd like to know the reason why this
> limitation was put there in the first place, and what's the reasoning
> behind it.

just new the array.
Oct 02 2007
On Tue, 02 Oct 2007 10:16:25 +0300, BCS <ao pathlink.com> wrote:

> Reply to Vladimir,
>
>> Why?
>>
>> I posted a bug report yesterday, thinking it was just some
>> accidentally left over code. But I noticed this statement in the
>> documentation today. And, frankly, I don't understand why that
>> limitation is there.
>>
>> The problem with this restriction comes when using multi-dimensional
>> arrays. Branched dynamic arrays are MUCH slower than
>> rectangular/cubic/etc. static arrays - and this is a severe limitation
>> for anyone wanting to work with large amounts of data (e.g. game
>> maps). Of course I could work around this by calculating the positions
>> manually using x*width+y, but I'd like to know the reason why this
>> limitation was put there in the first place, and what's the reasoning
>> behind it.
>
> just new the array.

If you mean that I do something like:

  int[4096][4096]* arr = (new int[4096][4096][1]).ptr;

1) the compiler still (normally) doesn't let me do it
2) it's slower for random access due to having to dereference a pointer.

But, yes, this is the sanest workaround so far.

--
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com
Oct 02 2007
On 10/2/07, Vladimir Panteleev <thecybershadow gmail.com> wrote:

> If you mean that I do something like:
>
>   int[4096][4096]* arr = (new int[4096][4096][1]).ptr;

Why not just

  class Array2D(T)
  {
      this(int width, int height)
      {
          this.width = width;
          a = new T[width * height];
      }

      T opIndex(int x, int y) { return a[x * width + y]; }
      void opIndexAssign(T n, int x, int y) { a[x * width + y] = n; }

      T[] a;
      int width;
  }

Then

  arr = new Array2D(4096,4096);
Oct 02 2007
On 10/2/07, Janice Caron <caron800 googlemail.com> wrote:

> Then
>
>   arr = new Array2D(4096,4096);

Sorry. That should be

  arr = new Array!(int)(4096,4096);

then you just refer to the elements as arr[x,y] as you'd expect.
Oct 02 2007
On 10/2/07, Janice Caron <caron800 googlemail.com> wrote:

> Sorry. That should be
>
>   arr = new Array!(int)(4096,4096);

Gaah! That's too many typos for one morning!

  arr = new Array2D!(int)(4096,4096);

Anyway, hopefully you get the idea. I'm going for more coffee before I
mistype anything else!
Oct 02 2007
Janice Caron Wrote:

> Anyway, hopefully you get the idea. I'm going for more coffee before I
> mistype anything else!

Thanks - this indeed makes the code more readable; however, accessing an
element now involves two dereferences (the class, and the array pointer)
- and, possibly, a function call if the compiler won't inline it. My
main concern is speed, otherwise I wouldn't be complaining and would
just use dynamic arrays :P

-- Vladimir
Oct 02 2007
On 10/2/07, Vladimir Panteleev <thecybershadow gmail.com> wrote:

> Thanks - this indeed makes the code more readable, however now
> accessing an element involves two dereferences (the class, and the
> array pointer) - and, possibly, a function call if the compiler won't
> inline it.

It doesn't have to be a class. It could just as easily be a struct.
Then you've only got one dereference.

I would hope that the compiler would inline that. Maybe not in debug
mode, but certainly in release mode.
Oct 02 2007
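[Editor's note: a struct version of the hypothetical Array2D wrapper discussed above might look like the following sketch - it is not code from the thread, and the names mirror Janice's made-up example.]

```d
struct Array2D(T)
{
    T[] a;      // one flat, contiguous allocation
    int width;

    // struct "constructor" via static opCall (D1 idiom)
    static Array2D opCall(int width, int height)
    {
        Array2D r;
        r.width = width;
        r.a = new T[width * height];
        return r;
    }

    // arr[x, y] reads and writes map onto the flat array
    T opIndex(int x, int y) { return a[x * width + y]; }
    void opIndexAssign(T n, int x, int y) { a[x * width + y] = n; }
}

void main()
{
    auto arr = Array2D!(int)(4096, 4096);
    arr[1, 2] = 3;
    assert(arr[1, 2] == 3);
}
```

Since a struct is a value type, `arr` holds the slice directly, so element access costs only the one dereference into the array data.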
Reply to Vladimir,

> If you mean that I do something like:
>
>   int[4096][4096]* arr = (new int[4096][4096][1]).ptr;
>
> 1) the compiler still (normally) doesn't let me do it
> 2) it's slower for random access due to having to dereference a
> pointer.
>
> But, yes, this is the sanest workaround so far.

I was thinking of this:

  alias int[4096] block; // alias to be sure I get a dynamic array of static arrays
  block[] arr = new block[4096];
Oct 02 2007
On Tue, 02 Oct 2007 18:12:41 +0300, BCS <ao pathlink.com> wrote:

> I was thinking of this:
>
>   alias int[4096] block; // alias to be sure I get a dynamic array of
>   static arrays
>   block[] arr = new block[4096];

Why haven't I thought of that... :O

--
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com
Oct 02 2007
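[Editor's note: putting BCS's alias trick together, a quick sketch of how such an array is used. This is illustrative only; the key property is that `new block[4096]` is a single allocation of static-sized rows, so the layout is rectangular and contiguous, with no per-row pointer to chase.]

```d
alias int[4096] block;   // one static-sized row

void main()
{
    // dynamic array of static arrays: one contiguous 64 MB allocation,
    // sidestepping the 16 Mb static array limit
    block[] arr = new block[4096];

    // arr[row][col] needs only the slice's data pointer plus
    // scaled index arithmetic - no jagged-array indirection
    arr[10][20] = 42;
    assert(arr[10][20] == 42);
}
```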