
digitalmars.D.learn - miscellaneous array questions...

WhatMeWorry <kheaser gmail.com> writes:
1) The D Language Reference says:

"There are four kinds of arrays..." with the first example being
"type*  Pointers to data" and "int* p;", etc.

At the risk of sounding overly nitpicky, isn't a pointer to an 
integer simply a pointer to an integer?  How does that pertain to 
an array?


2) "The total size of a static array cannot exceed 16Mb". What 
limits this? And with modern systems of 16GB and 32GB, isn't 16Mb 
excessively small? (As an aside: shouldn't that be 16MB in the 
reference instead of 16Mb? That is, doesn't b = bits and B = 
bytes?)


3) Lastly, in the following code snippet, are arrayA and arrayB 
both allocated on the stack? And how do their scopes and/or 
lifetimes differ?

==== module1 =====
int[100] arrayA;
void main()
{
     int[100] arrayB;
     // ...
}
==== module1 =====
Jul 20 2020
<a a.com> writes:
On Monday, 20 July 2020 at 22:05:35 UTC, WhatMeWorry wrote:
 1) The D Language Reference says:

 "There are four kinds of arrays..." with the first example being
 "type*  Pointers to data" and "int* p;", etc.

 At the risk of sounding overly nitpicky, isn't a pointer to an 
 integer simply a pointer to an integer?  How does that pertain 
 to an array?


 2) "The total size of a static array cannot exceed 16Mb". What 
 limits this? And with modern systems of 16GB and 32GB, isn't 
 16Mb excessively small? (As an aside: shouldn't that be 16MB in 
 the reference instead of 16Mb? That is, doesn't b = bits and B 
 = bytes?)


 3) Lastly, in the following code snippet, are arrayA and arrayB 
 both allocated on the stack? And how do their scopes and/or 
 lifetimes differ?

 ==== module1 =====
 int[100] arrayA;
 void main()
 {
     int[100] arrayB;
     // ...
 }
 ==== module1 =====
1) Pointers can be used as arrays with the [] operator: given `int* p = arrayA.ptr;`, the expressions `*(p + 99)` and `p[99]` access the same element, so `assert(*(p + 99) == p[99]);` holds. See http://ddili.org/ders/d.en/pointers.html ("Using pointers with the array indexing operator []").

2) I've encountered this problem too. It's arbitrary AFAIK, but it can be circumvented with dynamic arrays.
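A minimal D sketch of point 1, reusing the thread's `arrayA`. The helper `asSlice` is a hypothetical name added for illustration. Indexing or slicing a raw pointer is unchecked and only allowed in @system code, so the caller has to know the length.

```d
import std.stdio : writeln;

int[100] arrayA;

/// Hypothetical helper: rebuild a bounds-checked slice from a pointer
/// and a length. The caller must guarantee `p` points at `n` valid ints.
int[] asSlice(int* p, size_t n)
{
    return p[0 .. n];
}

void main()
{
    foreach (i, ref e; arrayA)
        e = cast(int) i * 2;        // 0, 2, 4, ..., 198

    int* p = arrayA.ptr;            // pointer to the first element
    assert(*(p + 99) == p[99]);     // pointer arithmetic and indexing agree
    assert(p[99] == arrayA[99]);    // ...and name the same element

    int[] s = asSlice(p, arrayA.length);
    assert(s[99] == 198);           // unlike p, s carries its length
    writeln(p[10]);                 // 20
}
```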
Jul 20 2020
Ali Çehreli <acehreli yahoo.com> writes:
On 7/20/20 8:16 PM, a a.com wrote:

 3) Lastly, in the following code snippet, are arrayA and arrayB both
 allocated on the stack?
arrayA is allocated in thread-local storage and lives as long as the program is active. I guess a final interaction with it can be in a 'static ~this()' or a 'shared static ~this()' block.

Note that this is different from e.g. C++: in that language, arrayA would be a "global" variable and there would be a single instance of it. In D, there will be as many arrayA variables as there are active threads. (One thread's modification to its own arrayA is not seen by other threads.)

arrayB is allocated on the stack and lives as long as the scope that it is defined inside. That scope is main's body in your code.
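A small sketch of the thread-local behaviour described above: a module-level variable gets a fresh copy in every thread, while a `__gshared` one behaves like a C/C++ global. `arrayG` and `firstElementsSeenByNewThread` are hypothetical names; the sketch assumes druntime's `core.thread.Thread`.

```d
import core.thread : Thread;
import std.stdio : writeln;

int[100] arrayA;            // module-level: thread-local by default in D
__gshared int[100] arrayG;  // C/C++-style global: a single instance

/// Reads arrayA[0] and arrayG[0] as seen from a brand-new thread.
int[2] firstElementsSeenByNewThread()
{
    int[2] seen;
    auto t = new Thread({
        seen[0] = arrayA[0];  // this thread's own, freshly initialized copy
        seen[1] = arrayG[0];  // the one shared instance
        arrayA[0] = 42;       // changes only this thread's copy
    });
    t.start();
    t.join();                 // rethrows anything the thread threw
    return seen;
}

void main()
{
    arrayA[0] = 1;
    arrayG[0] = 1;
    auto seen = firstElementsSeenByNewThread();
    writeln(seen);            // [0, 1]
    assert(arrayA[0] == 1);   // the main thread's copy was not touched
}
```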
 And how do their scopes and/or lifetimes
 differ?

 ==== module1 =====
 int[100] arrayA;
 void main()
 {
     int[100] arrayB;
     // ...
 }
 ==== module1 =====
Ali
Jul 20 2020
IGotD- <nise nise.com> writes:
On Monday, 20 July 2020 at 22:05:35 UTC, WhatMeWorry wrote:
 2) "The total size of a static array cannot exceed 16Mb". What 
 limits this? And with modern systems of 16GB and 32GB, isn't 
 16Mb excessively small? (As an aside: shouldn't that be 16MB in 
 the reference instead of 16Mb? That is, doesn't b = bits and B 
 = bytes?)
I didn't know this, but it makes sense, and I guess this is a constraint of the D language itself. In practice 16MB should be more than enough for most cases. I'm not sure where 16MB comes from, whether there is an OS out there with this limitation or whether it was just chosen as an adequate limit.

Let's say you have a program with 4 threads; then suddenly the TLS area is 4 * 16MB = 64MB. This size rapidly increases with the number of threads and the TLS area size. Say a TLS area of 128MB and 8 threads: that gives you a memory consumption of 1GB. That's how quickly it starts to consume memory if you don't limit the TLS variables.

If you want global variables like in good old C/C++, then use __gshared. Of course, you then have to take care of concurrent accesses from several threads.
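A minimal sketch of the `__gshared` route and the care it requires: several threads increment one shared counter, and a `synchronized` statement without an expression (which uses one lock per such statement) keeps the increments from racing. `counter` and `hammer` are hypothetical names; a real program might prefer `shared` data with `core.atomic` instead.

```d
import core.thread : Thread;
import std.stdio : writeln;

__gshared long counter;   // one instance, visible to every thread

/// Starts `nThreads` threads that each increment `counter` `perThread` times.
void hammer(int nThreads, int perThread)
{
    Thread[] threads;
    foreach (i; 0 .. nThreads)
    {
        threads ~= new Thread({
            foreach (j; 0 .. perThread)
            {
                // Without the lock, ++counter would be a data race.
                synchronized { ++counter; }
            }
        });
    }
    foreach (t; threads)
        t.start();
    foreach (t; threads)
        t.join();
}

void main()
{
    counter = 0;
    hammer(4, 10_000);
    writeln(counter);     // 40000
}
```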
Jul 21 2020
Steven Schveighoffer <schveiguy gmail.com> writes:
On 7/21/20 7:10 AM, IGotD- wrote:
 On Monday, 20 July 2020 at 22:05:35 UTC, WhatMeWorry wrote:
 2) "The total size of a static array cannot exceed 16Mb". What limits 
 this? And with modern systems of 16GB and 32GB, isn't 16Mb excessively 
 small? (As an aside: shouldn't that be 16MB in the reference instead of 
 16Mb? That is, doesn't b = bits and B = bytes?)
 I didn't know this, but it makes sense, and I guess this is a constraint of the D language itself. In practice 16MB should be more than enough for most cases. I'm not sure where 16MB comes from, whether there is an OS out there with this limitation or whether it was just chosen as an adequate limit.
I believe it stems from a limitation in the way the stacks are allocated? Or maybe a limitation in DMC, the basis for DMD.

Also, you CAN actually have larger arrays, they just cannot be put on the stack (which is where most static arrays end up):

---
struct S {
    ubyte[17_000_000] big;
}

void main() {
    auto s = new S; // ok
    S s2;           // crash (signal 11 on run.dlang.io)
}
---

This may not work if `big` had a static initializer, I'm not sure.

-Steve
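The heap route above, plus the "dynamic arrays" workaround mentioned earlier in the thread, as a runnable sketch. It assumes, as in Steven's run.dlang.io test, that the compiler accepts a 17_000_000-byte member; `makeOnHeap` and `makeBuffer` are hypothetical helper names.

```d
import std.stdio : writeln;

struct S
{
    ubyte[17_000_000] big;   // larger than the documented 16 MiB limit
}

/// Allocates S on the GC heap, so `big` never touches the stack.
S* makeOnHeap()
{
    auto s = new S;
    s.big[$ - 1] = 1;
    return s;
}

/// The dynamic-array alternative: the size may even be a run-time value.
ubyte[] makeBuffer(size_t n)
{
    return new ubyte[](n);
}

void main()
{
    auto s = makeOnHeap();
    auto buf = makeBuffer(17_000_000);
    writeln(s.big.length, " ", buf.length); // 17000000 17000000
}
```

The dynamic array is usually the more flexible choice, since its length does not have to be a compile-time constant.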
Jul 21 2020
wjoe <invalid example.com> writes:
On Monday, 20 July 2020 at 22:05:35 UTC, WhatMeWorry wrote:
 2) "The total size of a static array cannot exceed 16Mb". What 
 limits this? And with modern systems of 16GB and 32GB, isn't 
 16Mb excessively small? (As an aside: shouldn't that be 16MB in 
 the reference instead of 16Mb? That is, doesn't b = bits and B 
 = bytes?)
Static arrays are passed by value.

(Also, I think you're right about Mb vs MB, except it should be MiB: 1MB = 1000^2 bytes (decimal) and 1MiB = 1024^2 bytes (binary). Note that JEDEC 100B.01 defines MB as 1024^2, but IMO the ISO standard is superior because it's unambiguous, and JEDEC only defines units up to GB (inclusive).)
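To illustrate both remarks: a static array parameter receives a full copy of the array (for a large array, that copy typically has to go on the stack), whereas a slice only passes a pointer and a length. The sketch also spells out the MB/MiB arithmetic; the function and constant names are hypothetical.

```d
import std.stdio : writeln;

enum MiB = 1024 * 1024;   // binary mebibyte
enum MB  = 1000 * 1000;   // decimal megabyte
static assert(16 * MiB == 16_777_216);

/// Static array parameter: the callee works on its own copy.
void zeroByValue(int[4] a) { a[] = 0; }

/// Slice parameter: the callee sees the caller's memory.
void zeroBySlice(int[] a) { a[] = 0; }

void main()
{
    int[4] x = [1, 2, 3, 4];
    zeroByValue(x);
    writeln(x);                   // [1, 2, 3, 4]
    zeroBySlice(x[]);
    writeln(x);                   // [0, 0, 0, 0]
    writeln(16 * MiB - 16 * MB);  // 777216
}
```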
Jul 21 2020
Adam D. Ruppe <destructionator gmail.com> writes:
On Monday, 20 July 2020 at 22:05:35 UTC, WhatMeWorry wrote:
 How does that pertain to an array?
C arrays work as pointers to the first element and D can use that style too.
 2) "The total size of a static array cannot exceed 16Mb" What 
 limits this?
The others aren't wrong about stack size limits playing some role, but the primary reason is that it is a weird hack for @safe, believe it or not. The idea is:

---
class A {
    ubyte[4_000_000_000] whole_system;
}

@safe void lol() {
    A a; // null reference
    a.whole_system[any_address] = whatever; // placeholders
}
---

With the null `a`, the offset to the static array is just 0 + whatever and the @safe mechanism can't trace that.

So the arbitrary limit was put in place to make it more likely that such a situation will hit a protected page and segfault instead of carrying on. (Most low addresses are not actually allocated by the OS... though there's no reason why they couldn't be, it just usually doesn't happen, so that 16 MB limit makes the odds of something like this actually happening a lot lower.)

I don't recall exactly when this was discussed, but it came up in the earlier days of @safe; I'm pretty sure it worked before then.
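The address arithmetic behind this can be checked without crashing anything by asking the compiler for field offsets. This sketch uses a struct instead of a class for simplicity (a class object starts with a hidden header, which shifts the offsets by a few bytes); `Big` and `addressViaNull` are hypothetical names.

```d
import std.stdio : writeln;

struct Big
{
    ubyte[1_000_000] payload;
    int after;                // starts 1,000,000 bytes into Big
}

/// Address touched by `p.payload[index]` when `p` is null: 0 + offset + index.
size_t addressViaNull(size_t index)
{
    return 0 + Big.payload.offsetof + index;
}

void main()
{
    writeln(Big.after.offsetof);       // 1000000
    writeln(addressViaNull(999_999));  // 999999, well past a typical 4 KiB page
}
```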
Jul 21 2020
IGotD- <nise nise.com> writes:
On Tuesday, 21 July 2020 at 12:34:14 UTC, Adam D. Ruppe wrote:
 With the null `a`, the offset to the static array is just 0 + 
 whatever and the @safe mechanism can't trace that.

 So the arbitrary limit was put in place to make it more likely 
 that such a situation will hit a protected page and segfault 
 instead of carrying on. (most low addresses are not actually 
 allocated by the OS... though there's no reason why they 
 couldn't, it just usually doesn't, so that 16 MB limit makes 
 the odds of something like this actually happening a lot lower)

 I don't recall exactly when this was discussed but it came up 
 in the earlier days of @safe, I'm pretty sure it worked before 
 then.
If that's the case, I would consider this 16MB limit unnecessary. Most operating systems put a guard page at the very bottom of the stack (the stack itself is usually 1MB to 8MB; 1MB is the Windows default and 8MB is common on Linux). Either the array will hit that page during initialization, or something else will during execution. Let's say someone puts a 15MB array on the stack: then we will get a page fault anyway, and this artificial limit is there for nothing. With 64-bit address spaces, some future crazy operating system might support large stack sizes like 256MB. This is a little like a 640kB limit.
Jul 21 2020
Adam D. Ruppe <destructionator gmail.com> writes:
On Tuesday, 21 July 2020 at 13:16:44 UTC, IGotD- wrote:
 Either the array will hit that page during initialization or 
 something else during the execution.
But the array isn't initialized in the justification scenario. It is accessed through a null pointer, and the type system thinks it is fine because it is still inside the static limit. At run time, the CPU just sees access to memory address 0 + x, and if x is sufficiently large, it can bypass those guard pages.
Jul 21 2020
IGotD- <nise nise.com> writes:
On Tuesday, 21 July 2020 at 13:23:32 UTC, Adam D. Ruppe wrote:
 But the array isn't initialized in the justification scenario. 
 It is accessed through a null pointer and the type system 
 thinks it is fine because it is still inside the static limit.

 At run time, the CPU just sees access to memory address 0 + x, 
 and if x is sufficiently large, it can bypass those guard pages.
I'm not that convinced. This totally depends on what the virtual memory layout of the process looks like. Some operating systems might leave a gap between 0 and 16MB, but others don't. This can also change between versions of the OS, and it becomes even more uncertain as address space randomization gets more popular. Safety based on assumptions isn't really worth much. I don't personally care about the 16MB limit, as I don't expect to need such large static arrays in the foreseeable future, but the motivation for it is kind of vague.
Jul 21 2020
Steven Schveighoffer <schveiguy gmail.com> writes:
On 7/21/20 8:34 AM, Adam D. Ruppe wrote:

 The others aren't wrong about stack size limits playing some role, but 
 the primary reason is that it is a weird hack for @safe, believe it or not.
...
 I don't recall exactly when this was discussed but it came up in the 
 earlier days of @safe, I'm pretty sure it worked before then.
I think this was discussed, but it was not the reason for the limitation. The limitation exists even in D1, which predates @safe: https://digitalmars.com/d/1.0/arrays.html#static-arrays

I have stressed before that any access through a pointer to a large object in @safe code should also check that the base of the object is not within the null page (this is not currently done). This is the only way to ensure safety.

-Steve
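Written by hand, the check suggested above could look roughly like the following. This is only a sketch of the idea, not what any D compiler does: `checkedDeref` is a hypothetical helper, and the 4096-byte `assumedNullPageBytes` is an assumption about the unmapped region at address 0, which in reality depends on the OS and its configuration.

```d
import std.stdio : writeln;

/// Assumed size of the never-mapped region starting at address 0.
enum assumedNullPageBytes = 4096;

/// Hypothetical helper: objects larger than the null page get an explicit
/// null test, because a field access through a null pointer could otherwise
/// land on mapped memory instead of faulting.
ref T checkedDeref(T)(T* p)
{
    static if (T.sizeof > assumedNullPageBytes)
    {
        if (p is null)
            throw new Error("null dereference of a large object");
    }
    return *p;   // small objects rely on the hardware fault instead
}

struct Big
{
    ubyte[1 << 20] data;   // 1 MiB, much larger than the assumed null page
}

void main()
{
    auto b = new Big;
    checkedDeref(b).data[0] = 1;       // fine: b is not null

    Big* nothing = null;
    try
    {
        checkedDeref(nothing).data[0] = 1;
    }
    catch (Error e)
    {
        writeln("caught: ", e.msg);    // caught: null dereference of a large object
    }
}
```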
Jul 21 2020
Simen Kjærås <simen.kjaras gmail.com> writes:
On Tuesday, 21 July 2020 at 13:42:15 UTC, Steven Schveighoffer 
wrote:
 On 7/21/20 8:34 AM, Adam D. Ruppe wrote:

 The others aren't wrong about stack size limits playing some 
 role, but the primary reason is that it is a weird hack for 
 @safe, believe it or not.
...
 I don't recall exactly when this was discussed but it came up 
 in the earlier days of @safe, I'm pretty sure it worked before 
 then.
 I think this was discussed, but it was not the reason for the limitation. The limitation exists even in D1, which predates @safe: https://digitalmars.com/d/1.0/arrays.html#static-arrays I have stressed before that any access through a pointer to a large object in @safe code should also check that the base of the object is not within the null page (this is not currently done). This is the only way to ensure safety.
It seems the limitation was introduced in DMD 0.123, in May 2005:
https://forum.dlang.org/post/d61jpa$1m0l$1@digitaldaemon.com

Walter gives some justification in the post immediately following:
 1) Gigantic static arrays are often either the result of a typo or are a newbie mistake.
 2) Such require a lot of memory for the compiler to handle. Before the OS officially runs out of memory, it goes to greater and greater lengths to scavenge memory for the compiler, often bringing the computer to its knees in desperation.
 3) D needs to be a portable language, and by capping the array size a program is more likely to be portable.
 4) Giant arrays are reflected in a corresponding giant size for the exe file.
 5) There simply isn't a need I can think of for such arrays. There shouldn't be a problem with allocating them dynamically.
I admit I thought it was an old optlink limitation, but it seems it's basically arbitrary.

-- 
Simen
Jul 21 2020
Adam D. Ruppe <destructionator gmail.com> writes:
On Tuesday, 21 July 2020 at 19:20:28 UTC, Simen Kjærås wrote:
 Walter gives some justification in the post immediately 
 following:
whelp proves my memory wrong!
Jul 21 2020
Johan <j j.nl> writes:
On Monday, 20 July 2020 at 22:05:35 UTC, WhatMeWorry wrote:
 1) The D Language Reference says:

 "There are four kinds of arrays..." with the first example being
 "type*  Pointers to data" and "int* p;", etc.

 At the risk of sounding overly nitpicky, isn't a pointer to an 
 integer simply a pointer to an integer?  How does that pertain 
 to an array?
I agree. "type*" being an array makes no sense from a D language point of view.
 2) "The total size of a static array cannot exceed 16Mb". What 
 limits this? And with modern systems of 16GB and 32GB, isn't 
 16Mb excessively small? (As an aside: shouldn't that be 16MB in 
 the reference instead of 16Mb? That is, doesn't b = bits and B 
 = bytes?)
This doesn't make sense either. Where did you find this in the documentation? It should be removed, as it is easily proven to work (`ubyte[170_000_000] s; void main(){s[160_000_000] = 1;}`).
 3) Lastly, in the following code snippet, are arrayA and arrayB 
 both allocated on the stack? And how do their scopes and/or 
 lifetimes differ?

 ==== module1 =====
 int[100] arrayA;
 void foo() // changed from main to foo for clarity
 {
     int[100] arrayB;
     // ...
 }
 ==== module1 =====
"The stack" is not a D language concept; a better way of looking at it is that local storage is implemented by all D compilers using the "stack" (on x86).

arrayA is not allocated on the stack; its lifetime is the whole duration of the program, one array per thread. arrayB is indeed allocated on the stack (local storage); its lifetime is only from the start to the end of foo(), one array per call to foo (!).

Because arrayB is on the stack, you are limited by the stack size, which is set by the OS (but can be overridden). The array would be competing with all the other things that are put on the stack, such as function call return addresses and temporary values, neither of which you as a coder can see. The maximum size of arrayB you can get away with heavily depends on the rest of your program (and on the stack size allocated by the OS, which is somewhere in the 4MB, 8MB, 16MB range), so it is best to avoid putting large arrays on the stack altogether.

arrayA is allocated together with other global/TLS variables in a section for which I don't think there really is a size limit.

-Johan
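One way to override the stack size per thread is the optional stack-size argument of druntime's `core.thread.Thread` constructor. In this sketch, the roughly 8 MB local array would likely overflow a default 1 MB or 8 MB stack, so it is run on a thread with an explicitly requested 64 MiB stack. The sizes and the function names `sumOfBigLocal` and `sumOnBigStack` are illustrative assumptions.

```d
import core.thread : Thread;
import std.stdio : writeln;

enum N = 2_000_000;   // 2M ints, about 8 MB for one local array

/// Needs about 8 MB of stack for `local`.
long sumOfBigLocal()
{
    int[N] local;
    foreach (i, ref e; local)
        e = cast(int)(i % 10);
    long total = 0;
    foreach (e; local)
        total += e;
    return total;
}

/// Runs sumOfBigLocal on a thread whose stack size is chosen explicitly.
long sumOnBigStack(size_t stackBytes)
{
    long result;
    auto t = new Thread({ result = sumOfBigLocal(); }, stackBytes);
    t.start();
    t.join();
    return result;
}

void main()
{
    writeln(sumOnBigStack(64 * 1024 * 1024)); // 9000000
}
```

Moving the array to the heap (or to module-level storage, like arrayA) avoids the question entirely; a bigger thread stack is mainly useful when the deep stack usage cannot easily be changed.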
Jul 21 2020