digitalmars.D - Why 16Mib static array size limit?
- Ali Çehreli (7/7) Aug 15 2016 dmd does not allow anything larger. Shouldn't the limit depend on the
- NX (5/5) Aug 15 2016 https://issues.dlang.org/show_bug.cgi?id=14859
- Jacob Carlborg (4/9) Aug 16 2016 Or using any other platform than Windows :)
- NX (13/13) Aug 15 2016 You can apply a patch if you're willing to compile dmd from
- Ali Çehreli (93/94) Aug 15 2016 Could you please help me understand the following results, possibly by
- ketmar (7/7) Aug 15 2016 On Tuesday, 16 August 2016 at 01:28:05 UTC, Ali Çehreli wrote:
- Yuxuan Shui (5/11) Aug 15 2016 There seem to be two things at work here: 1) when not accessed
- Johan Engelen (23/28) Aug 16 2016 As Yuxuan Shui mentioned the difference is in vectorization. The
- Ali Çehreli (15/41) Aug 16 2016 Thank you all. That makes sense... Agreeing that the POINTER version is
- Yuxuan Shui (3/24) Aug 16 2016 Actually, the STATIC version is always faster on my machine (Core
- Yuxuan Shui (3/31) Aug 16 2016 In dmd's case, the non-STATIC version seems to evaluate the loop
- Yuxuan Shui (3/31) Aug 16 2016 Wait, doesn't D have strict aliasing rules? ubyte* (&evil) should
- ketmar (3/4) Aug 16 2016 luckily, no. at least this is not enforced by dmd. and it is
- deadalnix (7/11) Aug 17 2016 Controlling aliasing is really the #1 optimization barrier these
- ketmar (8/20) Aug 17 2016 from my PoV, this kind of "optimizing" is overrated. i'm
- deadalnix (5/12) Aug 17 2016 Because making 99.9% of the code slower because of a fringe use
- ketmar (7/9) Aug 17 2016 exactly the thing i was writing about. "hey, you, meatbag! i, Teh
- Chris Wright (5/14) Aug 17 2016 It makes your intent more obvious. It's more obvious to other humans as
- jmh530 (2/8) Aug 17 2016 AA? Associative Array?
- Timon Gehr (2/13) Aug 17 2016 (Alias Analysis.)
- deadalnix (3/15) Aug 17 2016 Alias Analysis. This is a common compiler acronym.
- Walter Bright (5/10) Aug 17 2016 At least for this case, as I mentioned in another post, if the pointer a...
- Yuxuan Shui (4/21) Aug 17 2016 But doing so would be incorrect if D doesn't provide strong
- Walter Bright (2/4) Aug 17 2016 It would be correct for that loop if the user does it.
- Chris Wright (4/7) Aug 17 2016 The language can analyze all code that affects a local variable in many
- Yuxuan Shui (3/11) Aug 17 2016 That's right. But for Ali's code, the compiler is clearly not
- Chris Wright (6/18) Aug 17 2016 Most of the time, the compiler can successfully make those optimizations...
- Walter Bright (9/11) Aug 17 2016 Global variables are pretty much spawn of the devil. You're right that d...
- Johan Engelen (24/32) Aug 18 2016 Perhaps not smart enough, but it is very close to being smart
- Johan Engelen (3/5) Aug 18 2016 Nevermind, not possible. Templates, cross-module inlining, and
- Ali Çehreli (11/14) Aug 18 2016 Yet there is the following text in the spec:
- Steven Schveighoffer (4/39) Aug 16 2016 Even if it did, I believe the wildcard is ubyte*. Just like in C, char*
- Charles Hixson via Digitalmars-d (3/44) Aug 16 2016 I think what you say is true (look at the code of std.outbuffer), but
- Steven Schveighoffer (7/17) Aug 17 2016 void * is almost useless. In D you can assign a void[] from another
- deadalnix (4/12) Aug 17 2016 Yes, but everything can alias with void*/void[] . Thus, you can
- Steven Schveighoffer (9/21) Aug 17 2016 Sure, but how do you implement, let's say, byte swapping on an integer?
- Johan Engelen (4/4) Aug 17 2016 How about the specific case of array indexing?
- Walter Bright (7/20) Aug 16 2016 When accessing global arrays like this, cache the address of the data in...
dmd does not allow anything larger. Shouldn't the limit depend on the .data segment size, which can be specified by the system? Is there a technical reason?

Observing that the limit is actually one less (note the -1 below), is it perhaps due to an old limitation where indexes were limited to 24 bits?

    static ubyte[16 * 1024 * 1024 - 1] arr;

Ali
Aug 15 2016
https://issues.dlang.org/show_bug.cgi?id=14859

This limitation is put there because of optlink (which fails to link when you have enough static data), and is actually entirely meaningless when combined with the -m32mscoff & -m64 switches (since other linkers handle huge static data just fine).
Aug 15 2016
On 2016-08-15 22:28, NX wrote:
> https://issues.dlang.org/show_bug.cgi?id=14859
>
> This limitation is put there because of optlink (which fails to link
> when you have enough static data), and is actually entirely meaningless
> when combined with -m32mscoff & -m64 switches (since other linkers
> handle huge static data just fine).

Or using any other platform than Windows :)

--
/Jacob Carlborg
Aug 16 2016
You can apply a patch if you're willing to compile dmd from source. Find the following code in file 'mtype.d' at line 4561:

    bool overflow = false;
    if (mulu(tbn.size(loc), d2, overflow) >= 0x1000000 || overflow) // put a 'reasonable' limit on it
        goto Loverflow;

And change it to:

    bool overflow = false;
    mulu(tbn.size(loc), d2, overflow);
    if (overflow)
        goto Loverflow;

I would make a PR if I had the time (anyone?)...
Aug 15 2016
On 08/15/2016 12:09 PM, Ali Çehreli wrote:
> dmd does not allow anything larger.

Could you please help me understand the following results, possibly by analyzing the produced assembly? I wanted to see whether there were any performance penalties when one used D's recommendation of using dynamic arrays beyond 16MiB. Here is the test code:

    enum size = 15 * 1024 * 1024;

    version (STATIC) {
        ubyte[size] arr;

    } else {
        ubyte[] arr;

        static this() {
            arr = new ubyte[](size);
        }
    }

    void main() {
        auto p = arr.ptr;

        foreach (j; 0 .. 100) {
            foreach (i; 0 .. arr.length) {
                version (POINTER) {
                    p[i] += cast(ubyte)i;

                } else {
                    arr[i] += cast(ubyte)i;
                }
            }
        }
    }

My CPU is an i7 with 4M cache:

    $ lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                4
    On-line CPU(s) list:   0-3
    Thread(s) per core:    2
    Core(s) per socket:    2
    Socket(s):             1
    NUMA node(s):          1
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 78
    Model name:            Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
    Stepping:              3
    CPU MHz:               513.953
    CPU max MHz:           3400.0000
    CPU min MHz:           400.0000
    BogoMIPS:              5615.89
    Virtualization:        VT-x
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              256K
    L3 cache:              4096K

I tried two compilers:

- DMD64 D Compiler v2.071.2-b2
- LDC - the LLVM D compiler (1.0.0): based on DMD v2.070.2 and LLVM 3.8.0

As seen in the code, I tried two version identifiers:

- STATIC: use a static array; else: use a dynamic array
- POINTER: access array elements through .ptr; else: access them through the [] operator

So, that gave me 8 combinations. Below, I list both the compilation command lines that I used and the wall-clock times that each program execution took (as reported by the 'time' utility).

1) dmd deneme.d -ofdeneme -O -boundscheck=off -inline -version=STATIC -version=POINTER
   4.332s

2) dmd deneme.d -ofdeneme -O -boundscheck=off -inline -version=STATIC
   4.238s

3) dmd deneme.d -ofdeneme -O -boundscheck=off -inline -version=POINTER
   4.321s

4) dmd deneme.d -ofdeneme -O -boundscheck=off -inline
   3.845s  <== BEST for dmd

5) ldc2 deneme.d -ofdeneme -O5 -release -boundscheck=off -d-version=POINTER -d-version=STATIC
   0.469s

6) ldc2 deneme.d -ofdeneme -O5 -release -boundscheck=off -d-version=STATIC
   0.472s

7) ldc2 deneme.d -ofdeneme -O5 -release -boundscheck=off -d-version=POINTER
   0.182s  <== BEST for ldc2

8) ldc2 deneme.d -ofdeneme -O5 -release -boundscheck=off
   0.792s

So, for dmd, going with the recommendation of using a dynamic array is faster. Interestingly, using .ptr is actually slower. How?

With ldc2, the best option is to go with a dynamic array ONLY IF you access the elements through the .ptr property. As seen in the last result, using the [] operator on the array is about 4 times slower than that. Does that make sense to you? Why would that be?

Thank you,
Ali
Aug 15 2016
On Tuesday, 16 August 2016 at 01:28:05 UTC, Ali Çehreli wrote:

for x86 dmd, the results are:

    0m0.694s
    0m1.176s
    0m0.722s
    0m1.454s

x86_64 codegen looks... not normal.
Aug 15 2016
On Tuesday, 16 August 2016 at 01:28:05 UTC, Ali Çehreli wrote:
> Could you please help me understand the following results, possibly by
> analyzing the produced assembly? [...]

There seem to be two things at work here:

1) when not accessed via pointer, access to the array goes through the TCB every time (since it's a TLS variable)
2) LDC is able to vectorize the pointer version but not the array version.
Aug 15 2016
On Tuesday, 16 August 2016 at 01:28:05 UTC, Ali Çehreli wrote:
> With ldc2, the best option is to go with a dynamic array ONLY IF you
> access the elements through the .ptr property. As seen in the last
> result, using the [] operator on the array is about 4 times slower
> than that.

As Yuxuan Shui mentioned, the difference is in vectorization. The non-POINTER version is not vectorized because the semantics of the code is not the same as the POINTER version. Indexing `arr`, and writing to that address, could change `arr.ptr`, and so the loop would do something different when "caching" `arr.ptr` in `p` (POINTER version) versus the case without caching (non-POINTER version).

Evil code demonstrating the problem:

```
ubyte evil;
ubyte[] arr;
void doEvil() {
    // TODO: use this in the obfuscated-D contest
    arr = (&evil)[0..50];
}
```

The compiler somehow has to prove that `arr[i]` will never point to `arr.ptr` (it's called Alias Analysis in LLVM). Perhaps it is UB in D to have `arr[i]` ever point into `arr` itself, I don't know. If so, the code is vectorizable and we can try to make it so.

-Johan
Aug 16 2016
On 08/16/2016 10:51 AM, Johan Engelen wrote:
> As Yuxuan Shui mentioned the difference is in vectorization. The
> non-POINTER version is not vectorized because the semantics of the code
> is not the same as the POINTER version. [...]

Thank you all. That makes sense... Agreeing that the POINTER version is applicable only in some cases, looking only at the non-POINTER cases, for ldc2, a static array is faster, making the "arbitrary" 16MiB limit a performance issue.

For ldc2, the static array is about 40% faster:

    6) ldc2 deneme.d -ofdeneme -O5 -release -boundscheck=off -d-version=STATIC
       0.472s
    8) ldc2 deneme.d -ofdeneme -O5 -release -boundscheck=off
       0.792s

It's the opposite for dmd:

    2) dmd deneme.d -ofdeneme -O -boundscheck=off -inline -version=STATIC
       4.238s
    4) dmd deneme.d -ofdeneme -O -boundscheck=off -inline
       3.845s

Ali
Aug 16 2016
On Tuesday, 16 August 2016 at 18:46:06 UTC, Ali Çehreli wrote:
> [...] looking only at the non-POINTER cases, for ldc2, a static array
> is faster, making the "arbitrary" 16MiB limit a performance issue.

Actually, the STATIC version is always faster on my machine (Core i5 5200U), in both the dmd and ldc2 cases.
Aug 16 2016
On Tuesday, 16 August 2016 at 19:50:14 UTC, Yuxuan Shui wrote:
> Actually, the STATIC version is always faster on my machine (Core i5
> 5200U), in both the dmd and ldc2 cases

In dmd's case, the non-STATIC version seems to evaluate the loop condition (arr.length) **every time**.
Aug 16 2016
On Tuesday, 16 August 2016 at 17:51:13 UTC, Johan Engelen wrote:
> The compiler somehow has to prove that `arr[i]` will never point to
> `arr.ptr` (it's called Alias Analysis in LLVM). Perhaps it is UB in D
> to have `arr[i]` ever point into `arr` itself, I don't know. [...]

Wait, doesn't D have strict aliasing rules? ubyte* (&evil) should not be allowed to alias with ubyte** (&arr.ptr).
Aug 16 2016
On Tuesday, 16 August 2016 at 20:11:12 UTC, Yuxuan Shui wrote:
> Wait, doesn't D have strict aliasing rules?

luckily, no. at least this is not enforced by dmd. and it is great.
Aug 16 2016
On Tuesday, 16 August 2016 at 20:19:32 UTC, ketmar wrote:
> luckily, no. at least this is not enforced by dmd. and it is great.

Controlling aliasing is really the #1 optimization barrier these days, so I don't think it's that good of a thing. Almost every single case where Rust ends up being faster than C++ is because their type system allows more AA information to be available to the optimizer. AA is also key to doing non-GC memory management at the language level.
Aug 17 2016
On Wednesday, 17 August 2016 at 12:20:28 UTC, deadalnix wrote:
> Controlling aliasing is really the #1 optimization barrier these days,
> so I don't think it's that good of a thing. [...]

from my PoV, this kind of "optimizing" is overrated. i'm absolutely unable to understand why i should obey orders from the machine instead of the machine obeying mine. if i want to go wild with pointers, don't tell me that i can't, just compile my code! C is literally ridden with this shit, and in the end it is a freakin' pain to write correct C code (if it is possible at all for something complex).
Aug 17 2016
On Wednesday, 17 August 2016 at 12:32:20 UTC, ketmar wrote:
> if i want to go wild with pointers, don't tell me that i can't, just
> compile my code! [...]

Because making 99.9% of the code slower because of a fringe use case isn't sound engineering. Especially since there are already ways to do this in a way that makes the AA happy, for instance using unions.
Aug 17 2016
On Wednesday, 17 August 2016 at 13:27:14 UTC, deadalnix wrote:
> Especially since there are already ways to do this in a way that makes
> the AA happy

exactly the thing i was writing about. "hey, you, meatbag! i, Teh Great Machine, said that you have to use unions, not pointers! what? making a pointer to a union point into the middle of the buffer is exactly the same aliasing problem, so unions don't solve anything? I, Teh Great Machine, don't care. it is your problem, meatbag, i'm not here to serve you."
Aug 17 2016
On Wed, 17 Aug 2016 21:37:11 +0000, ketmar wrote:
> exactly the thing i was writing about. [...]

It makes your intent more obvious. It's more obvious to other humans as well as the compiler. For me, it's a win with no downsides. For you, don't use @safe and you opt out of this class of optimizations and the related restrictions.
Aug 17 2016
On Wednesday, 17 August 2016 at 12:20:28 UTC, deadalnix wrote:
> Almost every single case where Rust ends up being faster than C++ is
> because their type system allows more AA information to be available
> to the optimizer.

AA? Associative Array?
Aug 17 2016
On 17.08.2016 15:41, jmh530 wrote:
> AA? Associative Array?

(Alias Analysis.)
Aug 17 2016
On Wednesday, 17 August 2016 at 13:41:09 UTC, jmh530 wrote:
> AA? Associative Array?

Alias Analysis. This is a common compiler acronym. Associative arrays are called maps by everybody outside this forum.
Aug 17 2016
On 8/17/2016 5:20 AM, deadalnix wrote:
> [...] AA is also key to do non GC memory management at language level.

At least for this case, as I mentioned in another post, if the pointer and length of the global are cached in locals, they can be kept in registers. The contents of locals don't have aliasing problems, because if their addresses are not taken, nobody can point to them. Optimization relies heavily on that.
Aug 17 2016
On Wednesday, 17 August 2016 at 19:36:17 UTC, Walter Bright wrote:
> At least for this case, as I mentioned in another post, if the pointer
> and length of the global are cached in locals, they can be kept in
> registers. [...]

But doing so would be incorrect if D doesn't provide strong aliasing guarantees. And if D does provide these guarantees, we won't need to do this manually.
Aug 17 2016
On 8/17/2016 3:12 PM, Yuxuan Shui wrote:
> But doing so would be incorrect if D doesn't provide strong aliasing
> guarantees. And if D does provide these guarantees, we won't need to
> do this manually.

It would be correct for that loop if the user does it.
Aug 17 2016
On Wed, 17 Aug 2016 22:12:25 +0000, Yuxuan Shui wrote:
> But doing so would be incorrect if D doesn't provide strong aliasing
> guarantees. [...]

The language can analyze all code that affects a local variable in many cases. You don't always need the language to guarantee it's impossible if the compiler can see that the user isn't doing anything funky.
Aug 17 2016
On Thursday, 18 August 2016 at 00:20:32 UTC, Chris Wright wrote:
> The language can analyze all code that affects a local variable in many
> cases. [...]

That's right. But for Ali's code, the compiler is clearly not smart enough.
Aug 17 2016
On Thu, 18 Aug 2016 01:18:03 +0000, Yuxuan Shui wrote:
> That's right. But for Ali's code, the compiler is clearly not smart
> enough.

Most of the time, the compiler can successfully make those optimizations for local variables. The compiler can seldom make them for global variables; you get into whole-program analysis territory pretty fast. So it's not surprising that the compiler doesn't handle global variables as well as it does local ones.
Aug 17 2016
On 8/17/2016 9:16 PM, Chris Wright wrote:
> So it's not surprising that the compiler doesn't handle global
> variables as well as it does local ones.

Global variables are pretty much spawn of the devil. You're right that dmd does not make a special effort optimizing them. The way to deal with it:

1. minimize use of globals
2. if using them in a hot loop, copy a reference to the global into a local variable, and use the local variable in the loop instead

This also gets around the problem that thread local storage address calculation is slow and inefficient on all platforms.
Aug 17 2016
On Thursday, 18 August 2016 at 01:18:03 UTC, Yuxuan Shui wrote:
> That's right. But for Ali's code, the compiler is clearly not smart
> enough.

Perhaps not smart enough, but it is very close to being smart enough. Perhaps we, the LDC devs, weren't smart enough in using the LLVM backend. ;-)

For Ali's code, `arr` is a global symbol that can be modified by code not seen by the compiler. So no assumptions can be made about the contents of arr.ptr and arr.length, and thus `arr[i]` could index into `arr` itself. Now, if we tell [*] the compiler that `arr` is not touched by code outside the module... the resulting machine code is identical with and without POINTER!

It's an interesting case to look further into. For example, with Link-Time Optimization, delaying codegen until link time, we can internalize many symbols (i.e. telling the compiler no other module is going to do anything with them), allowing the compiler to reason better about the code and generate faster executables. Ideally, without much user effort. It's something I am already looking into. I don't know if we can mark `private` global variables as internal. If so, wow :-)

-Johan

[*] Unfortunately `private arr` does not have the desired effect (it does not make it an `internal` LLVM variable), so you have to hack into the LLVM IR file to make it happen.
Aug 18 2016
On Thursday, 18 August 2016 at 12:20:50 UTC, Johan Engelen wrote:
> I don't know if we can mark `private` global variables as internal.
> If so, wow :-)

Nevermind, not possible. Templates, cross-module inlining, and probably other reasons.
Aug 18 2016
On 08/18/2016 05:20 AM, Johan Engelen wrote:
> `arr` is a global symbol that can be modified by code not seen by the
> compiler. So no assumptions can be made about the contents of arr.ptr
> and arr.length

Yet there is the following text in the spec:

* https://dlang.org/spec/statement.html#ForeachStatement

"The aggregate must be loop invariant, meaning that elements to the aggregate cannot be added or removed from it in the NoScopeNonEmptyStatement."

I think the spirit of the spec includes other code outside of the NoScopeNonEmptyStatement as well. If so, if elements are added to the array from other code (e.g. other threads), it should be a case of "all bets are off". So, I think both .ptr and .length could be cached.

Ali
Aug 18 2016
On 8/16/16 4:11 PM, Yuxuan Shui wrote:
> Wait, doesn't D have strict aliasing rules? ubyte* (&evil) should not
> be allowed to alias with ubyte** (&arr.ptr).

Even if it did, I believe the wildcard is ubyte*. Just like char* can point at anything in C, ubyte* is D's equivalent.

-Steve
Aug 16 2016
On 08/16/2016 01:49 PM, Steven Schveighoffer via Digitalmars-d wrote:
> Even if it did, I believe the wildcard is ubyte*. Just like in C, char*
> can point at anything, ubyte* is D's equivalent.

I think what you say is true (look at the code of std.outbuffer), but IIRC the documentation says that's supposed to be the job of void*.
Aug 16 2016
On 8/16/16 7:23 PM, Charles Hixson via Digitalmars-d wrote:
> I think what you say is true (look at the code of std.outbuffer), but
> IIRC the documentation says that's supposed to be the job of void*.

void * is almost useless. In D you can assign a void[] from another void[], but other than that, there's no way to write the memory or read it.

In C, void * is also allowed to alias any other pointer. But char * is also allowed to provide arbitrary byte reading/writing. I'd expect that D would also provide a similar option.

-Steve
Aug 17 2016
On Wednesday, 17 August 2016 at 14:21:32 UTC, Steven Schveighoffer wrote:
> In C, void * is also allowed to alias any other pointer. But char * is
> also allowed to provide arbitrary byte reading/writing. I'd expect that
> D would also provide a similar option.

Yes, but everything can alias with void*/void[]. Thus, you can cast from void* to T* "safely".
Aug 17 2016
On 8/17/16 10:38 AM, deadalnix wrote:
> Yes, but everything can alias with void*/void[]. Thus, you can cast
> from void* to T* "safely".

Sure, but how do you implement, let's say, byte swapping on an integer?

    ubyte[] x = (cast(ubyte*)&myInt)[0 .. myInt.sizeof];
    foreach (i; 0 .. x.length / 2)
        swap(x[i], x[$ - i - 1]);

So if the compiler can assume that x can't point at myInt, and thus that myInt can't have changed, then we have a problem. You just can't do this with void (or at least not very easily).

-Steve
Aug 17 2016
How about the specific case of array indexing? Is it UB to have `arr[i]` ever point into `arr` itself, or can we make it UB? -Johan
Aug 17 2016
On 8/15/2016 6:28 PM, Ali Çehreli wrote:
> void main() {
>     auto p = arr.ptr;
>
>     foreach (j; 0 .. 100) {
>         foreach (i; 0 .. arr.length) {
>             version (POINTER) {
>                 p[i] += cast(ubyte)i;
>             } else {
>                 arr[i] += cast(ubyte)i;
>             }
>         }
>     }
> }

When accessing global arrays like this, cache the address of the data in a local variable. This will enable the compiler to enregister it. Putting global data in registers is problematic because any assignment through a pointer could change it, so the compiler takes a pessimistic view of it.

Your POINTER version does cache the pointer, but arr.length needs to be cached as well.
Aug 16 2016