digitalmars.D - C `restrict` keyword in D
- Uknown (41/41) Sep 02 2017 In C, the `restrict` keyword implies that 2 or more pointer
- Moritz Maxeiner (3/22) Sep 02 2017 How does the compiler know which member of RCArray!int to check
- Uknown (28/54) Sep 02 2017 If I understand C's version of restrict correctly, the pointers
- Moritz Maxeiner (12/54) Sep 03 2017 AFAICT that's not what's needed here for safety, though: RCArray
- Uknown (71/76) Sep 03 2017 Well, I thought about it, I have to agree with you, as far as
- Moritz Maxeiner (23/68) Sep 03 2017 References are just non-null syntax for pointers that take
- Uknown (16/88) Sep 03 2017 Yes. But this is what makes them so useful. You don't have to
- Moritz Maxeiner (12/77) Sep 03 2017 Indeed, but it also means that - other than null dereferencing -
- Uknown (3/10) Sep 03 2017 I think I understand now. Thanks!
- ag0aep6g (15/18) Sep 04 2017 Why "other than null dereferencing"? You can dereference a null pointer
- Moritz Maxeiner (8/17) Sep 04 2017 Because I was ignorant and apparently wrong, thanks for the
- ag0aep6g (13/17) Sep 04 2017 I'm only aware of this part of the spec, which doesn't say much about
- Moritz Maxeiner (9/28) Sep 04 2017 Yes, which is why I wrongly assumed that turning a null pointer
- Johan Engelen (21/41) Sep 04 2017 LDC treats passing `null` to a reference parameter as UB.
- Moritz Maxeiner (25/67) Sep 04 2017 Ok, that's good to know, though it'd be nice for this to be
- Johan Engelen (19/35) Sep 05 2017 My point was that that is not workable. The "null dereference" is
- dukc (3/20) Sep 05 2017 Perhaps it should nullcheck exceptionally large types which may
- Moritz Maxeiner (21/57) Sep 05 2017 While "null dereference" is a language construct "null" is
- Jonathan M Davis via Digitalmars-d (22/57) Sep 05 2017 dmd and the spec were written with the assumption that the CPU is going ...
- Johan Engelen (21/24) Sep 06 2017 In my terminology, "dereference" is a language spec term. It is
- Jonathan M Davis via Digitalmars-d (20/43) Sep 06 2017 I would argue that if the dereferencing of the pointer is optimized out,
- Cecil Ward (9/12) Sep 06 2017 That's just an observation based on a detail of a particular
- Dukc (5/7) Sep 04 2017 I really don't see where the restrict keyword is needed at all,
- Johan Engelen (9/16) Sep 04 2017 It's need for auto-vectorization, for example.
- Dukc (15/22) Sep 05 2017 That probably explains it in case of c. But I still think that D
- Petar Kirov [ZombineDev] (2/7) Sep 06 2017 You mean like this: https://github.com/dlang/druntime/pull/1891?
- Uknown (7/14) Sep 05 2017 You can see the second answer on this Stack overflow question[1].
- jmh530 (22/24) Sep 04 2017 You might find interesting the reference capabilities of Pony:
In C, the `restrict` keyword implies that two or more pointer arguments in a function call do not point to the same data. This allows some additional optimizations that were not possible before, finally making C as fast as Fortran. For example, this is the new declaration of memcpy in C99:

    void* memcpy(void *restrict dst, const void *restrict src, size_t n);

`dst` and `src` should never point to the same block of memory, and this is enforced by the programmer.

In D, it makes sense to add similar functionality that extends beyond just performance optimizations. It could potentially be used to better guarantee the safety of some code, e.g. (from discussions about ref counting in D):

    void main() @safe
    {
        auto arr = RCArray!int([0]);
        foo(arr, arr[0]);
    }

    void foo(ref RCArray!int arr, ref int val) @safe
    {
        {
            auto copy = arr; // arr's (and copy's) reference counts are both 2
            arr = RCArray!int([]); // There is another owner, so arr
                                   // forgets about the old payload
        } // Last owner of the array ('copy') gets destroyed and happily
          // frees the payload.
        val = 3; // Oops.
    }

Here, adding `restrict` to foo's parameters like so:

    void foo(restrict ref RCArray!int arr, restrict ref int val)

would make the compiler statically enforce that the two references do not point to the same data. This would cause an error in main, since arr[0] is from the same block of memory as arr. The same would apply to pointers.

I just hope to have a nice discussion on this topic here. Thanks!

Read more about `restrict` here: http://en.cppreference.com/w/c/language/restrict
Sep 02 2017
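To make the aliasing concern in the post above concrete, here is a minimal sketch in plain D (no `restrict`). The function `addTwice` is a made-up name for illustration only. It shows that the same code gives a different result when its two `ref` parameters refer to the same variable, which is why a compiler that cannot rule out aliasing has to reload `b` after every write to `a`.

```d
// Hypothetical helper for illustration: adds b to a twice.
void addTwice(ref int a, ref const int b)
{
    a += b; // if a and b alias, this write also changes b
    a += b; // so b has to be read again here
}

unittest
{
    int x = 1, y = 1;
    addTwice(x, y); // distinct variables: 1 + 1 + 1
    assert(x == 3);

    int z = 1;
    addTwice(z, z); // aliased: 1 + 1 = 2, then 2 + 2
    assert(z == 4);
}
```

The block has no `main`; one way to run it is `dmd -unittest -main -run example.d`.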
On Sunday, 3 September 2017 at 03:04:58 UTC, Uknown wrote:
> [...]
> Here, adding `restrict` to foo's parameters like so :
>     void foo(restrict ref RCArray!int arr, restrict ref int val)
> would make the compiler statically enforce the fact that neither references are pointing to the same data. This would cause an error in main, since arr[0] is from the same block of memory as arr.

How does the compiler know which member of RCArray!int to check for pointing to the same memory chunk as val?
Sep 02 2017
On Sunday, 3 September 2017 at 03:49:21 UTC, Moritz Maxeiner wrote:
> On Sunday, 3 September 2017 at 03:04:58 UTC, Uknown wrote:
>> [...]
> How does the compiler know which member of RCArray!int to check for pointing to the same memory chunk as val?

If I understand C's version of restrict correctly, the pointers must not refer to the same block. So extending the same here, val should not be allowed to be a reference to any members of RCArray!int. This does seem to get more confusing when the heap is involved as a member of a struct, e.g.:

    void main() @safe
    {
        struct HeapAsMember
        {
            int* _someArr;
        }
        HeapAsMember x;
        x._someArr = new int;

        void foo(restrict ref HeapAsMember x, restrict ref int val) @safe
        {
            x._someArr = new int;
            val = 0;
        }
        foo(x, x._someArr[0]);
    }

I feel that in this case, the compiler should throw an error, since val would be a reference to data pointed to by _someArr, which is a member of x. Although, I wonder if such analysis would be feasible? This case is trivial, but there could be more complicated cases.
Sep 02 2017
On Sunday, 3 September 2017 at 06:11:10 UTC, Uknown wrote:
> If I understand C's version of restrict correctly, the pointers must not refer to the same block. So extending the same here, val should not be allowed to be a reference to any members of RCArray!int.

AFAICT that's not what's needed here for safety, though: RCArray will have a member (`data`, `store`, or something like that) pointing to the actual elements (usually on the heap). You essentially want `val` not to point into the same memory chunk that `data` points into, which is different from `val` not pointing to a member of RCArray.

> This does seem to get more confusing when the heap is involved as a member of a struct. e.g. [...]

Right, that's essentially what an RCArray does, as well.

> I feel that in this case, the compiler should throw an error, since val would be a reference to a member pointed to by _someArr, which is a member of x. Although, I wonder if such analysis would be feasible? This case is trivial, but there could be more complicated cases.

The main issue I see is that pointers/references can change at runtime, so I don't think a static analysis in the compiler can cover this in general (which, I think, is also why the C99 keyword is an optimization hint only).
Sep 03 2017
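The distinction drawn above (pointing at a member of the container versus pointing into the memory chunk the container's `data` member refers to) can be sketched with a runtime check. The names `pointsInto` and `Holder` are hypothetical and only stand in for something like RCArray; this is an illustration of the two cases, not a proposal for how a compiler would check them.

```d
// Hypothetical helper: true if p points at an element of chunk.
bool pointsInto(const(int)* p, const(int)[] chunk)
{
    return p >= chunk.ptr && p < chunk.ptr + chunk.length;
}

// Stand-in for a container such as RCArray; the field name is made up.
struct Holder
{
    int[] data; // the struct holds only pointer + length, the elements live on the heap
}

unittest
{
    auto h = Holder([10, 20, 30]);
    int local;

    assert(pointsInto(&h.data[1], h.data)); // aliases the payload
    assert(!pointsInto(&local, h.data));    // unrelated variable

    // The struct itself is not part of the payload it points to.
    assert(!pointsInto(cast(int*) &h, h.data));
}
```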
On Sunday, 3 September 2017 at 12:59:25 UTC, Moritz Maxeiner wrote:
> [...] The main issue I see is that pointers/references can change at runtime, so I don't think a static analysis in the compiler can cover this in general (which, I think, is also why the C99 keyword is an optimization hint only).

Well, I thought about it, and I have to agree with you as far as pointers go. There seems to be no simple way in which the compiler can safely ensure that two restrict pointers do not point to the same data. But for references, it seems trivial. In order to do so, RCArray would first have to annotate its opIndex, opSlice and any other data-returning member functions with the restrict keyword, e.g.:

    struct RCArray(T) @safe
    {
        private T[] _payload;
        /+ some other functions needed to implement RCArray correctly +/
        restrict ref T opIndex(size_t i)
        {
            // implementation as usual
            return _payload[i];
        }
        restrict ref T opIndex()
        {
            return _payload;
        }
        // opSlice and the rest defined similarly
    }

    void main() @safe
    {
        RCArray!int my_array;
        ...
        auto t = my_array[0]; // error: my_array.opIndex(0) is defined as restrict
        // This essentially prevents a second reference from existing in the same scope
        foo(my_array, my_array[0]); // error: call to foo introduces `restrict` data from the
                                    // same container into the scope of foo
    }

    void foo(ref RCArray!int arr, ref int val) @safe
    {
        {
            auto copy = arr; // arr's (and copy's) reference counts are both 2
            arr = RCArray!int([]); // There is another owner, so arr
                                   // forgets about the old payload
        } // Last owner of the array ('copy') gets destroyed and happily
          // frees the payload.
        val = 3; // No longer an issue!
    }

This is now no longer like the C99 keyword in behaviour, but on the bright side, with one annotation on the return types, RCArray suddenly doesn't need to worry about escaped references. Also, no modifications would be needed to the foo function. This is potentially useful for other owning container types. The compiler could still use the information gained from the restrict annotation for optimizations, although such optimizations would be much less aggressive than in C.

Coming back to pointers, the only way I can see (short of bringing Rust's borrow checker to D) is to add additional annotations to function return values. The problem comes with code like this:

    int* foo() @safe
    {
        static int[1] data;
        return &data[0];
    }

    void main()
    {
        int* restrict p1 = foo();
        int* restrict p2 = foo(); // Should be an error, but the compiler can't figure
                                  // this out without further annotations
    }
Sep 03 2017
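The last example in the post above can be checked in today's D without the hypothetical `restrict` annotation. The sketch below simply confirms that two calls to such a function hand out the same address, so the two pointers alias even though nothing in their types says so.

```d
int* foo()
{
    static int[1] data; // a single (thread-local) instance shared by every call
    return &data[0];
}

unittest
{
    int* p1 = foo();
    int* p2 = foo();

    assert(p1 is p2); // both calls return the same address
    *p1 = 7;
    assert(*p2 == 7); // a write through p1 is visible through p2
}
```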
On Sunday, 3 September 2017 at 15:39:58 UTC, Uknown wrote:
> Well, I thought about it, I have to agree with you, as far as pointers go. There seems to be no simple way in which the compiler can safely ensure that the two restrict pointers point to the same data. But for references, it seems trivial.

References are just non-null syntax for pointers that take addresses implicitly on function call. Issues not related to null that pertain to pointers translate to references, as any (non-null) pointer can be turned into a reference (and vice versa):

---
void foo(int* a, bool b)
{
    if (b) bar(a);
    else baz(*a);
}
void bar(int* a) {}
void baz(ref int a) { bar(&a); }
---

> In order to do so, RCArray would have to first annotate its opIndex, opSlice and any other data returning member functions with the restrict keyword. e.g. [...]

Note: There's no need to attribute the RCArray template as @safe (other than for debugging when developing the template). The compiler will derive it for each member if they are indeed @safe.

W.r.t. the rest: I don't think treating references as different from pointers can be done correctly, as any pointers/references can be interchanged at runtime.

> Coming back to pointers, the only way I can see (short of bringing Rust's borrow checker to D) is to add additional annotations to function return values. The problem comes with code like this : [...]

Dealing with pointer aliasing in a generic way is a hard problem :p
Sep 03 2017
On Sunday, 3 September 2017 at 16:55:51 UTC, Moritz Maxeiner wrote:
> References are just non-null syntax for pointers that take addresses implicitly on function call. Issues not related to null that pertain to pointers translate to references, as any (non-null) pointer can be turned into a reference (and vice versa): [...]

Yes. But this is what makes them so useful. You don't have to worry about null dereferences.

> Note: There's no need to attribute the RCArray template as @safe (other than for debugging when developing the template). The compiler will derive it for each member if they are indeed @safe.

Indeed. I just wrote it to emphasize the fact that it's @safe.

> W.r.t. the rest: I don't think treating references as different from pointers can be done correctly, as any pointers/references can be interchanged at runtime.

I'm not sure I understand how one could switch between pointers and refs at runtime. Could you please elaborate a bit or link to an example? Thanks.

> Dealing with pointer aliasing in a generic way is a hard problem :p

Yep! I feel there's little point in discussing the introduction of a new keyword if it only works on returning `ref` and has none of the original optimization advantages C brought.

On a side note, C99 added `inline` and `restrict`, 2 new keywords, without any worry of potentially breaking existing code. Normally they would have added _Restrict and _Inline, and then #defined those.
Sep 03 2017
On Monday, 4 September 2017 at 02:43:48 UTC, Uknown wrote:
> Yes. But this is what makes them so useful. You don't have to worry about null dereferences.

Indeed, but it also means that - other than null dereferencing - pointer issues can be made into reference issues by dereferencing a pointer and passing that into a function that takes that parameter by reference.

> I'm not sure I understand how one could switch between pointers and refs at runtime. Could you please elaborate a bit or link to an example? Thanks.

What I meant (and apparently poorly expressed) is that you can turn a pointer into a reference (as long as it's not null), and taking the address of a "ref" yields a pointer. As in my `foo` example above, which path is taken can change at runtime. You can, e.g., generate a reference to an object's member without the compiler being able to detect it, by calculating the appropriate pointer and then dereferencing it.
Sep 03 2017
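The last sentence of the post above (a reference to a member obtained by pointer calculation) can be sketched as follows. The names `Pair` and `setTo` are made up for illustration; the point is only that nothing in the type of `p` connects it to `obj`, yet the `ref` parameter ends up bound to `obj.b`.

```d
struct Pair
{
    int a;
    int b;
}

// Hypothetical function taking its target by reference.
void setTo(ref int target, int value)
{
    target = value;
}

unittest
{
    Pair obj;

    // Compute the address of obj.b by hand from the object's base address.
    auto base = cast(ubyte*) &obj;
    int* p = cast(int*)(base + Pair.b.offsetof);

    setTo(*p, 42); // the pointer is turned into a ref at the call

    assert(obj.b == 42);
    assert(obj.a == 0);
}
```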
On Monday, 4 September 2017 at 04:10:44 UTC, Moritz Maxeiner wrote:
> What I meant (and apparently poorly expressed) is that you can turn a pointer into a reference (as long as it's not null) and taking the address of a "ref" yields a pointer and as in my `foo` example in the above, which path is taken can change at runtime. You can, e.g. generate a reference to an object's member without the compiler being able to detect it by calculating the appropriate pointer and then dereferencing it.

I think I understand now. Thanks!
Sep 03 2017
On 09/04/2017 06:10 AM, Moritz Maxeiner wrote:
> Indeed, but it also means that - other than null dereferencing - pointer issues can be made into reference issues by dereferencing a pointer and passing that into a function that takes that parameter by reference.

Why "other than null dereferencing"? You can dereference a null pointer and pass it in a ref parameter. That doesn't crash at the call site, but only when the callee accesses the parameter:

----
int f(ref int x, bool b) { return b ? x : 0; }

void main()
{
    int* p = null;

    /* Syntactically a null dereference, but doesn't crash: */
    f(*p, false);

    /* This crashes: */
    f(*p, true);
}
----
Sep 04 2017
On Monday, 4 September 2017 at 09:15:30 UTC, ag0aep6g wrote:
> Why "other than null dereferencing"? You can dereference a null pointer and pass it in a ref parameter. That doesn't crash at the call site, but only when the callee accesses the parameter: [...]

Because I was ignorant and apparently wrong, thanks for the correction. Still, though, this is surprising to me, because this means that the address of a parameter passed by reference (which in your case is typed as an existing int) can be null. Is this documented somewhere (I couldn't find it in the spec, and it seems like a bug to me)?
Sep 04 2017
On 09/04/2017 11:47 AM, Moritz Maxeiner wrote:
> Still, though, this is surprising to me, because this means taking the address of a parameter passed by reference (which in your case is typed as an existing int) can be null. Is this documented somewhere (couldn't find it in the spec and it seems like a bug to me)?

I'm only aware of this part of the spec, which doesn't say much about ref parameters: https://dlang.org/spec/function.html#parameters

g++ accepts the equivalent C++ code and shows the same behavior. But, as far as I can tell, it's undefined behavior there, because dereferencing null has undefined behavior.

In D, dereferencing a null pointer is expected to crash the program. It's allowed in @safe code with that expectation. So it seems to have defined behavior that way.

But if dereferencing null must crash the program, shouldn't my code crash at the call site? Or is there an exception for ref parameters? Anyway, the spec seems to be missing some paragraphs that clear all this up.
Sep 04 2017
On Monday, 4 September 2017 at 10:24:48 UTC, ag0aep6g wrote:
> I'm only aware of this part of the spec, which doesn't say much about ref parameters: https://dlang.org/spec/function.html#parameters g++ accepts the equivalent C++ code and shows the same behavior. But, as far as I can tell, it's undefined behavior there, because dereferencing null has undefined behavior. In D, dereferencing a null pointer is expected to crash the program. It's allowed in @safe code with that expectation. So it seems to have defined behavior that way.

Yes, which is why I wrongly assumed that turning a null pointer into a reference would crash the program (as such references can't be tested for being null, you'd have to turn them back into a pointer to test).

> But if dereferencing null must crash the program, shouldn't my code crash at the call site? Or is there an exception for ref parameters? Anyway, the spec seems to be missing some paragraphs that clear all this up.

Yes, that is what I meant by saying it looks like a bug to me. It really ought to crash at the call site imho; this would require injecting null checks at the call site when the argument is a pointer dereference.
Sep 04 2017
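What a null check "injected at the call site" would amount to can be approximated by hand. The helper `checkedDeref` below is hypothetical (it is not a language or library feature, and not how any compiler implements this); it fails at the call site even when the callee never touches the parameter, which is the behaviour argued for above. `f` is the function from ag0aep6g's example.

```d
// Hypothetical helper: turns a pointer into a reference, failing early on null.
ref T checkedDeref(T)(T* p)
{
    if (p is null)
        throw new Exception("null pointer dereferenced at call site");
    return *p;
}

int f(ref int x, bool b) { return b ? x : 0; }

unittest
{
    int v = 5;
    assert(f(checkedDeref(&v), true) == 5);

    int* p = null;
    bool caught = false;
    try
    {
        f(checkedDeref(p), false); // fails here although f would never read x
    }
    catch (Exception e)
    {
        caught = true;
    }
    assert(caught);
}
```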
On Monday, 4 September 2017 at 09:47:12 UTC, Moritz Maxeiner wrote:
> Because I was ignorant and apparently wrong, thanks for the correction. Still, though, this is surprising to me, because this means taking the address of a parameter passed by reference (which in your case is typed as an existing int) can be null. Is this documented somewhere (couldn't find it in the spec and it seems like a bug to me)?

LDC treats passing `null` to a reference parameter as UB. It doesn't matter when the program crashes after passing null to ref, exactly because it is UB.

Because the caller has to do the dereferencing (semantically), you only have to do the null check in the caller, and not in the callee. This removes a ton of manual null-ptr checks from the code, and enables more optimizations too.

Class parameters are pointers, not references, as in: it is _not_ UB to pass in `null`. Very unfortunate, because it necessitates null-ptr checks everywhere in the code, and hurts performance due to missed optimization opportunities.

(The spec requires crashing on null dereferencing, but this spec bit is ignored by DMD and LDC, and I assume by GDC too. Crashing on `null` dereferencing requires a null check on every dereference through an unchecked pointer, because 0 might be a valid memory address, and also because ptr->someDataField is not going to look up address 0, but 0+offsetof(someDataField) instead, e.g. potentially addressing a valid low address at 1000000, say.)

- Johan
Sep 04 2017
On Monday, 4 September 2017 at 17:58:41 UTC, Johan Engelen wrote:
> LDC treats passing `null` to a reference parameter as UB. It doesn't matter when the program crashes after passing null to ref, exactly because it is UB.

Ok, that's good to know, though it'd be nice for this to be defined somewhere in the language spec.

> Because the caller has to do the dereferencing (semantically) you only have to do the null-check in the caller, and not in callee. This removes a ton of manual null-ptr checks from the code, and enables more optimizations too.

Indeed, which is why I currently think the spec should state that this isn't UB, but has to crash at the call site.

> For class parameters, they are pointers not references, as in: it is _not_ UB to pass-in `null`. Very unfortunate, because it necessitates null-ptr checks everywhere in the code, and hurts performance due to missed optimization opportunities.

Well, technically they are "class references". In any case, they don't require the compiler to inject null checks in general, as using them in any way will be a null dereference (which the hardware & OS are required to turn into a crash).

> (The spec requires crashing on null dereferencing, but this spec bit is ignored by DMD and LDC, I assume in GDC too. Crashing on `null` dereferencing requires a null-check on every dereferencing through an unchecked pointer, because 0 might be a valid memory access, and also because ptr->someDataField is not going to lookup address 0, but 0+offsetof(someDataField) instead, e.g. potentially addressing a valid low address at 1000000, say.)

It's not implemented as compiler checks because the "actual" requirement is "the platform has to crash on null dereference" (see the discussion in/around [1]). Essentially: "if your platform doesn't crash on null dereference, don't use D on it (at the very least not @safe D)".

The issue concerning turning a pointer into a reference parameter is that when reading the code it looks like the dereference is happening at the call site, while the resulting compiled executable will actually perform the (null) dereference inside the function, on use of the reference parameter. That is why I think the null check should be injected at the call site: depending on platform support for the crash may yield the wrong result (if the reference parameter isn't actually used in the function, it won't crash, even though it *should*).

[1] https://forum.dlang.org/post/udkdqogtrvanhbotdoik@forum.dlang.org
Sep 04 2017
On Monday, 4 September 2017 at 21:23:50 UTC, Moritz Maxeiner wrote:
> On Monday, 4 September 2017 at 17:58:41 UTC, Johan Engelen wrote:
>> [...]
> It's not implemented as compiler checks because the "actual" requirement is "the platform has to crash on null dereference" (see the discussion in/around [1]). Essentially: "if your platform doesn't crash on null dereference, don't use D on it (at the very least not @safe D)".

My point was that that is not workable. The "null dereference" is a D language construct, not something that the machine is doing. It's ridiculous to specify that reading from address 1_000_000 should crash the program, yet that is exactly what is specified by D when running this code (and thus null checks need to be injected in many places to be spec compliant):

```
struct S
{
    ubyte[1_000_000] a;
    int b;
}

void main()
{
    S* s = null;
    s.b = 1;
}
```

-Johan
Sep 05 2017
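The address arithmetic behind the example above can be verified without dereferencing anything. This is a small sketch using D's `.offsetof` property; `addressOfB` is a made-up helper that only does the addition a field access through a pointer would do.

```d
struct S
{
    ubyte[1_000_000] a;
    int b;
}

// The field b sits one million bytes past the start of the struct.
static assert(S.b.offsetof == 1_000_000);

// Hypothetical helper: the address that `s.b` refers to for a struct at `base`.
size_t addressOfB(size_t base)
{
    return base + S.b.offsetof;
}

unittest
{
    // With s == null the base is 0, so the access goes to address 1_000_000, not 0.
    assert(addressOfB(0) == 1_000_000);
}
```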
On Tuesday, 5 September 2017 at 18:32:34 UTC, Johan Engelen wrote:
> My point was that that is not workable. The "null dereference" is a D language construct, not something that the machine is doing. It's ridiculous to specify that reading from address 1_000_000 should crash the program, yet that is exactly what is specified by D when running this code (and thus null checks need to be injected in many places to be spec compliant): [...]

Perhaps it should null-check exceptionally large types, which may overflow the protected memory area, but not others?
Sep 05 2017
On Tuesday, 5 September 2017 at 18:32:34 UTC, Johan Engelen wrote:
> My point was that that is not workable. The "null dereference" is a D language construct, not something that the machine is doing.

While "null dereference" is a language construct, "null" is defined as actual address zero (as it is in C/C++ by implementation), and dereferencing means reading from or writing to that virtual memory address. That is something the machine does, namely through memory protection: because the page for address 0 is (usually) not mapped (and D requires it to not be mapped for @safe to work), accessing it will lead to a page fault, which in turn leads to a segmentation fault and then a program crash.

> It's ridiculous to specify that reading from address 1_000_000 should crash the program, yet that is exactly what is specified by D when running this code (and thus null checks need to be injected in many places to be spec compliant): [...]

In order to be spec compliant and correct, a compiler would only need to inject null checks on dereferences where the size of the object being pointed to (in your example S.sizeof) is larger than the bottom virtual memory segment of the target OS (the one which no C-compatible OS maps automatically and which you also shouldn't map manually). The size of that bottom segment, however, is usually _deliberately_ large precisely so that buggy (C) programs crash on NULL dereference (even with structures like the one above), so in practice, unless you invalidate assumptions about expected maximum structure sizes made by the OS, null dereferences can be assumed to crash.
Sep 05 2017
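The size rule in the reply above (only inject a check when the pointed-to type is larger than the unmapped region at the bottom of the address space) could be sketched as a compile-time predicate. This is only an illustration: `guardRegionSize` and `needsNullCheck` are hypothetical names, and the 4096-byte figure is an assumed placeholder, not a value taken from any particular OS or compiler.

```d
import std.stdio : writeln;

// ASSUMPTION: the lowest `guardRegionSize` bytes of the address space are
// never mapped. 4096 is an illustrative placeholder; the real size is
// OS-dependent and is usually much larger.
enum size_t guardRegionSize = 4096;

// Hypothetical predicate: true if a field of T could lie at or beyond the
// assumed guard region when the base pointer is null, so that the hardware
// fault alone cannot be relied upon.
enum bool needsNullCheck(T) = T.sizeof > guardRegionSize;

struct Small { int b; }
struct Large { ubyte[1_000_000] a; int b; }

static assert(!needsNullCheck!Small); // a hardware fault is enough here
static assert( needsNullCheck!Large); // Large.b sits at offset 1_000_000

void main()
{
    writeln("Small needs a check: ", needsNullCheck!Small);
    writeln("Large needs a check: ", needsNullCheck!Large);
}
```

A finer-grained variant could compare the offset of the accessed field rather than the whole `T.sizeof`; the sketch uses the size because that is the criterion stated in the post.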
On Tuesday, September 05, 2017 18:32:34 Johan Engelen via Digitalmars-d wrote:
> On Monday, 4 September 2017 at 21:23:50 UTC, Moritz Maxeiner wrote:
>> On Monday, 4 September 2017 at 17:58:41 UTC, Johan Engelen wrote:
>>> (The spec requires crashing on null dereferencing, but this spec bit is ignored by DMD and LDC, I assume in GDC too. Crashing on `null` dereferencing requires a null-check on every dereferencing through an unchecked pointer, because 0 might be a valid memory access, and also because ptr->someDataField is not going to look up address 0, but 0+offsetof(someDataField) instead, e.g. potentially addressing a valid low address at 1000000, say.)
>>
>> It's not implemented as compiler checks because the "actual" requirement is "the platform has to crash on null dereference" (see the discussion in/around [1]). Essentially: "if your platform doesn't crash on null dereference, don't use D on it (at the very least not @safe D)".
>
> My point was that that is not workable. The "null dereference" is a D language construct, not something that the machine is doing.
>
> It's ridiculous to specify that reading from address 1_000_000 should crash the program, yet that is exactly what is specified by D when running this code (and thus null checks need to be injected in many places to be spec compliant):
> ```
> struct S {
>     ubyte[1_000_000] a;
>     int b;
> }
> void main() {
>     S* s = null;
>     s.b = 1;
> }
> ```

dmd and the spec were written with the assumption that the CPU is going to segfault your program when you dereference a null pointer. In the vast majority of cases, that assumption holds. The problem of course is the case that you bring up where you're dealing with objects that are large enough that the CPU can't do that anymore. And as Moritz points out, all that's required to fix that is to insert null checks for those types. It shouldn't be necessary at all for the vast majority of types. The CPU already handles them correctly - at least on any x86-based system. I would expect any other modern CPU to do the same, but I'm not familiar enough with other such systems to know for sure. Regardless, there definitely should be no need to insert null checks all over the place in any x86-based code. At most, it's needed in a few places to deal with abnormally large objects.

Regardless, for @safe to do its job, the program does need to crash when dereferencing null. So, if the CPU can't do the checks like the spec currently assumes, then the compiler is going to need to insert the checks, and while that may hurt performance, I don't think that there's really any way around that while still ensuring that @safe code does not corrupt memory or access memory that it's not supposed to. @system code could skip it to get the full performance, but @safe is stuck.

- Jonathan M Davis
Sep 05 2017
On Tuesday, 5 September 2017 at 22:59:12 UTC, Jonathan M Davis wrote:
> dmd and the spec were written with the assumption that the CPU is going to segfault your program when you dereference a null pointer. In the vast majority of cases, that assumption holds.

In my terminology, "dereference" is a language spec term. It is not directly related to what the CPU is doing.
```
struct S {
    void nothing() {}
}
void foo (S* s) {
    (*s).nothing(); // dereference must crash on null?
}
```
If you call the `(*s)` a dereference, then you are agreeing with "my" dereference terminology. (I used the * for extra clarity; "s.nothing()" is the same.)

In LDC, dereferencing a null ptr is UB. DMD is assuming the same, or something similar. Otherwise DMD wouldn't be able to optimize foo in this example to an empty body as it does currently. (go go null-checks everywhere)

-Johan
Sep 06 2017
On Wednesday, September 06, 2017 19:40:16 Johan Engelen via Digitalmars-d wrote:
> On Tuesday, 5 September 2017 at 22:59:12 UTC, Jonathan M Davis wrote:
>> dmd and the spec were written with the assumption that the CPU is going to segfault your program when you dereference a null pointer. In the vast majority of cases, that assumption holds.
>
> In my terminology, "dereference" is a language spec term. It is not directly related to what the CPU is doing.
> ```
> struct S {
>     void nothing() {}
> }
> void foo (S* s) {
>     (*s).nothing(); // dereference must crash on null?
> }
> ```
> If you call the `(*s)` a dereference, then you are agreeing with "my" dereference terminology. (I used the * for extra clarity; "s.nothing()" is the same.)
>
> In LDC, dereferencing a null ptr is UB. DMD is assuming the same, or something similar. Otherwise DMD wouldn't be able to optimize foo in this example to an empty body as it does currently. (go go null-checks everywhere)

I would argue that if the dereferencing of the pointer is optimized out, then it is never dereferenced, and therefore, it doesn't need to crash. It's only if it's actually dereferenced that the crashing needs to occur, because that's what's needed for @safe to be safe. I can totally believe that the spec needs to be clearer about this, but I would definitely interpret it to mean that if the pointer is actually dereferenced, the program must crash and not that your code example must crash even if it's optimized. And I would be surprised if Walter meant anything else. He just isn't always good about writing the spec in a way that others agree that it means what he meant, and to be fair, it can be very hard to write things in an unambiguous way.

Regardless, I don't see a problem here - or a need to insert a bunch of null checks. The spec should probably be clarified, but the only thing that I'm aware of that I would consider a hole in dereferencing null right now is the fact that it relies on the CPU to segfault the program in cases where the object is too large for that to occur - and in those cases, null checks really should be inserted. For objects that are small enough to trigger segfaults with null, null checks should not be necessary.

- Jonathan M Davis
Sep 06 2017
On Monday, 4 September 2017 at 09:15:30 UTC, ag0aep6g wrote:
> On 09/04/2017 06:10 AM, Moritz Maxeiner wrote:
>
> That doesn't crash at the call site, but only when the callee accesses the parameter:

That's just an observation based on a detail of a particular compiler implementation. It's simply not true in general but might appear that way in a particular case. Did you inspect the generated code? If the entire thing has been _inlined_ and properly optimised, as decent modern compilers most definitely all do _when the correct switches are used_, then looking at the code there is no such thing as caller and callee - it's all just a stream of code.
Sep 06 2017
On Sunday, 3 September 2017 at 03:04:58 UTC, Uknown wrote:
> In C, the `restrict` keyword implies that 2 or more pointer arguments in a function call do not point to the same data.

I really don't see where the restrict keyword is needed at all, neither in C nor in D. If you want to imply to the compiler that there is no need to reload the pointed data between uses, just assign it to a local.
Sep 04 2017
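The "assign it to a local" suggestion above can be illustrated with a small sketch. The function `scale` is a hypothetical example, not something from the thread; it reads the pointed-to value once into a local, so later stores through the slice cannot change the multiplier even when the pointer aliases the data.

```d
import std.stdio : writeln;

// Multiplies every element of `data` by the value `factor` points to.
// Copying *factor into a local first means that stores through `data`
// cannot change the multiplier, so nothing has to be re-read in the loop.
void scale(int[] data, const(int)* factor)
{
    const f = *factor; // single read, held in a local
    foreach (ref e; data)
        e *= f;
}

void main()
{
    int[] values = [2, 2, 3];
    // `factor` deliberately points into the array being modified.
    scale(values, &values[0]);
    writeln(values); // every element was multiplied by the original 2
}
```

Note that the local copy also changes what the function means when the arguments do alias: reading `*factor` inside the loop instead would pick up the updated `values[0]` from the second element onwards. A local covers a single scalar read; on its own it does not tell the compiler that two whole arrays are disjoint.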
On Monday, 4 September 2017 at 14:28:14 UTC, Dukc wrote:
> On Sunday, 3 September 2017 at 03:04:58 UTC, Uknown wrote:
>> In C, the `restrict` keyword implies that 2 or more pointer arguments in a function call do not point to the same data.
>
> I really don't see where the restrict keyword is needed at all, neither in C nor in D. If you want to imply to the compiler that there is no need to reload the pointed data between uses, just assign it to a local.

It's needed for auto-vectorization, for example.

I would support an LDC PR for adding a magic UDA to be able to attach 'restrict' with C-semantics to function parameters. E.g.
```
// add restrict to parameters 1 and 2
void foo(int, int*, int*) @restrict(1,2)
```

-Johan
Sep 04 2017
On Monday, 4 September 2017 at 18:03:51 UTC, Johan Engelen wrote:
> It's needed for auto-vectorization, for example.
>
> I would support an LDC PR for adding a magic UDA to be able to attach 'restrict' with C-semantics to function parameters. E.g.
> ```
> // add restrict to parameters 1 and 2
> void foo(int, int*, int*) @restrict(1,2)
> ```

That probably explains it in the case of C. But I still think that D might be able to do this better without language changes. This way (not compiler-checked for errors):
```
for (int i = 0; i < a.length; i += 8)
{
    // copy into static arrays, which cannot overlap one another
    int[8] aVec = a[i .. i+8], bVec = b[i .. i+8], cVec;
    foreach (j; 0 .. 8) cVec[j] = aVec[j].foo(bVec[j]);
    c[i .. i+8] = cVec[];
}
```
Of course, if we want to support this we should construct a high-level library template that chooses the correct vector size for the platform, eliminates that outer for loop and handles uneven array lengths.
Sep 05 2017
On Tuesday, 5 September 2017 at 15:46:13 UTC, Dukc wrote:
> [..] Of course, if we want to support this we should construct a high-level library template that chooses the correct vector size for the platform, eliminates that outer for loop and handles uneven array lengths.

You mean like this: https://github.com/dlang/druntime/pull/1891?
Sep 06 2017
On Wednesday, 6 September 2017 at 09:21:59 UTC, Petar Kirov [ZombineDev] wrote:
> On Tuesday, 5 September 2017 at 15:46:13 UTC, Dukc wrote:
>> [..] Of course, if we want to support this we should construct a high-level library template that chooses the correct vector size for the platform, eliminates that outer for loop and handles uneven array lengths.
>
> You mean like this: https://github.com/dlang/druntime/pull/1891?

No. I meant a function which, given an array, returns a range over that array which internally reads many elements at once from the array by copying them to a static array for handling. Then the compiler knows it can take advantage of optimizations like that pull request, because it knows static arrays can't overlap, even if the original arguments do. Of course the user should not call that function if the arrays do overlap, or if the loop body mutates other elements.

See David Simcha's talk at DConf 13 at 37:30, that's the basic idea how I'm thinking the range would internally iterate. https://www.youtube.com/watch?v=yMNMV9JlkcQ&list=PLpISZoFBH1xtyA6uBsNyQH8P3lx92U64V&index=16
Sep 06 2017
On Wednesday, 6 September 2017 at 17:30:44 UTC, Dukc wrote:
> See David Simcha's talk at DConf 13 at 37:30, that's the basic idea how I'm thinking the range would internally iterate.

Correction: The outer loop would iterate in steps like that but the body would be different. Each time, it would copy elements into a static array of length unroll.length (which in this case would be the width of vector operations), let the user iterate over that, and then assign it back to the original array.
Sep 08 2017
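The chunked iteration described in the posts above (copy a block into static arrays, work on the copies, write the block back, and deal with leftover elements) might look roughly like the following. This is a simplified function rather than the range Dukc describes, and `chunkedZip`, its `width` parameter and the default of 8 are all hypothetical choices made for illustration.

```d
import std.stdio : writeln;

// Hypothetical helper, not an existing library function.
// Applies `op` element-wise to `a` and `b` and stores the results in `c`,
// going through fixed-size local buffers so that the inner loop only
// touches storage that cannot overlap.
// ASSUMPTION: all three slices have the same length, and the caller
// guarantees that `c` does not overlap `a` or `b`.
void chunkedZip(alias op, size_t width = 8)(int[] a, int[] b, int[] c)
{
    assert(a.length == b.length && b.length == c.length);

    size_t i = 0;
    // Full chunks of `width` elements.
    for (; i + width <= a.length; i += width)
    {
        int[width] aVec = a[i .. i + width];
        int[width] bVec = b[i .. i + width];
        int[width] cVec;
        foreach (j; 0 .. width)
            cVec[j] = op(aVec[j], bVec[j]);
        c[i .. i + width] = cVec[];
    }
    // Leftover elements when the length is not a multiple of `width`.
    for (; i < a.length; ++i)
        c[i] = op(a[i], b[i]);
}

void main()
{
    int[] a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    int[] b = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100];
    auto c = new int[a.length];
    chunkedZip!((x, y) => x + y)(a, b, c);
    writeln(c);
}
```

Whether a given compiler actually vectorizes the inner loop is not something this sketch can promise; it only arranges the data so that the non-overlap is visible locally.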
On Monday, 4 September 2017 at 14:28:14 UTC, Dukc wrote:
> On Sunday, 3 September 2017 at 03:04:58 UTC, Uknown wrote:
>> In C, the `restrict` keyword implies that 2 or more pointer arguments in a function call do not point to the same data.
>
> I really don't see where the restrict keyword is needed at all, neither in C nor in D. If you want to imply to the compiler that there is no need to reload the pointed data between uses, just assign it to a local.

You can see the second answer on this Stack Overflow question [1]. It explains how `restrict` can be used in C to tell the compiler that there won't be any aliasing between two pointers, and the optimizations the compiler can do, given this knowledge.

[1] https://stackoverflow.com/questions/745870/realistic-usage-of-the-c99-restrict-keyword#745877
Sep 05 2017
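The aliasing problem that `restrict` addresses can be shown directly in D. The function `addTwice` below is a made-up example: because `a` and `b` may point to the same `int`, the compiler has to re-read `*b` after the first store, and the result really does differ between the aliased and non-aliased calls.

```d
import std.stdio : writeln;

// Adds *b to *a twice. Without a no-alias guarantee the compiler must
// reload *b after the first store, because `a` and `b` may be the same
// address.
void addTwice(int* a, int* b)
{
    *a += *b;
    *a += *b;
}

void main()
{
    int x = 1, y = 10;
    addTwice(&x, &y); // distinct objects: 1 + 10 + 10
    writeln(x);       // 21

    int z = 1;
    addTwice(&z, &z); // aliased: 1 + 1 = 2, then 2 + 2
    writeln(z);       // 4
}
```

With a C-style `restrict` promise on both parameters, a compiler would be free to load `*b` once and add it twice, which is only equivalent to the code above when the pointers do not alias.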
On Sunday, 3 September 2017 at 03:04:58 UTC, Uknown wrote:
> I just hope to have a nice discussion on this topic here. Thanks!

You might find interesting the reference capabilities of Pony: https://tutorial.ponylang.org/capabilities/reference-capabilities.html

So with respect to restrict, there are a few that might be relevant.
- iso - can only have one reference
- box - similar to const function parameters in D, can give out read-only views
- trn - sort of the reverse of const, it's like a write unique, you can write to it, but other people cannot

When I think of restrict, it's like telling the compiler that these two pointers don't overlap their data. By contrast, if you pass parameters to a function with these reference capabilities, then it would be providing the compiler slightly different information. If you pass iso, it tells the compiler that no other variable points to this variable. If you use box (const), you're telling the compiler that you won't modify it in this function. If you use trn, you're telling the compiler that this reference is the only way to modify the variable (i.e. another variable in the function cannot modify it).

I'm not an expert on Pony or anything, so I'm not sure how complicated this makes things with aggregate types.
Sep 04 2017