digitalmars.D - D defined behavior
- Luís Marques (24/24) Apr 27 2020 I've lost track. Does D have undefined behavior at all? (e.g.
- Johan (15/24) Apr 27 2020 I think you mean to ask whether D defines writing to partially
- Luís Marques (8/20) Apr 27 2020 I left it purposefully vague because it wasn't clear what
- Luís Marques (4/5) Apr 27 2020 Could you expand on that, please? Do you mean in the
- welkam (3/8) Apr 27 2020 casting out immutable or const and then modifying the value would
- Luís Marques (9/11) Apr 27 2020 Ah, yes, I should have phrased it to make it clear I didn't have
- Johan (5/10) Apr 27 2020 What comes to mind are null dereference and shifting by more than
- Arine (17/22) Apr 27 2020 D has UB even in @safe. @safe doesn't mean there's no UB, it
- Dennis (6/12) Apr 27 2020 https://dlang.org/spec/function.html#function-safety
- Arine (4/16) Apr 27 2020 There's lots of bugs filed there. A lot of them aren't valid. No
- Timon Gehr (2/3) Apr 27 2020 UB precludes memory safety.
- Johan (4/9) Apr 27 2020 I would greatly appreciate it if the outcome of this discussion
- kinke (17/41) Apr 27 2020 A case where LDC goes further than GDC, wrt. infinite loops (see
- ag0aep6g (19/43) Apr 27 2020 In @system and @trusted code, there is plenty undefined behavior.
- Walter Bright (3/6) Apr 27 2020 Yes, but there will still be a way to trick it. @live will stop that tri...
I've lost track. Does D have undefined behavior at all? (e.g. outside @safe code). In any case, I imagine that the D spec (ha!) would dictate that the following code should print 0x10001, although that's not quite clear. GDC (optimized) prints 0x1. LDC2 prints 0x10001, but I suspect only because LLVM doesn't happen to do the same optimization, not because this is handled properly.

```
import std.stdio;

@safe:

int foo(int* a, int* b)
{
    *a = 1;
    *b = 1;
    return *a;
}

void main()
{
    ubyte[6] buffer;
    auto a = cast(int[]) (buffer[0 .. 4]);
    auto b = cast(int[]) (buffer[2 .. 6]);
    writefln("0x%X", foo(&a[0], &b[0]));
}
```
Apr 27 2020
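For reference, the 0x10001 expectation in the post above can be reproduced without any pointer casts by doing the same two stores byte by byte. This is only a sketch: it assumes a little-endian layout, and `storeLE` and `loadLE` are hypothetical helper names, not anything from Phobos.

```d
import std.stdio;

// Hypothetical helper: store a 32-bit value into dst[0 .. 4], least significant byte first.
void storeLE(ubyte[] dst, uint value)
{
    foreach (i; 0 .. 4)
        dst[i] = cast(ubyte)(value >> (8 * i));
}

// Hypothetical helper: read a 32-bit value back from src[0 .. 4], least significant byte first.
uint loadLE(const(ubyte)[] src)
{
    uint result = 0;
    foreach (i; 0 .. 4)
        result |= cast(uint) src[i] << (8 * i);
    return result;
}

void main()
{
    ubyte[6] buffer;
    storeLE(buffer[0 .. 4], 1); // bytes: 01 00 00 00 00 00
    storeLE(buffer[2 .. 6], 1); // bytes: 01 00 01 00 00 00
    writefln("0x%X", loadLE(buffer[0 .. 4])); // prints 0x10001
}
```

The second store lands on the upper two bytes of the first value, which is where the extra 0x10000 comes from.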
On Monday, 27 April 2020 at 14:07:49 UTC, Luís Marques wrote:
> I've lost track. Does D have undefined behavior at all? (e.g. outside @safe code).

Yes, D has UB.

> In any case, I imagine that the D spec (ha!) would dictate that the following code should print 0x10001, although that's not quite clear. GDC (optimized) prints 0x1. LDC2 prints 0x10001, but I suspect only because LLVM doesn't happen to do the same optimization, not because this is handled properly.

I think you mean to ask whether D defines writing to partially overlapping objects, or whether it is defined to write a value of type `int` to memory locations that were typed as `char`.

LDC assumes `a` and `b` may alias partially, and therefore `a` must be reloaded in the return statement. LDC does not pass type-based alias information (TBAA, if you are interested) to the optimizer, which Clang _would_ do; it would be nice if someone could check me on this: https://llvm.org/docs/LangRef.html#pointer-aliasing-rules

Still, Clang does not optimize the code as it probably could, the way GCC does. My guess is that GDC inherits the C or C++ type-based aliasing rules here.

-Johan
Apr 27 2020
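To make the "reload" point concrete, here is a rough sketch of the two ways a compiler could treat `foo`. The names `fooReload` and `fooFolded` are made up for illustration, and `fooFolded` is only a hand-written guess at the effect of the optimization, not actual GDC output.

```d
// What the source says: the final read goes back to memory.
int fooReload(int* a, int* b)
{
    *a = 1;
    *b = 1;
    return *a;
}

// What a compiler may produce if it assumes the store through b
// cannot change *a: the return value is folded to the constant 1.
int fooFolded(int* a, int* b)
{
    *a = 1;
    *b = 1;
    return 1;
}

void main()
{
    int x, y;
    // Disjoint objects: both versions agree.
    assert(fooReload(&x, &y) == fooFolded(&x, &y));
    // The very same object: both versions still agree.
    assert(fooReload(&x, &x) == fooFolded(&x, &x));
}
```

The two versions can only be told apart when `a` and `b` overlap partially, which is exactly the case the original program constructs.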
On Monday, 27 April 2020 at 14:57:09 UTC, Johan wrote:
> I think you mean to ask whether D defines writing to partially overlapping objects. Or if it is defined to write a type `int` to memory locations that were typed as `char`.

I left it purposefully vague because it wasn't clear what question should be asked at that level (and because I'm *really* short on time at the moment, sorry). For instance, another possible question is whether it's valid for @safe code to create pointers to unaligned values. Avoiding that could potentially provide another avenue for addressing issues like this, I guess.

> LDC assumes `a` and `b` may alias partially, and therefore `a` must be reloaded in the return statement. LDC does not pass type-based alias information to the optimizer (TBAA if you are interested), which Clang _would_ do; it would be nice if someone could check me on this: https://llvm.org/docs/LangRef.html#pointer-aliasing-rules Still, Clang does not optimize the code as it probably could, same as GCC does. My guess is that GDC inherits the C or C++ type-based aliasing rules here.

Thanks for the details.
Apr 27 2020
On Monday, 27 April 2020 at 14:57:09 UTC, Johan wrote:
> Yes, D has UB.

Could you expand on that, please? Do you mean in the implementation or per the spec? Outside @safe or also in @safe code? Etc.
Apr 27 2020
On Monday, 27 April 2020 at 15:14:06 UTC, Luís Marques wrote:
> On Monday, 27 April 2020 at 14:57:09 UTC, Johan wrote:
>> Yes, D has UB.
>
> Could you expand on that, please? Do you mean in the implementation or per the spec? Outside @safe or also in @safe code? Etc.

Casting away immutable or const and then modifying the value would lead to incorrect code.
Apr 27 2020
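A minimal sketch of the well-defined alternative to the case mentioned above: instead of casting away `immutable` and writing through the result, take a mutable copy with `.dup` and modify that. The variable names are illustrative only.

```d
void main()
{
    immutable int[] original = [1, 2, 3];

    // Not done here on purpose: casting away immutable and then writing,
    // e.g. (cast(int[]) original)[0] = 42, is the undefined case.

    int[] copy = original.dup; // fresh mutable array with the same contents
    copy[0] = 42;

    assert(original[0] == 1); // the immutable data is untouched
    assert(copy[0] == 42);
}
```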
On Monday, 27 April 2020 at 16:10:12 UTC, welkam wrote:
> casting out immutable or const and then modifying the value would lead to incorrect code.

Ah, yes, I should have phrased it to make it clear I didn't have that kind of thing in mind. In your example we are in the realm of "well, you promised something and then you didn't follow through, and this is @system, so you're on your own". Whereas things like evaluation order, alignment, etc. could arguably all be specified such that D does the right thing, even in various cases where in C they would be undefined behavior.
Apr 27 2020
On Monday, 27 April 2020 at 15:14:06 UTC, Luís Marques wrote:
> On Monday, 27 April 2020 at 14:57:09 UTC, Johan wrote:
>> Yes, D has UB.
>
> Could you expand on that, please? Do you mean in the implementation or per the spec? Outside @safe or also in @safe code? Etc.

What comes to mind are null dereference and shifting by more than the bit width. (@safe plays no role for these)

-Johan
Apr 27 2020
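A small sketch of how code can stay clear of the two cases named above by checking before it shifts or dereferences. `shiftLeftChecked` and `readOr` are hypothetical helpers, and treating a shift count equal to the bit width as out of range is an assumption made here to be conservative.

```d
// Hypothetical helper: only shift when the count is below the bit width of uint.
uint shiftLeftChecked(uint value, uint count)
{
    enum bits = uint.sizeof * 8; // 32
    return count < bits ? value << count : 0;
}

// Hypothetical helper: only dereference when the pointer is not null.
int readOr(const(int)* p, int fallback)
{
    return p is null ? fallback : *p;
}

void main()
{
    assert(shiftLeftChecked(1, 4) == 16);
    assert(shiftLeftChecked(1, 32) == 0); // count too large: no shift is performed

    int value = 7;
    assert(readOr(&value, -1) == 7);
    assert(readOr(null, -1) == -1);
}
```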
On Monday, 27 April 2020 at 15:14:06 UTC, Luís Marques wrote:
> On Monday, 27 April 2020 at 14:57:09 UTC, Johan wrote:
>> Yes, D has UB.
>
> Could you expand on that, please? Do you mean in the implementation or per the spec? Outside @safe or also in @safe code? Etc.

D has UB even in @safe. @safe doesn't mean there's no UB, it simply means it is memory safe. This will print both true and false:

```
import std.stdio;

@safe void foo()
{
    bool v = void;
    if ( v ) { writeln("true"); }
    if ( !v ) { writeln("false"); }
}

@safe void main()
{
    foo();
}
```
Apr 27 2020
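For contrast with the `= void` example above, a sketch of the same two tests on a `bool` that holds a proper value. `branchesTaken` is a hypothetical helper that counts how many of the two `if` bodies run; with a default-initialized or explicitly assigned `bool`, it is always exactly one.

```d
// Hypothetical helper: counts how many of the two branches execute.
int branchesTaken(bool v) @safe
{
    int count = 0;
    if (v)  { ++count; }
    if (!v) { ++count; }
    return count;
}

void main() @safe
{
    bool v; // default-initialized, i.e. false
    assert(v == false);
    assert(branchesTaken(v) == 1);
    assert(branchesTaken(true) == 1);
}
```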
On Monday, 27 April 2020 at 18:21:45 UTC, Arine wrote:
> D has UB even in @safe. @safe doesn't mean there's no UB, it simply means it is memory safe. This will print both true and false:

Note that @safe is defined to have no undefined behavior.

> Safe functions are functions that are statically checked to exhibit no possibility of undefined behavior. Undefined behavior is often used as a vector for malicious attacks.

https://dlang.org/spec/function.html#function-safety

Anytime there is UB in a @safe function, it's a bug. The example you posted in particular is filed under https://issues.dlang.org/show_bug.cgi?id=20148
Apr 27 2020
On Monday, 27 April 2020 at 18:59:22 UTC, Dennis wrote:
> On Monday, 27 April 2020 at 18:21:45 UTC, Arine wrote:
>> D has UB even in @safe. @safe doesn't mean there's no UB, it simply means it is memory safe. This will print both true and false:
>
> Note that @safe is defined to have no undefined behavior.
>
>> Safe functions are functions that are statically checked to exhibit no possibility of undefined behavior. Undefined behavior is often used as a vector for malicious attacks.
>
> https://dlang.org/spec/function.html#function-safety
>
> Anytime there is UB in a @safe function, it's a bug. The example you posted in particular is filed under https://issues.dlang.org/show_bug.cgi?id=20148

There are lots of bugs filed there. A lot of them aren't valid. No one's confirmed whether that actually even is a bug. It is working as intended, otherwise the fix is rather simple.
Apr 27 2020
On 27.04.20 20:21, Arine wrote:
> @safe doesn't mean there's no UB, it simply means it is memory safe.

UB precludes memory safety.
Apr 27 2020
On Monday, 27 April 2020 at 14:07:49 UTC, Luís Marques wrote:
> I've lost track. Does D have undefined behavior at all? (e.g. outside @safe code). In any case, I imagine that the D spec (ha!) would dictate that the following code should print 0x10001, although that's not quite clear.

I would greatly appreciate it if the outcome of this discussion led to the addition of this case to the compiler testsuite.

-Johan
Apr 27 2020
On Monday, 27 April 2020 at 14:07:49 UTC, Luís Marques wrote:
> I've lost track. Does D have undefined behavior at all? (e.g. outside @safe code). In any case, I imagine that the D spec (ha!) would dictate that the following code should print 0x10001, although that's not quite clear. GDC (optimized) prints 0x1. LDC2 prints 0x10001, but I suspect only because LLVM doesn't happen to do the same optimization, not because this is handled properly.
>
> ```
> import std.stdio;
>
> @safe:
>
> int foo(int* a, int* b)
> {
>     *a = 1;
>     *b = 1;
>     return *a;
> }
>
> void main()
> {
>     ubyte[6] buffer;
>     auto a = cast(int[]) (buffer[0 .. 4]);
>     auto b = cast(int[]) (buffer[2 .. 6]);
>     writefln("0x%X", foo(&a[0], &b[0]));
> }
> ```

A case where LDC goes further than GDC, wrt. infinite loops (see https://github.com/ldc-developers/ldc/issues/2827):

```
void foo() @safe
{
    int x;
    while (x != x + 1) ++x;
}

int bar(int p) @safe
{
    if (p > 1)
        foo();
    return p;
}
```

With -O, LDC optimizes bar() to simply returning p. See https://godbolt.org/z/84wZ9k.
Apr 27 2020
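A brief sketch of why the loop in `foo` above never terminates on its own. It assumes D's wrapping semantics for integer overflow, under which `x + 1` differs from `x` even at `int.max`. `nextDiffers` is a hypothetical name for the loop condition.

```d
// Hypothetical helper: the loop condition from foo(), taken in isolation.
bool nextDiffers(int x)
{
    return x != x + 1;
}

void main()
{
    int x = int.max;
    int y = x + 1; // wraps around instead of trapping
    assert(y == int.min);

    // The condition holds at both extremes and in between,
    // so the loop has no exit and no observable side effects.
    assert(nextDiffers(0));
    assert(nextDiffers(int.max));
    assert(nextDiffers(int.min));
}
```

So the question raised by the LDC issue is not whether the loop can end, but whether a compiler may delete a side-effect-free loop that never ends.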
On Monday, 27 April 2020 at 14:07:49 UTC, Luís Marques wrote:
> I've lost track. Does D have undefined behavior at all? (e.g. outside @safe code).

In @system and @trusted code, there is plenty of undefined behavior. Just search the spec for it: https://www.google.com/search?q=%22undefined%20behavior%22%20site:dlang.org/spec

By definition, @safe is supposed to be free of undefined behavior [1]. But there are many holes in the system, so that's not really true at the moment. Maybe we get there some day. Even then, all guarantees are off when you feed bad data to @safe code from @system code (e.g. an invalid pointer).

> In any case, I imagine that the D spec (ha!) would dictate that the following code should print 0x10001, although that's not quite clear. GDC (optimized) prints 0x1. LDC2 prints 0x10001, but I suspect only because LLVM doesn't happen to do the same optimization, not because this is handled properly.
>
> ```
> import std.stdio;
>
> @safe:
>
> int foo(int* a, int* b)
> {
>     *a = 1;
>     *b = 1;
>     return *a;
> }
>
> void main()
> {
>     ubyte[6] buffer;
>     auto a = cast(int[]) (buffer[0 .. 4]);
>     auto b = cast(int[]) (buffer[2 .. 6]);
>     writefln("0x%X", foo(&a[0], &b[0]));
> }
> ```

As far as I'm aware, an implementation can only assume alignment on GC pointers [2]. It cannot assume that two `int*`s don't point to overlapping locations. So GDC shouldn't do that optimization.

Then again, Walter wants to disallow passing multiple mutable references to the same memory with DIP 1021 [3]. If he goes through with that, implementations should reject that code.

[1] https://dlang.org/spec/function.html#function-safety
[2] https://dlang.org/spec/attribute.html#align
[3] https://github.com/dlang/DIPs/blob/master/DIPs/accepted/DIP1021.md
Apr 27 2020
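On the alignment point, a sketch that checks the two addresses used in the original program. `isAlignedFor` is a hypothetical helper, and the final assertion assumes `int.alignof` is 4, as on common platforms: two addresses that are 2 bytes apart cannot both be multiples of 4.

```d
// Hypothetical helper: is p suitably aligned for a value of type T?
bool isAlignedFor(T)(const(void)* p)
{
    return (cast(size_t) p) % T.alignof == 0;
}

void main()
{
    ubyte[6] buffer;
    const p0 = &buffer[0];
    const p2 = &buffer[2];

    // Assumes int.alignof == 4: at most one of the two can be int-aligned.
    assert(!(isAlignedFor!int(p0) && isAlignedFor!int(p2)));

    // Any address is fine for a ubyte, whose alignment is 1.
    assert(isAlignedFor!ubyte(p2));
}
```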
On 4/27/2020 12:04 PM, ag0aep6g wrote:
> Then again, Walter wants to disallow passing multiple mutable references to the same memory with DIP 1021 [3]. If he goes through with that, implementations should reject that code.

Yes, but there will still be a way to trick it. @live will stop that trickery, unless bad pointers are passed to the @live function.
Apr 27 2020