digitalmars.D - trusted assumptions about safe code
- ag0aep6g (68/69) May 25 2020 That statement fits something that occurred to me when trying to lock
- Paul Backus (12/43) May 25 2020 My reading of the spec is that f violates this requirement of
- ag0aep6g (16/31) May 25 2020 Hm. The meaning I intended with that is that it's only invalid when the
- Paul Backus (8/20) May 25 2020 There's no part of the program outside of f's body that has
- Dukc (13/59) May 25 2020 No, `f` should be just dead `@system`. If you call f with a
- ag0aep6g (17/25) May 25 2020 Right. I can fix that by changing
- Dukc (20/35) May 26 2020 Looks good to me. Well, not sure about that mutable aliasing
- Steven Schveighoffer (23/36) May 26 2020 Yes, I was going to point that out too. But it doesn't affect the main
- ag0aep6g (18/40) May 26 2020 I think you've got a good point, but the example isn't so great. memcpy
- Paul Backus (11/22) May 26 2020 All that's necessary is to have the @trusted function check that
- Paolo Invernizzi (8/31) May 26 2020 Exactly: the whole point in trusted it’s that it MUST check the
- ag0aep6g (37/62) May 26 2020 I.e., "the exact behavior of @safe functions (including their return
- Steven Schveighoffer (21/90) May 27 2020 I think this is not the way to view it. @safe code still should do what
- Stefan Koch (14/21) May 27 2020 's relying on is actually true:
- Steven Schveighoffer (21/46) May 27 2020 Again, I also think this is valid @trusted code:
- ag0aep6g (4/33) May 27 2020 Just to be clear: I agree with you. Requiring @trusted to be so
- Paul Backus (21/25) May 27 2020 I think you are focused in so closely on this particular example
- ag0aep6g (21/31) May 27 2020 Agreed. The question is: What else can @trusted code rely on, beyond the...
- Paul Backus (29/59) May 27 2020 If the specific functions it's calling make any additional
- ag0aep6g (50/69) May 27 2020 I think we're on the same page for the most part, but not here.
- ag0aep6g (5/7) May 27 2020 Should read: Let's say that the author of the @safe **function** is as
- Arafel (18/22) May 27 2020 There's one big difference: in @safe there are bound checks by default,
- Paul Backus (43/47) May 27 2020 It depends entirely on how far you are willing to extend your
- ag0aep6g (11/20) May 27 2020 So:
- Paul Backus (3/10) May 27 2020 Yes, that's correct.
- Arine (16/39) May 25 2020 You are passing a pointer into a function that takes a mutable
- Arine (2/12) May 25 2020 Oops, even that would still need a size check as well.
- ag0aep6g (6/12) May 25 2020 So would you say that the function should not have been @trusted in the
- Timon Gehr (3/23) May 26 2020 I don't think so. @trusted code can't rely on @safe code behaving a
- Walter Bright (3/5) May 26 2020 I agree. The trusted code here is not passing safe arguments to g(), but...
- ag0aep6g (8/14) May 26 2020 Nice. Timon and Walter agree on something related to safety. That must
- Paul Backus (6/17) May 26 2020 As far as I can tell that's already implied by the first sentence
- Timon Gehr (3/12) May 26 2020 (We agree on many things. It just does not seem that way because I
- Walter Bright (2/15) May 27 2020 It is indeed nice when we agree. I appreciate it.
- Nathan S. (16/19) May 27 2020 I'd say it's sound to rely on any postconditions in a function's
- Petar Kirov [ZombineDev] (12/34) May 26 2020 As mentioned by others, it is incorrect to label `f` with
- ag0aep6g (7/24) May 26 2020 [...]
Deep in the discussion thread for DIP 1028 there is this little remark by Zoadian [1]: "you can break previously verified @trusted code by just writing @safe code today." That statement fits something that occurred to me when trying to lock down the definition of "safe interfaces" [2]. Consider this little program that prints the address and first character of a string in a convoluted way:

import std.stdio;

char f(string s) @trusted
{
    immutable(char)* c = s.ptr;
    writeln(g(* cast(size_t*) &c));
    return *c;
}

size_t g(ref size_t s) @safe { return s; }

void main() @safe { writeln(f("foo")); }

As the spec stands, I believe it allows f to be @trusted. The function doesn't exhibit undefined behavior, and it doesn't leak any unsafe values or unsafe aliasing. So it has a safe interface and can be @trusted. With f correctly @trusted and everything else being @safe, that code is good to go safety-wise.

Now imagine that some time passes. Code gets added and shuffled around. Maybe g ends up in another module, far away from f. And as it happens, someone adds a line to g:

size_t g(ref size_t s) @safe
{
    s = 0xDEADBEEF; /* ! */
    return s;
}

g is still perfectly @safe, and there's not even any @trusted code in the vicinity to consider. So review comes to the conclusion that the change is fine safety-wise. But g violates an assumption in f. And with that broken assumption, memory safety comes crumbling down, "by just writing @safe code".

I think that's a problem. Ideally, it should not be possible to cause memory corruption by adding a line to @safe code. But I know that I'm more nitpicky than many when it comes to that rule and @safe/@trusted in general.

Anyway, one way to address this would be disallowing f's call to g. I.e., add a sentence like this to the spec: "Undefined behavior: Calling a @safe function or a @trusted function with unsafe values or unsafe aliasing has undefined behavior." The aliasing of `immutable(char)*` and `size_t` is unsafe, so the call becomes invalid.
That means f can no longer be @trusted, as it's now considered to have undefined behavior. The downside is that functions may become invalid even when they don't make bad assumptions. For example, this f2 would (arguably?) also be invalid, because the unsafe aliasing is still there even though it's not being used:

char f2(string s) @trusted
{
    immutable(char)* c = s.ptr;
    writeln(g(* cast(size_t*) &c));
    return 'x'; /* ! */
}

Or maybe we say that c gets invalidated by the call to g and using it afterwards triggers undefined behavior. Then f2 is ok. Would still have to disallow calls that pass both ends of an unsafe aliasing to a @safe/@trusted function, though.

Thoughts? Am I overthinking it as usual when it comes to @trusted?

[1] https://forum.dlang.org/post/iwddwsdpsntajyblnttk@forum.dlang.org
[2] https://dlang.org/spec/function.html#safe-interfaces
May 25 2020
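[Editor's note: a hedged sketch, not part of the original thread. One way f could avoid the problem entirely is to hand g a copy of the reinterpreted value instead of a reference that aliases the immutable pointer. Then no future change to the @safe callee can clobber c. The name `fSafer` is invented for illustration.]

```d
import std.stdio;

size_t g(ref size_t s) @safe { return s; }

// Variant of f that does not expose the unsafe aliasing:
// g receives a copy, so even a future "s = 0xDEADBEEF;" in g
// cannot invalidate the pointer that f dereferences afterwards.
char fSafer(string s) @trusted
{
    immutable(char)* c = &s[0];   // bounds-checked, unlike s.ptr
    size_t copy = cast(size_t) c; // reinterpret a *copy* of the pointer
    writeln(g(copy));             // g may scribble on `copy` freely
    return *c;                    // c itself was never mutably aliased
}

void main() @safe { writeln(fSafer("foo")); }
```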
On Monday, 25 May 2020 at 23:04:49 UTC, ag0aep6g wrote:
> As the spec stands, I believe it allows f to be @trusted. The function doesn't exhibit undefined behavior, and it doesn't leak any unsafe values or unsafe aliasing. So it has a safe interface and can be @trusted. [...]

My reading of the spec is that f violates this requirement of safe interfaces:

> 3. it cannot introduce unsafe aliasing that is accessible from other parts of the program.

"Other parts of the program," taken at face value, should include both f's callers (direct and indirect) as well as any functions it calls (directly or indirectly). Since f introduces unsafe aliasing, and makes that aliasing visible to g, it should not be marked as @trusted. I suppose it depends on exactly what is meant by "accessible"--if it refers to the aliased memory location, then my interpretation follows, but if it refers to the pointers, there's an argument to be made that f is fine, since only one of the pointers escapes.
May 25 2020
On 26.05.20 01:25, Paul Backus wrote:
> My reading of the spec is that f violates this requirement of safe interfaces: [...] Since f introduces unsafe aliasing, and makes that aliasing visible to g, it should not be marked as @trusted. [...]

Hm. The meaning I intended with that is that it's only invalid when the memory location becomes accessible via both types elsewhere. And g only has access via one type. Would you say that this next function is also leaking unsafe aliasing?

immutable(int[]) f() @trusted
{
    int[] a = [1, 2, 3];
    a[] += 10;
    return cast(immutable) a;
}

Because that one is definitely supposed to be allowed. And I must also say that I didn't really consider called functions to be "other parts of the program". But reading it that way makes sense. Then I suppose calling a @safe function with both ends of an unsafe aliasing can be seen as already not allowed.
May 25 2020
On Tuesday, 26 May 2020 at 00:01:37 UTC, ag0aep6g wrote:
> Would you say that this next function is also leaking unsafe aliasing? [...] Because that one is definitely supposed to be allowed.

There's no part of the program outside of f's body that has access to either reference while both are alive, so I'd say that's fine. In Rust-ish terms, the ownership of the array is being moved from a to the return value, whereas in the previous example, both f and g were attempting to mutably borrow the same data at the same time.
May 25 2020
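[Editor's note: for reference, Phobos codifies this ownership-transfer pattern as std.exception.assumeUnique, which performs the cast to immutable and clears the mutable reference, so no mutable alias of the data survives. A sketch under that assumption:]

```d
import std.exception : assumeUnique;

immutable(int[]) makeArray() @trusted
{
    int[] a = [1, 2, 3];
    a[] += 10;
    // assumeUnique casts the array to immutable *and* nulls out `a`,
    // so the mutable reference cannot be used after the transfer.
    return assumeUnique(a);
}
```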
On Monday, 25 May 2020 at 23:04:49 UTC, ag0aep6g wrote:
> As the spec stands, I believe it allows f to be @trusted. [...]

No, `f` should be just dead `@system`. If you call f with a string which does not point to null, but is empty, boom! But for the rest of the reply, I assume you meant a function that could really be `@trusted` with all parameters.

> Now imagine that some time passes. Code gets added and shuffled around. [...] And as it happens, someone adds a line to g: s = 0xDEADBEEF; /* ! */

Credible scenario.

> I think that's a problem. Ideally, it should not be possible to cause memory corruption by adding a line to @safe code. But I know that I'm more nitpicky than many when it comes to that rule and @safe/@trusted in general.

I don't think of it as a problem for `@safe`.
`@safe` is just a command to turn the memory checking tool on, not a code certification (although using @safe where possible would probably be required for certifying). Combating the scenarios you mentioned means that the `@safe` function called must be at least as certified as the `@trusted` caller, but that is no reason to forbid using the memory checking tool the language offers.
May 25 2020
On 26.05.20 01:47, Dukc wrote:
> No, `f` should be just dead `@system`. If you call f with a string which does not point to null, but is empty, boom!

Right. I can fix that by changing

    immutable(char)* c = s.ptr;

to

    immutable(char)* c = &s[0];

correct?

[...]

> I don't think of it as a problem for `@safe`. `@safe` is just a command to turn the memory checking tool on, not a code certification (although using @safe where possible would probably be required for certifying).

And you don't think it's possible and worthwhile to strengthen @safe so far that it becomes a code certification?

> Combating the scenarios you mentioned means that the `@safe` function called must be at least as certified as the `@trusted` caller, but that is no reason to forbid using the memory checking tool the language offers.

In my scenario, the @safe function is (supposed to be) perfectly safe before and after the catastrophic change. You can try certifying it beyond what the compiler does, but there's really nothing wrong with the function. The thing is that editing the @safe function affects the status of the @trusted one. And I think it should be possible to tweak the rules so that a correctly verified @trusted function cannot become invalid when something changes in a called @safe function.
May 25 2020
On Tuesday, 26 May 2020 at 00:22:05 UTC, ag0aep6g wrote:
> Right. I can fix that by changing immutable(char)* c = s.ptr; to immutable(char)* c = &s[0]; correct?

Looks good to me. Well, not sure about that mutable aliasing thing others have mentioned, maybe it's good, maybe not.

> And you don't think it's possible and worthwhile to strengthen @safe so far that it becomes a code certification?

No, even if @safe would in all instances be 100% idiot-proof memory safe, I think it would work badly as a code certificate. Why? It would mean that if you have an already `@safe` function that would be fit for use in `@system` code, you would have to either duplicate it or disable its memory checks, because it could not be certified for `@system` code. The closest thing to a certificate a code can have AFAIK is an automatic test suite. `@safe` works well in conjunction with such a test suite: if a function is both `@safe` and `pure`, you know that regular `unittest`s are enough to certify it in the general case. With `@system` or `@trusted`, Valgrinding might be required. This also applies to your example: Had you tested that `g` does not mutate the pointed value (or set the parameter as `const`), the bug would have been caught. On the other hand, no amount of unit testing can alone certify `f` to a high standard, as it could still silently be corrupting the memory despite outputting and returning correctly.
May 26 2020
On 5/25/20 8:22 PM, ag0aep6g wrote:
> Right. I can fix that by changing immutable(char)* c = s.ptr; to immutable(char)* c = &s[0]; correct?

Yes, I was going to point that out too. But it doesn't affect the main point. The main problem is that f does not provide a safe value to g. @safe code can always mess up if handed garbage. One might say that f should not be @trusted. But that just means NO functions could be @trusted. If you cannot trust the semantic expectations of the functions you are calling to hold true, then you cannot write @trusted code ever. I mean, someone could do this inside memcpy:

size_t memcpy(void *dst, void *src, size_t length)
{
    *(size_t *)(dst + length + 10) = 0xdeadbeef;
    ... // normal implementation
}

And violate safety that way. So does that mean you can never use memcpy inside @trusted functions *just in case* it did something like this?

I get what you are saying, that it would be nice if one can just write @safe code and never worry that you might violate any memory rules. But in reality, you still have to implement the function as designed, and unless you do that, you are not going to be memory safe, period. The only call stacks that would be "safe" were ones that were safe all the way down. And then @safe becomes essentially useless.

-Steve
May 26 2020
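[Editor's note: a sketch of the flip side of this argument, assuming the invented name `copyInto`. A typical @trusted wrapper establishes the preconditions it is responsible for itself, and then trusts memcpy to do exactly what the C standard documents:]

```d
import core.stdc.string : memcpy;

// The wrapper verifies the bounds it controls (destination large enough),
// then relies on memcpy's documented behavior for the rest.
void copyInto(int[] dst, const(int)[] src) @trusted
{
    assert(dst.length >= src.length, "destination too small");
    if (src.length > 0)
        memcpy(dst.ptr, src.ptr, src.length * int.sizeof);
}
```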
On 26.05.20 17:41, Steven Schveighoffer wrote:
> The main problem is that f does not provide a safe value to g. [...] So does that mean you can never use memcpy inside @trusted functions *just in case* it did something like this? [...]

I think you've got a good point, but the example isn't so great. memcpy is @system and can only be @system, so of course you can break safety by changing its behavior.

But yeah, the @trusted function might rely on the @safe function returning 42. And when it suddenly returns 43, all hell breaks loose. There doesn't need to be any monkey business with unsafe aliasing or such. Just a @safe function returning an unexpected value. I suppose the only ways to catch that kind of thing would be to forbid calling @safe (and other @trusted?) functions from @trusted (and @system?) code, or to mandate that the exact behavior of @safe functions (including their return values) cannot be relied upon for safety. Those would be really, really tough sells. So unless we do something very drastic, any visible change in the behavior of a @safe function can possibly lead to memory corruption.
And strictly speaking, any @trusted and @system code that calls it must be re-evaluated for safety. Seems kinda obvious now. But I don't think I really realized this before.
May 26 2020
On Tuesday, 26 May 2020 at 22:52:09 UTC, ag0aep6g wrote:
> But yeah, the @trusted function might rely on the @safe function returning 42. And when it suddenly returns 43, all hell breaks loose. [...] Those would be really, really tough sells.

All that's necessary is to have the @trusted function check that the assumption it's relying on is actually true:

@safe int foo() { ... }

@trusted void bar()
{
    int fooResult = foo();
    assert(fooResult == 42);
    // proceed accordingly
}

If the assumption is violated, the program will crash at runtime rather than potentially corrupt memory.
May 26 2020
On Wednesday, 27 May 2020 at 00:50:26 UTC, Paul Backus wrote:
> All that's necessary is to have the @trusted function check that the assumption it's relying on is actually true: [...] If the assumption is violated, the program will crash at runtime rather than potentially corrupt memory.

Exactly: the whole point in @trusted is that it MUST check the parameters it's using to call @system code, and an assert is just fine in this case, as it's checking invariants in the program logic. The only way for a @safe program to corrupt memory should be inside the external @system code binary: a bug in a used @system library, a misunderstanding of the API documentation, and so on.
May 26 2020
On 27.05.20 02:50, Paul Backus wrote:
> All that's necessary is to have the @trusted function check that the assumption it's relying on is actually true: [...]

I.e., "the exact behavior of @safe functions (including their return values) cannot be relied upon for safety". I think it's going to be hard selling that to users. Especially because there is no such requirement when calling @system functions. Say you have this code:

void f() @trusted
{
    import core.stdc.string: strlen;
    import std.stdio: writeln;
    char[5] buf = "foo\0\0";
    char last_char = buf.ptr[strlen(buf.ptr) - 1];
    writeln(last_char);
}

That's ok, right? `f` doesn't have to verify that `strlen` returns a value that is in bounds. It's allowed to assume that `strlen` counts until the first null byte.

Now you realize that you can calculate the length of the string more safely than C's strlen does, so you change the code to:

size_t my_strlen(ref char[5] buf) @safe
{
    foreach (i; 0 .. buf.length)
        if (buf[i] == '\0')
            return i;
    return buf.length;
}

void f() @trusted
{
    import std.stdio: writeln;
    char[5] buf = "foo\0\0";
    char last_char = buf.ptr[my_strlen(buf) - 1];
    writeln(last_char);
}

Nice. Now you're safe even if you forget to put a null-terminator into the buffer. But oh no, `my_strlen` is @safe. That means `f` cannot assume that the returned value is in bounds. It now has to verify that. Somehow, it's harder to call the @safe function correctly than the @system one. What user is going to remember those subtleties?
May 26 2020
On 5/27/20 2:36 AM, ag0aep6g wrote:
> I.e., "the exact behavior of @safe functions (including their return values) cannot be relied upon for safety". I think it's going to be hard selling that to users. [...] Somehow, it's harder to call the @safe function correctly than the @system one. What user is going to remember those subtleties?

I think this is not the way to view it. @safe code still should do what it's supposed to do. It's not any harder *or any easier* to call @safe code. You are going to have to trust that whatever functions you call (@trusted, @safe, or @system) are following their spec. @safe has additional restrictions, so you can assume more. But the semantic meaning of things cannot be checked by the compiler, and those are the interesting things that cause bugs.

A more realistic example, and one that gets me all the time, is something like indexOf. Does it return input.length or -1 if the item isn't found? Using the wrong expectation can lead to bad consequences. So it's important, no matter the safety of the indexOf function, to know what it's supposed to do, and base your review of @trusted code on that knowledge.

Of course, with something like that, one could be extra cautious, and assert the value is within bounds if it's not the sentinel. You could be even more cautious and check that the index found has the sought-after element. And that's probably the right defensive way to do this. But who's going to do that? Most people will work under the assumption that indexOf does what it says it's going to do, and not worry about unittesting it on every call.

-Steve
May 27 2020
On Wednesday, 27 May 2020 at 12:48:46 UTC, Steven Schveighoffer wrote:
> I think this is not the way to view it. @safe code still should do what it's supposed to do. [...]

> [...] check that the assumption it's relying on is actually true:

{
    const i = cast(ssize_t) indexof(x, E);
    if (i < 0 || i > x.dim)
    {
        // no luck.
    }
    else
    {
        // index is in bounds so use it.
    }
}
May 27 2020
On 5/27/20 8:52 AM, Stefan Koch wrote:
> { const i = cast(ssize_t) indexof(x, E); if (i < 0 || i > x.dim) { // no luck. } else { // index is in bounds so use it. } }

Again, I also think this is valid @trusted code:

const i = indexof(x, E);
if (i != -1) {
    // use it
}

or

if (i != x.length) {
    // use it
}

depending on the spec for that function. I don't think it's 100% necessary to be defensive on all semantics, assuming they are not implemented properly. People just aren't going to write what you wrote in the name of @safe. They *could* write:

size_t i = indexof(x, E);
if (i < x.length) { }

But most people aren't going to do that either.

-Steve
May 27 2020
On 27.05.20 14:48, Steven Schveighoffer wrote:
> I think this is not the way to view it. @safe code still should do what it's supposed to do. It's not any harder *or any easier* to call @safe code. [...] Most people will work under the assumption that indexOf does what it says it's going to do, and not worry about unittesting it on every call.

Just to be clear: I agree with you. Requiring @trusted to be so super-defensive doesn't seem viable.
May 27 2020
On Wednesday, 27 May 2020 at 06:36:14 UTC, ag0aep6g wrote:
> I.e., "the exact behavior of @safe functions (including their return values) cannot be relied upon for safety". I think it's going to be hard selling that to users. Especially because there is no such requirement when calling @system functions.

I think you are focused in so closely on this particular example that you are losing track of the bigger picture. Knowing that a function is @safe guarantees you that (modulo errors in @trusted code) it will not violate memory safety when called with safe arguments. That is the *only* thing @safe guarantees.

If you want to guarantee anything else about what the function does, there are many tools you can use to do so. You can use in and out contracts to establish preconditions and postconditions. You can use assert or enforce in calling code to check that the arguments and/or return value meet some criteria. You can have it accept or return types with invariants.

In the case of strlen, you can rely on the wording of the C standard, which states that "The strlen function returns the number of characters that precede the terminating null character", to know that if you call it with a null-terminated string (its precondition), the result will be in-bounds (its postcondition). There's nothing weird or unusual or "a hard sell" about this. Programming is always like this, in any language, memory-safe or not.
May 27 2020
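[Editor's note: D's contract syntax can make the strlen-style postcondition explicit. A sketch reusing `my_strlen` from the post above; the short `out` contract form assumes a reasonably recent compiler:]

```d
// my_strlen with its postcondition spelled out as an `out` contract.
// Callers, including @trusted ones, get a runtime-checked guarantee
// (in builds with contracts enabled) that the result is in bounds.
size_t my_strlen(ref char[5] buf) @safe
out (result; result <= buf.length)
{
    foreach (i; 0 .. buf.length)
        if (buf[i] == '\0')
            return i;
    return buf.length;
}
```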
On 27.05.20 15:43, Paul Backus wrote:
> I think you are focused in so closely on this particular example that you are losing track of the bigger picture.

That's entirely possible.

> Knowing that a function is @safe guarantees you that (modulo errors in @trusted code) it will not violate memory safety when called with safe arguments. That is the *only* thing @safe guarantees.

Agreed. The question is: What else can @trusted code rely on, beyond the guarantees of @safe?

[...]

> In the case of strlen, you can rely on the wording of the C standard which states that "The strlen function returns the number of characters that precede the terminating null character" to know that if you call it with a null-terminated string (its precondition), the result will be in-bounds (its postcondition).

Ok. A @trusted function can rely on the C standard. I take it that a @trusted function can also rely on other documentation of @system functions. So far I'm with you.

From your previous post I figured that this is your position towards calling @safe from @trusted: @trusted code is not allowed to rely on the documented return value of a @safe function. The @trusted function must instead verify that the actually returned value is safe to use. I'm not sure if I'm representing you correctly, but that position makes sense to me. At the same time, it doesn't seem feasible, because I don't see how we're going to get users to adhere to that.

The way I see it, we can either make the rules for @trusted so arcane that practically no one will be able to follow them, or we accept that a change in @safe code can lead to memory corruption. At the moment I'm leaning towards the latter.
May 27 2020
On Wednesday, 27 May 2020 at 15:14:17 UTC, ag0aep6g wrote:On 27.05.20 15:43, Paul Backus wrote:If the specific functions it's calling make any additional guarantees above and beyond what @safe requires, it can rely on those as well. The question, then, is what constitutes a "guarantee."I think you are focused in so closely on this particular example that you are losing track of the bigger picture.That's entirely possible.Knowing that a function is @safe guarantees you that (modulo errors in @trusted code) it will not violate memory safety when called with safe arguments. That is the *only* thing @safe guarantees.Agreed. The question is: What else can @trusted code rely on, beyond the guarantees of @safe?[...]It can, to the extent that you trust the code to conform to the documentation. Personally, I am willing to trust libc to conform to the C standard, which means that a buggy or non-conforming libc will be able to cause memory corruption in @safe programs that I write. If you want to take a hard-line stance, you should not trust documentation at all. Note that trusting the D compiler to conform to the D language standard is also, in some sense, "trusting documentation." A bug in the compiler can always introduce memory corruption to @safe code. So "never trust documentation under any circumstances" is not really a tenable position in practice.In the case of strlen, you can rely on the wording of the C standard which states that "The strlen function returns the number of characters that precede the terminating null character" to know that if you call it with a null-terminated string (its precondition), the result will be in-bounds (its postcondition).Ok. An @trusted function can rely on the C standard. I take it that an @trusted function can also rely on other documentation of @system functions. 
So far I'm with you.From your previous post I figured that this is your position towards calling @safe from @trusted: @trusted code is not allowed to rely on the documented return value of an @safe function. The @trusted function must instead verify that the actually returned value is safe to use.This is my position on *any* function calling *any other* function. Even in 100% @system code, I must (for example) check the return value of malloc if I want to rely on it not being null.I'm not sure if I'm representing you correctly, but that position makes sense to me. At the same time, it doesn't seem feasible, because I don't see how we're going to get users to adhere to that.Why does it not seem feasible? Checking return values is defensive programming 101. People already do this sort of thing all the time. I agree that writing correct @trusted code is difficult, and that people are going to make mistakes--just like writing correct C code is difficult, and people make mistakes trying to. I don't think there's anything we can do in the language itself to fix that, other than making it easy to make the @trusted parts of the code as small as possible.
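The malloc example above, rendered in D (a sketch; `makeInts` is a hypothetical name): even in fully @trusted code, the return value is checked rather than assumed.

```
import core.stdc.stdlib : malloc;

int[] makeInts(size_t n) @trusted
{
    auto p = cast(int*) malloc(n * int.sizeof);
    if (p is null)       // never assume malloc succeeds
        return null;
    return p[0 .. n];    // the slice carries its length, so later
                         // @safe accesses are bounds-checked
}
```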
May 27 2020
On 27.05.20 18:04, Paul Backus wrote:On Wednesday, 27 May 2020 at 15:14:17 UTC, ag0aep6g wrote:[...]I think we're on the same page for the most part, but not here. I'm pretty sure that you agree with this: When we call C's strlen like so: `strlen("foo\0".ptr)`, we can assume that the result will be 3, because strlen is documented to behave that way. I'm not sure where you stand on this: If an @safe function is documented to return 42, can we rely on that in the same way we can rely on strlen's documented behavior? Let's say that the author of the @safe function is as trustworthy as the C standard library. If we can assume 42, a bug in @safe code can lead to memory corruption by breaking an assumption that is in @trusted/@system code. Just like a bug in @system code can have that same effect. This might be obviously true to you. But I hadn't really made that connection until recently. So if you agree with this, then I think we're on the same page now. And we just accept that a mistake in @safe code can possibly kill safety. On the other side, if we cannot assume that 42 is being returned, but we need to check that 42 was indeed returned, then the weird scenario happens where an @safe `my_strlen` becomes more cumbersome to use than C's @system `strlen`. I don't think any existing @trusted code was written with that in mind. If I'm not making any sense, maybe we can compare our answers to the question whether the following snippets (copied from earlier) can really be @trusted. I think this one is fine:
----
void f() @trusted
{
    import core.stdc.string: strlen;
    import std.stdio: writeln;
    char[5] buf = "foo\0\0";
    char last_char = buf.ptr[strlen(buf.ptr) - 1];
    writeln(last_char);
}
----
And I think this one is fine, too:
----
size_t my_strlen(ref char[5] buf) @safe
{
    foreach (i; 0 .. buf.length)
        if (buf[i] == '\0')
            return i;
    return buf.length;
}

void f() @trusted
{
    import std.stdio: writeln;
    char[5] buf = "foo\0\0";
    char last_char = buf.ptr[my_strlen(buf) - 1];
    writeln(last_char);
}
----From your previous post I figured that this is your position towards calling @safe from @trusted: @trusted code is not allowed to rely on the documented return value of an @safe function. The @trusted function must instead verify that the actually returned value is safe to use.This is my position on *any* function calling *any other* function. Even in 100% @system code, I must (for example) check the return value of malloc if I want to rely on it not being null.I'm not sure if I'm representing you correctly, but that position makes sense to me. At the same time, it doesn't seem feasible, because I don't see how we're going to get users to adhere to that.Why does it not seem feasible? Checking return values is defensive programming 101. People already do this sort of thing all the time.
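For illustration, here is what the "cumbersome" defensive version of the second f would look like (my sketch, not from the thread): the @trusted caller re-validates my_strlen's result instead of relying on its documented behavior.

```
void f_defensive() @trusted
{
    import std.stdio : writeln;

    char[5] buf = "foo\0\0";
    size_t n = my_strlen(buf);      // @safe, but its result is not trusted
    if (n == 0 || n > buf.length)   // re-check before the unsafe indexing
        return;
    char last_char = buf.ptr[n - 1];
    writeln(last_char);
}
```

Under this discipline, the @safe callee's return value is treated exactly like a @system callee's: as unverified input.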
May 27 2020
On 27.05.20 19:02, ag0aep6g wrote:Let's say that the author of the @safe is as trustworthy as the C standard library.Should read: Let's say that the author of the @safe **function** is as trustworthy as the C standard library.On the other side, if we cannot assume that 42 being returned,Should read: On the other side, if we cannot assume that 42 **is** being returned,
May 27 2020
On 27/5/20 19:02, ag0aep6g wrote:I'm pretty sure that you agree with this: When we call C's strlen like so: `strlen("foo\0".ptr)`, we can assume that the result will be 3, because strlen is documented to behave that way.There's one big difference: in @safe there are bounds checks by default, so even if you as the programmer assume that `strlen` will return the right value, the compiler is still inserting checks at every access:
```
@safe void safeFunc()
{
    string foo = "foo\0";
    auto len = strlen(foo.ptr);
    auto bar = foo[0 .. len - 1]; // It'll be checked even in -release mode
}

@trusted void trustedFunc()
{
    string foo = "foo\0";
    auto len = strlen(foo.ptr);
    auto bar = foo[0 .. len - 1];
}
```
So it's not that you don't need checks in @safe code, it's that they are added automatically.
May 27 2020
On Wednesday, 27 May 2020 at 17:02:15 UTC, ag0aep6g wrote:I'm not sure where you stand on this: If an @safe function is documented to return 42, can we rely on that in the same way we can rely on strlen's documented behavior? Let's say that the author of the @safe function is as trustworthy as the C standard library.It depends entirely on how far you are willing to extend your trust. If you're willing to trust the C library to conform to the C standard, and you stipulate that this other function and its documentation are equally trustworthy, then yes, you can trust it to return 42. I think the key insight here is that the memory-safety property established by D's @safe is *conditional*, not absolute. In other words, the D compiler (modulo bugs) establishes the following logical implication:

    my @trusted code is memory-safe -> my @safe code is memory-safe

Proving the left-hand side is left as an exercise to the programmer. In practice, proving your @trusted code correct without making *any* assumptions about other code is too difficult, since at the very least you have to account for dependencies like the C library and the operating system. At some point, you have to trust that the code you're calling does what it says it does. So what you end up doing is establishing another logical implication:

    my dependencies behave as-documented -> my @trusted code is memory-safe

By the rules of logical inference, it follows that

    my dependencies behave as-documented -> my @safe code is correct

This is far from an absolute safety guarantee, but it can be good enough, as long as you stick to dependencies that are well-documented and well-tested. Of course, if you are willing to put in enough effort, you can tighten up the condition on the left as much as you want. For example, maybe you decide to only trust the C library and the operating system, and audit every other dependency yourself. Then you end up with

    my libc and OS behave as-documented -> my @safe code is correct

This is more or less the broadest assurance of safety you can achieve if you are trying to write "portable" code, and I expect the vast majority of programmers would be satisfied with it. In principle, if you are doing bare-metal development, you could get all the way to

    my hardware behaves as-documented -> my @safe code is correct

...but that's unlikely to be worth the effort outside of very specialized and safety-critical fields.
May 27 2020
On 27.05.20 19:30, Paul Backus wrote:In practice, proving your @trusted code correct without making *any* assumptions about other code is too difficult, since at the very least you have to account for dependencies like the C library and the operating system. At some point, you have to trust that the code you're calling does what it says it does. So what you end up doing is establishing another logical implication: my dependencies behave as-documented -> my @trusted code is memory-safeSo: my dependencies do not behave as-documented -> my @trusted code may not be memory-safe right? Let's say an @safe function `my_strlen` is part of my dependencies. Then: `my_strlen` does not behave as-documented -> my @trusted code may not be memory-safe Or in other words: A mistake in an @safe function can lead to memory corruption. Which is what Steven is saying. And I agree.
May 27 2020
On Wednesday, 27 May 2020 at 18:46:59 UTC, ag0aep6g wrote:So: my dependencies do not behave as-documented -> my @trusted code may not be memory-safe right?[...]Or in other words: A mistake in an @safe function can lead to memory corruption. Which is what Steven is saying. And I agree.Yes, that's correct.
May 27 2020
On Monday, 25 May 2020 at 23:04:49 UTC, ag0aep6g wrote:Deep in the discussion thread for DIP 1028 there is this little remark by Zoadian [1]:You are passing a pointer into a function that takes a mutable size_t by reference and then use the pointer afterwards. You get what's coming to you if you think that's suitable for @trusted. This is a good example that care must still be taken in @trusted. You are doing something dangerous, expect to be burned by it.

char f(string s) @trusted
{
    {
        immutable(char)* c = s.ptr;
        writeln(g(* cast(size_t*) &c));
        // var c invalidated by above function, don't use after this line
    }
    return s[0];
}

you can break previously verified @trusted code by just writing @safe code today.That statement fits something that occurred to me when trying to lock down the definition of "safe interfaces" [2]. Consider this little program that prints the address and first character of a string in a convoluted way:

import std.stdio;

char f(string s) @trusted
{
    immutable(char)* c = s.ptr;
    writeln(g(* cast(size_t*) &c));
    return *c;
}

size_t g(ref size_t s) @safe
{
    return s;
}

void main() @safe
{
    writeln(f("foo"));
}
May 25 2020
On Tuesday, 26 May 2020 at 00:57:52 UTC, Arine wrote:char f(string s) @trusted { { immutable(char)* c = s.ptr; writeln(g(* cast(size_t*) &c)); // var c invalidated by above function, don't use after this line } return s[0]; }Oops, even that would still need a size check as well.
May 25 2020
On 26.05.20 02:57, Arine wrote:You are passing a pointer into a function that takes a mutable size_t by reference and then use the pointer afterwards. You get what's coming to you if you think that's suitable for @trusted. This is a good example that care must still be taken in @trusted. You are doing something dangerous, expect to be burned by it.So would you say that the function should not have been @trusted in the first place, because it can't guarantee to stay safe? Or was the @trusted attribute okay at first, and it only became invalid later when the @safe code changed? And is it acceptable that @safe code can invalidate @trusted attributes like that?
May 25 2020
On 26.05.20 01:04, ag0aep6g wrote:Consider this little program that prints the address and first character of a string in a convoluted way: import std.stdio; char f(string s) @trusted { immutable(char)* c = s.ptr; writeln(g(* cast(size_t*) &c)); return *c; } size_t g(ref size_t s) @safe { return s; } void main() @safe { writeln(f("foo")); } As the spec stands, I believe it allows f to be @trusted.I don't think so. @trusted code can't rely on @safe code behaving a certain way to ensure memory safety, it has to be defensive.
May 26 2020
On 5/26/2020 12:07 AM, Timon Gehr wrote:I don't think so. @trusted code can't rely on @safe code behaving a certain way to ensure memory safety, it has to be defensive.I agree. The @trusted code here is not passing safe arguments to g(), but it is trusted to do so.
May 26 2020
On 26.05.20 11:35, Walter Bright wrote:On 5/26/2020 12:07 AM, Timon Gehr wrote:Nice. Timon and Walter agree on something related to safety. That must mean something. I take it you guys are good with adding the note about undefined behavior to the spec then? Repeating it here for reference: Undefined behavior: Calling a @safe function or a @trusted function with unsafe values or unsafe aliasing has undefined behavior.I don't think so. @trusted code can't rely on @safe code behaving a certain way to ensure memory safety, it has to be defensive.I agree. The @trusted code here is not passing safe arguments to g(), but it is trusted to do so.
May 26 2020
On Tuesday, 26 May 2020 at 13:19:04 UTC, ag0aep6g wrote:I take it you guys are good with adding the note about undefined behavior to the spec then? Repeating it here for reference: Undefined behavior: Calling a @safe function or a @trusted function with unsafe values or unsafe aliasing has undefined behavior.As far as I can tell that's already implied by the first sentence under "Safe Interfaces":Given that it is only called with safe values and safe aliasing, a function has a safe interface when:...but being more explicit seems like it can't hurt. "Safe Interfaces" to read as follows:3. it cannot introduce unsafe aliasing **of memory** that is accessible from other parts of the program **while that aliasing exists**.
May 26 2020
On 26.05.20 15:19, ag0aep6g wrote:On 26.05.20 11:35, Walter Bright wrote:(We agree on many things. It just does not seem that way because I seldomly get involved when I agree with a decision.)On 5/26/2020 12:07 AM, Timon Gehr wrote:Nice. Timon and Walter agree on something related to safety.I don't think so. trusted code can't rely on safe code behaving a certain way to ensure memory safety, it has to be defensive.I agree. The trusted code here is not passing safe arguments to g(), but it is trusted to do so.
May 26 2020
On 5/26/2020 10:23 AM, Timon Gehr wrote:On 26.05.20 15:19, ag0aep6g wrote:It is indeed nice when we agree. I appreciate it.On 26.05.20 11:35, Walter Bright wrote:(We agree on many things. It just does not seem that way because I seldomly get involved when I agree with a decision.)On 5/26/2020 12:07 AM, Timon Gehr wrote:Nice. Timon and Walter agree on something related to safety.I don't think so. trusted code can't rely on safe code behaving a certain way to ensure memory safety, it has to be defensive.I agree. The trusted code here is not passing safe arguments to g(), but it is trusted to do so.
May 27 2020
On Tuesday, 26 May 2020 at 07:07:33 UTC, Timon Gehr wrote:I don't think so. @trusted code can't rely on @safe code behaving a certain way to ensure memory safety, it has to be defensive.I'd say it's sound to rely on any postconditions in a function's `out` contract. Those can be mechanically enforced. I'd also say it's sound for @trusted code to rely on the behavior of a function whose source code the author of the @trusted code has reviewed and either has control over or can prevent from changing (such as by specifying a library version in a dependency manager). I'd also say it's within the spirit of @trusted to rely on the behavior of any function that allegedly adheres to a specification and has been blessed by some expensive certifying body or meticulous review process as meeting that specification. After all, all @trusted is (including the @trusted the author is applying to his own function) is some human's statement that he has carefully inspected some function and can vouch that calling it won't cause memory corruption / undefined behavior for a program that wasn't already in an invalid state.
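The first point can be made concrete with the thread's my_strlen example. A sketch: the postcondition that @trusted callers want to rely on is spelled out as an `out` contract, so it is checked mechanically (in non-release builds) no matter how the @safe body changes later.

```
size_t my_strlen(ref char[5] buf) @safe
out (r; r <= buf.length)   // the guarantee @trusted callers rely on
{
    foreach (i; 0 .. buf.length)
        if (buf[i] == '\0')
            return i;
    return buf.length;
}
```

A later edit to the body that violated the documented bound would then fail the contract at the callee, instead of silently corrupting memory in a @trusted caller.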
May 27 2020
On Monday, 25 May 2020 at 23:04:49 UTC, ag0aep6g wrote:[..] Consider this little program that prints the address and first character of a string in a convoluted way: import std.stdio; char f(string s) @trusted { immutable(char)* c = s.ptr; writeln(g(* cast(size_t*) &c)); return *c; } size_t g(ref size_t s) @safe { return s; } void main() @safe { writeln(f("foo")); } As the spec stands, I believe it allows f to be @trusted. The function doesn't exhibit undefined behavior, and it doesn't leak any unsafe values or unsafe aliasing. So it has a safe interface and can be @trusted.As mentioned by others, it is incorrect to label `f` with `@trusted` because: 1. It provides unsafe access to potentially out of bounds memory - there's no guarantee that `s.length >= size_t.sizeof` is true (in addition to the possibility of `s.ptr` being `null`). 2. It creates mutable aliasing to immutable memory and passes it to another function. Casting away `immutable` could be safe iff the mutable reference can't be used to modify the memory. So, `g` must take a `const` reference to `size_t`, in order for `f` to even begin to be considered a candidate for the `@trusted` attribute.
May 26 2020
On 26.05.20 11:13, Petar Kirov [ZombineDev] wrote:On Monday, 25 May 2020 at 23:04:49 UTC, ag0aep6g wrote:[...][...]char f(string s) @trusted { immutable(char)* c = s.ptr; writeln(g(* cast(size_t*) &c)); return *c; }As mentioned by others, it is incorrect to label `f` with `@trusted` because: 1. It provides unsafe access to potentially out of bounds memory - there's no guarantee that `s.length >= size_t.sizeof` is true (in addition to the possibility of `s.ptr` being `null`).I think you're misreading the code. It's not reading size_t.sizeof bytes starting at c. It's reinterpreting the pointer c itself as a size_t.2. It creates mutable aliasing to immutable memory and passes it to another function. Casting away `immutable` could be safe iff the mutable reference can't be used to modify the memory. So, `g` must take a `const` reference to `size_t`, in order for `f` to even begin to be considered a candidate for the `@trusted` attribute.Great. So far the majority opinion seems to be that f is invalid from the start, and that calling g like that just can't be considered safe. I like it.
May 26 2020