
digitalmars.D - @trusted assumptions about @safe code

ag0aep6g <anonymous@example.com> writes:
Deep in the discussion thread for DIP 1028 there is this little remark 
by Zoadian [1]:

 you can break previously verified @trusted code by just writing @safe 
 code today.
That statement fits something that occurred to me when trying to lock down the definition of "safe interfaces" [2]. Consider this little program that prints the address and first character of a string in a convoluted way:

    import std.stdio;
    char f(string s) @trusted
    {
        immutable(char)* c = s.ptr;
        writeln(g(* cast(size_t*) &c));
        return *c;
    }
    size_t g(ref size_t s) @safe
    {
        return s;
    }
    void main() @safe
    {
        writeln(f("foo"));
    }

As the spec stands, I believe it allows f to be @trusted. The function doesn't exhibit undefined behavior, and it doesn't leak any unsafe values or unsafe aliasing. So it has a safe interface and can be @trusted. With f correctly @trusted and everything else being @safe, that code is good to go safety-wise.

Now imagine that some time passes. Code gets added and shuffled around. Maybe g ends up in another module, far away from f. And as it happens, someone adds a line to g:

    size_t g(ref size_t s) @safe
    {
        s = 0xDEADBEEF; /* ! */
        return s;
    }

g is still perfectly @safe, and there's not even any @trusted code in the vicinity to consider. So review comes to the conclusion that the change is fine safety-wise. But g violates an assumption in f. And with that broken assumption, memory safety comes crumbling down, "by just writing @safe code".

I think that's a problem. Ideally, it should not be possible to cause memory corruption by adding a line to @safe code. But I know that I'm more nitpicky than many when it comes to that rule and @safe/@trusted in general.

Anyway, one way to address this would be disallowing f's call to g. I.e., add a sentence like this to the spec:

    Undefined behavior: Calling a @safe function or a @trusted function
    with unsafe values or unsafe aliasing has undefined behavior.

The aliasing of `immutable(char)*` and `size_t` is unsafe, so the call becomes invalid. That means f can no longer be @trusted as it's now considered to have undefined behavior.

The downside is that functions may become invalid even when they don't make bad assumptions. For example, this f2 would (arguably?) also be invalid, because the unsafe aliasing is still there even though it's not being used:

    char f2(string s) @trusted
    {
        immutable(char)* c = s.ptr;
        writeln(g(* cast(size_t*) &c));
        return 'x'; /* ! */
    }

Or maybe we say that c gets invalidated by the call to g and using it afterwards triggers undefined behavior. Then f2 is ok. Would still have to disallow calls that pass both ends of an unsafe aliasing to an @safe/@trusted function, though.

Thoughts? Am I overthinking it as usual when it comes to @trusted?

[1] https://forum.dlang.org/post/iwddwsdpsntajyblnttk@forum.dlang.org
[2] https://dlang.org/spec/function.html#safe-interfaces
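(For reference, here is the scenario end to end: a minimal sketch that simply splices the changed g into the program above. After the @safe edit, the size_t that g writes to is the pointer c itself, so the final `return *c` reads from address 0xDEADBEEF:

    import std.stdio;
    char f(string s) @trusted
    {
        immutable(char)* c = s.ptr;
        writeln(g(* cast(size_t*) &c)); // g now overwrites c's bits
        return *c;                      // reads from address 0xDEADBEEF
    }
    size_t g(ref size_t s) @safe
    {
        s = 0xDEADBEEF; /* ! */
        return s;
    }
    void main() @safe
    {
        writeln(f("foo"));
    }
)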
May 25 2020
Paul Backus <snarwin@gmail.com> writes:
On Monday, 25 May 2020 at 23:04:49 UTC, ag0aep6g wrote:
 Deep in the discussion thread for DIP 1028 there is this little 
 remark by Zoadian [1]:

 you can break previously verified @trusted code by just 
 writing @safe code today.
 That statement fits something that occurred to me when trying to 
 lock down the definition of "safe interfaces" [2]. Consider this 
 little program that prints the address and first character of a 
 string in a convoluted way:

 [...]

 As the spec stands, I believe it allows f to be @trusted. The 
 function doesn't exhibit undefined behavior, and it doesn't leak 
 any unsafe values or unsafe aliasing. So it has a safe interface 
 and can be @trusted. With f correctly @trusted and everything else 
 being @safe, that code is good to go safety-wise.
My reading of the spec is that f violates this requirement of safe interfaces:
 3. it cannot introduce unsafe aliasing that is accessible from 
 other parts of the program.
"Other parts of the program," taken at face value, should include both f's callers (direct and indirect) as well as any functions it calls (directly or indirectly). Since f introduces unsafe aliasing, and makes that aliasing visible to g, it should not be marked as trusted. I suppose it depends on exactly what is meant by "accessible"--if it refers to the aliased memory location, then my interpretation follows, but if it refers to the pointers, there's an argument to be made that f is fine, since only one of the pointers escapes.
May 25 2020
ag0aep6g <anonymous@example.com> writes:
On 26.05.20 01:25, Paul Backus wrote:
 My reading of the spec is that f violates this requirement of safe 
 interfaces:
 
 3. it cannot introduce unsafe aliasing that is accessible from other 
 parts of the program.
"Other parts of the program," taken at face value, should include both f's callers (direct and indirect) as well as any functions it calls (directly or indirectly). Since f introduces unsafe aliasing, and makes that aliasing visible to g, it should not be marked as trusted. I suppose it depends on exactly what is meant by "accessible"--if it refers to the aliased memory location, then my interpretation follows, but if it refers to the pointers, there's an argument to be made that f is fine, since only one of the pointers escapes.
Hm. The meaning I intended with that is that it's only invalid when the memory location becomes accessible via both types elsewhere. And g only has access via one type.

Would you say that this next function is also leaking unsafe aliasing?

    immutable(int[]) f() @trusted
    {
        int[] a = [1, 2, 3];
        a[] += 10;
        return cast(immutable) a;
    }

Because that one is definitely supposed to be allowed.

And I must also say that I didn't really consider called functions to be "other parts of the program". But reading it that way makes sense. Then I suppose calling an @safe function with both ends of an unsafe aliasing can be seen as already not allowed.
May 25 2020
Paul Backus <snarwin@gmail.com> writes:
On Tuesday, 26 May 2020 at 00:01:37 UTC, ag0aep6g wrote:
 Hm. The meaning I intended with that is that it's only invalid 
 when the memory location becomes accessible via both types 
 elsewhere. And g only has access via one type.

 Would you say that this next function is also leaking unsafe 
 aliasing?

     immutable(int[]) f() @trusted
     {
         int[] a = [1, 2, 3];
         a[] += 10;
         return cast(immutable) a;
     }

 Because that one is definitely supposed to be allowed.
There's no part of the program outside of f's body that has access to either reference while both are alive, so I'd say that's fine. In Rust-ish terms, the ownership of the array is being moved from a to the return value, whereas in the previous example, both f and g were attempting to mutably borrow the same data at the same time.
May 25 2020
Dukc <ajieskola@gmail.com> writes:
On Monday, 25 May 2020 at 23:04:49 UTC, ag0aep6g wrote:
 Deep in the discussion thread for DIP 1028 there is this little 
 remark by Zoadian [1]:

 you can break previously verified @trusted code by just 
 writing @safe code today.
 That statement fits something that occurred to me when trying to 
 lock down the definition of "safe interfaces" [2]. Consider this 
 little program that prints the address and first character of a 
 string in a convoluted way:

     import std.stdio;
     char f(string s) @trusted
     {
         immutable(char)* c = s.ptr;
         writeln(g(* cast(size_t*) &c));
         return *c;
     }
     size_t g(ref size_t s) @safe
     {
         return s;
     }
     void main() @safe
     {
         writeln(f("foo"));
     }

 [...]
No, `f` should be just dead `@system`. If you call f with a string which does not point to null, but is empty, boom! But for the rest of the reply, I assume you meant a function that could really be `@trusted` with all parameters.
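A minimal sketch of that boom, reusing the f quoted above: the slice is empty but its .ptr is not null, so f's `return *c` reads past the end of the slice (with a default-initialized string, .ptr is null and it dereferences null outright):

    void main() @safe
    {
        import std.stdio : writeln;
        string s = "foo"[3 .. 3]; // empty slice, non-null .ptr
        writeln(f(s));            // f's `return *c` reads out of bounds
    }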
 Now imagine that some time passes. Code gets added and shuffled 
 around. Maybe g ends up in another module, far away from f. And 
 as it happens, someone adds a line to g:

     size_t g(ref size_t s) @safe
     {
         s = 0xDEADBEEF; /* ! */
         return s;
     }
Credible scenario.
 g is still perfectly @safe, and there's not even any @trusted 
 code in the vicinity to consider. So review comes to the 
 conclusion that the change is fine safety-wise. But g violates 
 an assumption in f. And with that broken assumption, memory 
 safety comes crumbling down, "by just writing @safe code".

 I think that's a problem. Ideally, it should not be possible to 
 cause memory corruption by adding a line to @safe code. But I 
 know that I'm more nitpicky than many when it comes to that 
 rule and @safe/@trusted in general.
I don't think of it as a problem for `@safe`. `@safe` is just a command to turn the memory checking tool on, not a code certification (although using @safe where possible would probably be required for certifying).

Combating the scenarios you mentioned means that the `@safe` function called must be at least as certified as the `@trusted` caller, but that is no reason to forbid using the memory checking tool the language offers.
May 25 2020
ag0aep6g <anonymous@example.com> writes:
On 26.05.20 01:47, Dukc wrote:
 No, `f` should be just dead `@system`. If you call f with a string which 
 does not point to null, but is empty, boom!
Right. I can fix that by changing

    immutable(char)* c = s.ptr;

to

    immutable(char)* c = &s[0];

correct?

[...]
 I don't think of it as a problem for `@safe`. `@safe` is just a command to 
 turn the memory checking tool on, not a code certification (although 
 using @safe where possible would probably be required for certifying).

And you don't think it's possible and worthwhile to strengthen @safe so far that it becomes a code certification?
 Combating the scenarios you mentioned means that the `@safe` function 
 called must be at least as certified as the `@trusted` caller, but that 
 is no reason to forbid using the memory checking tool the language offers.

In my scenario, the @safe function is (supposed to be) perfectly safe before and after the catastrophic change. You can try certifying it beyond what the compiler does, but there's really nothing wrong with the function.

The thing is that editing the @safe function affects the status of the @trusted one. And I think it should be possible to tweak the rules so that a correctly verified @trusted function cannot become invalid when something changes in a called @safe function.
May 25 2020
Dukc <ajieskola@gmail.com> writes:
On Tuesday, 26 May 2020 at 00:22:05 UTC, ag0aep6g wrote:
 On 26.05.20 01:47, Dukc wrote:
 No, `f` should be just dead `@system`. If you call f with a 
 string which does not point to null, but is empty, boom!

 Right. I can fix that by changing

     immutable(char)* c = s.ptr;

 to

     immutable(char)* c = &s[0];

 correct?

 [...]
Looks good to me. Well, not sure about that mutable aliasing thing others have mentioned, maybe it's good, maybe not.
 I don't think of it as a problem for `@safe`. `@safe` is just a 
 command to turn the memory checking tool on, not a code 
 certification (although using @safe where possible would 
 probably be required for certifying).

 And you don't think it's possible and worthwhile to strengthen 
 @safe so far that it becomes a code certification?
No, even if @safe would in all instances be 100% idiot-proof memory safe, I think it would work badly as a code certificate. Why? It would mean that if you have an already `@safe` function that would be fit for use in `@system` code, you would have to either duplicate it or disable its memory checks, because it could not be certified for `@system` code.

The closest thing to a certificate a code can have AFAIK is an automatic test suite. `@safe` works well in conjunction with such a test suite: if a function is both `@safe` and `pure`, you know that regular `unittest`s are enough to certify it in the general case. With `@system` or `@trusted`, Valgrinding might be required.

This also applies to your example: Had you tested that `g` does not mutate the pointed value (or set the parameter as `const`), the bug would have been caught. On the other hand, no amount of unit testing can alone certify `f` to a high standard, as it could still silently be corrupting the memory despite outputting and returning correctly.
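A minimal sketch of such a test, reusing the g from the opening post:

    size_t g(ref size_t s) @safe
    {
        return s;
    }

    @safe unittest
    {
        size_t x = 123;
        assert(g(x) == 123); // documented: g returns its argument...
        assert(x == 123);    // ...and does not mutate it; the 0xDEADBEEF
                             // change would make this assert fail
    }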
May 26 2020
Steven Schveighoffer <schveiguy@gmail.com> writes:
On 5/25/20 8:22 PM, ag0aep6g wrote:
 On 26.05.20 01:47, Dukc wrote:
 No, `f` should be just dead `@system`. If you call f with a string 
 which does not point to null, but is empty, boom!
 Right. I can fix that by changing

     immutable(char)* c = s.ptr;

 to

     immutable(char)* c = &s[0];

 correct?
Yes, I was going to point that out too. But it doesn't affect the main point.

The main problem is that f does not provide a safe value to g. @safe code can always mess up if handed garbage. One might say that f should not be @trusted. But that just means NO functions could be trusted. If you cannot trust the semantic expectations of the functions you are calling to hold true, then you cannot write @trusted code ever. I mean, someone could do this inside memcpy:

    size_t memcpy(void *dst, void *src, size_t length)
    {
        *(size_t *)(dst + length + 10) = 0xdeadbeef;
        ...// normal implementation
    }

And violate safety that way. So does that mean you can never use memcpy inside @trusted functions *just in case* it did something like this?

I get what you are saying, that it would be nice if one can just write @safe code and never worry that you might violate any memory rules. But in reality, you still have to implement the function as designed, and unless you do that, you are not going to be memory safe, period. The only call stacks that would be "safe" were ones that were @safe all the way down. And then @safe becomes essentially useless.

-Steve
May 26 2020
ag0aep6g <anonymous@example.com> writes:
On 26.05.20 17:41, Steven Schveighoffer wrote:
 The main problem is that f does not provide a safe value to g. @safe 
 code can always mess up if handed garbage. One might say that f should 
 not be @trusted. But that just means NO functions could be trusted. If 
 you cannot trust the semantic expectations of the functions you are 
 calling to hold true, then you cannot write @trusted code ever. I mean, 
 someone could do this inside memcpy:

 size_t memcpy(void *dst, void *src, size_t length)
 {
     *(size_t *)(dst + length + 10) = 0xdeadbeef;
     ...// normal implementation
 }

 And violate safety that way. So does that mean you can never use memcpy 
 inside @trusted functions *just in case* it did something like this?

 I get what you are saying, that it would be nice if one can just write 
 @safe code and never worry that you might violate any memory rules. But 
 in reality, you still have to implement the function as designed, and 
 unless you do that, you are not going to be memory safe, period. The 
 only call stacks that would be "safe" were ones that were @safe all the 
 way down. And then @safe becomes essentially useless.
I think you've got a good point, but the example isn't so great. memcpy is @system and can only be @system, so of course you can break safety by changing its behavior.

But yeah, the @trusted function might rely on the @safe function returning 42. And when it suddenly returns 43, all hell breaks loose. There doesn't need to be any monkey business with unsafe aliasing or such. Just an @safe function returning an unexpected value.

I suppose the only ways to catch that kind of thing would be to forbid calling @safe (and other @trusted?) functions from @trusted (and @system?) code, or to mandate that the exact behavior of @safe functions (including their return values) cannot be relied upon for safety. Those would be really, really tough sells.

So unless we do something very drastic, any visible change in the behavior of an @safe function can possibly lead to memory corruption. And strictly speaking, any @trusted and @system code that calls it must be re-evaluated for safety. Seems kinda obvious now. But I don't think I really realized this before.
May 26 2020
Paul Backus <snarwin@gmail.com> writes:
On Tuesday, 26 May 2020 at 22:52:09 UTC, ag0aep6g wrote:
 But yeah, the @trusted function might rely on the @safe 
 function returning 42. And when it suddenly returns 43, all 
 hell breaks loose. There doesn't need to be any monkey business 
 with unsafe aliasing or such. Just an @safe function returning 
 an unexpected value.

 I suppose the only ways to catch that kind of thing would be to 
 forbid calling @safe (and other @trusted?) functions from 
 @trusted (and @system?) code, or to mandate that the exact 
 behavior of @safe functions (including their return values) 
 cannot be relied upon for safety. Those would be really, really 
 tough sells.
All that's necessary is to have the @trusted function check that the assumption it's relying on is actually true:

    @safe int foo() { ... }

    @trusted void bar()
    {
        int fooResult = foo();
        assert(fooResult == 42);
        // proceed accordingly
    }

If the assumption is violated, the program will crash at runtime rather than potentially corrupt memory.
May 26 2020
Paolo Invernizzi <paolo.invernizzi@gmail.com> writes:
On Wednesday, 27 May 2020 at 00:50:26 UTC, Paul Backus wrote:
 On Tuesday, 26 May 2020 at 22:52:09 UTC, ag0aep6g wrote:
 But yeah, the @trusted function might rely on the @safe 
 function returning 42. And when it suddenly returns 43, all 
 hell breaks loose. There doesn't need to be any monkey 
 business with unsafe aliasing or such. Just an @safe function 
 returning an unexpected value.

 I suppose the only ways to catch that kind of thing would be 
 to forbid calling @safe (and other @trusted?) functions from 
 @trusted (and @system?) code, or to mandate that the exact 
 behavior of @safe functions (including their return values) 
 cannot be relied upon for safety. Those would be really, 
 really tough sells.
 All that's necessary is to have the @trusted function check 
 that the assumption it's relying on is actually true:

     @safe int foo() { ... }

     @trusted void bar()
     {
         int fooResult = foo();
         assert(fooResult == 42);
         // proceed accordingly
     }

 If the assumption is violated, the program will crash at 
 runtime rather than potentially corrupt memory.
Exactly: the whole point of @trusted is that it MUST check the parameters it's using to call @system code, and with an assert that's just fine in this case, as it's checking invariants in the program logic.

The only way for a @safe program to corrupt memory should be inside the external @system code binary: a bug in a used @system library, a misunderstanding of the API documentation, and so on.
May 26 2020
ag0aep6g <anonymous@example.com> writes:
On 27.05.20 02:50, Paul Backus wrote:
 On Tuesday, 26 May 2020 at 22:52:09 UTC, ag0aep6g wrote:
 But yeah, the @trusted function might rely on the @safe function 
 returning 42. And when it suddenly returns 43, all hell breaks loose. 
 There doesn't need to be any monkey business with unsafe aliasing or 
 such. Just an @safe function returning an unexpected value.

 I suppose the only ways to catch that kind of thing would be to forbid 
 calling @safe (and other @trusted?) functions from @trusted (and 
 @system?) code, or to mandate that the exact behavior of @safe 
 functions (including their return values) cannot be relied upon for 
 safety. Those would be really, really tough sells.
 All that's necessary is to have the @trusted function check that the 
 assumption it's relying on is actually true:

     @safe int foo() { ... }

     @trusted void bar()
     {
         int fooResult = foo();
         assert(fooResult == 42);
         // proceed accordingly
     }

 If the assumption is violated, the program will crash at runtime 
 rather than potentially corrupt memory.
I.e., "the exact behavior of safe functions (including their return values) cannot be relied upon for safety". I think it's going to be hard selling that to users. Especially, because there is no such requirement when calling system functions. Say you have this code: void f() trusted { import core.stdc.string: strlen; import std.stdio: writeln; char[5] buf = "foo\0\0"; char last_char = buf.ptr[strlen(buf.ptr) - 1]; writeln(last_char); } That's ok, right? `f` doesn't have to verify that `strlen` returns a value that is in bounds. It's allowed to assume that `strlen` counts until the first null byte. Now you realize that you can calculate the length of the string more safely than C's strlen does, so you change the code to: size_t my_strlen(ref char[5] buf) safe { foreach (i; 0 .. buf.length) if (buf[i] == '\0') return i; return buf.length; } void f() trusted { import std.stdio: writeln; char[5] buf = "foo\0\0"; char last_char = buf.ptr[my_strlen(buf) - 1]; writeln(last_char); } Nice. Now you're safe even if you forget to put a null-terminator into the buffer. But oh no, `my_strlen` is safe. That means `f` cannot assume that the returned value is in bounds. It now has to verify that. Somehow, it's harder to call the safe function correctly than the system one. What user is going to remember those subtleties?
May 26 2020
Steven Schveighoffer <schveiguy@gmail.com> writes:
On 5/27/20 2:36 AM, ag0aep6g wrote:
 On 27.05.20 02:50, Paul Backus wrote:
 On Tuesday, 26 May 2020 at 22:52:09 UTC, ag0aep6g wrote:
 But yeah, the @trusted function might rely on the @safe function 
 returning 42. And when it suddenly returns 43, all hell breaks loose. 
 There doesn't need to be any monkey business with unsafe aliasing or 
 such. Just an @safe function returning an unexpected value.

 I suppose the only ways to catch that kind of thing would be to 
 forbid calling @safe (and other @trusted?) functions from @trusted 
 (and @system?) code, or to mandate that the exact behavior of @safe 
 functions (including their return values) cannot be relied upon for 
 safety. Those would be really, really tough sells.
 All that's necessary is to have the @trusted function check that the 
 assumption it's relying on is actually true:

     @safe int foo() { ... }

     @trusted void bar()
     {
         int fooResult = foo();
         assert(fooResult == 42);
         // proceed accordingly
     }

 If the assumption is violated, the program will crash at runtime 
 rather than potentially corrupt memory.
I.e., "the exact behavior of safe functions (including their return values) cannot be relied upon for safety". I think it's going to be hard selling that to users. Especially, because there is no such requirement when calling system functions. Say you have this code:     void f() trusted     {         import core.stdc.string: strlen;         import std.stdio: writeln;         char[5] buf = "foo\0\0";         char last_char = buf.ptr[strlen(buf.ptr) - 1];         writeln(last_char);     } That's ok, right? `f` doesn't have to verify that `strlen` returns a value that is in bounds. It's allowed to assume that `strlen` counts until the first null byte. Now you realize that you can calculate the length of the string more safely than C's strlen does, so you change the code to:     size_t my_strlen(ref char[5] buf) safe     {         foreach (i; 0 .. buf.length) if (buf[i] == '\0') return i;         return buf.length;     }     void f() trusted     {         import std.stdio: writeln;         char[5] buf = "foo\0\0";         char last_char = buf.ptr[my_strlen(buf) - 1];         writeln(last_char);     } Nice. Now you're safe even if you forget to put a null-terminator into the buffer. But oh no, `my_strlen` is safe. That means `f` cannot assume that the returned value is in bounds. It now has to verify that. Somehow, it's harder to call the safe function correctly than the system one.
I think this is not the way to view it. @safe code still should do what it's supposed to do. It's not any harder *or any easier* to call @safe code. You are going to have to trust that whatever functions you call (@trusted, @safe, or @system) are following their spec. @safe has additional restrictions, so you can assume more. But the semantic meaning of things cannot be checked by the compiler, and those are the interesting things that cause bugs.

A more realistic example, and one that gets me all the time, is something like indexOf. Does it return input.length or -1 if the item isn't found? Using the wrong expectation can lead to bad consequences. So it's important, no matter the safety of the indexOf function, to know what it's supposed to do, and base your review of @trusted code on that knowledge.

Of course, with something like that, one could be extra cautious, and assert the value is within bounds if it's not the sentinel. You could be even more cautious and check that the index found has the sought-after element. And that's probably the right defensive way to do this. But who's going to do that? Most people will work under the assumption that indexOf does what it says it's going to do, and not worry about unittesting it on every call.

-Steve
May 27 2020
Stefan Koch <uplink.coder@googlemail.com> writes:
On Wednesday, 27 May 2020 at 12:48:46 UTC, Steven Schveighoffer 
wrote:
 On 5/27/20 2:36 AM, ag0aep6g wrote:
 [...]
 All that's necessary is to have the @trusted function check that the 
 assumption it's relying on is actually true:
 [...]

 I think this is not the way to view it. @safe code still should do what 
 it's supposed to do. It's not any harder *or any easier* to call @safe 
 code.

 [...]
    {
        const i = cast(ssize_t) indexof(x, E);
        if (i < 0 || i > x.dim)
        {
            // no luck.
        }
        else
        {
            // index is in bounds so use it.
        }
    }
May 27 2020
Steven Schveighoffer <schveiguy@gmail.com> writes:
On 5/27/20 8:52 AM, Stefan Koch wrote:
 On Wednesday, 27 May 2020 at 12:48:46 UTC, Steven Schveighoffer wrote:
 On 5/27/20 2:36 AM, ag0aep6g wrote:
 [...]
 All that's necessary is to have the @trusted function check that the 
 assumption it's relying on is actually true:
 [...]

 I think this is not the way to view it. @safe code still should do what 
 it's supposed to do. It's not any harder *or any easier* to call @safe 
 code.

 [...]

 {
     const i = cast(ssize_t) indexof(x, E);
     if (i < 0 || i > x.dim)
     {
         // no luck.
     }
     else
     {
         // index is in bounds so use it.
     }
 }
Again, I also think this is valid @trusted code:

    const i = indexof(x, E);
    if(i != -1){
        // use it
    }

or

    if(i != x.length) {
        // use it
    }

depending on the spec for that function. I don't think it's 100% necessary to be defensive on all semantics assuming they are not implemented properly. People just aren't going to write what you wrote in the name of @safe. They *could* write:

    size_t i = indexof(x, E);
    if(i < x.length) {
    }

But most people aren't going to do that either.

-Steve
May 27 2020
ag0aep6g <anonymous@example.com> writes:
On 27.05.20 14:48, Steven Schveighoffer wrote:
 On 5/27/20 2:36 AM, ag0aep6g wrote:
[...]
 But oh no, `my_strlen` is @safe. That means `f` cannot assume that the 
 returned value is in bounds. It now has to verify that. Somehow, it's 
 harder to call the @safe function correctly than the @system one.
 I think this is not the way to view it. @safe code still should do what 
 it's supposed to do. It's not any harder *or any easier* to call @safe 
 code. You are going to have to trust that whatever functions you call 
 (@trusted, @safe, or @system) are following their spec. @safe has 
 additional restrictions, so you can assume more. But the semantic 
 meaning of things cannot be checked by the compiler, and those are the 
 interesting things that cause bugs.

 [...]

 Of course, with something like that, one could be extra cautious, and 
 assert the value is within bounds if it's not the sentinel. You could 
 be even more cautious and check that the index found has the 
 sought-after element. And that's probably the right defensive way to do 
 this. But who's going to do that? Most people will work under the 
 assumption that indexOf does what it says it's going to do, and not 
 worry about unittesting it on every call.
Just to be clear: I agree with you. Requiring @trusted to be so super-defensive doesn't seem viable.
May 27 2020
Paul Backus <snarwin@gmail.com> writes:
On Wednesday, 27 May 2020 at 06:36:14 UTC, ag0aep6g wrote:
 I.e., "the exact behavior of  safe functions (including their 
 return values) cannot be relied upon for safety". I think it's 
 going to be hard selling that to users. Especially, because 
 there is no such requirement when calling  system functions.
I think you are focused in so closely on this particular example that you are losing track of the bigger picture.

Knowing that a function is @safe guarantees you that (modulo errors in @trusted code) it will not violate memory safety when called with safe arguments. That is the *only* thing @safe guarantees.

If you want to guarantee anything else about what the function does, there are many tools you can use to do so. You can use in and out contracts to establish preconditions and postconditions. You can use assert or enforce in calling code to check that the arguments and/or return value meet some criteria. You can have it accept or return types with invariants.

In the case of strlen, you can rely on the wording of the C standard which states that "The strlen function returns the number of characters that precede the terminating null character" to know that if you call it with a null-terminated string (its precondition), the result will be in-bounds (its postcondition).

There's nothing weird or unusual or "a hard sell" about this. Programming is always like this, in any language, memory-safe or not.
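To illustrate the contract option (a minimal sketch, reusing the hypothetical my_strlen from upthread): an out contract turns the in-bounds property into a postcondition that is checked mechanically on every return, in non-release builds:

    size_t my_strlen(ref char[5] buf) @safe
    out (r; r <= buf.length) // postcondition: the result is in bounds
    {
        foreach (i; 0 .. buf.length)
            if (buf[i] == '\0')
                return i;
        return buf.length;
    }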
May 27 2020
ag0aep6g <anonymous@example.com> writes:
On 27.05.20 15:43, Paul Backus wrote:
 I think you are focused in so closely on this particular example that 
 you are losing track of the bigger picture.
That's entirely possible.
 Knowing that a function is @safe guarantees you that (modulo errors in 
 @trusted code) it will not violate memory safety when called with safe 
 arguments. That is the *only* thing @safe guarantees.

Agreed. The question is: What else can @trusted code rely on, beyond the guarantees of @safe?

[...]
 In the case of strlen, you can rely on the wording of 
 the C standard which states that "The strlen function returns the number 
 of characters that precede the terminating null character" to know that 
 if you call it with a null-terminated string (its precondition), the 
 result will be in-bounds (its postcondition).
Ok. An @trusted function can rely on the C standard. I take it that an @trusted function can also rely on other documentation of @system functions. So far I'm with you.

From your previous post I figured that this is your position towards calling @safe from @trusted:

    @trusted code is not allowed to rely on the documented
    return value of an @safe function. The @trusted function
    must instead verify that the actually returned value is
    safe to use.

I'm not sure if I'm representing you correctly, but that position makes sense to me. At the same time, it doesn't seem feasible, because I don't see how we're going to get users to adhere to that.

The way I see it we can either make the rules for @trusted so arcane that practically no one will be able to follow them, or we accept that a change in @safe code can lead to memory corruption. At the moment I'm leaning towards the latter.
May 27 2020
Paul Backus <snarwin@gmail.com> writes:
On Wednesday, 27 May 2020 at 15:14:17 UTC, ag0aep6g wrote:
 On 27.05.20 15:43, Paul Backus wrote:
 I think you are focused in so closely on this particular 
 example that you are losing track of the bigger picture.
That's entirely possible.
 Knowing that a function is @safe guarantees you that (modulo 
 errors in @trusted code) it will not violate memory safety 
 when called with safe arguments. That is the *only* thing 
 @safe guarantees.

 Agreed. The question is: What else can @trusted code rely on, 
 beyond the guarantees of @safe?
If the specific functions it's calling make any additional guarantees above and beyond what @safe requires, it can rely on those as well. The question, then, is what constitutes a "guarantee."
 [...]
 In the case of strlen, you can rely on the wording of the C 
 standard which states that "The strlen function returns the 
 number of characters that precede the terminating null 
 character" to know that if you call it with a null-terminated 
 string (its precondition), the result will be in-bounds (its 
 postcondition).
 Ok. An @trusted function can rely on the C standard. I take it 
 that an @trusted function can also rely on other documentation 
 of @system functions. So far I'm with you.
It can, to the extent that you trust the code to conform to the documentation. Personally, I am willing to trust libc to conform to the C standard, which means that a buggy or non-conforming libc will be able to cause memory corruption in @safe programs that I write. If you want to take a hard-line stance, you should not trust documentation at all.

Note that trusting the D compiler to conform to the D language standard is also, in some sense, "trusting documentation." A bug in the compiler can always introduce memory corruption to @safe code. So "never trust documentation under any circumstances" is not really a tenable position in practice.
 From your previous post I figured that this is your position 
 towards calling @safe from @trusted:

      @trusted code is not allowed to rely on the documented
      return value of an @safe function. The @trusted function
      must instead verify that the actually returned value is
      safe to use.
This is my position on *any* function calling *any other* function. Even in 100% @system code, I must (for example) check the return value of malloc if I want to rely on it not being null.
 I'm not sure if I'm representing you correctly, but that 
 position makes sense to me. At the same time, it doesn't seem 
 feasible, because I don't see how we're going to get users to 
 adhere to that.

Why does it not seem feasible? Checking return values is defensive programming 101. People already do this sort of thing all the time.

I agree that writing correct @trusted code is difficult, and that people are going to make mistakes--just like writing correct C code is difficult, and people make mistakes trying to. I don't think there's anything we can do in the language itself to fix that, other than making it easy to make the @trusted parts of the code as small as possible.
May 27 2020
ag0aep6g <anonymous@example.com> writes:
On 27.05.20 18:04, Paul Backus wrote:
 On Wednesday, 27 May 2020 at 15:14:17 UTC, ag0aep6g wrote:
[...]
 From your previous post I figured that this is your position towards 
 calling @safe from @trusted:

      @trusted code is not allowed to rely on the documented
      return value of an @safe function. The @trusted function
      must instead verify that the actually returned value is
      safe to use.

 This is my position on *any* function calling *any other* function. 
 Even in 100% @system code, I must (for example) check the return 
 value of malloc if I want to rely on it not being null.
 I'm not sure if I'm representing you correctly, but that position 
 makes sense to me. At the same time, it doesn't seem feasible, because 
 I don't see how we're going to get users to adhere to that.
Why does it not seem feasible? Checking return values is defensive programming 101. People already do this sort of thing all the time.
I think we're on the same page for the most part, but not here.

I'm pretty sure that you agree with this: When we call C's strlen like so: `strlen("foo\0".ptr)`, we can assume that the result will be 3, because strlen is documented to behave that way.

I'm not sure where you stand on this: If an @safe function is documented to return 42, can we rely on that in the same way we can rely on strlen's documented behavior? Let's say that the author of the @safe is as trustworthy as the C standard library.

If we can assume 42, a bug in @safe code can lead to memory corruption by breaking an assumption that is in @trusted/@system code. Just like a bug in @system code can have that same effect. This might be obviously true to you. But I hadn't really made that connection until recently. So if you agree with this, then I think we're on the same page now. And we just accept that a mistake in @safe code can possibly kill safety.

On the other side, if we cannot assume that 42 being returned, but we need to check that 42 was indeed returned, then the weird scenario happens where an @safe `my_strlen` becomes more cumbersome to use than C's @system `strlen`. I don't think any existing @trusted code was written with that in mind.

If I'm not making any sense, maybe we can compare our answers to the question whether the following snippets (copied from earlier) can really be @trusted.

I think this one is fine:

----
void f() @trusted
{
    import core.stdc.string: strlen;
    import std.stdio: writeln;
    char[5] buf = "foo\0\0";
    char last_char = buf.ptr[strlen(buf.ptr) - 1];
    writeln(last_char);
}
----

And I think this one is fine, too:

----
size_t my_strlen(ref char[5] buf) @safe
{
    foreach (i; 0 .. buf.length) if (buf[i] == '\0') return i;
    return buf.length;
}
void f() @trusted
{
    import std.stdio: writeln;
    char[5] buf = "foo\0\0";
    char last_char = buf.ptr[my_strlen(buf) - 1];
    writeln(last_char);
}
----
May 27 2020
ag0aep6g <anonymous@example.com> writes:
On 27.05.20 19:02, ag0aep6g wrote:
 Let's say that the author of the @safe is as trustworthy as the C 
 standard library.

Should read: Let's say that the author of the @safe **function** is as trustworthy as the C standard library.

 On the other side, if we cannot assume that 42 being returned,

Should read: On the other side, if we cannot assume that 42 **is** being returned,
May 27 2020
Arafel <er.krali@gmail.com> writes:
On 27/5/20 19:02, ag0aep6g wrote:
 I'm pretty sure that you agree with this: When we call C's strlen like 
 so: `strlen("foo\0".ptr)`, we can assume that the result will be 3, 
 because strlen is documented to behave that way.
 
There's one big difference: in @safe there are bounds checks by default, so even if you as the programmer assume that `strlen` will return the right value, the compiler is still inserting checks at every access:

```
@safe void safeFunc() {
    string foo = "foo\0";
    auto len = strlen(foo.ptr);
    auto bar = foo[0 .. len - 1]; // It'll be checked even in -release mode
}

@trusted void trustedFunc() {
    string foo = "foo\0";
    auto len = strlen(foo.ptr);
    auto bar = foo[0 .. len - 1];
}
```

So it's not that you don't need checks in @safe code, it's that they are added automatically.
May 27 2020
Paul Backus <snarwin@gmail.com> writes:
On Wednesday, 27 May 2020 at 17:02:15 UTC, ag0aep6g wrote:
 I'm not sure where you stand on this: If an @safe function is 
 documented to return 42, can we rely on that in the same way we 
 can rely on strlen's documented behavior? Let's say that the 
 author of the @safe is as trustworthy as the C standard library.
It depends entirely on how far you are willing to extend your trust. If you're willing to trust the C library to conform to the C standard, and you stipulate that this other function and its documentation are equally trustworthy, then yes, you can trust it to return 42.

I think the key insight here is that the memory-safety property established by D's @safe is *conditional*, not absolute. In other words, the D compiler (modulo bugs) establishes the following logical implication:

    my @trusted code is memory-safe -> my @safe code is memory-safe

Proving the left-hand side is left as an exercise to the programmer.

In practice, proving your @trusted code correct without making *any* assumptions about other code is too difficult, since at the very least you have to account for dependencies like the C library and the operating system. At some point, you have to trust that the code you're calling does what it says it does. So what you end up doing is establishing another logical implication:

    my dependencies behave as-documented -> my @trusted code is memory-safe

By the rules of logical inference, it follows that

    my dependencies behave as-documented -> my @safe code is correct

This is far from an absolute safety guarantee, but it can be good enough, as long as you stick to dependencies that are well-documented and well-tested.

Of course, if you are willing to put in enough effort, you can tighten up the condition on the left as much as you want. For example, maybe you decide to only trust the C library and the operating system, and audit every other dependency yourself. Then you end up with

    my libc and OS behave as-documented -> my @safe code is correct

This is more or less the broadest assurance of safety you can achieve if you are trying to write "portable" code, and I expect the vast majority of programmers would be satisfied with it. In principle, if you are doing bare-metal development, you could get all the way to

    my hardware behaves as-documented -> my @safe code is correct

...but that's unlikely to be worth the effort outside of very specialized and safety-critical fields.
May 27 2020
ag0aep6g <anonymous@example.com> writes:
On 27.05.20 19:30, Paul Backus wrote:
 In practice, proving your @trusted code correct without making *any* 
 assumptions about other code is too difficult, since at the very least 
 you have to account for dependencies like the C library and the 
 operating system. At some point, you have to trust that the code you're 
 calling does what it says it does. So what you end up doing is 
 establishing another logical implication:

     my dependencies behave as-documented -> my @trusted code is 
     memory-safe
So:

    my dependencies do not behave as-documented -> my @trusted code may
    be not memory-safe

right?

Let's say an @safe function `my_strlen` is part of my dependencies. Then:

    `my_strlen` does not behave as-documented -> my @trusted code may
    be not memory-safe

Or in other words: A mistake in an @safe function can lead to memory corruption. Which is what Steven is saying. And I agree.
May 27 2020
Paul Backus <snarwin@gmail.com> writes:
On Wednesday, 27 May 2020 at 18:46:59 UTC, ag0aep6g wrote:
 So:

     my dependencies do not behave as-documented -> my @trusted 
     code may be not memory-safe

 right?

 [...]

 Or in other words: A mistake in an @safe function can lead to 
 memory corruption.

 Which is what Steven is saying. And I agree.
Yes, that's correct.
May 27 2020
Arine <arine1283798123@gmail.com> writes:
On Monday, 25 May 2020 at 23:04:49 UTC, ag0aep6g wrote:
 Deep in the discussion thread for DIP 1028 there is this little 
 remark by Zoadian [1]:

 you can break previously verified @trusted code by just 
 writing @safe code today.
 That statement fits something that occurred to me when trying to 
 lock down the definition of "safe interfaces" [2]. Consider this 
 little program that prints the address and first character of a 
 string in a convoluted way:

     import std.stdio;
     char f(string s) @trusted
     {
         immutable(char)* c = s.ptr;
         writeln(g(* cast(size_t*) &c));
         return *c;
     }
     size_t g(ref size_t s) @safe
     {
         return s;
     }
     void main() @safe
     {
         writeln(f("foo"));
     }
You are passing a pointer into a function that takes a mutable size_t by reference and then use the pointer afterwards. You get what's coming to you if you think that's suitable for @trusted.

This is a good example that care must still be taken in @trusted. You are doing something dangerous, expect to be burned by it.

    char f(string s) @trusted
    {
        {
            immutable(char)* c = s.ptr;
            writeln(g(* cast(size_t*) &c));
            // var c invalidated by above function, don't use after this line
        }
        return s[0];
    }
May 25 2020
Arine <arine1283798123@gmail.com> writes:
On Tuesday, 26 May 2020 at 00:57:52 UTC, Arine wrote:
      char f(string s) @trusted
      {
          {
              immutable(char)* c = s.ptr;
              writeln(g(* cast(size_t*) &c));
              // var c invalidated by above function, don't use 
              // after this line
          }
          return s[0];
      }
Oops, even that would still need a size check as well.
May 25 2020
ag0aep6g <anonymous@example.com> writes:
On 26.05.20 02:57, Arine wrote:
 You are passing a pointer into a function that takes a mutable size_t by 
 reference and then use the pointer afterwards. You get what's coming to 
 you if you think that's suitable for @trusted.

 This is a good example that care must still be taken in @trusted. You 
 are doing something dangerous, expect to be burned by it.
So would you say that the function should not have been @trusted in the first place, because it can't guarantee to stay safe? Or was the @trusted attribute okay at first, and it only became invalid later when the @safe code changed?

And is it acceptable that @safe code can invalidate @trusted attributes like that?
May 25 2020
Timon Gehr <timon.gehr@gmx.ch> writes:
On 26.05.20 01:04, ag0aep6g wrote:
 Consider this little program that prints the address and first character 
 of a string in a convoluted way:
 
      import std.stdio;
      char f(string s) @trusted
      {
          immutable(char)* c = s.ptr;
          writeln(g(* cast(size_t*) &c));
          return *c;
      }
      size_t g(ref size_t s) @safe
      {
          return s;
      }
      void main() @safe
      {
          writeln(f("foo"));
      }
 
 As the spec stands, I believe it allows f to be @trusted.
I don't think so. @trusted code can't rely on @safe code behaving a certain way to ensure memory safety, it has to be defensive.
May 26 2020
Walter Bright <newshound2@digitalmars.com> writes:
On 5/26/2020 12:07 AM, Timon Gehr wrote:
 I don't think so. @trusted code can't rely on @safe code behaving a 
 certain way to ensure memory safety, it has to be defensive.
I agree. The @trusted code here is not passing safe arguments to g(), but it is trusted to do so.
May 26 2020
ag0aep6g <anonymous@example.com> writes:
On 26.05.20 11:35, Walter Bright wrote:
 On 5/26/2020 12:07 AM, Timon Gehr wrote:
 I don't think so. @trusted code can't rely on @safe code behaving a 
 certain way to ensure memory safety, it has to be defensive.

 I agree. The @trusted code here is not passing safe arguments to g(), 
 but it is trusted to do so.
Nice. Timon and Walter agree on something related to safety. That must mean something.

I take it you guys are good with adding the note about undefined behavior to the spec then? Repeating it here for reference:

    Undefined behavior: Calling a @safe function or a @trusted
    function with unsafe values or unsafe aliasing has undefined
    behavior.
May 26 2020
Paul Backus <snarwin@gmail.com> writes:
On Tuesday, 26 May 2020 at 13:19:04 UTC, ag0aep6g wrote:
 I take it you guys are good with adding the note about 
 undefined behavior to the spec then? Repeating it here for 
 reference:

     Undefined behavior: Calling a @safe function or a @trusted
     function with unsafe values or unsafe aliasing has undefined
     behavior.
As far as I can tell that's already implied by the first sentence under "safe interfaces":
 Given that it is only called with safe values and safe 
 aliasing, a function has a safe interface when:
...but being more explicit seems like it can't hurt. It might also be worth amending point 3 under "Safe Interfaces" to read as follows:
 3. it cannot introduce unsafe aliasing **of memory** that is 
 accessible from other parts of the program **while that 
 aliasing exists**.
May 26 2020
Timon Gehr <timon.gehr@gmx.ch> writes:
On 26.05.20 15:19, ag0aep6g wrote:
 On 26.05.20 11:35, Walter Bright wrote:
 On 5/26/2020 12:07 AM, Timon Gehr wrote:
 I don't think so. @trusted code can't rely on @safe code behaving a 
 certain way to ensure memory safety, it has to be defensive.
 I agree. The @trusted code here is not passing safe arguments to g(), but 
 it is trusted to do so.
Nice. Timon and Walter agree on something related to safety.
(We agree on many things. It just does not seem that way because I seldom get involved when I agree with a decision.)
May 26 2020
Walter Bright <newshound2@digitalmars.com> writes:
On 5/26/2020 10:23 AM, Timon Gehr wrote:
 On 26.05.20 15:19, ag0aep6g wrote:
 On 26.05.20 11:35, Walter Bright wrote:
 On 5/26/2020 12:07 AM, Timon Gehr wrote:
 I don't think so. @trusted code can't rely on @safe code behaving a certain 
 way to ensure memory safety, it has to be defensive.

 I agree. The @trusted code here is not passing safe arguments to g(), but 
 it is trusted to do so.
Nice. Timon and Walter agree on something related to safety.
 (We agree on many things. It just does not seem that way because I 
 seldom get involved when I agree with a decision.)
It is indeed nice when we agree. I appreciate it.
May 27 2020
Nathan S. <no.public.email@example.com> writes:
On Tuesday, 26 May 2020 at 07:07:33 UTC, Timon Gehr wrote:
 I don't think so. @trusted code can't rely on @safe code 
 behaving a certain way to ensure memory safety, it has to be 
 defensive.
I'd say it's sound to rely on any postconditions in a function's `out` contract. Those can be mechanically enforced.

I'd also say it's sound for @trusted code to rely on the behavior of a function whose source code the author of the @trusted code has reviewed and either has control over or can prevent from changing (such as by specifying a library version in a dependency manager).

I'd also say it's within the spirit of @trusted to rely on the behavior of any function that allegedly adheres to a specification and has been blessed by some expensive certifying body or meticulous review process as meeting that specification.

After all, all @trusted is (including the @trusted the author is applying to his own function) is some human's statement that he has carefully inspected some function and can vouch that calling it won't cause memory corruption / undefined behavior for a program that wasn't already in an invalid state.
May 27 2020
Petar Kirov [ZombineDev] <petar.p.kirov@gmail.com> writes:
On Monday, 25 May 2020 at 23:04:49 UTC, ag0aep6g wrote:
 [..]

 Consider this little program that prints the address and first 
 character of a string in a convoluted way:

     import std.stdio;
     char f(string s) @trusted
     {
         immutable(char)* c = s.ptr;
         writeln(g(* cast(size_t*) &c));
         return *c;
     }
     size_t g(ref size_t s) @safe
     {
         return s;
     }
     void main() @safe
     {
         writeln(f("foo"));
     }

 As the spec stands, I believe it allows f to be @trusted. The 
 function doesn't exhibit undefined behavior, and it doesn't 
 leak any unsafe values or unsafe aliasing. So it has a safe 
 interface and can be @trusted.
As mentioned by others, it is incorrect to label `f` with `@trusted` because:

1. It provides unsafe access to potentially out of bounds memory - there's no guarantee that `s.length >= size_t.sizeof` is true (in addition to the possibility of `s.ptr` being `null`).

2. It creates mutable aliasing to immutable memory and passes it to another function. Casting away `immutable` could be safe iff the mutable reference can't be used to modify the memory. So, `g` must take a `const` reference to `size_t`, in order for `f` to even begin to be considered a candidate for the `@trusted` attribute (see the sketch below).
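A minimal sketch of that const suggestion, applied to the g from the opening post: with the parameter const, the compiler itself rejects the later 0xDEADBEEF edit:

    size_t g(ref const size_t s) @safe
    {
        // s = 0xDEADBEEF; // would no longer compile: cannot modify const s
        return s;
    }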
May 26 2020
ag0aep6g <anonymous@example.com> writes:
On 26.05.20 11:13, Petar Kirov [ZombineDev] wrote:
 On Monday, 25 May 2020 at 23:04:49 UTC, ag0aep6g wrote:
[...]
     char f(string s) @trusted
     {
         immutable(char)* c = s.ptr;
         writeln(g(* cast(size_t*) &c));
         return *c;
     }
[...]
 As mentioned by others, it is incorrect to label `f` with `@trusted` 
 because:
 1. It provides unsafe access to potentially out of bounds memory - 
 there's no guarantee that `s.length >= size_t.sizeof` is true (in 
 addition to the possibility of `s.ptr` being `null`).
I think you're misreading the code. It's not reading size_t.sizeof bytes starting at c. It's reinterpreting the pointer c itself as a size_t.
 2. It creates mutable aliasing to immutable memory and passes it to 
 another function. Casting away `immutable` could be safe iff the mutable 
 reference can't be used to modify the memory. So, `g` must take a 
 `const` reference to `size_t`, in order for `f` to even begin to be 
 considered a candidate for the `@trusted` attribute.
Great. So far the majority opinion seems to be that f is invalid from the start, and calling g like that just can't be considered safe. I like it.
May 26 2020