digitalmars.dip.development - First Draft: Making printf safe

reply Walter Bright <newshound2 digitalmars.com> writes:
https://github.com/WalterBright/documents/blob/ed4f1b441e71b5ac5e23a54e7c93e68997981e9a/SafePrintf.md
Jul 16
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/16/2024 5:42 PM, Walter Bright wrote:
 https://github.com/WalterBright/documents/blob/ed4f1b441e71b5ac5e23a54e7c93e68997981e9a/SafePrintf.md
Paul Backus writes:
 What I find objectionable in this case is that (a) the better interface is
implemented using a bunch of compiler-internal rewrites, rather than normal D
code; and (b) it shadows the existing C printf function rather than existing
alongside it.
It's a pretty thin piece of paper over printf. Consider:
```
printf("%s\n", 3);
```
That's going to crash a C program. Currently, D gives an error for it. Under this proposal, it will be rewritten as:
```
printf("%d\n", 3);
```
The rewrite will only happen for %s format specifiers. For the following:
```
char* s;
printf("%s\n", s);
```
there will be no rewrite, but that call will be considered unsafe. For:
```
char[] s;
printf("%s\n", s);
```
that is currently rejected by the compiler. Under this proposal, it will be rewritten as:
```
char[] s;
printf("%.*s\n", cast(int)s.length & 0x7FFF_FFFF, s.ptr);
```
which will make it safe. I can't think of a case where the proposal makes any existing uses of printf impossible. If they exist, there are workarounds:

1. use a variable rather than a string literal for the format:
```
const(char)* fmt = "hello %s!\n";
printf(fmt, "betty");
```
2. this behavior is triggered by the function being marked as `pragma(printf)`. Don't do that if you don't want it. Or declare printf yourself as:
```
extern (C) int printf(const(char)*, ...);
```
 If we need a safer printf for DMD that doesn't carry all the bloat and baggage
of Phobos's writef, then by all means, let's write one. But let's write it in D
and put it in a normal D module, instead of sneaking around and redefining
printf behind our users' backs.
The printf argument checking code that was added earlier has been an unblemished win for us. C and C++ compilers seem to be adding it, too. This is just a small improvement over that.
Jul 16
prev sibling next sibling parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 17/07/2024 12:42 PM, Walter Bright wrote:
 https://github.com/WalterBright/documents/blob/ed4f1b441e71b5ac5e23a54e7c93e68997981e9a/SafePrintf.md
I may have questioned the motives in the past; however, this is a useful feature and the DIP looks fine. I'm commenting that I cannot find anything wrong with it, so this DIP can move into the queue sooner.
Jul 17
prev sibling next sibling parent reply Nick Treleaven <nick geany.org> writes:
On Wednesday, 17 July 2024 at 00:42:03 UTC, Walter Bright wrote:
 https://github.com/WalterBright/documents/blob/ed4f1b441e71b5ac5e23a54e7c93e68997981e9a/SafePrintf.md
 This DIP applies to any function marked with pragma(printf) and
 @safe or @trusted.
So how does `printf` benefit from this then? It can't be marked `@trusted`. Would we add a `@trusted` overload taking a string format parameter and use non-C variadic arguments? The overload would have to handle the cases in this DIP itself. Or we could use an [enum parameter](https://forum.dlang.org/post/jhondvrsvvvrwjkympjb@forum.dlang.org) as the format string, if we had those.
Jul 17
parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/17/2024 3:31 AM, Nick Treleaven wrote:
 This DIP applies to any function marked with pragma(printf) and @safe or
 @trusted.
 So how does `printf` benefit from this then? It can't be marked `@trusted`. Would we add a `@trusted` overload taking a string format parameter and use non-C variadic arguments?
If pragma(printf) is there, the user is asserting that if the format string and arguments are compatible, and the function is also marked @trusted or @safe, then that particular call is safe. If the function is marked @safe, and the call checks determine that it is not safe, then that call is marked as not safe. This is how functions like sprintf(), which cannot ever be safe, can still be marked as @system and still get printf format checking. And calls to fprintf can be marked safe. Yes, it's a bit of special compiler magic, but it works, it would be so useful, and we already apply compiler magic via pragma(printf).
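To make the `pragma(printf)` mechanism concrete, here is a minimal sketch of a user-defined printf-like function. The name `formatInto` and its parameters are hypothetical and are not taken from the DIP or from dmd. It only illustrates the format checking that `pragma(printf)` already enables; the @safe/@trusted handling described in this post is a proposal and is not shown.

```d
import core.stdc.stdarg : va_list, va_start, va_end;
import core.stdc.stdio : vsnprintf;

// Hypothetical printf-like function. pragma(printf) asks the compiler to
// check the format string against the variadic arguments at each call site.
extern (C) pragma(printf)
int formatInto(char* buf, size_t size, const(char)* fmt, ...) @system
{
    va_list ap;
    va_start(ap, fmt);
    scope (exit) va_end(ap);
    // Forward to the C library; it writes at most `size` bytes into `buf`.
    return vsnprintf(buf, size, fmt, ap);
}

void main()
{
    char[32] buf;
    // Format and arguments agree, so the call passes the format check.
    const n = formatInto(buf.ptr, buf.length, "%d items", 3);
    assert(buf[0 .. n] == "3 items");
}
```

A mismatched call such as `formatInto(buf.ptr, buf.length, "%d items", "three")` should be diagnosed at compile time by the existing format checks, even though the function itself stays @system.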
Jul 17
prev sibling next sibling parent reply IchorDev <zxinsworld gmail.com> writes:
On Wednesday, 17 July 2024 at 00:42:03 UTC, Walter Bright wrote:
 https://github.com/WalterBright/documents/blob/ed4f1b441e71b5ac5e23a54e7c93e68997981e9a/SafePrintf.md
`printf` always performs pointer arithmetic, and therefore [should not be marked `@safe`](https://dlang.org/spec/function.html#safe-functions):
 No pointer arithmetic (including pointer indexing & slicing).
Marking it as `@trusted` is fine, and historically other `core.stdc` functions that have similar behaviour have also been marked `@trusted`.
Jul 17
parent reply IchorDev <zxinsworld gmail.com> writes:
On Wednesday, 17 July 2024 at 17:45:12 UTC, IchorDev wrote:
 Marking it as `@trusted` is fine
Oops, I didn't re-read the whole section on [safe interfaces](https://dlang.org/spec/function.html#safe-interfaces):
 C's `strlen` and `memcpy` do not have safe interfaces:
 ```d
 extern (C) @system size_t strlen(char* s);
 extern (C) @system void* memcpy(void* dst, void* src, size_t nbytes);
 ```
 because they iterate pointers based on unverified assumptions 
 (`strlen` assumes that `s` is zero-terminated; `memcpy` assumes 
 that the memory objects pointed to by `dst` and `src` are at 
 least `nbytes` big). Any function that traverses a C string 
 passed as an argument can only be `@system`. Any function that 
 trusts a separate parameter for array bounds can only be 
 `@system`.
So, `printf` must be `@system`. Even `%.*s` is `@system`!
Jul 17
parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/17/2024 11:05 AM, IchorDev wrote:
 So, `printf` must be `@system`. Even `%.*s` is `@system`!
This proposal puts a safe interface around %.*s.
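As a rough illustration of what a safe interface around `%.*s` can mean, the sketch below takes the length and the pointer from the same D slice, so the precision passed to the C function can never exceed the data it points to. The helper `formatSlice` is hypothetical and written by hand here; under the proposal the compiler would perform an equivalent rewrite at the call site.

```d
import core.stdc.stdio : snprintf;

// Hypothetical helper: copies `s` into `buf` via "%.*s" and returns the
// number of characters written (excluding the terminating NUL).
size_t formatSlice(char[] buf, const(char)[] s) @trusted
{
    if (buf.length == 0)
        return 0;
    if (s.length == 0)
    {
        buf[0] = '\0';          // avoid handing a possibly null pointer to C
        return 0;
    }
    // Precision and pointer both come from `s`, so at most s.length bytes
    // are read and no NUL terminator is required.
    const n = snprintf(buf.ptr, buf.length, "%.*s",
                       cast(int)(s.length & int.max), s.ptr);
    if (n < 0)
        return 0;
    return n < buf.length ? n : buf.length - 1;
}

void main() @safe
{
    auto buf = new char[](16);
    string text = "hello world";
    // text[0 .. 5] is not NUL-terminated at index 5, yet only 5 bytes are read.
    const n = formatSlice(buf, text[0 .. 5]);
    assert(buf[0 .. n] == "hello");
}
```

The `& int.max` mask mirrors the expression used in the draft; slices longer than `int.max` would be silently truncated in this sketch.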
Jul 18
prev sibling next sibling parent Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Wednesday, 17 July 2024 at 00:42:03 UTC, Walter Bright wrote:
 https://github.com/WalterBright/documents/blob/ed4f1b441e71b5ac5e23a54e7c93e68997981e9a/SafePrintf.md
Let’s say I have a `@safe`-annotated function. If I understand the DIP draft correctly, it’s proposed that I can call `printf` if the format is a compile-time constant (possibly derived through CTFE), and in addition the compiler:
* issues a hard error on incorrect use, e.g. when the number of arguments and specifiers mismatch,
* silently changes the format specifier if that’s meaningful, e.g. `%s` to `%d` for integers, making `%s` essentially universal,
* implicitly slices static array arguments, e.g. a `char[10]` argument becomes a `char[]` argument,
* if a `%s` specifier lines up with a `const(Char)[]` argument, silently changes the format specifier `%s` to `%.*s`/`%.*ls` and replaces the corresponding argument `xs` with `cast(int)(xs.length & int.max)` and `xs.ptr`.

My only issue is the `& int.max`; that should only be used when asserts are disabled. With asserts enabled, just `assert(xs.length < int.max)`. Otherwise, it’s a great idea.

I’d make it `__printf`, though, and ideally, `__printf` becomes a new core-language keyword so that all the compiler magic and special casing is appropriately justified. It should also not require any imports then, which would make it even easier to use. Changing `printf` in any shape or form will make some people unhappy. I could imagine people being much happier having a keyword that is guaranteed to lower to a `printf` call, with some checks and convenience added.
If we’re at it, `__printf` could also support slices of non-character type: When a non-character array is an argument type that lines up with some specifier, cut the format in half, loop over elements and print them individually comma-separated and using that specifier, then continue with the rest of the format:
```d
int a, b;
int[] xs;
int n = __printf("%d xyz %X abc %d", a, xs, b);
// lowers to:
int n = {
    int __result = __printf("%d xyz [", a);
    if (xs.length > 0)
    {
        __result += __printf("%X", xs[0]);
        foreach (__x; xs[1..$])
            __result += __printf(", %X", __x);
    }
    return __result + __printf("] abc %d", b);
}();
```
A similar approach would work for associative arrays as well. What’s so cool about it is that it would work with nested arrays! The cutting-and-loop approach also works for `struct` types, printing some header (the type name) and then the comma-separated `tupleof`, provided the members are of printf-friendly types.
Jul 17
prev sibling next sibling parent reply Nick Treleaven <nick geany.org> writes:
On Wednesday, 17 July 2024 at 00:42:03 UTC, Walter Bright wrote:
 https://github.com/WalterBright/documents/blob/ed4f1b441e71b5ac5e23a54e7c93e68997981e9a/SafePrintf.md
 This proposal will cause the format specifier to be rewritten 
 to match the argument type, if the format specifier is %s.
Dennis [has pointed out](https://forum.dlang.org/post/vnhnhkxxurgnpvpoilzp@forum.dlang.org) that this can corrupt memory (in a @system or @trusted function) just by simple refactoring:
 You would think it's safe to transform this:
```d
int x;
...
printf("x = %s\n", x);
printf("x = %s\n", x);
```
 Into this:
```d
const(char)* fmt = "x = %s\n";
printf(fmt, x);
printf(fmt, x);
```
That's quite a pitfall and easy to overlook in code review. I suggest removing that feature for argument types other than character arrays.
 If the format specifier is %s and the corresponding argument is 
 a D array of char or wchar_t, the format will be replaced with 
 %.*s (or %.*ls) and the argument will be replaced with two 
 arguments
I think that's fine, because D doesn't allow passing arrays to variadic arguments. So if those calls were refactored, they would cause a compile-time error.
Aug 02
parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/2/2024 7:22 AM, Nick Treleaven wrote:
 Dennis [has pointed 
 out](https://forum.dlang.org/post/vnhnhkxxurgnpvpoilzp@forum.dlang.org) that 
 this can corrupt memory (in a @system or @trusted function) just by simple 
 refactoring:
I replied to Dennis already.
Aug 05
prev sibling parent reply claptrap <clap trap.com> writes:
On Wednesday, 17 July 2024 at 00:42:03 UTC, Walter Bright wrote:
 https://github.com/WalterBright/documents/blob/ed4f1b441e71b5ac5e23a54e7c93e68997981e9a/SafePrintf.md
Why not just make a safe version of printf, and have a compiler error point people at it if they call unsafe printf from safe code?
Aug 02
parent reply Nick Treleaven <nick geany.org> writes:
On Friday, 2 August 2024 at 16:39:19 UTC, claptrap wrote:
 On Wednesday, 17 July 2024 at 00:42:03 UTC, Walter Bright wrote:
 https://github.com/WalterBright/documents/blob/ed4f1b441e71b5ac5e23a54e7c93e68997981e9a/SafePrintf.md
 Why not just make a safe version of printf, and have a compiler error point people at it if they call unsafe printf from safe code?
This DIP doesn't only apply to C's printf, it (potentially) applies to other pragma(printf) functions. AIUI a major motivation for it is dmd's frontend which has a lot of calls to its own extern(C) functions with printf formatting strings and C varargs. (I'd be interested in seeing an alternative proposal too on how to make those calls safe without this DIP, whilst still compiling with GDC & LDC).
Aug 03
parent reply Tim <tim.dlang t-online.de> writes:
On Saturday, 3 August 2024 at 11:27:19 UTC, Nick Treleaven wrote:
 This DIP doesn't only apply to C's printf, it (potentially) 
 applies to other pragma(printf) functions. AIUI a major 
 motivation for it is dmd's frontend which has a lot of calls to 
 its own extern(C) functions with printf formatting strings and 
 C varargs. (I'd be interested in seeing an alternative proposal 
 too on how to make those calls safe without this DIP, whilst 
 still compiling with GDC & LDC).
One alternative would be a wrapper around printf, which checks the format string at compile time and modifies it and the other parameters so that they match. It could look something like this:
```D
void printfWrapper(string fmt, P...)(P params) @trusted
{
    // ...
}

void main() @safe
{
    string s = "World";
    printfWrapper!"Hello %s\n"(s);
}
```
An advantage would be that it could be used immediately in DMD. A new language feature could only be used after the bootstrap compiler is updated to this version, too.
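For readers wondering what the elided body might involve, here is one possible sketch, not the poster's implementation. It assumes only `%s` specifiers appear (no `%%`, flags or other conversions), supports only `int` and character-slice arguments, and writes through `snprintf` into a buffer so the result can be inspected. The names `rewriteFormat`, `expandArgs` and `snprintfWrapper` are hypothetical.

```d
import core.stdc.stdio : snprintf;
import std.conv : to;

// CTFE helper: replaces each "%s" with a specifier matching the type of
// the corresponding argument.
string rewriteFormat(Args...)(string fmt)
{
    string result;
    size_t pos = 0;
    static foreach (T; Args)
    {
        // Copy literal text up to the next "%s".
        while (pos + 1 < fmt.length && !(fmt[pos] == '%' && fmt[pos + 1] == 's'))
            result ~= fmt[pos++];
        assert(pos + 1 < fmt.length, "fewer %s specifiers than arguments");
        static if (is(T : const(char)[]))
            result ~= "%.*s";   // length-limited, no NUL terminator needed
        else static if (is(T == int))
            result ~= "%d";
        else
            static assert(0, "unsupported argument type: " ~ T.stringof);
        pos += 2;               // skip the original "%s"
    }
    return result ~ fmt[pos .. $];
}

// CTFE helper: builds the C argument list; each slice becomes (length, ptr).
string expandArgs(Args...)()
{
    string code;
    static foreach (i, T; Args)
    {
        static if (is(T : const(char)[]))
            code ~= ", cast(int)(args[" ~ i.to!string ~ "].length & int.max), args["
                  ~ i.to!string ~ "].ptr";
        else
            code ~= ", args[" ~ i.to!string ~ "]";
    }
    return code;
}

// Wrapper in the spirit of printfWrapper, targeting a caller-supplied buffer.
int snprintfWrapper(string fmt, Args...)(char[] buf, Args args) @trusted
{
    enum cfmt = rewriteFormat!Args(fmt);    // evaluated at compile time
    return mixin("snprintf(buf.ptr, buf.length, cfmt" ~ expandArgs!Args() ~ ")");
}

void main() @safe
{
    auto buf = new char[](32);
    string s = "World";
    const n = snprintfWrapper!"Hello %s, %s!"(buf, s, 42);
    assert(buf[0 .. n] == "Hello World, 42!");
}
```

Because the format string is a template argument, every call site has to use the `!"..."` syntax, which is the migration cost discussed in the replies.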
Aug 03
parent reply Nick Treleaven <nick geany.org> writes:
On Saturday, 3 August 2024 at 15:28:35 UTC, Tim wrote:
 One alternative would be a wrapper around printf, which checks 
 the format string at compile time and modifies it and other 
 parameters, so they match. It could look something like this:

 ```D
 void printfWrapper(string fmt, P...)(P params) @trusted
 {
     // ...
 }

 void main() @safe
 {
     string s = "World";
     printfWrapper!"Hello %s\n"(s);
 }
 ```
There are 13 `pragma(printf)` functions in dmd. They could be changed. However, they would then be templates rather than functions. I'm not sure if that's OK for ldc and gdc; maybe it is. And every call to each of those functions would need updating (unless we had enum parameters). The dmd-as-a-library API would also be impacted, even if the original functions were kept as deprecated.
 An advantage would be, that it could be used immediately in 
 DMD. A new language feature could only be used after the 
 bootstrap compiler is updated to this version, too.
Those functions could maybe be marked @trusted on the basis that the current dmd tests with the feature have checked all calls to them, and dmd as a library could require the new version of dmd for the API with the new @trusted functions.
Aug 03
parent reply Tim <tim.dlang t-online.de> writes:
On Saturday, 3 August 2024 at 17:12:15 UTC, Nick Treleaven wrote:
 There are 13 `pragma(printf)` functions in dmd. They could be 
 changed. However, now those aren't functions but templates. I'm 
 not sure if that's OK for ldc and gdc, maybe. And every call to 
 each of those functions would need updating (unless we had enum 
 parameters). The dmd as a library API would also be impacted, 
 even if the original functions were kept as deprecated.
Yes, the change could be too big. I did not know how much it is used in DMD. The format string could also be checked or rewritten at runtime, but that would have other disadvantages.
 Those functions could maybe be marked @trusted on the basis 
 that the current dmd tests with the feature have checked all 
 calls to them, and dmd as a library could require the new 
 version of dmd for the API with the new @trusted functions.
That would work if the format string were only checked by the compiler. The DIP proposes that it is also silently modified. A DMD built with an old bootstrap compiler could then contain a wrong format string.
Aug 03
parent Nick Treleaven <nick geany.org> writes:
On Saturday, 3 August 2024 at 17:47:45 UTC, Tim wrote:
 That would work if the format string is only checked by the 
 compiler. The DIP proposes, that it is also silently modified. 
 Compiling DMD with an old bootstrap compiler could then contain 
 a wrong format string.
Yes, you're right. And we couldn't pass string arguments either.
Aug 03