digitalmars.dip.development - First Draft: Making printf safe
- Walter Bright (1/1) Jul 16 https://github.com/WalterBright/documents/blob/ed4f1b441e71b5ac5e23a54e7...
- Walter Bright (44/47) Jul 16 It's a pretty thin piece of paper over printf. Consider:
- Richard (Rikki) Andrew Cattermole (5/6) Jul 17 I may have questioned the motives in the past, however this is a
- Nick Treleaven (7/10) Jul 17 So how does `printf` benefit from this then? It can't be marked
- Walter Bright (10/16) Jul 17 If pragma(printf) is there, the user is asserting that if the format str...
- IchorDev (7/9) Jul 17 `printf` always performs pointer arithmetic, and therefore
- IchorDev (4/18) Jul 17 Oops, I didn't re-read the whole section on [safe
- Walter Bright (2/3) Jul 18 This proposal puts a safe interface around %.*s.
- Quirin Schroll (52/53) Jul 17 Let’s say I have a `@safe`-annotated function. If I understand
- Nick Treleaven (20/29) Aug 02 Dennis [has pointed
- Walter Bright (2/6) Aug 05 I replied to Dennis already.
- claptrap (4/5) Aug 02 Why not just make a @safe version of printf, and have a compiler
- Nick Treleaven (8/13) Aug 03 This DIP doesn't only apply to C's printf, it (potentially)
- Tim (18/25) Aug 03 One alternative would be a wrapper around printf, which checks
- Nick Treleaven (11/28) Aug 03 There are 13 `pragma(printf)` functions in dmd. They could be
- Tim (8/18) Aug 03 Yes, the change could be too big. I did not know how much it is
- Nick Treleaven (2/6) Aug 03 Yes, you're right. And we couldn't pass string arguments either.
On 7/16/2024 5:42 PM, Walter Bright wrote:
> https://github.com/WalterBright/documents/blob/ed4f1b441e71b5ac5e23a54e7c93e68997981e9a/SafePrintf.md

Paul Backus writes:
> What I find objectionable in this case is that (a) the better interface is implemented using a bunch of compiler-internal rewrites, rather than normal D code; and (b) it shadows the existing C printf function rather than existing alongside it.

It's a pretty thin piece of paper over printf. Consider:

```
printf("%s\n", 3);
```

That's going to crash a C program. Currently, for D an error will be given. Under this proposal, it will be rewritten as:

```
printf("%d\n", 3);
```

The rewrite will only happen for %s format specifiers. For the following:

```
char* s;
printf("%s\n", s);
```

there will be no rewrite, but that call will be considered unsafe. For:

```
char[] s;
printf("%s\n", s);
```

that is currently rejected by the compiler. Under this proposal, it will be rewritten as:

```
char[] s;
printf("%.*s\n", cast(int)s.length & 0x7FFF_FFFF, s.ptr);
```

which will make it safe.

I can't think of a case where the proposal makes any existing uses of printf impossible. If they exist, there are workarounds:

1. use a variable rather than a string literal for the format:

```
const(char)* fmt = "hello %s!\n";
printf(fmt, "betty");
```

2. this behavior is triggered by the function being marked as `pragma(printf)`. Don't do that if you don't want it. Or declare printf yourself as:

```
extern (C) int printf(const(char)*, ...);
```

> If we need a safer printf for DMD that doesn't carry all the bloat and baggage of Phobos's writef, then by all means, let's write one. But let's write it in D and put it in a normal D module, instead of sneaking around and redefining printf behind our users' backs.

The printf argument checking code added in has been an unblemished win for us. C and C++ compilers seem to be adding it, too. This is just a small improvement over that.
Jul 16
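For readers who want to see what the `%.*s` rewrite amounts to, here is a minimal sketch written as ordinary D. The names `maskedLength` and `printSlice` are hypothetical and do not appear in the DIP; the sketch only assumes that the rewritten call passes the masked length followed by the slice's pointer, as the DIP text quoted later in the thread describes.

```d
import core.stdc.stdio : printf;

/// Hypothetical helper (not part of the DIP): masks the slice length into the
/// non-negative `int` range expected by the `%.*s` precision argument.
/// The result is never larger than `s.length`, so the read stays in bounds.
int maskedLength(const(char)[] s) @safe pure nothrow @nogc
{
    return cast(int) (s.length & 0x7FFF_FFFF);
}

/// Hypothetical wrapper showing the call the rewrite would produce by hand.
int printSlice(const(char)[] s) @trusted
{
    // `%.*s` reads at most `maskedLength(s)` bytes starting at `s.ptr`,
    // so the slice does not need a terminating zero.
    return printf("%.*s\n", maskedLength(s), s.ptr);
}

void main() @safe
{
    char[] s = "hello, world".dup;
    printSlice(s[0 .. 5]); // prints "hello"
}
```

Because the precision bounds the read, the wrapper can present a safe interface even though `printf` itself cannot.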
On 17/07/2024 12:42 PM, Walter Bright wrote:
> https://github.com/WalterBright/documents/blob/ed4f1b441e71b5ac5e23a54e7c93e68997981e9a/SafePrintf.md

I may have questioned the motives in the past; however, this is a useful feature and the DIP looks fine. I'm commenting that I cannot find anything wrong with it so this DIP can move into the queue sooner.
Jul 17
On Wednesday, 17 July 2024 at 00:42:03 UTC, Walter Bright wrote:
> https://github.com/WalterBright/documents/blob/ed4f1b441e71b5ac5e23a54e7c93e68997981e9a/SafePrintf.md

> This DIP applies to any function marked with pragma(printf) and @safe or @trusted.

So how does `printf` benefit from this then? It can't be marked `@trusted`. Would we add a `@trusted` overload taking a string format parameter and use non-C variadic arguments? The overload would have to handle the cases in this DIP itself. Or we could use an [enum parameter](https://forum.dlang.org/post/jhondvrsvvvrwjkympjb@forum.dlang.org) as the format string, if we had those.
Jul 17
On 7/17/2024 3:31 AM, Nick Treleaven wrote:
>> This DIP applies to any function marked with pragma(printf) and @safe or @trusted.
>
> So how does `printf` benefit from this then? It can't be marked `@trusted`. Would we add a `@trusted` overload taking a string format parameter and use non-C variadic arguments?

If pragma(printf) is there, the user is asserting that if the format string and arguments are compatible, and the function is also marked @trusted or @safe, then that particular call is safe. If the function is marked @safe, and the call checks determine that it is not safe, then that call is marked as not safe.

This is how functions like sprintf(), which cannot ever be safe, can still be marked as @system, and still get printf format checking. And calls to fprintf can be marked safe.

Yes, it's a bit of special compiler magic, but it works. But it would be so useful, and we already apply compiler magic via pragma(printf).
Jul 17
On Wednesday, 17 July 2024 at 00:42:03 UTC, Walter Bright wrote:
> https://github.com/WalterBright/documents/blob/ed4f1b441e71b5ac5e23a54e7c93e68997981e9a/SafePrintf.md

`printf` always performs pointer arithmetic, and therefore [should not be marked `@safe`](https://dlang.org/spec/function.html#safe-functions):

> No pointer arithmetic (including pointer indexing & slicing).

Marking it as `@trusted` is fine, and historically other `core.stdc` functions that have similar behaviour have also been marked `@trusted`.
Jul 17
On Wednesday, 17 July 2024 at 17:45:12 UTC, IchorDev wrote:
> Marking it as `@trusted` is fine

Oops, I didn't re-read the whole section on [safe interfaces](https://dlang.org/spec/function.html#safe-interfaces):

> C's `strlen` and `memcpy` do not have safe interfaces:
> ```d
> extern (C) @system size_t strlen(char* s);
> extern (C) @system void* memcpy(void* dst, void* src, size_t nbytes);
> ```
> because they iterate pointers based on unverified assumptions (`strlen` assumes that `s` is zero-terminated; `memcpy` assumes that the memory objects pointed to by `dst` and `src` are at least `nbytes` big). Any function that traverses a C string passed as an argument can only be `@system`. Any function that trusts a separate parameter for array bounds can only be `@system`.

So, `printf` must be `@system`. Even `%.*s` is `@system`!
Jul 17
On 7/17/2024 11:05 AM, IchorDev wrote:
> So, `printf` must be `@system`. Even `%.*s` is `@system`!

This proposal puts a safe interface around %.*s.
Jul 18
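To make the "safe interface" point in this exchange concrete, the following sketch shows the general pattern in plain D: a `@trusted` wrapper takes slices, so the pointer and its bound always travel together and the caller cannot supply a mismatched length. The function name `copyBytes` is hypothetical, and `memmove` is used instead of `memcpy` so that overlapping slices stay well defined.

```d
import core.stdc.string : memmove;

/// Hypothetical wrapper: copies as many bytes as fit into `dst` and returns
/// that count. Both bounds come from the slices themselves, never from a
/// separate, unverified length parameter.
size_t copyBytes(ubyte[] dst, const(ubyte)[] src) @trusted
{
    immutable n = src.length < dst.length ? src.length : dst.length;
    if (n > 0) // avoid handing possibly-null pointers of empty slices to C
        memmove(dst.ptr, src.ptr, n);
    return n;
}

void main() @safe
{
    auto buf = new ubyte[4];
    const(ubyte)[] src = [1, 2, 3, 4, 5, 6];
    immutable copied = copyBytes(buf, src);
    assert(copied == 4);
    assert(buf == [1, 2, 3, 4]);
}
```

The proposal applies the same idea to `%.*s`: the compiler, rather than the programmer, derives both the precision and the pointer from one D slice.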
On Wednesday, 17 July 2024 at 00:42:03 UTC, Walter Bright wrote:
> https://github.com/WalterBright/documents/blob/ed4f1b441e71b5ac5e23a54e7c93e68997981e9a/SafePrintf.md

Let’s say I have a `@safe`-annotated function. If I understand the DIP draft correctly, it’s proposed that I can call `printf` if the format is a compile-time constant (possibly derived through CTFE), and in addition the compiler:

* issues a hard error on incorrect use, e.g. when the number of arguments and specifiers mismatch,
* silently changes the format specifier if it’s meaningful, e.g. `%s` to `%d` for integers, making `%s` essentially universal,
* implicitly slices static array arguments, e.g. a `char[10]` argument becomes a `char[]` argument,
* if a `%s` specifier lines up with a `const(Char)[]` argument, silently changes the format specifier `%s` to `%.*s`/`%.*ls` and replaces the corresponding argument `xs` with `cast(int)(xs.length & int.max)` and `xs.ptr`.

My only issue is the `& int.max`: that should apply only when asserts are disabled. With asserts enabled, just `assert(xs.length < int.max)`.

Otherwise, it’s a great idea. I’d make it `__printf`, though, and ideally, `__printf` becomes a new core-language keyword so that all the compiler-magic and special casing is appropriately justified. It should also not require any imports then, which would make it even easier to use. Changing `printf` in any shape or form will make some people unhappy. I could imagine people being much happier having a keyword that is guaranteed to lower to a `printf` call, with some checks and convenience added.

If we’re at it, `__printf` could also support slices of non-character type: when an argument of non-character array type lines up with some specifier, cut the format in half, loop over the elements and print them individually, comma-separated and using that specifier, then continue with the rest of the format:

```d
int a, b;
int[] xs;
int n = __printf("%d xyz %X abc %d", a, xs, b);
// lowers to:
int n = {
    int __result = __printf("%d xyz [", a);
    if (xs.length > 0)
    {
        __result += __printf("%X", xs[0]);
        foreach (__x; xs[1..$])
            __result += __printf(", %X", __x);
    }
    return __result + __printf("] abc %d", b);
}();
```

A similar approach would work for associative arrays as well. What’s so cool about it is that it would work with nested arrays! The cutting-and-loop approach also works for `struct` types, printing some header (the type name) and then the comma-separated `tupleof`, provided the members are of printf-friendly types.
Jul 17
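The cut-and-loop lowering for arrays described in this post can be tried today as a hand-written function, without any `__printf` keyword. The sketch below is only an illustration under assumptions: `printHexArray` is a hypothetical name, the element type is `uint` so that it matches `%X` exactly, and negative `printf` return values (errors) are not handled.

```d
import core.stdc.stdio : printf;

/// Hypothetical helper: prints `xs` as "[A, B, C]" with one `%X` per element
/// and returns the total number of characters written.
int printHexArray(const(uint)[] xs) @trusted
{
    int result = printf("[");
    if (xs.length > 0)
    {
        result += printf("%X", xs[0]);
        foreach (x; xs[1 .. $])
            result += printf(", %X", x); // separator goes before later elements
    }
    return result + printf("]");
}

void main() @safe
{
    const(uint)[] xs = [255, 16];
    printHexArray(xs); // prints "[FF, 10]"
}
```

Each `printf` call here uses a literal format with a matching scalar argument, which is why the loop body needs no extra checking.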
On Wednesday, 17 July 2024 at 00:42:03 UTC, Walter Bright wrote:
> https://github.com/WalterBright/documents/blob/ed4f1b441e71b5ac5e23a54e7c93e68997981e9a/SafePrintf.md

> This proposal will cause the format specifier to be rewritten to match the argument type, if the format specifier is %s.

Dennis [has pointed out](https://forum.dlang.org/post/vnhnhkxxurgnpvpoilzp@forum.dlang.org) that this can corrupt memory (in a @system or @trusted function) just by simple refactoring:

> You would think it's safe to transform this:
> ```d
> int x;
> ...
> printf("x = %s\n", x);
> printf("x = %s\n", x);
> ```
> Into this:
> ```d
> const(char)* fmt = "x = %s\n";
> printf(fmt, x);
> printf(fmt, x);
> ```

That's quite a pitfall and easy to overlook in code review. I suggest removing that feature for argument types other than character arrays.

> If the format specifier is %s and the corresponding argument is a D array of char or wchar_t, the format will be replaced with %.*s (or %.*ls) and the argument will be replaced with two arguments

I think that's fine, because D doesn't allow passing arrays to variadic arguments. So if those calls were refactored, they would cause a compile-time error.
Aug 02
On 8/2/2024 7:22 AM, Nick Treleaven wrote:
> Dennis [has pointed out](https://forum.dlang.org/post/vnhnhkxxurgnpvpoilzp@forum.dlang.org) that this can corrupt memory (in a @system or @trusted function) just by simple refactoring:

I replied to Dennis already.
Aug 05
On Wednesday, 17 July 2024 at 00:42:03 UTC, Walter Bright wrote:
> https://github.com/WalterBright/documents/blob/ed4f1b441e71b5ac5e23a54e7c93e68997981e9a/SafePrintf.md

Why not just make a @safe version of printf, and have a compiler error point people at it if they call unsafe printf from safe code?
Aug 02
On Friday, 2 August 2024 at 16:39:19 UTC, claptrap wrote:
>> On Wednesday, 17 July 2024 at 00:42:03 UTC, Walter Bright wrote:
>> https://github.com/WalterBright/documents/blob/ed4f1b441e71b5ac5e23a54e7c93e68997981e9a/SafePrintf.md
>
> Why not just make a @safe version of printf, and have a compiler error point people at it if they call unsafe printf from safe code?

This DIP doesn't only apply to C's printf, it (potentially) applies to other pragma(printf) functions. AIUI a major motivation for it is dmd's frontend, which has a lot of calls to its own extern(C) functions with printf formatting strings and C varargs. (I'd be interested in seeing an alternative proposal too on how to make those calls safe without this DIP, whilst still compiling with GDC & LDC).
Aug 03
On Saturday, 3 August 2024 at 11:27:19 UTC, Nick Treleaven wrote:
> This DIP doesn't only apply to C's printf, it (potentially) applies to other pragma(printf) functions. AIUI a major motivation for it is dmd's frontend, which has a lot of calls to its own extern(C) functions with printf formatting strings and C varargs. (I'd be interested in seeing an alternative proposal too on how to make those calls safe without this DIP, whilst still compiling with GDC & LDC).

One alternative would be a wrapper around printf, which checks the format string at compile time and modifies it and the other parameters so that they match. It could look something like this:

```D
void printfWrapper(string fmt, P...)(P params) @trusted
{
    // ...
}

void main() @safe
{
    string s = "World";
    printfWrapper!"Hello %s\n"(s);
}
```

An advantage would be that it could be used immediately in DMD. A new language feature could only be used after the bootstrap compiler is updated to this version, too.
Aug 03
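As a rough illustration of how the elided body of such a wrapper might be filled in, here is a deliberately narrow sketch. It is an assumption-laden example, not the design proposed in the thread: it supports only `%s` specifiers paired with character slices, and the helper names `countStringSpecifiers`, `rewriteFormat`, and `buildCall` are all hypothetical. The format is validated and rewritten during compilation, and the final `printf` call is assembled with a string mixin.

```d
import core.stdc.stdio : printf;
import std.conv : to;

/// Returns the number of "%s" specifiers, or -1 if `fmt` contains any other
/// conversion or a dangling '%'. "%%" counts as a literal percent sign.
int countStringSpecifiers(string fmt) @safe pure nothrow @nogc
{
    int n = 0;
    for (size_t i = 0; i < fmt.length; ++i)
    {
        if (fmt[i] != '%')
            continue;
        if (i + 1 == fmt.length)
            return -1;
        ++i;
        if (fmt[i] == 's')
            ++n;
        else if (fmt[i] != '%')
            return -1;
    }
    return n;
}

/// Rewrites each "%s" to "%.*s"; everything else is copied unchanged.
string rewriteFormat(string fmt) @safe pure nothrow
{
    string result;
    for (size_t i = 0; i < fmt.length; ++i)
    {
        immutable isPair = fmt[i] == '%' && i + 1 < fmt.length;
        if (isPair && fmt[i + 1] == '%') { result ~= "%%"; ++i; }
        else if (isPair && fmt[i + 1] == 's') { result ~= "%.*s"; ++i; }
        else result ~= fmt[i];
    }
    return result;
}

/// Builds the source text of the call: one (length, pointer) pair per slice.
string buildCall(size_t n) @safe pure
{
    string code = "printf(cfmt.ptr";
    foreach (i; 0 .. n)
    {
        immutable p = "params[" ~ i.to!string ~ "]";
        code ~= ", cast(int) (" ~ p ~ ".length & int.max), " ~ p ~ ".ptr";
    }
    return code ~ ")";
}

/// Simplified wrapper: only `%s` with character slices is accepted.
int printfWrapper(string fmt, P...)(P params) @trusted
{
    static assert(countStringSpecifiers(fmt) == cast(int) P.length,
        "format must contain exactly one %s per argument");
    static foreach (T; P)
        static assert(is(T : const(char)[]), "only character slices are supported");
    enum cfmt = rewriteFormat(fmt); // compile-time string literal
    return mixin(buildCall(P.length));
}

void main() @safe
{
    string s = "World";
    printfWrapper!"Hello %s\n"(s); // prints "Hello World"
}
```

Because the format is a template argument, a mismatch is reported at compile time, but every call site has to change from `f("...", args)` to `f!"..."(args)`.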
On Saturday, 3 August 2024 at 15:28:35 UTC, Tim wrote:
> One alternative would be a wrapper around printf, which checks the format string at compile time and modifies it and the other parameters so that they match. It could look something like this:
> ```D
> void printfWrapper(string fmt, P...)(P params) @trusted
> {
>     // ...
> }
>
> void main() @safe
> {
>     string s = "World";
>     printfWrapper!"Hello %s\n"(s);
> }
> ```

There are 13 `pragma(printf)` functions in dmd. They could be changed. However, now those aren't functions but templates. I'm not sure if that's OK for ldc and gdc, maybe. And every call to each of those functions would need updating (unless we had enum parameters). The dmd as a library API would also be impacted, even if the original functions were kept as deprecated.

> An advantage would be that it could be used immediately in DMD. A new language feature could only be used after the bootstrap compiler is updated to this version, too.

Those functions could maybe be marked @trusted on the basis that the current dmd tests with the feature have checked all calls to them, and dmd as a library could require the new version of dmd for the API with the new @trusted functions.
Aug 03
On Saturday, 3 August 2024 at 17:12:15 UTC, Nick Treleaven wrote:
> There are 13 `pragma(printf)` functions in dmd. They could be changed. However, now those aren't functions but templates. I'm not sure if that's OK for ldc and gdc, maybe. And every call to each of those functions would need updating (unless we had enum parameters). The dmd as a library API would also be impacted, even if the original functions were kept as deprecated.

Yes, the change could be too big. I did not know how much it is used in DMD. The format string could also be checked or rewritten at runtime, but that would have other disadvantages.

> Those functions could maybe be marked @trusted on the basis that the current dmd tests with the feature have checked all calls to them, and dmd as a library could require the new version of dmd for the API with the new @trusted functions.

That would work if the format string is only checked by the compiler. The DIP proposes that it is also silently modified. A DMD built with an old bootstrap compiler could then contain a wrong format string.
Aug 03
On Saturday, 3 August 2024 at 17:47:45 UTC, Tim wrote:
> That would work if the format string is only checked by the compiler. The DIP proposes that it is also silently modified. A DMD built with an old bootstrap compiler could then contain a wrong format string.

Yes, you're right. And we couldn't pass string arguments either.
Aug 03