digitalmars.dip.ideas - Make printf safe
- Walter Bright (45/45) Jul 13 2024 The idea is printf is already largely safe:
- monkyyy (15/16) Jul 13 2024 I basically never use the old c printf syntax and Im pretty sure
- Walter Bright (20/20) Jul 13 2024 I understand the desire to use modern write().
- Steven Schveighoffer (4/5) Jul 13 2024 This is trivially achievable with IES. Do not waste time on
- Walter Bright (2/3) Jul 15 2024 What is IES?
- Richard (Rikki) Andrew Cattermole (2/6) Jul 15 2024 https://dlang.org/spec/istring.html
- Walter Bright (2/5) Jul 15 2024 Thank you.
- Richard (Rikki) Andrew Cattermole (5/5) Jul 13 2024 While this does seem to be useful, there is an issue for adoption with d...
- Walter Bright (4/8) Jul 15 2024 It won't be silent, as printf argument checking has been around for many...
- Timon Gehr (12/23) Jul 14 2024 This part is actually not memory safe.
- Walter Bright (6/31) Jul 15 2024 *I* need it :-) It's an enabling feature, in that it enables me to much ...
- Paul Backus (9/32) Jul 15 2024 So, if the length overflows a 32-bit int, it will be ignored, and
- Walter Bright (12/27) Jul 15 2024 Huh, I missed that little gem!
- Paul Backus (11/18) Jul 16 2024 Of course. What I find objectionable in this case is that (a) the
- Walter Bright (2/2) Jul 16 2024 Followups:
- IchorDev (5/7) Jul 17 2024 [`@safe` functions cannot perform pointer
- Nick Treleaven (8/13) Jul 17 2024 The idea is to make certain calls of `printf` safe when the first
- Quirin Schroll (5/19) Jul 17 2024 I fail to see why there can’t be a `dprintf` that works like
- IchorDev (2/9) Jul 17 2024 And the function will still perform pointer arithmetic.
- Nick Treleaven (17/31) Jul 18 2024 So does copying a D array, but that is safe.
- Nick Treleaven (5/8) Jul 18 2024 I messed that up:
- Walter Bright (2/2) Jul 17 2024 Please continue this here:
The idea is printf is already largely safe:

```
printf("number = %d\n", 3);
```

is perfectly safe. Unsafe problems:

1. if the arguments and their types do not match the format string. D already checks for that, so we're good there.

2. if a pointer is passed to %s:

```
char* name;
printf("name = %s\n", name);
```

That's unsafe. We normally fix this with:

```
char[] name;
printf("name = %.*s\n", cast(int)name.length, name.ptr);
```

Which is safe, because we know how printf() works.

I propose that the compiler rewrite:

```
char[] name;
printf("name = %s\n", name);
```

into:

```
printf("name = %.*s\n", cast(int)name.length, name.ptr);
```

(and mark any other use of %.*s as unsafe)

We can go further, and realize that since we already check the format string against the arguments, we can rewrite the format string to match the arguments:

```
printf("number = %s\n", 3);
```

becomes:

```
printf("number = %d\n", 3);
```

which makes it much simpler to use printf. I can never remember which format is for size_t, for example.

The one format specification (I forgot which one) which assigns an int value through a pointer can simply be marked as unsafe.

Of course, this only applies if the format string is a literal, not a variable.

Since dmd already scans and checks the format string against the argument list, most of the work for this proposal is already done.
Jul 13 2024
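A minimal sketch of the hand-written `%.*s` form that the proposal would have the compiler generate. The helper name `formatName` and the 64-byte buffer are assumptions made for illustration; `snprintf` is used instead of `printf` only so the result can be inspected.

```d
import core.stdc.stdio : printf, snprintf;

/// Hypothetical helper: formats a D slice with "%.*s", passing the slice
/// length as the precision so printf never reads past the end of the slice.
string formatName(const(char)[] name)
{
    char[64] buf;
    int n = snprintf(buf.ptr, buf.length, "name = %.*s",
                     cast(int) name.length, name.ptr);
    if (n < 0)
        return null;                    // encoding error reported by snprintf
    if (n >= buf.length)
        n = cast(int) buf.length - 1;   // output was truncated to fit buf
    return buf[0 .. n].idup;
}

void main()
{
    // A slice of a longer string: no zero byte follows the 'r' of "Walter",
    // so a plain "%s" with name.ptr would keep reading into " Bright".
    const(char)[] name = "Walter Bright"[0 .. 6];
    string r = formatName(name);
    printf("%.*s\n", cast(int) r.length, r.ptr);
}
```

The slice in `main` shows why the length matters: only the six characters of the slice are formatted, even though more characters follow it in memory.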
On Saturday, 13 July 2024 at 20:39:32 UTC, Walter Bright wrote:
> printf

I basically never use the old C printf syntax, and I'm pretty sure that's true of 95% of people here; as far as I know there's one use case for printf, and that's picking float significant digits.

Printing in a template language with a pretty solid print API norm (as far as I know everyone is on board with toString) has far better methods than the C norms. Templates and lib-level decisions should be the way forward; now if you want to make those better (such as handling the float digit overloads), that would be a point of interest for me.

Off the top of my head, you should*n't* verify C formatted strings at all; those cryptic letters are from a very long dead era. If you're changing the API but want it to follow the old way, drop everything but '%' scanning, i.e. `printf("hello %, today is %.\n","bob",days.monday);`
Jul 13 2024
I understand the desire to use modern write(). But there is a place for a lightweight way to do formatted writing.

1. If you're just linking with the stdc library, there is no write().

2. printf is probably the most debugged and optimized piece of code that has ever existed. Borland C recoded it in hand-optimized assembler, which was a brilliant move as its fast printf covered for a lot of weakness in its poor code generator.

3. write() won't work until pretty much everything in the compiler works. With printf, I can get hello world to work needing only a minimally functional compiler.

4. can't use Phobos in dmd's source code, because if a working Phobos was required, it becomes much much harder to bootstrap it.

5. most any use of write() causes a rather large pile of template bloat to be inserted into the object file. This makes life difficult when trying to isolate a bug.

Currently, dmd's checking of the arguments against the format string has already eliminated a large chunk of the problems with printf. It enabled the removal of dozens of bugs in the dmd code base. A big win!

My proposal is pretty lightweight, the heavy lifting was already done with the argument checking. It enables safe use of printf, and removes the temptation to rely on char* strings instead of char[] strings.
Jul 13 2024
On Saturday, 13 July 2024 at 20:39:32 UTC, Walter Bright wrote:
> The idea is printf is already largely safe:

This is trivially achievable with IES. Do not waste time on making the compiler have some special case here.

-Steve
Jul 13 2024
On 7/13/2024 6:10 PM, Steven Schveighoffer wrote:
> This is trivially achievable with IES.

What is IES?
Jul 15 2024
On 16/07/2024 6:25 AM, Walter Bright wrote:
> On 7/13/2024 6:10 PM, Steven Schveighoffer wrote:
>> This is trivially achievable with IES.
>
> What is IES?

https://dlang.org/spec/istring.html
Jul 15 2024
On 7/15/2024 11:26 AM, Richard (Rikki) Andrew Cattermole wrote:
>> What is IES?
>
> https://dlang.org/spec/istring.html

Thank you.
Jul 15 2024
While this does seem to be useful, there is an issue for adoption with dmd.

It'll fool you into thinking it works with the latest compiler when bootstrapping, and silently cause broken programs in older ones.

Steven is right, an interpolated string wrapper around printf would be a better option.
Jul 13 2024
On 7/13/2024 11:39 PM, Richard (Rikki) Andrew Cattermole wrote:
> It'll fool you into thinking it works with the latest compiler when bootstrapping, and silently cause broken programs in older ones.

It won't be silent, as printf argument checking has been around for many years in dmd.

> Steven is right, interpolated string wrapper around printf would be a better option.

That's more work.
Jul 15 2024
On 7/13/24 22:39, Walter Bright wrote:
> I propose that the compiler rewrite:
> ```
> char[] name;
> printf("name = %s\n", name);
> ```
> into:
> ```
> printf("name = %.*s\n", cast(int)name.length, name.ptr);
> ```
> (and mark any other use of %.*s as unsafe)

This part is actually not memory safe.

In general, I guess provided we can get it right, extending the `pragma(printf)` checks in `@safe` code is indeed an improvement to the language, though I think not a lot of people need this.

You should probably have to mark the `printf` prototype as `@trusted` for this to work though. (There are `pragma(printf)` functions that still have a `@system` interface even when there is nothing wrong with the format string and arguments, e.g. `sprintf`.)

For everyone who is not aware, here's D's existing `printf` support: https://dlang.org/spec/pragma.html#printf

Probably `pragma(scanf)` would need to get similar treatment.
Jul 14 2024
On 7/14/2024 7:06 AM, Timon Gehr wrote:
> On 7/13/24 22:39, Walter Bright wrote:
>> I propose that the compiler rewrite:
>> ```
>> char[] name;
>> printf("name = %s\n", name);
>> ```
>> into:
>> ```
>> printf("name = %.*s\n", cast(int)name.length, name.ptr);
>> ```
>> (and mark any other use of %.*s as unsafe)
>
> This part is actually not memory safe.

How is it not safe?

> In general, I guess provided we can get it right, extending the `pragma(printf)` checks in `@safe` code is indeed an improvement to the language, though I think not a lot of people need this.

*I* need it :-) It's an enabling feature, in that it enables me to much more fully transition dmd away from using 0 terminated strings.

> You should probably have to mark the `printf` prototype as `@trusted` for this to work though. (There are `pragma(printf)` functions that still have a `@system` interface even when there is nothing wrong with the format string and arguments, e.g. `sprintf`.)

It would apply to snprintf, but not sprintf which is not fixable.

> For everyone who is not aware, here's D's existing `printf` support: https://dlang.org/spec/pragma.html#printf
>
> Probably `pragma(scanf)` would need to get similar treatment.

Possibly, but scanf is almost never used :-/
Jul 15 2024
On Monday, 15 July 2024 at 18:36:07 UTC, Walter Bright wrote:
> On 7/14/2024 7:06 AM, Timon Gehr wrote:
>> On 7/13/24 22:39, Walter Bright wrote:
>>> I propose that the compiler rewrite:
>>> ```
>>> char[] name;
>>> printf("name = %s\n", name);
>>> ```
>>> into:
>>> ```
>>> printf("name = %.*s\n", cast(int)name.length, name.ptr);
>>> ```
>>> (and mark any other use of %.*s as unsafe)
>>
>> This part is actually not memory safe.
>
> How is it not safe?

C23, section 7.23.6.1 ("The fprintf function"), paragraph 5:

> As noted previously, a field width, or precision, or both, may be indicated by an asterisk. In this case, an int argument supplies the field width or precision. [...] A negative precision argument is taken as if the precision were omitted.

So, if the length overflows a 32-bit int, it will be ignored, and printf will read until it finds a zero byte.

I suppose we could have the compiler insert a bounds check, in addition to all of the other rewrites, but at this point, it feels like we're not really calling printf at all; we're calling some other formatted-output function that's stolen printf's identity.
Jul 15 2024
On 7/15/2024 12:56 PM, Paul Backus wrote:
>> How is it not safe?
>
> C23, section 7.23.6.1 ("The fprintf function"), paragraph 5:
>
>> As noted previously, a field width, or precision, or both, may be indicated by an asterisk. In this case, an int argument supplies the field width or precision. [...] A negative precision argument is taken as if the precision were omitted.
>
> So, if the length overflows a 32-bit int, it will be ignored, and printf will read until it finds a zero byte.

Huh, I missed that little gem! But there's a simple solution:

```
printf("%.*s\n", cast(int)s.length & 0x7FFF_FFFF, s.ptr);
```

Hence, it will always be a positive integer. That means one can print a maximum of 2 billion characters via printf. Like 640Kb, that ought to be enough for anyone!

While failing to print the entirety of such a (suspiciously) long string, it would not be a memory safety issue.

> I suppose we could have the compiler insert a bounds check, in addition to all of the other rewrites, but at this point, it feels like we're not really calling printf at all; we're calling some other formatted-output function that's stolen printf's identity.

Wrapping APIs with a better interface is what we do all the time :-/
Jul 15 2024
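A small sketch of why the mask above keeps the precision argument non-negative. The helper name `safePrecision` is hypothetical; the expression inside it is the one from the post.

```d
import core.stdc.stdio : printf;

/// Hypothetical helper wrapping the masking expression from the post:
/// truncate the length to int, then clear the sign bit.
int safePrecision(size_t length) @safe pure nothrow @nogc
{
    // cast binds tighter than &, so this is (cast(int) length) & 0x7FFF_FFFF.
    // The result keeps only the low 31 bits of length, so it is never
    // negative and never larger than the real length.
    return cast(int) length & 0x7FFF_FFFF;
}

void main()
{
    char[] s = "hello".dup;
    printf("%.*s\n", safePrecision(s.length), s.ptr);
}
```

Because only the low 31 bits survive, a very long string is printed short rather than in full: a length of exactly 0x8000_0000, for instance, yields a precision of 0.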
On Tuesday, 16 July 2024 at 01:22:14 UTC, Walter Bright wrote:
>> I suppose we could have the compiler insert a bounds check, in addition to all of the other rewrites, but at this point, it feels like we're not really calling printf at all; we're calling some other formatted-output function that's stolen printf's identity.
>
> Wrapping APIs with a better interface is what we do all the time :-/

Of course. What I find objectionable in this case is that (a) the better interface is implemented using a bunch of compiler-internal rewrites, rather than normal D code; and (b) it shadows the existing C printf function rather than existing alongside it.

If we need a safer printf for DMD that doesn't carry all the bloat and baggage of Phobos's writef, then by all means, let's write one. But let's write it in D and put it in a normal D module, instead of sneaking around and redefining printf behind our users' backs.
Jul 16 2024
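One possible shape for the "normal D module" alternative, as a sketch only. The function `printSlice` is hypothetical and is not part of druntime or Phobos; it clamps the length instead of masking it, which is a choice made here for illustration rather than anything proposed in the thread.

```d
import core.stdc.stdio : printf;

/// Hypothetical wrapper: prints a D slice through C printf and returns
/// printf's result (the number of characters written, or a negative value
/// on error). Marked @trusted because the body upholds what printf needs.
int printSlice(const(char)[] s) @trusted nothrow @nogc
{
    if (s.length == 0)
        return 0;   // do not hand printf a possibly null pointer

    // Clamp so the precision can never be negative, which printf would
    // treat as "no precision" and then read up to a zero byte.
    const int len = s.length > int.max ? int.max : cast(int) s.length;
    return printf("%.*s", len, s.ptr);
}

void main() @safe
{
    // A slice with no terminator of its own; callable from @safe code.
    const(char)[] name = "Walter Bright"[0 .. 6];
    printSlice(name);
    printSlice("\n");
}
```

This covers only the string case; a full replacement would still need some way to handle the other format specifiers.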
Followups: https://www.digitalmars.com/d/archives/digitalmars/dip/development/First_Draft_Making_printf_safe_266.html
Jul 16 2024
On Saturday, 13 July 2024 at 20:39:32 UTC, Walter Bright wrote:
> The idea is printf is already largely safe: [...]

[`@safe` functions cannot perform pointer arithmetic](https://dlang.org/spec/function.html#safe-functions), but `printf` does because it indexes a `char*` (its first parameter).
Jul 17 2024
On Wednesday, 17 July 2024 at 08:58:56 UTC, IchorDev wrote:
> On Saturday, 13 July 2024 at 20:39:32 UTC, Walter Bright wrote:
>> The idea is printf is already largely safe: [...]
>
> [`@safe` functions cannot perform pointer arithmetic](https://dlang.org/spec/function.html#safe-functions), but `printf` does because it indexes a `char*` (its first parameter).

The idea is to make certain calls of `printf` safe when the first argument is a string literal:

```
char[] s;
printf("%s\n", s);
```

See https://forum.dlang.org/post/v775k1$1tmj$1@digitalmars.com.
Jul 17 2024
On Wednesday, 17 July 2024 at 09:20:23 UTC, Nick Treleaven wrote:
> On Wednesday, 17 July 2024 at 08:58:56 UTC, IchorDev wrote:
>> On Saturday, 13 July 2024 at 20:39:32 UTC, Walter Bright wrote:
>>> The idea is printf is already largely safe: [...]
>>
>> [`@safe` functions cannot perform pointer arithmetic](https://dlang.org/spec/function.html#safe-functions), but `printf` does because it indexes a `char*` (its first parameter).
>
> The idea is to make certain calls of `printf` safe when the first argument is a string literal:
> ```
> char[] s;
> printf("%s\n", s);
> ```
> See https://forum.dlang.org/post/v775k1$1tmj$1@digitalmars.com.

I fail to see why there can’t be a `dprintf` that works like `printf` except that for `%.*s` bound to a `const(char)[]` object, it decomposes it properly into its length and pointer component. Or, even better, use a different specifier, e.g. `%D`.
Jul 17 2024
On Wednesday, 17 July 2024 at 09:20:23 UTC, Nick Treleaven wrote:
> The idea is to make certain calls of `printf` safe when the first argument is a string literal:
> ```
> char[] s;
> printf("%s\n", s);
> ```
> See https://forum.dlang.org/post/v775k1$1tmj$1@digitalmars.com.

And the function will still perform pointer arithmetic.
Jul 17 2024
On Wednesday, 17 July 2024 at 17:24:15 UTC, IchorDev wrote:
> On Wednesday, 17 July 2024 at 09:20:23 UTC, Nick Treleaven wrote:
>> The idea is to make certain calls of `printf` safe when the first argument is a string literal:
>> ```
>> char[] s;
>> printf("%s\n", s);
>> ```
>> See https://forum.dlang.org/post/v775k1$1tmj$1@digitalmars.com.
>
> And the function will still perform pointer arithmetic.

So does copying a D array, but that is safe.

Responding to your post in DIP development here (because that's for reviews):

> strlen assumes that s is zero-terminated

```d
pragma(msg, printf) printf(const char* fmt, ...) @safe;
```

What the above would mean is that `printf` is safe only when `fmt` is given a string literal. String literals are *guaranteed* to be zero-terminated, so there's no assumption of that here. If the pragma checks are not met, `printf` is actually treated as @system.

> Any function that traverses a C string passed as an argument can only be @system. Any function that trusts a separate parameter for array bounds can only be @system.

That requires modification for this proposal. It is true when given a char* for the format parameter. But when a string literal implicitly converts to char*, it has a safe interface due to the pragma, because the literal is statically allocated and is never accessed past its allocation when called from safe code.
Jul 18 2024
On Thursday, 18 July 2024 at 16:45:02 UTC, Nick Treleaven wrote:
> ```d
> pragma(msg, printf) printf(const char* fmt, ...) @safe;
> ```

I messed that up:

```d
pragma(printf) extern(C) int printf(const char* fmt, ...) @safe;
```
Jul 18 2024
Please continue this here: https://www.digitalmars.com/d/archives/digitalmars/dip/development/First_Draft_Making_printf_safe_266.html
Jul 17 2024