digitalmars.dip.ideas - Make printf safe

reply Walter Bright <newshound2 digitalmars.com> writes:
The idea is printf is already largely safe:

```
printf("number = %d\n", 3);
```
is perfectly safe.

Unsafe problems:

1. if the arguments and their types do not match the format string. D already 
checks for that, so we're good there.

2. if a pointer is passed to %s:

```
char* name;
printf("name = %s\n", name);
```
That's unsafe. We normally fix this with:

```
char[] name;
printf("name = %.*s\n", cast(int)name.length, name.ptr);
```
Which is safe, because we know how printf() works. I propose that the compiler 
rewrite:

```
char[] name;
printf("name = %s\n", name);
```
into:
```
printf("name = %.*s\n", cast(int)name.length, name.ptr);
```
(and mark any other use of %.*s as unsafe)
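
To make the `%.*s` form concrete, here is a minimal sketch that does by hand what the rewrite above would do. `formatName` is a hypothetical helper, and it goes through `snprintf` from `core.stdc.stdio` only so that the result lands in a buffer that can be inspected. It assumes the slice length fits in an `int`.

```d
import core.stdc.stdio : snprintf;

/// Hypothetical helper, for illustration only: formats "name = <slice>\n"
/// into buf and returns the number of characters snprintf reports.
int formatName(char[] buf, const(char)[] name)
{
    // %.*s consumes an int precision followed by a pointer. The precision
    // caps how many bytes are read, so the slice needs no terminating zero.
    // Assumption: name.length fits in an int.
    return snprintf(buf.ptr, buf.length, "name = %.*s\n",
                    cast(int) name.length, name.ptr);
}
```

Passing a slice such as `full[0 .. 6]` of a longer string prints only those six characters, even though the byte after the slice is not zero.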

We can go further, and realize that since we already check the format string 
against the arguments, we can rewrite the format string to match the arguments:

```
printf("number = %s\n", 3);
```
becomes:
```
printf("number = %d\n", 3);
```
which makes it much simpler to use printf. I can never remember which format is 
for size_t, for example.
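
One way to picture the format-string rewrite is as a lookup from argument type to conversion specifier. The sketch below is not how dmd implements anything; `printfSpec` is a hypothetical template that covers only a handful of types.

```d
/// Hypothetical compile-time lookup: the printf conversion for a D type.
/// Only a few types are sketched; anything else is a compile-time error.
template printfSpec(T)
{
    static if (is(T == int))
        enum printfSpec = "%d";
    else static if (is(T == uint))
        enum printfSpec = "%u";
    else static if (is(T == long))
        enum printfSpec = "%lld";
    else static if (is(T == ulong))
        enum printfSpec = "%llu";
    else static if (is(T == double))
        enum printfSpec = "%g";
    else static if (is(T : const(char)[]))
        enum printfSpec = "%.*s"; // caller must also pass length and pointer
    else
        static assert(0, "no printf conversion sketched for " ~ T.stringof);
}
```

Because `size_t` is an alias, `printfSpec!size_t` resolves to whichever unsigned integer entry matches on the target, which is the kind of detail the proposal would take off the programmer's hands.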

The one format specification (I forgot which one) which assigns an int value 
through a pointer, can simply be marked as unsafe.

Of course, this only applies if the format string is a literal, not a variable.

Since dmd already scans and checks the format string against the argument list, 
most of the work for this proposal is already done.
Jul 13
next sibling parent reply monkyyy <crazymonkyyy gmail.com> writes:
On Saturday, 13 July 2024 at 20:39:32 UTC, Walter Bright wrote:
 printf
I basically never use the old C printf syntax, and I'm pretty sure that's true of 95% of people here. As far as I know, there's one use case for printf, and that's picking float significant digits.

Printing in a template language with a pretty solid print API norm (as far as I know, everyone is on board with toString) has far better methods than the C norms. Templates and library-level decisions should be the way forward; now, if you want to make those better (such as handling the float digit overloads), this would be a point of interest for me.

Off the top of my head, you should *not* verify C-formatted strings at all; those cryptic letters are from a very long dead era. If you're changing the API but want to make it follow the old way, drop everything but '%' scanning, i.e. `printf("hello %, today is %.\n","bob",days.monday);`
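
The '%'-only idea can be sketched as a small function that fills in each bare `%` with a conversion chosen elsewhere, for example from the argument types. `expandPlaceholders` is a hypothetical name, and this version makes no attempt to handle a literal percent sign.

```d
/// Hypothetical helper: replaces each bare '%' in fmt with the next entry
/// of specs, producing a conventional printf format string.
/// Placeholders beyond the end of specs are copied through unchanged.
string expandPlaceholders(string fmt, const string[] specs)
{
    string result;
    size_t next = 0;
    foreach (c; fmt)
    {
        if (c == '%' && next < specs.length)
            result ~= specs[next++]; // substitute the chosen conversion
        else
            result ~= c;             // ordinary character, copy as-is
    }
    return result;
}
```

Since the function only manipulates strings, it can also be evaluated at compile time when the format is a literal.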
Jul 13
parent Walter Bright <newshound2 digitalmars.com> writes:
I understand the desire to use modern write().

But there is a place for a lightweight way to do formatted writing.

1. If you're just linking with the stdc library, there is no write().

2. printf is probably the most debugged and optimized piece of code that has
ever existed. Borland C recoded it in hand-optimized assembler, which was a
brilliant move as its fast printf covered for a lot of weakness in its poor
code generator

3. write() won't work until pretty much everything in the compiler works. With
printf, I can get hello world to work needing only a minimally functional
compiler

4. can't use Phobos in dmd's source code, because if a working Phobos was 
required, it becomes much much harder to bootstrap it

5. most any use of write() causes a rather large pile of template bloat to be 
inserted into the object file. This makes life difficult when trying to isolate 
a bug.

Currently, dmd's checking of the arguments against the format string has
already eliminated a large chunk of the problems with printf. It enabled the
removal of dozens of bugs in the dmd code base. A big win!

My proposal is pretty lightweight, the heavy lifting was already done with the
argument checking. It enables @safe use of printf, and removes the temptation
to rely on char* strings instead of char[] strings.
Jul 13
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On Saturday, 13 July 2024 at 20:39:32 UTC, Walter Bright wrote:
 The idea is printf is already largely safe:
This is trivially achievable with IES. Do not waste time on making the compiler have some special case here. -Steve
Jul 13
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/13/2024 6:10 PM, Steven Schveighoffer wrote:
 This is trivially achievable with IES.
What is IES?
Jul 15
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 16/07/2024 6:25 AM, Walter Bright wrote:
 On 7/13/2024 6:10 PM, Steven Schveighoffer wrote:
 This is trivially achievable with IES.
 What is IES?
https://dlang.org/spec/istring.html
Jul 15
parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/15/2024 11:26 AM, Richard (Rikki) Andrew Cattermole wrote:
 What is IES?
 https://dlang.org/spec/istring.html
Thank you.
Jul 15
prev sibling next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
While this does seem to be useful, there is an issue for adoption with dmd.

It'll fool you into thinking it works with the latest compiler when 
bootstrapping, and silently cause broken programs in older ones.

Steven is right, interpolated string wrapper around printf would be a 
better option.
Jul 13
parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/13/2024 11:39 PM, Richard (Rikki) Andrew Cattermole wrote:
 It'll fool you into thinking it works with the latest compiler when 
 bootstrapping, and silently cause broken programs in older ones.
It won't be silent, as printf argument checking has been around for many years in dmd.
 Steven is right, interpolated string wrapper around printf would be a better 
 option.
That's more work.
Jul 15
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 7/13/24 22:39, Walter Bright wrote:
 I propose that the compiler rewrite:
 
 ```
 char[] name;
 printf("name = %s\n", name);
 ```
 into:
 ```
 printf("name = %.*s\n", cast(int)name.length, name.ptr);
 ```
 (and mark any other use of %.*s as unsafe)
This part is actually not memory safe.

In general, I guess provided we can get it right, extending the `pragma(printf)` checks in `@safe` code is indeed an improvement to the language, though I think not a lot of people need this.

You should probably have to mark the `printf` prototype as `@trusted` for this to work though. (There are `pragma(printf)` functions that still have a `@system` interface even when there is nothing wrong with the format string and arguments, e.g. `sprintf`.)

For everyone who is not aware, here's D's existing `printf` support:
https://dlang.org/spec/pragma.html#printf

Probably `pragma(scanf)` would need to get similar treatment.
Jul 14
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/14/2024 7:06 AM, Timon Gehr wrote:
 On 7/13/24 22:39, Walter Bright wrote:
 I propose that the compiler rewrite:

 ```
 char[] name;
 printf("name = %s\n", name);
 ```
 into:
 ```
 printf("name = %.*s\n", cast(int)name.length, name.ptr);
 ```
 (and mark any other use of %.*s as unsafe)
 This part is actually not memory safe.
How is it not safe?
 In general, I guess provided we can get it right, extending the `pragma(printf)`
 checks in `@safe` code is indeed an improvement to the language, though I think
 not a lot of people need this.
*I* need it :-) It's an enabling feature, in that it enables me to much more fully transition dmd away from using 0 terminated strings.
 You should probably have to mark the `printf` prototype as `@trusted` for this
 to work though. (There are `pragma(printf)` functions that still have a
 `@system` interface even when there is nothing wrong with the format string and
 arguments, e.g. `sprintf`.)
It would apply to snprintf, but not sprintf which is not fixable.
 For everyone who is not aware, here's D's existing `printf` support:
 https://dlang.org/spec/pragma.html#printf
 
 Probably `pragma(scanf)` would need to get similar treatment.
Possibly, but scanf is almost never used :-/
Jul 15
parent reply Paul Backus <snarwin gmail.com> writes:
On Monday, 15 July 2024 at 18:36:07 UTC, Walter Bright wrote:
 On 7/14/2024 7:06 AM, Timon Gehr wrote:
 On 7/13/24 22:39, Walter Bright wrote:
 I propose that the compiler rewrite:

 ```
 char[] name;
 printf("name = %s\n", name);
 ```
 into:
 ```
 printf("name = %.*s\n", cast(int)name.length, name.ptr);
 ```
 (and mark any other use of %.*s as unsafe)
 This part is actually not memory safe.
 How is it not safe?
C23, section 7.23.6.1 ("The fprintf function"), paragraph 5:
 As noted previously, a field width, or precision, or both, may be indicated
 by an asterisk. In this case, an int argument supplies the field width or
 precision. [...] A negative precision argument is taken as if the precision
 were omitted.
So, if the length overflows a 32-bit int, it will be ignored, and printf will read until it finds a zero byte.

I suppose we could have the compiler insert a bounds check, in addition to all of the other rewrites, but at this point, it feels like we're not really calling printf at all; we're calling some other formatted-output function that's stolen printf's identity.
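
The failure mode can be seen without printing anything, because it is the cast that goes wrong. In this sketch `naivePrecision` (a hypothetical name) stands in for the `cast(int)name.length` in the proposed rewrite.

```d
/// Hypothetical stand-in for the cast in the proposed rewrite: the value
/// printf would receive as the precision for a slice of the given length.
int naivePrecision(size_t length)
{
    // The cast keeps only the low 32 bits and reinterprets them as signed,
    // so a length with bit 31 set becomes a negative int.
    return cast(int) length;
}
```

A length of `0x8000_0000` comes out negative, and by the paragraph quoted above a negative precision is treated as if no precision had been given.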
Jul 15
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/15/2024 12:56 PM, Paul Backus wrote:
 How is it not safe?
 C23, section 7.23.6.1 ("The fprintf function"), paragraph 5:
 As noted previously, a field width, or precision, or both, may be indicated
 by an asterisk. In this case, an int argument supplies the field width or
 precision. [...] A negative precision argument is taken as if the precision
 were omitted.
 So, if the length overflows a 32-bit int, it will be ignored, and printf will read until it finds a zero byte.
Huh, I missed that little gem! But there's a simple solution:

```
printf("%.*s\n", cast(int)s.length & 0x7FFF_FFFF, s.ptr);
```
Hence, it will always be a non-negative integer. That means one can print a maximum of 2 billion characters via printf. Like 640Kb, that ought to be enough for anyone!

While failing to print the entirety of such a (suspiciously) long string, it would not be a memory safety issue.
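
A sketch comparing the mask above with a saturating alternative; both helper names are hypothetical. The mask keeps the low 31 bits, so the result never exceeds the real length, but a length of exactly `0x8000_0000` would print nothing. Clamping to `int.max` instead prints as much as the precision can express.

```d
/// Hypothetical helper: the masking approach from the post above.
int maskedPrecision(size_t length)
{
    // cast binds tighter than &, so this masks the truncated int.
    return cast(int) length & 0x7FFF_FFFF;
}

/// Hypothetical alternative: saturate at int.max instead of wrapping.
int clampedPrecision(size_t length)
{
    return length > int.max ? int.max : cast(int) length;
}
```

Either result is non-negative and no larger than the slice length, so neither lets printf read past the end of the slice.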
 I suppose we could have the compiler insert a bounds check, in addition to all
 of the other rewrites, but at this point, it feels like we're not really calling
 printf at all; we're calling some other formatted-output function that's stolen
 printf's identity.
Wrapping APIs with a better interface is what we do all the time :-/
Jul 15
parent reply Paul Backus <snarwin gmail.com> writes:
On Tuesday, 16 July 2024 at 01:22:14 UTC, Walter Bright wrote:
 I suppose we could have the compiler insert a bounds check, in 
 addition to all of the other rewrites, but at this point, it 
 feels like we're not really calling printf at all; we're 
 calling some other formatted-output function that's stolen 
 printf's identity.
 Wrapping APIs with a better interface is what we do all the time :-/
Of course. What I find objectionable in this case is that (a) the better interface is implemented using a bunch of compiler-internal rewrites, rather than normal D code; and (b) it shadows the existing C printf function rather than existing alongside it.

If we need a safer printf for DMD that doesn't carry all the bloat and baggage of Phobos's writef, then by all means, let's write one. But let's write it in D and put it in a normal D module, instead of sneaking around and redefining printf behind our users' backs.
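
As a rough illustration of the library-level route, here is a minimal sketch of a wrapper written in ordinary D. `printLine` is a hypothetical name and handles only a single slice; a real replacement would need to cover far more than this.

```d
import core.stdc.stdio : printf;

/// Hypothetical wrapper: prints a slice followed by a newline and returns
/// printf's result. Marked @trusted on the assumption that the clamped
/// precision keeps printf inside the bounds of the slice.
@trusted int printLine(const(char)[] text)
{
    // Saturate so the precision is never negative and never exceeds length.
    const int len = text.length > int.max ? int.max : cast(int) text.length;
    // Avoid handing printf a null pointer for an empty slice.
    const(char)* p = text.length ? text.ptr : "".ptr;
    return printf("%.*s\n", len, p);
}
```

Because the wrapper is `@trusted`, it can be called from `@safe` code without any change to how the compiler treats `printf` itself.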
Jul 16
parent Walter Bright <newshound2 digitalmars.com> writes:
Followups: 
https://www.digitalmars.com/d/archives/digitalmars/dip/development/First_Draft_Making_printf_safe_266.html
Jul 16
prev sibling parent reply IchorDev <zxinsworld gmail.com> writes:
On Saturday, 13 July 2024 at 20:39:32 UTC, Walter Bright wrote:
 The idea is printf is already largely safe:
 [...]
[`@safe` functions cannot perform pointer arithmetic](https://dlang.org/spec/function.html#safe-functions), but `printf` does because it indexes a `char*` (its first parameter).
Jul 17
next sibling parent reply Nick Treleaven <nick geany.org> writes:
On Wednesday, 17 July 2024 at 08:58:56 UTC, IchorDev wrote:
 On Saturday, 13 July 2024 at 20:39:32 UTC, Walter Bright wrote:
 The idea is printf is already largely safe:
 [...]
 [`@safe` functions cannot perform pointer arithmetic](https://dlang.org/spec/function.html#safe-functions), but `printf` does because it indexes a `char*` (its first parameter).
The idea is to make certain calls of `printf` safe when the first argument is a string literal:
```
char[] s;
printf("%s\n", s);
```
See https://forum.dlang.org/post/v775k1$1tmj$1 digitalmars.com.
Jul 17
next sibling parent Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Wednesday, 17 July 2024 at 09:20:23 UTC, Nick Treleaven wrote:
 On Wednesday, 17 July 2024 at 08:58:56 UTC, IchorDev wrote:
 On Saturday, 13 July 2024 at 20:39:32 UTC, Walter Bright wrote:
 The idea is printf is already largely safe:
 [...]
 [`@safe` functions cannot perform pointer arithmetic](https://dlang.org/spec/function.html#safe-functions), but `printf` does because it indexes a `char*` (its first parameter).
 The idea is to make certain calls of `printf` safe when the first argument is a string literal:
 ```
 char[] s;
 printf("%s\n", s);
 ```
 See https://forum.dlang.org/post/v775k1$1tmj$1 digitalmars.com.
I fail to see why there can’t be a `dprintf` that works like `printf` except that for `%.*s` bound to a `const(char)[]` object, it decomposes it properly into its length and pointer component. Or, even better, use a different specifier, e.g. `%D`.
Jul 17
prev sibling parent reply IchorDev <zxinsworld gmail.com> writes:
On Wednesday, 17 July 2024 at 09:20:23 UTC, Nick Treleaven wrote:
 The idea is to make certain calls of `printf` safe when the 
 first argument is a string literal:
 ```
 char[] s;
 printf("%s\n", s);
 ```
 See https://forum.dlang.org/post/v775k1$1tmj$1 digitalmars.com.
And the function will still perform pointer arithmetic.
Jul 17
parent reply Nick Treleaven <nick geany.org> writes:
On Wednesday, 17 July 2024 at 17:24:15 UTC, IchorDev wrote:
 On Wednesday, 17 July 2024 at 09:20:23 UTC, Nick Treleaven 
 wrote:
 The idea is to make certain calls of `printf` safe when the 
 first argument is a string literal:
 ```
 char[] s;
 printf("%s\n", s);
 ```
 See https://forum.dlang.org/post/v775k1$1tmj$1 digitalmars.com.
 And the function will still perform pointer arithmetic.
So does copying a D array, but that is safe.

Responding to your post in DIP development here (because that's for reviews):
 strlen assumes that s is zero-terminated
```d
pragma(msg, printf) printf(const char* fmt, ...) @safe;
```
What the above would mean is that `printf` is safe only when `fmt` is given a string literal. String literals are *guaranteed* to be zero-terminated, so there's no assumption of that here. If the pragma checks are not met, `printf` is actually treated as @system.
 Any function that traverses a C string passed as an argument
 can only be @system. Any function that trusts a separate
 parameter for array bounds can only be @system.
That requires modification for this proposal. It is true when given a char* for the format parameter. But when a string literal implicitly converts to char*, it has a safe interface due to the pragma, because the literal is statically allocated and is never accessed past its allocation when called from safe code.
Jul 18
parent Nick Treleaven <nick geany.org> writes:
On Thursday, 18 July 2024 at 16:45:02 UTC, Nick Treleaven wrote:
 ```d
 pragma(msg, printf) printf(const char* fmt, ...) @safe;
 ```
I messed that up:
```d
pragma(printf) extern(C) int printf(const char* fmt, ...) @safe;
```
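
For readers who have not used the pragma, here is a sketch of a user-defined function opting in to the existing format checking. `recordf` and `lastMessage` are hypothetical names chosen for illustration. The function is deliberately not marked `@safe`, since that part is what the proposal in this thread would add.

```d
import core.stdc.stdarg;
import core.stdc.stdio : vsnprintf;

/// Hypothetical buffer that receives the most recent formatted message.
__gshared char[128] lastMessage;

/// Hypothetical printf-style function. pragma(printf) asks the compiler to
/// check literal format strings against the arguments at each call site.
pragma(printf)
extern (C) int recordf(const(char)* fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    // vsnprintf never writes more than lastMessage.length bytes.
    const int n = vsnprintf(lastMessage.ptr, lastMessage.length, fmt, ap);
    va_end(ap);
    return n;
}
```

With the pragma in place, a call whose arguments do not match a literal format string is reported at compile time instead of misbehaving at run time.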
Jul 18
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Please continue this here:

https://www.digitalmars.com/d/archives/digitalmars/dip/development/First_Draft_Making_printf_safe_266.html
Jul 17