digitalmars.D - std.file functions and embedded NUL characters [CWE-158]
- kdevel (22/22) Jul 31 Some python code:
- Doigt (11/34) Jul 31 Do you actually have a source or an example program that D
- Alex Bryan (7/49) Jul 31 D does nothing with the NULL terminator, but I think the argument
- Doigt (11/13) Jul 31 I understand the entire argument of OP. OP is not arguing
- H. S. Teoh (14/27) Jul 31 This tone is uncalled for. OP is specifically pointing the issue with
- Doigt (12/15) Jul 31 No, OP isn't doing that. OP is fishing and it's obvious:
- Richard (Rikki) Andrew Cattermole (7/22) Jul 31 This is a valid security problem with std.stdio and std.file.
- Doigt (6/30) Jul 31 you provided the proof for them and you are correct in pointing
- monkyyy (4/19) Jul 31 I bet op is a enterprise-grade security researcher; I suggest
- kdevel (15/18) Jul 31 I did not argue that D "does" something "with the null
- Doigt (14/21) Jul 31 and this is my entire problem with your entire argument. You are
- Richard (Rikki) Andrew Cattermole (8/8) Jul 31 This is quite a good example of why for PhobosV3 I want us to go through...
- monkyyy (8/11) Jul 31 whats wrong with just changing toStringz? I dont understand the
- H. S. Teoh (24/35) Jul 31 [...]
- kdevel (14/21) Jul 31 The cause of the problem is the silent truncation when the
- kdevel (13/15) Jul 31 I stand corrected. This is probably not true:
- =?UTF-8?Q?S=C3=B6nke_Ludwig?= (8/20) Aug 04 I strongly suggest to also at least distinguish between Windows and
- monkyyy (4/27) Jul 31 https://github.com/dlang/phobos/blob/205256abb1f86faf986f8c789cb733ca413...
- Steven Schveighoffer (21/24) Jul 31 Yeah, I am aware of this, because I ran into it on another
- 0xEAB (5/7) Jul 31 Technically, any function calling `toStringz()` should probably
- monkyyy (4/8) Jul 31 I bet the speed test is effectively a tie if its reasonably
- Steven Schveighoffer (5/13) Jul 31 How do you "reasonably" write a linear search?
- monkyyy (11/25) Jul 31 Lets say you have a few paragraph of text that you split by \n
- kdevel (19/25) Aug 01 This is a strawman. I am writing about file system functions!
- Steven Schveighoffer (38/63) Aug 01 I was responding to the link sent about `toStringz`. I don't
- Steven Schveighoffer (5/14) Aug 01 https://github.com/dlang/phobos/issues/10836
- kdevel (20/26) Aug 02 Pardon?
- Steven Schveighoffer (20/36) Aug 03 I mean in terms of where you apply the checks. You can say "I
- kdevel (47/73) Aug 04 I like the fragment "[...] it has to be correct before being
- Steven Schveighoffer (16/42) Aug 04 TL;DR: We can fix `tempCString` since we are always copying the
- Doigt (5/28) Jul 31 Rikki pointed out to me that I was blind. I apologize for my
Some python code: def myfun (filename): open (filename, 'w') myfun ("a\0c") which when executed behaves in an exemplary manner: Traceback (most recent call last): File "./test.py", line 6, in <module> myfun ("a\0c") File "./test.py", line 4, in myfun open (filename, 'w') TypeError: file() argument 1 must be encoded string without null bytes, not str Other languages like D, perl or even Ada seem to let the embedded NUL character silently truncate the filename. This poses a considerable risk when the input to std.file functions is not controlled by the program author [1]. E.g. rmdirRecurse ("/\0/home/user/subdir"); [1] https://cwe.mitre.org/data/definitions/158.html
Jul 31
On Thursday, 31 July 2025 at 20:45:14 UTC, kdevel wrote:Some python code: def myfun (filename): open (filename, 'w') myfun ("a\0c") which when executed behaves in an exemplary manner: Traceback (most recent call last): File "./test.py", line 6, in <module> myfun ("a\0c") File "./test.py", line 4, in myfun open (filename, 'w') TypeError: file() argument 1 must be encoded string without null bytes, not str Other languages like D, perl or even Ada seem to let the embedded NUL character silently truncate the filename. This poses a considerable risk when the input to std.file functions is not controlled by the program author [1]. E.g. rmdirRecurse ("/\0/home/user/subdir"); [1] https://cwe.mitre.org/data/definitions/158.htmlDo you actually have a source or an example program that D actually does anything with the null terminator or are you talking out of your ass? D doesn't care about the null terminator and doesn't do anything with it. Try this code on run.dlang.io and report back with your apology: import std.stdio; void main() { writeln("Hello\0World\0!"); }
Jul 31
On Thursday, 31 July 2025 at 21:25:44 UTC, Doigt wrote:On Thursday, 31 July 2025 at 20:45:14 UTC, kdevel wrote:D does nothing with the NULL terminator, but I think the argument here is that for many file operations, the filename string eventually gets converted to a C string before it is passed to an API (C stdlib? syscall?) where the null terminator is significant. It is best NOT to be rude and nasty, but if you are, you should at least understand the technical argumentSome python code: def myfun (filename): open (filename, 'w') myfun ("a\0c") which when executed behaves in an exemplary manner: Traceback (most recent call last): File "./test.py", line 6, in <module> myfun ("a\0c") File "./test.py", line 4, in myfun open (filename, 'w') TypeError: file() argument 1 must be encoded string without null bytes, not str Other languages like D, perl or even Ada seem to let the embedded NUL character silently truncate the filename. This poses a considerable risk when the input to std.file functions is not controlled by the program author [1]. E.g. rmdirRecurse ("/\0/home/user/subdir"); [1] https://cwe.mitre.org/data/definitions/158.htmlDo you actually have a source or an example program that D actually does anything with the null terminator or are you talking out of your ass? D doesn't care about the null terminator and doesn't do anything with it. Try this code on run.dlang.io and report back with your apology: import std.stdio; void main() { writeln("Hello\0World\0!"); }
Jul 31
On Thursday, 31 July 2025 at 21:35:50 UTC, Alex Bryan wrote:It is best NOT to be rude and nasty, but if you are, you should at least understand the technical argumentI understand the entire argument of OP. OP is not arguing anything. They are fishing. They know nothing about D. If you don't find such individuals problematic, then perhaps I should start polluting your repos with imaginary problems. I'm bound to guess one right. The solution to lazy, no research, generic, python supremacy horseshit is to reply in a way that scares them away. Like the telemarketer at the door, you slam in it in their face with a big "fuck off my lawn". If they were genuine, they wouldn't have done what they did, especially not in the way they did it. They were fishing and they hooked on a fish alright...
Jul 31
On Thu, Jul 31, 2025 at 09:25:44PM +0000, Doigt via Digitalmars-d wrote:On Thursday, 31 July 2025 at 20:45:14 UTC, kdevel wrote:[...]This tone is uncalled for. OP is specifically pointing the issue with passing NUL-containing strings to underlying OS calls. For example: ``` void main() { auto fp = File("/tmp/x\0reallynow", "w"); // What's the filename of the created file? } ``` T -- Last night, I dreamed about my pet rabbits all lined up like in a parade, and hopping backwards. Then I woke up and realized that it was my receding hare line!Other languages like D, perl or even Ada seem to let the embedded NUL character silently truncate the filename. This poses a considerable risk when the input to std.file functions is not controlled by the program author [1]. E.g. rmdirRecurse ("/\0/home/user/subdir"); [1] https://cwe.mitre.org/data/definitions/158.htmlDo you actually have a source or an example program that D actually does anything with the null terminator or are you talking out of your ass? D doesn't care about the null terminator and doesn't do anything with it.
Jul 31
On Thursday, 31 July 2025 at 21:34:36 UTC, H. S. Teoh wrote:This tone is uncalled for. OP is specifically pointing the issue with passing NUL-containing strings to underlying OS calls. TNo, OP isn't doing that. OP is fishing and it's obvious: 1. No problematic example that is specific to D 2. Generic source that concerns multiple languages but doesn't cite D 3. Immediately pulled out an "examplary" python program and touting the superiority of that language. Any reasonable person should therefore conclude that it's a generalizing statement that was made without prior knowledge about D and that the OP didn't make any substantial testing to prove the problem. They hoped instead that it would be true. Therefore, you are a fish and you just took the bait.
Jul 31
On 01/08/2025 11:14 AM, Doigt wrote:On Thursday, 31 July 2025 at 21:34:36 UTC, H. S. Teoh wrote:This is a valid security problem with std.stdio and std.file. They do not have to produce an example showing that bad behavior exists on our end, this is documented as being possible on the system API's which we wrap without validation. https://github.com/dlang/phobos/ blob/2d8ae67b396f5cc5f4633695c1c5ac67d2bf448e/std/stdio.d#L432This tone is uncalled for. OP is specifically pointing the issue with passing NUL-containing strings to underlying OS calls. TNo, OP isn't doing that. OP is fishing and it's obvious: 1. No problematic example that is specific to D 2. Generic source that concerns multiple languages but doesn't cite D 3. Immediately pulled out an "examplary" python program and touting the superiority of that language. Any reasonable person should therefore conclude that it's a generalizing statement that was made without prior knowledge about D and that the OP didn't make any substantial testing to prove the problem. They hoped instead that it would be true. Therefore, you are a fish and you just took the bait.
Jul 31
On Thursday, 31 July 2025 at 23:20:28 UTC, Richard (Rikki) Andrew Cattermole wrote:On 01/08/2025 11:14 AM, Doigt wrote:you provided the proof for them and you are correct in pointing it out that it exists on our end. But if their guess had been wrong, it would have all wasted our time and energy. This kind of fishing without proof shouldn't be acceptableOn Thursday, 31 July 2025 at 21:34:36 UTC, H. S. Teoh wrote:This is a valid security problem with std.stdio and std.file. They do not have to produce an example showing that bad behavior exists on our end, this is documented as being possible on the system API's which we wrap without validation. https://github.com/dlang/phobos/ blob/2d8ae67b396f5cc5f4633695c1c5ac67d2bf448e/std/stdio.d#L432This tone is uncalled for. OP is specifically pointing the issue with passing NUL-containing strings to underlying OS calls. TNo, OP isn't doing that. OP is fishing and it's obvious: 1. No problematic example that is specific to D 2. Generic source that concerns multiple languages but doesn't cite D 3. Immediately pulled out an "examplary" python program and touting the superiority of that language. Any reasonable person should therefore conclude that it's a generalizing statement that was made without prior knowledge about D and that the OP didn't make any substantial testing to prove the problem. They hoped instead that it would be true. Therefore, you are a fish and you just took the bait.
Jul 31
On Thursday, 31 July 2025 at 23:14:46 UTC, Doigt wrote:On Thursday, 31 July 2025 at 21:34:36 UTC, H. S. Teoh wrote:I bet op is a enterprise-grade security researcher; I suggest light mocking when they claim rust is the future or safe/immutable is very important for game devThis tone is uncalled for. OP is specifically pointing the issue with passing NUL-containing strings to underlying OS calls. TNo, OP isn't doing that. OP is fishing and it's obvious: 1. No problematic example that is specific to D 2. Generic source that concerns multiple languages but doesn't cite D 3. Immediately pulled out an "examplary" python program and touting the superiority of that language. Any reasonable person should therefore conclude that it's a generalizing statement that was made without prior knowledge about D and that the OP didn't make any substantial testing to prove the problem. They hoped instead that it would be true. Therefore, you are a fish and you just took the bait.
Jul 31
On Thursday, 31 July 2025 at 21:25:44 UTC, Doigt wrote:[...] Do you actually have a source or an example program that D actually does anything with the null terminatorI did not argue that D "does" something "with the null terminator". D like perl or Ada ignores the potential occurrence of NUL in the string data when calling the underlying C functions for file system operations. import std; void main () { File ("abc", "w"); std.file.rename ("abc", "ab\0c"); } Instead of throwing as the python code does this D program leaves a file named "ab" in the filesystem.
Jul 31
On Thursday, 31 July 2025 at 21:55:03 UTC, kdevel wrote:On Thursday, 31 July 2025 at 21:25:44 UTC, Doigt wrote:and this is my entire problem with your entire argument. You are generalizing based on preconceived notions. You are guessing. You aren't actually informed about D. You don't really care what D does and how we program in D. You immediately tout python's superiority in our faces. You impose a solution just because python does it. What about trying out D for real? Have you programmed in D even once? Do you know what is the stance of the community on exceptions? Do you know how we usually (do or don't) handle the null terminator? No, you don't. You're fishing and you have two big ones on your line, but that doesn't mean I'll bite. Like barging into someone's home suddenly without ever meeting them before and demanding they do things for you even though you don't even care that they do it.[...] Do you actually have a source or an example program that D actually does anything with the null terminatorI did not argue that D "does" something "with the null terminator". D like perl or Ada
Jul 31
This is quite a good example of why for PhobosV3 I want us to go through a FilePath abstraction, rather than accepting random strings for file names. This is indeed a security vulnerability, but it isn't on D's side. All system API's take in a null terminated string, when it should've been pointer + length. If someone has a problem with this currently, you can call ``isValidPath`` in ``std.path``, which will check for the null character. https://dlang.org/phobos/std_path.html#isValidPath
Jul 31
On Thursday, 31 July 2025 at 21:47:24 UTC, Richard (Rikki) Andrew Cattermole wrote:This is quite a good example of why for PhobosV3 I want us to go through a FilePath abstraction, rather than accepting random strings for file names.whats wrong with just changing toStringz? I dont understand the threat profile imagined by the infinitely wise cve org; but you could make toStringz: a) shorten it to the first null char b) replace null with something c) (bad idea but youll love it) assert when detecting extra null
Jul 31
On Thu, Jul 31, 2025 at 10:27:56PM +0000, monkyyy via Digitalmars-d wrote:On Thursday, 31 July 2025 at 21:47:24 UTC, Richard (Rikki) Andrew Cattermole wrote:[...] This is precisely what it currently does, and where the problem is. Consider: you have user input specifying a filename, and your code checks to make sure it doesn't overwrite a file it's not supposed to. As a contrived example, say you prohibit "/etc/passwd" as a filename. Now what happens when the user inputs "/etc/passwd\0ha_you_missed_me" as filename? The OS considers the NUL as the end of the filename, so your user gets access to "/etc/passwd" after all. Another example: maybe you have a database management program and your user specifies a filename to save the database, and you keep an AA of previously-used names which you check, so that the user doesn't stomp over another user's database file. Now consider what happens if user A inputs "database\01.db" and user B inputs "database\02.db". The OS considers the NUL as the end of the filename, so user A's file is actually "database", and so is user B's file, so user B can now overwrite user A's data. These are just some mild cases. I'm pretty sure if you think hard enough, you'll be able to come up with something that will open a security hole by exploiting the fact that what the D code thinks is the filename is different from what the OS thinks is the filename. T -- May you live all the days of your life. -- Jonathan SwiftThis is quite a good example of why for PhobosV3 I want us to go through a FilePath abstraction, rather than accepting random strings for file names.whats wrong with just changing toStringz? I dont understand the threat profile imagined by the infinitely wise cve org; but you could make toStringz: a) shorten it to the first null char
Jul 31
On Thursday, 31 July 2025 at 23:27:42 UTC, H. S. Teoh wrote:On Thu, Jul 31, 2025 at 10:27:56PM +0000, monkyyy via Digitalmars-d wrote:~~I think these are extremely contrived, passwd is the os's responsibly and I expect all oses have it behind root permissions; also backups~~ swapping toStringz to option 2 then? I think adr's FilePath may come from stealing code from rikki, and in thoery what rikki is suggesting is I call `File(FilePath("foo"))` I didnt like figuring it out; I wont actually store a filepath, I will write string then do whatever the error messages and instinct says will make the code I wrote compile If `"passwd\0hi".toStringz == &"passwd__NULL__DONT_DO_THAThi"[0]` it wont match `passwd`; whats the `file(filepath)` abstraction gaining over option 2 or 3On Thursday, 31 July 2025 at 21:47:24 UTC, Richard (Rikki) Andrew Cattermole wrote:[...] This is precisely what it currently does, and where the problem is. Consider: you have user input specifying a filename, and your code checks to make sure it doesn't overwrite a file it's not supposed to. As a contrived example, say you prohibit "/etc/passwd" as a filename. Now what happens when the user inputs "/etc/passwd\0ha_you_missed_me" as filename? The OS considers the NUL as the end of the filename, so your user gets access to "/etc/passwd" after all. Another example: maybe you have a database management program and your user specifies a filename to save the database, and you keep an AA of previously-used names which you check, so that the user doesn't stomp over another user's database file. Now consider what happens if user A inputs "database\01.db" and user B inputs "database\02.db". The OS considers the NUL as the end of the filename, so user A's file is actually "database", and so is user B's file, so user B can now overwrite user A's data. These are just some mild cases. I'm pretty sure if you think hard enough, you'll be able to come up with something that will open a security hole by exploiting the fact that what the D code thinks is the filename is different from what the OS thinks is the filename. TThis is quite a good example of why for PhobosV3 I want us to go through a FilePath abstraction, rather than accepting random strings for file names.whats wrong with just changing toStringz? I dont understand the threat profile imagined by the infinitely wise cve org; but you could make toStringz: a) shorten it to the first null char
Jul 31
On Thursday, 31 July 2025 at 23:27:42 UTC, H. S. Teoh wrote:As a contrived example, say you prohibit "/etc/passwd" as a filename. Now what happens when the user inputs "/etc/passwd\0ha_you_missed_me" as filename? The OS considers the NUL as the end of the filename, so your user gets access to "/etc/passwd" after all.If you need path validation, you probably will do more checks than null check, stdio won't cut it. See how path validation vulnerabilities work IRL: https://github.com/dagster-io/dagster/pull/30002
Aug 04
On Thursday, 31 July 2025 at 21:47:24 UTC, Richard (Rikki) Andrew Cattermole wrote:This is indeed a security vulnerability, but it isn't on D's side. All system API's take in a null terminated string, when it should've been pointer + length.The cause of the problem is the silent truncation when the conversion from D-string to C-const char * takes place. This is not a problem of the API which is in the case of POSIX is totally simple (compared to all the VMS/Windows APIs). There must be a reason why C++ uses const char* in its filesystem functions [1].If someone has a problem with this currently, you can call ``isValidPath`` in ``std.path``, which will check for the null character.Unfortunately isValidPath also flags perfectly legal filenames as invalid, e.g. such containing invalid UTF-8 sequences. [1] https://en.cppreference.com/w/cpp/header/iostream.html
Jul 31
On Thursday, 31 July 2025 at 22:38:41 UTC, kdevel wrote:Unfortunately isValidPath also flags perfectly legal filenames as invalid, e.g. such containing invalid UTF-8 sequences.I stand corrected. This is probably not true: ivp.d: ``` import std; void main (string [] args) { auto input = "a\xfc"; writeln (input, ": ", isValidPath (input)); } ``` $ ./ivp [latin9 mode console] aü: true
Jul 31
Am 31.07.25 um 23:47 schrieb Richard (Rikki) Andrew Cattermole:This is quite a good example of why for PhobosV3 I want us to go through a FilePath abstraction, rather than accepting random strings for file names. This is indeed a security vulnerability, but it isn't on D's side. All system API's take in a null terminated string, when it should've been pointer + length. If someone has a problem with this currently, you can call ``isValidPath`` in ``std.path``, which will check for the null character. https://dlang.org/phobos/std_path.html#isValidPathI strongly suggest to also at least distinguish between Windows and Posix path formats to avoid ambiguity issues. vibe.core.path defines `WindowsPath`, `PosixPath` and `InetPath`, as well as an OS dependent alias `NativePath` (aliases to either `WindowsPath` or `PosixPath`) to avoid having to think about this all the time. However, conversions between path formats are always explicit, with the appropriate validation taking place.
Aug 04
On Thursday, 31 July 2025 at 20:45:14 UTC, kdevel wrote:Some python code: def myfun (filename): open (filename, 'w') myfun ("a\0c") which when executed behaves in an exemplary manner: Traceback (most recent call last): File "./test.py", line 6, in <module> myfun ("a\0c") File "./test.py", line 4, in myfun open (filename, 'w') TypeError: file() argument 1 must be encoded string without null bytes, not str Other languages like D, perl or even Ada seem to let the embedded NUL character silently truncate the filename. This poses a considerable risk when the input to std.file functions is not controlled by the program author [1]. E.g. rmdirRecurse ("/\0/home/user/subdir"); [1] https://cwe.mitre.org/data/definitions/158.htmlhttps://github.com/dlang/phobos/blob/205256abb1f86faf986f8c789cb733ca4137246e/std/string.d#L368 are you sure? Im not entirely sure if this will always trigger but theres asserts here
Jul 31
On Thursday, 31 July 2025 at 22:36:42 UTC, monkyyy wrote:https://github.com/dlang/phobos/blob/205256abb1f86faf986f8c789cb733ca4137246e/std/string.d#L368 are you sure? Im not entirely sure if this will always trigger but theres asserts hereYeah, I am aware of this, because I ran into it on another project. (https://github.com/snazzy-d/sdc/pull/410) The issue is, that phobos is compiled without asserts. The OP's issue is looking at this the wrong way. The responsibility is on the user to validate their input. I don't know that the library needs to do this. Even if we did, core.stdc.stdio is still there, and we can't change that. In other words, the assert means *it's on the caller* to make sure they don't pass in a string with 0 terminators in it. The fact it is an assert is a clue that this is a programming error, not a validation error. If we checked for mid-string zero terminators on all calls to `toStringz`, we would kill performance where mostly it isn't necessary (this is only important if you don't trust where the data came from. This would lead to a different sort of problem ("How come D/C interop is so slow!?") What we need in phobos v3 is a set of validating features for user input so we give the user the tools necessary to do this correctly. -Steve
Jul 31
On Friday, 1 August 2025 at 00:11:51 UTC, Steven Schveighoffer wrote:The OP's issue is looking at this the wrong way. The responsibility is on the user to validate their input.Technically, any function calling `toStringz()` should probably inherit that warning from the doc comment of `toStringz` into its own documentation.
Jul 31
On Friday, 1 August 2025 at 00:11:51 UTC, Steven Schveighoffer wrote:If we checked for mid-string zero terminators on all calls to `toStringz`, we would kill performance where mostly it isn't necessaryI bet the speed test is effectively a tie if its reasonably written
Jul 31
On Friday, 1 August 2025 at 00:22:20 UTC, monkyyy wrote:On Friday, 1 August 2025 at 00:11:51 UTC, Steven Schveighoffer wrote:How do you "reasonably" write a linear search? Sure you can make it a faster O(n), but it's still O(n). Whereas just "add on a 0 if not there" is an O(1) operation. -SteveIf we checked for mid-string zero terminators on all calls to `toStringz`, we would kill performance where mostly it isn't necessaryI bet the speed test is effectively a tie if its reasonably written
Jul 31
On Friday, 1 August 2025 at 00:48:36 UTC, Steven Schveighoffer wrote:On Friday, 1 August 2025 at 00:22:20 UTC, monkyyy wrote:Lets say you have a few paragraph of text that you split by \n then you call toStringz to pass it to raylib(and you pay attention to the immutablity of strings and not replace \n with null) You can do better with char arrays and different data structures but the current api of the sorta maybe kinda dynamic array that airnt special case chars, immutablity, and c liking null termination all combine together to mean its probably making a copy. During that copy you could check.On Friday, 1 August 2025 at 00:11:51 UTC, Steven Schveighoffer wrote:How do you "reasonably" write a linear search? Sure you can make it a faster O(n), but it's still O(n). Whereas just "add on a 0 if not there" is an O(1) operation. -SteveIf we checked for mid-string zero terminators on all calls to `toStringz`, we would kill performance where mostly it isn't necessaryI bet the speed test is effectively a tie if its reasonably written
Jul 31
On Friday, 1 August 2025 at 00:11:51 UTC, Steven Schveighoffer wrote:[...] If we checked for mid-string zero terminators on all calls to `toStringz`, we would kill performance where mostly it isn't necessary (this is only important if you don't trust where the data came from. This would lead to a different sort of problem ("How come D/C interop is so slow!?")This is a strawman. I am writing about file system functions! Furthermode in the case of `std.file.rename` `toStringz` is not even called. What is used to convert to `char *` seems to be the highly interesting template `tempCString` in `std.internal.cstring`. And of course a library should not assert, nor exit nor ignore the error, but make it handleable: ``` def myfun (filename): open (filename, 'w') try: myfun ("a\0c") except TypeError: print ("error occurred") ```
Aug 01
On Friday, 1 August 2025 at 11:02:37 UTC, kdevel wrote:On Friday, 1 August 2025 at 00:11:51 UTC, Steven Schveighoffer wrote:I was responding to the link sent about `toStringz`. I don't think we can always check for internal 0 characters there. But in `tempCString`, we can, since we are always copying. However, this is delicate, because you will affect performance. A slice assign is `memcpy`, and `memcpy` is damn fast (and hard to reimplement). If instead you check every character, you will change to a for loop, which will be slow. I think the right answer here is to use [strncpy](https://cplusplus.com/reference/cstring/strncpy/). According to the docs, strncpy will copy up to N characters. But if a NUL character is reached before end of the string, then it zeroes the rest of the buffer. This means we can detect whether a 0 was inside the string by checking the last byte copied. So this really is a limitation of `tempCString`, and we can fix that. However, std.file could just have easily used `toStringz`. In terms of correctness, you are passing in a parameter that has different properties than the one ultimately sent to the underlying call. It's always a fight between correctness and performance here. I always prefer (if it is available) an API that does not rely on zero termination, but when it is not available, it needs to be checked. Where it is checked is important. I'm not sure that's the library's responsibility. In other words, if I pass in `rmdir("foo")`, then why should I pay the penalty of examining `"foo"` for malicious NUL bytes? The source is a literal, I can know whether it has them or not. In fact, there isn't even a way to pass in a `char*` to `rmdir`, which is unfortunate. We have the capability of making an API that allows for all cases -- user input, static data, validated data, actual C strings, etc. We shouldn't make all these go through the same super-defensive gauntlet.[...] If we checked for mid-string zero terminators on all calls to `toStringz`, we would kill performance where mostly it isn't necessary (this is only important if you don't trust where the data came from. This would lead to a different sort of problem ("How come D/C interop is so slow!?")This is a strawman. I am writing about file system functions! Furthermode in the case of `std.file.rename` `toStringz` is not even called. What is used to convert to `char *` seems to be the highly interesting template `tempCString` in `std.internal.cstring`.And of course a library should not assert, nor exit nor ignore the error, but make it handleable: ``` def myfun (filename): open (filename, 'w') try: myfun ("a\0c") except TypeError: print ("error occurred") ```Python is not D. It cannot do any kind of type introspection, and char* just isn't a thing, so it has to always go through the same path. We don't want to make D as slow as python. -Steve
Aug 01
On Friday, 1 August 2025 at 17:53:17 UTC, Steven Schveighoffer wrote:If instead you check every character, you will change to a for loop, which will be slow. I think the right answer here is to use [strncpy](https://cplusplus.com/reference/cstring/strncpy/). According to the docs, strncpy will copy up to N characters. But if a NUL character is reached before end of the string, then it zeroes the rest of the buffer. This means we can detect whether a 0 was inside the string by checking the last byte copied.https://github.com/dlang/phobos/issues/10836 Would be quite an easy fix if someone wants to tackle it. -Steve
Aug 01
On Friday, 1 August 2025 at 17:53:17 UTC, Steven Schveighoffer wrote:[...] It's always a fight between correctness and performance here.Pardon?[...] I'm not sure that's the library's responsibility. In other words, if I pass in `rmdir("foo")`, then why should I pay the penalty of examining `"foo"` for malicious NUL bytes?The `NUL` is not representable in `char *`. But all phobos filesystem functions use unadorned `string` parameters for pathnames. Who else if not the designer of the library shall be responsible for coding (not paying!) the extra cpu cycles? If the library expects pathnames without embedded `NUL`s I would create a subtype of `string`, say `fstring` or `cstring`. One then has to discuss which of these calls ``` rmdir ("a\0b"); //with the technique from [1] rmdir (fstring ("a\0b")); rmdir (cast (fstring) ("a\0b")); ``` shall compile and what one expects as runtime behavior. [1] *Implicit type conversion of an argument when a function is called* https://forum.dlang.org/thread/agstjpezerwlgdhphclk forum.dlang.org
Aug 02
On Sunday, 3 August 2025 at 00:03:47 UTC, kdevel wrote:On Friday, 1 August 2025 at 17:53:17 UTC, Steven Schveighoffer wrote:I mean in terms of where you apply the checks. You can say "I expect you to have validated this before sending it in", or you can say "I will always validate this, even if you already have, in the name of correctness." And I suppose a better word should have been "tradeoff" and not "fight".[...] It's always a fight between correctness and performance here.Pardon?The issue is going to be fixed, not sure if you saw my issue report. It's quite an easy fix actually. But just so you know, C also allows passing in C strings with embedded null characters: ```c #include <unistd.h> int main() { rmdir("hello\0world"); return 0; } ``` And we do expose `core.stdc.posix.unistd`. -Steve[...] I'm not sure that's the library's responsibility. In other words, if I pass in `rmdir("foo")`, then why should I pay the penalty of examining `"foo"` for malicious NUL bytes?The `NUL` is not representable in `char *`. But all phobos filesystem functions use unadorned `string` parameters for pathnames. Who else if not the designer of the library shall be responsible for coding (not paying!) the extra cpu cycles? If the library expects pathnames without embedded `NUL`s I would create a subtype of `string`, say `fstring` or `cstring`. One then has to discuss which of these calls
Aug 03
On Sunday, 3 August 2025 at 21:01:05 UTC, Steven Schveighoffer wrote:On Sunday, 3 August 2025 at 00:03:47 UTC, kdevel wrote:I like the fragment "[...] it has to be correct before being fast."On Friday, 1 August 2025 at 17:53:17 UTC, Steven Schveighoffer wrote:I mean in terms of where you apply the checks. You can say "I expect you to have validated this before sending it in", or you can say "I will always validate this, even if you already have, in the name of correctness." And I suppose a better word should have been "tradeoff" and not "fight".[...] It's always a fight between correctness and performance here.Pardon?[...] The issue is going to be fixed, not sure if you saw my issue report. It's quite an easy fix actually.Thanks but sorry I could not yet take a look.But just so you know, C also allows passing in C strings with embedded null characters: ```c #include <unistd.h> int main() { rmdir("hello\0world"); return 0; } ```On the one hand there is no such thing as a C string with embedded `NUL` (because the `NUL` terminates the string). On the other hand the `rmdir` function takes `const char *pathname`. Your code contains an array argument (`"hello\0world"`). According to C's rules this array decays into a pointer *before* `rmdir` is invoked. Thus one can hardly say that a string with embedded `NUL` is passed to the function.And we do expose `core.stdc.posix.unistd`.Yes. But in the D-function in question is `std.file.rmdir`. I copied the relevant code: ```D import std.stdio : writeln; import std.range : isSomeFiniteCharInputRange; import std.traits : isConvertibleToString; void rmdir (R) (R pathname) if (isSomeFiniteCharInputRange!R && !isConvertibleToString!R) { writeln (__PRETTY_FUNCTION__); writeln (typeid (pathname)); writeln (pathname.length); writeln (pathname); } void rmdir(R)(auto ref R pathname) if (isConvertibleToString!R) { writeln (__PRETTY_FUNCTION__); writeln (typeid (pathname)); writeln (pathname.length); writeln (pathname); } void main () { rmdir ("a\0b"); } ``` which prints ``` void rmdir.rmdir!string.rmdir(string pathname) immutable(char)[] 3 ab <--- there is a NUL between a and b ``` I.e. the function really takes the string with the embedded `NUL`.
Aug 04
On Tuesday, 5 August 2025 at 02:47:22 UTC, kdevel wrote:On Sunday, 3 August 2025 at 21:01:05 UTC, Steven Schveighoffer wrote:TL;DR: We can fix `tempCString` since we are always copying the whole thing. So the issue is moot at that point.[...] The issue is going to be fixed, not sure if you saw my issue report. It's quite an easy fix actually.Thanks but sorry I could not yet take a look.C does not give me an error for this code. Regardless of what it "technically" is doing, you have the same issue there -- C does not warn you or error on this concern. So it is identically vulnerable to your test case.But just so you know, C also allows passing in C strings with embedded null characters: ```c #include <unistd.h> int main() { rmdir("hello\0world"); return 0; } ```On the one hand there is no such thing as a C string with embedded `NUL` (because the `NUL` terminates the string). On the other hand the `rmdir` function takes `const char *pathname`. Your code contains an array argument (`"hello\0world"`). According to C's rules this array decays into a pointer *before* `rmdir` is invoked. Thus one can hardly say that a string with embedded `NUL` is passed to the function.C already needs to caution people not to call this function with embedded null string array. D can also do the same thing. My point is, I don't imagine we need to "fix" the unistd binding. Again, who's responsible for verification is a tradeoff. In this case, we have an obvious answer for std.file.rmdir, because we can fix `tempCString`. But it also could be solved by just identifying what happens if you pass in such a string, and put the onus on the user to check. -SteveAnd we do expose `core.stdc.posix.unistd`.Yes. But in the D-function in question is `std.file.rmdir`.
Aug 04
On Thursday, 31 July 2025 at 20:45:14 UTC, kdevel wrote:Some python code: def myfun (filename): open (filename, 'w') myfun ("a\0c") which when executed behaves in an exemplary manner: Traceback (most recent call last): File "./test.py", line 6, in <module> myfun ("a\0c") File "./test.py", line 4, in myfun open (filename, 'w') TypeError: file() argument 1 must be encoded string without null bytes, not str Other languages like D, perl or even Ada seem to let the embedded NUL character silently truncate the filename. This poses a considerable risk when the input to std.file functions is not controlled by the program author [1]. E.g. rmdirRecurse ("/\0/home/user/subdir"); [1] https://cwe.mitre.org/data/definitions/158.htmlRikki pointed out to me that I was blind. I apologize for my tone. You did provide that code that proved your assertion. I was wrong to say what I said and I take back everything I said. I will now retire myself from this discussion.
Jul 31