digitalmars.D.learn - dirEntries removes entire branches of empty directories
- Ali Çehreli (25/25) Nov 09 2022 In case it matters, the file system is ext4.
- Vladimir Panteleev (37/39) Nov 09 2022 That's not what happens for me:
- Ali Çehreli (8/12) Nov 10 2022 Does not happen for me today either. (?) I must have confused myself
- Imperatorn (2/17) Nov 10 2022 Oh, did you run the program on Wednesday? Fool!
- H. S. Teoh (7/27) Nov 10 2022 I think it was because yesterday MSFT stock dipped, but today it rose by
- kdevel (19/25) Nov 09 2022 What do strace/ltrace say?
- Imperatorn (3/6) Nov 09 2022 That's not the behaviour I get in Windows.
- Ali Çehreli (4/6) Nov 09 2022 struct DirIteratorImpl has different implementations for Windows, etc.
- Imperatorn (2/10) Nov 09 2022 Anyway, it's definitely a bug in that implementation
- Ali Çehreli (10/11) Nov 09 2022 Me, me, me! :) I've learned about the Posix function 'nftw' (but I am
- Imperatorn (2/7) Nov 09 2022 👻
- Ali Çehreli (127/128) Nov 10 2022 Now that we know that dirEntries works properly, I decided not to use ft...
- kdevel (43/64) Nov 11 2022 dmd -O compiled patched (see below!) version applied to /usr/bin
- Ali Çehreli (27/47) Nov 11 2022 Great. I did not use -O with my test. It may have to do something with
- Ali Çehreli (4/5) Nov 11 2022 I meant "the reason you got a much better improvement" may have to do
- kdevel (9/19) Nov 14 2022 It has to do with the large number of symlinks. When I use
- kdevel (57/62) Nov 14 2022 When I examine the process with strace it appears that the ftw
- Ali Çehreli (4/5) Nov 26 2022 Created an enhancement request:
- Vladimir Panteleev (5/6) Nov 29 2022 Yes, `dirEntries` isn't as fast as it could be.
In case it matters, the file system is ext4.

1) Create a directory:

    mkdir deleteme

and then run the following program:

    import std;

    void main() {
        foreach (e; dirEntries(absolutePath("./deleteme"), SpanMode.breadth)) {
            writeln(e.name);
        }
    }

Understandably, the top level directory 'deleteme' will not be printed.

2) Make a sub-directory:

    mkdir deleteme/a

Running the program shows no output; 'a' is not visited as a directory entry.

3) Create a file inside the sub-directory:

    touch deleteme/a/x

Now the program will show 2 entries; the branch is accessible:

    /home/ali/d/./deleteme/a
    /home/ali/d/./deleteme/a/x

Imagine a program that wants to make sure the directory structure is intact, even the empty directories should exist. Can you think of a workaround to achieve that?

Do you think this is buggy behavior for dirEntries?

Ali
Nov 09 2022
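For the stated goal (making sure that even empty directories exist), one possible approach is to let `dirEntries` report every directory and test each one with a shallow listing. This is only a minimal sketch: `emptyDirsUnder` and the demo directory name are hypothetical, and it assumes a writable temp directory.

```d
import std.file : DirEntry, dirEntries, SpanMode;

/// Returns the directories below `root` that contain no entries at all.
/// Hypothetical helper for illustration; not a Phobos function.
string[] emptyDirsUnder(string root)
{
    string[] result;
    foreach (DirEntry e; dirEntries(root, SpanMode.breadth))
    {
        // A shallow listing that yields nothing means the directory is empty.
        if (e.isDir && dirEntries(e.name, SpanMode.shallow).empty)
            result ~= e.name;
    }
    return result;
}

void main()
{
    import std.file : mkdirRecurse, rmdirRecurse, tempDir, write;
    import std.path : buildPath;
    import std.stdio : writeln;

    // Assumed scratch location for the demo tree.
    string root = buildPath(tempDir, "deleteme_empty_dirs_demo");
    mkdirRecurse(buildPath(root, "a"));   // stays empty
    mkdirRecurse(buildPath(root, "b"));
    write(buildPath(root, "b", "x"), ""); // makes 'b' non-empty
    scope (exit) rmdirRecurse(root);

    writeln(emptyDirsUnder(root));        // should list only the 'a' directory
}
```

The extra shallow listing costs one more directory open per directory, which is acceptable for a check but not tuned for speed.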
On Wednesday, 9 November 2022 at 19:05:58 UTC, Ali Çehreli wrote:
> Running the program shows no output; 'a' is not visited as a directory entry.

That's not what happens for me:

```d
import std.exception;
import std.file;
import std.path;
import std.stdio;

void ls()
{
    foreach (e; dirEntries(absolutePath("./deleteme"), SpanMode.breadth))
    {
        writeln(e.name);
    }
}

void main()
{
    "./deleteme".rmdirRecurse.collectException;
    "./deleteme".mkdir();

    writeln("empty");
    ls();

    writeln("only a directory");
    mkdir("./deleteme/a");
    ls();

    writeln("directory and file");
    std.file.write("./deleteme/a/x", "");
    ls();
}
```

Locally and on run.dlang.io I get:

```
empty
only a directory
/sandbox/./deleteme/a
directory and file
/sandbox/./deleteme/a
/sandbox/./deleteme/a/x
```
Nov 09 2022
On 11/9/22 11:30, Vladimir Panteleev wrote:
> On Wednesday, 9 November 2022 at 19:05:58 UTC, Ali Çehreli wrote:
>> Running the program shows no output; 'a' is not visited as a directory entry.
>
> That's not what happens for me:

Does not happen for me today either. (?) I must have confused myself both with my actual program and with a trivial isolated program that I had written to test it.

Unless others have seen the same behavior yesterday there is no bug here today. :p

Ali
"walks away with a confused look on his face"
Nov 10 2022
On Thursday, 10 November 2022 at 16:34:53 UTC, Ali Çehreli wrote:
> On 11/9/22 11:30, Vladimir Panteleev wrote:
>> On Wednesday, 9 November 2022 at 19:05:58 UTC, Ali Çehreli wrote:
>>> Running the program shows no output; 'a' is not visited as a directory entry.
>>
>> That's not what happens for me:
>
> Does not happen for me today either. (?) I must have confused myself both with my actual program and with a trivial isolated program that I had written to test it.
>
> Unless others have seen the same behavior yesterday there is no bug here today. :p
>
> Ali
> "walks away with a confused look on his face"

Oh, did you run the program on Wednesday? Fool!
Nov 10 2022
On Thu, Nov 10, 2022 at 07:07:33PM +0000, Imperatorn via Digitalmars-d-learn wrote:
> On Thursday, 10 November 2022 at 16:34:53 UTC, Ali Çehreli wrote:
>> On 11/9/22 11:30, Vladimir Panteleev wrote:
>>> On Wednesday, 9 November 2022 at 19:05:58 UTC, Ali Çehreli wrote:
>>>> Running the program shows no output; 'a' is not visited as a directory entry.
>>>
>>> That's not what happens for me:
>>
>> Does not happen for me today either. (?) I must have confused myself both with my actual program and with a trivial isolated program that I had written to test it.
>>
>> Unless others have seen the same behavior yesterday there is no bug here today. :p
>>
>> Ali
>> "walks away with a confused look on his face"
>
> Oh, did you run the program on Wednesday? Fool!

I think it was because yesterday MSFT stock dipped, but today it rose by 15, so Windows is working properly again. :-P

T

-- 
"You are a very disagreeable person." "NO."
Nov 10 2022
On Wednesday, 9 November 2022 at 19:05:58 UTC, Ali Çehreli wrote:
> In case it matters, the file system is ext4.

My code runs in tmp (tmpfs).

> 2) Make a sub-directory:
>
>     mkdir deleteme/a
>
> Running the program shows no output; 'a' is not visited as a directory entry.

What do strace/ltrace say?

```didi.d
import std.stdio;
import std.file;

void main (string [] args)
{
   auto de = dirEntries (args[1], SpanMode.breadth);
   foreach (e; de)
      writeln(e.name);
}
```

```
$ mkdir -p deleteme/a
$ dmd didi
$ ./didi deleteme
deleteme/a
```

> Do you think this is buggy behavior for dirEntries?

Sure.
Nov 09 2022
On Wednesday, 9 November 2022 at 19:05:58 UTC, Ali Çehreli wrote:
> In case it matters, the file system is ext4.
>
> 1) Create a directory: [...]

That's not the behaviour I get in Windows.

When I create the subdirectory, I see it even if it's empty
Nov 09 2022
On 11/9/22 11:48, Imperatorn wrote:
> That's not the behaviour I get in Windows.

Windows users deserve it! :p (At least it is better in this case. :) )

> When I create the subdirectory, I see it even if it's empty

struct DirIteratorImpl has different implementations for Windows, etc.

Ali
Nov 09 2022
On Wednesday, 9 November 2022 at 19:59:57 UTC, Ali Çehreli wrote:
> On 11/9/22 11:48, Imperatorn wrote:
>> That's not the behaviour I get in Windows.
>
> Windows users deserve it! :p (At least it is better in this case. :) )
>
>> When I create the subdirectory, I see it even if it's empty
>
> struct DirIteratorImpl has different implementations for Windows, etc.
>
> Ali

Anyway, it's definitely a bug in that implementation
Nov 09 2022
On 11/9/22 11:05, Ali Çehreli wrote:
> Can you think of a workaround to achieve that?

Me, me, me! :) I've learned about the Posix function 'nftw' (but I am using its sibling 'ftw').

It was pretty easy to use but there is a quality issue there: They failed to support a 'void*' context for the user! You can walk the tree but can't put the results into your local context! Boo!

I guess it was designed by someone who is happy with global variables. :) At least D makes it easy to guard access to module variables with 'synchronized', shared, etc.

Ali
Nov 09 2022
On Wednesday, 9 November 2022 at 20:06:15 UTC, Ali Çehreli wrote:
> On 11/9/22 11:05, Ali Çehreli wrote:
>
> It was pretty easy to use but there is a quality issue there: They failed to support a 'void*' context for the user! You can walk the tree but can't put the results into your local context! Boo!

👻
Nov 09 2022
On 11/9/22 12:06, Ali Çehreli wrote:
> I am using its sibling 'ftw'

Now that we know that dirEntries works properly, I decided not to use ftw.

However, ftw performs about twice as fast as dirEntries (despite some common code in the implementation below). I am leaving it here in case somebody finds it useful. (Why don't I put it on github then; ok, some day I will.)

    import core.sys.posix.sys.stat;
    import std.algorithm;
    import std.exception;
    import std.file;
    import std.path;
    import std.range;
    import std.string;

    // The Posix "file tree walker" function
    extern (C)
    int ftw(const char *dirpath,
            int function (const char *fpath, const stat_t *sb, int typeflag) fn,
            int nopenfd);

    enum TypeFlag {
        FTW_F,    // regular file
        FTW_D,    // directory
        // See 'man nftw' or /usr/include/ftw.h for the other values
    }

    struct DirectoryEntry {
        string name;
        ulong size;
    }

    struct WalkResult {
        DirectoryEntry[] entries;
        string[] emptyDirs;
    }

    WalkResult directoryWalk_ftw(string root) {
        WalkResult impl_() {
            // These have to be 'static' because ftw() does not allow us to pass a
            // context. And that's why this function must only be called from a
            // synchronized block.
            static DirectoryEntry[] entries;
            static string[] dirs;

            entries.length = 0;
            entries.assumeSafeAppend();

            dirs.length = 0;
            dirs.assumeSafeAppend();

            // This is the callback that ftw() uses.
            extern (C)
            int handler(const char *fpath, const stat_t *sb, int typeflag) {
                const path = fpath.fromStringz.idup;

                switch (typeflag) {
                case TypeFlag.FTW_F:
                    entries ~= DirectoryEntry(path, sb.st_size);
                    break;

                case TypeFlag.FTW_D:
                    dirs ~= path;
                    break;

                default:
                    import std.stdio;
                    writefln!"Ignoring type %s file: %s\n(See 'man nftw')b"(
                        path, typeflag);
                    break;
                }

                return 0;
            }

            // The tree walk will be faster up-to this "search depth" (See 'man nftw')
            enum nopenfd = 32;

            const ret = ftw(root.toStringz, &handler, nopenfd);
            enforce(ret == 0,
                    format!"Failed walking the directory tree at %s; error: %s"(
                        root, ret));

            string[] nonEmptyDirs = chain(entries.map!(e => e.name), dirs)
                                    .map!dirName
                                    .array
                                    .sort
                                    .uniq
                                    .array;
            sort(dirs);

            string[] emptyDirs = setDifference(dirs, nonEmptyDirs)
                                 .array;

            return WalkResult(entries.dup, emptyDirs);
        }

        synchronized {
            return impl_();
        }
    }

    WalkResult directoryWalk_dirEntries(string root) {
        DirectoryEntry[] entries;
        string[] dirs;

        foreach (entry; dirEntries(root, SpanMode.depth)) {
            if (entry.isDir) {
                dirs ~= entry;

            } else {
                entries ~= DirectoryEntry(entry, entry.getSize);
            }
        }

        string[] nonEmptyDirs = chain(entries.map!(e => e.name), dirs)
                                .map!dirName
                                .array
                                .sort
                                .uniq
                                .array;
        sort(dirs);

        string[] emptyDirs = setDifference(dirs, nonEmptyDirs)
                             .array;

        return WalkResult(entries.dup, emptyDirs);
    }

    int main(string[] args) {
        import std.datetime.stopwatch;
        import std.stdio;
        import std.path;

        if (args.length != 2) {
            stderr.writefln!"Please provide the directory to walk:\n\n  %s <directory>\n"
                (args[0].baseName);
            return 1;
        }

        const dir = buildNormalizedPath("/home/ali/dlang");

        auto timings = benchmark!({ directoryWalk_ftw(dir); },
                                  { directoryWalk_dirEntries(dir); })(10);

        writefln!("ftw       : %s\n" ~
                  "dirEntries: %s")(timings[0], timings[1]);

        return 0;
    }

Ali
Nov 10 2022
On Thursday, 10 November 2022 at 21:27:28 UTC, Ali Çehreli wrote:
> On 11/9/22 12:06, Ali Çehreli wrote:
>> I am using its sibling 'ftw'
>
> Now that we know that dirEntries works properly, I decided not to use ftw.
>
> However, ftw performs about twice as fast as dirEntries (despite some common code in the implementation below).

dmd -O compiled patched (see below!) version applied to /usr/bin on my desktop yields:

    ftw       : 363 ms, 750 μs, and 5 [*]
    dirEntries: 18 secs, 831 ms, 738 μs, and 3 [*]

(* = offending units removed)

> [...]
>     foreach (entry; dirEntries(root, SpanMode.depth)) {
>         if (entry.isDir) {
>             dirs ~= entry;
>
>         } else {
>             entries ~= DirectoryEntry(entry, entry.getSize);
>         }

strace reports that entry.getSize invokes stat on the file a second time. Isn't the stat buf saved in the entry?

This also gives rise to a complication with symlinks pointing to the directory which contains them:

    $ pwd
    /tmp/k/sub
    $ ln -s . foo
    $ ../direntrybenchmark .
    std.file.FileException@[...]/linux/bin64/../../src/phobos/std/file.d(1150): ./foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo: Too many levels of symbolic links
    [...]

> [...]
>     if (args.length != 2) {
>         stderr.writefln!"Please provide the directory to walk:\n\n  %s <directory>\n"
>             (args[0].baseName);
>         return 1;
>     }
>
>     const dir = buildNormalizedPath("/home/ali/dlang");

    diff --git a/direntrybenchmark.d b/direntrybenchmark.d
    index 661df51..a9a5616 100644
    --- a/direntrybenchmark.d
    +++ b/direntrybenchmark.d
    @@ -102,8 +102,9 @@ WalkResult directoryWalk_dirEntries(string root) {
             if (entry.isDir) {
                 dirs ~= entry;
    -        } else {
    -            entries ~= DirectoryEntry(entry, entry.getSize);
    +        }
    +        else {
    +            entries ~= DirectoryEntry(entry, 0);
             }
         }
    @@ -133,7 +134,7 @@ int main(string[] args) {
             return 1;
         }
    -    const dir = buildNormalizedPath("/home/ali/dlang");
    +    const dir = buildNormalizedPath(args[1]);
         auto timings = benchmark!({ directoryWalk_ftw(dir); },
                                   { directoryWalk_dirEntries(dir); })(10);
Nov 11 2022
On 11/11/22 05:13, kdevel wrote:
> dmd -O compiled patched (see below!) version applied to /usr/bin on my desktop yields:
>
>     ftw       : 363 ms, 750 μs, and 5 [*]
>     dirEntries: 18 secs, 831 ms, 738 μs, and 3 [*]

Great. I did not use -O with my test. It may have to do something with the performance of the hard disk.

ftw wins big time. Being just a D binding of a C library function, its compilation should be quick too.

>>             entries ~= DirectoryEntry(entry, entry.getSize);
>>         }
>
> strace reports that entry.getSize invokes stat on the file a second time. Isn't the stat buf saved in the entry?

That's my bad. entry.size is the cached version of the file size.

> This also gives rise to a complication with symlinks pointing to the directory which contains them:
>
>     $ pwd
>     /tmp/k/sub
>     $ ln -s . foo
>     $ ../direntrybenchmark .
>     std.file.FileException@[...]/linux/bin64/../../src/phobos/std/file.d(1150): ./foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo: Too many levels of symbolic links

So, ftw does not have that problem? Perhaps because of its default symlink behavior? There is also the more capable nftw, where the caller can specify some flags. And yes, there it is:

    FTW_PHYS
        If set, do not follow symbolic links. (This is what you want.)
        If not set, symbolic links are followed, but no file is reported twice.

        If FTW_PHYS is not set, but FTW_DEPTH is set, then the function fn()
        is never called for a directory that would be a descendant of itself.

> -    const dir = buildNormalizedPath("/home/ali/dlang");
> +    const dir = buildNormalizedPath(args[1]);

That one, and I had switched the arguments on the following call. One more example where string interpolation would be useful:

    writefln!"Ignoring type %s file: %s\n(See 'man nftw')b"(
        path, typeflag);

I meant the arguments in the reverse order there.

OT: And there is a 'b' character at the end of that format string which almost certainly appeared when I botched a Ctrl-b command in my editor. :)

Ali
Nov 11 2022
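The `FTW_PHYS` behaviour quoted from the man page can be tried from D with a hand-written `nftw` declaration, in the same style as the `ftw` declaration earlier in the thread. Treat the following as a sketch under stated assumptions rather than a vetted binding: the `FTW` struct layout and the values of `FTW_PHYS` and `FTW_SL` are taken from glibc's `<ftw.h>` and are assumed to apply to 64-bit Linux; `onEntry`, `visited`, `symlinks` and the demo directory name are hypothetical.

```d
import core.sys.posix.sys.stat : stat_t;
import std.string : fromStringz, toStringz;

// Hand-written declarations. Layout and values are assumptions based on
// glibc's <ftw.h> on 64-bit Linux; check /usr/include/ftw.h on your system.
struct FTW { int base; int level; }

extern (C)
int nftw(const char* dirpath,
         int function(const char* fpath, const stat_t* sb, int typeflag, FTW* ftwbuf) fn,
         int nopenfd, int flags);

enum FTW_SL = 4;   // typeflag: entry is a symbolic link (assumed glibc value)
enum FTW_PHYS = 1; // flag: do not follow symbolic links (assumed glibc value)

// Like ftw, nftw offers no user context pointer, so the results are kept in
// module-level (thread-local) variables.
string[] visited;
string[] symlinks;

extern (C)
int onEntry(const char* fpath, const stat_t* sb, int typeflag, FTW* ftwbuf)
{
    const path = fpath.fromStringz.idup;
    visited ~= path;
    if (typeflag == FTW_SL)
        symlinks ~= path;
    return 0; // zero tells nftw to continue the walk
}

void main()
{
    import std.file : mkdirRecurse, rmdirRecurse, symlink, tempDir, write;
    import std.path : buildPath;
    import std.stdio : writeln;

    string root = buildPath(tempDir, "deleteme_nftw_demo");
    mkdirRecurse(root);
    scope (exit) rmdirRecurse(root);
    write(buildPath(root, "x"), "");
    symlink(".", buildPath(root, "foo")); // same loop as 'ln -s . foo' above

    const ret = nftw(root.toStringz, &onEntry, 32, FTW_PHYS);
    assert(ret == 0);
    writeln("visited : ", visited);
    writeln("symlinks: ", symlinks);
}
```

With `FTW_PHYS` set, the self-referencing link is expected to be reported once as a symlink instead of being descended into.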
On 11/11/22 08:00, Ali Çehreli wrote:
> It may have to do something with the performance of the hard disk.

I meant "the reason you got a much better improvement" may have to do something with the performance differences of your hard disk and mine.

Ali
Nov 11 2022
On Friday, 11 November 2022 at 16:00:12 UTC, Ali Çehreli wrote:
> On 11/11/22 05:13, kdevel wrote:
>> dmd -O compiled patched (see below!) version applied to /usr/bin on my desktop yields:
>>
>>     ftw       : 363 ms, 750 μs, and 5 [*]
>>     dirEntries: 18 secs, 831 ms, 738 μs, and 3 [*]
>
> Great. I did not use -O with my test. It may have to do something with the performance of the hard disk.

It has to do with the large number of symlinks. When I use

    dirEntries(root, SpanMode.depth, false)

the runtime is dramatically reduced and with

    entries ~= DirectoryEntry(entry, entry.size);

the runtimes are

    ftw       : 98 ms, 470 μs, and 2 *beeep*
    dirEntries: 170 ms, 515 μs, and 2 *beeep*

(to be continued)
Nov 14 2022
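The two changes described above (passing `false` for the `followSymlink` argument of `dirEntries` and reading the size through `DirEntry.size`) can be put together as follows. This is a minimal sketch, not the benchmark code from the thread: `FileInfo`, `listFiles` and the demo directory name are hypothetical, and skipping symlinks entirely is a choice made here for simplicity.

```d
import std.file : DirEntry, dirEntries, SpanMode;

struct FileInfo { string name; ulong size; } // hypothetical record type

/// Lists regular (non-symlink, non-directory) entries below `root`.
FileInfo[] listFiles(string root)
{
    FileInfo[] files;
    // Third argument is followSymlink: false means symlinked directories
    // are not descended into, so a link such as 'foo -> .' cannot loop.
    foreach (DirEntry e; dirEntries(root, SpanMode.depth, false))
    {
        if (e.isSymlink || e.isDir)
            continue;
        // DirEntry.size reuses the entry's own stat data, unlike
        // std.file.getSize(e.name), which looks the file up again.
        files ~= FileInfo(e.name, e.size);
    }
    return files;
}

void main()
{
    import std.file : mkdirRecurse, rmdirRecurse, symlink, tempDir, write;
    import std.path : buildPath;
    import std.stdio : writeln;

    string root = buildPath(tempDir, "deleteme_nofollow_demo");
    mkdirRecurse(root);
    scope (exit) rmdirRecurse(root);
    write(buildPath(root, "x"), "hello"); // a 5-byte file
    symlink(".", buildPath(root, "foo")); // Posix only

    writeln(listFiles(root));
}
```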
On Monday, 14 November 2022 at 21:05:01 UTC, kdevel wrote:
> [...]
> the runtimes are
>
>     ftw       : 98 ms, 470 μs, and 2 *beeep*
>     dirEntries: 170 ms, 515 μs, and 2 *beeep*
>
> (to be continued)

When I examine the process with strace it appears that the ftw version gets the whole information from readdir alone.

The dirEntries version seems to call lstat on every file (in order to check that it is not a symlink)

    Breakpoint 1, 0xf7cc59d4 in lstat64 () from [...]gcc-12.1/lib/libgphobos.so.3
    (gdb) bt
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    (root=..., dump=false) at direntrybenchmark.d:111

and after that an additional stat on the same file in order to check if it is a directory:

    Breakpoint 2, 0xf7cc5954 in stat64 () from [...]gcc-12.1/lib/libgphobos.so.3
    (gdb) bt
    from [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    (root=..., dump=<optimized out>) at direntrybenchmark.d:112
    direntrybenchmark.d:158
    at /md11/sda2-usr2l/gcc-12.1/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/d/std/datetime/stopwatch.d:421
    __applyArg1=...) at direntrybenchmark.d:162
    [...]gcc-12.1/lib/libgphobos.so.3
    direntrybenchmark.d:161
Nov 14 2022
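To illustrate how a walk can rely on readdir alone, here is a rough sketch that classifies entries by the `d_type` field of `dirent` and never calls stat or lstat. It is neither how ftw nor how `dirEntries` is implemented; it assumes Linux with glibc and a file system that fills in `d_type`, and `walk` and the demo directory name are hypothetical.

```d
import core.sys.posix.dirent;
import std.string : fromStringz, toStringz;

/// Collects directories and all other entries below `dir`, using only the
/// type information that readdir returns. Hypothetical helper.
void walk(string dir, ref string[] dirs, ref string[] others)
{
    DIR* handle = opendir(dir.toStringz);
    if (handle is null)
        return; // unreadable directory: silently skipped in this sketch
    scope (exit) closedir(handle);

    for (dirent* ent = readdir(handle); ent !is null; ent = readdir(handle))
    {
        string name = fromStringz(ent.d_name.ptr).idup;
        if (name == "." || name == "..")
            continue;

        string path = dir ~ "/" ~ name;
        if (ent.d_type == DT_DIR)
        {
            dirs ~= path;
            walk(path, dirs, others); // symlinks report DT_LNK, so no loops
        }
        else
        {
            // A DT_UNKNOWN entry would need an lstat fallback on file
            // systems that do not report types; omitted here.
            others ~= path;
        }
    }
}

void main()
{
    import std.file : mkdirRecurse, rmdirRecurse, symlink, tempDir, write;
    import std.path : buildPath;
    import std.stdio : writeln;

    string root = buildPath(tempDir, "deleteme_readdir_demo");
    mkdirRecurse(buildPath(root, "a"));
    mkdirRecurse(buildPath(root, "b"));
    scope (exit) rmdirRecurse(root);
    write(buildPath(root, "b", "x"), "");
    symlink(".", buildPath(root, "foo"));

    string[] dirs, others;
    walk(root, dirs, others);
    writeln("dirs  : ", dirs);
    writeln("others: ", others);
}
```

File sizes are not available this way; a caller that needs them still has to stat each file once.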
On 11/14/22 14:41, kdevel wrote:
> the ftw version gets the whole information from readdir alone.

Created an enhancement request:

  https://issues.dlang.org/show_bug.cgi?id=23512

Ali
Nov 26 2022
On Thursday, 10 November 2022 at 21:27:28 UTC, Ali Çehreli wrote:
> However, ftw performs about twice as fast as dirEntries

Yes, `dirEntries` isn't as fast as it could be.

Here is a directory iterator which tries to strictly not do more work than what it must:

https://github.com/CyberShadow/ae/blob/86b016fd258ebc26f0da3239a6332c4ebecd3215/sys/file.d#L178
Nov 29 2022