digitalmars.D - A few measurements of stat()'s speed
- Andrei Alexandrescu (30/30) Mar 26 2019 The current process of searching for imports spans the following
- H. S. Teoh (29/44) Mar 26 2019 [...]
- Andrei Alexandrescu (10/34) Mar 26 2019 Because testing takes 10 minutes and implementation takes one day or
- Jonathan Marler (44/90) Mar 26 2019 I've included a script below to generate and run a performance
- Vladimir Panteleev (34/42) Mar 26 2019 I have some related experience with this:
- Andrei Alexandrescu (12/56) Mar 26 2019 That's solid, thanks very much!
- Kagamin (3/7) Mar 28 2019 On windows it calls GetFileAttributesW.
- Bastiaan Veelo (6/11) Mar 27 2019 It could be interesting to know whether timings on Windows are
- Andrei Alexandrescu (23/34) Mar 27 2019 Really simple. Here's the C code Eduard and I used. Run it a few times
- Bastiaan Veelo (16/52) Mar 27 2019 On Windows 10, i7-7700HQ, M.2 SSD, provided I did things right, I
- Bastiaan Veelo (2/4) Mar 27 2019 File system is NTFS.
The current process of searching for imports spans the following
directories:

* the current directory
* each of the paths specified on the command line with -I, in that order
* each of the paths specified in DFLAGS, in that order

For each of these paths, first the ".di" extension is tried, then the
".d" extension. The function used is stat(). For the majority of cases
the ".di" file doesn't exist, so at least 50% of the stat() calls fail.
The number of failed stat() calls increases with the number of -I flags,
i.e. with the size of the project. (For std imports, that means each
will be looked up twice in each of the project directories.)

One alternative would be to use opendir()/readdir()/closedir() once for
each directory searched, and cache the directory's contents. Subsequent
attempts could then consult the local cache and avoid stat() calls in
directories that have been previously visited. This approach would
accelerate imports if stat() is slow "enough".

On a moderately loaded local Linux directory (146 files) mounted from an
SSD drive, one failed stat() takes only about 0.5 microseconds. That
means e.g. if a module imports std.all (which fails 142 times), the
overhead accountable to failed stat() calls is about 70 microseconds,
i.e. negligible.

The results change drastically when network mounts are tested. For sftp
and sshfs mounts on a high-speed local connection, one failed stat()
takes 6-7 milliseconds, so an import like std.all (and many other
imports liable to transitively pull others) would cause significant
overhead.

So the question is whether many projects are likely to import files over
network mounts, which would motivate the optimization. Please share your
thoughts, thanks.

Andrei
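For concreteness, a minimal sketch of that cache in D -- hypothetical
names throughout, with std.file.dirEntries standing in for raw
opendir()/readdir()/closedir(), and no claim to match dmd's actual
lookup code:

    import std.file : dirEntries, FileException, SpanMode;
    import std.path : baseName;

    bool[string][string] dirCache; // directory path -> set of entry names

    // True if `name` is an entry of `dir`; reads each directory only once.
    bool cachedExists(string dir, string name)
    {
        auto entries = dir in dirCache;
        if (entries is null)
        {
            bool[string] names;
            try
                foreach (e; dirEntries(dir, SpanMode.shallow))
                    names[baseName(e.name)] = true;
            catch (FileException) {} // missing/unreadable dir: cache empty
            dirCache[dir] = names;
            entries = dir in dirCache;
        }
        return (name in *entries) !is null;
    }

After the first probe in a directory, every later lookup -- including
all the failing ".di" probes -- is an in-memory hash lookup instead of
a syscall.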
Mar 26 2019
On Tue, Mar 26, 2019 at 02:06:08PM -0400, Andrei Alexandrescu via
Digitalmars-d wrote:
[...]
> On a Linux moderately-loaded local directory (146 files) mounted from
> an SSD drive, one failed stat() takes only about 0.5 microseconds.
> [...]
> So the question is whether many projects are likely to import files
> over network mounts, which would motivate the optimization. Please
> share your thoughts, thanks.
[...]

Does caching the contents of import directories cause significant
overhead? If not, why not just cache it anyway, regardless of whether
the import happens across network mounts? Making excessive OS
roundtrips (calling stat() hundreds of times) should be reduced anyway.

//

On a slightly different note, why are we paying so much attention to
import speeds anyway? We can optimize import speeds to hell and back
again until they cost practically zero time, yet the act of actually
*using* one of those imports -- ostensibly the reason you'd want to
import anything in the first place -- immediately adds a huge amount of
overhead that by far overshadows those niggly microseconds we pinched.
Ergo:

    import std.regex;
    void main() {
        version(withRegex)
            auto re = regex("a");
    }

This takes about 0.5 seconds to compile without -version=withRegex on
my machine. With -version=withRegex, it takes about *4.5 seconds* to
compile. We have a 4-second bottleneck here and yet we're trying to
shave off microseconds elsewhere.

Why does instantiating a single-character regex add FOUR SECONDS to
compilation time? I think *that* is the question we should be
answering.


T

--
People say I'm arrogant, and I'm proud of it.
Mar 26 2019
On 3/26/19 2:36 PM, H. S. Teoh wrote:
> Does caching the contents of import directories cause significant
> overhead? If not, why not just cache it anyway, regardless of whether
> the import happens across network mounts.

Because testing takes 10 minutes and implementation takes one day or
more. We want to make sure there's impact.

> On a slightly different note, why are we paying so much attention to
> import speeds anyway?

You destroy your own opening point: work should be put where there's
potential for impact, not "regardless".

> We can optimize import speeds to hell and back again until they cost
> practically zero time, yet the act of actually *using* one of those
> imports -- ostensibly the reason you'd want to import anything in the
> first place -- immediately adds a huge amount of overhead that by far
> overshadows those niggly microseconds we pinched. Ergo:
>
>     import std.regex;
>     void main() {
>         version(withRegex)
>             auto re = regex("a");
>     }
>
> This takes about 0.5 seconds to compile without -version=withRegex on
> my machine. With -version=withRegex, it takes about *4.5 seconds* to
> compile. We have a 4-second bottleneck here and yet we're trying to
> shave off microseconds elsewhere.
>
> Why does instantiating a single-character regex add FOUR SECONDS to
> compilation time? I think *that* is the question we should be
> answering.

There's a matter of difficulty. I don't have a good attack on
dramatically improving regexen. If you do, it would of course be a
high-impact project.

There's also a matter of paying for what you don't use. Unused imports
are, well, unused. Used imports should be paid for in proportion.
Agreed, 4.5 seconds is not quite proportionate.
Mar 26 2019
On Tuesday, 26 March 2019 at 20:09:52 UTC, Andrei Alexandrescu wrote:
> On 3/26/19 2:36 PM, H. S. Teoh wrote:
>> Does caching the contents of import directories cause significant
>> overhead? If not, why not just cache it anyway, regardless of
>> whether the import happens across network mounts.
>
> Because testing takes 10 minutes and implementation takes one day or
> more. We want to make sure there's impact.
> [...]

I've included a script below to generate and run a performance test.
Save it to your box as "gen", then run "./gen" to generate the test,
then "./build" to run it.

I tried changing the "stat" calls to use "access" instead, but with
around 70,000 system calls (found out using strace), it didn't make any
noticeable difference. With "stat" it was around 2.2 seconds and it was
about the same with "access". So the issue is not how much memory stat
is returning, it's the overhead of performing any system call.

#!/usr/bin/env python
import os
import stat

mod_count = 1000
path_count = 20

def mkdir(dir):
    if not os.path.exists(dir):
        os.mkdir(dir)

mkdir("out")
for i in range(0, path_count):
    mkdir("out/lib{}".format(i))
mkdir("out/mods")
for i in range(0, mod_count):
    with open("out/mods/mod{}.d".format(i), "w") as file:
        for j in range(0, mod_count):
            file.write("import mod{};\n".format(j))
with open("out/main.d", "w") as file:
    for i in range(0, mod_count):
        file.write("import mod{};\n".format(i))
    file.write('void main() { import std.stdio; writeln("working"); }')
with open("build", "w") as file:
    file.write('[ "$DMD" != "" ] || DMD=dmd\n')
    file.write("set -x\n")
    file.write("time $DMD \\\n")
    for i in range(0, path_count):
        file.write("  -I=out/lib{} \\\n".format(i))
    file.write("  -I=out/mods out/main.d\n")
os.chmod("build", stat.S_IRWXU | stat.S_IRWXG | stat.S_IROTH | stat.S_IXOTH)
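A rough D equivalent of that stat-vs-access comparison, for reproducing
the measurement without strace (the path and the iteration count below
are arbitrary placeholders):

    import core.sys.posix.sys.stat : stat, stat_t;
    import core.sys.posix.unistd : access, F_OK;
    import std.datetime.stopwatch : AutoStart, StopWatch;
    import std.stdio : writefln;
    import std.string : toStringz;

    void main()
    {
        enum n = 1_000_000;
        auto path = "/tmp/does-not-exist".toStringz; // placeholder

        auto sw = StopWatch(AutoStart.yes);
        stat_t buf;
        foreach (i; 0 .. n) stat(path, &buf);   // fails with ENOENT
        writefln("stat:   %s per call", sw.peek / n);

        sw.reset();
        foreach (i; 0 .. n) access(path, F_OK); // same check via access()
        writefln("access: %s per call", sw.peek / n);
    }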
Mar 26 2019
On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu wrote:
> On a Linux moderately-loaded local directory (146 files) mounted from
> an SSD drive, one failed stat() takes only about 0.5 microseconds.
> That means e.g. if a module imports std.all (which fails 142 times),
> the overhead accountable to failed stat() calls is about 70
> microseconds, i.e. negligible.

I have some related experience with this:

- The eternal battle of keeping The Server's load levels down involves
some deal of I/O profiling. The pertinent observation was that opening
a file by name can be much faster than enumerating the files in a
directory. The reason is that many filesystems implement directories as
some variant of hash table: accessing a file by name is one hash table
lookup, while enumerating all files means reading the entire thing.

- stat() is slow. It fetches a lot of information, and many filesystems
do not keep all of that information as readily accessible as the file
name. This is observable through a simple test: on Ubuntu, drop caches,
then, in a big directory, compare the execution time of `ls | cat` vs.
`ls`. Explanation: when ls's output is a terminal, it fetches extra
information to colorize entries depending on their properties. That
information is fetched using stat(), which is skipped when the output
is piped into a file or another program. I had to take this into
account when implementing a fast directory iterator [1] (stat only when
necessary). dirEntries from std.file does some of this too, but not to
the full extent.

My suggestion is: if we are going to read the file if it exists, don't
even stat(), just open it. It might result in faster total performance.

I would not recommend tricks like readdir() and caching. This ought to
be done at the filesystem layer, and it smells of problems like TOCTOU
and cache invalidation. In any case, I would not suggest spending time
on it unless someone encounters a specific, real-life situation where
the additional complexity would make it worthwhile to research
workarounds.

> So the question is whether many projects are likely to import files
> over network mounts, which would motivate the optimization. Please
> share your thoughts, thanks.

Honestly, this sounds like you have a solution in search of a problem.

[1]: https://github.com/CyberShadow/ae/blob/25850209e03ee97640a9b0715efe7e25b1fcc62d/sys/file.d#L740
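A sketch of that "just open it" suggestion in D -- hypothetical helper
name, POSIX-only:

    import core.stdc.errno : ENOENT, errno;
    import core.sys.posix.fcntl : O_RDONLY, open;
    import std.string : toStringz;

    // One syscall serves as both the existence check and the open.
    // Returns an open fd, or -1 if the file is absent.
    int tryOpen(string path)
    {
        immutable fd = open(path.toStringz, O_RDONLY);
        if (fd < 0 && errno == ENOENT)
            return -1; // absent: fall through to the next search path
        return fd;
    }

Besides saving the separate stat(), this closes the TOCTOU window
between checking for the file and opening it.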
Mar 26 2019
On 3/26/19 6:04 PM, Vladimir Panteleev wrote:
> [...]
>
> My suggestion is: if we are going to read the file if it exists,
> don't even stat(), just open it. It might result in faster total
> performance.

That's solid, thanks very much!

What seems to be the case according to
https://github.com/dlang/dmd/blob/master/src/dmd/dmodule.d is that a
bunch of "exists" calls are invoked (presumably those would call
stat()). Then a filename is returned, which is used to create a File
object, see
https://github.com/dlang/dmd/blob/master/src/dmd/root/file.d. In turn,
that calls open() and then fstat() again on the opened handle. Quite
wasteful on the face of it -- but hey, if the measurable benefit is
low, it's not worth optimizing.

> I would not recommend tricks like readdir() and caching. This ought
> to be done at the filesystem layer, and it smells of problems like
> TOCTOU and cache invalidation. In any case, I would not suggest
> spending time on it unless someone encounters a specific, real-life
> situation where the additional complexity would make it worthwhile to
> research workarounds.

Agreed. Just looking for low-hanging fruit to pluck.

Andrei
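Putting the two observations together, the existence probe could in
principle be folded into the read itself: one open() plus one fstat()
on the handle. A hedged sketch in D (hypothetical helper, POSIX-only,
not dmd's actual code):

    import core.sys.posix.fcntl : O_RDONLY, open;
    import core.sys.posix.sys.stat : fstat, stat_t;
    import core.sys.posix.unistd : close, read;
    import std.string : toStringz;

    // Single-pass load: open() doubles as the existence check, and
    // fstat() on the handle supplies the size for the read buffer.
    // Returns the file's contents, or null if it can't be read.
    ubyte[] tryLoad(string path)
    {
        immutable fd = open(path.toStringz, O_RDONLY);
        if (fd < 0)
            return null;        // absent: probe the next search path
        scope (exit) close(fd);

        stat_t st;
        if (fstat(fd, &st) != 0)
            return null;

        auto buf = new ubyte[cast(size_t) st.st_size];
        immutable got = read(fd, buf.ptr, buf.length);
        return got < 0 ? null : buf[0 .. cast(size_t) got];
    }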
Mar 26 2019
On Wednesday, 27 March 2019 at 01:32:35 UTC, Andrei Alexandrescu wrote:
> What seems to be the case according to
> https://github.com/dlang/dmd/blob/master/src/dmd/dmodule.d is that a
> bunch of "exists" calls are invoked (presumably those would call
> stat()).

On Windows it calls GetFileAttributesW.
Mar 28 2019
On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu wrote:
> On a Linux moderately-loaded local directory (146 files) mounted from
> an SSD drive, one failed stat() takes only about 0.5 microseconds.
> That means e.g. if a module imports std.all (which fails 142 times),
> the overhead accountable to failed stat() calls is about 70
> microseconds, i.e. negligible.

It could be interesting to know whether timings on Windows are more
significant. If only I knew how to measure this within 10 minutes...

Bastiaan.
Mar 27 2019
On 3/27/19 5:23 AM, Bastiaan Veelo wrote:
> It could be interesting to know whether timings on Windows are more
> significant. If only I knew how to measure this within 10 minutes...

Really simple. Here's the C code Eduard and I used. Run it a few times
with a variety of paths (changed, of course, to use Windows naming) and
divide the total run time by n.

#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>

int main(int argc, char** argv)
{
    size_t i;
    size_t n = 1000000;
    const char* s = "/home/user/gd/Google Photos/xyz";
    //s = "/home/user/dir/xyz";
    //s = "/run/user/1000/gvfs/mount/xyz";
    struct stat sfile;
    for (i = 0; i < n; ++i)
    {
        stat(s, &sfile);
    }
    return 0;
}
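For reference, a self-timing D variant that also runs on Windows with
no external time command. Note it measures std.file.exists -- which
calls GetFileAttributesW on Windows and access() on POSIX -- rather
than a literal stat(), and the path is a placeholder:

    import std.datetime.stopwatch : AutoStart, StopWatch;
    import std.file : exists;
    import std.stdio : writefln;

    void main()
    {
        enum n = 1_000_000;
        auto s = r"C:\some\dir\xyz"; // placeholder; vary as in the C code

        auto sw = StopWatch(AutoStart.yes);
        foreach (i; 0 .. n)
            cast(void) exists(s); // one filesystem query per iteration
        writefln("total %s, %s per call", sw.peek, sw.peek / n);
    }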
Mar 27 2019
On Wednesday, 27 March 2019 at 12:06:11 UTC, Andrei Alexandrescu wrote:
> Really simple. Here's the C code Eduard and I used. Run it a few
> times with a variety of paths (changed, of course, to use Windows
> naming) and divide the total run time by n.
> [...]

On Windows 10, i7-7700HQ, M.2 SSD, provided I did things right, I get
ca. 40x worse timings. Compiled with MSVC 2017, no options (cl
teststat.c). Timed in PowerShell using `Measure-Command
{.\teststat.exe}`.

For "/home/user/gd/Google Photos/xyz", a directory that does not exist,
total running time is 17 seconds (+/- 0.2).

For "/Users/bastiaan/Documents/D/tests/stat/teststat.c", an existing
file in a directory with two other files, total running time is a
whopping 44 seconds (+/- 1.0).

For "/Coin/Coin_source/src/nodes/xyz", a nonexisting file in an
existing directory with 114 items, total running time is 19.5 seconds
(+/- 1.0).

So for me, 142 failed stats cost close to 2.8 milliseconds (19.5 s over
1,000,000 iterations is about 19.5 microseconds per call; 142 x 19.5
microseconds is about 2.8 milliseconds).

Bastiaan.
Mar 27 2019
On Wednesday, 27 March 2019 at 22:52:09 UTC, Bastiaan Veelo wrote:
> On Windows 10, i7-7700HQ, M.2 SSD, provided I did things right, I get
> ca. 40x worse timings.

File system is NTFS.
Mar 27 2019