
digitalmars.D - A few measurements of stat()'s speed

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
The current process of searching for imports spans the following 
directories:

* the current directory
* each of the paths specified in the cmdline with -I, in that order
* each of the paths specified in DFLAGS, in that order

For each of these paths, first the ".di" extension is tried, then the 
".d" extension. The function used is stat(). For a majority of cases, 
the ".di" files doesn't exist so at least 50% of stat() calls fail. The 
number of failed stat() calls increases with the -I flags, i.e. with the 
size of the project. (For std imports, that means each will be looked up 
two times in each of the project directories.)

One alternative would be to use opendir()/readdir()/closedir() once for 
each directory searched, and cache the directory's contents. Then, 
subsequent attempts can simply look up the local cache and avoid stat() 
calls in directories that have been previously visited. This approach 
would accelerate imports if stat() is slow "enough".

On Linux, for a moderately loaded local directory (146 files) mounted from 
an SSD drive, one failed stat() takes only about 0.5 microseconds. That 
means e.g. if a module imports std.all (which fails 142 times), the 
overhead accountable to failed stat() calls is about 70 microseconds, 
i.e. negligible.

The results change drastically when network mounts are tested. For sftp 
and sshfs mounts on a high speed local connection, one failed stat() 
takes 6-7 milliseconds, so an import like std.all (and many other 
imports liable to transitively pull others) would cause significant 
overheads.

So the question is whether many projects are likely to import files over 
network mounts, which would motivate the optimization. Please share your 
thoughts, thanks.


Andrei
Mar 26
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Mar 26, 2019 at 02:06:08PM -0400, Andrei Alexandrescu via Digitalmars-d
wrote:
[...]
 On a Linux moderately-loaded local directory (146 files) mounted from
 an SSD drive, one failed stat() takes only about 0.5 microseconds.
 That means e.g.  if a module imports std.all (which fails 142 times),
 the overhead accountable to failed stat() calls is about 70
 microseconds, i.e. negligible.
 
 The results change drastically when network mounts are tested. For
 sftp and sshfs mounts on a high speed local connection, one failed
 stat() takes 6-7 milliseconds, so an import like std.all (and many
 other imports liable to transitively pull others) would cause
 significant overheads.
 
 So the question is whether many projects are likely to import files
 over network mounts, which would motivate the optimization. Please
 share your thoughts, thanks.
[...] Does caching the contents of import directories cause significant 
overhead? If not, why not just cache it anyway, regardless of whether the 
import happens across network mounts? Excessive OS round trips (calling 
stat() hundreds of times) should be reduced anyway.

//

On a slightly different note, why are we paying so much attention to 
import speeds anyway? We can optimize import speeds to hell and back again 
until they cost practically zero time, yet the act of actually *using* one 
of those imports -- ostensibly the reason you'd want to import anything in 
the first place -- immediately adds a huge amount of overhead that by far 
overshadows those niggly microseconds we pinched. Ergo:

	import std.regex;
	void main() {
		version(withRegex)
			auto re = regex("a");
	}

This takes about 0.5 seconds to compile without -version=withRegex on my 
machine. With -version=withRegex, it takes about *4.5 seconds* to compile. 
We have a 4-second bottleneck here and yet we're trying to shave off 
microseconds elsewhere. Why does instantiating a single-character regex 
add FOUR SECONDS to compilation time? I think *that* is the question we 
should be answering.


T

-- 
People say I'm arrogant, and I'm proud of it.
Mar 26
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/26/19 2:36 PM, H. S. Teoh wrote:
 Does caching the contents of import directories cause significant
 overhead?  If not, why not just cache it anyway, regardless of whether
 the import happens across network mounts.
Because testing takes 10 minutes and implementation takes one day or more. We want to make sure there's impact.
 On a slightly different note, why are we paying so much attention to
 import speeds anyway?
You destroy your own opening point: work should be put where there's potential for impact, not "regardless".
 We can optimize import speeds to hell and back
 again until they cost practically zero time, yet the act of actually
 *using*  one of those imports -- ostensibly the reason you'd want to
 import anything in the first place -- immediately adds a huge amount of
 overhead that by far overshadows those niggly microseconds we pinched.
 Ergo:
 
 	import std.regex;
 	void main() {
 		version(withRegex)
 			auto re = regex("a");
 	}
 
 This takes about 0.5 seconds to compile without -version=withRegex on my
 machine. With -version=withRegex, it takes about *4.5 seconds* to
 compile.  We have a 4 second bottleneck here and yet we're trying to
 shave off microseconds elsewhere.  Why does instantiating a
 single-character regex add FOUR SECONDS to compilation time?  I think
 *that*  is the question we should be answering.
There's a matter of difficulty. I don't have a good attack on dramatically 
improving regexen. If you do, it would of course be a high-impact project.

There's also a matter of paying for what you don't use. Unused imports 
are, well, unused. Used imports should be paid for in proportion. Agreed, 
4.5 seconds is not quite proportionate.
Mar 26
Jonathan Marler <johnnymarler gmail.com> writes:
On Tuesday, 26 March 2019 at 20:09:52 UTC, Andrei Alexandrescu 
wrote:
 On 3/26/19 2:36 PM, H. S. Teoh wrote:
 Does caching the contents of import directories cause 
 significant
 overhead?  If not, why not just cache it anyway, regardless of 
 whether
 the import happens across network mounts.
 Because testing takes 10 minutes and implementation takes one day 
 or more. We want to make sure there's impact.
 On a slightly different note, why are we paying so much 
 attention to
 import speeds anyway?
 You destroy your own opening point: work should be put where 
 there's potential for impact, not "regardless".
 We can optimize import speeds to hell and back
 again until they cost practically zero time, yet the act of 
 actually
 *using*  one of those imports -- ostensibly the reason you'd 
 want to
 import anything in the first place -- immediately adds a huge 
 amount of
 overhead that by far overshadows those niggly microseconds we 
 pinched.
 Ergo:
 
 	import std.regex;
 	void main() {
 		version(withRegex)
 			auto re = regex("a");
 	}
 
 This takes about 0.5 seconds to compile without 
 -version=withRegex on my
 machine. With -version=withRegex, it takes about *4.5 seconds* 
 to
 compile.  We have a 4 second bottleneck here and yet we're 
 trying to
 shave off microseconds elsewhere.  Why does instantiating a
 single-character regex add FOUR SECONDS to compilation time?  
 I think
 *that*  is the question we should be answering.
 There's a matter of difficulty. I don't have a good attack on 
 dramatically improving regexen. If you do, it would be of course a 
 high impact project.

 There's also a matter of paying for what you don't use. Unused 
 imports are, well, unused. Used imports should be paid for in 
 proportion. Agreed, 4.5 seconds is not quite proportionate.
I've included a script below to generate and run a performance test. Save 
it to your box as "gen", then run "./gen" to generate the test, then 
"./build" to run it.

I tried changing the "stat" calls to use "access" instead, but with around 
70,000 system calls (found using strace), it didn't make any noticeable 
difference. With "stat" it was around 2.2 seconds and it was about the 
same with "access". So the issue is not how much information stat returns; 
it's the overhead of performing any system call at all.

	#!/usr/bin/env python3
	# Run "./build" to run the test
	import os
	import stat

	mod_count = 1000
	path_count = 20

	def mkdir(dir):
	    if not os.path.exists(dir):
	        os.mkdir(dir)

	mkdir("out")
	for i in range(0, path_count):
	    mkdir("out/lib{}".format(i))
	mkdir("out/mods")
	for i in range(0, mod_count):
	    with open("out/mods/mod{}.d".format(i), "w") as file:
	        for j in range(0, mod_count):
	            file.write("import mod{};\n".format(j))
	with open("out/main.d", "w") as file:
	    for i in range(0, mod_count):
	        file.write("import mod{};\n".format(i))
	    file.write('void main() { import std.stdio; writeln("working"); }')
	with open("build", "w") as file:
	    file.write("#!/usr/bin/env bash\n")
	    file.write('[ "$DMD" != "" ] || DMD=dmd\n')
	    file.write("set -x\n")
	    file.write("time $DMD \\\n")
	    for i in range(0, path_count):
	        file.write("    -I=out/lib{} \\\n".format(i))
	    file.write("    -I=out/mods out/main.d $ ")
	os.chmod("build", stat.S_IRWXU | stat.S_IRWXG | stat.S_IROTH | stat.S_IXOTH)
Mar 26
Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu 
wrote:
 On a Linux moderately-loaded local directory (146 files) 
 mounted from an SSD drive, one failed stat() takes only about 
 0.5 microseconds. That means e.g. if a module imports std.all 
 (which fails 142 times), the overhead accountable to failed 
 stat() calls is about 70 microseconds, i.e. negligible.
I have some related experience with this:

- The eternal battle of keeping The Server's load levels down involves a 
good deal of I/O profiling. The pertinent observation was that opening a 
file by name can be much faster than enumerating the files in a directory. 
The reason is that many filesystems implement directories as some variant 
of hash table: accessing a file by name is one hash table lookup, while 
enumerating all files means reading the entire thing.

- stat() is slow. It fetches a lot of information, and many filesystems do 
not keep all of it as readily accessible as the file name. This is 
observable through a simple test: on Ubuntu, drop caches, then, in a big 
directory, compare the execution time of `ls | cat` vs. `ls`. Explanation: 
when ls's output is a terminal, it fetches extra information to colorize 
entries depending on their properties. That information is fetched using 
stat(), which is not done when the output is piped into a file or another 
program. I had to take this into account when implementing a fast 
directory iterator [1] (stat only when necessary). dirEntries from 
std.file does some of this too, but not to the full extent.

My suggestion is: if we are going to read the file anyway when it exists, 
don't even stat(), just open it. That might result in faster total 
performance.

I would not recommend tricks like readdir() and caching. This ought to be 
done at the filesystem layer, and smells of problems like TOCTOU / cache 
invalidation. In any case, I would not suggest spending time on it unless 
someone encounters a specific, real-life situation where the additional 
complexity would make it worthwhile to research workarounds.
 So the question is whether many projects are likely to import 
 files over network mounts, which would motivate the 
 optimization. Please share your thoughts, thanks.
Honestly, this sounds like you have a solution in search of a problem.

[1]: https://github.com/CyberShadow/ae/blob/25850209e03ee97640a9b0715efe7e25b1fcc62d/sys/file.d#L740
Mar 26
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/26/19 6:04 PM, Vladimir Panteleev wrote:
 On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu wrote:
 On a Linux moderately-loaded local directory (146 files) mounted from 
 an SSD drive, one failed stat() takes only about 0.5 microseconds. 
 That means e.g. if a module imports std.all (which fails 142 times), 
 the overhead accountable to failed stat() calls is about 70 
 microseconds, i.e. negligible.
 I have some related experience with this:

 - The eternal battle of keeping The Server's load levels down involves 
 some deal of I/O profiling. The pertinent observation was that opening a 
 file by name can be much faster than enumerating files in a directory. 
 The reason for that is many filesystems implementing directories using 
 some variant of hash table, with accessing a file by name being one hash 
 table lookup, while enumerating all files meaning reading the entire 
 thing.

 - stat() is slow. It fetches a lot of information. Many filesystems do 
 not have all of that information as readily accessible as a file name. 
 This is observable through a simple test: on Ubuntu, drop caches, then, 
 in a big directory, compare the execution time of `ls|cat` vs. `ls`. 
 Explanation: when ls's output is a terminal, it will fetch extra 
 information to colorize objects depending on their properties. These are 
 fetched using stat(), but that's not done when it's piped into a file / 
 another program. I had to take this into account when implementing a fast 
 directory iterator [1] (stat only until necessary). dirEntries from 
 std.file does some of this too, but not to the full extent.

 My suggestion is: if we are going to read the file if it exists, don't 
 even stat(), just open it. It might result in faster total performance as 
 a result.

 I would not recommend tricks like readdir() and caching. This ought to be 
 done at the filesystem layer, and smells of problems like TOCTOU / cache 
 invalidation. In any case, I would not suggest spending time on it unless 
 someone encounters a specific, real-life situation where the additional 
 complexity would make it worthwhile to research workarounds.
That's solid, thanks very much!

What seems to be the case according to 
https://github.com/dlang/dmd/blob/master/src/dmd/dmodule.d is that a bunch 
of "exists" calls are made (presumably backed by stat()). Then a filename 
is returned, which is used to create a File object, see 
https://github.com/dlang/dmd/blob/master/src/dmd/root/file.d. In turn, 
that calls open() and then fstat() again on the opened handle. Quite 
wasteful on the face of it, but hey, if the measurable benefit is low it's 
not worth optimizing.
 So the question is whether many projects are likely to import files 
 over network mounts, which would motivate the optimization. Please 
 share your thoughts, thanks.
 Honestly, this sounds like you have a solution in search of a problem.

 [1]: https://github.com/CyberShadow/ae/blob/25850209e03ee97640a9b0715efe7e25b1fcc62d/sys/file.d#L740
Agreed. Just looking for low-hanging fruit to pluck. Andrei
Mar 26
Kagamin <spam here.lot> writes:
On Wednesday, 27 March 2019 at 01:32:35 UTC, Andrei Alexandrescu 
wrote:
 What seems to be the case according to 
 https://github.com/dlang/dmd/blob/master/src/dmd/dmodule.d is 
 that a bunch of "exists" are invoked (presumably those would 
 call stat()).
On Windows it calls GetFileAttributesW.
Mar 28
Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu 
wrote:
 On a Linux moderately-loaded local directory (146 files) 
 mounted from an SSD drive, one failed stat() takes only about 
 0.5 microseconds. That means e.g. if a module imports std.all 
 (which fails 142 times), the overhead accountable to failed 
 stat() calls is about 70 microseconds, i.e. negligible.
It could be interesting to know whether timings on Windows are more 
significant. If only I knew how to measure this within 10 minutes...

Bastiaan.
Mar 27
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/27/19 5:23 AM, Bastiaan Veelo wrote:
 On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu wrote:
 On a Linux moderately-loaded local directory (146 files) mounted from 
 an SSD drive, one failed stat() takes only about 0.5 microseconds. 
 That means e.g. if a module imports std.all (which fails 142 times), 
 the overhead accountable to failed stat() calls is about 70 
 microseconds, i.e. negligible.
 It could be interesting to know whether timings on Windows are more 
 significant. If only I knew how to measure this within 10 minutes...

 Bastiaan.
Really simple. Here's the C code Eduard and I used. Run it a few times 
with a variety of paths (changed, of course, to use Windows naming) and 
divide total run time by n.

	#include <stdio.h>
	#include <string.h>
	#include <sys/stat.h>
	#include <sys/types.h>
	#include <fcntl.h>

	int main(int argc, char** argv)
	{
	    size_t i;
	    size_t n = 1000000;
	    const char* s = "/home/user/gd/Google Photos/xyz";
	    //s = "/home/user/dir/xyz";
	    //s = "/run/user/1000/gvfs/mount/xyz";
	    struct stat sfile;
	    for (i = 0; i < n; ++i)
	    {
	        stat(s, &sfile);
	    }
	    return 0;
	}
Mar 27
Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Wednesday, 27 March 2019 at 12:06:11 UTC, Andrei Alexandrescu 
wrote:
 On 3/27/19 5:23 AM, Bastiaan Veelo wrote:
 On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu 
 wrote:
 On a Linux moderately-loaded local directory (146 files) 
 mounted from an SSD drive, one failed stat() takes only about 
 0.5 microseconds. That means e.g. if a module imports std.all 
 (which fails 142 times), the overhead accountable to failed 
 stat() calls is about 70 microseconds, i.e. negligible.
It could be interesting to know whether timings on Windows are more significant. If only I knew how to measure this within 10 minutes... Bastiaan.
 Really simple. Here's the C code Eduard and I used. Run it a few times 
 with a variety of paths (change of course to use Windows naming) and 
 divide total run time by n.

 	#include <stdio.h>
 	#include <string.h>
 	#include <sys/stat.h>
 	#include <sys/types.h>
 	#include <fcntl.h>

 	int main(int argc, char** argv)
 	{
 	    size_t i;
 	    size_t n = 1000000;
 	    const char* s = "/home/user/gd/Google Photos/xyz";
 	    //s = "/home/user/dir/xyz";
 	    //s = "/run/user/1000/gvfs/mount/xyz";
 	    struct stat sfile;
 	    for (i = 0; i < n; ++i)
 	    {
 	        stat(s, &sfile);
 	    }
 	    return 0;
 	}
On Windows 10, i7-7700HQ, M.2 SSD, provided I did things right, I get ca. 
40x worse timings. Compiled with MSVC 2017, no options (cl teststat.c). 
Timed in PowerShell using `Measure-Command {.\teststat.exe}`.

For "/home/user/gd/Google Photos/xyz", a directory that does not exist, 
total running time is 17 seconds (+/- 0.2).

For "/Users/bastiaan/Documents/D/tests/stat/teststat.c", an existing file 
in a directory with two other files, total running time is a whopping 44 
seconds (+/- 1.0).

For "/Coin/Coin_source/src/nodes/xyz", a nonexistent file in an existing 
directory with 114 items, total running time is 19.5 seconds (+/- 1.0).

So for me, 142 failed stats cost close to 2.8 milliseconds.

Bastiaan.
Mar 27
Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Wednesday, 27 March 2019 at 22:52:09 UTC, Bastiaan Veelo wrote:
 On Windows 10, i7-7700HQ, M.2 SSD, provided I did things right, 
 I get ca. 40x worse timings.
File system is NTFS.
Mar 27