
digitalmars.D - A few measurements of stat()'s speed

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
The current process of searching for imports spans the following 
directories:

* the current directory
* each of the paths specified in the cmdline with -I, in that order
* each of the paths specified in DFLAGS, in that order

For each of these paths, first the ".di" extension is tried, then the 
".d" extension. The function used is stat(). For a majority of cases, 
the ".di" files doesn't exist so at least 50% of stat() calls fail. The 
number of failed stat() calls increases with the -I flags, i.e. with the 
size of the project. (For std imports, that means each will be looked up 
two times in each of the project directories.)

One alternative would be to use opendir()/readdir()/closedir() once for 
each directory searched, and cache the directory's contents. Then, 
subsequent attempts can simply look up the local cache and avoid stat() 
calls in directories that have been previously visited. This approach 
would accelerate imports if stat() is slow "enough".

On Linux, for a moderately loaded local directory (146 files) mounted from 
an SSD drive, one failed stat() takes only about 0.5 microseconds. That 
means e.g. if a module imports std.all (which fails 142 times), the 
overhead accountable to failed stat() calls is about 70 microseconds, 
i.e. negligible.

The results change drastically when network mounts are tested. For sftp 
and sshfs mounts on a high speed local connection, one failed stat() 
takes 6-7 milliseconds, so an import like std.all (and many other 
imports liable to transitively pull others) would cause significant 
overheads.

So the question is whether many projects are likely to import files over 
network mounts, which would motivate the optimization. Please share your 
thoughts, thanks.


Andrei
Mar 26
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Mar 26, 2019 at 02:06:08PM -0400, Andrei Alexandrescu via Digitalmars-d
wrote:
[...]
 On a Linux moderately-loaded local directory (146 files) mounted from
 an SSD drive, one failed stat() takes only about 0.5 microseconds.
 That means e.g.  if a module imports std.all (which fails 142 times),
 the overhead accountable to failed stat() calls is about 70
 microseconds, i.e. negligible.
 
 The results change drastically when network mounts are tested. For
 sftp and sshfs mounts on a high speed local connection, one failed
 stat() takes 6-7 milliseconds, so an import like std.all (and many
 other imports liable to transitively pull others) would cause
 significant overheads.
 
 So the question is whether many projects are likely to import files
 over network mounts, which would motivate the optimization. Please
 share your thoughts, thanks.
[...] Does caching the contents of import directories cause significant 
overhead? If not, why not just cache it anyway, regardless of whether the 
import happens across network mounts? Excessive OS round trips (calling 
stat() hundreds of times) should be reduced anyway.

//

On a slightly different note, why are we paying so much attention to 
import speeds anyway? We can optimize import speeds to hell and back again 
until they cost practically zero time, yet the act of actually *using* one 
of those imports -- ostensibly the reason you'd want to import anything in 
the first place -- immediately adds a huge amount of overhead that by far 
overshadows those niggly microseconds we pinched. Ergo:

	import std.regex;
	void main() {
		version(withRegex)
			auto re = regex("a");
	}

This takes about 0.5 seconds to compile without -version=withRegex on my 
machine. With -version=withRegex, it takes about *4.5 seconds* to compile. 
We have a 4-second bottleneck here and yet we're trying to shave off 
microseconds elsewhere. Why does instantiating a single-character regex 
add FOUR SECONDS to compilation time? I think *that* is the question we 
should be answering.


T

-- 
People say I'm arrogant, and I'm proud of it.
Mar 26
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/26/19 2:36 PM, H. S. Teoh wrote:
 Does caching the contents of import directories cause significant
 overhead?  If not, why not just cache it anyway, regardless of whether
 the import happens across network mounts.
Because testing takes 10 minutes and implementation takes one day or more. We want to make sure there's impact.
 On a slightly different note, why are we paying so much attention to
 import speeds anyway?
You destroy your own opening point: work should be put where there's potential for impact, not "regardless".
 We can optimize import speeds to hell and back
 again until they cost practically zero time, yet the act of actually
 *using*  one of those imports -- ostensibly the reason you'd want to
 import anything in the first place -- immediately adds a huge amount of
 overhead that by far overshadows those niggly microseconds we pinched.
 Ergo:
 
 	import std.regex;
 	void main() {
 		version(withRegex)
 			auto re = regex("a");
 	}
 
 This takes about 0.5 seconds to compile without -version=withRegex on my
 machine. With -version=withRegex, it takes about *4.5 seconds* to
 compile.  We have a 4 second bottleneck here and yet we're trying to
 shave off microseconds elsewhere.  Why does instantiating a
 single-character regex add FOUR SECONDS to compilation time?  I think
 *that*  is the question we should be answering.
There's a matter of difficulty. I don't have a good attack on dramatically 
improving regexen. If you do, it would of course be a high-impact project.

There's also a matter of paying for what you don't use. Unused imports 
are, well, unused. Used imports should be paid for in proportion. Agreed, 
4.5 seconds is not quite proportionate.
Mar 26
Jonathan Marler <johnnymarler gmail.com> writes:
On Tuesday, 26 March 2019 at 20:09:52 UTC, Andrei Alexandrescu 
wrote:
 On 3/26/19 2:36 PM, H. S. Teoh wrote:
 Does caching the contents of import directories cause 
 significant
 overhead?  If not, why not just cache it anyway, regardless of 
 whether
 the import happens across network mounts.
 Because testing takes 10 minutes and implementation takes one day 
 or more. We want to make sure there's impact.
 On a slightly different note, why are we paying so much 
 attention to
 import speeds anyway?
 You destroy your own opening point: work should be put where 
 there's potential for impact, not "regardless".
 We can optimize import speeds to hell and back
 again until they cost practically zero time, yet the act of 
 actually
 *using*  one of those imports -- ostensibly the reason you'd 
 want to
 import anything in the first place -- immediately adds a huge 
 amount of
 overhead that by far overshadows those niggly microseconds we 
 pinched.
 Ergo:
 
 	import std.regex;
 	void main() {
 		version(withRegex)
 			auto re = regex("a");
 	}
 
 This takes about 0.5 seconds to compile without 
 -version=withRegex on my
 machine. With -version=withRegex, it takes about *4.5 seconds* 
 to
 compile.  We have a 4 second bottleneck here and yet we're 
 trying to
 shave off microseconds elsewhere.  Why does instantiating a
 single-character regex add FOUR SECONDS to compilation time?  
 I think
 *that*  is the question we should be answering.
 There's a matter of difficulty. I don't have a good attack on 
 dramatically improving regexen. If you do, it would be of course a 
 high impact project.

 There's also a matter of paying for what you don't use. Unused 
 imports are, well, unused. Used imports should be paid for in 
 proportion. Agreed, 4.5 seconds is not quite proportionate.
I've included a script below to generate and run a performance test. Save 
it to your box as "gen", then run "./gen" to generate the test, then 
"./build" to run it.

I tried changing the "stat" calls to use "access" instead, but with around 
70,000 system calls (found using strace), it didn't make any noticeable 
difference. With "stat" it was around 2.2 seconds and it was about the 
same with "access". So the issue is not how much information stat returns; 
it's the overhead of performing any system call at all.

	#!/usr/bin/env python3
	# Run "./build" to run the test
	import os
	import stat

	mod_count = 1000
	path_count = 20

	def mkdir(dir):
	    if not os.path.exists(dir):
	        os.mkdir(dir)

	mkdir("out")
	for i in range(0, path_count):
	    mkdir("out/lib{}".format(i))
	mkdir("out/mods")
	for i in range(0, mod_count):
	    with open("out/mods/mod{}.d".format(i), "w") as file:
	        for j in range(0, mod_count):
	            file.write("import mod{};\n".format(j))
	with open("out/main.d", "w") as file:
	    for i in range(0, mod_count):
	        file.write("import mod{};\n".format(i))
	    file.write('void main() { import std.stdio; writeln("working"); }')
	with open("build", "w") as file:
	    file.write("#!/usr/bin/env bash\n")
	    file.write('[ "$DMD" != "" ] || DMD=dmd\n')
	    file.write("set -x\n")
	    file.write("time $DMD \\\n")
	    for i in range(0, path_count):
	        file.write("    -I=out/lib{} \\\n".format(i))
	    file.write("    -I=out/mods out/main.d $ ")
	os.chmod("build", stat.S_IRWXU | stat.S_IRWXG | stat.S_IROTH | stat.S_IXOTH)
Mar 26
Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu 
wrote:
 On a Linux moderately-loaded local directory (146 files) 
 mounted from an SSD drive, one failed stat() takes only about 
 0.5 microseconds. That means e.g. if a module imports std.all 
 (which fails 142 times), the overhead accountable to failed 
 stat() calls is about 70 microseconds, i.e. negligible.
I have some related experience with this:

- The eternal battle of keeping The Server's load levels down involves a 
good deal of I/O profiling. The pertinent observation was that opening a 
file by name can be much faster than enumerating the files in a directory. 
The reason is that many filesystems implement directories as some variant 
of hash table: accessing a file by name is one hash table lookup, while 
enumerating all files means reading the entire thing.

- stat() is slow. It fetches a lot of information, and many filesystems do 
not keep all of it as readily accessible as the file name. This is 
observable through a simple test: on Ubuntu, drop caches, then, in a big 
directory, compare the execution time of `ls | cat` vs. `ls`. Explanation: 
when ls's output is a terminal, it fetches extra information to colorize 
entries depending on their properties. That information is fetched using 
stat(), which is not done when the output is piped into a file or another 
program. I had to take this into account when implementing a fast 
directory iterator [1] (stat only when necessary). dirEntries from 
std.file does some of this too, but not to the full extent.

My suggestion is: if we are going to read the file anyway when it exists, 
don't even stat(), just open it. That might result in faster total 
performance.

I would not recommend tricks like readdir() and caching. This ought to be 
done at the filesystem layer, and smells of problems like TOCTOU / cache 
invalidation. In any case, I would not suggest spending time on it unless 
someone encounters a specific, real-life situation where the additional 
complexity would make it worthwhile to research workarounds.
 So the question is whether many projects are likely to import 
 files over network mounts, which would motivate the 
 optimization. Please share your thoughts, thanks.
Honestly, this sounds like you have a solution in search of a problem.

[1]: https://github.com/CyberShadow/ae/blob/25850209e03ee97640a9b0715efe7e25b1fcc62d/sys/file.d#L740
Mar 26
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/26/19 6:04 PM, Vladimir Panteleev wrote:
 On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu wrote:
 On a Linux moderately-loaded local directory (146 files) mounted from 
 an SSD drive, one failed stat() takes only about 0.5 microseconds. 
 That means e.g. if a module imports std.all (which fails 142 times), 
 the overhead accountable to failed stat() calls is about 70 
 microseconds, i.e. negligible.
 I have some related experience with this:

 - The eternal battle of keeping The Server's load levels down involves 
 some deal of I/O profiling. The pertinent observation was that opening a 
 file by name can be much faster than enumerating files in a directory. 
 The reason for that is many filesystems implementing directories using 
 some variant of hash table, with accessing a file by name being one hash 
 table lookup, while enumerating all files meaning reading the entire 
 thing.

 - stat() is slow. It fetches a lot of information. Many filesystems do 
 not have all of that information as readily accessible as a file name. 
 This is observable through a simple test: on Ubuntu, drop caches, then, 
 in a big directory, compare the execution time of `ls|cat` vs. `ls`. 
 Explanation: when ls's output is a terminal, it will fetch extra 
 information to colorize objects depending on their properties. These are 
 fetched using stat(), but that's not done when it's piped into a file / 
 another program. I had to take this into account when implementing a fast 
 directory iterator [1] (stat only until necessary). dirEntries from 
 std.file does some of this too, but not to the full extent.

 My suggestion is: if we are going to read the file if it exists, don't 
 even stat(), just open it. It might result in faster total performance as 
 a result.

 I would not recommend tricks like readdir() and caching. This ought to be 
 done at the filesystem layer, and smells of problems like TOCTOU / cache 
 invalidation. In any case, I would not suggest spending time on it unless 
 someone encounters a specific, real-life situation where the additional 
 complexity would make it worthwhile to research workarounds.
That's solid, thanks very much!

What seems to be the case according to 
https://github.com/dlang/dmd/blob/master/src/dmd/dmodule.d is that a bunch 
of "exists" calls are made (presumably backed by stat()). Then a filename 
is returned, which is used to create a File object, see 
https://github.com/dlang/dmd/blob/master/src/dmd/root/file.d. In turn, 
that calls open() and then fstat() again on the opened handle. Quite 
wasteful on the face of it, but hey, if the measurable benefit is low it's 
not worth optimizing.
 So the question is whether many projects are likely to import files 
 over network mounts, which would motivate the optimization. Please 
 share your thoughts, thanks.
 Honestly, this sounds like you have a solution in search of a problem.

 [1]: https://github.com/CyberShadow/ae/blob/25850209e03ee97640a9b0715efe7e25b1fcc62d/sys/file.d#L740
Agreed. Just looking for low-hanging fruit to pluck. Andrei
Mar 26
Kagamin <spam here.lot> writes:
On Wednesday, 27 March 2019 at 01:32:35 UTC, Andrei Alexandrescu 
wrote:
 What seems to be the case according to 
 https://github.com/dlang/dmd/blob/master/src/dmd/dmodule.d is 
 that a bunch of "exists" are invoked (presumably those would 
 call stat()).
On Windows it calls GetFileAttributesW.
Mar 28
Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu 
wrote:
 On a Linux moderately-loaded local directory (146 files) 
 mounted from an SSD drive, one failed stat() takes only about 
 0.5 microseconds. That means e.g. if a module imports std.all 
 (which fails 142 times), the overhead accountable to failed 
 stat() calls is about 70 microseconds, i.e. negligible.
It could be interesting to know whether timings on Windows are more 
significant. If only I knew how to measure this within 10 minutes...

Bastiaan.
Mar 27
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/27/19 5:23 AM, Bastiaan Veelo wrote:
 On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu wrote:
 On a Linux moderately-loaded local directory (146 files) mounted from 
 an SSD drive, one failed stat() takes only about 0.5 microseconds. 
 That means e.g. if a module imports std.all (which fails 142 times), 
 the overhead accountable to failed stat() calls is about 70 
 microseconds, i.e. negligible.
 It could be interesting to know whether timings on Windows are more 
 significant. If only I knew how to measure this within 10 minutes...

 Bastiaan.
Really simple. Here's the C code Eduard and I used. Run it a few times 
with a variety of paths (changed, of course, to use Windows naming) and 
divide total run time by n.

	#include <stdio.h>
	#include <string.h>
	#include <sys/stat.h>
	#include <sys/types.h>
	#include <fcntl.h>

	int main(int argc, char** argv)
	{
	    size_t i;
	    size_t n = 1000000;
	    const char* s = "/home/user/gd/Google Photos/xyz";
	    //s = "/home/user/dir/xyz";
	    //s = "/run/user/1000/gvfs/mount/xyz";
	    struct stat sfile;
	    for (i = 0; i < n; ++i)
	    {
	        stat(s, &sfile);
	    }
	    return 0;
	}
Mar 27
Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Wednesday, 27 March 2019 at 12:06:11 UTC, Andrei Alexandrescu 
wrote:
 On 3/27/19 5:23 AM, Bastiaan Veelo wrote:
 On Tuesday, 26 March 2019 at 18:06:08 UTC, Andrei Alexandrescu 
 wrote:
 On a Linux moderately-loaded local directory (146 files) 
 mounted from an SSD drive, one failed stat() takes only about 
 0.5 microseconds. That means e.g. if a module imports std.all 
 (which fails 142 times), the overhead accountable to failed 
 stat() calls is about 70 microseconds, i.e. negligible.
It could be interesting to know whether timings on Windows are more significant. If only I knew how to measure this within 10 minutes... Bastiaan.
 Really simple. Here's the C code Eduard and I used. Run it a few times 
 with a variety of paths (change of course to use Windows naming) and 
 divide total run time by n.

 	#include <stdio.h>
 	#include <string.h>
 	#include <sys/stat.h>
 	#include <sys/types.h>
 	#include <fcntl.h>

 	int main(int argc, char** argv)
 	{
 	    size_t i;
 	    size_t n = 1000000;
 	    const char* s = "/home/user/gd/Google Photos/xyz";
 	    //s = "/home/user/dir/xyz";
 	    //s = "/run/user/1000/gvfs/mount/xyz";
 	    struct stat sfile;
 	    for (i = 0; i < n; ++i)
 	    {
 	        stat(s, &sfile);
 	    }
 	    return 0;
 	}
On Windows 10, i7-7700HQ, M.2 SSD, provided I did things right, I get ca. 
40x worse timings. Compiled with MSVC 2017, no options (cl teststat.c). 
Timed in PowerShell using `Measure-Command {.\teststat.exe}`.

For "/home/user/gd/Google Photos/xyz", a directory that does not exist, 
total running time is 17 seconds (+/- 0.2).

For "/Users/bastiaan/Documents/D/tests/stat/teststat.c", an existing file 
in a directory with two other files, total running time is a whopping 44 
seconds (+/- 1.0).

For "/Coin/Coin_source/src/nodes/xyz", a nonexistent file in an existing 
directory with 114 items, total running time is 19.5 seconds (+/- 1.0).

So for me, 142 failed stats cost close to 2.8 milliseconds.

Bastiaan.
Mar 27
Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Wednesday, 27 March 2019 at 22:52:09 UTC, Bastiaan Veelo wrote:
 On Windows 10, i7-7700HQ, M.2 SSD, provided I did things right, 
 I get ca. 40x worse timings.
File system is NTFS.
Mar 27