www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Speeding up DCD in big projects

reply WebFreak001 <d.forum webfreak.org> writes:
So it turns out the `resolveImportLocation` function, which is 
called on the first semantic pass, currently on Windows takes 
like 50ms to complete once for just about 60 import paths. In a 
big project like serve-d (as source code project) this would sum 
up to over 7 minutes. (in release mode just around 2 minutes)

I optimized the import path check code and now in DCD debug mode 
you might get a 83x speed up for your first completion and in DCD 
release mode you might get a 26x speed up for your first 
completion. In serve-d (as source code project) this translated 
to the first completion now only taking 5 seconds.

I think in big projects optimizing the startup time too is pretty 
essential and this was a very easy improvement I immediately 
found using the debugger. The example times are made using DCD 
~master with my dsymbol PR on and off inside the source code of 
serve-d trying to auto complete extension.d:860

This was a very easy to find and improve spot, there might be 
more low hanging fruit to get the first parse and completion even 
quicker than 5s.

The PR: https://github.com/dlang-community/dsymbol/pull/151
Jul 21 2020
next sibling parent reply WebFreak001 <d.forum webfreak.org> writes:
Essential to this were `nothrow` overloads of isFile/isDir in 
std.file, which I now manually implemented. Having functions for 
those in phobos would be awesome and clearly would offer huge 
performance improvements.
Jul 21 2020
parent reply Bruce Carneal <bcarneal gmail.com> writes:
On Tuesday, 21 July 2020 at 12:53:10 UTC, WebFreak001 wrote:
 Essential to this were `nothrow` overloads of isFile/isDir in 
 std.file, which I now manually implemented. Having functions 
 for those in phobos would be awesome and clearly would offer 
 huge performance improvements.
Was the change to nothrow the only alteration? I ask because I thought that a move to nothrow only wins big if it enables inlining within a hot loop. Perhaps your manual implementation wins in other ways?
Jul 21 2020
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 21, 2020 at 04:46:23PM +0000, Bruce Carneal via Digitalmars-d wrote:
 On Tuesday, 21 July 2020 at 12:53:10 UTC, WebFreak001 wrote:
 Essential to this were `nothrow` overloads of isFile/isDir in
 std.file, which I now manually implemented. Having functions for
 those in phobos would be awesome and clearly would offer huge
 performance improvements.
Was the change to nothrow the only alteration? I ask because I thought that a move to nothrow only wins big if it enables inlining within a hot loop. Perhaps your manual implementation wins in other ways?
[...] I'm skeptical of "huge performance improvements" with nothrow in anything that involves disk I/O. An I/O roundtrip far outweighs whatever meager savings you may have won with nothrow. I have a hard time conceiving of a situation where nothrow would confer significant performance improvements to isFile/isDir. IMO you'd get much better savings by reducing the number of I/O roundtrips instead. T -- Today's society is one of specialization: as you grow, you learn more and more about less and less. Eventually, you know everything about nothing.
Jul 21 2020
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 7/21/20 1:02 PM, H. S. Teoh wrote:
 On Tue, Jul 21, 2020 at 04:46:23PM +0000, Bruce Carneal via Digitalmars-d
wrote:
 On Tuesday, 21 July 2020 at 12:53:10 UTC, WebFreak001 wrote:
 Essential to this were `nothrow` overloads of isFile/isDir in
 std.file, which I now manually implemented. Having functions for
 those in phobos would be awesome and clearly would offer huge
 performance improvements.
Was the change to nothrow the only alteration? I ask because I thought that a move to nothrow only wins big if it enables inlining within a hot loop. Perhaps your manual implementation wins in other ways?
[...] I'm skeptical of "huge performance improvements" with nothrow in anything that involves disk I/O. An I/O roundtrip far outweighs whatever meager savings you may have won with nothrow. I have a hard time conceiving of a situation where nothrow would confer significant performance improvements to isFile/isDir. IMO you'd get much better savings by reducing the number of I/O roundtrips instead.
If you look at the change, it's not the nothrow optimization that Walter always talks about (where code that is marked nothrow performs slightly better than code that isn't marked nothrow but doesn't end up throwing), it's that the COMMON CASE was that an exception is thrown and caught, and then bool is returned. Instead, just return the bool. Also, just because it involves file info, doesn't mean it's doing disk i/o for every call. -Steve
Jul 21 2020
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 21, 2020 at 01:12:04PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 On 7/21/20 1:02 PM, H. S. Teoh wrote:
[...]
 I'm skeptical of "huge performance improvements" with nothrow in
 anything that involves disk I/O.
[...]
 If you look at the change, it's not the nothrow optimization that
 Walter always talks about (where code that is marked nothrow performs
 slightly better than code that isn't marked nothrow but doesn't end up
 throwing), it's that the COMMON CASE was that an exception is thrown
 and caught, and then bool is returned. Instead, just return the bool.
[...] Ahh I see. That makes sense then. Which also begs the question, *why* does it even throw in the first place. The non-existence of a file is a normally-expected outcome of isFile/isDir, throwing in that case seems excessively heavy-handed. It's probably a case of bad API design. T -- Век живи - век учись. А дураком помрёшь.
Jul 21 2020
next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 7/21/20 1:32 PM, H. S. Teoh wrote:
 On Tue, Jul 21, 2020 at 01:12:04PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 On 7/21/20 1:02 PM, H. S. Teoh wrote:
[...]
 I'm skeptical of "huge performance improvements" with nothrow in
 anything that involves disk I/O.
[...]
 If you look at the change, it's not the nothrow optimization that
 Walter always talks about (where code that is marked nothrow performs
 slightly better than code that isn't marked nothrow but doesn't end up
 throwing), it's that the COMMON CASE was that an exception is thrown
 and caught, and then bool is returned. Instead, just return the bool.
[...] Ahh I see. That makes sense then. Which also begs the question, *why* does it even throw in the first place. The non-existence of a file is a normally-expected outcome of isFile/isDir, throwing in that case seems excessively heavy-handed. It's probably a case of bad API design.
Agree. Returning false seems appropriate as something that doesn't exist is clearly not a file or directory. -Steve
Jul 21 2020
prev sibling next sibling parent reply aberba <karabutaworld gmail.com> writes:
On Tuesday, 21 July 2020 at 17:32:39 UTC, H. S. Teoh wrote:
 On Tue, Jul 21, 2020 at 01:12:04PM -0400, Steven Schveighoffer 
 via Digitalmars-d wrote:
 [...]
[...]
 [...]
[...]
 [...]
[...] Ahh I see. That makes sense then. Which also begs the question, *why* does it even throw in the first place. The non-existence of a file is a normally-expected outcome of isFile/isDir, throwing in that case seems excessively heavy-handed. It's probably a case of bad API design.
I would say a work in progress
 T
Jul 21 2020
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 21, 2020 at 05:59:53PM +0000, aberba via Digitalmars-d wrote:
 On Tuesday, 21 July 2020 at 17:32:39 UTC, H. S. Teoh wrote:
[...]
 Which also begs the question, *why* does it even throw in the first
 place.  The non-existence of a file is a normally-expected outcome
 of isFile/isDir, throwing in that case seems excessively
 heavy-handed. It's probably a case of bad API design.
I would say a work in progress
[...] Hardly. AFAICT the API has been like this for many years now. If it's WIP then "progress" has come to a standstill and we need to do something about it. And as it turns out, there's a bug filed for it: https://issues.dlang.org/show_bug.cgi?id=10240 I am of the opinion that isDir/isFile should not throw if the file doesn't exist. That kind of ivory-tower-ideal API is just excessively heavy-handed and not useful in practice. T -- EMACS = Extremely Massive And Cumbersome System
Jul 21 2020
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2020-07-21 19:32, H. S. Teoh wrote:

 Ahh I see.  That makes sense then.
 
 Which also begs the question, *why* does it even throw in the first
 place.  The non-existence of a file is a normally-expected outcome of
 isFile/isDir, throwing in that case seems excessively heavy-handed.
If it shouldn't throw, how should it deal with other IO errors than non-existing file/dir? What if it exists but you don't have permission? How do you tell the difference between the following conditions when it only returns a bool and doesn't throw: * Exists and is a file // success * Exists and is a file but you don't have permission // error * Does not exist // error There are probably other error conditions as well.
 It's probably a case of bad API design.
No, exceptions is the main error reporting system in D. Using any other system should have really good reasons to not use exceptions. In my opinion, any other system (that I can come up with) results in bad and cumbersome APIs. Returning error codes: Cons: * Steals the return channel * Forces to use out parameters * By default errors are not checked (the compiler won't force you do look at the return value) * Cumbersome to check errors * Cumbersome to propagate errors * Error handling code and regular code uses the same syntax Pros: * Better performance (some argue this is not true) * Does not require any runtime support Returning an optional: Cons: * If an error occurred you only know that, not what error occurred * Cumbersome to check errors without language support * Cumbersome to propagate errors without language support Pros: * Error checking is better then error codes because you will be forced to check the error to get to the returned value * The return channel can still be used * No need to use out parameters Returning a result object: * More or less the same as "optional" but with the advantage of the error that occurred is available * Another advantage over error codes is that it's usually possible to use an object for the error and not just an int Cocoa error reporting: This model returns a bool to indicate error or not. It "returns" an error object through an out parameter. Similar to returning a result object, but since the out value and the error are separate the is no enforcement to check the error before accessing the out value D Exceptions: Pros: * Leaves the return channel open for what it is intended for, returning values * Not necessary to use out parameters * Errors are handled one way or another. Either by the developer or by the runtime * Easy to propagate errors * Does not clutter regular code with error handling code * Standardized error system Cons: * Slow * Requires runtime support * Any function not marked with nothrow may throw In my opinion the best of solution would be to use the "result object" style of error reporting with syntax sugar supported in the language. This is what's proposed for C++ [1]. The advantage is that we can use the existing syntax which are used for exceptions. In D, it could look like this: enum CopyError { permissionDenied } void copy(string src, string dest) throw(CopyError) { throw CopyError.permissionDenied; } void main() { try copy("foo", "bar"); catch (CopyError e) writeln(e); } Which would be lowered to something like the equivalent of the following code: struct Result(Value, Error) { bool isValue; union { Value value; Error error; } this(Value value) { this.value = value; isValue = true; } this(Error errro) { this.error = error; isValue = true; } } Result!(void, CopyError) copy(string src, string dest) { return Result!(void, CopyError)(CopyError.permissionDenied); } void main() { auto result = copy("foo", "bar"); if (!result.isValue) goto L1; L1: writeln(result.error); } [1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0709r0.pdf -- /Jacob Carlborg
Jul 22 2020
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jul 22, 2020 at 01:28:56PM +0200, Jacob Carlborg via Digitalmars-d
wrote:
 On 2020-07-21 19:32, H. S. Teoh wrote:
 
 Ahh I see.  That makes sense then.
 
 Which also begs the question, *why* does it even throw in the first
 place.  The non-existence of a file is a normally-expected outcome
 of isFile/isDir, throwing in that case seems excessively
 heavy-handed.
If it shouldn't throw, how should it deal with other IO errors than non-existing file/dir? What if it exists but you don't have permission?
A non-existent file/dir is an expected outcome; an I/O error (e.g. disk corruption) is not. The former should be a normal return value; the latter should be an exception. Similarly, if a program is testing whether some pathname is a file, the expectation is that it has the permission to access that path; so a permission error is likewise an exception.
 How do you tell the difference between the following conditions when
 it only returns a bool and doesn't throw:
 * Exists and is a file // success
 * Exists and is a file but you don't have permission // error
 * Does not exist // error
I didn't say it should never throw, just that it should not throw merely because the file doesn't exist. The typical usage of isFile/isDir is that if a given pathname exists and is a file/dir, then perform some operations on it. In such scenarios, the expectation is that if the file doesn't exist, isFile/isDir should simply return false (it doesn't exist, so it follows logically that it's not a file/dir). If an I/O error should occur, however, then an exception should still be thrown (something is wrong with the filesystem, or the program doesn't have the permissions it thought it had, which is not an expected outcome). The distinction between expected outcome and unexpected outcome is important here. You wouldn't want a function that tests whether some number is negative to throw an exception when it's negative, after all, since that's a normally-expected outcome. But if an memory error was encountered when performing that test, then you *would* want to throw an exception instead of returning true or false, because it's an unexpected outcome, signalling that something is potentially very wrong with the system, so normal computation should not continue. T -- Государство делает вид, что платит нам зарплату, а мы делаем вид, что работаем.
Aug 01 2020
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 21, 2020 at 10:32:39AM -0700, H. S. Teoh via Digitalmars-d wrote:
[...]
 Which also begs the question, *why* does it even throw in the first
 place.  The non-existence of a file is a normally-expected outcome of
 isFile/isDir, throwing in that case seems excessively heavy-handed.
 It's probably a case of bad API design.
[...] There is the workaround of writing: if (f.exists && f.isDir) ... but it's ugly and involves an extra unnecessary syscall roundtrip. Basically, this kind of API design is just hideous. I propose we remove the throw from isDir/isFile. T -- Never step over a puddle, always step around it. Chances are that whatever made it is still dripping.
Jul 21 2020
next sibling parent reply WebFreak001 <d.forum webfreak.org> writes:
On Tuesday, 21 July 2020 at 19:03:05 UTC, H. S. Teoh wrote:
 On Tue, Jul 21, 2020 at 10:32:39AM -0700, H. S. Teoh via 
 Digitalmars-d wrote: [...]
 Which also begs the question, *why* does it even throw in the 
 first place.  The non-existence of a file is a 
 normally-expected outcome of isFile/isDir, throwing in that 
 case seems excessively heavy-handed. It's probably a case of 
 bad API design.
[...] There is the workaround of writing: if (f.exists && f.isDir) ... but it's ugly and involves an extra unnecessary syscall roundtrip. Basically, this kind of API design is just hideous. I propose we remove the throw from isDir/isFile. T
don't do this, this introduces a race condition (and is the reason why this was a function using try-catch in the first place)
Jul 21 2020
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 21, 2020 at 07:24:19PM +0000, WebFreak001 via Digitalmars-d wrote:
 On Tuesday, 21 July 2020 at 19:03:05 UTC, H. S. Teoh wrote:
[...]
 There is the workaround of writing:
 
 	if (f.exists && f.isDir) ...
 
 but it's ugly and involves an extra unnecessary syscall roundtrip.
 
 Basically, this kind of API design is just hideous. I propose we
 remove the throw from isDir/isFile.
[...]
 don't do this, this introduces a race condition (and is the reason why
 this was a function using try-catch in the first place)
Yet another reason isDir should not throw! T -- Never ascribe to malice that which is adequately explained by incompetence. -- Napoleon Bonaparte
Jul 21 2020
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 7/21/20 3:03 PM, H. S. Teoh wrote:
 On Tue, Jul 21, 2020 at 10:32:39AM -0700, H. S. Teoh via Digitalmars-d wrote:
 [...]
 Which also begs the question, *why* does it even throw in the first
 place.  The non-existence of a file is a normally-expected outcome of
 isFile/isDir, throwing in that case seems excessively heavy-handed.
 It's probably a case of bad API design.
[...] There is the workaround of writing: if (f.exists && f.isDir) ... but it's ugly and involves an extra unnecessary syscall roundtrip. Basically, this kind of API design is just hideous. I propose we remove the throw from isDir/isFile.
We'd probably need a different name. Unfortunately. -Steve
Jul 21 2020
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 21, 2020 at 03:25:21PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 On 7/21/20 3:03 PM, H. S. Teoh wrote:
[...]
 Basically, this kind of API design is just hideous. I propose we
 remove the throw from isDir/isFile.
We'd probably need a different name. Unfortunately.
[...] Ugh. :-( T -- I am Ohm of Borg. Resistance is voltage over current.
Jul 21 2020
prev sibling parent WebFreak001 <d.forum webfreak.org> writes:
On Tuesday, 21 July 2020 at 16:46:23 UTC, Bruce Carneal wrote:
 On Tuesday, 21 July 2020 at 12:53:10 UTC, WebFreak001 wrote:
 Essential to this were `nothrow` overloads of isFile/isDir in 
 std.file, which I now manually implemented. Having functions 
 for those in phobos would be awesome and clearly would offer 
 huge performance improvements.
Was the change to nothrow the only alteration? I ask because I thought that a move to nothrow only wins big if it enables inlining within a hot loop. Perhaps your manual implementation wins in other ways?
before code was like this: try return isFile(file); catch (FileException) return false; now it's like this: uint attr; bool exists = getAttrs(file, &attr); return exists && attrIsFile(attr); so the useless throwing was released which was the main alteration. I further optimized the logic of the caller (removed useless array pushing, added early returns and reduced FS calls) but that was probably relatively small. I only really checked in the debugger how fast the loop was roughly, where it was before like 40-60ms and only after the nothrow change it was <1ms. So the nothrow change definitely made most of this, considering this was pretty much hot code (called for every import, recursively into all D files imported + calling FS calls like 3 times for every import path where I had 60 import paths)
Jul 21 2020
prev sibling parent aberba <karabutaworld gmail.com> writes:
On Tuesday, 21 July 2020 at 12:51:50 UTC, WebFreak001 wrote:
 So it turns out the `resolveImportLocation` function, which is 
 called on the first semantic pass, currently on Windows takes 
 like 50ms to complete once for just about 60 import paths. In a 
 big project like serve-d (as source code project) this would 
 sum up to over 7 minutes. (in release mode just around 2 
 minutes)

 [...]
Nice. DCD improvements bubbles down to many dev tools.
Jul 21 2020