digitalmars.D - Speeding up DCD in big projects

WebFreak001 (19/19) Jul 21 2020 So it turns out the `resolveImportLocation` function, which is

WebFreak001 (4/4) Jul 21 2020 Essential to this were `nothrow` overloads of isFile/isDir in

Bruce Carneal (5/9) Jul 21 2020 Was the change to nothrow the only alteration? I ask because I

H. S. Teoh (11/22) Jul 21 2020 [...]

Steven Schveighoffer (9/29) Jul 21 2020 If you look at the change, it's not the nothrow optimization that Walter...

H. S. Teoh (12/20) Jul 21 2020 [...]

Steven Schveighoffer (4/24) Jul 21 2020 Agree. Returning false seems appropriate as something that doesn't exist...
aberba (2/17) Jul 21 2020

H. S. Teoh (14/21) Jul 21 2020 [...]

Jacob Carlborg (115/121) Jul 22 2020 If it shouldn't throw, how should it deal with other IO errors than

H. S. Teoh (27/44) Aug 01 2020 A non-existent file/dir is an expected outcome; an I/O error (e.g. disk

H. S. Teoh (11/15) Jul 21 2020 [...]

WebFreak001 (3/18) Jul 21 2020 don't do this, this introduces a race condition (and is the

H. S. Teoh (7/18) Jul 21 2020 [...]

Steven Schveighoffer (3/19) Jul 21 2020 We'd probably need a different name. Unfortunately.

H. S. Teoh (7/12) Jul 21 2020 [...]

WebFreak001 (21/30) Jul 21 2020 before code was like this:

aberba (2/9) Jul 21 2020 Nice. DCD improvements bubbles down to many dev tools.

WebFreak001 <d.forum webfreak.org> writes:

So it turns out the `resolveImportLocation` function, which is 
called on the first semantic pass, currently on Windows takes 
like 50ms to complete once for just about 60 import paths. In a 
big project like serve-d (as source code project) this would sum 
up to over 7 minutes. (in release mode just around 2 minutes)

I optimized the import path check code and now in DCD debug mode 
you might get a 83x speed up for your first completion and in DCD 
release mode you might get a 26x speed up for your first 
completion. In serve-d (as source code project) this translated 
to the first completion now only taking 5 seconds.

I think in big projects optimizing the startup time too is pretty 
essential and this was a very easy improvement I immediately 
found using the debugger. The example times are made using DCD 
~master with my dsymbol PR on and off inside the source code of 
serve-d trying to auto complete extension.d:860

This was a very easy to find and improve spot, there might be 
more low hanging fruit to get the first parse and completion even 
quicker than 5s.

The PR: https://github.com/dlang-community/dsymbol/pull/151

Jul 21 2020

WebFreak001 <d.forum webfreak.org> writes:

Essential to this were `nothrow` overloads of isFile/isDir in 
std.file, which I now manually implemented. Having functions for 
those in phobos would be awesome and clearly would offer huge 
performance improvements.

Jul 21 2020

Bruce Carneal <bcarneal gmail.com> writes:

On Tuesday, 21 July 2020 at 12:53:10 UTC, WebFreak001 wrote:
 Essential to this were `nothrow` overloads of isFile/isDir in 
 std.file, which I now manually implemented. Having functions 
 for those in phobos would be awesome and clearly would offer 
 huge performance improvements.

Was the change to nothrow the only alteration?  I ask because I 
thought that a move to nothrow only wins big if it enables 
inlining within a hot loop.

Perhaps your manual implementation wins in other ways?

Jul 21 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Jul 21, 2020 at 04:46:23PM +0000, Bruce Carneal via Digitalmars-d wrote:
 On Tuesday, 21 July 2020 at 12:53:10 UTC, WebFreak001 wrote:
 Essential to this were `nothrow` overloads of isFile/isDir in
 std.file, which I now manually implemented. Having functions for
 those in phobos would be awesome and clearly would offer huge
 performance improvements.

 
 Was the change to nothrow the only alteration?  I ask because I
 thought that a move to nothrow only wins big if it enables inlining
 within a hot loop.
 
 Perhaps your manual implementation wins in other ways?

[...]

I'm skeptical of "huge performance improvements" with nothrow in
anything that involves disk I/O.  An I/O roundtrip far outweighs
whatever meager savings you may have won with nothrow.  I have a hard
time conceiving of a situation where nothrow would confer significant
performance improvements to isFile/isDir.  IMO you'd get much better
savings by reducing the number of I/O roundtrips instead.


T

-- 
Today's society is one of specialization: as you grow, you learn more and more
about less and less. Eventually, you know everything about nothing.

Jul 21 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 7/21/20 1:02 PM, H. S. Teoh wrote:
 On Tue, Jul 21, 2020 at 04:46:23PM +0000, Bruce Carneal via Digitalmars-d
wrote:
 On Tuesday, 21 July 2020 at 12:53:10 UTC, WebFreak001 wrote:
 Essential to this were `nothrow` overloads of isFile/isDir in
 std.file, which I now manually implemented. Having functions for
 those in phobos would be awesome and clearly would offer huge
 performance improvements.

 Was the change to nothrow the only alteration?  I ask because I
 thought that a move to nothrow only wins big if it enables inlining
 within a hot loop.

 Perhaps your manual implementation wins in other ways?

 [...]
 
 I'm skeptical of "huge performance improvements" with nothrow in
 anything that involves disk I/O.  An I/O roundtrip far outweighs
 whatever meager savings you may have won with nothrow.  I have a hard
 time conceiving of a situation where nothrow would confer significant
 performance improvements to isFile/isDir.  IMO you'd get much better
 savings by reducing the number of I/O roundtrips instead.

If you look at the change, it's not the nothrow optimization that Walter 
always talks about (where code that is marked nothrow performs slightly 
better than code that isn't marked nothrow but doesn't end up throwing), 
it's that the COMMON CASE was that an exception is thrown and caught, 
and then bool is returned. Instead, just return the bool.

Also, just because it involves file info, doesn't mean it's doing disk 
i/o for every call.

-Steve

Jul 21 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Jul 21, 2020 at 01:12:04PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 On 7/21/20 1:02 PM, H. S. Teoh wrote:

[...]
 I'm skeptical of "huge performance improvements" with nothrow in
 anything that involves disk I/O.


[...]
 If you look at the change, it's not the nothrow optimization that
 Walter always talks about (where code that is marked nothrow performs
 slightly better than code that isn't marked nothrow but doesn't end up
 throwing), it's that the COMMON CASE was that an exception is thrown
 and caught, and then bool is returned. Instead, just return the bool.

[...]

Ahh I see.  That makes sense then.

Which also begs the question, *why* does it even throw in the first
place.  The non-existence of a file is a normally-expected outcome of
isFile/isDir, throwing in that case seems excessively heavy-handed.
It's probably a case of bad API design.


T

-- 
Век живи - век учись. А дураком помрёшь.

Jul 21 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 7/21/20 1:32 PM, H. S. Teoh wrote:
 On Tue, Jul 21, 2020 at 01:12:04PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 On 7/21/20 1:02 PM, H. S. Teoh wrote:

 [...]
 I'm skeptical of "huge performance improvements" with nothrow in
 anything that involves disk I/O.


 [...]
 If you look at the change, it's not the nothrow optimization that
 Walter always talks about (where code that is marked nothrow performs
 slightly better than code that isn't marked nothrow but doesn't end up
 throwing), it's that the COMMON CASE was that an exception is thrown
 and caught, and then bool is returned. Instead, just return the bool.

 [...]
 
 Ahh I see.  That makes sense then.
 
 Which also begs the question, *why* does it even throw in the first
 place.  The non-existence of a file is a normally-expected outcome of
 isFile/isDir, throwing in that case seems excessively heavy-handed.
 It's probably a case of bad API design.
 

Agree. Returning false seems appropriate as something that doesn't exist 
is clearly not a file or directory.

-Steve

Jul 21 2020

aberba <karabutaworld gmail.com> writes:

On Tuesday, 21 July 2020 at 17:32:39 UTC, H. S. Teoh wrote:
 On Tue, Jul 21, 2020 at 01:12:04PM -0400, Steven Schveighoffer 
 via Digitalmars-d wrote:
 [...]

 [...]
 [...]


 [...]
 [...]

 [...]

 Ahh I see.  That makes sense then.

 Which also begs the question, *why* does it even throw in the 
 first place.  The non-existence of a file is a 
 normally-expected outcome of isFile/isDir, throwing in that 
 case seems excessively heavy-handed. It's probably a case of 
 bad API design.

I would say a work in progress
 T

Jul 21 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Jul 21, 2020 at 05:59:53PM +0000, aberba via Digitalmars-d wrote:
 On Tuesday, 21 July 2020 at 17:32:39 UTC, H. S. Teoh wrote:

[...]
 Which also begs the question, *why* does it even throw in the first
 place.  The non-existence of a file is a normally-expected outcome
 of isFile/isDir, throwing in that case seems excessively
 heavy-handed. It's probably a case of bad API design.

 
 I would say a work in progress

[...]

Hardly.  AFAICT the API has been like this for many years now. If it's
WIP then "progress" has come to a standstill and we need to do something
about it.

And as it turns out, there's a bug filed for it:

	https://issues.dlang.org/show_bug.cgi?id=10240

I am of the opinion that isDir/isFile should not throw if the file
doesn't exist.  That kind of ivory-tower-ideal API is just excessively
heavy-handed and not useful in practice.


T

-- 
EMACS = Extremely Massive And Cumbersome System

Jul 21 2020

Jacob Carlborg <doob me.com> writes:

On 2020-07-21 19:32, H. S. Teoh wrote:

 Ahh I see.  That makes sense then.
 
 Which also begs the question, *why* does it even throw in the first
 place.  The non-existence of a file is a normally-expected outcome of
 isFile/isDir, throwing in that case seems excessively heavy-handed.

If it shouldn't throw, how should it deal with other IO errors than 
non-existing file/dir? What if it exists but you don't have permission? 
How do you tell the difference between the following conditions when it 
only returns a bool and doesn't throw:

* Exists and is a file // success
* Exists and is a file but you don't have permission // error
* Does not exist // error

There are probably other error conditions as well.

 It's probably a case of bad API design.

No, exceptions is the main error reporting system in D. Using any other 
system should have really good reasons to not use exceptions.

In my opinion, any other system (that I can come up with) results in bad 
and cumbersome APIs.

Returning error codes:
   Cons:
     * Steals the return channel
     * Forces to use out parameters
     * By default errors are not checked (the compiler won't force you 
do look at the return value)
     * Cumbersome to check errors
     * Cumbersome to propagate errors
     * Error handling code and regular code uses the same syntax
   Pros:
     * Better performance (some argue this is not true)
     * Does not require any runtime support

Returning an optional:
   Cons:
     * If an error occurred you only know that, not what error occurred
     * Cumbersome to check errors without language support
     * Cumbersome to propagate errors without language support
   Pros:
     * Error checking is better then error codes because you will be 
forced to check the error to get to the returned value
     * The return channel can still be used
     * No need to use out parameters

Returning a result object:
     * More or less the same as "optional" but with the advantage of the 
error that occurred is available
     * Another advantage over error codes is that it's usually possible 
to use an object for the error and not just an int

Cocoa error reporting:
     This model returns a bool to indicate error or not. It "returns" an 
error object through an out parameter. Similar to returning a result 
object, but since the out value and the error are separate the is no 
enforcement to check the error before accessing the out value

D Exceptions:
   Pros:
     * Leaves the return channel open for what it is intended for, 
returning values
     * Not necessary to use out parameters
     * Errors are handled one way or another. Either by the developer or 
by the runtime
     * Easy to propagate errors
     * Does not clutter regular code with error handling code
     * Standardized error system
   Cons:
     * Slow
     * Requires runtime support
     * Any function not marked with  nothrow may throw

In my opinion the best of solution would be to use the "result object" 
style of error reporting with syntax sugar supported in the language. 
This is what's proposed for C++ [1]. The advantage is that we can use 
the existing syntax which are used for exceptions. In D, it could look 
like this:

enum CopyError
{
     permissionDenied
}

void copy(string src, string dest) throw(CopyError)
{
     throw CopyError.permissionDenied;
}

void main()
{
     try
         copy("foo", "bar");
     catch (CopyError e)
         writeln(e);
}

Which would be lowered to something like the equivalent of the following 
code:

struct Result(Value, Error)
{
     bool isValue;
     union
     {
         Value value;
         Error error;
     }

     this(Value value)
     {
         this.value = value;
         isValue = true;
     }

     this(Error errro)
     {
         this.error = error;
         isValue = true;
     }
}

Result!(void, CopyError) copy(string src, string dest)
{
     return Result!(void, CopyError)(CopyError.permissionDenied);
}

void main()
{
     auto result = copy("foo", "bar");

     if (!result.isValue)
         goto L1;
     L1:
         writeln(result.error);
}


[1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0709r0.pdf

-- 
/Jacob Carlborg

Jul 22 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Jul 22, 2020 at 01:28:56PM +0200, Jacob Carlborg via Digitalmars-d
wrote:
 On 2020-07-21 19:32, H. S. Teoh wrote:
 
 Ahh I see.  That makes sense then.
 
 Which also begs the question, *why* does it even throw in the first
 place.  The non-existence of a file is a normally-expected outcome
 of isFile/isDir, throwing in that case seems excessively
 heavy-handed.

 
 If it shouldn't throw, how should it deal with other IO errors than
 non-existing file/dir? What if it exists but you don't have
 permission?

A non-existent file/dir is an expected outcome; an I/O error (e.g. disk
corruption) is not. The former should be a normal return value; the
latter should be an exception.  Similarly, if a program is testing
whether some pathname is a file, the expectation is that it has the
permission to access that path; so a permission error is likewise an
exception.


 How do you tell the difference between the following conditions when
 it only returns a bool and doesn't throw:

 * Exists and is a file // success
 * Exists and is a file but you don't have permission // error
 * Does not exist // error

I didn't say it should never throw, just that it should not throw merely
because the file doesn't exist.  The typical usage of isFile/isDir is
that if a given pathname exists and is a file/dir, then perform some
operations on it. In such scenarios, the expectation is that if the file
doesn't exist, isFile/isDir should simply return false (it doesn't
exist, so it follows logically that it's not a file/dir).

If an I/O error should occur, however, then an exception should still be
thrown (something is wrong with the filesystem, or the program doesn't
have the permissions it thought it had, which is not an expected
outcome).  The distinction between expected outcome and unexpected
outcome is important here.  You wouldn't want a function that tests
whether some number is negative to throw an exception when it's
negative, after all, since that's a normally-expected outcome. But if an
memory error was encountered when performing that test, then you *would*
want to throw an exception instead of returning true or false, because
it's an unexpected outcome, signalling that something is potentially
very wrong with the system, so normal computation should not continue.


T

-- 
Государство делает вид, что платит нам
зарплату, а мы делаем вид, что работаем.

Aug 01 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Jul 21, 2020 at 10:32:39AM -0700, H. S. Teoh via Digitalmars-d wrote:
[...]
 Which also begs the question, *why* does it even throw in the first
 place.  The non-existence of a file is a normally-expected outcome of
 isFile/isDir, throwing in that case seems excessively heavy-handed.
 It's probably a case of bad API design.

[...]

There is the workaround of writing:

	if (f.exists && f.isDir) ...

but it's ugly and involves an extra unnecessary syscall roundtrip.

Basically, this kind of API design is just hideous. I propose we remove
the throw from isDir/isFile.


T

-- 
Never step over a puddle, always step around it. Chances are that whatever made
it is still dripping.

Jul 21 2020

WebFreak001 <d.forum webfreak.org> writes:

On Tuesday, 21 July 2020 at 19:03:05 UTC, H. S. Teoh wrote:
 On Tue, Jul 21, 2020 at 10:32:39AM -0700, H. S. Teoh via 
 Digitalmars-d wrote: [...]
 Which also begs the question, *why* does it even throw in the 
 first place.  The non-existence of a file is a 
 normally-expected outcome of isFile/isDir, throwing in that 
 case seems excessively heavy-handed. It's probably a case of 
 bad API design.

 [...]

 There is the workaround of writing:

 	if (f.exists && f.isDir) ...

 but it's ugly and involves an extra unnecessary syscall 
 roundtrip.

 Basically, this kind of API design is just hideous. I propose 
 we remove the throw from isDir/isFile.


 T

don't do this, this introduces a race condition (and is the 
reason why this was a function using try-catch in the first place)

Jul 21 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Jul 21, 2020 at 07:24:19PM +0000, WebFreak001 via Digitalmars-d wrote:
 On Tuesday, 21 July 2020 at 19:03:05 UTC, H. S. Teoh wrote:

[...]
 There is the workaround of writing:
 
 	if (f.exists && f.isDir) ...
 
 but it's ugly and involves an extra unnecessary syscall roundtrip.
 
 Basically, this kind of API design is just hideous. I propose we
 remove the throw from isDir/isFile.


[...]
 don't do this, this introduces a race condition (and is the reason why
 this was a function using try-catch in the first place)

Yet another reason isDir should not throw!


T

-- 
Never ascribe to malice that which is adequately explained by incompetence. --
Napoleon Bonaparte

Jul 21 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 7/21/20 3:03 PM, H. S. Teoh wrote:
 On Tue, Jul 21, 2020 at 10:32:39AM -0700, H. S. Teoh via Digitalmars-d wrote:
 [...]
 Which also begs the question, *why* does it even throw in the first
 place.  The non-existence of a file is a normally-expected outcome of
 isFile/isDir, throwing in that case seems excessively heavy-handed.
 It's probably a case of bad API design.

 [...]
 
 There is the workaround of writing:
 
 	if (f.exists && f.isDir) ...
 
 but it's ugly and involves an extra unnecessary syscall roundtrip.
 
 Basically, this kind of API design is just hideous. I propose we remove
 the throw from isDir/isFile.

We'd probably need a different name. Unfortunately.

-Steve

Jul 21 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Jul 21, 2020 at 03:25:21PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 On 7/21/20 3:03 PM, H. S. Teoh wrote:

[...]
 Basically, this kind of API design is just hideous. I propose we
 remove the throw from isDir/isFile.

 
 We'd probably need a different name. Unfortunately.

[...]

Ugh. :-(


T

-- 
I am Ohm of Borg. Resistance is voltage over current.

Jul 21 2020

WebFreak001 <d.forum webfreak.org> writes:

On Tuesday, 21 July 2020 at 16:46:23 UTC, Bruce Carneal wrote:
 On Tuesday, 21 July 2020 at 12:53:10 UTC, WebFreak001 wrote:
 Essential to this were `nothrow` overloads of isFile/isDir in 
 std.file, which I now manually implemented. Having functions 
 for those in phobos would be awesome and clearly would offer 
 huge performance improvements.

 Was the change to nothrow the only alteration?  I ask because I 
 thought that a move to nothrow only wins big if it enables 
 inlining within a hot loop.

 Perhaps your manual implementation wins in other ways?

before code was like this:

try
     return isFile(file);
catch (FileException)
     return false;


now it's like this:

uint attr;
bool exists = getAttrs(file, &attr);
return exists && attrIsFile(attr);


so the useless throwing was released which was the main 
alteration. I further optimized the logic of the caller (removed 
useless array pushing, added early returns and reduced FS calls) 
but that was probably relatively small.

I only really checked in the debugger how fast the loop was 
roughly, where it was before like 40-60ms and only after the 
nothrow change it was <1ms. So the nothrow change definitely made 
most of this, considering this was pretty much hot code (called 
for every import, recursively into all D files imported + calling 
FS calls like 3 times for every import path where I had 60 import 
paths)

Jul 21 2020

aberba <karabutaworld gmail.com> writes:

On Tuesday, 21 July 2020 at 12:51:50 UTC, WebFreak001 wrote:
 So it turns out the `resolveImportLocation` function, which is 
 called on the first semantic pass, currently on Windows takes 
 like 50ms to complete once for just about 60 import paths. In a 
 big project like serve-d (as source code project) this would 
 sum up to over 7 minutes. (in release mode just around 2 
 minutes)

 [...]


Nice. DCD improvements bubbles down to many dev tools.

Jul 21 2020

D Programming

C/C++ Programming

Other

digitalmars.D - Speeding up DCD in big projects