digitalmars.D - Speeding up DCD in big projects
- WebFreak001 (19/19) Jul 21 2020 So it turns out the `resolveImportLocation` function, which is
- WebFreak001 (4/4) Jul 21 2020 Essential to this were `nothrow` overloads of isFile/isDir in
- Bruce Carneal (5/9) Jul 21 2020 Was the change to nothrow the only alteration? I ask because I
- H. S. Teoh (11/22) Jul 21 2020 [...]
- Steven Schveighoffer (9/29) Jul 21 2020 If you look at the change, it's not the nothrow optimization that Walter...
- H. S. Teoh (12/20) Jul 21 2020 [...]
- Steven Schveighoffer (4/24) Jul 21 2020 Agree. Returning false seems appropriate as something that doesn't exist...
- aberba (2/17) Jul 21 2020
- H. S. Teoh (14/21) Jul 21 2020 [...]
- Jacob Carlborg (115/121) Jul 22 2020 If it shouldn't throw, how should it deal with other IO errors than
- H. S. Teoh (27/44) Aug 01 2020 A non-existent file/dir is an expected outcome; an I/O error (e.g. disk
- H. S. Teoh (11/15) Jul 21 2020 [...]
- WebFreak001 (3/18) Jul 21 2020 don't do this, this introduces a race condition (and is the
- H. S. Teoh (7/18) Jul 21 2020 [...]
- Steven Schveighoffer (3/19) Jul 21 2020 We'd probably need a different name. Unfortunately.
- H. S. Teoh (7/12) Jul 21 2020 [...]
- WebFreak001 (21/30) Jul 21 2020 before code was like this:
- aberba (2/9) Jul 21 2020 Nice. DCD improvements bubbles down to many dev tools.
So it turns out the `resolveImportLocation` function, which is called on the first semantic pass, currently on Windows takes like 50ms to complete once for just about 60 import paths. In a big project like serve-d (as source code project) this would sum up to over 7 minutes. (in release mode just around 2 minutes) I optimized the import path check code and now in DCD debug mode you might get a 83x speed up for your first completion and in DCD release mode you might get a 26x speed up for your first completion. In serve-d (as source code project) this translated to the first completion now only taking 5 seconds. I think in big projects optimizing the startup time too is pretty essential and this was a very easy improvement I immediately found using the debugger. The example times are made using DCD ~master with my dsymbol PR on and off inside the source code of serve-d trying to auto complete extension.d:860 This was a very easy to find and improve spot, there might be more low hanging fruit to get the first parse and completion even quicker than 5s. The PR: https://github.com/dlang-community/dsymbol/pull/151
Jul 21 2020
Essential to this were `nothrow` overloads of isFile/isDir in std.file, which I now manually implemented. Having functions for those in phobos would be awesome and clearly would offer huge performance improvements.
Jul 21 2020
On Tuesday, 21 July 2020 at 12:53:10 UTC, WebFreak001 wrote:Essential to this were `nothrow` overloads of isFile/isDir in std.file, which I now manually implemented. Having functions for those in phobos would be awesome and clearly would offer huge performance improvements.Was the change to nothrow the only alteration? I ask because I thought that a move to nothrow only wins big if it enables inlining within a hot loop. Perhaps your manual implementation wins in other ways?
Jul 21 2020
On Tue, Jul 21, 2020 at 04:46:23PM +0000, Bruce Carneal via Digitalmars-d wrote:On Tuesday, 21 July 2020 at 12:53:10 UTC, WebFreak001 wrote:[...] I'm skeptical of "huge performance improvements" with nothrow in anything that involves disk I/O. An I/O roundtrip far outweighs whatever meager savings you may have won with nothrow. I have a hard time conceiving of a situation where nothrow would confer significant performance improvements to isFile/isDir. IMO you'd get much better savings by reducing the number of I/O roundtrips instead. T -- Today's society is one of specialization: as you grow, you learn more and more about less and less. Eventually, you know everything about nothing.Essential to this were `nothrow` overloads of isFile/isDir in std.file, which I now manually implemented. Having functions for those in phobos would be awesome and clearly would offer huge performance improvements.Was the change to nothrow the only alteration? I ask because I thought that a move to nothrow only wins big if it enables inlining within a hot loop. Perhaps your manual implementation wins in other ways?
Jul 21 2020
On 7/21/20 1:02 PM, H. S. Teoh wrote:On Tue, Jul 21, 2020 at 04:46:23PM +0000, Bruce Carneal via Digitalmars-d wrote:If you look at the change, it's not the nothrow optimization that Walter always talks about (where code that is marked nothrow performs slightly better than code that isn't marked nothrow but doesn't end up throwing), it's that the COMMON CASE was that an exception is thrown and caught, and then bool is returned. Instead, just return the bool. Also, just because it involves file info, doesn't mean it's doing disk i/o for every call. -SteveOn Tuesday, 21 July 2020 at 12:53:10 UTC, WebFreak001 wrote:[...] I'm skeptical of "huge performance improvements" with nothrow in anything that involves disk I/O. An I/O roundtrip far outweighs whatever meager savings you may have won with nothrow. I have a hard time conceiving of a situation where nothrow would confer significant performance improvements to isFile/isDir. IMO you'd get much better savings by reducing the number of I/O roundtrips instead.Essential to this were `nothrow` overloads of isFile/isDir in std.file, which I now manually implemented. Having functions for those in phobos would be awesome and clearly would offer huge performance improvements.Was the change to nothrow the only alteration? I ask because I thought that a move to nothrow only wins big if it enables inlining within a hot loop. Perhaps your manual implementation wins in other ways?
Jul 21 2020
On Tue, Jul 21, 2020 at 01:12:04PM -0400, Steven Schveighoffer via Digitalmars-d wrote:On 7/21/20 1:02 PM, H. S. Teoh wrote:[...][...]I'm skeptical of "huge performance improvements" with nothrow in anything that involves disk I/O.If you look at the change, it's not the nothrow optimization that Walter always talks about (where code that is marked nothrow performs slightly better than code that isn't marked nothrow but doesn't end up throwing), it's that the COMMON CASE was that an exception is thrown and caught, and then bool is returned. Instead, just return the bool.[...] Ahh I see. That makes sense then. Which also begs the question, *why* does it even throw in the first place. The non-existence of a file is a normally-expected outcome of isFile/isDir, throwing in that case seems excessively heavy-handed. It's probably a case of bad API design. T -- Век живи - век учись. А дураком помрёшь.
Jul 21 2020
On 7/21/20 1:32 PM, H. S. Teoh wrote:On Tue, Jul 21, 2020 at 01:12:04PM -0400, Steven Schveighoffer via Digitalmars-d wrote:Agree. Returning false seems appropriate as something that doesn't exist is clearly not a file or directory. -SteveOn 7/21/20 1:02 PM, H. S. Teoh wrote:[...][...]I'm skeptical of "huge performance improvements" with nothrow in anything that involves disk I/O.If you look at the change, it's not the nothrow optimization that Walter always talks about (where code that is marked nothrow performs slightly better than code that isn't marked nothrow but doesn't end up throwing), it's that the COMMON CASE was that an exception is thrown and caught, and then bool is returned. Instead, just return the bool.[...] Ahh I see. That makes sense then. Which also begs the question, *why* does it even throw in the first place. The non-existence of a file is a normally-expected outcome of isFile/isDir, throwing in that case seems excessively heavy-handed. It's probably a case of bad API design.
Jul 21 2020
On Tuesday, 21 July 2020 at 17:32:39 UTC, H. S. Teoh wrote:On Tue, Jul 21, 2020 at 01:12:04PM -0400, Steven Schveighoffer via Digitalmars-d wrote:I would say a work in progress[...][...][...][...][...][...] Ahh I see. That makes sense then. Which also begs the question, *why* does it even throw in the first place. The non-existence of a file is a normally-expected outcome of isFile/isDir, throwing in that case seems excessively heavy-handed. It's probably a case of bad API design.T
Jul 21 2020
On Tue, Jul 21, 2020 at 05:59:53PM +0000, aberba via Digitalmars-d wrote:On Tuesday, 21 July 2020 at 17:32:39 UTC, H. S. Teoh wrote:[...][...] Hardly. AFAICT the API has been like this for many years now. If it's WIP then "progress" has come to a standstill and we need to do something about it. And as it turns out, there's a bug filed for it: https://issues.dlang.org/show_bug.cgi?id=10240 I am of the opinion that isDir/isFile should not throw if the file doesn't exist. That kind of ivory-tower-ideal API is just excessively heavy-handed and not useful in practice. T -- EMACS = Extremely Massive And Cumbersome SystemWhich also begs the question, *why* does it even throw in the first place. The non-existence of a file is a normally-expected outcome of isFile/isDir, throwing in that case seems excessively heavy-handed. It's probably a case of bad API design.I would say a work in progress
Jul 21 2020
On 2020-07-21 19:32, H. S. Teoh wrote:Ahh I see. That makes sense then. Which also begs the question, *why* does it even throw in the first place. The non-existence of a file is a normally-expected outcome of isFile/isDir, throwing in that case seems excessively heavy-handed.If it shouldn't throw, how should it deal with other IO errors than non-existing file/dir? What if it exists but you don't have permission? How do you tell the difference between the following conditions when it only returns a bool and doesn't throw: * Exists and is a file // success * Exists and is a file but you don't have permission // error * Does not exist // error There are probably other error conditions as well.It's probably a case of bad API design.No, exceptions is the main error reporting system in D. Using any other system should have really good reasons to not use exceptions. In my opinion, any other system (that I can come up with) results in bad and cumbersome APIs. Returning error codes: Cons: * Steals the return channel * Forces to use out parameters * By default errors are not checked (the compiler won't force you do look at the return value) * Cumbersome to check errors * Cumbersome to propagate errors * Error handling code and regular code uses the same syntax Pros: * Better performance (some argue this is not true) * Does not require any runtime support Returning an optional: Cons: * If an error occurred you only know that, not what error occurred * Cumbersome to check errors without language support * Cumbersome to propagate errors without language support Pros: * Error checking is better then error codes because you will be forced to check the error to get to the returned value * The return channel can still be used * No need to use out parameters Returning a result object: * More or less the same as "optional" but with the advantage of the error that occurred is available * Another advantage over error codes is that it's usually possible to use an object for the error and not just an int Cocoa error reporting: This model returns a bool to indicate error or not. It "returns" an error object through an out parameter. Similar to returning a result object, but since the out value and the error are separate the is no enforcement to check the error before accessing the out value D Exceptions: Pros: * Leaves the return channel open for what it is intended for, returning values * Not necessary to use out parameters * Errors are handled one way or another. Either by the developer or by the runtime * Easy to propagate errors * Does not clutter regular code with error handling code * Standardized error system Cons: * Slow * Requires runtime support * Any function not marked with nothrow may throw In my opinion the best of solution would be to use the "result object" style of error reporting with syntax sugar supported in the language. This is what's proposed for C++ [1]. The advantage is that we can use the existing syntax which are used for exceptions. In D, it could look like this: enum CopyError { permissionDenied } void copy(string src, string dest) throw(CopyError) { throw CopyError.permissionDenied; } void main() { try copy("foo", "bar"); catch (CopyError e) writeln(e); } Which would be lowered to something like the equivalent of the following code: struct Result(Value, Error) { bool isValue; union { Value value; Error error; } this(Value value) { this.value = value; isValue = true; } this(Error errro) { this.error = error; isValue = true; } } Result!(void, CopyError) copy(string src, string dest) { return Result!(void, CopyError)(CopyError.permissionDenied); } void main() { auto result = copy("foo", "bar"); if (!result.isValue) goto L1; L1: writeln(result.error); } [1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0709r0.pdf -- /Jacob Carlborg
Jul 22 2020
On Wed, Jul 22, 2020 at 01:28:56PM +0200, Jacob Carlborg via Digitalmars-d wrote:On 2020-07-21 19:32, H. S. Teoh wrote:A non-existent file/dir is an expected outcome; an I/O error (e.g. disk corruption) is not. The former should be a normal return value; the latter should be an exception. Similarly, if a program is testing whether some pathname is a file, the expectation is that it has the permission to access that path; so a permission error is likewise an exception.Ahh I see. That makes sense then. Which also begs the question, *why* does it even throw in the first place. The non-existence of a file is a normally-expected outcome of isFile/isDir, throwing in that case seems excessively heavy-handed.If it shouldn't throw, how should it deal with other IO errors than non-existing file/dir? What if it exists but you don't have permission?How do you tell the difference between the following conditions when it only returns a bool and doesn't throw:* Exists and is a file // success * Exists and is a file but you don't have permission // error * Does not exist // errorI didn't say it should never throw, just that it should not throw merely because the file doesn't exist. The typical usage of isFile/isDir is that if a given pathname exists and is a file/dir, then perform some operations on it. In such scenarios, the expectation is that if the file doesn't exist, isFile/isDir should simply return false (it doesn't exist, so it follows logically that it's not a file/dir). If an I/O error should occur, however, then an exception should still be thrown (something is wrong with the filesystem, or the program doesn't have the permissions it thought it had, which is not an expected outcome). The distinction between expected outcome and unexpected outcome is important here. You wouldn't want a function that tests whether some number is negative to throw an exception when it's negative, after all, since that's a normally-expected outcome. But if an memory error was encountered when performing that test, then you *would* want to throw an exception instead of returning true or false, because it's an unexpected outcome, signalling that something is potentially very wrong with the system, so normal computation should not continue. T -- Государство делает вид, что платит нам зарплату, а мы делаем вид, что работаем.
Aug 01 2020
On Tue, Jul 21, 2020 at 10:32:39AM -0700, H. S. Teoh via Digitalmars-d wrote: [...]Which also begs the question, *why* does it even throw in the first place. The non-existence of a file is a normally-expected outcome of isFile/isDir, throwing in that case seems excessively heavy-handed. It's probably a case of bad API design.[...] There is the workaround of writing: if (f.exists && f.isDir) ... but it's ugly and involves an extra unnecessary syscall roundtrip. Basically, this kind of API design is just hideous. I propose we remove the throw from isDir/isFile. T -- Never step over a puddle, always step around it. Chances are that whatever made it is still dripping.
Jul 21 2020
On Tuesday, 21 July 2020 at 19:03:05 UTC, H. S. Teoh wrote:On Tue, Jul 21, 2020 at 10:32:39AM -0700, H. S. Teoh via Digitalmars-d wrote: [...]don't do this, this introduces a race condition (and is the reason why this was a function using try-catch in the first place)Which also begs the question, *why* does it even throw in the first place. The non-existence of a file is a normally-expected outcome of isFile/isDir, throwing in that case seems excessively heavy-handed. It's probably a case of bad API design.[...] There is the workaround of writing: if (f.exists && f.isDir) ... but it's ugly and involves an extra unnecessary syscall roundtrip. Basically, this kind of API design is just hideous. I propose we remove the throw from isDir/isFile. T
Jul 21 2020
On Tue, Jul 21, 2020 at 07:24:19PM +0000, WebFreak001 via Digitalmars-d wrote:On Tuesday, 21 July 2020 at 19:03:05 UTC, H. S. Teoh wrote:[...][...]There is the workaround of writing: if (f.exists && f.isDir) ... but it's ugly and involves an extra unnecessary syscall roundtrip. Basically, this kind of API design is just hideous. I propose we remove the throw from isDir/isFile.don't do this, this introduces a race condition (and is the reason why this was a function using try-catch in the first place)Yet another reason isDir should not throw! T -- Never ascribe to malice that which is adequately explained by incompetence. -- Napoleon Bonaparte
Jul 21 2020
On 7/21/20 3:03 PM, H. S. Teoh wrote:On Tue, Jul 21, 2020 at 10:32:39AM -0700, H. S. Teoh via Digitalmars-d wrote: [...]We'd probably need a different name. Unfortunately. -SteveWhich also begs the question, *why* does it even throw in the first place. The non-existence of a file is a normally-expected outcome of isFile/isDir, throwing in that case seems excessively heavy-handed. It's probably a case of bad API design.[...] There is the workaround of writing: if (f.exists && f.isDir) ... but it's ugly and involves an extra unnecessary syscall roundtrip. Basically, this kind of API design is just hideous. I propose we remove the throw from isDir/isFile.
Jul 21 2020
On Tue, Jul 21, 2020 at 03:25:21PM -0400, Steven Schveighoffer via Digitalmars-d wrote:On 7/21/20 3:03 PM, H. S. Teoh wrote:[...][...] Ugh. :-( T -- I am Ohm of Borg. Resistance is voltage over current.Basically, this kind of API design is just hideous. I propose we remove the throw from isDir/isFile.We'd probably need a different name. Unfortunately.
Jul 21 2020
On Tuesday, 21 July 2020 at 16:46:23 UTC, Bruce Carneal wrote:On Tuesday, 21 July 2020 at 12:53:10 UTC, WebFreak001 wrote:before code was like this: try return isFile(file); catch (FileException) return false; now it's like this: uint attr; bool exists = getAttrs(file, &attr); return exists && attrIsFile(attr); so the useless throwing was released which was the main alteration. I further optimized the logic of the caller (removed useless array pushing, added early returns and reduced FS calls) but that was probably relatively small. I only really checked in the debugger how fast the loop was roughly, where it was before like 40-60ms and only after the nothrow change it was <1ms. So the nothrow change definitely made most of this, considering this was pretty much hot code (called for every import, recursively into all D files imported + calling FS calls like 3 times for every import path where I had 60 import paths)Essential to this were `nothrow` overloads of isFile/isDir in std.file, which I now manually implemented. Having functions for those in phobos would be awesome and clearly would offer huge performance improvements.Was the change to nothrow the only alteration? I ask because I thought that a move to nothrow only wins big if it enables inlining within a hot loop. Perhaps your manual implementation wins in other ways?
Jul 21 2020
On Tuesday, 21 July 2020 at 12:51:50 UTC, WebFreak001 wrote:So it turns out the `resolveImportLocation` function, which is called on the first semantic pass, currently on Windows takes like 50ms to complete once for just about 60 import paths. In a big project like serve-d (as source code project) this would sum up to over 7 minutes. (in release mode just around 2 minutes) [...]Nice. DCD improvements bubbles down to many dev tools.
Jul 21 2020