digitalmars.D - Codecov and CyberShadow failure

RazvanN <razvan.nitu1305 gmail.com> writes:

I've noticed a couple of days ago that the 2 components mentioned 
in $title aren't working when making PRs. I don't have any 
experience with this, so what is there to be done?

RazvanN

Feb 08 2017

Jack Stouffer <jack jackstouffer.com> writes:

On Wednesday, 8 February 2017 at 17:30:53 UTC, RazvanN wrote:
 I've noticed a couple of days ago that the 2 components 
 mentioned in $title aren't working when making PRs. I don't 
 have any experience with this, so what is there to be done?

 RazvanN

Trying to narrow it down here: 
https://github.com/dlang/phobos/pull/5099

Feb 08 2017

Jack Stouffer <jack jackstouffer.com> writes:

On Wednesday, 8 February 2017 at 21:05:45 UTC, Jack Stouffer 
wrote:
 ...

Still can't find the root cause. I'm also unable to recreate the 
problem locally using the same commands as the doc builder.

We currently have nine PRs in the pipe ready to be merged once 
this error is nailed down. If anyone could lend a hand here, it 
would be very helpful.

Feb 09 2017

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Thursday, 9 February 2017 at 17:41:09 UTC, Jack Stouffer wrote:
 On Wednesday, 8 February 2017 at 21:05:45 UTC, Jack Stouffer 
 wrote:
 ...

 Still can't find the root cause. I'm also unable to recreate 
 the problem locally using the same commands as the doc builder.

 We currently have nine PRs in the pipe ready to be merged once 
 this error is nailed down. If anyone could lend a hand here, it 
 would be very helpful.

Apologies for that. I made the documentation tester mandatory a 
while ago, so extended downtime like this is unacceptable.

In the interest of public disclosure, here is the timeline and 
problems encountered:

- In response to some complaints about forum performance, I 
investigated sources of high I/O on the server, and identified 
the documentation tester as a major culprit. On 2017-02-06, I 
moved the working directory to a tmpfs (/dev/shm), which resulted 
in a dramatic improvement of I/O operations: 
https://dump.thecybershadow.net/d41c095b6a0dcdb7b827499a487b7c65/16%3A42%3A10-upload.png

- I then began receiving reports of the autotester 
malfunctioning. While debugging this problem, I discovered a 
second problem: some files on the tmpfs would periodically 
disappear. This is what caused the intermittent "file not found" 
errors.

- After some trial and error, I identified the source of the 
second problem (an unusual systemd behaviour) and adjusted the 
server configuration on 2017-02-09 to disable that behaviour.

- However, the first problem (which manifested as compilation 
errors in the 2.073.0 version of Phobos) persisted. Finally, 
yesterday (2017-02-11), with some experimentation, I discovered 
that the root cause was a latent DMD bug which manifested only 
when the Phobos source files were passed to it in a certain 
order, which happened to be the file iteration order on tmpfs. 
Details in the pull request: 
https://github.com/dlang/dlang.org/pull/1568

- Now that the PR is merged, master and stable are green again.
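
To illustrate the file-order issue from the timeline above, here is a 
minimal sketch in Python of how a build script can avoid depending on 
filesystem iteration order. It is not the actual change made in the pull 
request. The helper names (find_sources, build_command) and the exact 
compiler flags are assumptions chosen for the example.

```python
import os
import tempfile


def find_sources(root, ext=".d"):
    """Return source files under root in a stable, sorted order.

    os.walk yields entries in whatever order the filesystem reports
    them, which can differ between an on-disk filesystem and tmpfs.
    Sorting removes that dependency.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(ext):
                full = os.path.join(dirpath, name)
                found.append(os.path.relpath(full, root))
    return sorted(found)


def build_command(sources, compiler="dmd"):
    # Hypothetical invocation: -D requests documentation output and
    # -o- suppresses object files. The real doc builder's command
    # line may differ.
    return [compiler, "-D", "-o-"] + list(sources)


if __name__ == "__main__":
    # Create a throwaway tree so the example runs anywhere.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "std"))
        for name in ("traits.d", "algorithm.d", "conv.d"):
            open(os.path.join(root, "std", name), "w").close()
        print(build_command(find_sources(root)))
```

Sorting only makes the order reproducible across machines and 
filesystems. It does not fix a compiler bug that is sensitive to 
argument order, but it keeps such a bug from appearing or disappearing 
when the working directory moves.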

I accept that this shouldn't have taken a week to fix, and the 
initial change in question (tmpfs move) would have been better 
done in a test environment. FWIW, in parallel I've been working 
on a full-disk backup strategy to prepare for having one of the 
server's HDDs replaced. (We already have backups of critical 
data, but rebuilding from backups and reinstalling the system 
would result in downtime that can be avoided. The HDDs are 
already in RAID1 configuration, so the full disk backup is a 
precaution.)

Feb 12 2017

Sebastiaan Koppe <mail skoppe.eu> writes:

On Sunday, 12 February 2017 at 15:30:40 UTC, Vladimir Panteleev 
wrote:
 I accept that this shouldn't have taken a week to fix, and the 
 initial change in question (tmpfs move) would have been better 
 done in a test environment. FWIW, in parallel I've been working 
 on a full-disk backup strategy to prepare for having one of the 
 server's HDDs replaced. (We already have backups of critical 
 data, but rebuilding from backups and reinstalling the system 
 would result in downtime that can be avoided. The HDDs are 
 already in RAID1 configuration, so the full disk backup is a 
 precaution.)

I bet that wasn't easy to find; you must have been tearing your 
hair out while debugging. Also kudos for the disclosure.

Feb 13 2017

Joakim <dlang joakim.fea.st> writes:

On Wednesday, 8 February 2017 at 17:30:53 UTC, RazvanN wrote:
 I've noticed a couple of days ago that the 2 components 
 mentioned in $title aren't working when making PRs. I don't 
 have any experience with this, so what is there to be done?

 RazvanN

CyberShadow is Vladimir Panteleev's nickname:

https://github.com/cybershadow

DAutoTest is his automated documentation tester that isn't 
working.

Feb 08 2017
