
digitalmars.D - code.dlang.org is down

Bastiaan Veelo <Bastiaan Veelo.net> writes:
code.dlang.org has not been responding all day.

https://www.isitdownrightnow.com/code.dlang.org.html

-- Bastiaan.
Apr 10
Sergey <kornburn yandex.ru> writes:
On Thursday, 10 April 2025 at 18:22:40 UTC, Bastiaan Veelo wrote:
 code.dlang.org has not been responding all day.

 https://www.isitdownrightnow.com/code.dlang.org.html

 -- Bastiaan.
All credit to Elias: maybe not all day, but it has some issues: https://updown.io/9st4
Apr 10
matheus <matheus gmail.com> writes:
On Thursday, 10 April 2025 at 18:22:40 UTC, Bastiaan Veelo wrote:
 code.dlang.org has not been responding all day.

 https://www.isitdownrightnow.com/code.dlang.org.html

 -- Bastiaan.
This too: https://run.dlang.io/

Matheus.
Apr 10
Sönke Ludwig <sludwig outerproduct.org> writes:
On 10.04.2025 at 20:22, Bastiaan Veelo wrote:
 code.dlang.org has not been responding all day.
 
 https://www.isitdownrightnow.com/code.dlang.org.html
 
 -- Bastiaan.
I'm not sure what exactly causes it, but the process is in a state where it's mostly busy with garbage collection. It seems like there is maybe a self-reinforcing effect, where incoming connections that time out lead to higher GC pressure.

The other thing is that there appear to be some very aggressive crawlers that go through the whole site at maximum speed with what looks like 8 parallel requests. Maybe it's possible to get that under control through the Cloudflare frontend? Of course they are using a Safari user agent string instead of something that would identify them as bots.

Finally, the fallback server logic for dub doesn't appear to work correctly anymore - at least for me it hangs more or less indefinitely instead of falling back to codemirror.dlang.org.

I don't have a lot of time to look into this right now, but I'll see if I can do something. It would be good if someone with Cloudflare access could look into a possible mitigation there.
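As a rough illustration of what an application-level brake could look like - completely untested, with arbitrary placeholder limits, and no substitute for a proper mitigation in the Cloudflare frontend - a vibe.d handler could check a per-address request budget before doing any real work:

    import vibe.vibe;
    import core.time : seconds;
    import std.datetime.systime : Clock, SysTime;

    // Very naive sliding-window throttle, keyed by client address.
    // The budget (80 requests per 10 s) is an arbitrary placeholder,
    // and the global AA assumes the default single-threaded event loop.
    struct Window { SysTime start; int count; }
    Window[string] windows;

    bool allowRequest(HTTPServerRequest req, HTTPServerResponse res)
    {
        auto ip = req.clientAddress.toAddressString();
        auto now = Clock.currTime;
        auto w = ip in windows;
        if (w is null || now - w.start > 10.seconds) {
            windows[ip] = Window(now, 1);
            return true;
        }
        if (++w.count > 80) {
            res.statusCode = 429; // Too Many Requests
            res.writeBody("rate limit exceeded\n");
            return false;
        }
        return true;
    }

Handlers would call allowRequest() first and return early when it fails; behind Cloudflare, the real client address would additionally have to be taken from the forwarding headers.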
Apr 11
Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Friday, 11 April 2025 at 09:06:01 UTC, Sönke Ludwig wrote:
 Am 10.04.2025 um 20:22 schrieb Bastiaan Veelo:
 [...]
I'm not sure what exactly causes it, but the process is in a state where it's mostly busy with garbage collection. It seems like there is maybe a self-reinforcing effect, where incoming connections that time out lead to higher GC pressure. [...]
https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries
Apr 11
Sönke Ludwig <sludwig outerproduct.org> writes:
On 11.04.2025 at 11:06, Sönke Ludwig wrote:
 On 10.04.2025 at 20:22, Bastiaan Veelo wrote:
 code.dlang.org has not been responding all day.

 https://www.isitdownrightnow.com/code.dlang.org.html

 -- Bastiaan.
I'm not sure what exactly causes it, but the process is in a state where it's mostly busy with garbage collection. [...]
It's a little bit better now with one source of GC allocations temporarily eliminated. As a workaround, you can manually configure codemirror.dlang.org to take precedence by putting this in ~/.dub/settings.json:

{
    "registryUrls": ["https://codemirror.dlang.org/"]
}
Apr 11
Luna <luna foxgirls.gay> writes:
On Friday, 11 April 2025 at 15:19:18 UTC, Sönke Ludwig wrote:
 Am 11.04.2025 um 11:06 schrieb Sönke Ludwig:
 Am 10.04.2025 um 20:22 schrieb Bastiaan Veelo:
 code.dlang.org has not been responding all day.

 https://www.isitdownrightnow.com/code.dlang.org.html

 -- Bastiaan.
I'm not sure what exactly causes it, but the process is in a state where it's mostly busy with garbage collection. [...] The other thing is that there appear to be some very aggressive crawlers that go through the whole site at maximum speed with what looks like 8 parallel requests. [...]
It's a little bit better now with one source of GC allocations temporarily eliminated. As a workaround, you can manually configure codemirror.dlang.org to take precedence by putting this in ~/.dub/settings.json:

{
    "registryUrls": ["https://codemirror.dlang.org/"]
}
Anubis would also be an option, at least for the site frontend. https://anubis.techaro.lol/
May 10
Sönke Ludwig <sludwig outerproduct.org> writes:
On 10.05.2025 at 14:20, Luna wrote:
 On Friday, 11 April 2025 at 15:19:18 UTC, Sönke Ludwig wrote:
 (...)

 It's a little bit better now with one source of GC allocations 
 temporarily eliminated. As a workaround, you can manually configure 
 codemirror.dlang.org to take precedence by putting this in
 ~/.dub/settings.json:

 {
     "registryUrls": ["https://codemirror.dlang.org/"]
 }
Anubis would also be an option, at least for the site frontend. https://anubis.techaro.lol/
The problem is that the frontend is Cloudflare now, so I think the only solution would be to use their own bot labyrinth functionality. I don't have access to the Cloudflare account, though.
May 10
kdevel <kdevel vogtner.de> writes:
I would like to ask you if you thought about some of the following
measures (not necessarily in their order of appearance):

On Friday, 11 April 2025 at 09:06:01 UTC, Sönke Ludwig wrote:
 [...]
 I'm not sure what exactly causes it, but the process is in a 
 state where it's mostly busy with garbage collection. It seems 
 like there is maybe a self-reinforcing effect, where incoming 
 connections that time out lead to higher GC pressure.
- Turn off the GC.
- Run vibe.d under a hard memory/CPU limit (which terminates the process if exceeded).
- Deploy a monitoring process which checks if vibe.d responds in, say, 20 ms. If not, kill vibe.d with SIGKILL (a rough sketch of this idea follows below).
- flock with LOCK_EX on an open fd fopen()ed on the vibe.d binary right before vibe.d issues the bind call.
- Have a second vibe.d process running, blocking on its flock call (= auto restart).
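A rough sketch of the watchdog idea (the PID file path, probe URL and intervals are made-up placeholders; a real deployment would rather lean on the service manager):

    import std.net.curl : HTTP, CurlException;
    import std.file : readText;
    import std.string : strip;
    import std.conv : to;
    import core.time : msecs;
    import core.thread : Thread;
    import core.sys.posix.signal : kill, SIGKILL;

    void main()
    {
        enum pidFile = "/run/dub-registry.pid";   // hypothetical location
        enum probeUrl = "http://127.0.0.1:8080/"; // hypothetical local port

        while (true) {
            auto http = HTTP(probeUrl);
            http.connectTimeout = 500.msecs;
            http.operationTimeout = 1000.msecs;   // placeholder threshold
            http.method = HTTP.Method.head;
            try http.perform();
            catch (CurlException) {
                // Unresponsive: kill it and let the supervisor restart it.
                kill(readText(pidFile).strip.to!int, SIGKILL);
            }
            Thread.sleep(10_000.msecs);
        }
    }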
 The other thing is that there appear to be some very aggressive 
 crawlers that go through the whole site at maximum speed with 
 what looks like 8 parallel requests.
To me it seems that code.dlang.org is mostly not a web app but a web site with static content. Have you been thinking of serving that static content with apache/nginx using rate limiting (mod_throttle etc.)? Or putting this content directly to CF who is also a web hoster?
May 12
Sönke Ludwig <sludwig outerproduct.org> writes:
On 12.05.2025 at 13:49, kdevel wrote:
 I would like to ask you if you thought about some of the following
 measures (not necessarily in their order of appearance):
 
 On Friday, 11 April 2025 at 09:06:01 UTC, Sönke Ludwig wrote:
 [...]
 I'm not sure what exactly causes it, but the process is in a state 
 where it's mostly busy with garbage collection. It seems like there is 
 maybe a self-reinforcing effect, where incoming connections that time 
 out lead to higher GC pressure.
- Turn off the GC.
In the situation that led to the issues, that would blow up memory usage within a very short amount of time, replacing the bad responsiveness with a non-responsive system, or ending with the OOM killer terminating some process.
 
 - Run vibe.d under a hard memory/CPU limit (which terminates
    the process if exceeded).
The problem here was not hardware resources, but the fact that the GC was stopping the process for a large amount of time, as well as allocation overhead. The concurrent GC might improve this, but the question is whether that would then lead to excessive memory usage.
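For what it's worth, the concurrent (forking) collector can be switched on through DRuntime's GC options without code changes; whether it actually helps here is untested, and the binary name below is only an example:

    // Baked into the application as a default ...
    extern(C) __gshared string[] rt_options = [ "gcopt=fork:1" ];

    // ... or passed at startup without rebuilding:
    // ./dub-registry --DRT-gcopt=fork:1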
 
 - Deploy a monitoring process which checks if vibe.d responds
    in say 20 ms. If not kill vibe.d with SIGKILL.
There is a certain startup overhead and there are some requests that can take longer (registry dump being the longest, but also requesting information about a dependency graph, which is done by dub). I think this should really only be a last-resort approach (e.g. response time > 5s), because it replaces bad response times with failed requests.
 - flock with LOCK_EX on an open fd fopen()ed on the vibe.d binary
    right before vibe.d issues the bind call.
 
 - Have a second vibe.d process running, blocking on its flock call
    (= auto restart).
This would still mean that active connections will fail, which is not ideal in a situation where the restart would be frequently necessary.
 
 The other thing is that there appear to be some very aggressive 
 crawlers that go through the whole site at maximum speed with what 
 looks like 8 parallel requests.
To me it seems that code.dlang.org is mostly not a web app but a web site with static content. Have you been thinking of serving that static content with apache/nginx using rate limiting (mod_throttle etc.)? Or putting this content directly to CF who is also a web hoster?
This is not really true - the truly static content has been moved to dub.pm a while ago and the rest is made up of dynamic views on the package database. Of course it would be possible to cache pages, but that wouldn't help against crawlers.

Writing out all package and package list pages and then serving them as static content would result in a huge amount of files and occupied memory and would be time consuming. This would only really make sense when the number of pages would get reduced massively (e.g. no per-version package pages, limiting the number of result pages for the popular/new/updated package lists).

Optimizing the memory allocation patterns I think is the most efficient approach to improve the situation in the short term. Redesigning the package update process so that it runs in a separate process that communicates with one or more web frontends would enable scaling and load balancing, but would be more involved.
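For reference, the kind of per-path page cache meant above might look roughly like the following, where renderPackagePage stands in for the real view code and the TTL is a placeholder; as said, it mostly helps with repeated hits, not with a crawler that visits each page exactly once:

    import vibe.vibe;
    import core.time : minutes;
    import std.datetime.systime : Clock, SysTime;

    // Cache rendered pages by request path so that repeated hits reuse
    // one rendering instead of allocating a fresh one per request.
    struct CachedPage { string html; SysTime when; }
    CachedPage[string] pageCache;

    // Hypothetical stand-in for the real view rendering.
    string renderPackagePage(string path) { return "<html>...</html>"; }

    void servePackagePage(HTTPServerRequest req, HTTPServerResponse res)
    {
        auto hit = req.path in pageCache;
        if (hit && Clock.currTime - hit.when < 5.minutes) {
            res.writeBody(hit.html, "text/html; charset=UTF-8");
            return;
        }
        auto html = renderPackagePage(req.path);
        pageCache[req.path] = CachedPage(html, Clock.currTime);
        res.writeBody(html, "text/html; charset=UTF-8");
    }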
May 12
Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Monday, 12 May 2025 at 12:07:50 UTC, Sönke Ludwig wrote:

 This is not really true, the truly static content has been 
 moved to dub.pm a while ago and the rest is made up of dynamic 
 views on the package database. Of course it would be possible 
 to cache pages, but that wouldn't help against crawlers.
By the way, I've found this guy fighting back specifically against aggressive AI crawlers with a bizarre technique: gzip bombs.

https://idiallo.com/blog/zipbomb-protection

/Paolo
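The idea from the linked post, sketched very roughly in vibe.d (sizes are placeholders, and this should only ever be sent to clients that have already been classified as abusive):

    import vibe.vibe;
    import std.zlib : Compress, HeaderFormat;

    // Pre-build a small gzip payload that inflates to ~10 MB of zeros.
    const(ubyte)[] buildBomb(size_t inflatedSize = 10_000_000)
    {
        auto c = new Compress(9, HeaderFormat.gzip);
        return cast(const(ubyte)[]) (c.compress(new ubyte[inflatedSize]) ~ c.flush());
    }

    void serveBomb(HTTPServerRequest req, HTTPServerResponse res)
    {
        static const(ubyte)[] bomb;
        if (bomb is null) bomb = buildBomb();
        res.headers["Content-Encoding"] = "gzip";
        res.writeBody(bomb, "text/html");
    }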
May 12
kdevel <kdevel vogtner.de> writes:
On Monday, 12 May 2025 at 12:07:50 UTC, Sönke Ludwig wrote:
[...]
 - Turn off the GC.
In the situation that led to the issues, that would blow up memory usage within a very short amount of time, replacing the bad responsiveness with a non-responsive system, or ending with the OOM killer terminating some process.
Can those processes be factored out of the vibe.d process?
 - Run vibe.d under a hard memory/CPU limit (which terminates
    the process if exceeded).
The problem here was not hardware resources, but the fact that the GC was stopping the process for a large amount of time, as well as allocation overhead.
Why does a/the (CGI) process have to live longer than the socket on which the response is sent to the client? The cleanup can easily be done by the kernel.
 [...]
 
 - Deploy a monitoring process which checks if vibe.d responds
    in say 20 ms. If not kill vibe.d with SIGKILL.
There is a certain startup overhead and there are some requests that can take longer (registry dump being the longest, but also requesting information about a dependency graph, which is done by dub).
Can these time consuming startup tasks be performed before the bind call is issued?
 I think this should really only be a last-resort approach (e.g. 
 response time > 5s), because it replaces bad response times 
 with failed requests.
Does all this concern only code behind the login?
 - flock with LOCK_EX on an open fd fopen()ed on the vibe.d binary
    right before vibe.d issues the bind call.
 
 - Have a second vibe.d process running, blocking on its flock call
    (= auto restart).
This would still mean that active connections will fail, which is not ideal in a situation where the restart would be frequently necessary.
Not necessarily AFAIR but I would have to look that up (socket sharing, hot restart ...).
 [...]
 To me it seems that code.dlang.org is mostly not a web app but a web
 site with static content. Have you been thinking of serving that static
 content with apache/nginx using rate limiting (mod_throttle etc.)? Or
 putting this content directly to CF who is also a web hoster?
This is not really true, the truly static content has been moved to dub.pm
With "static content" I mean data/information that only changes slowly or rarely.
 a while ago and the rest is made up of dynamic views on the 
 package database. Of course it would be possible to cache 
 pages, but that wouldn't help against crawlers.
- Block them with robots.txt by their UA strings?
- Throttle requests per second by UA string/IP/network/AS?
- Require a cookie for accessing the page. That is easy to reinvent. (A sketch of this follows below.)
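A trivial form of the cookie gate in vibe.d might look like this (cookie name and value are placeholders; a real gate would issue a random, signed token and exempt dub's own API endpoints):

    import vibe.vibe;

    // Clients without the marker cookie get a 403 plus a Set-Cookie header;
    // well-behaved browsers succeed on retry, most dumb crawlers don't.
    bool cookieGate(HTTPServerRequest req, HTTPServerResponse res)
    {
        enum cookieName = "dubreg_seen";    // placeholder name
        if (req.cookies.get(cookieName, null) is null) {
            res.setCookie(cookieName, "1"); // placeholder value
            res.statusCode = 403;
            res.writeBody("please retry with cookies enabled\n");
            return false;
        }
        return true;
    }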
 Writing out all package and package list pages and then serving 
 them as static content would result in a huge amount of files 
 and occupied memory and would be time consuming.
If the final pages are assembled in the user agent, it will suffice to put only the "variable" (nonetheless static) content in JSON files on the server, like it is done at http://wiki.c2.com/. BTW: Are we talking about fewer or more than 1 million files?
 This would only really make sense when the number of pages 
 would get reduced massively (e.g. no per-version package pages, 
 limiting the number of result pages for the popular/new/updated 
 package lists).
How many files are changed or created per day? How much data (in MB) is stored on the machine per day?
May 12
Mathias Lang <geod24 gmail.com> writes:
On Monday, 12 May 2025 at 12:59:52 UTC, kdevel wrote:
 [...]
Another radical option: Move away from the registry and use an index instead, like Homebrew, Nix, or Cargo are doing. I have a WIP to do it, which is 2.2k LoC and counting (but almost there): https://github.com/dlang/dub/pull/3023
May 13
Denis Feklushkin <feklushkin.denis gmail.com> writes:
On Tuesday, 13 May 2025 at 11:21:31 UTC, Mathias Lang wrote:
 On Monday, 12 May 2025 at 12:59:52 UTC, kdevel wrote:
 [...]
Another radical option: Move away from the registry and use an index instead, like Homebrew, Nix, or Cargo are doing.
Can you describe this in more detail?
 It allows us to remove a SPOF in our critical infrastructure, 
 as a Github outage would always cause a registry being unusable 
 anyway.
It seems this approach will solidify GitHub as a SPOF for us, because right now we aren't forced to use GitHub.
May 13
parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 14/05/2025 2:13 AM, Denis Feklushkin wrote:
     It allows us to remove a SPOF in our critical infrastructure, as a
     Github outage would always cause a registry being unusable anyway.
 
 It seems this approach will solidify GitHub as a SPOF for us, because
 right now we aren't forced to use GitHub.
If done right, all dub has is an HTTP URL to download the index from. With that capability you can host it anywhere you like - even with a registry.
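A sketch of what that client-side logic could amount to (the mirror URLs are made up):

    import std.net.curl : get, CurlException;

    // Try a list of index mirrors in order until one responds.
    string fetchIndex()
    {
        static immutable mirrors = [
            "https://example.org/dub-index.json",
            "https://mirror.example.net/dub-index.json",
        ];
        foreach (url; mirrors) {
            try return get(url).idup;
            catch (CurlException) { /* try the next mirror */ }
        }
        throw new Exception("no index mirror reachable");
    }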
May 13