
digitalmars.D.learn - NGINX Unit and vibe.d Integration Performance

reply Kyle Ingraham <kyle kyleingraham.com> writes:
Hi there,
I'm looking for help with the performance of an integration I'm 
trying to write between NGINX Unit and D. Here are two minimal 
demos I've put together:
- https://github.com/kyleingraham/unit-d-hello-world (NGINX 
Unit/D)
- https://github.com/kyleingraham/unit-vibed-hello-world (NGINX 
Unit/vibe.d)

The first integration achieves ~43k requests per second on my 
computer. That matches what I've been able to achieve with a 
minimal vibe.d project and is, I believe, the maximum my 
benchmark configuration on macOS can hit.

The second, though, only achieves ~20k requests per second. In that 
demo I try to make vibe.d's concurrency system available during 
request handling. NGINX Unit's event loop is run in its own 
thread. When requests arrive, Unit sends them to the main thread 
for handling on vibe.d's event loop. I've tried a few methods to 
increase performance but none have been successful:
- Batching new-request messages to minimize per-message send 
overhead. This increased latency and didn't improve throughput.
- Using vibe.d channels to pass requests. This achieved the same 
performance as message passing. I wasn't able to use the channel 
config that prioritized minimizing overhead as the API didn't 
jive with my use case.
- Using a lock-free queue 
(https://github.com/MartinNowak/lock-free) between threads with a 
loop in the vibe.d thread that constantly polled for requests. 
This method achieves ~43k requests per second but results in 
atrocious CPU usage (roughly sketched below).
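
For context, the lock-free queue variant boiled down to a drain task along 
these lines (a simplified sketch rather than the demo code; `LockFreeQueue`, 
`Request`, and `handleRequest` are placeholders):

```d
import vibe.core.core : runTask, yield;

void startDrainTask(shared(LockFreeQueue!Request) pending)
{
    runTask({
        for (;;)
        {
            Request req;
            // Spin until a request shows up. Latency stays low (~43k req/s)
            // but the task never blocks, which is where the CPU time goes.
            while (!pending.tryPop(req))
                yield(); // give other vibe.d tasks a turn between polls
            handleRequest(req);
        }
    });
}
```

That keeps up with the raw integration but pegs a core even when idle.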

~20k requests per second seems to be the best I can hit with all 
that I've tried. I know vibe.d can do better so I'm thinking 
there's something I'm missing. In profiling I can see that the 
vibe.d thread spends a third of its time in what seems to be 
event loop management code. Am I seeing the effects of Unit's and 
vibe.d's loops being 'out of sync', i.e. some slack time between a 
message being sent and it being acted upon? Is there a better way 
to integrate NGINX Unit with vibe.d?
Oct 27
next sibling parent reply ryuukk_ <ryuukk.dev gmail.com> writes:
Let's take a moment to appreciate how easy it was for you to use 
nginx unit from D

https://github.com/kyleingraham/unit-d-hello-world/blob/main/source/unit_integration.c

ImportC is great
Oct 27
parent Kyle Ingraham <kyle kyleingraham.com> writes:
On Monday, 28 October 2024 at 05:56:32 UTC, ryuukk_ wrote:
 ImportC is great
It really is. Most of my time setting it up was on getting include and linking flags working. Which is exactly what you’d run into using C from C.
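For reference, the dub settings amounted to something like this (the paths and library name here are illustrative rather than the exact values from the demo; dmd's -P= switch forwards flags to the C preprocessor used by ImportC):

```json
"dflags": ["-P=-I/usr/local/include"],
"lflags": ["-L/usr/local/lib"],
"libs": ["unit"]
```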
Oct 28
prev sibling next sibling parent reply Salih Dincer <salihdb hotmail.com> writes:
On Monday, 28 October 2024 at 01:06:58 UTC, Kyle Ingraham wrote:
  ...

 The second though only achieves ~20k requests per second. In 
 that demo I try to make vibe.d's concurrency system available 
 during request handling. NGINX Unit's event loop is run in its 
 own thread. When requests arrive, Unit sends them to the main 
 thread for handling on vibe.d's event loop. I've tried a few 
 methods to increase performance...
Apparently, vibe.d's event loop is not fully compatible with NGINX Unit's loop, causing performance loss. I wonder if it would be wise to use something like an IntrusiveQueue or a task pool to make them compatible? For example, something like this:

```d
alias IQ = IntrusiveQueue;

struct IntrusiveQueue(T)
{
    import core.atomic;

    private {
        T[] buffer;
        size_t head, tail;
        alias acq = MemoryOrder.acq;
        alias rel = MemoryOrder.rel;
    }
    size_t capacity;

    this(size_t capacity)
    {
        this.capacity = capacity;
        buffer.length = capacity;
    }

    alias push = enqueue;
    bool enqueue(T item)
    {
        auto currTail = tail.atomicLoad!acq;
        auto nextTail = (currTail + 1) % capacity;

        if (nextTail == head.atomicLoad!acq) return false; // queue full

        buffer[currTail] = item;
        atomicStore!rel(tail, nextTail);
        return true;
    }

    alias fetch = dequeue;
    bool dequeue(ref T item)
    {
        auto currHead = head.atomicLoad!acq;
        if (currHead == tail.atomicLoad!acq) return false; // queue empty

        auto nextHead = (currHead + 1) % capacity;
        item = buffer[currHead];
        atomicStore!rel(head, nextHead);
        return true;
    }
}

unittest
{
    enum start = 41;
    auto queue = IQ!int(10);
    queue.push(start);
    queue.push(start + 1);

    int item;
    if (queue.fetch(item)) assert(item == start);
    if (queue.fetch(item)) assert(item == start + 1);
}
```

SDB 79
Oct 28
parent reply Kyle Ingraham <kyle kyleingraham.com> writes:
On Monday, 28 October 2024 at 18:37:18 UTC, Salih Dincer wrote:
 Apparently, vibe.d's event loop is not fully compatible with 
 NGINX Unit's loop, causing performance loss. I wonder if it 
 would be wise to use something like an IntrusiveQueue or task 
 pool to make it compatible? For example, something like this:
 ...
You are right that they aren't compatible. Running them in the same thread was a no-go (which makes sense given they both want to control when code is run). How would you suggest reading from the queue you provided in the vibe.d thread? I tried something similar with [lock-free](https://code.dlang.org/packages/lock-free). It was easy to push into the queue efficiently from Unit's thread, but popping from it in vibe.d's was difficult:

- Polling too little killed performance and polling too often wrecked CPU usage.
- Using message passing reduced performance quite a bit.
- Batching reads was hard because it was tricky balancing performance for single requests with performance for streams of them.
Oct 28
parent reply Salih Dincer <salihdb hotmail.com> writes:
On Monday, 28 October 2024 at 19:57:41 UTC, Kyle Ingraham wrote:
 
 - Polling too little killed performance and too often wrecked 
 CPU usage.
 - Using message passing reduced performance quite a bit.
 - Batching reads was hard because it was tricky balancing 
 performance for single requests with performance for streams of 
 them.
Semaphore?

https://demirten-gitbooks-io.translate.goog/linux-sistem-programlama/content/semaphore/operations.html?_x_tr_sl=tr&_x_tr_tl=en&_x_tr_hl=tr&_x_tr_pto=wapp

SDB 79
Oct 28
next sibling parent Salih Dincer <salihdb hotmail.com> writes:
On Monday, 28 October 2024 at 20:53:32 UTC, Salih Dincer wrote:
 Semaphore?
Please see: https://dlang.org/phobos/core_sync_semaphore.html

SDB 79
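P.S. A bare sketch of the idea (the names are illustrative; note that a plain blocking wait() inside the vibe.d thread would also stall its event loop, so the consumer would need its own thread or a try-wait):

```d
import core.sync.semaphore : Semaphore;

__gshared Semaphore requestsReady;

shared static this() { requestsReady = new Semaphore(0); }

// Producer (Unit thread): after pushing a request onto the queue.
void signalRequest() { requestsReady.notify(); }

// Consumer: blocks until at least one request has been signalled.
void waitForRequest() { requestsReady.wait(); }
```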
Oct 28
prev sibling parent reply Kyle Ingraham <kyle kyleingraham.com> writes:
On Monday, 28 October 2024 at 20:53:32 UTC, Salih Dincer wrote:
 On Monday, 28 October 2024 at 19:57:41 UTC, Kyle Ingraham wrote:
 
 - Polling too little killed performance and too often wrecked 
 CPU usage.
 - Using message passing reduced performance quite a bit.
 - Batching reads was hard because it was tricky balancing 
 performance for single requests with performance for streams 
 of them.
Semaphore? https://demirten-gitbooks-io.translate.goog/linux-sistem-programlama/content/semaphore/operations.html?_x_tr_sl=tr&_x_tr_tl=en&_x_tr_hl=tr&_x_tr_pto=wapp SDB 79
I went back to try using a semaphore and ended up using a mutex, an event, and a lock-free queue. My aim was to limit the number of vibe.d events emitted in order to hopefully limit event loop overhead. It works as follows (a rough sketch follows the list):

- Requests come in on the Unit thread and are added to the lock-free queue.
- The Unit thread tries to obtain the mutex. If it cannot, it assumes request processing is in progress on the vibe.d thread and does not emit an event.
- The vibe.d thread waits on an event. Once one arrives, it obtains the mutex and pulls from the lock-free queue until it is empty.
- Once the queue is empty, the vibe.d thread releases the mutex and waits for another event.

This approach increased requests processed per event emitted/waited on from 1:1 to 10:1. It had no impact on event loop overhead however. The entire program still spends ~50% of its runtime in this function: https://github.com/vibe-d/eventcore/blob/0cdddc475965824f32d32c9e4a1dfa58bd616cc9/source/eventcore/drivers/posix/cfrunloop.d#L38. I'll see if I can get images here of my profiling. I'm sure I'm missing something obvious here.
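Roughly, the scheme looks like this (a condensed sketch, not the demo source; `Request`, `handleRequest`, and the `LockFreeQueue` type are placeholders, and the event is vibe.d's shared ManualEvent from vibe.core.sync):

```d
import core.sync.mutex : Mutex;
import vibe.core.core : runTask;
import vibe.core.sync : createSharedManualEvent;

void startIntegration(shared(LockFreeQueue!Request) queue)
{
    auto draining = new Mutex;
    auto requestsArrived = createSharedManualEvent();

    // vibe.d thread: long-running drain task on the main event loop.
    runTask({
        int seen = requestsArrived.emitCount;
        for (;;)
        {
            seen = requestsArrived.wait(seen); // sleeps until emit()
            draining.lock();
            scope (exit) draining.unlock();

            Request req;
            while (queue.tryPop(req)) // drain everything queued so far
                handleRequest(req);
        }
    });

    // Unit thread: called for every request Unit hands over.
    void onUnitRequest(Request req)
    {
        queue.push(req);
        if (draining.tryLock()) // no drain pass running? wake the task
        {
            draining.unlock();
            requestsArrived.emit();
        }
        // otherwise the in-progress drain pass will pick this request up
    }
}
```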
Oct 31
next sibling parent Kyle Ingraham <kyle kyleingraham.com> writes:
On Thursday, 31 October 2024 at 16:43:09 UTC, Kyle Ingraham wrote:
 This approach increased requests processed per events 
 emitted/waited from 1:1 to 10:1. This had no impact on event 
 loop overhead however. The entire program still spends ~50% of 
 its runtime in this function: 
 https://github.com/vibe-d/eventcore/blob/0cdddc475965824f32d32c9e4a1dfa58bd616cc9/source/eventcore/drivers/posix/cfrunloop.d#L38. I'll see if I can get images here of my profiling. I'm sure I'm missing something obvious here.
I forgot to add that once delays are added, my demonstrator and a program using vibe.d's web framework have similar performance numbers. Adding a 10ms sleep resulted in 600 req/s for my demonstrator and 630 req/s for vibe.d. It's encouraging to see the benefit of vibe.d's concurrency system once delays are added. I'd like to be able to use it without drastically affecting throughput in the no-delay case, however.
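(The delay was just a non-blocking sleep in the handler, roughly like this; the handler shown is illustrative:)

```d
import core.time : msecs;
import vibe.core.core : sleep;

void handleRequest(/* request */)
{
    sleep(10.msecs); // yields the fiber, so other requests keep being served
    // ...write the response as usual...
}
```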
Oct 31
prev sibling parent Kyle Ingraham <kyle kyleingraham.com> writes:
On Thursday, 31 October 2024 at 16:43:09 UTC, Kyle Ingraham wrote:
 ..I'll see if I can get images here of my profiling...
Here are the images as promised:

- A flame graph: https://blog.kyleingraham.com/wp-content/uploads/2024/10/screenshot-2024-10-30-at-11.47.57e280afpm.png
- A call tree: https://blog.kyleingraham.com/wp-content/uploads/2024/10/screenshot-2024-10-30-at-11.53.46e280afpm.png

In the flame graph there are two threads: Main Thread and thread_entryPoint. NGINX Unit runs in thread_entryPoint. vibe.d and my request handling code run in Main Thread. My request handling code is grouped under fiber_entryPoint within Main Thread. vibe.d's code is grouped under 'start' in Main Thread.
Oct 31
prev sibling parent reply Kyle Ingraham <kyle kyleingraham.com> writes:
On Monday, 28 October 2024 at 01:06:58 UTC, Kyle Ingraham wrote:
 I know vibe.d can do better so I'm thinking there's something 
 I'm missing.
Sönke Ludwig solved this for me here: https://github.com/vibe-d/vibe.d/issues/2807#issue-2630501194

The solution was to switch to a configuration for eventcore that uses kqueue directly instead of CFRunLoop. Doing that brought performance back to the stratosphere.

Solution from the GitHub issue:

"You can add an explicit sub configuration to dub.json:

```json
"dependencies": {
	"vibe-d": "~>0.10.1",
	"eventcore": "~>0.9.34"
},
"subConfigurations": {
	"eventcore": "kqueue"
},
```

Or you could pass --override-config=eventcore/kqueue to the dub invocation to try it out temporarily."

I elected to go with the command line flag approach.
Nov 02
parent reply Salih Dincer <salihdb hotmail.com> writes:
On Sunday, 3 November 2024 at 00:42:44 UTC, Kyle Ingraham wrote:
 
 "You can add an explicit sub configuration to dub.json:

 ```json
 "dependencies": {
 	"vibe-d": "~>0.10.1",
 	"eventcore": "~>0.9.34"
 },
 "subConfigurations": {
 	"eventcore": "kqueue"
 },
 ```
 Or you could pass --override-config=eventcore/kqueue to the dub 
 invocation to try it out temporarily."
I'm glad to hear that. In the world of software there is actually no problem that cannot be solved, except for the halting problem :)

SDB 79
Nov 04
parent monkyyy <crazymonkyyy gmail.com> writes:
On Monday, 4 November 2024 at 18:05:25 UTC, Salih Dincer wrote:
 
 I'm glad to hear that. In the world of software, there is 
 actually no problem that cannot be solved; except for the 
 halting problem :)
? I'd argue there are entire families of problems that are unsolvable; the halting problem may just be a root
Nov 04