digitalmars.D.announce - vibe.d-lite v0.1.0 powered by photon
- Dmitry Olshansky (55/55) Sep 18 I have been building Photon[1] scheduler library with the aim to
- Steven Schveighoffer (11/14) Sep 18 I think this is fantastic! This is good evidence you are
- Dmitry Olshansky (9/22) Sep 18 I might need help getting std.concurrency to run with photon,
- Sönke Ludwig (23/30) Sep 19 I guess vibe-stream/-inet/-http just need to be adjusted due to
- Dmitry Olshansky (25/64) Sep 19 I think stream/inet is just updating the deps to be “light”.
- Sönke Ludwig (35/78) Sep 19 Shouldn't it still be possible to set an "interrupted" flag somewhere
- Dmitry Olshansky (23/116) Sep 19 Since vibe-core-light depends on syscalls this would mean
- Richard (Rikki) Andrew Cattermole (4/14) Sep 19 And more importantly you don't pay anywhere near the same number of
- Sönke Ludwig (37/109) Sep 19 So you don't support timeouts when waiting for an event at all?
- Dmitry Olshansky (33/155) Sep 22 Photon's API is the syscall interface. So to wait on an event you
- Sönke Ludwig (45/172) Sep 22 Why can't you then use poll() to for example implement `ManualEvent`
- Dmitry Olshansky (76/250) Sep 23 Yes, recv with timeout is basically poll+recv. The problem is
- Dmitry Olshansky (24/48) Sep 24 That 15% speedup was suspicious, so I looked closer into what I was
- Sönke Ludwig (45/228) Sep 25 I'd probably create an additional event FD per thread used to signal
- Dmitry Olshansky (34/160) Sep 25 poll could be made interruptible w/o any additions it's really a
- Sönke Ludwig (16/173) Sep 26 Yes, I think that should be enough to make the semantics compatible.
- IchorDev (4/16) Sep 21 I'm dying to see some statistics to show which approach is more
- Hipreme (7/12) Sep 19 Congratulations on your amazing work! I also agree with you that
I have been building the Photon[1] scheduler library with the aim of building high performance servers, but since we already have vibe.d and everybody is using it, we might as well try to speed it up. Thus the vibe.d-light idea was born - port the vibe.d framework on top of photon. So in the last couple of weeks I've been porting vibe-core, vibe-stream, vibe-inet and vibe-http to: a) work with Photon instead of eventcore b) speed things up as I go, since I like fast things
The end result is that running bench-http-server from the vibe-http examples I get 1.6M rps with my version vs 243k rps on vanilla, running on 48 rather weak cores. Ofc I want people to try it and see how it works in terms of correctness and speed on more complex projects:
https://code.dlang.org/packages/vibe-d-light
https://github.com/DmitryOlshansky/vibe.d
Though most of the work goes on in the deps:
https://github.com/DmitryOlshansky/vibe-http
https://github.com/DmitryOlshansky/vibe-core
See also Photon, the machinery behind it all:
https://github.com/DmitryOlshansky/photon
Warning - this is likely Linux only at the moment, though I expect MacOS to also work.
Key differences so far:
1. photon powered *-light versions always run multi-threaded, utilizing whatever number of cores you gave them with e.g. taskset, or all of them by default. No need to muck around with setting up multiple event loops, and if you did - don't worry, it'll still do the right thing.
2. There are no Interruptible* mutexes, condvars or anything - photon doesn't support the notion, and code that relies on interrupt needs to be rethought (including some parts of vibe.d itself).
3. UDP is stubbed out, because I do not have many sensible examples utilizing UDP and it felt wrong to port it only to leave it untested. Anyone using UDP with vibe.d is welcome to give me good examples, preferably with multicast.
4. Timers... Photon has timer-related functionality, in particular sleeps, but not quite what vibe.d wants, so at this point timers are stubbed out.
5. Fibers are scheduled roughly to the least loaded cores, so all of LocalThis LocalThat are in fact SharedThis and SharedThat, simplifying the whole thing and making it easier to scale.
6. Processes and process management are stubbed out until I find a way to implement them in Photon.
7. Files work but may block the thread in some cases; this still needs a little bit more support in Photon.
8. Worker threads - there are none at the moment, all is scheduled on the same pool. Practically speaking this should only affect CPU intensive tasks; the rest already avoids blocking on syscalls, so the primary need for workers is nil.
9. Maybe something else I forgot in the midst of it all.
So closing thoughts - is anyone willing to help me iron out the inevitable bugs and improve things beyond this proof of concept? What does the community think of the idea in general?
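For a quick taste, here is a minimal hello-world sketch (untested; it assumes vibe-d-light keeps the stock vibe-http/vibe-core API, which is the goal of the port):
```
import vibe.core.core : runApplication;
import vibe.http.server;

void main()
{
    auto settings = new HTTPServerSettings;
    settings.port = 8080;
    listenHTTP(settings, (req, res) {
        res.writeBody("Hello from vibe.d-lite on photon!");
    });
    // with the photon backend, handler fibers are spread across all cores by default
    runApplication();
}
```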
Sep 18
On Thursday, 18 September 2025 at 16:00:48 UTC, Dmitry Olshansky wrote:
So closing thoughts - is anyone willing to help me iron out the inevitable bugs and improve things beyond this proof of concept? What does the community think of the idea in general?
I think this is fantastic! This is good evidence you are following the right path. And it's a great test for the concept. If there's anything that you run into that might be difficult to solve or understand, please post it! If I get a chance, I'll try it on my vibe-d server, but probably not for production at this point. The RPS isn't high anyways, probably more like 1 request every few minutes. -Steve
Sep 18
On Friday, 19 September 2025 at 01:43:04 UTC, Steven Schveighoffer wrote:
On Thursday, 18 September 2025 at 16:00:48 UTC, Dmitry Olshansky wrote:
I might need help getting std.concurrency to run with photon - vibe.d kind of works with it but I'm missing something. I got the scheduler implemented, forwarding to the right photon condvars and mutexes, but I have no idea about Tid management, i.e. how do I register/set up a fiber for a Tid?
So closing thoughts - is anyone willing to help me iron out the inevitable bugs and improve things beyond this proof of concept? What does the community think of the idea in general?
I think this is fantastic! This is good evidence you are following the right path. And it's a great test for the concept. If there's anything that you run into that might be difficult to solve or understand, please post it!
If I get a chance, I'll try it on my vibe-d server, but probably not for production at this point. The RPS isn't high anyways, probably more like 1 request every few minutes.
Yeah, absolutely not in production! Cannot stress enough, this is all alpha quality.
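For reference, the hook std.concurrency provides for this is `Scheduler.thisInfo`: each fiber carries its own `ThreadInfo` (which owns the mailbox behind `thisTid`), the way `FiberScheduler.InfoFiber` does it. A rough sketch - `photonGo` is a hypothetical stand-in for photon's spawn primitive:
```
import core.sync.condition : Condition;
import core.sync.mutex : Mutex;
import core.thread : Fiber;
import std.concurrency;

// hypothetical: hands the fiber over to photon's least-loaded-core scheduler
void photonGo(Fiber f);

class PhotonScheduler : Scheduler
{
    // each fiber gets its own ThreadInfo, i.e. its own Tid/mailbox
    private static class InfoFiber : Fiber
    {
        ThreadInfo info;
        this(void delegate() op) { super(op); }
    }

    void start(void delegate() op) { op(); }
    void spawn(void delegate() op) { photonGo(new InfoFiber(op)); }
    void yield() nothrow { Fiber.yield(); }

    @property ref ThreadInfo thisInfo() nothrow
    {
        // called from one of our fibers: hand out the fiber's own ThreadInfo,
        // so thisTid/send/receive resolve to the per-fiber mailbox
        if (auto f = cast(InfoFiber) Fiber.getThis())
            return f.info;
        return ThreadInfo.thisInfo; // plain threads keep the thread-level one
    }

    Condition newCondition(Mutex m) nothrow { return new Condition(m); }
}
```
Installing it before the first spawn() is then just `scheduler = new PhotonScheduler;`.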
Sep 18
So in the last couple of weeks I've been porting vibe-core, vibe-stream, vibe-inet and vibe-http to:
I guess vibe-stream/-inet/-http just need to be adjusted due to limitations of vibe-core-lite? Would it make sense to upstream those in some way (`version (Have_vibe_core_lite)` if necessary) to avoid diverging more than necessary? More broadly, it would be interesting how to best organize this in a way that avoids code duplication as much as possible and ensures that the APIs don't deviate (although vibe-core has been very stable).
2. There are no Interruptible* mutexes, condvars or anything - photon doesn't support the notion, and code that relies on interrupt needs to be rethought (including some parts of vibe.d itself).
Is this a fundamental limitation, or could it be implemented in the future? I know interruption/cancellation is generally problematic to get to work across platforms, but interruptible sleep() could at least be implemented by waiting on an event with timeout, and I guess sleep() is the most important candidate to start with.
5. Fibers are scheduled roughly to the least loaded cores, so all of LocalThis LocalThat are in fact SharedThis and SharedThat, simplifying the whole thing and making it easier to scale.
This is okay for `runWorkerTask`, but would be a fundamental deviation from vibe-core's threading model. Having the basic `runTask` schedule fibers on the calling thread is absolutely critical if there is to be any kind of meaningful compatibility with "non-lite" code. In general, considering that TLS is the default in D, and also considering that many libraries are either not thread-safe, or explicitly thread-local, I think it's also the right default to schedule thread-local and only schedule across multiple threads in situations where CPU load is the guiding factor. But being able to get rid of low-level synchronization can also be a big performance win. Anyway, it's great to see this progress, as well as the performance numbers!
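For illustration, the sleep() pattern sketched against vibe-core's existing primitives (a sketch only, assuming `LocalManualEvent.wait(Duration, int)` semantics):
```
import core.time : Duration;
import vibe.core.sync : createManualEvent, LocalManualEvent;

// interruptible sleep as a timed wait on an event: sleeping is waiting for a
// wakeup that normally never comes; interrupt() makes it come early
struct InterruptibleSleep
{
    private LocalManualEvent m_event;

    static InterruptibleSleep create()
    {
        return InterruptibleSleep(createManualEvent());
    }

    void sleep(Duration d) { m_event.wait(d, m_event.emitCount); } // times out or wakes early
    void interrupt() { m_event.emit(); }
}
```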
Sep 19
On Friday, 19 September 2025 at 08:01:35 UTC, Sönke Ludwig wrote:
I think stream/inet is just updating the deps to be "light". Maybe some Interruptible* change. It would be interesting to have vibe-core-light / vibe-core compatibility. Http had some less than minor changes but yes the most changes are in core.
So in the last couple of weeks I've been porting vibe-core, vibe-stream, vibe-inet and vibe-http to:
I guess vibe-stream/-inet/-http just need to be adjusted due to limitations of vibe-core-lite? Would it make sense to upstream those in some way (`version (Have_vibe_core_lite)` if necessary) to avoid diverging more than necessary?
More broadly, it would be interesting how to best organize this in a way that avoids code duplication as much as possible and ensures that the APIs don't deviate (although vibe-core has been very stable).
Agreed.
The limitation is this - photon operates inside of syscall wrappers, those are nothrow so if we get interrupted there is no way to throw anything. Plus this could be deep in some C library, not sure how exception would propagate but likely missing cleanup in the C side.
2. There are no Interruptible* mutexes, condvars or anything - photon doesn't support the notion, and code that relies on interrupt needs to be rethought (including some parts of vibe.d itself).
Is this a fundamental limitation, or could it be implemented in the future?
I know interruption/cancellation is generally problematic to get to work across platforms, but interruptible sleep() could at least be implemented by waiting on an event with timeout, and I guess sleep() is the most important candidate to start with.
Sleep is trivial but also kind of pointless, if you want to interrupt why not wait on the event and trigger that?
I on the other hand imagine that it's not. In the year 2025, not utilizing all available cores is shameful. The fact that I had to dig around to find how vibe.d is supposed to run on multiple cores is telling.
5. Fibers are scheduled roughly to the least loaded cores, so all of LocalThis LocalThat are in fact SharedThis and SharedThat, simplifying the whole thing and making it easier to scale.
This is okay for `runWorkerTask`, but would be a fundamental deviation from vibe-core's threading model. Having the basic `runTask` schedule fibers on the calling thread is absolutely critical if there is to be any kind of meaningful compatibility with "non-lite" code.
In general, considering that TLS is the default in D, and also considering that many libraries are either not thread-safe, or explicitly thread-local, I think it's also the right default to schedule thread-local and only schedule across multiple threads in situations where CPU load is the guiding factor. But being able to get rid of low-level synchronization can also be a big performance win.
Most TLS-using libs would work just fine as long as they are not pretending to be "globals" and the whole program to be single-threaded. Say TLS random has thread-local state but there is no problem with multiple fibers sharing this state, nor any problem that fibers in different threads do not "see" each other's changes to this state.
Anyway, it's great to see this progress, as well as the performance numbers!
Yeah, but I still think there is potential to go faster ;)
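Concretely, the TLS random case (std.random's `rndGen` is the thread-local default generator):
```
import std.random : rndGen, uniform;

// safe from any fiber: fibers on one thread never run concurrently, so they
// can share the thread-local generator; fibers on other threads simply use
// their own thread's generator and never observe this one
int rollDie()
{
    return uniform(1, 7, rndGen);
}
```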
Sep 19
On 19.09.25 at 12:33, Dmitry Olshansky wrote:
Shouldn't it still be possible to set an "interrupted" flag somewhere and let only the vibe-core-lite APIs throw? Low level C functions should of course stay unaffected.
The limitation is this - photon operates inside of syscall wrappers, those are nothrow so if we get interrupted there is no way to throw anything. Plus this could be deep in some C library, not sure how exception would propagate but likely missing cleanup in the C side.
2. There are no Interruptible* mutexes, condvars or anything - photon doesn't support the notion, and code that relies on interrupt needs to be rethought (including some parts of vibe.d itself).
Is this a fundamental limitation, or could it be implemented in the future?
It's more of a timeout pattern that I've seen multiple times, there are certainly multiple (better) alternatives, but if compatibility with existing code is the goal then this would still be important.
I know interruption/cancellation is generally problematic to get to work across platforms, but interruptible sleep() could at least be implemented by waiting on an event with timeout, and I guess sleep() is the most important candidate to start with.
Sleep is trivial but also kind of pointless, if you want to interrupt why not wait on the event and trigger that?
Telling in what way? It's really quite simple, you can use plain D threads as normal, or you can use task pools, either explicitly, or through the default worker task pool using `runWorkerTask` or `runWorkerTaskDist`. (Then there are also higher level concepts, such as async, performInWorker or parallel(Unordered)Map) Not everything is CPU bound and using threads "just because" doesn't make sense either. This is especially true, because of low level race conditions that require special care. D's shared/immutable helps with that, but that also means that your whole application suddenly needs to use shared/immutable when passing data between tasks.
I on the other hand imagine that it's not. In the year 2025, not utilizing all available cores is shameful. The fact that I had to dig around to find how vibe.d is supposed to run on multiple cores is telling.
5. Fibers are scheduled roughly to the least loaded cores, so all of LocalThis LocalThat are in fact SharedThis and SharedThat, simplifying the whole thing and making it easier to scale.
This is okay for `runWorkerTask`, but would be a fundamental deviation from vibe-core's threading model. Having the basic `runTask` schedule fibers on the calling thread is absolutely critical if there is to be any kind of meaningful compatibility with "non-lite" code.
In general, considering that TLS is the default in D, and also considering that many libraries are either not thread-safe, or explicitly thread-local, I think it's also the right default to schedule thread-local and only schedule across multiple threads in situations where CPU load is the guiding factor. But being able to get rid of low-level synchronization can also be a big performance win.
Most TLS-using libs would work just fine as long as they are not pretending to be "globals" and the whole program to be single-threaded. Say TLS random has thread-local state but there is no problem with multiple fibers sharing this state, nor any problem that fibers in different threads do not "see" each other's changes to this state.
But TLS variables are always "globals" in the sense that they outlive the scope that accesses them. A modification in one thread would obviously not be visible in another thread, meaning that you may or may not have a semantic connection when you access such a library sequentially from multiple tasks. And then there are said libraries that are not thread-safe at all, or are bound to the thread where you initialize them. Or handles returned from a library may be bound to the thread that created them. Dealing with all of this just becomes needlessly complicated and error-prone, especially if CPU cycles are not a concern. By robbing the user of control over where a task spawns, you are also forcing synchronization everywhere, which can quickly become more expensive than any benefits you would gain from using multiple threads.
Finally, in the case of web applications, in my opinion the better approach for using multiple CPU cores is *usually* by running multiple *processes* in parallel, as opposed to multiple threads within a single process. Of course, every application is different and there is no one-size-fits-all approach.
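To make the TLS visibility pitfall above concrete, a minimal runnable sketch:
```
import core.thread : Thread;

int libState; // module-level variables are thread-local by default in D

void main()
{
    libState = 42; // "library state" set in the main thread
    auto t = new Thread({
        // a task migrated to another thread sees a fresh copy, not 42
        assert(libState == 0);
    });
    t.start();
    t.join();
}
```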
Sep 19
On Friday, 19 September 2025 at 13:22:48 UTC, Sönke Ludwig wrote:
On 19.09.25 at 12:33, Dmitry Olshansky wrote:
Since vibe-core-light depends on syscalls this would mean creating a separate set of API for vibe-core-light which is not something I'd like to do.
Shouldn't it still be possible to set an "interrupted" flag somewhere and let only the vibe-core-lite APIs throw? Low level C functions should of course stay unaffected.
The limitation is this - photon operates inside of syscall wrappers, those are nothrow so if we get interrupted there is no way to throw anything. Plus this could be deep in some C library, not sure how exception would propagate but likely missing cleanup in the C side.
2. There are no Interruptible* mutexes, condvars or anything - photon doesn't support the notion, and code that relies on interrupt needs to be rethought (including some parts of vibe.d itself).
Is this a fundamental limitation, or could it be implemented in the future?
I guess, again most likely I'd need to create API specifically for vibe. Also that would mean interrupt becomes part of photon but only works when certain APIs are used. This is bad.
It's more of a timeout pattern that I've seen multiple times, there are certainly multiple (better) alternatives, but if compatibility with existing code is the goal then this would still be important.
I know interruption/cancellation is generally problematic to get to work across platforms, but interruptible sleep() could at least be implemented by waiting on an event with timeout, and I guess sleep() is the most important candidate to start with.
Sleep is trivial but also kind of pointless, if you want to interrupt why not wait on the event and trigger that?
That running single threaded is the intended model.
Telling in what way?
I on the other hand imagine that it's not. In the year 2025, not utilizing all available cores is shameful. The fact that I had to dig around to find how vibe.d is supposed to run on multiple cores is telling.
5. Fibers are scheduled roughly to the least loaded cores, so all of LocalThis LocalThat are in fact SharedThis and SharedThat, simplifying the whole thing and making it easier to scale.
This is okay for `runWorkerTask`, but would be a fundamental deviation from vibe-core's threading model. Having the basic `runTask` schedule fibers on the calling thread is absolutely critical if there is to be any kind of meaningful compatibility with "non-lite" code.
It's really quite simple, you can use plain D threads as normal, or you can use task pools, either explicitly, or through the default worker task pool using `runWorkerTask` or `runWorkerTaskDist`. (Then there are also higher level concepts, such as async, performInWorker or parallel(Unordered)Map)
This does little to the most important case - handling requests in parallel. Yeah there are pools and such for cases where going parallel inside of a single request makes sense.
Not everything is CPU bound and using threads "just because" doesn't make sense either. This is especially true, because of low level race conditions that require special care. D's shared/immutable helps with that, but that also means that your whole application suddenly needs to use shared/immutable when passing data between tasks.
I'm dying to know which application not being cpu bound still needs to pass data between tasks that are all running on a single thread.
TLS is fine for using a not-thread-safe library - just make sure you initialize it for all threads.
I do not switch or otherwise play dirty tricks with TLS.
But TLS variables are always "globals" in the sense that they outlive the scope that accesses them. A modification in one thread would obviously not be visible in another thread, meaning that you may or may not have a semantic connection when you access such a library sequentially from multiple tasks. And then there are said libraries that are not thread-safe at all, or are bound to the thread where you initialize them. Or handles returned from a library may be bound to the thread that created them. Dealing with all of this just becomes needlessly complicated and error-prone, especially if CPU cycles are not a concern.
In general, considering that TLS is the default in D, and also considering that many libraries are either not thread-safe, or explicitly thread-local, I think it's also the right default to schedule thread-local and only schedule across multiple threads in situations where CPU load is the guiding factor. But being able to get rid of low-level synchronization can also be a big performance win.
Most TLS-using libs would work just fine as long as they are not pretending to be "globals" and the whole program to be single-threaded. Say TLS random has thread-local state but there is no problem with multiple fibers sharing this state, nor any problem that fibers in different threads do not "see" each other's changes to this state.
By robbing the user of control over where a task spawns, you are also forcing synchronization everywhere, which can quickly become more expensive than any benefits you would gain from using multiple threads.
Either default kind of robs the user of control over where the task spawns. Which is sensible - a user shouldn't really care.
Finally, in the case of web applications, in my opinion the better approach for using multiple CPU cores is *usually* by running multiple *processes* in parallel, as opposed to multiple threads within a single process. Of course, every application is different and there is no one-size-fits-all approach.
There we differ, not only load balancing is simpler within a single application but also processes are more expensive. Current D GC situation kind of sucks on multithreaded workloads but that is the only reason to go multiprocess IMHO.
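The "initialize it for all threads" approach, as a lazy per-thread sketch; `LibHandle`/`libOpen` are hypothetical stand-ins for a non-thread-safe C library:
```
struct LibHandle {}                            // hypothetical library state
LibHandle* libOpen() { return new LibHandle; } // hypothetical init call

LibHandle* cached;                             // thread-local by default

// every thread that touches the library initializes its own handle on first use
LibHandle* lib()
{
    if (cached is null)
        cached = libOpen();
    return cached;
}
```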
Sep 19
On 20/09/2025 4:29 AM, Dmitry Olshansky wrote:
Finally, in the case of web applications, in my opinion the better approach for using multiple CPU cores is /usually/ by running multiple /processes/ in parallel, as opposed to multiple threads within a single process. Of course, every application is different and there is no one-size-fits-all approach.
There we differ, not only load balancing is simpler within a single application but also processes are more expensive. Current D GC situation kind of sucks on multithreaded workloads but that is the only reason to go multiprocess IMHO.
And more importantly you don't pay anywhere near the same number of context switches if you can let IOCP/epoll handle scheduling. But alas, that means thread safety, which fibers can't do.
Sep 19
On 19.09.25 at 18:29, Dmitry Olshansky wrote:
So you don't support timeouts when waiting for an event at all? Otherwise I don't see why a separate API would be required, this should be implementable with plain Posix APIs within vibe-core-lite itself.
Shouldn't it still be possible to set an "interrupted" flag somewhere and let only the vibe-core-lite APIs throw? Low level C functions should of course stay unaffected.
Since vibe-core-light depends on syscalls this would mean creating a separate set of API for vibe-core-light which is not something I'd like to do.
It's more of a timeout pattern that I've seen multiple times, there are certainly multiple (better) alternatives, but if compatibility with existing code is the goal then this would still be important.
I guess, again most likely I'd need to create API specifically for vibe. Also that would mean interrupt becomes part of photon but only works when certain APIs are used. This is bad.
Obviously this is wrong, though.
That running single threaded is the intended model.
I on the other hand imagine that it's not. In the year 2025, not utilizing all available cores is shameful. The fact that I had to dig around to find how vibe.d is supposed to run on multiple cores is telling.
Telling in what way?
```
runWorkerTaskDist({
    HTTPServerSettings settings;
    settings.options |= HTTPServerOption.reusePort;
    listenHTTP(settings);
});
```
It's really quite simple, you can use plain D threads as normal, or you can use task pools, either explicitly, or through the default worker task pool using `runWorkerTask` or `runWorkerTaskDist`. (Then there are also higher level concepts, such as async, performInWorker or parallel(Unordered)Map)
This does little to the most important case - handling requests in parallel. Yeah there are pools and such for cases where going parallel inside of a single request makes sense.
Anything client side involving a user interface has plenty of opportunities for employing secondary tasks or long-running sparsely updated state logic that are not CPU bound. Most of the time is spent idle there. Specific computations on the other hand can of course still be handed off to other threads.
Not everything is CPU bound and using threads "just because" doesn't make sense either. This is especially true, because of low level race conditions that require special care. D's shared/immutable helps with that, but that also means that your whole application suddenly needs to use shared/immutable when passing data between tasks.
I'm dying to know which application not being cpu bound still needs to pass data between tasks that are all running on a single thread.
The problem is that for example you might have a handle that was created in thread A and is not valid in thread B, or you set a state in thread A and thread B doesn't see that state. This would mean that you are limited to a single task for the complete library interaction.
But TLS variables are always "globals" in the sense that they outlive the scope that accesses them. A modification in one thread would obviously not be visible in another thread, meaning that you may or may not have a semantic connection when you access such a library sequentially from multiple tasks. And then there are said libraries that are not thread-safe at all, or are bound to the thread where you initialize them. Or handles returned from a library may be bound to the thread that created them.
Dealing with all of this just becomes needlessly complicated and error-prone, especially if CPU cycles are not a concern.
TLS is fine for using a not-thread-safe library - just make sure you initialize it for all threads. I do not switch or otherwise play dirty tricks with TLS.
This doesn't make sense, in the original vibe-core, you can simply choose between spawning in the same thread or in "any" thread. `shared`/`immutable` is correctly enforced in the latter case to avoid unintended data sharing.
By robbing the user of control over where a task spawns, you are also forcing synchronization everywhere, which can quickly become more expensive than any benefits you would gain from using multiple threads.
Either default kind of robs the user of control over where the task spawns. Which is sensible - a user shouldn't really care.
The GC/malloc is the main reason why this is mostly false in practice, but it extends to any central contention source within the process - yes, often you can avoid that, but often that takes a lot of extra work and processes sidestep that issue in the first place. Also, in the usual case where the threads don't have to communicate with each other (apart from memory allocation synchronization), a separate process per core isn't any slower - except maybe when hyper-threading is in play, but whether that helps or hurts performance always depends on the concrete workload. Separate processes also have the advantage of being more robust and enabling seamless restarts and updates of the executable. And they facilitate an application design that lends itself to scaling across multiple machines.
Finally, in the case of web applications, in my opinion the better approach for using multiple CPU cores is *usually* by running multiple *processes* in parallel, as opposed to multiple threads within a single process. Of course, every application is different and there is no one-size-fits-all approach.
There we differ, not only load balancing is simpler within a single application but also processes are more expensive. Current D GC situation kind of sucks on multithreaded workloads but that is the only reason to go multiprocess IMHO.
Sep 19
On Friday, 19 September 2025 at 17:37:36 UTC, Sönke Ludwig wrote:
On 19.09.25 at 18:29, Dmitry Olshansky wrote:
Photon's API is the syscall interface. So to wait on an event you just call poll. Behind the scenes it will just wait on the right fd to change state. Now vibe-core-light wants something like read(buffer, timeout) which is not syscall API but may be added. But since I'm going to add new API I'd rather have something consistent and sane, not just a bunch of ad-hoc functions to satisfy the vibe.d interface.
So you don't support timeouts when waiting for an event at all? Otherwise I don't see why a separate API would be required, this should be implementable with plain Posix APIs within vibe-core-lite itself.
Shouldn't it still be possible to set an "interrupted" flag somewhere and let only the vibe-core-lite APIs throw? Low level C functions should of course stay unaffected.
Since vibe-core-light depends on syscalls this would mean creating a separate set of API for vibe-core-light which is not something I'd like to do.
It's more of a timeout pattern that I've seen multiple times, there are certainly multiple (better) alternatives, but if compatibility with existing code is the goal then this would still be important.
I guess, again most likely I'd need to create API specifically for vibe. Also that would mean interrupt becomes part of photon but only works when certain APIs are used. This is bad.
All the examples plus your last statement on process per core being better makes me conclude that. I don't see how I'm wrong here.
Obviously this is wrong, though.
That running single threaded is the intended model.
I on the other hand imagine that it's not. In the year 2025, not utilizing all available cores is shameful. The fact that I had to dig around to find how vibe.d is supposed to run on multiple cores is telling.
Telling in what way?
```
runWorkerTaskDist({
    HTTPServerSettings settings;
    settings.options |= HTTPServerOption.reusePort;
    listenHTTP(settings);
});
```
Yet this is not the default, and the default is basically single threaded. We have different opinions on what the default should be obviously.
It's really quite simple, you can use plain D threads as normal, or you can use task pools, either explicitly, or through the default worker task pool using `runWorkerTask` or `runWorkerTaskDist`. (Then there are also higher level concepts, such as async, performInWorker or parallel(Unordered)Map)
This does little to the most important case - handling requests in parallel. Yeah there are pools and such for cases where going parallel inside of a single request makes sense.
Latency still going to be better if multiple cores are utilized. And I'm still not sure what the example is.
Anything client side involving a user interface has plenty of opportunities for employing secondary tasks or long-running sparsely updated state logic that are not CPU bound. Most of the time is spent idle there. Specific computations on the other hand can of course still be handed off to other threads.
Not everything is CPU bound and using threads "just because" doesn't make sense either. This is especially true, because of low level race conditions that require special care. D's shared/immutable helps with that, but that also means that your whole application suddenly needs to use shared/immutable when passing data between tasks.
I'm dying to know which application not being cpu bound still needs to pass data between tasks that are all running on a single thread.
Or just initialize it lazily in all threads that happen to use it. Otherwise, this is basically stick to one thread really.
The problem is that for example you might have a handle that was created in thread A and is not valid in thread B, or you set a state in thread A and thread B doesn't see that state. This would mean that you are limited to a single task for the complete library interaction.
But TLS variables are always "globals" in the sense that they outlive the scope that accesses them. A modification in one thread would obviously not be visible in another thread, meaning that you may or may not have a semantic connection when you access such a library sequentially from multiple tasks. And then there are said libraries that are not thread-safe at all, or are bound to the thread where you initialize them. Or handles returned from a library may be bound to the thread that created them. Dealing with all of this just becomes needlessly complicated and error-prone, especially if CPU cycles are not a concern.
TLS is fine for using a not-thread-safe library - just make sure you initialize it for all threads. I do not switch or otherwise play dirty tricks with TLS.
I have go and goOnSameThread. Guess which is the encouraged option.
This doesn't make sense, in the original vibe-core, you can simply choose between spawning in the same thread or in "any" thread. `shared`/`immutable` is correctly enforced in the latter case to avoid unintended data sharing.
By robbing the user of control over where a task spawns, you are also forcing synchronization everywhere, which can quickly become more expensive than any benefits you would gain from using multiple threads.
Either default kind of robs the user of control over where the task spawns. Which is sensible - a user shouldn't really care.
As is observable from the look on other languages and runtimes malloc is not the bottleneck it used to be. Our particular version of GC that doesn't have thread caches is a bottleneck.
The GC/malloc is the main reason why this is mostly false in practice, but it extends to any central contention source within the process - yes, often you can avoid that, but often that takes a lot of extra work and processes sidestep that issue in the first place.
Finally, in the case of web applications, in my opinion the better approach for using multiple CPU cores is *usually* by running multiple *processes* in parallel, as opposed to multiple threads within a single process. Of course, every application is different and there is no one-size-fits-all approach.
There we differ, not only load balancing is simpler within a single application but also processes are more expensive. Current D GC situation kind of sucks on multithreaded workloads but that is the only reason to go multiprocess IMHO.
Also, in the usual case where the threads don't have to communicate with each other (apart from memory allocation synchronization), a separate process per core isn't any slower - except maybe when hyper-threading is in play, but whether that helps or hurts performance always depends on the concrete workload.
The fact that a context switch has to drop the whole virtual address space does add a bit of overhead. Though to be certain of anything there better be a benchmark.
Separate processes also have the advantage of being more robust and enabling seamless restarts and updates of the executable. And they facilitate an application design that lends itself to scaling across multiple machines.
Then give me the example code to run multiple vibe.d in parallel processes (should be similar to runDist) and we can compare approaches.
For all I know it could be faster than multi-threaded vibe.d-light. Also honestly, if vibe.d's target is multiple processes it should probably start like this by default.
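To illustrate "to wait on an event you just call poll": under photon the plain POSIX call below is intercepted and parks the fiber instead of blocking the thread (sketch, error handling elided):
```
import core.sys.posix.poll : poll, pollfd, POLLIN;

// photon intercepts the poll syscall: the calling fiber is suspended and the
// thread keeps running other fibers until fd becomes readable or we time out
bool waitReadable(int fd, int timeoutMs)
{
    auto p = pollfd(fd, POLLIN, 0);
    return poll(&p, 1, timeoutMs) > 0;
}
```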
Sep 22
On 22.09.25 at 09:49, Dmitry Olshansky wrote:
On Friday, 19 September 2025 at 17:37:36 UTC, Sönke Ludwig wrote:
Why can't you then use poll() to for example implement `ManualEvent` with timeout and interrupt support? And shouldn't recv() with timeout be implementable the same way, poll with timeout and only read when ready?
So you don't support timeouts when waiting for an event at all? Otherwise I don't see why a separate API would be required, this should be implementable with plain Posix APIs within vibe-core-lite itself.
Photon's API is the syscall interface. So to wait on an event you just call poll. Behind the scenes it will just wait on the right fd to change state. Now vibe-core-light wants something like read(buffer, timeout) which is not syscall API but may be added. But since I'm going to add new API I'd rather have something consistent and sane, not just a bunch of ad-hoc functions to satisfy the vibe.d interface.
I think we have a misunderstanding of what vibe.d is supposed to be. It seems like you are only focused on the web/server role, while to me vibe-core is a general-purpose I/O and concurrency system with no particular specialization in server tasks. With that view, your statement to me sounds like "Clearly D is not meant to do multi-threading, since main() is only running in a single thread". Of course, there could be a high-level component on top of vibe-d:web that makes some opinionated assumptions on how to structure a web application to ensure it is scalable, but that would go against the idea of being a toolkit with functional building blocks, as opposed to a framework that dictates your application structure.
All the examples plus your last statement on process per core being better makes me conclude that. I don't see how I'm wrong here.
Obviously this is wrong, though.
Telling in what way?
That running single threaded is the intended model.
(...)
```
runWorkerTaskDist({
    HTTPServerSettings settings;
    settings.options |= HTTPServerOption.reusePort;
    listenHTTP(settings);
});
```
Yet this is not the default, and the default is basically single threaded. We have different opinions on what the default should be obviously.
We are comparing fiber switches and working on data with a shared cache and no synchronization to synchronizing data access and control flow between threads/cores. There is such a broad spectrum of possibilities for one of those to be faster than the other that it's just silly to make a general statement like that. The thing is that if you always share data between threads, you have to pay for that for every single data access, regardless of whether there is actual concurrency going on or not. If you want a concrete example, take a simple download dialog with a progress bar. There is no gain in off-loading anything to a separate thread here, since this is fully I/O bound, but it adds quite some communication complexity if you do. CPU performance is simply not a concern here.
Latency still going to be better if multiple cores are utilized. And I'm still not sure what the example is.
Anything client side involving a user interface has plenty of opportunities for employing secondary tasks or long-running sparsely updated state logic that are not CPU bound. Most of the time is spent idle there. Specific computations on the other hand can of course still be handed off to other threads.
Not everything is CPU bound and using threads "just because" doesn't make sense either. This is especially true, because of low level race conditions that require special care. D's shared/immutable helps with that, but that also means that your whole application suddenly needs to use shared/immutable when passing data between tasks.
I'm dying to know which application not being cpu bound still needs to pass data between tasks that are all running on a single thread.
But then it's a different handle representing a different object - that's not the same thing. I'm not just talking about initializing the library as a whole. But even if, there are a lot of libraries that don't use TLS and are simply not thread-safe at all.
Or just initialize it lazily in all threads that happen to use it. Otherwise, this is basically stick to one thread really.
The problem is that for example you might have a handle that was created in thread A and is not valid in thread B, or you set a state in thread A and thread B doesn't see that state. This would mean that you are limited to a single task for the complete library interaction.
But TLS variables are always "globals" in the sense that they outlive the scope that accesses them. A modification in one thread would obviously not be visible in another thread, meaning that you may or may not have a semantic connection when you access such a library sequentially from multiple tasks. And then there are said libraries that are not thread-safe at all, or are bound to the thread where you initialize them. Or handles returned from a library may be bound to the thread that created them. Dealing with all of this just becomes needlessly complicated and error-prone, especially if CPU cycles are not a concern.
TLS is fine for using a not-thread-safe library - just make sure you initialize it for all threads. I do not switch or otherwise play dirty tricks with TLS.
Does go() enforce proper use of shared/immutable when passing data to the scheduled "go routine"?
I have go and goOnSameThread. Guess which is the encouraged option.
This doesn't make sense, in the original vibe-core, you can simply choose between spawning in the same thread or in "any" thread. `shared`/`immutable` is correctly enforced in the latter case to avoid unintended data sharing.
By robbing the user of control over where a task spawns, you are also forcing synchronization everywhere, which can quickly become more expensive than any benefits you would gain from using multiple threads.
Either default kind of robs the user of control over where the task spawns. Which is sensible - a user shouldn't really care.
malloc() will also always be a bottleneck with the right load. Just the n times larger amount of virtual address space required may start to become an issue for memory heavy applications. But even if we ignore that, ruling out using the existing GC doesn't sound like a good idea to me. And the fact is that, even with relatively mild GC use, a web application will not scale properly with many cores.
As is observable from the look on other languages and runtimes malloc is not the bottleneck it used to be. Our particular version of GC that doesn't have thread caches is a bottleneck.
The GC/malloc is the main reason why this is mostly false in practice, but it extends to any central contention source within the process - yes, often you can avoid that, but often that takes a lot of extra work and processes sidestep that issue in the first place.
Finally, in the case of web applications, in my opinion the better approach for using multiple CPU cores is *usually* by running multiple *processes* in parallel, as opposed to multiple threads within a single process. Of course, every application is different and there is no one-size-fits-all approach.
There we differ, not only load balancing is simpler within a single application but also processes are more expensive. Current D GC situation kind of sucks on multithreaded workloads but that is the only reason to go multiprocess IMHO.
There is no context switch involved with each process running on its own core.
Also, in the usual case where the threads don't have to communicate with each other (apart from memory allocation synchronization), a separate process per core isn't any slower - except maybe when hyper-threading is in play, but whether that helps or hurts performance always depends on the concrete workload.
The fact that a context switch has to drop the whole virtual address space does add a bit of overhead. Though to be certain of anything there better be a benchmark.
Again, the "default" is a high-level issue and none of vibe-core's business. The simplest way to have that work is to use `HTTPServerOption.reusePort` and then start as many processes as desired.
Separate processes also have the advantage of being more robust and enabling seamless restarts and updates of the executable. And they facilitate an application design that lends itself to scaling across multiple machines.
Then give me the example code to run multiple vibe.d in parallel processes (should be similar to runDist) and we can compare approaches. For all I know it could be faster than multi-threaded vibe.d-light. Also honestly, if vibe.d's target is multiple processes it should probably start like this by default.
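A launcher along those lines could look like this (sketch; `./bench-http-server` is a placeholder for a binary that sets `HTTPServerOption.reusePort`, and taskset is assumed available):
```
import std.conv : to;
import std.parallelism : totalCPUs;
import std.process : Pid, spawnProcess, wait;

void main()
{
    // one server process per core, each pinned with taskset; the kernel
    // load-balances incoming connections across them via SO_REUSEPORT
    Pid[] pids;
    foreach (cpu; 0 .. totalCPUs)
        pids ~= spawnProcess(["taskset", "-c", cpu.to!string, "./bench-http-server"]);
    foreach (pid; pids)
        wait(pid);
}
```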
Sep 22
On Monday, 22 September 2025 at 11:14:17 UTC, Sönke Ludwig wrote:
On 22.09.25 at 09:49, Dmitry Olshansky wrote:
Yes, recv with timeout is basically poll+recv. The problem is that then I need to support interrupts in poll. Nothing really changed. As far as manual event goes I've implemented that with custom cond var and mutex. That mutex is not interruptible as it's backed by semaphore on slow path in a form of eventfd. I might create custom mutex that is interruptible I guess but the notion of interrupts would have to be introduced to photon. I do not really like it.
On Friday, 19 September 2025 at 17:37:36 UTC, Sönke Ludwig wrote:
Why can't you then use poll() to for example implement `ManualEvent` with timeout and interrupt support? And shouldn't recv() with timeout be implementable the same way, poll with timeout and only read when ready?
So you don't support timeouts when waiting for an event at all? Otherwise I don't see why a separate API would be required, this should be implementable with plain Posix APIs within vibe-core-lite itself.
Photon's API is the syscall interface. So to wait on an event you just call poll. Behind the scenes it will just wait on the right fd to change state. Now vibe-core-light wants something like read(buffer, timeout) which is not syscall API but may be added. But since I'm going to add new API I'd rather have something consistent and sane, not just a bunch of ad-hoc functions to satisfy the vibe.d interface.
I think we have a misunderstanding of what vibe.d is supposed to be. It seems like you are only focused on the web/server role, while to me vibe-core is a general-purpose I/O and concurrency system with no particular specialization in server tasks. With that view, your statement to me sounds like "Clearly D is not meant to do multi-threading, since main() is only running in a single thread".
The defaults are what is important. Go defaults to multi-threading for instance. D defaults to multi-threading because TLS by default is certainly a mark of multi-threaded environment. std.concurrency defaults to new thread per spawn, again this tells me it's about multithreading. I intend to support multi-threading by default. I understand that we view this issue differently.
Of course, there could be a high-level component on top of vibe-d:web that makes some opinionated assumptions on how to structure a web application to ensure it is scalable, but that would go against the idea of being a toolkit with functional building blocks, as opposed to a framework that dictates your application structure.
Agreed.
Obviously, we should strive to share responsibly. Photon has Channels much like vibe-core has Channel. Mine are MPSC though, mostly to model Input/Output range concepts.
We are comparing fiber switches and working on data with a shared cache and no synchronization to synchronizing data access and control flow between threads/cores. There is such a broad spectrum of possibilities for one of those to be faster than the other that it's just silly to make a general statement like that. The thing is that if you always share data between threads, you have to pay for that for every single data access, regardless of whether there is actual concurrency going on or not.
Latency still going to be better if multiple cores are utilized. And I'm still not sure what the example is.
Anything client side involving a user interface has plenty of opportunities for employing secondary tasks or long-running sparsely updated state logic that are not CPU bound. Most of the time is spent idle there. Specific computations on the other hand can of course still be handed off to other threads.
Not everything is CPU bound and using threads "just because" doesn't make sense either. This is especially true, because of low level race conditions that require special care. D's shared/immutable helps with that, but that also means that your whole application suddenly needs to use shared/immutable when passing data between tasks.
I'm dying to know which application not being cpu bound still needs to pass data between tasks that are all running on a single thread.
If you want a concrete example, take a simple download dialog with a progress bar. There is no gain in off-loading anything to a separate thread here, since this is fully I/O bound, but it adds quite some communication complexity if you do. CPU performance is simply not a concern here.
Channels tame the complexity. Yes, channels could get more expensive in a multi-threaded scenario but we already agreed that it's not CPU bound.
Something that is not thread-safe at all is a dying breed. It's been 20 years that we have multi-cores. Most libraries can be initialized once per thread which is quite naturally modeled with TLS handle to said library. Communicating between fibers via shared TLS handle is not something I would recommend regardless of the default spawn behavior.
But then it's a different handle representing a different object - that's not the same thing. I'm not just talking about initializing the library as a whole. But even if, there are a lot of libraries that don't use TLS and are simply not thread-safe at all.
Or just initialize it lazily in all threads that happen to use it. Otherwise, this is basically stick to one thread really.
The problem is that for example you might have a handle that was created in thread A and is not valid in thread B, or you set a state in thread A and thread B doesn't see that state. This would mean that you are limited to a single task for the complete library interaction.
But TLS variables are always "globals" in the sense that they outlive the scope that accesses them. A modification in one thread would obviously not be visible in another thread, meaning that you may or may not have a semantic connection when you access such a library sequentially from multiple tasks. And then there are said libraries that are not thread-safe at all, or are bound to the thread where you initialize them. Or handles returned from a library may be bound to the thread that created them. Dealing with all of this just becomes needlessly complicated and error-prone, especially if CPU cycles are not a concern.
TLS is fine for using a not-thread-safe library - just make sure you initialize it for all threads. I do not switch or otherwise play dirty tricks with TLS.
It goes with the same API as we have for threads - a delegate, so sharing becomes user's responsibility. I may add function + args for better handling of resources passed to the lambda.
Does go() enforce proper use of shared/immutable when passing data to the scheduled "go routine"?
I have go and goOnSameThread. Guess which is the encouraged option.
This doesn't make sense, in the original vibe-core, you can simply choose between spawning in the same thread or in "any" thread. `shared`/`immutable` is correctly enforced in the latter case to avoid unintended data sharing.
By robbing the user of control over where a task spawns, you are also forcing synchronization everywhere, which can quickly become more expensive than any benefits you would gain from using multiple threads.
Either default kind of robs the user of control over where the task spawns. Which is sensible - a user shouldn't really care.
The existing GC is basically 20+ years old, ofc we need better GC and thread cached allocation solves contention in multi-threaded environments. Alternative memory allocator is doing great on 320 core machines. I cannot tell you which allocator that is or what exactly these servers are. Though even jemalloc does okayish.
malloc() will also always be a bottleneck with the right load. Just the n times larger amount of virtual address space required may start to become an issue for memory heavy applications. But even if we ignore that, ruling out using the existing GC doesn't sound like a good idea to me.
As is observable from the look on other languages and runtimes malloc is not the bottleneck it used to be. Our particular version of GC that doesn't have thread caches is a bottleneck.
The GC/malloc is the main reason why this is mostly false in practice, but it extends to any central contention source within the process - yes, often you can avoid that, but often that takes a lot of extra work and processes sidestep that issue in the first place.
Finally, in the case of web applications, in my opinion the better approach for using multiple CPU cores is *usually* by running multiple *processes* in parallel, as opposed to multiple threads within a single process. Of course, every application is different and there is no one-size-fits-all approach.
There we differ, not only load balancing is simpler within a single application but also processes are more expensive. Current D GC situation kind of sucks on multithreaded workloads but that is the only reason to go multiprocess IMHO.
And the fact is that, even with relatively mild GC use, a web application will not scale properly with many cores.
Only partially agree, Java's GC handles load just fine and runs faster than vibe.d(-light). It does allocations on its serving code path.
Yeah, pinning down cores works, I stand corrected.
There is no context switch involved with each process running on its own core.
Also, in the usual case where the threads don't have to communicate with each other (apart from memory allocation synchronization), a separate process per core isn't any slower - except maybe when hyper-threading is in play, but whether that helps or hurts performance always depends on the concrete workload.
The fact that a context switch has to drop the whole virtual address space does add a bit of overhead. Though to be certain of anything there better be a benchmark.
Separate processes also have the advantage of being more robust and enabling seamless restarts and updates of the executable. And they facilitate an application design that lends itself to scaling across multiple machines.
Then give me the example code to run multiple vibe.d in parallel processes (should be similar to runDist) and we can compare approaches.
For all I know it could be faster than multi-threaded vibe.d-light. Also honestly, if vibe.d's target is multiple processes it should probably start like this by default.
So I did just that. To my surprise it indeed speeds up all of my D server examples. The speedups are roughly:
On vibe-http-light:
8 cores 1.14
12 cores 1.10
16 cores 1.08
24 cores 1.05
32 cores 1.06
48 cores 1.07
On vibe-http-classic:
8 cores 1.33
12 cores 1.45
16 cores 1.60
24 cores 2.54
32 cores 4.44
48 cores 8.56
On plain photon-http:
8 cores 1.15
12 cores 1.10
16 cores 1.09
24 cores 1.05
32 cores 1.07
48 cores 1.04
We should absolutely tweak the vibe.d TechEmpower benchmark to run vibe.d as a process per core!
As far as photon-powered versions go I see there is a point where per-process becomes less of a gain with more cores, so I would think there are 2 factors at play, one positive and one negative, with the negative being tied to the number of processes. Lastly, I have found opportunities to speed up vibe-http even without switching to vibe-core-light. Will send PRs.
Again, the "default" is a high-level issue and none of vibe-core's business. The simplest way to have that work is to use `HTTPServerOption.reusePort` and then start as many processes as desired.
Separate processes also have the advantage of being more robust and enabling seamless restarts and updates of the executable. And they facilitate an application design that lends itself to scaling across multiple machines.
Then give me the example code to run multiple vibe.d in parallel processes (should be similar to runDist) and we can compare approaches. For all I know it could be faster than multi-threaded vibe.d-light. Also honestly, if vibe.d's target is multiple processes it should probably start like this by default.
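Back to the poll topic at the top: "recv with timeout is basically poll+recv", spelled out at the POSIX level (sketch, minimal error handling; the -2 timeout sentinel is made up):
```
import core.sys.posix.poll : poll, pollfd, POLLIN;
import core.sys.posix.sys.socket : recv;

// negative return: -1 = error, -2 = timed out (hypothetical sentinel)
ptrdiff_t recvTimeout(int fd, void[] buf, int timeoutMs)
{
    auto p = pollfd(fd, POLLIN, 0);
    auto rc = poll(&p, 1, timeoutMs);
    if (rc < 0) return -1;  // poll error
    if (rc == 0) return -2; // timeout, nothing to read
    return recv(fd, buf.ptr, buf.length, 0);
}
```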
Sep 23
On Tuesday, 23 September 2025 at 15:35:47 UTC, Dmitry Olshansky wrote:
On Monday, 22 September 2025 at 11:14:17 UTC, Sönke Ludwig wrote:
That 15% speedup was suspicious, so I looked closer into what I was doing and indeed I was launching N+1 processes due to a bug in my script. 9 / 8 ~ 1.15 so here goes.
Again, the "default" is a high-level issue and none of vibe-core's business. The simplest way to have that work is to use `HTTPServerOption.reusePort` and then start as many processes as desired.
So I did just that. To my surprise it indeed speeds up all of my D server examples.
The speedups are roughly:
On vibe-http-light:
8 cores 1.14
12 cores 1.10
16 cores 1.08
24 cores 1.05
32 cores 1.06
48 cores 1.07
Proper numbers:
8 cores 1.01
12 cores 1.01
16 cores 1.02
24 cores 1.00
32 cores 1.02
48 cores 1.05
On plain photon-http:
8 cores 1.15
12 cores 1.10
16 cores 1.09
24 cores 1.05
32 cores 1.07
48 cores 1.04
Proper numbers:
8 cores 1.02
12 cores 1.02
16 cores 1.01
24 cores 1.02
32 cores 1.04
48 cores 1.02
So it *seems* there is still a little bit of gain, I'm investigating where the benefit actually comes from, keeping in mind that there is some noise in the benchmark itself.
Lastly, I have found opportunities to speed up vibe-http even without switching to vibe-core-light. Will send PRs.
First one to go: https://github.com/vibe-d/vibe-http/pull/65
Sep 24
On 23.09.25 at 17:35, Dmitry Olshansky wrote: On Monday, 22 September 2025 at 11:14:17 UTC, Sönke Ludwig wrote:

I'd probably create an additional event FD per thread, used to signal interruption, and also pass that to any poll() that is used for an interruptible wait (see the sketch at the end of this message).

On 22.09.25 at 09:49, Dmitry Olshansky wrote:

Yes, recv with timeout is basically poll+recv. The problem is that then I need to support interrupts in poll. Nothing really changed. As far as the manual event goes, I've implemented that with a custom condvar and mutex. That mutex is not interruptible, as it's backed by a semaphore on the slow path in the form of an eventfd. I might create a custom mutex that is interruptible, I guess, but the notion of interrupts would have to be introduced to photon. I do not really like it.

On Friday, 19 September 2025 at 17:37:36 UTC, Sönke Ludwig wrote:

Why can't you then use poll() to, for example, implement `ManualEvent` with timeout and interrupt support? And shouldn't recv() with timeout be implementable the same way: poll with a timeout and only read when ready?

So you don't support timeouts when waiting for an event at all? Otherwise I don't see why a separate API would be required; this should be implementable with plain Posix APIs within vibe-core-lite itself.

Photon's API is the syscall interface. So to wait on an event you just call poll. Behind the scenes it will just wait on the right fd to change state. Now vibe-core-light wants something like read(buffer, timeout), which is not syscall API but may be added. But since I'm going to add new API, I'd rather have something consistent and sane, not just a bunch of ad-hoc functions to satisfy the vibe.d interface.

But you are comparing different defaults here. With plain D, you also have to import either `core.thread` or `std.concurrency`/`std.parallelism` to do any multi-threaded work. The same is true for vibe-core. What you propose would be more comparable to having foreach() operate like parallelForeach(), with far-reaching consequences. If we are just talking about naming - runTask/runWorkerTask vs. go/goOnSameThread - that is of course debatable, but in that case I think it's blown very much out of proportion to take that as the basis to claim "it's meant to be used single-threaded".

I think we have a misunderstanding of what vibe.d is supposed to be. It seems like you are only focused on the web/server role, while to me vibe-core is a general-purpose I/O and concurrency system with no particular specialization in server tasks. With that view, your statement to me sounds like "Clearly D is not meant to do multi-threading, since main() is only running in a single thread".

The defaults are what is important. Go defaults to multi-threading, for instance. D defaults to multi-threading, because TLS by default is certainly a mark of a multi-threaded environment. std.concurrency defaults to a new thread per spawn; again, this tells me it's about multithreading. I intend to support multi-threading by default. I understand that we view this issue differently.

True, but it's still not free (as in CPU cycles and code complexity) and you can't always control all code involved.

Obviously, we should strive to share responsibly. Photon has Channels much like vibe-core has Channel. Mine are MPSC though, mostly to model Input/Output range concepts.

We are comparing fiber switches and working on data with a shared cache and no synchronization to synchronizing data access and control flow between threads/cores.
There is such a broad spectrum of possibilities for one of those to be faster than the other that it's just silly to make a general statement like that. The thing is that if you always share data between threads, you have to pay for that on every single data access, regardless of whether there is actual concurrency going on or not.

Anything client-side involving a user interface has plenty of opportunities for employing secondary tasks or long-running, sparsely updated state logic that is not CPU-bound. Most of the time is spent idle there. Specific computations, on the other hand, can of course still be handed off to other threads.

Latency is still going to be better if multiple cores are utilized. And I'm still not sure what the example is.

If you have code that does a lot of these things, this just degrades code readability for absolutely no practical gain, though. If you want a concrete example, take a simple download dialog with a progress bar. There is no gain in off-loading anything to a separate thread here, since this is fully I/O bound, but it adds quite some communication complexity if you do. CPU performance is simply not a concern here.

Channels tame the complexity. Yes, channels could get more expensive in a multi-threaded scenario, but we already agreed that it's not CPU-bound.

Unfortunately, those libraries are an unpleasant reality that you can't always avoid. BTW, one of the worst offenders is Apple's whole Objective-C API. Auto-release pools in particular make it extremely fragile to work with fibers at all, and of course there are all kinds of hidden thread dependencies inside.

Something that is not thread-safe at all is a dying breed. We've had multi-core machines for 20 years. Most libraries can be initialized once per thread, which is quite naturally modeled with a TLS handle to said library. Communicating between fibers via a shared TLS handle is not something I would recommend regardless of the default spawn behavior.

But then it's a different handle representing a different object - that's not the same thing. I'm not just talking about initializing the library as a whole. But even if it were, there are a lot of libraries that don't use TLS and are simply not thread-safe at all. The problem is that, for example, you might have a handle that was created in thread A and is not valid in thread B, or you set a state in thread A and thread B doesn't see that state. This would mean that you are limited to a single task for the complete library interaction.

Or just initialize it lazily in all threads that happen to use it. Otherwise, this basically means sticking to one thread, really.

That means that this is completely un-`@safe` - C++ level memory safety. IMO this is an unacceptable default for web applications.

It goes with the same API as we have for threads - a delegate - so sharing becomes the user's responsibility. I may add function + args for better handling of resources passed to the lambda.

Does go() enforce proper use of shared/immutable when passing data to the scheduled "go routine"? This doesn't make sense; in the original vibe-core, you can simply choose between spawning in the same thread or in "any" thread. `shared`/`immutable` is correctly enforced in the latter case to avoid unintended data sharing.

I have go and goOnSameThread. Guess which is the encouraged option.

I was just talking about the current D GC here. Once we have a better implementation, this can very well become a much weaker argument!
However, speaking more generally, the other arguments for preferring to scale using processes still stand, and even with a better GC I would still argue that leading library users to do multi-threaded request handling is not necessarily the best default (of course it still *can* be for some applications). Anyway, the main point from my side is just that the semantics of what *is* in vibe-core-light should really match the corresponding functions in vibe-core. Apart from that, I was just telling you that your impression of it being intended to be used single-threaded is not right, which doesn't mean that the presentation shouldn't emphasize the multi-threaded functionality and multi-threaded request processing more.

The existing GC is basically 20+ years old; of course we need a better GC, and thread-cached allocation solves contention in multi-threaded environments. An alternative memory allocator is doing great on 320-core machines. I cannot tell you which allocator that is or what exactly these servers are. Though even jemalloc does okayish.

malloc() will also always be a bottleneck with the right load. Just the n times larger amount of virtual address space required may start to become an issue for memory-heavy applications. But even if we ignore that, ruling out using the existing GC doesn't sound like a good idea to me.

The GC/malloc is the main reason why this is mostly false in practice, but it extends to any central contention source within the process - yes, often you can avoid that, but often that takes a lot of extra work, and processes sidestep the issue in the first place.

As one can observe by looking at other languages and runtimes, malloc is not the bottleneck it used to be. Our particular GC, which doesn't have thread caches, is the bottleneck.

And the fact is that, even with relatively mild GC use, a web application will not scale properly with many cores.

Only partially agree: Java's GC handles load just fine and runs faster than vibe.d(-light), and it does allocations on its serving code path.

Interesting, I wonder whether it's the REUSE_PORT connection distribution that gets more expensive when it's working cross-process. Agreed that the TechEmpower benchmark is in dire need of being looked at. In fact, I had the code checked out for a long while, intending to look into it, because it obviously didn't scale like my own benchmarks, but then I never got around to doing it, being too busy with other things.

So I did just that. To my surprise it indeed speeds up all of my D server examples. The speed-ups are roughly:

On vibe-http-light:
8 cores  1.14
12 cores 1.10
16 cores 1.08
24 cores 1.05
32 cores 1.06
48 cores 1.07

On vibe-http-classic:
8 cores  1.33
12 cores 1.45
16 cores 1.60
24 cores 2.54
32 cores 4.44
48 cores 8.56

On plain photon-http:
8 cores  1.15
12 cores 1.10
16 cores 1.09
24 cores 1.05
32 cores 1.07
48 cores 1.04

We should absolutely tweak the vibe.d TechEmpower benchmark to run vibe.d as a process per core! As far as photon-powered versions go, I see there is a point where per-process becomes less of a gain with more cores, so I would think there are two factors at play, one positive and one negative, with the negative one tied to the number of processes. Lastly, I have found opportunities to speed up vibe-http even without switching to vibe-core-light. Will send PRs.

Again, the "default" is a high-level issue and none of vibe-core's business.
The simplest way to have that work is to use `HTTPServerOption.reusePort` and then start as many processes as desired. Separate processes also have the advantage of being more robust and enabling seamless restarts and updates of the executable. And they facilitate an application design that lends itself to scaling across multiple machines.

Then give me the example code to run multiple vibe.d instances in parallel processes (should be similar to runDist) and we can compare approaches. For all I know, it could be faster than multi-threaded vibe.d-light. Also, honestly, if vibe.d's target is multiple processes, it should probably start like this by default.
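To make the event-FD idea concrete, here is a minimal sketch of an interruptible, timeout-capable wait on Linux. It illustrates the approach under discussion and is not photon or vibe-core code; `waitReadable`, `interrupt`, and the exception type are made-up names.

```d
import core.sys.linux.sys.eventfd : EFD_NONBLOCK, eventfd;
import core.sys.posix.poll : POLLIN, poll, pollfd;
import core.sys.posix.unistd : read, write;

// One eventfd per thread to signal interruption (module-level variables
// are thread-local by default in D; `static this` runs once per thread).
private int interruptFd = -1;
static this() { interruptFd = eventfd(0, EFD_NONBLOCK); }

class InterruptException : Exception
{
    this() { super("task interrupted"); }
}

/// Waits until `fd` is readable. Returns false on timeout, true when ready;
/// throws InterruptException if another thread signalled this thread's eventfd.
bool waitReadable(int fd, int timeoutMs)
{
    pollfd[2] fds = [
        pollfd(fd, POLLIN, 0),
        pollfd(interruptFd, POLLIN, 0),
    ];
    // In photon this poll() would suspend the fiber rather than the thread.
    const n = poll(fds.ptr, fds.length, timeoutMs);
    if (n == 0)
        return false; // timed out
    if (fds[1].revents & POLLIN)
    {
        ulong counter;
        read(interruptFd, &counter, counter.sizeof); // drain the signal
        throw new InterruptException;
    }
    return true;
}

/// Called from another thread: wake up the waiter owning the given eventfd.
void interrupt(int victimInterruptFd)
{
    ulong one = 1;
    write(victimInterruptFd, &one, one.sizeof);
}
```

A recv-with-timeout then falls out naturally as `waitReadable(sock, ms)` followed by a plain `recv()`, which is the "poll+recv" equivalence mentioned above.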
Sep 25
On Thursday, 25 September 2025 at 07:24:00 UTC, Sönke Ludwig wrote: On 23.09.25 at 17:35, Dmitry Olshansky wrote:

poll could be made interruptible without any additions; it's really a yield plus waiting for events. I could implement interruptible things by simply waking the fiber up with a special flag (it already has one to determine which event woke us up anyway). The problem is, I do not see how to provide an adequate interface for this functionality. I have an idea though - use EINTR, or maybe devise my own error code for interrupts; then I could interrupt at my syscall interface level. Next is the question of how to throw from a nothrow context. Here I told you that I would need separate APIs for interruptible things - the ones that allow interrupts. *This* is something I do not look forward to.

On Monday, 22 September 2025 at 11:14:17 UTC, Sönke Ludwig wrote:

I'd probably create an additional event FD per thread, used to signal interruption, and also pass that to any poll() that is used for an interruptible wait.

On 22.09.25 at 09:49, Dmitry Olshansky wrote:

Yes, recv with timeout is basically poll+recv. The problem is that then I need to support interrupts in poll. Nothing really changed. As far as the manual event goes, I've implemented that with a custom condvar and mutex. That mutex is not interruptible, as it's backed by a semaphore on the slow path in the form of an eventfd. I might create a custom mutex that is interruptible, I guess, but the notion of interrupts would have to be introduced to photon. I do not really like it.

On Friday, 19 September 2025 at 17:37:36 UTC, Sönke Ludwig wrote:

Why can't you then use poll() to, for example, implement `ManualEvent` with timeout and interrupt support? And shouldn't recv() with timeout be implementable the same way: poll with a timeout and only read when ready?

So you don't support timeouts when waiting for an event at all? Otherwise I don't see why a separate API would be required; this should be implementable with plain Posix APIs within vibe-core-lite itself.

Photon's API is the syscall interface. So to wait on an event you just call poll. Behind the scenes it will just wait on the right fd to change state. Now vibe-core-light wants something like read(buffer, timeout), which is not syscall API but may be added. But since I'm going to add new API, I'd rather have something consistent and sane, not just a bunch of ad-hoc functions to satisfy the vibe.d interface.

So runTask is assumed to run on the same core while runWorkerTask runs on any available core? That didn't occur to me. I thought the worker pool was for blocking tasks, as there is such a pool in photon. I can just switch runTask to goOnSameThread to maximize compatibility with vibe.d.

But you are comparing different defaults here. With plain D, you also have to import either `core.thread` or `std.concurrency`/`std.parallelism` to do any multi-threaded work. The same is true for vibe-core. What you propose would be more comparable to having foreach() operate like parallelForeach(), with far-reaching consequences. If we are just talking about naming - runTask/runWorkerTask vs. go/goOnSameThread - that is of course debatable, but in that case I think it's blown very much out of proportion to take that as the basis to claim "it's meant to be used single-threaded".

I think we have a misunderstanding of what vibe.d is supposed to be. It seems like you are only focused on the web/server role, while to me vibe-core is a general-purpose I/O and concurrency system with no particular specialization in server tasks.
With that view, your statement to me sounds like "Clearly D is not meant to do multi-threading, since main() is only running in a single thread".

The defaults are what is important. Go defaults to multi-threading, for instance. D defaults to multi-threading, because TLS by default is certainly a mark of a multi-threaded environment. std.concurrency defaults to a new thread per spawn; again, this tells me it's about multithreading. I intend to support multi-threading by default. I understand that we view this issue differently.

I humbly disagree. I'd take explicit channels over global TLS variables any day.

Channels tame the complexity. Yes, channels could get more expensive in a multi-threaded scenario, but we already agreed that it's not CPU-bound.

If you have code that does a lot of these things, this just degrades code readability for absolutely no practical gain, though.

Yeah, I'm mostly not in the `@safe` world. But as I said, to make it more upstreamable I will switch the defaults, so that vibe-core-light provides the same guarantees as regular vibe-core does.

That means that this is completely un-`@safe` - C++ level memory safety. IMO this is an unacceptable default for web applications.

It goes with the same API as we have for threads - a delegate - so sharing becomes the user's responsibility. I may add function + args for better handling of resources passed to the lambda.

Does go() enforce proper use of shared/immutable when passing data to the scheduled "go routine"? This doesn't make sense; in the original vibe-core, you can simply choose between spawning in the same thread or in "any" thread. `shared`/`immutable` is correctly enforced in the latter case to avoid unintended data sharing.

I have go and goOnSameThread. Guess which is the encouraged option.

I'm betting more on the threaded approach, but we are just different. See also my reply on the numbers - processes are only about 1-2% faster (and the noise is easily in the 0.5% range), once the GC bottleneck is handled, that is.

I was just talking about the current D GC here. Once we have a better implementation, this can very well become a much weaker argument! However, speaking more generally, the other arguments for preferring to scale using processes still stand, and even with a better GC I would still argue that leading library users to do multi-threaded request handling is not necessarily the best default (of course it still *can* be for some applications).

malloc() will also always be a bottleneck with the right load. Just the n times larger amount of virtual address space required may start to become an issue for memory-heavy applications. But even if we ignore that, ruling out using the existing GC doesn't sound like a good idea to me.

The existing GC is basically 20+ years old; of course we need a better GC, and thread-cached allocation solves contention in multi-threaded environments. An alternative memory allocator is doing great on 320-core machines. I cannot tell you which allocator that is or what exactly these servers are. Though even jemalloc does okayish.

And the fact is that, even with relatively mild GC use, a web application will not scale properly with many cores.

Only partially agree: Java's GC handles load just fine and runs faster than vibe.d(-light), and it does allocations on its serving code path.

Anyway, the main point from my side is just that the semantics of what *is* in vibe-core-light should really match the corresponding functions in vibe-core.
Apart from that, I was just telling you that your impression of it being intended to be used single-threaded is not right, which doesn't mean that the presentation shouldn't emphasize the multi-threaded functionality and multi-threaded request processing more.

Given the number of potential expectations from the user side, it seems I need to update vibe-core-light to use goOnSameThread for runTask. I do not like how I need to do extra work to launch a multi-threaded server, though, which is what got me started on the whole "defaults" argument.
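A minimal sketch of what that switch could look like. `go`/`goOnSameThread` are the photon primitives named in this thread, but the wrapper signatures, the import path, and the function-pointer restriction (a crude stand-in for vibe-core's `shared`/`immutable` enforcement) are all assumptions, not actual vibe-core-light code.

```d
// Hypothetical vibe-core-light shim matching vibe-core semantics:
// runTask stays on the calling thread, runWorkerTask may run anywhere.
import photon : go, goOnSameThread; // assumed import path

void runTask(void delegate() nothrow task)
{
    // Same-thread spawn: capturing thread-local state in the delegate is fine.
    goOnSameThread(task);
}

void runWorkerTask(void function() nothrow task)
{
    // Cross-thread spawn: accepting only a function pointer (no captured
    // locals) roughly approximates vibe-core's shared/immutable policy.
    go({ task(); });
}
```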
Sep 25
On 25.09.25 at 14:25, Dmitry Olshansky wrote: On Thursday, 25 September 2025 at 07:24:00 UTC, Sönke Ludwig wrote:

Yes, I think that should be enough to make the semantics compatible. runWorkerTask is kind of dual-use in that regard and is mostly meant for CPU workloads. There is a separate I/O worker pool for blocking I/O operations, to avoid computationally expensive worker tasks getting blocked by I/O. This is definitely an area where Photon can shine, working fine for all kinds of workloads with just a single pool.

On 23.09.25 at 17:35, Dmitry Olshansky wrote:

poll could be made interruptible without any additions; it's really a yield plus waiting for events. I could implement interruptible things by simply waking the fiber up with a special flag (it already has one to determine which event woke us up anyway). The problem is, I do not see how to provide an adequate interface for this functionality. I have an idea though - use EINTR, or maybe devise my own error code for interrupts; then I could interrupt at my syscall interface level. Next is the question of how to throw from a nothrow context. Here I told you that I would need separate APIs for interruptible things - the ones that allow interrupts. *This* is something I do not look forward to.

On Monday, 22 September 2025 at 11:14:17 UTC, Sönke Ludwig wrote:

I'd probably create an additional event FD per thread, used to signal interruption, and also pass that to any poll() that is used for an interruptible wait.

On 22.09.25 at 09:49, Dmitry Olshansky wrote:

Yes, recv with timeout is basically poll+recv. The problem is that then I need to support interrupts in poll. Nothing really changed. As far as the manual event goes, I've implemented that with a custom condvar and mutex. That mutex is not interruptible, as it's backed by a semaphore on the slow path in the form of an eventfd. I might create a custom mutex that is interruptible, I guess, but the notion of interrupts would have to be introduced to photon. I do not really like it.

On Friday, 19 September 2025 at 17:37:36 UTC, Sönke Ludwig wrote:

Why can't you then use poll() to, for example, implement `ManualEvent` with timeout and interrupt support? And shouldn't recv() with timeout be implementable the same way: poll with a timeout and only read when ready?

So you don't support timeouts when waiting for an event at all? Otherwise I don't see why a separate API would be required; this should be implementable with plain Posix APIs within vibe-core-lite itself.

Photon's API is the syscall interface. So to wait on an event you just call poll. Behind the scenes it will just wait on the right fd to change state. Now vibe-core-light wants something like read(buffer, timeout), which is not syscall API but may be added. But since I'm going to add new API, I'd rather have something consistent and sane, not just a bunch of ad-hoc functions to satisfy the vibe.d interface.

So runTask is assumed to run on the same core while runWorkerTask runs on any available core? That didn't occur to me. I thought the worker pool was for blocking tasks, as there is such a pool in photon. I can just switch runTask to goOnSameThread to maximize compatibility with vibe.d.

But you are comparing different defaults here. With plain D, you also have to import either `core.thread` or `std.concurrency`/`std.parallelism` to do any multi-threaded work. The same is true for vibe-core. What you propose would be more comparable to having foreach() operate like parallelForeach(), with far-reaching consequences. If we are just talking about naming - runTask/runWorkerTask vs.
go/goOnSameThread - that is of course debatable, but in that case I think it's blown very much out of proportion to take that as the basis to claim "it's meant to be used single-threaded".

I think we have a misunderstanding of what vibe.d is supposed to be. It seems like you are only focused on the web/server role, while to me vibe-core is a general-purpose I/O and concurrency system with no particular specialization in server tasks. With that view, your statement to me sounds like "Clearly D is not meant to do multi-threading, since main() is only running in a single thread".

The defaults are what is important. Go defaults to multi-threading, for instance. D defaults to multi-threading, because TLS by default is certainly a mark of a multi-threaded environment. std.concurrency defaults to a new thread per spawn; again, this tells me it's about multithreading. I intend to support multi-threading by default. I understand that we view this issue differently.

It wouldn't usually be TLS, but just a delegate that gets passed from the UI task to the I/O task, for example, implicitly operating on stack data or on some UI structures referenced from there.

I humbly disagree. I'd take explicit channels over global TLS variables any day.

Channels tame the complexity. Yes, channels could get more expensive in a multi-threaded scenario, but we already agreed that it's not CPU-bound.

If you have code that does a lot of these things, this just degrades code readability for absolutely no practical gain, though.

Maybe we can at least think about a possible reintroduction of a direct `listenHTTPDist`/`listenHTTPMultiThreaded`/... API that provides a `@safe` interface - there used to be a `HTTPServerOption.distribute` that did that, but it didn't enforce `shared` properly and led to race conditions in practical applications, because people were not aware of the implicitly shared data or of the implications thereof.

Yeah, I'm mostly not in the `@safe` world. But as I said, to make it more upstreamable I will switch the defaults, so that vibe-core-light provides the same guarantees as regular vibe-core does.

That means that this is completely un-`@safe` - C++ level memory safety. IMO this is an unacceptable default for web applications.

It goes with the same API as we have for threads - a delegate - so sharing becomes the user's responsibility. I may add function + args for better handling of resources passed to the lambda.

Does go() enforce proper use of shared/immutable when passing data to the scheduled "go routine"? This doesn't make sense; in the original vibe-core, you can simply choose between spawning in the same thread or in "any" thread. `shared`/`immutable` is correctly enforced in the latter case to avoid unintended data sharing.

I have go and goOnSameThread. Guess which is the encouraged option.

I'm betting more on the threaded approach, but we are just different. See also my reply on the numbers - processes are only about 1-2% faster (and the noise is easily in the 0.5% range), once the GC bottleneck is handled, that is.

I was just talking about the current D GC here. Once we have a better implementation, this can very well become a much weaker argument! However, speaking more generally, the other arguments for preferring to scale using processes still stand, and even with a better GC I would still argue that leading library users to do multi-threaded request handling is not necessarily the best default (of course it still *can* be for some applications).

malloc() will also always be a bottleneck with the right load.
Just the n times larger amount of virtual address space required may start to become an issue for memory-heavy applications. But even if we ignore that, ruling out using the existing GC doesn't sound like a good idea to me.

The existing GC is basically 20+ years old; of course we need a better GC, and thread-cached allocation solves contention in multi-threaded environments. An alternative memory allocator is doing great on 320-core machines. I cannot tell you which allocator that is or what exactly these servers are. Though even jemalloc does okayish.

And the fact is that, even with relatively mild GC use, a web application will not scale properly with many cores.

Only partially agree: Java's GC handles load just fine and runs faster than vibe.d(-light), and it does allocations on its serving code path.

Anyway, the main point from my side is just that the semantics of what *is* in vibe-core-light should really match the corresponding functions in vibe-core. Apart from that, I was just telling you that your impression of it being intended to be used single-threaded is not right, which doesn't mean that the presentation shouldn't emphasize the multi-threaded functionality and multi-threaded request processing more.

Given the number of potential expectations from the user side, it seems I need to update vibe-core-light to use goOnSameThread for runTask. I do not like how I need to do extra work to launch a multi-threaded server, though, which is what got me started on the whole "defaults" argument.
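As an aside, the channel-based hand-off that keeps coming up in the quotes above (the download dialog with a progress bar) can be sketched with vibe-core's `Channel`; the progress-reporting scenario itself is illustrative, not code from either project.

```d
import core.time : msecs;
import std.stdio : writefln;
import vibe.core.channel : createChannel;
import vibe.core.core : exitEventLoop, runApplication, runTask, sleep;

void main()
{
    auto progress = createChannel!int();

    // "Download" task: fully I/O bound, reports progress over the channel.
    runTask(() nothrow {
        try
        {
            foreach (pct; 0 .. 101)
            {
                sleep(10.msecs); // stand-in for actual socket reads
                progress.put(pct);
            }
            progress.close();
        }
        catch (Exception e) {}
    });

    // "UI" task: consumes progress updates until the channel is closed.
    runTask(() nothrow {
        try
        {
            int pct;
            while (progress.tryConsumeOne(pct))
                writefln("progress: %d%%", pct);
            exitEventLoop();
        }
        catch (Exception e) {}
    });

    runApplication();
}
```

Whether the two tasks end up on one thread or two, the communication pattern stays the same; only the cost of the channel operations changes, which is exactly the trade-off being debated here.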
Sep 26
On Friday, 19 September 2025 at 16:29:11 UTC, Dmitry Olshansky wrote:

I'm dying to know which application that is not CPU-bound still needs to pass data between tasks that are all running on a single thread. TLS is fine for using a non-thread-safe library - just make sure you initialize it for all threads. I do not switch or otherwise play dirty tricks with TLS.

Either default robs the user of control over where the task spawns. Which is sensible: a user shouldn't really care.

There we differ: not only is load balancing simpler within a single process, but processes are also more expensive. The current D GC situation kind of sucks on multithreaded workloads, but that is the only reason to go multi-process IMHO.

I'm dying to see some statistics to show which approach is more performant in different scenarios.
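For reference, the "initialize it for all threads" pattern from the quote is plain lazy per-thread initialization on top of D's thread-local-by-default globals; `LibHandle` and `libInit` below are hypothetical placeholders for whatever non-thread-safe library is being wrapped.

```d
// Hypothetical wrapper around a non-thread-safe library.
struct LibHandle { int state; }

// Stand-in for the library's real init call (hypothetical).
LibHandle* libInit() { return new LibHandle(0); }

// Module-level variables are thread-local by default in D,
// so every thread gets its own handle.
private LibHandle* handle;

LibHandle* lib()
{
    if (handle is null)
        handle = libInit(); // first use on each thread initializes it there
    return handle;
}
```

Each thread pays the init cost once, and the failure mode discussed above (a handle created in thread A being used from thread B) cannot occur as long as the handle itself never crosses threads.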
Sep 21
On Thursday, 18 September 2025 at 16:00:48 UTC, Dmitry Olshansky wrote: I have been building Photon[1] scheduler library with the aim to build high performance servers but since we are already have vibe.d and everybody is using it might as well try to speed it up. Thus vibe.d-light idea was born - port vibe.d framework on top of photon.

Congratulations on your amazing work! I also agree with you that there's no point in protecting users from threading today; this actually reflects D's philosophy, since it both uses TLS by default (so the language assumes threading anyway) and makes it easy to write parallel code :)
Sep 19