digitalmars.D.announce - vibe.d-lite v0.1.0 powered by photon
- Dmitry Olshansky (55/55) Sep 18 I have been building Photon[1] scheduler library with the aim to
- Steven Schveighoffer (11/14) Sep 18 I think this is fantastic! This is good evidence you are
- Dmitry Olshansky (9/22) Sep 18 I might need help getting std.concurrency to run with photon,
- Sönke Ludwig (23/30) Sep 19 I guess vibe-stream/-inet/-http just need to be adjusted due to
- Dmitry Olshansky (25/64) Sep 19 I think stream/inet is just updating the deps to be “light”.
- Sönke Ludwig (35/78) Sep 19 Shouldn't it still be possible to set an "interrupted" flag somewhere
- Dmitry Olshansky (23/116) Sep 19 Since vibe-core-light depends on syscalls this would mean
- Richard (Rikki) Andrew Cattermole (4/14) Sep 19 And more importantly you don't pay anywhere near the same number of
- Sönke Ludwig (37/109) Sep 19 So you don't support timeouts when waiting for an event at all?
- Dmitry Olshansky (33/155) Sep 22 Photon's API is the syscall interface. So to wait on an event you
- Sönke Ludwig (45/172) Sep 22 Why can't you then use poll() to for example implement `ManualEvent`
- Dmitry Olshansky (76/250) Sep 23 Yes, recv with timeout is basically poll+recv. The problem is
- Dmitry Olshansky (24/48) Sep 24 That 15% speedup was suspicious, so I looked closer into what I was
- Sönke Ludwig (45/228) Sep 25 I'd probably create an additional event FD per thread used to signal
- Dmitry Olshansky (34/160) Sep 25 poll could be made interruptible w/o any additions it's really a
- Sönke Ludwig (16/173) Sep 26 Yes, I think that should be enough to make the semantics compatible.
- IchorDev (4/16) Sep 21 I'm dying to see some statistics to show which approach is more
- Hipreme (7/12) Sep 19 Congratulations on your amazing work! I also agree with you that
I have been building the Photon[1] scheduler library with the aim of building high performance servers, but since we already have vibe.d and everybody is using it, we might as well try to speed it up. Thus the vibe.d-light idea was born - port the vibe.d framework on top of photon. So in the last couple of weeks I've been porting vibe-core, vibe-stream, vibe-inet and vibe-http to: a) work with Photon instead of eventcore b) speed things up as I go, since I like fast things
The end result is that running bench-http-server from the vibe-http examples I get 1.6M rps with my version vs 243k rps on vanilla, running on 48 rather weak cores. Ofc I want people to try it and see how it works in terms of correctness and speed on more complex projects:
https://code.dlang.org/packages/vibe-d-light
https://github.com/DmitryOlshansky/vibe.d
Though most of the work goes on in the deps:
https://github.com/DmitryOlshansky/vibe-http
https://github.com/DmitryOlshansky/vibe-core
See also Photon, the machinery behind it all:
https://github.com/DmitryOlshansky/photon
Warning - this is likely Linux only at the moment, though I expect MacOS to also work.
Key differences so far:
1. photon powered *-light versions always run multi-threaded, utilizing whatever number of cores you gave them with e.g. taskset, or all of them by default. No need to muck around with setting up multiple event loops, and if you did - don't worry, it'll still do the right thing.
2. There are no Interruptible* mutexes, condvars or anything - photon doesn't support the notion, and code that relies on interrupt needs to be rethought (including some parts of vibe.d itself).
3. UDP is stubbed out, because I do not have many sensible examples utilizing UDP and it felt wrong to port it only to leave it untested. Anyone using UDP with vibe.d is welcome to give me good examples, preferably with multicast.
4. Timers... Photon has timer-related functionality, in particular sleeps, but not quite what vibe.d wants, so at this point timers are stubbed out.
5. Fibers are scheduled roughly to the least loaded cores, so all of LocalThis LocalThat are in fact SharedThis and SharedThat, simplifying the whole thing and making it easier to scale.
6. Processes and process management are stubbed out until I find a way to implement them in Photon.
7. Files work but may block the thread in some cases; this still needs a little bit more support in Photon.
8. Worker threads - there are none at the moment, all is scheduled on the same pool. Practically speaking this should only affect CPU intensive tasks; the rest already avoids blocking on syscalls, so the primary need for workers is nil.
9. Maybe something else I forgot in the midst of it all.
So closing thoughts - is anyone willing to help me iron out the inevitable bugs and improve things beyond this proof of concept? What does the community think of the idea in general?
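For a quick taste, here is a minimal hello-world sketch (untested; it assumes vibe-d-light keeps the stock vibe-http/vibe-core API, which is the goal of the port):
```
import vibe.core.core : runApplication;
import vibe.http.server;

void main()
{
    auto settings = new HTTPServerSettings;
    settings.port = 8080;
    listenHTTP(settings, (req, res) {
        res.writeBody("Hello from vibe.d-lite on photon!");
    });
    // with the photon backend, handler fibers are spread across all cores by default
    runApplication();
}
```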
Sep 18
On Thursday, 18 September 2025 at 16:00:48 UTC, Dmitry Olshansky wrote:
So closing thoughts - is anyone willing to help me iron out the inevitable bugs and improve things beyond this proof of concept? What does the community think of the idea in general?
I think this is fantastic! This is good evidence you are following the right path. And it's a great test for the concept. If there's anything that you run into that might be difficult to solve or understand, please post it! If I get a chance, I'll try it on my vibe-d server, but probably not for production at this point. The RPS isn't high anyways, probably more like 1 request every few minutes. -Steve
Sep 18
On Friday, 19 September 2025 at 01:43:04 UTC, Steven Schveighoffer wrote:
On Thursday, 18 September 2025 at 16:00:48 UTC, Dmitry Olshansky wrote:
I might need help getting std.concurrency to run with photon - vibe.d kind of works with it but I'm missing something. I got the scheduler implemented, forwarding to the right photon condvars and mutexes, but I have no idea about Tid management, i.e. how do I register/set up a fiber for a Tid?
So closing thoughts - is anyone willing to help me iron out the inevitable bugs and improve things beyond this proof of concept? What does the community think of the idea in general?
I think this is fantastic! This is good evidence you are following the right path. And it's a great test for the concept. If there's anything that you run into that might be difficult to solve or understand, please post it!
If I get a chance, I'll try it on my vibe-d server, but probably not for production at this point. The RPS isn't high anyways, probably more like 1 request every few minutes.
Yeah, absolutely not in production! Cannot stress enough, this is all alpha quality.
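For reference, the hook std.concurrency provides for this is `Scheduler.thisInfo`: each fiber carries its own `ThreadInfo` (which owns the mailbox behind `thisTid`), the way `FiberScheduler.InfoFiber` does it. A rough sketch - `photonGo` is a hypothetical stand-in for photon's spawn primitive:
```
import core.sync.condition : Condition;
import core.sync.mutex : Mutex;
import core.thread : Fiber;
import std.concurrency;

// hypothetical: hands the fiber over to photon's least-loaded-core scheduler
void photonGo(Fiber f);

class PhotonScheduler : Scheduler
{
    // each fiber gets its own ThreadInfo, i.e. its own Tid/mailbox
    private static class InfoFiber : Fiber
    {
        ThreadInfo info;
        this(void delegate() op) { super(op); }
    }

    void start(void delegate() op) { op(); }
    void spawn(void delegate() op) { photonGo(new InfoFiber(op)); }
    void yield() nothrow { Fiber.yield(); }

    @property ref ThreadInfo thisInfo() nothrow
    {
        // called from one of our fibers: hand out the fiber's own ThreadInfo,
        // so thisTid/send/receive resolve to the per-fiber mailbox
        if (auto f = cast(InfoFiber) Fiber.getThis())
            return f.info;
        return ThreadInfo.thisInfo; // plain threads keep the thread-level one
    }

    Condition newCondition(Mutex m) nothrow { return new Condition(m); }
}
```
Installing it before the first spawn() is then just `scheduler = new PhotonScheduler;`.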
Sep 18
So in the last couple of weeks I've been porting vibe-core, vibe-stream, vibe-inet and vibe-http to:
I guess vibe-stream/-inet/-http just need to be adjusted due to limitations of vibe-core-lite? Would it make sense to upstream those in some way (`version (Have_vibe_core_lite)` if necessary) to avoid diverging more than necessary? More broadly, it would be interesting how to best organize this in a way that avoids code duplication as much as possible and ensures that the APIs don't deviate (although vibe-core has been very stable).
2. There are no Interruptible* mutexes, condvars or anything - photon doesn't support the notion, and code that relies on interrupt needs to be rethought (including some parts of vibe.d itself).
Is this a fundamental limitation, or could it be implemented in the future? I know interruption/cancellation is generally problematic to get to work across platforms, but interruptible sleep() could at least be implemented by waiting on an event with timeout, and I guess sleep() is the most important candidate to start with.
5. Fibers are scheduled roughly to the least loaded cores, so all of LocalThis LocalThat are in fact SharedThis and SharedThat, simplifying the whole thing and making it easier to scale.
This is okay for `runWorkerTask`, but would be a fundamental deviation from vibe-core's threading model. Having the basic `runTask` schedule fibers on the calling thread is absolutely critical if there is to be any kind of meaningful compatibility with "non-lite" code. In general, considering that TLS is the default in D, and also considering that many libraries are either not thread-safe, or explicitly thread-local, I think it's also the right default to schedule thread-local and only schedule across multiple threads in situations where CPU load is the guiding factor. But being able to get rid of low-level synchronization can also be a big performance win. Anyway, it's great to see this progress, as well as the performance numbers!
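For illustration, the sleep() pattern sketched against vibe-core's existing primitives (a sketch only, assuming `LocalManualEvent.wait(Duration, int)` semantics):
```
import core.time : Duration;
import vibe.core.sync : createManualEvent, LocalManualEvent;

// interruptible sleep as a timed wait on an event: sleeping is waiting for a
// wakeup that normally never comes; interrupt() makes it come early
struct InterruptibleSleep
{
    private LocalManualEvent m_event;

    static InterruptibleSleep create()
    {
        return InterruptibleSleep(createManualEvent());
    }

    void sleep(Duration d) { m_event.wait(d, m_event.emitCount); } // times out or wakes early
    void interrupt() { m_event.emit(); }
}
```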
Sep 19
On Friday, 19 September 2025 at 08:01:35 UTC, Sönke Ludwig wrote:
I think stream/inet is just updating the deps to be "light". Maybe some Interruptible* change. It would be interesting to have vibe-core-light / vibe-core compatibility. Http had some less than minor changes but yes the most changes are in core.
So in the last couple of weeks I've been porting vibe-core, vibe-stream, vibe-inet and vibe-http to:
I guess vibe-stream/-inet/-http just need to be adjusted due to limitations of vibe-core-lite? Would it make sense to upstream those in some way (`version (Have_vibe_core_lite)` if necessary) to avoid diverging more than necessary?
More broadly, it would be interesting how to best organize this in a way that avoids code duplication as much as possible and ensures that the APIs don't deviate (although vibe-core has been very stable).
Agreed.
The limitation is this - photon operates inside of syscall wrappers, those are nothrow so if we get interrupted there is no way to throw anything. Plus this could be deep in some C library, not sure how exception would propagate but likely missing cleanup in the C side.
2. There are no Interruptible* mutexes, condvars or anything - photon doesn't support the notion, and code that relies on interrupt needs to be rethought (including some parts of vibe.d itself).
Is this a fundamental limitation, or could it be implemented in the future?
I know interruption/cancellation is generally problematic to get to work across platforms, but interruptible sleep() could at least be implemented by waiting on an event with timeout, and I guess sleep() is the most important candidate to start with.
Sleep is trivial but also kind of pointless, if you want to interrupt why not wait on the event and trigger that?
I on the other hand imagine that it's not. In the year 2025, not utilizing all available cores is shameful. The fact that I had to dig around to find how vibe.d is supposed to run on multiple cores is telling.
5. Fibers are scheduled roughly to the least loaded cores, so all of LocalThis LocalThat are in fact SharedThis and SharedThat, simplifying the whole thing and making it easier to scale.
This is okay for `runWorkerTask`, but would be a fundamental deviation from vibe-core's threading model. Having the basic `runTask` schedule fibers on the calling thread is absolutely critical if there is to be any kind of meaningful compatibility with "non-lite" code.
In general, considering that TLS is the default in D, and also considering that many libraries are either not thread-safe, or explicitly thread-local, I think it's also the right default to schedule thread-local and only schedule across multiple threads in situations where CPU load is the guiding factor. But being able to get rid of low-level synchronization can also be a big performance win.
Most TLS-using libs would work just fine as long as they are not pretending to be "globals" and the whole program to be single-threaded. Say TLS random has thread-local state but there is no problem with multiple fibers sharing this state, nor any problem that fibers in different threads do not "see" each other's changes to this state.
Anyway, it's great to see this progress, as well as the performance numbers!
Yeah, but I still think there is potential to go faster ;)
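Concretely, the TLS random case (std.random's `rndGen` is the thread-local default generator):
```
import std.random : rndGen, uniform;

// safe from any fiber: fibers on one thread never run concurrently, so they
// can share the thread-local generator; fibers on other threads simply use
// their own thread's generator and never observe this one
int rollDie()
{
    return uniform(1, 7, rndGen);
}
```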
Sep 19
On 19.09.25 at 12:33, Dmitry Olshansky wrote:
Shouldn't it still be possible to set an "interrupted" flag somewhere and let only the vibe-core-lite APIs throw? Low level C functions should of course stay unaffected.
The limitation is this - photon operates inside of syscall wrappers, those are nothrow so if we get interrupted there is no way to throw anything. Plus this could be deep in some C library, not sure how exception would propagate but likely missing cleanup in the C side.
2. There are no Interruptible* mutexes, condvars or anything - photon doesn't support the notion, and code that relies on interrupt needs to be rethought (including some parts of vibe.d itself).
Is this a fundamental limitation, or could it be implemented in the future?
It's more of a timeout pattern that I've seen multiple times, there are certainly multiple (better) alternatives, but if compatibility with existing code is the goal then this would still be important.
I know interruption/cancellation is generally problematic to get to work across platforms, but interruptible sleep() could at least be implemented by waiting on an event with timeout, and I guess sleep() is the most important candidate to start with.
Sleep is trivial but also kind of pointless, if you want to interrupt why not wait on the event and trigger that?
Telling in what way? It's really quite simple, you can use plain D threads as normal, or you can use task pools, either explicitly, or through the default worker task pool using `runWorkerTask` or `runWorkerTaskDist`. (Then there are also higher level concepts, such as async, performInWorker or parallel(Unordered)Map) Not everything is CPU bound and using threads "just because" doesn't make sense either. This is especially true, because of low level race conditions that require special care. D's shared/immutable helps with that, but that also means that your whole application suddenly needs to use shared/immutable when passing data between tasks.
I on the other hand imagine that it's not. In the year 2025, not utilizing all available cores is shameful. The fact that I had to dig around to find how vibe.d is supposed to run on multiple cores is telling.
5. Fibers are scheduled roughly to the least loaded cores, so all of LocalThis LocalThat are in fact SharedThis and SharedThat, simplifying the whole thing and making it easier to scale.
This is okay for `runWorkerTask`, but would be a fundamental deviation from vibe-core's threading model. Having the basic `runTask` schedule fibers on the calling thread is absolutely critical if there is to be any kind of meaningful compatibility with "non-lite" code.
In general, considering that TLS is the default in D, and also considering that many libraries are either not thread-safe, or explicitly thread-local, I think it's also the right default to schedule thread-local and only schedule across multiple threads in situations where CPU load is the guiding factor. But being able to get rid of low-level synchronization can also be a big performance win.
Most TLS-using libs would work just fine as long as they are not pretending to be "globals" and the whole program to be single-threaded. Say TLS random has thread-local state but there is no problem with multiple fibers sharing this state, nor any problem that fibers in different threads do not "see" each other's changes to this state.
But TLS variables are always "globals" in the sense that they outlive the scope that accesses them. A modification in one thread would obviously not be visible in another thread, meaning that you may or may not have a semantic connection when you access such a library sequentially from multiple tasks. And then there are said libraries that are not thread-safe at all, or are bound to the thread where you initialize them. Or handles returned from a library may be bound to the thread that created them. Dealing with all of this just becomes needlessly complicated and error-prone, especially if CPU cycles are not a concern. By robbing the user of control over where a task spawns, you are also forcing synchronization everywhere, which can quickly become more expensive than any benefits you would gain from using multiple threads.
Finally, in the case of web applications, in my opinion the better approach for using multiple CPU cores is *usually* by running multiple *processes* in parallel, as opposed to multiple threads within a single process. Of course, every application is different and there is no one-size-fits-all approach.
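To make the TLS visibility pitfall above concrete, a minimal runnable sketch:
```
import core.thread : Thread;

int libState; // module-level variables are thread-local by default in D

void main()
{
    libState = 42; // "library state" set in the main thread
    auto t = new Thread({
        // a task migrated to another thread sees a fresh copy, not 42
        assert(libState == 0);
    });
    t.start();
    t.join();
}
```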
Sep 19
On Friday, 19 September 2025 at 13:22:48 UTC, Sönke Ludwig wrote:
On 19.09.25 at 12:33, Dmitry Olshansky wrote:
Since vibe-core-light depends on syscalls this would mean creating a separate set of API for vibe-core-light which is not something I'd like to do.
Shouldn't it still be possible to set an "interrupted" flag somewhere and let only the vibe-core-lite APIs throw? Low level C functions should of course stay unaffected.
The limitation is this - photon operates inside of syscall wrappers, those are nothrow so if we get interrupted there is no way to throw anything. Plus this could be deep in some C library, not sure how exception would propagate but likely missing cleanup in the C side.
2. There are no Interruptible* mutexes, condvars or anything - photon doesn't support the notion, and code that relies on interrupt needs to be rethought (including some parts of vibe.d itself).
Is this a fundamental limitation, or could it be implemented in the future?
I guess, again most likely I'd need to create API specifically for vibe. Also that would mean interrupt becomes part of photon but only works when certain APIs are used. This is bad.
It's more of a timeout pattern that I've seen multiple times, there are certainly multiple (better) alternatives, but if compatibility with existing code is the goal then this would still be important.
I know interruption/cancellation is generally problematic to get to work across platforms, but interruptible sleep() could at least be implemented by waiting on an event with timeout, and I guess sleep() is the most important candidate to start with.
Sleep is trivial but also kind of pointless, if you want to interrupt why not wait on the event and trigger that?
That running single threaded is the intended model.
Telling in what way?
I on the other hand imagine that it's not. In the year 2025, not utilizing all available cores is shameful. The fact that I had to dig around to find how vibe.d is supposed to run on multiple cores is telling.
5. Fibers are scheduled roughly to the least loaded cores, so all of LocalThis LocalThat are in fact SharedThis and SharedThat, simplifying the whole thing and making it easier to scale.
This is okay for `runWorkerTask`, but would be a fundamental deviation from vibe-core's threading model. Having the basic `runTask` schedule fibers on the calling thread is absolutely critical if there is to be any kind of meaningful compatibility with "non-lite" code.
It's really quite simple, you can use plain D threads as normal, or you can use task pools, either explicitly, or through the default worker task pool using `runWorkerTask` or `runWorkerTaskDist`. (Then there are also higher level concepts, such as async, performInWorker or parallel(Unordered)Map)
This does little to the most important case - handling requests in parallel. Yeah there are pools and such for cases where going parallel inside of a single request makes sense.
Not everything is CPU bound and using threads "just because" doesn't make sense either. This is especially true, because of low level race conditions that require special care. D's shared/immutable helps with that, but that also means that your whole application suddenly needs to use shared/immutable when passing data between tasks.
I'm dying to know which application not being cpu bound still needs to pass data between tasks that are all running on a single thread.
TLS is fine for using a not-thread-safe library - just make sure you initialize it for all threads.
I do not switch or otherwise play dirty tricks with TLS.
But TLS variables are always "globals" in the sense that they outlive the scope that accesses them. A modification in one thread would obviously not be visible in another thread, meaning that you may or may not have a semantic connection when you access such a library sequentially from multiple tasks. And then there are said libraries that are not thread-safe at all, or are bound to the thread where you initialize them. Or handles returned from a library may be bound to the thread that created them. Dealing with all of this just becomes needlessly complicated and error-prone, especially if CPU cycles are not a concern.
In general, considering that TLS is the default in D, and also considering that many libraries are either not thread-safe, or explicitly thread-local, I think it's also the right default to schedule thread-local and only schedule across multiple threads in situations where CPU load is the guiding factor. But being able to get rid of low-level synchronization can also be a big performance win.
Most TLS-using libs would work just fine as long as they are not pretending to be "globals" and the whole program to be single-threaded. Say TLS random has thread-local state but there is no problem with multiple fibers sharing this state, nor any problem that fibers in different threads do not "see" each other's changes to this state.
By robbing the user of control over where a task spawns, you are also forcing synchronization everywhere, which can quickly become more expensive than any benefits you would gain from using multiple threads.
Either default kind of robs the user of control over where the task spawns. Which is sensible - a user shouldn't really care.
Finally, in the case of web applications, in my opinion the better approach for using multiple CPU cores is *usually* by running multiple *processes* in parallel, as opposed to multiple threads within a single process. Of course, every application is different and there is no one-size-fits-all approach.
There we differ, not only load balancing is simpler within a single application but also processes are more expensive. Current D GC situation kind of sucks on multithreaded workloads but that is the only reason to go multiprocess IMHO.
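The "initialize it for all threads" approach, as a lazy per-thread sketch; `LibHandle`/`libOpen` are hypothetical stand-ins for a non-thread-safe C library:
```
struct LibHandle {}                            // hypothetical library state
LibHandle* libOpen() { return new LibHandle; } // hypothetical init call

LibHandle* cached;                             // thread-local by default

// every thread that touches the library initializes its own handle on first use
LibHandle* lib()
{
    if (cached is null)
        cached = libOpen();
    return cached;
}
```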
Sep 19
On 20/09/2025 4:29 AM, Dmitry Olshansky wrote:
Finally, in the case of web applications, in my opinion the better approach for using multiple CPU cores is /usually/ by running multiple /processes/ in parallel, as opposed to multiple threads within a single process. Of course, every application is different and there is no one-size-fits-all approach.
There we differ, not only load balancing is simpler within a single application but also processes are more expensive. Current D GC situation kind of sucks on multithreaded workloads but that is the only reason to go multiprocess IMHO.
And more importantly you don't pay anywhere near the same number of context switches if you can let IOCP/epoll handle scheduling. But alas, that means thread safety, which fibers can't do.
Sep 19
On 19.09.25 at 18:29, Dmitry Olshansky wrote:
So you don't support timeouts when waiting for an event at all? Otherwise I don't see why a separate API would be required, this should be implementable with plain Posix APIs within vibe-core-lite itself.
Shouldn't it still be possible to set an "interrupted" flag somewhere and let only the vibe-core-lite APIs throw? Low level C functions should of course stay unaffected.
Since vibe-core-light depends on syscalls this would mean creating a separate set of API for vibe-core-light which is not something I'd like to do.
It's more of a timeout pattern that I've seen multiple times, there are certainly multiple (better) alternatives, but if compatibility with existing code is the goal then this would still be important.
I guess, again most likely I'd need to create API specifically for vibe. Also that would mean interrupt becomes part of photon but only works when certain APIs are used. This is bad.
Obviously this is wrong, though.
That running single threaded is the intended model.
I on the other hand imagine that it's not. In the year 2025, not utilizing all available cores is shameful. The fact that I had to dig around to find how vibe.d is supposed to run on multiple cores is telling.
Telling in what way?
```
runWorkerTaskDist({
    HTTPServerSettings settings;
    settings.options |= HTTPServerOption.reusePort;
    listenHTTP(settings);
});
```
It's really quite simple, you can use plain D threads as normal, or you can use task pools, either explicitly, or through the default worker task pool using `runWorkerTask` or `runWorkerTaskDist`. (Then there are also higher level concepts, such as async, performInWorker or parallel(Unordered)Map)
This does little to the most important case - handling requests in parallel. Yeah there are pools and such for cases where going parallel inside of a single request makes sense.
Anything client side involving a user interface has plenty of opportunities for employing secondary tasks or long-running sparsely updated state logic that are not CPU bound. Most of the time is spent idle there. Specific computations on the other hand can of course still be handed off to other threads.
Not everything is CPU bound and using threads "just because" doesn't make sense either. This is especially true, because of low level race conditions that require special care. D's shared/immutable helps with that, but that also means that your whole application suddenly needs to use shared/immutable when passing data between tasks.
I'm dying to know which application not being cpu bound still needs to pass data between tasks that are all running on a single thread.
The problem is that for example you might have a handle that was created in thread A and is not valid in thread B, or you set a state in thread A and thread B doesn't see that state. This would mean that you are limited to a single task for the complete library interaction.
But TLS variables are always "globals" in the sense that they outlive the scope that accesses them. A modification in one thread would obviously not be visible in another thread, meaning that you may or may not have a semantic connection when you access such a library sequentially from multiple tasks. And then there are said libraries that are not thread-safe at all, or are bound to the thread where you initialize them. Or handles returned from a library may be bound to the thread that created them.
Dealing with all of this just becomes needlessly complicated and error-prone, especially if CPU cycles are not a concern.
TLS is fine for using a not-thread-safe library - just make sure you initialize it for all threads. I do not switch or otherwise play dirty tricks with TLS.
This doesn't make sense, in the original vibe-core, you can simply choose between spawning in the same thread or in "any" thread. `shared`/`immutable` is correctly enforced in the latter case to avoid unintended data sharing.
By robbing the user of control over where a task spawns, you are also forcing synchronization everywhere, which can quickly become more expensive than any benefits you would gain from using multiple threads.
Either default kind of robs the user of control over where the task spawns. Which is sensible - a user shouldn't really care.
The GC/malloc is the main reason why this is mostly false in practice, but it extends to any central contention source within the process - yes, often you can avoid that, but often that takes a lot of extra work and processes sidestep that issue in the first place. Also, in the usual case where the threads don't have to communicate with each other (apart from memory allocation synchronization), a separate process per core isn't any slower - except maybe when hyper-threading is in play, but whether that helps or hurts performance always depends on the concrete workload. Separate processes also have the advantage of being more robust and enabling seamless restarts and updates of the executable. And they facilitate an application design that lends itself to scaling across multiple machines.
Finally, in the case of web applications, in my opinion the better approach for using multiple CPU cores is *usually* by running multiple *processes* in parallel, as opposed to multiple threads within a single process. Of course, every application is different and there is no one-size-fits-all approach.
There we differ, not only load balancing is simpler within a single application but also processes are more expensive. Current D GC situation kind of sucks on multithreaded workloads but that is the only reason to go multiprocess IMHO.
Sep 19
On Friday, 19 September 2025 at 17:37:36 UTC, Sönke Ludwig wrote:
On 19.09.25 at 18:29, Dmitry Olshansky wrote:
Photon's API is the syscall interface. So to wait on an event you just call poll. Behind the scenes it will just wait on the right fd to change state. Now vibe-core-light wants something like read(buffer, timeout) which is not syscall API but may be added. But since I'm going to add new API I'd rather have something consistent and sane, not just a bunch of ad-hoc functions to satisfy the vibe.d interface.
So you don't support timeouts when waiting for an event at all? Otherwise I don't see why a separate API would be required, this should be implementable with plain Posix APIs within vibe-core-lite itself.
Shouldn't it still be possible to set an "interrupted" flag somewhere and let only the vibe-core-lite APIs throw? Low level C functions should of course stay unaffected.
Since vibe-core-light depends on syscalls this would mean creating a separate set of API for vibe-core-light which is not something I'd like to do.
It's more of a timeout pattern that I've seen multiple times, there are certainly multiple (better) alternatives, but if compatibility with existing code is the goal then this would still be important.
I guess, again most likely I'd need to create API specifically for vibe. Also that would mean interrupt becomes part of photon but only works when certain APIs are used. This is bad.
All the examples plus your last statement on process per core being better makes me conclude that. I don't see how I'm wrong here.
Obviously this is wrong, though.
That running single threaded is the intended model.
I on the other hand imagine that it's not. In the year 2025, not utilizing all available cores is shameful. The fact that I had to dig around to find how vibe.d is supposed to run on multiple cores is telling.
Telling in what way?
```
runWorkerTaskDist({
    HTTPServerSettings settings;
    settings.options |= HTTPServerOption.reusePort;
    listenHTTP(settings);
});
```
Yet this is not the default, and the default is basically single threaded. We have different opinions on what the default should be obviously.
It's really quite simple, you can use plain D threads as normal, or you can use task pools, either explicitly, or through the default worker task pool using `runWorkerTask` or `runWorkerTaskDist`. (Then there are also higher level concepts, such as async, performInWorker or parallel(Unordered)Map)
This does little to the most important case - handling requests in parallel. Yeah there are pools and such for cases where going parallel inside of a single request makes sense.
Latency still going to be better if multiple cores are utilized. And I'm still not sure what the example is.
Anything client side involving a user interface has plenty of opportunities for employing secondary tasks or long-running sparsely updated state logic that are not CPU bound. Most of the time is spent idle there. Specific computations on the other hand can of course still be handed off to other threads.
Not everything is CPU bound and using threads "just because" doesn't make sense either. This is especially true, because of low level race conditions that require special care. D's shared/immutable helps with that, but that also means that your whole application suddenly needs to use shared/immutable when passing data between tasks.
I'm dying to know which application not being cpu bound still needs to pass data between tasks that are all running on a single thread.
Or just initialize it lazily in all threads that happen to use it. Otherwise, this is basically stick to one thread really.
The problem is that for example you might have a handle that was created in thread A and is not valid in thread B, or you set a state in thread A and thread B doesn't see that state. This would mean that you are limited to a single task for the complete library interaction.
But TLS variables are always "globals" in the sense that they outlive the scope that accesses them. A modification in one thread would obviously not be visible in another thread, meaning that you may or may not have a semantic connection when you access such a library sequentially from multiple tasks. And then there are said libraries that are not thread-safe at all, or are bound to the thread where you initialize them. Or handles returned from a library may be bound to the thread that created them. Dealing with all of this just becomes needlessly complicated and error-prone, especially if CPU cycles are not a concern.
TLS is fine for using a not-thread-safe library - just make sure you initialize it for all threads. I do not switch or otherwise play dirty tricks with TLS.
I have go and goOnSameThread. Guess which is the encouraged option.
This doesn't make sense, in the original vibe-core, you can simply choose between spawning in the same thread or in "any" thread. `shared`/`immutable` is correctly enforced in the latter case to avoid unintended data sharing.
By robbing the user of control over where a task spawns, you are also forcing synchronization everywhere, which can quickly become more expensive than any benefits you would gain from using multiple threads.
Either default kind of robs the user of control over where the task spawns. Which is sensible - a user shouldn't really care.
As is observable from the look on other languages and runtimes malloc is not the bottleneck it used to be. Our particular version of GC that doesn't have thread caches is a bottleneck.
The GC/malloc is the main reason why this is mostly false in practice, but it extends to any central contention source within the process - yes, often you can avoid that, but often that takes a lot of extra work and processes sidestep that issue in the first place.
Finally, in the case of web applications, in my opinion the better approach for using multiple CPU cores is *usually* by running multiple *processes* in parallel, as opposed to multiple threads within a single process. Of course, every application is different and there is no one-size-fits-all approach.
There we differ, not only load balancing is simpler within a single application but also processes are more expensive. Current D GC situation kind of sucks on multithreaded workloads but that is the only reason to go multiprocess IMHO.
Also, in the usual case where the threads don't have to communicate with each other (apart from memory allocation synchronization), a separate process per core isn't any slower - except maybe when hyper-threading is in play, but whether that helps or hurts performance always depends on the concrete workload.
The fact that a context switch has to drop the whole virtual address space does add a bit of overhead. Though to be certain of anything there better be a benchmark.
Separate processes also have the advantage of being more robust and enabling seamless restarts and updates of the executable. And they facilitate an application design that lends itself to scaling across multiple machines.
Then give me the example code to run multiple vibe.d in parallel processes (should be similar to runDist) and we can compare approaches.
For all I know it could be faster than multi-threaded vibe.d-light. Also honestly, if vibe.d's target is multiple processes it should probably start like this by default.
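To illustrate "to wait on an event you just call poll": under photon the plain POSIX call below is intercepted and parks the fiber instead of blocking the thread (sketch, error handling elided):
```
import core.sys.posix.poll : poll, pollfd, POLLIN;

// photon intercepts the poll syscall: the calling fiber is suspended and the
// thread keeps running other fibers until fd becomes readable or we time out
bool waitReadable(int fd, int timeoutMs)
{
    auto p = pollfd(fd, POLLIN, 0);
    return poll(&p, 1, timeoutMs) > 0;
}
```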
Sep 22
On 22.09.25 at 09:49, Dmitry Olshansky wrote:
On Friday, 19 September 2025 at 17:37:36 UTC, Sönke Ludwig wrote:
Why can't you then use poll() to for example implement `ManualEvent` with timeout and interrupt support? And shouldn't recv() with timeout be implementable the same way, poll with timeout and only read when ready?
So you don't support timeouts when waiting for an event at all? Otherwise I don't see why a separate API would be required, this should be implementable with plain Posix APIs within vibe-core-lite itself.
Photon's API is the syscall interface. So to wait on an event you just call poll. Behind the scenes it will just wait on the right fd to change state. Now vibe-core-light wants something like read(buffer, timeout) which is not syscall API but may be added. But since I'm going to add new API I'd rather have something consistent and sane, not just a bunch of ad-hoc functions to satisfy the vibe.d interface.
I think we have a misunderstanding of what vibe.d is supposed to be. It seems like you are only focused on the web/server role, while to me vibe-core is a general-purpose I/O and concurrency system with no particular specialization in server tasks. With that view, your statement to me sounds like "Clearly D is not meant to do multi-threading, since main() is only running in a single thread". Of course, there could be a high-level component on top of vibe-d:web that makes some opinionated assumptions on how to structure a web application to ensure it is scalable, but that would go against the idea of being a toolkit with functional building blocks, as opposed to a framework that dictates your application structure.
All the examples plus your last statement on process per core being better makes me conclude that. I don't see how I'm wrong here.
Obviously this is wrong, though.
Telling in what way?
That running single threaded is the intended model.
(...)
```
runWorkerTaskDist({
    HTTPServerSettings settings;
    settings.options |= HTTPServerOption.reusePort;
    listenHTTP(settings);
});
```
Yet this is not the default, and the default is basically single threaded. We have different opinions on what the default should be obviously.
We are comparing fiber switches and working on data with a shared cache and no synchronization to synchronizing data access and control flow between threads/cores. There is such a broad spectrum of possibilities for one of those to be faster than the other that it's just silly to make a general statement like that. The thing is that if you always share data between threads, you have to pay for that for every single data access, regardless of whether there is actual concurrency going on or not. If you want a concrete example, take a simple download dialog with a progress bar. There is no gain in off-loading anything to a separate thread here, since this is fully I/O bound, but it adds quite some communication complexity if you do. CPU performance is simply not a concern here.
Latency still going to be better if multiple cores are utilized. And I'm still not sure what the example is.
Anything client side involving a user interface has plenty of opportunities for employing secondary tasks or long-running sparsely updated state logic that are not CPU bound. Most of the time is spent idle there. Specific computations on the other hand can of course still be handed off to other threads.
Not everything is CPU bound and using threads "just because" doesn't make sense either. This is especially true, because of low level race conditions that require special care. D's shared/immutable helps with that, but that also means that your whole application suddenly needs to use shared/immutable when passing data between tasks.
I'm dying to know which application not being cpu bound still needs to pass data between tasks that are all running on a single thread.
But then it's a different handle representing a different object - that's not the same thing. I'm not just talking about initializing the library as a whole. But even if, there are a lot of libraries that don't use TLS and are simply not thread-safe at all.
Or just initialize it lazily in all threads that happen to use it. Otherwise, this is basically stick to one thread really.
The problem is that for example you might have a handle that was created in thread A and is not valid in thread B, or you set a state in thread A and thread B doesn't see that state. This would mean that you are limited to a single task for the complete library interaction.
But TLS variables are always "globals" in the sense that they outlive the scope that accesses them. A modification in one thread would obviously not be visible in another thread, meaning that you may or may not have a semantic connection when you access such a library sequentially from multiple tasks. And then there are said libraries that are not thread-safe at all, or are bound to the thread where you initialize them. Or handles returned from a library may be bound to the thread that created them. Dealing with all of this just becomes needlessly complicated and error-prone, especially if CPU cycles are not a concern.
TLS is fine for using a not-thread-safe library - just make sure you initialize it for all threads. I do not switch or otherwise play dirty tricks with TLS.
Does go() enforce proper use of shared/immutable when passing data to the scheduled "go routine"?
I have go and goOnSameThread. Guess which is the encouraged option.
This doesn't make sense, in the original vibe-core, you can simply choose between spawning in the same thread or in "any" thread. `shared`/`immutable` is correctly enforced in the latter case to avoid unintended data sharing.
By robbing the user of control over where a task spawns, you are also forcing synchronization everywhere, which can quickly become more expensive than any benefits you would gain from using multiple threads.
Either default kind of robs the user of control over where the task spawns. Which is sensible - a user shouldn't really care.
malloc() will also always be a bottleneck with the right load. Just the n times larger amount of virtual address space required may start to become an issue for memory heavy applications. But even if we ignore that, ruling out using the existing GC doesn't sound like a good idea to me. And the fact is that, even with relatively mild GC use, a web application will not scale properly with many cores.
As is observable from the look on other languages and runtimes malloc is not the bottleneck it used to be. Our particular version of GC that doesn't have thread caches is a bottleneck.
The GC/malloc is the main reason why this is mostly false in practice, but it extends to any central contention source within the process - yes, often you can avoid that, but often that takes a lot of extra work and processes sidestep that issue in the first place.
Finally, in the case of web applications, in my opinion the better approach for using multiple CPU cores is *usually* by running multiple *processes* in parallel, as opposed to multiple threads within a single process. Of course, every application is different and there is no one-size-fits-all approach.
There we differ, not only load balancing is simpler within a single application but also processes are more expensive. Current D GC situation kind of sucks on multithreaded workloads but that is the only reason to go multiprocess IMHO.
There is no context switch involved with each process running on its own core.
Also, in the usual case where the threads don't have to communicate with each other (apart from memory allocation synchronization), a separate process per core isn't any slower - except maybe when hyper-threading is in play, but whether that helps or hurts performance always depends on the concrete workload.
The fact that a context switch has to drop the whole virtual address space does add a bit of overhead. Though to be certain of anything there better be a benchmark.
Again, the "default" is a high-level issue and none of vibe-core's business. The simplest way to have that work is to use `HTTPServerOption.reusePort` and then start as many processes as desired.
Separate processes also have the advantage of being more robust and enabling seamless restarts and updates of the executable. And they facilitate an application design that lends itself to scaling across multiple machines.
Then give me the example code to run multiple vibe.d in parallel processes (should be similar to runDist) and we can compare approaches. For all I know it could be faster than multi-threaded vibe.d-light. Also honestly, if vibe.d's target is multiple processes it should probably start like this by default.
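A launcher along those lines could look like this (sketch; `./bench-http-server` is a placeholder for a binary that sets `HTTPServerOption.reusePort`, and taskset is assumed available):
```
import std.conv : to;
import std.parallelism : totalCPUs;
import std.process : Pid, spawnProcess, wait;

void main()
{
    // one server process per core, each pinned with taskset; the kernel
    // load-balances incoming connections across them via SO_REUSEPORT
    Pid[] pids;
    foreach (cpu; 0 .. totalCPUs)
        pids ~= spawnProcess(["taskset", "-c", cpu.to!string, "./bench-http-server"]);
    foreach (pid; pids)
        wait(pid);
}
```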
Sep 22
On Monday, 22 September 2025 at 11:14:17 UTC, Sönke Ludwig wrote:
On 22.09.25 at 09:49, Dmitry Olshansky wrote:
Yes, recv with timeout is basically poll+recv. The problem is that then I need to support interrupts in poll. Nothing really changed. As far as manual event goes I've implemented that with custom cond var and mutex. That mutex is not interruptible as it's backed by semaphore on slow path in a form of eventfd. I might create custom mutex that is interruptible I guess but the notion of interrupts would have to be introduced to photon. I do not really like it.
On Friday, 19 September 2025 at 17:37:36 UTC, Sönke Ludwig wrote:
Why can't you then use poll() to for example implement `ManualEvent` with timeout and interrupt support? And shouldn't recv() with timeout be implementable the same way, poll with timeout and only read when ready?
So you don't support timeouts when waiting for an event at all? Otherwise I don't see why a separate API would be required, this should be implementable with plain Posix APIs within vibe-core-lite itself.
Photon's API is the syscall interface. So to wait on an event you just call poll. Behind the scenes it will just wait on the right fd to change state. Now vibe-core-light wants something like read(buffer, timeout) which is not syscall API but may be added. But since I'm going to add new API I'd rather have something consistent and sane, not just a bunch of ad-hoc functions to satisfy the vibe.d interface.
I think we have a misunderstanding of what vibe.d is supposed to be. It seems like you are only focused on the web/server role, while to me vibe-core is a general-purpose I/O and concurrency system with no particular specialization in server tasks. With that view, your statement to me sounds like "Clearly D is not meant to do multi-threading, since main() is only running in a single thread".
The defaults are what is important. Go defaults to multi-threading for instance. D defaults to multi-threading because TLS by default is certainly a mark of multi-threaded environment. std.concurrency defaults to new thread per spawn, again this tells me it's about multithreading. I intend to support multi-threading by default. I understand that we view this issue differently.
Of course, there could be a high-level component on top of vibe-d:web that makes some opinionated assumptions on how to structure a web application to ensure it is scalable, but that would go against the idea of being a toolkit with functional building blocks, as opposed to a framework that dictates your application structure.
Agreed.
Obviously, we should strive to share responsibly. Photon has Channels much like vibe-core has Channel. Mine are MPSC though, mostly to model Input/Output range concepts.
We are comparing fiber switches and working on data with a shared cache and no synchronization to synchronizing data access and control flow between threads/cores. There is such a broad spectrum of possibilities for one of those to be faster than the other that it's just silly to make a general statement like that. The thing is that if you always share data between threads, you have to pay for that for every single data access, regardless of whether there is actual concurrency going on or not.
Latency still going to be better if multiple cores are utilized. And I'm still not sure what the example is.
Anything client side involving a user interface has plenty of opportunities for employing secondary tasks or long-running sparsely updated state logic that are not CPU bound. Most of the time is spent idle there. Specific computations on the other hand can of course still be handed off to other threads.
Not everything is CPU bound and using threads "just because" doesn't make sense either. This is especially true, because of low level race conditions that require special care. D's shared/immutable helps with that, but that also means that your whole application suddenly needs to use shared/immutable when passing data between tasks.
I'm dying to know which application not being cpu bound still needs to pass data between tasks that are all running on a single thread.
If you want a concrete example, take a simple download dialog with a progress bar. There is no gain in off-loading anything to a separate thread here, since this is fully I/O bound, but it adds quite some communication complexity if you do. CPU performance is simply not a concern here.
Channels tame the complexity. Yes, channels could get more expensive in a multi-threaded scenario but we already agreed that it's not CPU bound.
Something that is not thread-safe at all is a dying breed. It's been 20 years that we have multi-cores. Most libraries can be initialized once per thread which is quite naturally modeled with TLS handle to said library. Communicating between fibers via shared TLS handle is not something I would recommend regardless of the default spawn behavior.
But then it's a different handle representing a different object - that's not the same thing. I'm not just talking about initializing the library as a whole. But even if, there are a lot of libraries that don't use TLS and are simply not thread-safe at all.
Or just initialize it lazily in all threads that happen to use it. Otherwise, this is basically stick to one thread really.
The problem is that for example you might have a handle that was created in thread A and is not valid in thread B, or you set a state in thread A and thread B doesn't see that state. This would mean that you are limited to a single task for the complete library interaction.
But TLS variables are always "globals" in the sense that they outlive the scope that accesses them. A modification in one thread would obviously not be visible in another thread, meaning that you may or may not have a semantic connection when you access such a library sequentially from multiple tasks. And then there are said libraries that are not thread-safe at all, or are bound to the thread where you initialize them. Or handles returned from a library may be bound to the thread that created them. Dealing with all of this just becomes needlessly complicated and error-prone, especially if CPU cycles are not a concern.
TLS is fine for using a not-thread-safe library - just make sure you initialize it for all threads. I do not switch or otherwise play dirty tricks with TLS.
It goes with the same API as we have for threads - a delegate, so sharing becomes user's responsibility. I may add function + args for better handling of resources passed to the lambda.
Does go() enforce proper use of shared/immutable when passing data to the scheduled "go routine"?
I have go and goOnSameThread. Guess which is the encouraged option.
This doesn't make sense, in the original vibe-core, you can simply choose between spawning in the same thread or in "any" thread. `shared`/`immutable` is correctly enforced in the latter case to avoid unintended data sharing.
By robbing the user of control over where a task spawns, you are also forcing synchronization everywhere, which can quickly become more expensive than any benefits you would gain from using multiple threads.
Either default kind of robs the user of control over where the task spawns. Which is sensible - a user shouldn't really care.
The existing GC is basically 20+ years old, ofc we need better GC and thread cached allocation solves contention in multi-threaded environments. Alternative memory allocator is doing great on 320 core machines. I cannot tell you which allocator that is or what exactly these servers are. Though even jemalloc does okayish.
malloc() will also always be a bottleneck with the right load. Just the n times larger amount of virtual address space required may start to become an issue for memory heavy applications. But even if we ignore that, ruling out using the existing GC doesn't sound like a good idea to me.
As is observable from the look on other languages and runtimes malloc is not the bottleneck it used to be. Our particular version of GC that doesn't have thread caches is a bottleneck.
The GC/malloc is the main reason why this is mostly false in practice, but it extends to any central contention source within the process - yes, often you can avoid that, but often that takes a lot of extra work and processes sidestep that issue in the first place.
Finally, in the case of web applications, in my opinion the better approach for using multiple CPU cores is *usually* by running multiple *processes* in parallel, as opposed to multiple threads within a single process. Of course, every application is different and there is no one-size-fits-all approach.
There we differ, not only load balancing is simpler within a single application but also processes are more expensive. Current D GC situation kind of sucks on multithreaded workloads but that is the only reason to go multiprocess IMHO.
And the fact is that, even with relatively mild GC use, a web application will not scale properly with many cores.
Only partially agree, Java's GC handles load just fine and runs faster than vibe.d(-light). It does allocations on its serving code path.
Yeah, pinning down cores works, I stand corrected.
There is no context switch involved with each process running on its own core.
Also, in the usual case where the threads don't have to communicate with each other (apart from memory allocation synchronization), a separate process per core isn't any slower - except maybe when hyper-threading is in play, but whether that helps or hurts performance always depends on the concrete workload.
The fact that a context switch has to drop the whole virtual address space does add a bit of overhead. Though to be certain of anything there better be a benchmark.
Separate processes also have the advantage of being more robust and enabling seamless restarts and updates of the executable. And they facilitate an application design that lends itself to scaling across multiple machines.
Then give me the example code to run multiple vibe.d in parallel processes (should be similar to runDist) and we can compare approaches.
For all I know it could be faster than multi-threaded vibe.d-light. Also honestly, if vibe.d's target is multiple processes it should probably start like this by default.
So I did just that. To my surprise it indeed speeds up all of my D server examples. The speedups are roughly:
On vibe-http-light:
8 cores 1.14
12 cores 1.10
16 cores 1.08
24 cores 1.05
32 cores 1.06
48 cores 1.07
On vibe-http-classic:
8 cores 1.33
12 cores 1.45
16 cores 1.60
24 cores 2.54
32 cores 4.44
48 cores 8.56
On plain photon-http:
8 cores 1.15
12 cores 1.10
16 cores 1.09
24 cores 1.05
32 cores 1.07
48 cores 1.04
We should absolutely tweak the vibe.d TechEmpower benchmark to run vibe.d as a process per core!
As far as photon-powered versions go I see there is a point where per-process becomes less of a gain with more cores, so I would think there are 2 factors at play, one positive and one negative, with the negative being tied to the number of processes. Lastly, I have found opportunities to speed up vibe-http even without switching to vibe-core-light. Will send PRs.
Again, the "default" is a high-level issue and none of vibe-core's business. The simplest way to have that work is to use `HTTPServerOption.reusePort` and then start as many processes as desired.
Separate processes also have the advantage of being more robust and enabling seamless restarts and updates of the executable. And they facilitate an application design that lends itself to scaling across multiple machines.
Then give me the example code to run multiple vibe.d in parallel processes (should be similar to runDist) and we can compare approaches. For all I know it could be faster than multi-threaded vibe.d-light. Also honestly, if vibe.d's target is multiple processes it should probably start like this by default.
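Back to the poll topic at the top: "recv with timeout is basically poll+recv", spelled out at the POSIX level (sketch, minimal error handling; the -2 timeout sentinel is made up):
```
import core.sys.posix.poll : poll, pollfd, POLLIN;
import core.sys.posix.sys.socket : recv;

// negative return: -1 = error, -2 = timed out (hypothetical sentinel)
ptrdiff_t recvTimeout(int fd, void[] buf, int timeoutMs)
{
    auto p = pollfd(fd, POLLIN, 0);
    auto rc = poll(&p, 1, timeoutMs);
    if (rc < 0) return -1;  // poll error
    if (rc == 0) return -2; // timeout, nothing to read
    return recv(fd, buf.ptr, buf.length, 0);
}
```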
Sep 23
On Tuesday, 23 September 2025 at 15:35:47 UTC, Dmitry Olshansky wrote:
On Monday, 22 September 2025 at 11:14:17 UTC, Sönke Ludwig wrote:
That 15% speedup was suspicious, so I looked closer into what I was doing and indeed I was launching N+1 processes due to a bug in my script. 9 / 8 ~ 1.15 so here goes.
Again, the "default" is a high-level issue and none of vibe-core's business. The simplest way to have that work is to use `HTTPServerOption.reusePort` and then start as many processes as desired.
So I did just that. To my surprise it indeed speeds up all of my D server examples.
The speedups are roughly:
On vibe-http-light:
8 cores 1.14
12 cores 1.10
16 cores 1.08
24 cores 1.05
32 cores 1.06
48 cores 1.07
Proper numbers:
8 cores 1.01
12 cores 1.01
16 cores 1.02
24 cores 1.00
32 cores 1.02
48 cores 1.05
On plain photon-http:
8 cores 1.15
12 cores 1.10
16 cores 1.09
24 cores 1.05
32 cores 1.07
48 cores 1.04
Proper numbers:
8 cores 1.02
12 cores 1.02
16 cores 1.01
24 cores 1.02
32 cores 1.04
48 cores 1.02
So it *seems* there is still a little bit of gain, I'm investigating where the benefit actually comes from, keeping in mind that there is some noise in the benchmark itself.
Lastly, I have found opportunities to speed up vibe-http even without switching to vibe-core-light. Will send PRs.
First one to go: https://github.com/vibe-d/vibe-http/pull/65
Sep 24
On 23.09.25 at 17:35, Dmitry Olshansky wrote: On Monday, 22 September 2025 at 11:14:17 UTC, Sönke Ludwig wrote:

I'd probably create an additional event FD per thread, used to signal interruption, and also pass that to any poll() that is used for an interruptible wait (see the sketch at the end of this message).

On 22.09.25 at 09:49, Dmitry Olshansky wrote:

Yes, recv with timeout is basically poll+recv. The problem is that then I need to support interrupts in poll. Nothing really changed. As far as the manual event goes, I've implemented that with a custom condvar and mutex. That mutex is not interruptible, as it's backed by a semaphore on the slow path in the form of an eventfd. I might create a custom mutex that is interruptible, I guess, but the notion of interrupts would have to be introduced to photon. I do not really like it.

On Friday, 19 September 2025 at 17:37:36 UTC, Sönke Ludwig wrote:

Why can't you then use poll() to, for example, implement `ManualEvent` with timeout and interrupt support? And shouldn't recv() with timeout be implementable the same way: poll with a timeout and only read when ready?

So you don't support timeouts when waiting for an event at all? Otherwise I don't see why a separate API would be required; this should be implementable with plain Posix APIs within vibe-core-lite itself.

Photon's API is the syscall interface. So to wait on an event you just call poll. Behind the scenes it will just wait on the right fd to change state. Now vibe-core-light wants something like read(buffer, timeout), which is not syscall API but may be added. But since I'm going to add new API, I'd rather have something consistent and sane, not just a bunch of ad-hoc functions to satisfy the vibe.d interface.

But you are comparing different defaults here. With plain D, you also have to import either `core.thread` or `std.concurrency`/`std.parallelism` to do any multi-threaded work. The same is true for vibe-core. What you propose would be more comparable to having foreach() operate like parallelForeach(), with far-reaching consequences. If we are just talking about naming - runTask/runWorkerTask vs. go/goOnSameThread - that is of course debatable, but in that case I think it's blown very much out of proportion to take that as the basis to claim "it's meant to be used single-threaded".

I think we have a misunderstanding of what vibe.d is supposed to be. It seems like you are only focused on the web/server role, while to me vibe-core is a general-purpose I/O and concurrency system with no particular specialization in server tasks. With that view, your statement to me sounds like "Clearly D is not meant to do multi-threading, since main() is only running in a single thread".

The defaults are what is important. Go defaults to multi-threading, for instance. D defaults to multi-threading, because TLS by default is certainly a mark of a multi-threaded environment. std.concurrency defaults to a new thread per spawn; again, this tells me it's about multithreading. I intend to support multi-threading by default. I understand that we view this issue differently.

True, but it's still not free (as in CPU cycles and code complexity) and you can't always control all code involved.

Obviously, we should strive to share responsibly. Photon has Channels much like vibe-core has Channel. Mine are MPSC though, mostly to model Input/Output range concepts.

We are comparing fiber switches and working on data with a shared cache and no synchronization to synchronizing data access and control flow between threads/cores.
There is such a broad spectrum of possibilities for one of those to be faster than the other that it's just silly to make a general statement like that. The thing is that if you always share data between threads, you have to pay for that on every single data access, regardless of whether there is actual concurrency going on or not.

Anything client-side involving a user interface has plenty of opportunities for employing secondary tasks or long-running, sparsely updated state logic that is not CPU-bound. Most of the time is spent idle there. Specific computations, on the other hand, can of course still be handed off to other threads.

Latency is still going to be better if multiple cores are utilized. And I'm still not sure what the example is.

If you have code that does a lot of these things, this just degrades code readability for absolutely no practical gain, though. If you want a concrete example, take a simple download dialog with a progress bar. There is no gain in off-loading anything to a separate thread here, since this is fully I/O bound, but it adds quite some communication complexity if you do. CPU performance is simply not a concern here.

Channels tame the complexity. Yes, channels could get more expensive in a multi-threaded scenario, but we already agreed that it's not CPU-bound.

Unfortunately, those libraries are an unpleasant reality that you can't always avoid. BTW, one of the worst offenders is Apple's whole Objective-C API. Auto-release pools in particular make it extremely fragile to work with fibers at all, and of course there are all kinds of hidden thread dependencies inside.

Something that is not thread-safe at all is a dying breed. We've had multi-core machines for 20 years. Most libraries can be initialized once per thread, which is quite naturally modeled with a TLS handle to said library. Communicating between fibers via a shared TLS handle is not something I would recommend regardless of the default spawn behavior.

But then it's a different handle representing a different object - that's not the same thing. I'm not just talking about initializing the library as a whole. But even if it were, there are a lot of libraries that don't use TLS and are simply not thread-safe at all. The problem is that, for example, you might have a handle that was created in thread A and is not valid in thread B, or you set a state in thread A and thread B doesn't see that state. This would mean that you are limited to a single task for the complete library interaction.

Or just initialize it lazily in all threads that happen to use it. Otherwise, this basically means sticking to one thread, really.

That means that this is completely un-`@safe` - C++ level memory safety. IMO this is an unacceptable default for web applications.

It goes with the same API as we have for threads - a delegate - so sharing becomes the user's responsibility. I may add function + args for better handling of resources passed to the lambda.

Does go() enforce proper use of shared/immutable when passing data to the scheduled "go routine"? This doesn't make sense; in the original vibe-core, you can simply choose between spawning in the same thread or in "any" thread. `shared`/`immutable` is correctly enforced in the latter case to avoid unintended data sharing.

I have go and goOnSameThread. Guess which is the encouraged option.

I was just talking about the current D GC here. Once we have a better implementation, this can very well become a much weaker argument!
However, speaking more generally, the other arguments for preferring to scale using processes still stand, and even with a better GC I would still argue that leading library users to do multi-threaded request handling is not necessarily the best default (of course it still *can* be for some applications). Anyway, the main point from my side is just that the semantics of what *is* in vibe-core-light should really match the corresponding functions in vibe-core. Apart from that, I was just telling you that your impression of it being intended to be used single-threaded is not right, which doesn't mean that the presentation shouldn't emphasize the multi-threaded functionality and multi-threaded request processing more.

The existing GC is basically 20+ years old; of course we need a better GC, and thread-cached allocation solves contention in multi-threaded environments. An alternative memory allocator is doing great on 320-core machines. I cannot tell you which allocator that is or what exactly these servers are. Though even jemalloc does okayish.

malloc() will also always be a bottleneck with the right load. Just the n times larger amount of virtual address space required may start to become an issue for memory-heavy applications. But even if we ignore that, ruling out using the existing GC doesn't sound like a good idea to me.

The GC/malloc is the main reason why this is mostly false in practice, but it extends to any central contention source within the process - yes, often you can avoid that, but often that takes a lot of extra work, and processes sidestep the issue in the first place.

As one can observe by looking at other languages and runtimes, malloc is not the bottleneck it used to be. Our particular GC, which doesn't have thread caches, is the bottleneck.

And the fact is that, even with relatively mild GC use, a web application will not scale properly with many cores.

Only partially agree: Java's GC handles load just fine and runs faster than vibe.d(-light), and it does allocations on its serving code path.

Interesting, I wonder whether it's the REUSE_PORT connection distribution that gets more expensive when it's working cross-process. Agreed that the TechEmpower benchmark is in dire need of being looked at. In fact, I had the code checked out for a long while, intending to look into it, because it obviously didn't scale like my own benchmarks, but then I never got around to doing it, being too busy with other things.

So I did just that. To my surprise it indeed speeds up all of my D server examples. The speed-ups are roughly:

On vibe-http-light:
8 cores  1.14
12 cores 1.10
16 cores 1.08
24 cores 1.05
32 cores 1.06
48 cores 1.07

On vibe-http-classic:
8 cores  1.33
12 cores 1.45
16 cores 1.60
24 cores 2.54
32 cores 4.44
48 cores 8.56

On plain photon-http:
8 cores  1.15
12 cores 1.10
16 cores 1.09
24 cores 1.05
32 cores 1.07
48 cores 1.04

We should absolutely tweak the vibe.d TechEmpower benchmark to run vibe.d as a process per core! As far as photon-powered versions go, I see there is a point where per-process becomes less of a gain with more cores, so I would think there are two factors at play, one positive and one negative, with the negative one tied to the number of processes. Lastly, I have found opportunities to speed up vibe-http even without switching to vibe-core-light. Will send PRs.

Again, the "default" is a high-level issue and none of vibe-core's business.
The simplest way to have that work is to use `HTTPServerOption.reusePort` and then start as many processes as desired. Separate processes also have the advantage of being more robust and enabling seamless restarts and updates of the executable. And they facilitate an application design that lends itself to scaling across multiple machines.

Then give me the example code to run multiple vibe.d instances in parallel processes (should be similar to runDist) and we can compare approaches. For all I know, it could be faster than multi-threaded vibe.d-light. Also, honestly, if vibe.d's target is multiple processes, it should probably start like this by default.
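To make the event-FD idea concrete, here is a minimal sketch of an interruptible, timeout-capable wait on Linux. It illustrates the approach under discussion and is not photon or vibe-core code; `waitReadable`, `interrupt`, and the exception type are made-up names.

```d
import core.sys.linux.sys.eventfd : EFD_NONBLOCK, eventfd;
import core.sys.posix.poll : POLLIN, poll, pollfd;
import core.sys.posix.unistd : read, write;

// One eventfd per thread to signal interruption (module-level variables
// are thread-local by default in D; `static this` runs once per thread).
private int interruptFd = -1;
static this() { interruptFd = eventfd(0, EFD_NONBLOCK); }

class InterruptException : Exception
{
    this() { super("task interrupted"); }
}

/// Waits until `fd` is readable. Returns false on timeout, true when ready;
/// throws InterruptException if another thread signalled this thread's eventfd.
bool waitReadable(int fd, int timeoutMs)
{
    pollfd[2] fds = [
        pollfd(fd, POLLIN, 0),
        pollfd(interruptFd, POLLIN, 0),
    ];
    // In photon this poll() would suspend the fiber rather than the thread.
    const n = poll(fds.ptr, fds.length, timeoutMs);
    if (n == 0)
        return false; // timed out
    if (fds[1].revents & POLLIN)
    {
        ulong counter;
        read(interruptFd, &counter, counter.sizeof); // drain the signal
        throw new InterruptException;
    }
    return true;
}

/// Called from another thread: wake up the waiter owning the given eventfd.
void interrupt(int victimInterruptFd)
{
    ulong one = 1;
    write(victimInterruptFd, &one, one.sizeof);
}
```

A recv-with-timeout then falls out naturally as `waitReadable(sock, ms)` followed by a plain `recv()`, which is the "poll+recv" equivalence mentioned above.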
Sep 25
On Thursday, 25 September 2025 at 07:24:00 UTC, Sönke Ludwig wrote: On 23.09.25 at 17:35, Dmitry Olshansky wrote:

poll could be made interruptible without any additions; it's really a yield plus waiting for events. I could implement interruptible things by simply waking the fiber up with a special flag (it already has one to determine which event woke us up anyway). The problem is, I do not see how to provide an adequate interface for this functionality. I have an idea though - use EINTR, or maybe devise my own error code for interrupts; then I could interrupt at my syscall interface level. Next is the question of how to throw from a nothrow context. Here I told you that I would need separate APIs for interruptible things - the ones that allow interrupts. *This* is something I do not look forward to.

On Monday, 22 September 2025 at 11:14:17 UTC, Sönke Ludwig wrote:

I'd probably create an additional event FD per thread, used to signal interruption, and also pass that to any poll() that is used for an interruptible wait.

On 22.09.25 at 09:49, Dmitry Olshansky wrote:

Yes, recv with timeout is basically poll+recv. The problem is that then I need to support interrupts in poll. Nothing really changed. As far as the manual event goes, I've implemented that with a custom condvar and mutex. That mutex is not interruptible, as it's backed by a semaphore on the slow path in the form of an eventfd. I might create a custom mutex that is interruptible, I guess, but the notion of interrupts would have to be introduced to photon. I do not really like it.

On Friday, 19 September 2025 at 17:37:36 UTC, Sönke Ludwig wrote:

Why can't you then use poll() to, for example, implement `ManualEvent` with timeout and interrupt support? And shouldn't recv() with timeout be implementable the same way: poll with a timeout and only read when ready?

So you don't support timeouts when waiting for an event at all? Otherwise I don't see why a separate API would be required; this should be implementable with plain Posix APIs within vibe-core-lite itself.

Photon's API is the syscall interface. So to wait on an event you just call poll. Behind the scenes it will just wait on the right fd to change state. Now vibe-core-light wants something like read(buffer, timeout), which is not syscall API but may be added. But since I'm going to add new API, I'd rather have something consistent and sane, not just a bunch of ad-hoc functions to satisfy the vibe.d interface.

So runTask is assumed to run on the same core while runWorkerTask runs on any available core? That didn't occur to me. I thought the worker pool was for blocking tasks, as there is such a pool in photon. I can just switch runTask to goOnSameThread to maximize compatibility with vibe.d.

But you are comparing different defaults here. With plain D, you also have to import either `core.thread` or `std.concurrency`/`std.parallelism` to do any multi-threaded work. The same is true for vibe-core. What you propose would be more comparable to having foreach() operate like parallelForeach(), with far-reaching consequences. If we are just talking about naming - runTask/runWorkerTask vs. go/goOnSameThread - that is of course debatable, but in that case I think it's blown very much out of proportion to take that as the basis to claim "it's meant to be used single-threaded".

I think we have a misunderstanding of what vibe.d is supposed to be. It seems like you are only focused on the web/server role, while to me vibe-core is a general-purpose I/O and concurrency system with no particular specialization in server tasks.
With that view, your statement to me sounds like "Clearly D is not meant to do multi-threading, since main() is only running in a single thread".

The defaults are what is important. Go defaults to multi-threading, for instance. D defaults to multi-threading, because TLS by default is certainly a mark of a multi-threaded environment. std.concurrency defaults to a new thread per spawn; again, this tells me it's about multithreading. I intend to support multi-threading by default. I understand that we view this issue differently.

I humbly disagree. I'd take explicit channels over global TLS variables any day.

Channels tame the complexity. Yes, channels could get more expensive in a multi-threaded scenario, but we already agreed that it's not CPU-bound.

If you have code that does a lot of these things, this just degrades code readability for absolutely no practical gain, though.

Yeah, I'm mostly not in the `@safe` world. But as I said, to make it more upstreamable I will switch the defaults, so that vibe-core-light provides the same guarantees as regular vibe-core does.

That means that this is completely un-`@safe` - C++ level memory safety. IMO this is an unacceptable default for web applications.

It goes with the same API as we have for threads - a delegate - so sharing becomes the user's responsibility. I may add function + args for better handling of resources passed to the lambda.

Does go() enforce proper use of shared/immutable when passing data to the scheduled "go routine"? This doesn't make sense; in the original vibe-core, you can simply choose between spawning in the same thread or in "any" thread. `shared`/`immutable` is correctly enforced in the latter case to avoid unintended data sharing.

I have go and goOnSameThread. Guess which is the encouraged option.

I'm betting more on the threaded approach, but we are just different. See also my reply on the numbers - processes are only about 1-2% faster (and the noise is easily in the 0.5% range), once the GC bottleneck is handled, that is.

I was just talking about the current D GC here. Once we have a better implementation, this can very well become a much weaker argument! However, speaking more generally, the other arguments for preferring to scale using processes still stand, and even with a better GC I would still argue that leading library users to do multi-threaded request handling is not necessarily the best default (of course it still *can* be for some applications).

malloc() will also always be a bottleneck with the right load. Just the n times larger amount of virtual address space required may start to become an issue for memory-heavy applications. But even if we ignore that, ruling out using the existing GC doesn't sound like a good idea to me.

The existing GC is basically 20+ years old; of course we need a better GC, and thread-cached allocation solves contention in multi-threaded environments. An alternative memory allocator is doing great on 320-core machines. I cannot tell you which allocator that is or what exactly these servers are. Though even jemalloc does okayish.

And the fact is that, even with relatively mild GC use, a web application will not scale properly with many cores.

Only partially agree: Java's GC handles load just fine and runs faster than vibe.d(-light), and it does allocations on its serving code path.

Anyway, the main point from my side is just that the semantics of what *is* in vibe-core-light should really match the corresponding functions in vibe-core.
Apart from that, I was just telling you that your impression of it being intended to be used single-threaded is not right, which doesn't mean that the presentation shouldn't emphasize the multi-threaded functionality and multi-threaded request processing more.

Given the number of potential expectations from the user side, it seems I need to update vibe-core-light to use goOnSameThread for runTask. I do not like how I need to do extra work to launch a multi-threaded server, though, which is what got me started on the whole "defaults" argument.
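A minimal sketch of what that switch could look like. `go`/`goOnSameThread` are the photon primitives named in this thread, but the wrapper signatures, the import path, and the function-pointer restriction (a crude stand-in for vibe-core's `shared`/`immutable` enforcement) are all assumptions, not actual vibe-core-light code.

```d
// Hypothetical vibe-core-light shim matching vibe-core semantics:
// runTask stays on the calling thread, runWorkerTask may run anywhere.
import photon : go, goOnSameThread; // assumed import path

void runTask(void delegate() nothrow task)
{
    // Same-thread spawn: capturing thread-local state in the delegate is fine.
    goOnSameThread(task);
}

void runWorkerTask(void function() nothrow task)
{
    // Cross-thread spawn: accepting only a function pointer (no captured
    // locals) roughly approximates vibe-core's shared/immutable policy.
    go({ task(); });
}
```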
Sep 25
On 25.09.25 at 14:25, Dmitry Olshansky wrote: On Thursday, 25 September 2025 at 07:24:00 UTC, Sönke Ludwig wrote:

Yes, I think that should be enough to make the semantics compatible. runWorkerTask is kind of dual-use in that regard and is mostly meant for CPU workloads. There is a separate I/O worker pool for blocking I/O operations, to avoid computationally expensive worker tasks getting blocked by I/O. This is definitely an area where Photon can shine, working fine for all kinds of workloads with just a single pool.

On 23.09.25 at 17:35, Dmitry Olshansky wrote:

poll could be made interruptible without any additions; it's really a yield plus waiting for events. I could implement interruptible things by simply waking the fiber up with a special flag (it already has one to determine which event woke us up anyway). The problem is, I do not see how to provide an adequate interface for this functionality. I have an idea though - use EINTR, or maybe devise my own error code for interrupts; then I could interrupt at my syscall interface level. Next is the question of how to throw from a nothrow context. Here I told you that I would need separate APIs for interruptible things - the ones that allow interrupts. *This* is something I do not look forward to.

On Monday, 22 September 2025 at 11:14:17 UTC, Sönke Ludwig wrote:

I'd probably create an additional event FD per thread, used to signal interruption, and also pass that to any poll() that is used for an interruptible wait.

On 22.09.25 at 09:49, Dmitry Olshansky wrote:

Yes, recv with timeout is basically poll+recv. The problem is that then I need to support interrupts in poll. Nothing really changed. As far as the manual event goes, I've implemented that with a custom condvar and mutex. That mutex is not interruptible, as it's backed by a semaphore on the slow path in the form of an eventfd. I might create a custom mutex that is interruptible, I guess, but the notion of interrupts would have to be introduced to photon. I do not really like it.

On Friday, 19 September 2025 at 17:37:36 UTC, Sönke Ludwig wrote:

Why can't you then use poll() to, for example, implement `ManualEvent` with timeout and interrupt support? And shouldn't recv() with timeout be implementable the same way: poll with a timeout and only read when ready?

So you don't support timeouts when waiting for an event at all? Otherwise I don't see why a separate API would be required; this should be implementable with plain Posix APIs within vibe-core-lite itself.

Photon's API is the syscall interface. So to wait on an event you just call poll. Behind the scenes it will just wait on the right fd to change state. Now vibe-core-light wants something like read(buffer, timeout), which is not syscall API but may be added. But since I'm going to add new API, I'd rather have something consistent and sane, not just a bunch of ad-hoc functions to satisfy the vibe.d interface.

So runTask is assumed to run on the same core while runWorkerTask runs on any available core? That didn't occur to me. I thought the worker pool was for blocking tasks, as there is such a pool in photon. I can just switch runTask to goOnSameThread to maximize compatibility with vibe.d.

But you are comparing different defaults here. With plain D, you also have to import either `core.thread` or `std.concurrency`/`std.parallelism` to do any multi-threaded work. The same is true for vibe-core. What you propose would be more comparable to having foreach() operate like parallelForeach(), with far-reaching consequences. If we are just talking about naming - runTask/runWorkerTask vs.
go/goOnSameThread - that is of course debatable, but in that case I think it's blown very much out of proportion to take that as the basis to claim "it's meant to be used single-threaded".

I think we have a misunderstanding of what vibe.d is supposed to be. It seems like you are only focused on the web/server role, while to me vibe-core is a general-purpose I/O and concurrency system with no particular specialization in server tasks. With that view, your statement to me sounds like "Clearly D is not meant to do multi-threading, since main() is only running in a single thread".

The defaults are what is important. Go defaults to multi-threading, for instance. D defaults to multi-threading, because TLS by default is certainly a mark of a multi-threaded environment. std.concurrency defaults to a new thread per spawn; again, this tells me it's about multithreading. I intend to support multi-threading by default. I understand that we view this issue differently.

It wouldn't usually be TLS, but just a delegate that gets passed from the UI task to the I/O task, for example, implicitly operating on stack data or on some UI structures referenced from there.

I humbly disagree. I'd take explicit channels over global TLS variables any day.

Channels tame the complexity. Yes, channels could get more expensive in a multi-threaded scenario, but we already agreed that it's not CPU-bound.

If you have code that does a lot of these things, this just degrades code readability for absolutely no practical gain, though.

Maybe we can at least think about a possible reintroduction of a direct `listenHTTPDist`/`listenHTTPMultiThreaded`/... API that provides a `@safe` interface - there used to be a `HTTPServerOption.distribute` that did that, but it didn't enforce `shared` properly and led to race conditions in practical applications, because people were not aware of the implicitly shared data or of the implications thereof.

Yeah, I'm mostly not in the `@safe` world. But as I said, to make it more upstreamable I will switch the defaults, so that vibe-core-light provides the same guarantees as regular vibe-core does.

That means that this is completely un-`@safe` - C++ level memory safety. IMO this is an unacceptable default for web applications.

It goes with the same API as we have for threads - a delegate - so sharing becomes the user's responsibility. I may add function + args for better handling of resources passed to the lambda.

Does go() enforce proper use of shared/immutable when passing data to the scheduled "go routine"? This doesn't make sense; in the original vibe-core, you can simply choose between spawning in the same thread or in "any" thread. `shared`/`immutable` is correctly enforced in the latter case to avoid unintended data sharing.

I have go and goOnSameThread. Guess which is the encouraged option.

I'm betting more on the threaded approach, but we are just different. See also my reply on the numbers - processes are only about 1-2% faster (and the noise is easily in the 0.5% range), once the GC bottleneck is handled, that is.

I was just talking about the current D GC here. Once we have a better implementation, this can very well become a much weaker argument! However, speaking more generally, the other arguments for preferring to scale using processes still stand, and even with a better GC I would still argue that leading library users to do multi-threaded request handling is not necessarily the best default (of course it still *can* be for some applications).

malloc() will also always be a bottleneck with the right load.
Just the n times larger amount of virtual address space required may start to become an issue for memory-heavy applications. But even if we ignore that, ruling out using the existing GC doesn't sound like a good idea to me.

The existing GC is basically 20+ years old; of course we need a better GC, and thread-cached allocation solves contention in multi-threaded environments. An alternative memory allocator is doing great on 320-core machines. I cannot tell you which allocator that is or what exactly these servers are. Though even jemalloc does okayish.

And the fact is that, even with relatively mild GC use, a web application will not scale properly with many cores.

Only partially agree: Java's GC handles load just fine and runs faster than vibe.d(-light), and it does allocations on its serving code path.

Anyway, the main point from my side is just that the semantics of what *is* in vibe-core-light should really match the corresponding functions in vibe-core. Apart from that, I was just telling you that your impression of it being intended to be used single-threaded is not right, which doesn't mean that the presentation shouldn't emphasize the multi-threaded functionality and multi-threaded request processing more.

Given the number of potential expectations from the user side, it seems I need to update vibe-core-light to use goOnSameThread for runTask. I do not like how I need to do extra work to launch a multi-threaded server, though, which is what got me started on the whole "defaults" argument.
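As an aside, the channel-based hand-off that keeps coming up in the quotes above (the download dialog with a progress bar) can be sketched with vibe-core's `Channel`; the progress-reporting scenario itself is illustrative, not code from either project.

```d
import core.time : msecs;
import std.stdio : writefln;
import vibe.core.channel : createChannel;
import vibe.core.core : exitEventLoop, runApplication, runTask, sleep;

void main()
{
    auto progress = createChannel!int();

    // "Download" task: fully I/O bound, reports progress over the channel.
    runTask(() nothrow {
        try
        {
            foreach (pct; 0 .. 101)
            {
                sleep(10.msecs); // stand-in for actual socket reads
                progress.put(pct);
            }
            progress.close();
        }
        catch (Exception e) {}
    });

    // "UI" task: consumes progress updates until the channel is closed.
    runTask(() nothrow {
        try
        {
            int pct;
            while (progress.tryConsumeOne(pct))
                writefln("progress: %d%%", pct);
            exitEventLoop();
        }
        catch (Exception e) {}
    });

    runApplication();
}
```

Whether the two tasks end up on one thread or two, the communication pattern stays the same; only the cost of the channel operations changes, which is exactly the trade-off being debated here.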
Sep 26
On Friday, 19 September 2025 at 16:29:11 UTC, Dmitry Olshansky wrote:

I'm dying to know which application that is not CPU-bound still needs to pass data between tasks that are all running on a single thread. TLS is fine for using a non-thread-safe library - just make sure you initialize it for all threads. I do not switch or otherwise play dirty tricks with TLS.

Either default robs the user of control over where the task spawns. Which is sensible: a user shouldn't really care.

There we differ: not only is load balancing simpler within a single process, but processes are also more expensive. The current D GC situation kind of sucks on multithreaded workloads, but that is the only reason to go multi-process IMHO.

I'm dying to see some statistics to show which approach is more performant in different scenarios.
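For reference, the "initialize it for all threads" pattern from the quote is plain lazy per-thread initialization on top of D's thread-local-by-default globals; `LibHandle` and `libInit` below are hypothetical placeholders for whatever non-thread-safe library is being wrapped.

```d
// Hypothetical wrapper around a non-thread-safe library.
struct LibHandle { int state; }

// Stand-in for the library's real init call (hypothetical).
LibHandle* libInit() { return new LibHandle(0); }

// Module-level variables are thread-local by default in D,
// so every thread gets its own handle.
private LibHandle* handle;

LibHandle* lib()
{
    if (handle is null)
        handle = libInit(); // first use on each thread initializes it there
    return handle;
}
```

Each thread pays the init cost once, and the failure mode discussed above (a handle created in thread A being used from thread B) cannot occur as long as the handle itself never crosses threads.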
Sep 21
On Thursday, 18 September 2025 at 16:00:48 UTC, Dmitry Olshansky wrote: I have been building Photon[1] scheduler library with the aim to build high performance servers but since we are already have vibe.d and everybody is using it might as well try to speed it up. Thus vibe.d-light idea was born - port vibe.d framework on top of photon.

Congratulations on your amazing work! I also agree with you that there's no point in protecting users from threading today; this actually reflects D's philosophy, since it both uses TLS by default (so the language assumes threading anyway) and makes it easy to write parallel code :)
Sep 19