digitalmars.D - vibe.d benchmarks
- Ola Fosheim Grøstad (3/3) Dec 28 2015 https://www.techempower.com/benchmarks/
- Charles (4/7) Dec 28 2015 Sönke is already on it.
- Nick B (6/14) Dec 29 2015 Correct me if I am wrong here, but as far I can tell there is no
- Charles (3/19) Dec 29 2015 The last time the official benchmark was run was over a month
- yawniek (24/27) Dec 30 2015 i guess its not enough, there are still things that make vibe.d
- Daniel Kozak via Digitalmars-d (5/44) Dec 30 2015 Which async library you use for vibed? libevent? libev? or libasync?
- yawniek (6/11) Dec 30 2015 the numbers above are libevent in release mode, as per original
- Daniel Kozak via Digitalmars-d (4/20) Dec 30 2015 Thanks, it is wierd I use libasync and have quite good performance,
- Laeeth Isharc (7/34) Dec 31 2015 Isn't there a decent chance the bottleneck is vibe.d's JSON
- yawniek (15/21) Dec 31 2015 this is not the same benchmark discussed elsewhere, this one is a
- Ola Fosheim Grøstad (15/21) Dec 31 2015 Go scores 0.5ms latency, vibe.d scores 14.7ms latency. That's a
- Etienne Cimon (6/27) Dec 31 2015 That would be the other way around. TCP_NODELAY is not enabled in
- yawniek (8/16) Dec 31 2015 obvious typo and thanks for investigating etienne.
- Daniel Kozak via Digitalmars-d (12/32) Dec 31 2015 One thing I forgot to mention I have to modify few things
- Nick B (4/22) Jan 03 2016 can someone tell me what changes need to be commited, so that we
- Etienne Cimon (5/8) Jan 03 2016 Considering that the best benchmarks are from tools that have all
- Sönke Ludwig (8/14) Jan 03 2016 Fiber context changes are not a significant influence. I've created a
- Sönke Ludwig (10/19) Jan 03 2016 For me, threadsPerCPU correctly yields the number of logical cores (i.e....
- Daniel Kozak via Digitalmars-d (4/32) Jan 04 2016 On my AMD FX4100 (4 cores) and my AMD AMD A10-7850K(4 core) it is
- Daniel Kozak (5/10) Dec 31 2015 When I use HTTPServerOption.distribute with libevent I get better
- Etienne Cimon (5/16) Dec 31 2015 I launch libasync programs as multiple processes, a bit like
- Daniel Kozak (3/17) Jan 01 2016 ?
- Etienne Cimon (4/22) Jan 01 2016 With libasync, you can run multiple instances of your vibe.d
- Sebastiaan Koppe (6/9) Jan 02 2016 That is nice. Didn't know that. That would enable
- Etienne Cimon (13/18) Jan 02 2016 Yes, although you might still break existing connections unless
- Daniel Kozak via Digitalmars-d (4/28) Jan 04 2016 Yes, but I speak about one instance of vibe.d with multiple
- Etienne Cimon (2/17) Jan 04 2016 Yes, I will investigate this.
- Ola Fosheim Grøstad (6/11) Dec 31 2015 I don't know how the benchmarks are set up, but I would assume
- yawniek (5/9) Dec 31 2015 its actually pretty realistic, one point of having a fast
- Ola Fosheim Grøstad (3/6) Dec 31 2015 It does not scale. If you can do it, then you don't really have a
- Atila Neves (13/58) Jan 05 2016 vibe.d _was_ faster than Go. I redid the measurements recently
- Etienne Cimon (3/17) Jan 05 2016 The Rust mio library doesn't seem to be doing any black magic. I
- rsw0x (11/32) Jan 05 2016 Have you used perf(or similar) to attempt to find bottlenecks yet?
- Nikolay (5/7) Jan 05 2016 I used perf and wrote my result here:
- Etienne (2/11) Jan 05 2016 libasync is the result of an attempt to use epoll directly
- Atila Neves (10/38) Jan 06 2016 Extensively. I optimised my D code as much as I know how to. And
- Etienne Cimon (5/27) Jan 07 2016 It's possible that those cache misses will be irrelevant when the
- Nikolay (4/8) Jan 08 2016 I believe cache-misses problem is related to old vibed version.
- Atila Neves (4/8) Jan 06 2016 No black magic, it's a thin wrapper over epoll. But it was faster
- Etienne Cimon (3/12) Jan 07 2016 You tested D+mio, but the equivalent would probably be D+libasync
- Daniel Kozak (83/110) Dec 31 2015 My results from siege(just return page with Hello World same as
- Sönke Ludwig (4/17) Jan 04 2016 Can you try with the latest GIT master? There are some important
https://www.techempower.com/benchmarks/

The entries for vibe.d are either doing very poorly or failing to complete. Maybe someone should look into this?
Dec 28 2015
On Monday, 28 December 2015 at 12:24:17 UTC, Ola Fosheim Grøstad wrote:
> https://www.techempower.com/benchmarks/
> The entries for vibe.d are either doing very poorly or failing to complete. Maybe someone should look into this?

Sönke is already on it.

http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110
Dec 28 2015
On Monday, 28 December 2015 at 13:10:59 UTC, Charles wrote:
> Sönke is already on it.
> http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110

Correct me if I am wrong here, but as far as I can tell there are no independent benchmarks showing the performance (superior or good enough) of D versus Go, or against just about any other language either?

https://www.techempower.com/benchmarks/#section=data-r11&hw=peak&test=json&l=cnc&f=zik0vz-zik0zj-zik0zj-zik0zj-hra0hr
Dec 29 2015
On Tuesday, 29 December 2015 at 22:49:36 UTC, Nick B wrote:
> Correct me if I am wrong here, but as far as I can tell there are no independent benchmarks showing the performance (superior or good enough) of D versus Go, or against just about any other language either?

The last time the official benchmark was run was over a month before Sönke's PR.
Dec 29 2015
> Sönke is already on it.
> http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110

I guess it's not enough; there are still things that make vibe.d slow.

I quickly tried https://github.com/nanoant/WebFrameworkBenchmark.git which is really a very simple benchmark, but it shows the general overhead.

Single-core results against go-fasthttp with GOMAXPROCS=1 and vibe distribution disabled, on a c4.2xlarge EC2 instance (Arch Linux):

vibe.d 0.7.23 with ldc   Requests/sec:  52102.06
vibe.d 0.7.26 with dmd   Requests/sec:  44438.47
vibe.d 0.7.26 with ldc   Requests/sec:  53996.62
go-fasthttp              Requests/sec: 152573.32
go                       Requests/sec:  62310.04

It's sad. I am aware that go-fasthttp is a very simplistic, stripped-down webserver and vibe is almost a full-blown framework. Still, it should be D and vibe.d's USP to be faster than the fastest in the world and not limping around at the end of the charts.
Dec 30 2015
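For readers who haven't seen it, the endpoint exercised by this kind of benchmark is essentially the canonical vibe.d hello-world server. A minimal sketch follows (assuming the usual dub setup where vibe.d supplies main() and runs the event loop; the port and response body are illustrative, not necessarily the exact code in the benchmark repository):

import vibe.d;

// Register the HTTP listener at startup; vibe.d's default main() then
// runs the event loop.
shared static this()
{
    auto settings = new HTTPServerSettings;
    settings.port = 8080;
    listenHTTP(settings, &hello);
}

// The handler just writes a static body, which is all the benchmark measures.
void hello(HTTPServerRequest req, HTTPServerResponse res)
{
    res.writeBody("Hello, World!");
}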
On Wed, 30 Dec 2015 20:32:08 +0000, yawniek via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> I guess it's not enough; there are still things that make vibe.d slow. [...]

Which async library do you use for vibe.d? libevent? libev? or libasync? Which compilation switches did you use? Without this info it says nothing about vibe.d's performance :)
Dec 30 2015
On Wednesday, 30 December 2015 at 20:38:58 UTC, Daniel Kozak wrote:
> Which async library do you use for vibe.d? libevent? libev? or libasync? Which compilation switches did you use? Without this info it says nothing about vibe.d's performance :)

The numbers above are libevent in release mode, as per the original configuration.

For libasync there is a problem, so it's stuck at 2.4 rps. etcimon is currently investigating there.
Dec 30 2015
On Wed, 30 Dec 2015 21:09:37 +0000, yawniek via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> The numbers above are libevent in release mode, as per the original configuration. For libasync there is a problem, so it's stuck at 2.4 rps. etcimon is currently investigating there.

Thanks. It is weird; I use libasync and have quite good performance. Probably some regression (which version of libasync?).
Dec 30 2015
On Wednesday, 30 December 2015 at 20:32:08 UTC, yawniek wrote:
> I guess it's not enough; there are still things that make vibe.d slow. [...]

Isn't there a decent chance the bottleneck is vibe.d's JSON implementation rather than the framework as such? We know from Atila's MQTT project that vibe.d can be significantly faster than Go, and we also know that its JSON implementation isn't that fast. Replacing it with FastJSON might be interesting. Sadly I don't have time to do that myself.
Dec 31 2015
On Thursday, 31 December 2015 at 08:23:26 UTC, Laeeth Isharc wrote:
> Isn't there a decent chance the bottleneck is vibe.d's JSON implementation rather than the framework as such? [...]

This is not the same benchmark discussed elsewhere; this one is a simple echo thing, no JSON. It just shows that there is some overhead on various layers, so its testimony is very limited.

From a slightly more distant view you can thus argue that 50k rps vs 150k rps basically just means that the framework will most probably not be your bottleneck.

Nonetheless, getting ahead in the benchmarks would help to attract people, who are then pleasantly surprised how easy it is to build full-blown services with vibe.

The libasync problem seems to be because of TCP_NODELAY not being deactivated for local connections.
Dec 31 2015
On Thursday, 31 December 2015 at 08:51:31 UTC, yawniek wrote:
> From a slightly more distant view you can thus argue that 50k rps vs 150k rps basically just means that the framework will most probably not be your bottleneck.

Go scores 0.5ms latency, vibe.d scores 14.7ms latency. That's a big difference that actually matters. Dart + MongoDB also does very well in the multiple-request tests: 17300 requests versus Python + MySQL at 8800.

> Nonetheless, getting ahead in the benchmarks would help to attract people, who are then pleasantly surprised how easy it is to build full-blown services with vibe.

It also matters for people who pick a framework. Although it isn't great as a general benchmark, it says something about:

1. Whether you can stick to the framework even when you need better performance, which is why the overhead versus raw platform speed is interesting.

2. That the framework has been engineered using performance measurements.

It is more useful for writing dynamic web services with simple requests than for regular web servers, though.
Dec 31 2015
On Thursday, 31 December 2015 at 08:51:31 UTC, yawniek wrote:
> The libasync problem seems to be because of TCP_NODELAY not being deactivated for local connections.

That would be the other way around. TCP_NODELAY is not enabled in the local connection, which makes a ~20-30ms difference per request on keep-alive connections and is the bottleneck in this case. Enabling it makes the library competitive in these benchmarks.
Dec 31 2015
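For reference, the setting under discussion is the plain TCP_NODELAY socket option, which disables Nagle's algorithm so that small keep-alive responses are sent immediately. A minimal sketch with std.socket (vibe.d's TCPConnection exposes the same switch as a tcpNoDelay property, assuming the vibe.d version in use has it):

import std.socket;

// Disable Nagle's algorithm on a connected TCP socket so that small writes
// (such as short keep-alive HTTP responses) are flushed immediately instead
// of waiting for more data to coalesce.
void enableNoDelay(Socket sock)
{
    sock.setOption(SocketOptionLevel.TCP, SocketOption.TCP_NODELAY, true);
}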
On Thursday, 31 December 2015 at 12:09:30 UTC, Etienne Cimon wrote:
> That would be the other way around. TCP_NODELAY is not enabled in the local connection [...]

Obvious typo, and thanks for investigating, Etienne.

Daniel: I got similar results over the network. I want to redo them with a more optimized setup though; my wrk server was too weak. The local results are still relevant, as it's a common setup to have nginx distribute to a few vibe instances locally.
Dec 31 2015
On Thu, 31 Dec 2015 12:26:12 +0000, yawniek via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> The local results are still relevant, as it's a common setup to have nginx distribute to a few vibe instances locally.

One thing I forgot to mention: I had to modify a few things. vibe.d has (probably) a bug: it uses threadsPerCPU instead of coresPerCPU in setupWorkerThreads. Here is a commit which makes it possible to set this up by hand:

https://github.com/rejectedsoftware/vibe.d/commit/f946c3a840eab4ef5f7b98906a6eb143509e1447

(I just modified the vibe.d code to use all my 4 cores and it helps a lot.)

To use more threads it must be set up with the distribute option:

settings.options |= HTTPServerOption.distribute;
//setupWorkerThreads(4); // works with master
listenHTTP(settings, &hello);
Dec 31 2015
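To make that snippet concrete, here is a self-contained sketch of the multi-threaded variant of the hello-world server shown earlier. setupWorkerThreads(4) assumes the patched/master vibe.d referenced in the linked commit, and 4 is only an example count for a 4-core machine:

import vibe.d;

shared static this()
{
    // Force the number of worker threads by hand (vibe.d master / patched build).
    setupWorkerThreads(4);

    auto settings = new HTTPServerSettings;
    settings.port = 8080;
    // Let all worker threads accept and handle requests instead of just one.
    settings.options |= HTTPServerOption.distribute;
    listenHTTP(settings, &hello);
}

void hello(HTTPServerRequest req, HTTPServerResponse res)
{
    res.writeBody("Hello, World!");
}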
On Thursday, 31 December 2015 at 12:44:37 UTC, Daniel Kozak wrote:
> vibe.d has (probably) a bug: it uses threadsPerCPU instead of coresPerCPU in setupWorkerThreads. [...] (I just modified the vibe.d code to use all my 4 cores and it helps a lot.)

Can someone tell me what changes need to be committed, so that we have a chance at getting some decent (or even average) benchmark numbers?
Jan 03 2016
On Sunday, 3 January 2016 at 22:16:08 UTC, Nick B wrote:
> Can someone tell me what changes need to be committed, so that we have a chance at getting some decent (or even average) benchmark numbers?

Considering that the best benchmarks are from tools that have all the C calls inlined, I think the best optimizations would be in pragma(inline, true), even doing inlining for fiber context changes.
Jan 03 2016
On 04.01.2016 at 04:27, Etienne Cimon wrote:
> Considering that the best benchmarks are from tools that have all the C calls inlined, I think the best optimizations would be in pragma(inline, true), even doing inlining for fiber context changes.

Fiber context changes are not a significant influence. I created a proof-of-concept HTTP server based on vanilla OS calls a while ago and got almost no slowdown compared to using only callbacks. The performance level was around 200% of current vibe.d.

Having said that, the latest version (0.7.27-alpha.3) contains some important performance optimizations over 0.7.26 and should be used for comparisons. 0.7.26 also had a performance regression related to allocators.
Jan 03 2016
On 31.12.2015 at 13:44, Daniel Kozak via Digitalmars-d wrote:
> vibe.d has (probably) a bug: it uses threadsPerCPU instead of coresPerCPU in setupWorkerThreads. [...] (I just modified the vibe.d code to use all my 4 cores and it helps a lot.)

For me, threadsPerCPU correctly yields the number of logical cores (i.e. coresPerCPU * 2 for hyper-threading-enabled CPUs), which is usually the optimal number of threads*. What numbers did you get/expect?

One actual issue could be that, judging by the name, these functions would yield the wrong numbers for multi-processor systems. I didn't try that so far. Do we have a function in Phobos/Druntime to get the number of processors?

* Granted, HT won't help for pure I/O payloads, but worker threads are primarily meant for computational tasks.
Jan 03 2016
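Phobos does provide one: std.parallelism.totalCPUs asks the operating system for the number of logical processors it can schedule on (across all packages), while core.cpuid's coresPerCPU/threadsPerCPU are derived from CPUID and describe a single physical CPU. A quick sketch to compare the three values on a given machine:

import core.cpuid : coresPerCPU, threadsPerCPU;
import std.parallelism : totalCPUs;
import std.stdio : writefln;

void main()
{
    // OS view: logical processors available to the process (all packages).
    writefln("totalCPUs (std.parallelism): %s", totalCPUs);
    // CPUID view: cores and hardware threads of a single physical CPU.
    writefln("coresPerCPU (core.cpuid):    %s", coresPerCPU());
    writefln("threadsPerCPU (core.cpuid):  %s", threadsPerCPU());
}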
On Mon, 4 Jan 2016 08:37:10 +0100, Sönke Ludwig via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> For me, threadsPerCPU correctly yields the number of logical cores (i.e. coresPerCPU * 2 for hyper-threading-enabled CPUs), which is usually the optimal number of threads*. What numbers did you get/expect?

On my AMD FX4100 (4 cores) and my AMD A10-7850K (4 cores) it returns 1.
Jan 04 2016
On Thursday, 31 December 2015 at 12:09:30 UTC, Etienne Cimon wrote:
> That would be the other way around. TCP_NODELAY is not enabled in the local connection, which makes a ~20-30ms difference per request on keep-alive connections and is the bottleneck in this case. Enabling it makes the library competitive in these benchmarks.

When I use HTTPServerOption.distribute with libevent I get better performance, but with libasync it drops from 20000 req/s to 80 req/s. So maybe there is another performance problem.
Dec 31 2015
On Thursday, 31 December 2015 at 13:29:49 UTC, Daniel Kozak wrote:
> When I use HTTPServerOption.distribute with libevent I get better performance, but with libasync it drops from 20000 req/s to 80 req/s. So maybe there is another performance problem.

I launch libasync programs as multiple processes, a bit like postgresql. The TCP listening is done with REUSEADDR, so the kernel can distribute it and it scales linearly without any fear of contention on the GC. My globals go in redis or databases.
Dec 31 2015
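As background, kernel-level round-robin between independent processes on Linux is what the SO_REUSEPORT option (Linux 3.9+) provides; SO_REUSEADDR on its own only relaxes the bind rules. A rough sketch of a listening socket set up that way with std.socket — the constant is written out by hand because std.socket's SocketOption enum has no named member for it, this is Linux-specific, and how libasync configures its sockets internally may differ:

import std.socket;

// Linux value of SO_REUSEPORT; there is no named member for it in
// std.socket's SocketOption enum, so the raw constant is used (Linux-only).
enum SO_REUSEPORT = 15;

Socket makeSharedListener(ushort port)
{
    auto sock = new TcpSocket();
    sock.setOption(SocketOptionLevel.SOCKET, SocketOption.REUSEADDR, true);
    // With SO_REUSEPORT set, several processes can bind the same port and
    // the kernel distributes incoming connections between them.
    sock.setOption(SocketOptionLevel.SOCKET, cast(SocketOption) SO_REUSEPORT, true);
    sock.bind(new InternetAddress(port));
    sock.listen(128);
    return sock;
}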
On Thursday, 31 December 2015 at 18:23:17 UTC, Etienne Cimon wrote:
> I launch libasync programs as multiple processes, a bit like postgresql. The TCP listening is done with REUSEADDR, so the kernel can distribute it and it scales linearly without any fear of contention on the GC. My globals go in redis or databases.

?
Jan 01 2016
On Friday, 1 January 2016 at 11:38:53 UTC, Daniel Kozak wrote:
> ?

With libasync, you can run multiple instances of your vibe.d server and the Linux kernel will round-robin the incoming connections.
Jan 01 2016
On Saturday, 2 January 2016 at 03:00:19 UTC, Etienne Cimon wrote:
> With libasync, you can run multiple instances of your vibe.d server and the Linux kernel will round-robin the incoming connections.

That is nice, didn't know that. That would enable zero-downtime updates, right?

I use Docker a lot, so normally I run a proxy container in front of the app containers and have it handle SSL and virtual host routing.
Jan 02 2016
On Saturday, 2 January 2016 at 10:05:56 UTC, Sebastiaan Koppe wrote:
> That is nice, didn't know that. That would enable zero-downtime updates, right?

Yes, although you might still break existing connections unless you can make the previous process wait for its existing connections to close after killing it.

> I use Docker a lot, so normally I run a proxy container in front of the app containers and have it handle SSL and virtual host routing.

I haven't needed to migrate out of my Linux server yet (12c/24t, 128 GB), but when I do, I'll just add another one and go for DNS round-robin. I use Cloudflare currently, and in practice you can add/remove A records and it'll round-robin through them. If your server application is capable of running as multiple instances, it's only a matter of having the database/cache servers accessible from another server, and you've got very efficient load balancing that doesn't require any proxies.
Jan 02 2016
On Sat, 02 Jan 2016 03:00:19 +0000, Etienne Cimon via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> With libasync, you can run multiple instances of your vibe.d server and the Linux kernel will round-robin the incoming connections.

Yes, but I am speaking about one instance of vibe.d with multiple worker threads, which performs really badly with libasync.
Jan 04 2016
On Monday, 4 January 2016 at 10:32:41 UTC, Daniel Kozak wrote:
> Yes, but I am speaking about one instance of vibe.d with multiple worker threads, which performs really badly with libasync.

Yes, I will investigate this.
Jan 04 2016
On Thursday, 31 December 2015 at 12:09:30 UTC, Etienne Cimon wrote:
> That would be the other way around. TCP_NODELAY is not enabled in the local connection, which makes a ~20-30ms difference per request on keep-alive connections and is the bottleneck in this case. Enabling it makes the library competitive in these benchmarks.

I don't know how the benchmarks are set up, but I would assume that they don't use a local socket. I wonder if they run the database on the same machine; maybe they do, but that's not realistic, so they really should not.
Dec 31 2015
On Thursday, 31 December 2015 at 15:35:45 UTC, Ola Fosheim Grøstad wrote:
> I don't know how the benchmarks are set up, but I would assume that they don't use a local socket. I wonder if they run the database on the same machine; maybe they do, but that's not realistic, so they really should not.

It's actually pretty realistic. One point of having a fast webserver is that you can save on resources: you get a cheap box and have everything there. Very common.
Dec 31 2015
On Thursday, 31 December 2015 at 15:51:50 UTC, yawniek wrote:
> It's actually pretty realistic. One point of having a fast webserver is that you can save on resources: you get a cheap box and have everything there. Very common.

It does not scale. If you can do it, then you don't really have a real need for the throughput in the first place...
Dec 31 2015
On Thursday, 31 December 2015 at 08:23:26 UTC, Laeeth Isharc wrote:
> We know from Atila's MQTT project that vibe.d can be significantly faster than Go, and we also know that its JSON implementation isn't that fast.

vibe.d _was_ faster than Go. I redid the measurements recently once I wrote an MQTT broker in Rust, and it was losing to boost::asio, Rust's mio, Go, and Java. I told Soenke about it.

I know it's vibe.d and not my code because, after I got the disappointing results, I wrote bindings from both boost::asio and mio to my D code, and the winner of the benchmarks shifted to the D/mio combo (previously it was Rust - I figured the library was the cause and not the language, and I was right).

I'd've put up new benchmarks already; I'm only waiting so I can show vibe.d in a good light.

Atila
Jan 05 2016
On Tuesday, 5 January 2016 at 10:11:36 UTC, Atila Neves wrote:
> I wrote bindings from both boost::asio and mio to my D code, and the winner of the benchmarks shifted to the D/mio combo [...]

The Rust mio library doesn't seem to be doing any black magic. I wonder how libasync could be optimized to match it.
Jan 05 2016
On Tuesday, 5 January 2016 at 13:09:55 UTC, Etienne Cimon wrote:
> The Rust mio library doesn't seem to be doing any black magic. I wonder how libasync could be optimized to match it.

Have you used perf (or similar) to attempt to find bottlenecks yet? If you use Linux and LDC or GDC, I found it worked fine for my needs. Just compile with optimizations and frame pointers (-fno-omit-frame-pointer for GDC and -disable-fp-elim for LDC) or DWARF debug symbols. I can't remember which generates a better call stack right now, actually, so it's probably worth playing around with the --call-graph flag (fp or dwarf).

Perf is a bit hard to understand if you've never used it before, but it's also very powerful.

Bye.
Jan 05 2016
On Tuesday, 5 January 2016 at 14:15:18 UTC, rsw0x wrote:
> Have you used perf (or similar) to attempt to find bottlenecks yet?

I used perf and wrote up my results here:

http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/thread/1670/?page=2

As Sönke Ludwig said, direct epoll usage can give more than 200% improvement over libevent.
Jan 05 2016
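To make "direct epoll usage" concrete, here is a rough, Linux-only sketch of an accept loop built straight on the epoll syscalls exposed by druntime's core.sys.linux.epoll, with nothing like libevent in between. It only illustrates the approach being compared; it is not how libasync or vibe.d's drivers are actually written:

import std.socket;
import core.sys.linux.epoll;

void acceptLoop(ushort port)
{
    auto listener = new TcpSocket();
    listener.setOption(SocketOptionLevel.SOCKET, SocketOption.REUSEADDR, true);
    listener.bind(new InternetAddress(port));
    listener.listen(128);
    listener.blocking = false;

    // One epoll instance watching the listening socket for readability.
    int epfd = epoll_create1(0);
    epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.fd = cast(int) listener.handle;
    epoll_ctl(epfd, EPOLL_CTL_ADD, cast(int) listener.handle, &ev);

    epoll_event[64] events;
    for (;;)
    {
        // Block until at least one registered descriptor is ready.
        int n = epoll_wait(epfd, events.ptr, cast(int) events.length, -1);
        foreach (i; 0 .. n)
        {
            if (events[i].data.fd == cast(int) listener.handle)
            {
                // A connection is ready to be accepted; a real server would
                // register it with epoll and read/write it non-blockingly.
                auto conn = listener.accept();
                conn.close();
            }
        }
    }
}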
On Tuesday, 5 January 2016 at 14:45:18 UTC, Nikolay wrote:
> As Sönke Ludwig said, direct epoll usage can give more than 200% improvement over libevent.

libasync is the result of an attempt to use epoll directly.
Jan 05 2016
On Tuesday, 5 January 2016 at 14:15:18 UTC, rsw0x wrote:
> Have you used perf (or similar) to attempt to find bottlenecks yet?

Extensively. I optimised my D code as much as I know how to. And that's the same code that gets driven by vibe.d, boost::asio and mio. Nothing stands out anymore in perf. The only main difference I can see is that the vibe.d version has far more cache misses. I used perf to try and figure out where those came from and included them in the email I sent to Soenke.

> Perf is a bit hard to understand if you've never used it before, but it's also very powerful.

Oh, I know. :)

Atila
Jan 06 2016
On Wednesday, 6 January 2016 at 08:24:10 UTC, Atila Neves wrote:
> Nothing stands out anymore in perf. The only main difference I can see is that the vibe.d version has far more cache misses.

It's possible that those cache misses will be irrelevant when the requests actually do something, is it not? When a lot of different requests are competing for cache lines, I'd assume it's shuffling things enough to change these readings.
Jan 07 2016
On Friday, 8 January 2016 at 04:02:39 UTC, Etienne Cimon wrote:
> It's possible that those cache misses will be irrelevant when the requests actually do something, is it not? When a lot of different requests are competing for cache lines, I'd assume it's shuffling things enough to change these readings.

I believe the cache-miss problem is related to the old vibe.d version. There were too many context switches. Now vibe.d uses the SO_REUSEPORT socket option, which reduces the context switch count radically.
Jan 08 2016
On Tuesday, 5 January 2016 at 13:09:55 UTC, Etienne Cimon wrote:
> The Rust mio library doesn't seem to be doing any black magic. I wonder how libasync could be optimized to match it.

No black magic, it's a thin wrapper over epoll. But it was faster than boost::asio and vibe.d the last time I measured.

Atila
Jan 06 2016
On Wednesday, 6 January 2016 at 08:21:00 UTC, Atila Neves wrote:
> No black magic, it's a thin wrapper over epoll. But it was faster than boost::asio and vibe.d the last time I measured.

You tested D+mio, but the equivalent would probably be D+libasync, as it is a standalone library and a thin wrapper around epoll.
Jan 07 2016
On Wednesday, 30 December 2015 at 20:32:08 UTC, yawniek wrote:
> Single-core results against go-fasthttp with GOMAXPROCS=1 and vibe distribution disabled, on a c4.2xlarge EC2 instance [...]

My results from siege (just returning a page with "Hello World", same as WebFrameworkBenchmark):

siege -c 20 -q -b -t30S http://127.0.0.1:8080

vibe.d (--combined -b release-nobounds --compiler=ldmd):
Transactions:             968269 hits
Availability:             100.00 %
Elapsed time:             29.10 secs
Data transferred:         12.00 MB
Response time:            0.00 secs
Transaction rate:         33273.85 trans/sec
Throughput:               0.41 MB/sec
Concurrency:              19.62
Successful transactions:  968269
Failed transactions:      0
Longest transaction:      0.04
Shortest transaction:     0.00

vibe.d (one thread):
Transactions:             767815 hits
Availability:             100.00 %
Elapsed time:             29.94 secs
Data transferred:         9.52 MB
Response time:            0.00 secs
Transaction rate:         25645.12 trans/sec
Throughput:               0.32 MB/sec
Concurrency:              19.66
Successful transactions:  767815
Failed transactions:      0
Longest transaction:      0.02
Shortest transaction:     0.00

GOMAXPROCS=4 go run hello.go:
Transactions:             765301 hits
Availability:             100.00 %
Elapsed time:             29.52 secs
Data transferred:         8.03 MB
Response time:            0.00 secs
Transaction rate:         25924.83 trans/sec
Throughput:               0.27 MB/sec
Concurrency:              19.68
Successful transactions:  765301
Failed transactions:      0
Longest transaction:      0.02
Shortest transaction:     0.00

GOMAXPROCS=1 go run hello.go:
Transactions:             478991 hits
Availability:             100.00 %
Elapsed time:             29.47 secs
Data transferred:         5.02 MB
Response time:            0.00 secs
Transaction rate:         16253.51 trans/sec
Throughput:               0.17 MB/sec
Concurrency:              19.75
Successful transactions:  478992
Failed transactions:      0
Longest transaction:      0.02
Shortest transaction:     0.00

UnderTow (4 cores):
Transactions:             965835 hits
Availability:             100.00 %
Elapsed time:             29.41 secs
Data transferred:         10.13 MB
Response time:            0.00 secs
Transaction rate:         32840.36 trans/sec
Throughput:               0.34 MB/sec
Concurrency:              19.57
Successful transactions:  965836
Failed transactions:      0
Longest transaction:      0.01
Shortest transaction:     0.00

Kore.io (4 workers):
Transactions:             2043 hits
Availability:             100.00 %
Elapsed time:             29.61 secs
Data transferred:         0.02 MB
Response time:            0.29 secs
Transaction rate:         69.00 trans/sec
Throughput:               0.00 MB/sec
Concurrency:              19.96
Successful transactions:  2043
Failed transactions:      0
Longest transaction:      0.55
Shortest transaction:     0.00

So it seems vibe.d has the best results :)
Dec 31 2015
On 30.12.2015 at 21:32, yawniek wrote:
> I quickly tried https://github.com/nanoant/WebFrameworkBenchmark.git which is really a very simple benchmark, but it shows the general overhead. Single-core results against go-fasthttp with GOMAXPROCS=1 and vibe distribution disabled, on a c4.2xlarge EC2 instance (Arch Linux): (...) It's sad.

Can you try with the latest Git master? There are some important optimizations which are not in 0.7.26 (which has at least one performance regression).
Jan 04 2016