digitalmars.D - Language performance benchmark be updated 2019/11/09
- zoujiaqing (40/40) Nov 14 2019 | Language | Time, s | Memory, MiB |
- Ola Fosheim Grøstad (4/5) Nov 15 2019 Sadly, the benchmark entries appear to use different
- aliak (6/11) Nov 15 2019 You mean for sorting one uses quick sort while another uses
- Ola Fosheim Grøstad (6/9) Nov 15 2019 The test set is very limited.
- Gregor Mückl (6/18) Nov 15 2019 The JSON test uses very different parser implementations. There
- Jacob Shtokolov (71/72) Nov 16 2019 Sorry, but have you tried it by yourself?
- Jacob Shtokolov (16/17) Nov 16 2019 Just tried to compile and run Base64
- Jacob Shtokolov (13/14) Nov 16 2019 The Havlak test is closer to reality:
- Daniel Kozak (4/18) Nov 17 2019 The LDC binary is OK; this is about the GC. I was able to make the LDC build almost
- Jacob Shtokolov (5/7) Nov 17 2019 Just checked the code and found that they're using allocations
- Daniel Kozak (4/11) Nov 17 2019 Sorry, I forgot to include the link. It is on my GitHub:
- Jacob Shtokolov (11/13) Nov 17 2019 Now it's faster than the C++ version on my machine:
- Daniel Kozak (3/6) Nov 17 2019 Not only that. Another change is not pre-filling the `number` AA with UNVISITED; the
- Jon Degenhardt (30/37) Nov 17 2019 Regarding the benefits seen from switching from AAs to Appenders
- JN (11/17) Nov 18 2019 I think it signifies a deeper problem with these kinds of
- bachmeier (10/16) Nov 18 2019 If you're in a position where you care about "fast as possible"
- Jon Degenhardt (14/30) Nov 18 2019 Yes, there are often multiple goals behind a benchmark like this,
- Daniel Kozak (13/20) Nov 17 2019 original code
- James Blachly (2/5) Nov 17 2019 Can you summarize or share the changes for learning purposes?
- kinke (5/28) Nov 17 2019 With full LTO, I'm seeing an additional 5% boost on Windows
- IGotD- (6/20) Nov 17 2019 C++ memory consumption is way lower than the rest. Is this
- IGotD- (4/21) Nov 16 2019 Why is C++ doing so badly? Is it because of inefficient usage of
- Jacob Shtokolov (6/7) Nov 16 2019 Looks like that's because they're using some libcrypto APIs (like
| Language         | Time, s | Memory, MiB |
| ---------------- | ------- | ----------- |
| Kotlin           | 2.01    | 37.6        |
| Nim Gcc          | 2.17    | 0.7         |
| C++ Gcc          | 2.41    | 1.7         |
| OCaml            | 2.50    | 4.4         |
| Go               | 2.94    | 1.5         |
| Java             | 3.05    | 37.2        |
| Crystal          | 3.06    | 2.7         |
| ML MLton         | 3.22    | 0.7         |
| Go Gcc           | 3.30    | 19.2        |
| Rust             | 3.43    | 0.8         |
| Nim Clang        | 3.43    | 1.0         |
| D Ldc            | 3.57    | 1.4         |
| D Gdc            | 3.72    | 5.8         |
| Scala            | 4.30    | 136.3       |
| D Dmd            | 4.74    | 3.3         |
| Haskell (MArray) | 6.88    | 3.5         |
| Javascript Node  | 6.97    | 31.5        |
| V Gcc            | 7.30    | 0.8         |
| V Clang          | 9.06    | 1.0         |
| Racket           | 10.49   | 77.4        |
| LuaJIT           | 10.99   | 2.1         |
| Python PyPy      | 21.51   | 95.4        |
| Chez Scheme      | 24.72   | 29.2        |
| Haskell          | 29.14   | 3.4         |
| Ruby truffle     | 32.52   | 613.3       |
| Ruby JRuby       | 180.65  | 241.7       |
| Ruby             | 191.36  | 13.1        |
| Lua 5.3          | 201.26  | 1.4         |
| Elixir           | 279.03  | 48.9        |
| Python3          | 388.22  | 7.8         |
| Python           | 399.75  | 6.2         |
| Tcl (FP)         | 494.78  | 4.3         |
| Perl             | 769.17  | 5.2         |
| Tcl (OO)         | 1000.55 | 4.3         |

https://github.com/kostya/benchmarks
Nov 14 2019
On Friday, 15 November 2019 at 03:31:24 UTC, zoujiaqing wrote:
> https://github.com/kostya/benchmarks

Sadly, the benchmark entries appear to use different algorithms... despite the site claiming otherwise. As far as I can tell...
Nov 15 2019
On Friday, 15 November 2019 at 08:25:26 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 15 November 2019 at 03:31:24 UTC, zoujiaqing wrote:
>> https://github.com/kostya/benchmarks
>
> Sadly, the benchmark entries appear to use different algorithms... despite the site claiming otherwise. As far as I can tell...

You mean for sorting one uses quicksort while another uses bubble sort, or something to that effect? Did you check a fair number and find them all different? (I haven't looked yet, obviously, and I'm trying to avoid it depending on how much you peeked :) )
Nov 15 2019
On Friday, 15 November 2019 at 18:05:35 UTC, aliak wrote:
> You mean for sorting one uses quicksort while another uses bubble sort, or something to that effect? Did you check a fair number and find them all different?

The test set is very limited. Matrix multiplication, for one... If you benchmark a language that calls into a library implemented in C... and get results twice as good as the C benchmark... then you know something is not right! :-D
Nov 15 2019
On Friday, 15 November 2019 at 18:05:35 UTC, aliak wrote:
> On Friday, 15 November 2019 at 08:25:26 UTC, Ola Fosheim Grøstad wrote:
>> On Friday, 15 November 2019 at 03:31:24 UTC, zoujiaqing wrote:
>>> https://github.com/kostya/benchmarks
>>
>> Sadly, the benchmark entries appear to use different algorithms... despite the site claiming otherwise. As far as I can tell...
>
> You mean for sorting one uses quicksort while another uses bubble sort, or something to that effect? Did you check a fair number and find them all different? (I haven't looked yet, obviously, and I'm trying to avoid it depending on how much you peeked :) )

The JSON test uses very different parser implementations. There are even multiple implementations for D. One of them uses the fast I/O library, while the other one uses std.json, I think. I haven't checked the others, but I expect them to have a similar spread.
Nov 15 2019
On Friday, 15 November 2019 at 03:31:24 UTC, zoujiaqing wrote:
> https://github.com/kostya/benchmarks

Sorry, but have you tried it yourself?

I'm running this benchmark outside of Docker, and the numbers I see are very interesting.

First of all, the time measurement script is quite questionable. It's a Ruby script, and I found that the running time depends on the shell and environment. For example, when I run it in VSCode's console window, I get times about 20% worse for all binaries than when I run it in a regular terminal emulator window.

Second, the numbers are, hmmm, how to say... bullshit? Pardon my French!

Here is what I get for the Brainfuck2 mandelbrot benchmark (a simple Brainfuck interpreter implemented in different languages):

C++, gcc version 7.4.0 (g++ -flto -O3 -o bin_cpp bf.cpp):
```
$ ../xtime.rb ./bin_cpp mandel.b
18.05s, 3.6Mb
```

D, LDC2 1.18.0 (ldc2 -ofbin_d_ldc -O5 -release -boundscheck=off bf.d):
```
$ ../xtime.rb ./bin_d_ldc mandel.b
19.53s, 3.6Mb
```

Nim 1.0.2 (nim c -o:bin_nim_gcc -d:danger --cc:gcc --verbosity:0 bf.nim):
```
$ ../xtime.rb ./bin_nim_gcc mandel.b
25.07s, 2.2Mb
```

Kotlin, kotlinc-jvm 1.3.50 (JRE 1.8.0_201-b09) (kotlinc bf2.kt -include-runtime -d bf2-kt.jar):
```
$ ../xtime.rb java -jar bf2-kt.jar mandel.b
JIT warming up
time: 1.25s
run
26.81s, 36.6Mb
```

Golang, go1.13.4 linux/amd64 (go build -o bin_go bf.go):
```
$ ../xtime.rb ./bin_go mandel.b
38.73s, 2.9Mb
```

============================================

So the results for Brainfuck2 mandel.b are:
```
C++ gcc: 18.05s, 3.6Mb
D LDC2:  19.53s, 3.6Mb
Nim:     25.07s, 2.2Mb
Kotlin:  26.81s, 36.6Mb
Golang:  38.73s, 2.9Mb
```

Please note that I've added `-boundscheck=off` to the LDC2 command line. Also, Kotlin always prints `JIT warming up` and takes about 1 to 2 seconds to warm up, so the results for Brainfuck2 `bench.b` are VERY different. Kotlin is obviously not in first place.

I'm running everything on my laptop with an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz. As I mentioned, no Docker containers; I'm just using the included scripts to build the binaries.

I haven't tried the other tests, but I feel like I'll get very interesting results for them too. It would be great if someone else could confirm these results, because these benchmarks look very manipulative.
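Since part of the complaint above is that the same binaries time about 20% differently depending on the terminal they are launched from, one way to make such comparisons less noisy is to repeat each run and report the median instead of a single number. The D sketch below is hypothetical: the helper names `median` and `medianTime` are made up, and it times an in-process delegate rather than an external binary as `xtime.rb` does. It only illustrates the idea.

```d
import core.time : Duration, dur;
import std.algorithm.sorting : sort;
import std.datetime.stopwatch : AutoStart, StopWatch;

/// Upper median of a set of timing samples; the input is copied, not sorted in place.
Duration median(const Duration[] samples)
{
    assert(samples.length > 0, "need at least one sample");
    auto sorted = samples.dup;
    sort(sorted);
    return sorted[$ / 2];
}

/// Hypothetical helper: runs `work` `runs` times and returns the median
/// wall-clock time, which is less sensitive to one-off slow runs than a
/// single measurement.
Duration medianTime(scope void delegate() work, size_t runs)
{
    auto samples = new Duration[](runs);
    foreach (ref sample; samples)
    {
        auto sw = StopWatch(AutoStart.yes); // starts timing immediately
        work();
        sample = sw.peek();                 // elapsed time for this run
    }
    return median(samples);
}
```

Taking the median rather than the mean keeps a single run disturbed by the environment from skewing the reported number.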
Nov 16 2019
On Saturday, 16 November 2019 at 16:07:29 UTC, Jacob Shtokolov wrote:
> Haven't tried other tests

Just tried to compile and run Base64. The results are:
```
C:      1.46s, 1.9Mb
Rust:   1.49s, 2.4Mb
D LDC2: 1.98s, 4.2Mb
Golang: 2.89s, 10.9Mb
C++:    3.18s, 4.8Mb
```
This test is closer to the author's numbers, but the Rust implementation isn't faster than the C implementation on my machine. Golang here is faster than C++. The D version was built with bounds checks this time.
Nov 16 2019
On Saturday, 16 November 2019 at 16:34:58 UTC, Jacob Shtokolov wrote:
> Just tried to compile and run Base64

The Havlak test is closer to reality:
```
Nim:     12.24s, 477.8Mb
C++:     17.33s, 179.3Mb
Golang:  21.58s, 358.0Mb
D LDC2:  23.55s, 460.4Mb
D DMD:   29.04s, 461.9Mb
```
Nim is the winner. But here I would look into the code to see what makes LDC produce such a poorly optimized binary.
Nov 16 2019
On Sat, Nov 16, 2019 at 5:50 PM Jacob Shtokolov via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> On Saturday, 16 November 2019 at 16:34:58 UTC, Jacob Shtokolov wrote:
>> Just tried to compile and run Base64
>
> The Havlak test is closer to reality:
> ```
> Nim:     12.24s, 477.8Mb
> C++:     17.33s, 179.3Mb
> Golang:  21.58s, 358.0Mb
> D LDC2:  23.55s, 460.4Mb
> D DMD:   29.04s, 461.9Mb
> ```
> Nim is the winner. But here I would look into the code to see what makes LDC produce such a poorly optimized binary.

The LDC binary is OK; this is about the GC. I was able to make the LDC build almost twice as fast with some improvements.
Nov 17 2019
On Sunday, 17 November 2019 at 10:36:41 UTC, Daniel Kozak wrote:
> The LDC binary is OK; this is about the GC. I was able to make the LDC build almost twice as fast with some improvements.

Just checked the code and found that they're allocating with `new` inside loops. But it would be very interesting to see what changes you made to make it run so much faster! Could you please share them somewhere?
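To illustrate the pattern mentioned above (GC allocations inside a loop), here is a hypothetical D sketch, not code from the benchmark. The first function builds a fresh temporary array on every iteration, while the second creates one `std.array.Appender` outside the loop and calls `clear()` on each pass so its storage can be reused. The workload (summing the positive values of each row) is made up purely for illustration.

```d
import std.algorithm.iteration : sum;
import std.array : appender;

// Allocating pattern: `tmp` starts out empty on every iteration, so the
// first append in each iteration allocates a fresh GC block that soon
// becomes garbage.
long[] positiveSumsAllocating(const int[][] rows)
{
    long[] result;
    foreach (row; rows)
    {
        int[] tmp;
        foreach (x; row)
            if (x > 0)
                tmp ~= x;
        result ~= tmp.sum;
    }
    return result;
}

// Reusing pattern: one Appender lives outside the loop and is cleared on
// each pass, so once its buffer is large enough no further allocation
// happens; the result array is allocated once with its final size.
long[] positiveSumsReusing(const int[][] rows)
{
    auto tmp = appender!(int[])();
    auto result = new long[](rows.length);
    foreach (i, row; rows)
    {
        tmp.clear();
        foreach (x; row)
            if (x > 0)
                tmp.put(x);
        result[i] = tmp.data.sum;
    }
    return result;
}
```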
Nov 17 2019
On Sun, Nov 17, 2019 at 2:50 PM Jacob Shtokolov via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> On Sunday, 17 November 2019 at 10:36:41 UTC, Daniel Kozak wrote:
>> The LDC binary is OK; this is about the GC. I was able to make the LDC build almost twice as fast with some improvements.
>
> Just checked the code and found that they're allocating with `new` inside loops. But it would be very interesting to see what changes you made to make it run so much faster! Could you please share them somewhere?

Sorry, I forgot to include the link. It is on my GitHub: https://github.com/Kozzi11/benchmarks/tree/improve_d
Nov 17 2019
On Sunday, 17 November 2019 at 14:15:00 UTC, Daniel Kozak wrote:
> Sorry, I forgot to include the link. It is on my GitHub: https://github.com/Kozzi11/benchmarks/tree/improve_d

Now it's faster than the C++ version on my machine:
```
Nim:    12.01s, 478.1Mb
D LDC2: 13.48s, 428.1Mb
C++:    19.97s, 179.3Mb
Golang: 21.90s, 364.7Mb
```
So basically the only critical change was to replace the built-in associative arrays with Appender types? That's really amazing!
Nov 17 2019
> So basically the only critical change was to replace the built-in associative arrays with Appender types? That's really amazing!

Not only that. Another change is not pre-filling the `number` AA with UNVISITED, and the other change is disabling the parallel GC, because it causes a performance decrease.
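A simplified D sketch of the "don't pre-fill `number` with UNVISITED" idea follows. It assumes nodes are identified by dense integer ids; the `Adjacency` alias, the function names and the `-1` sentinel are hypothetical. Daniel's actual changes are in the branch linked above and reportedly use Appender, whereas a plain array stands in here. With an AA, a missing key can itself mean "unvisited", so no pre-fill pass is needed. With a flat array indexed by node id, one slice assignment marks every node unvisited, and lookups avoid hashing.

```d
enum int UNVISITED = -1;

// Hypothetical graph representation: succ[v] lists the successors of
// node v, and nodes are numbered 0 .. succ.length - 1.
alias Adjacency = int[][];

// AA version: an absent key means "unvisited", so the table is never
// pre-filled; each insertion may still allocate GC memory.
int[int] preorderAA(Adjacency succ, int start)
{
    int[int] number;
    int next = 0;
    void dfs(int v)
    {
        number[v] = next++;
        foreach (w; succ[v])
            if (w !in number)
                dfs(w);
    }
    dfs(start);
    return number;
}

// Array version: one allocation for all nodes, filled with the sentinel
// by a single slice assignment; lookups are plain indexing.
int[] preorderArray(Adjacency succ, int start)
{
    auto number = new int[](succ.length);
    number[] = UNVISITED;
    int next = 0;
    void dfs(int v)
    {
        number[v] = next++;
        foreach (w; succ[v])
            if (number[w] == UNVISITED)
                dfs(w);
    }
    dfs(start);
    return number;
}
```

Both functions compute the same depth-first preorder numbering. Which one wins in practice depends on graph size and density, so it is worth measuring before switching.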
Nov 17 2019
On Sunday, 17 November 2019 at 16:25:52 UTC, Daniel Kozak wrote:
>> So basically the only critical change was to replace the built-in associative arrays with Appender types? That's really amazing!
>
> Not only that. Another change is not pre-filling the `number` AA with UNVISITED, and the other change is disabling the parallel GC, because it causes a performance decrease.

Regarding the benefits seen from switching from AAs to Appenders: this is a nice performance improvement. It's also a nice example of the performance improvements that are often available in D programs.

At a high level, I feel I've seen this pattern a number of times. When people starting with D run benchmarks as part of their initial experiments, they naturally start with the simplest and most straightforward programming approaches. Nothing wrong with this. It's a strength of D that quality code can be written quickly.

However, in many cases these simple approaches allocate a fair bit of GC memory, memory that quickly becomes unused and needs to be collected by the GC. Again, nothing wrong with this. But I have the impression that there is often an expectation that such code will perform similarly to code using manually managed memory in other natively compiled languages. And often this expectation is not met, as memory allocation and use patterns are a major performance driver.

What often gets missed in these assessments is that D has quite a few mechanisms that enable better memory management without needing to drop GC paradigms entirely and move to fully manually managed memory. Modifying performance-sensitive programs to use these mechanisms is often not hard. The switch here from AAs to Appenders is an example. Being able to improve program performance in this way is a strength of D.

One consideration is that until one has some experience with the language, it may not be obvious that these options exist, or which specific changes and approaches can be used. This can lead to perception issues, if nothing else.

--Jon
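Here is a small example of the kind of mechanism described above. The sketch uses hypothetical functions, not code from the benchmark, and builds the same array twice. The first version uses plain `~=` on a GC array. The second uses `std.array.appender` and reserves the final size up front. `Appender` keeps its own capacity bookkeeping, so each `put` avoids the runtime capacity lookup that a plain `~=` on a slice goes through.

```d
import std.array : appender;

// Plain slice appending: works fine, but every ~= asks the runtime for
// the slice's capacity and the array may be reallocated as it grows.
int[] squaresNaive(int n)
{
    int[] result;
    foreach (i; 0 .. n)
        result ~= i * i;
    return result;
}

// Appender with the final size reserved once, then filled with put().
int[] squaresAppender(int n)
{
    auto app = appender!(int[])();
    app.reserve(n);
    foreach (i; 0 .. n)
        app.put(i * i);
    return app.data;
}
```

Both versions stay within the GC paradigm. The change only reduces how often the program goes back to the allocator.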
Nov 17 2019
On Sunday, 17 November 2019 at 21:42:37 UTC, Jon Degenhardt wrote:
> At a high level, I feel I've seen this pattern a number of times. When people starting with D run benchmarks as part of their initial experiments, they naturally start with the simplest and most straightforward programming approaches. Nothing wrong with this. It's a strength of D that quality code can be written quickly.

I think it signifies a deeper problem with these kinds of benchmarks. Most people would expect these benchmarks to measure idiomatic, "every day" kind of code. Most people would write their code with associative arrays in this case. Sure, you can optimize it later, but by the same logic you could just drop into an asm {} block and write hand-optimized code. Same with Java: you can write a lot of the code in a very C-like way for a large speedup, but the code will be completely foreign to most Java programmers and not very representative of the language.
Nov 18 2019
On Monday, 18 November 2019 at 21:35:08 UTC, JN wrote:
> I think it signifies a deeper problem with these kinds of benchmarks. Most people would expect these benchmarks to measure idiomatic, "every day" kind of code. Most people would write their code with associative arrays in this case. Sure, you can optimize it later, but by the same logic you could just drop into an asm {} block and write hand-optimized code.

If you're in a position where you care about "fast as possible" code, how fast your "every day" code runs isn't really helpful. Now, I do understand that you might want to measure the performance of a piece of code written when you aren't optimizing for execution speed. Someone in that position is going to care about speed of execution and speed of development, among other things. The problem is that you can't learn anything useful in that case from a benchmark that reports execution time and nothing else.
Nov 18 2019
On Monday, 18 November 2019 at 21:50:04 UTC, bachmeier wrote:
> On Monday, 18 November 2019 at 21:35:08 UTC, JN wrote:
>> I think it signifies a deeper problem with these kinds of benchmarks. Most people would expect these benchmarks to measure idiomatic, "every day" kind of code. Most people would write their code with associative arrays in this case. Sure, you can optimize it later, but by the same logic you could just drop into an asm {} block and write hand-optimized code.
>
> If you're in a position where you care about "fast as possible" code, how fast your "every day" code runs isn't really helpful. Now, I do understand that you might want to measure the performance of a piece of code written when you aren't optimizing for execution speed. Someone in that position is going to care about speed of execution and speed of development, among other things. The problem is that you can't learn anything useful in that case from a benchmark that reports execution time and nothing else.

Yes, there are often multiple goals behind a benchmark like this, goals that may not be explicitly identified. There is also the question of what "idiomatic" means. This can be quite subjective, especially in multi-paradigm languages. And what "idiomatic" means to an individual may change as familiarity with the language grows.

For D performance studies, one example is that it can take time to learn how to use the lazy, range-based programming facilities. This is certainly one idiomatic D coding style, and it often results in much better memory management and performance. Code can of course move further from the most common paradigms, all the way to inline assembly blocks. That makes it difficult to say when versions of a program in different languages are similarly idiomatic.
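Here is a minimal illustration of the lazy, range-based style mentioned above. It uses only Phobos (`std.range.iota`, `std.algorithm.iteration.filter`, `map` and `sum`). The task itself, summing the squares of the even numbers below `n`, is a made-up example. The eager version materialises two intermediate GC arrays. The range pipeline computes each element on demand and allocates nothing for intermediates, because its lambdas capture no local state.

```d
import std.algorithm.iteration : filter, map, sum;
import std.range : iota;

// Eager style: builds intermediate arrays on the GC heap.
long sumEvenSquaresEager(int n)
{
    int[] evens;
    foreach (i; 0 .. n)
        if (i % 2 == 0)
            evens ~= i;
    long[] squares;
    foreach (e; evens)
        squares ~= cast(long) e * e;
    long total = 0;
    foreach (s; squares)
        total += s;
    return total;
}

// Lazy style: iota, filter and map are ranges evaluated on demand, and
// sum consumes the pipeline without storing intermediate results.
long sumEvenSquaresLazy(int n)
{
    return iota(n)
        .filter!(i => i % 2 == 0)
        .map!(i => cast(long) i * i)
        .sum;
}
```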
Nov 18 2019
On Sun, Nov 17, 2019 at 11:36 AM Daniel Kozak <kozzi11 gmail.com> wrote:
>> Nim is the winner. But here I would look into the code to see what makes LDC produce such a poorly optimized binary.
>
> The LDC binary is OK; this is about the GC. I was able to make the LDC build almost twice as fast with some improvements.

Original code:

Golang: 22.74s, 364.1Mb
D LDC2: 29.55s, 463.9Mb
D DMD:  29.42s, 462.5Mb
D GDC:  25.28s, 415.3Mb
Nim:    14.26s, 468.9Mb

With small changes:

Golang: 22.74s, 364.1Mb
D LDC2: 15.90s, 389.8Mb
D DMD:  16.86s, 387.3Mb
D GDC:  19.48s, 403.8Mb
Nim:    14.26s, 468.9Mb
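One of the changes Daniel lists is disabling the parallel GC. If you want to try that part on its own, druntime accepts GC options either on the command line or embedded in the program through an `rt_options` array. The sketch below is minimal and assumes druntime 2.087 or newer, the release that introduced parallel marking, which should cover the LDC 1.18 used in this thread. `./app` is a placeholder binary name.

```d
// Embed GC options in the executable: "parallel:0" tells the GC not to
// use additional threads for the mark phase.
extern(C) __gshared string[] rt_options = ["gcopt=parallel:0"];

// Equivalent one-off form on the command line, without recompiling:
//   ./app --DRT-gcopt=parallel:0
```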
Nov 17 2019
On 11/17/19 6:04 AM, Daniel Kozak wrote:
> On Sun, Nov 17, 2019 at 11:36 AM Daniel Kozak <kozzi11 gmail.com> wrote:
>> The LDC binary is OK; this is about the GC. I was able to make the LDC build almost twice as fast with some improvements.

Can you summarize or share the changes for learning purposes?
Nov 17 2019
On Sunday, 17 November 2019 at 11:04:55 UTC, Daniel Kozak wrote:
> On Sun, Nov 17, 2019 at 11:36 AM Daniel Kozak <kozzi11 gmail.com> wrote:
>>> Nim is the winner. But here I would look into the code to see what makes LDC produce such a poorly optimized binary.
>>
>> The LDC binary is OK; this is about the GC. I was able to make the LDC build almost twice as fast with some improvements.
>
> Original code:
> Golang: 22.74s, 364.1Mb
> D LDC2: 29.55s, 463.9Mb
> D DMD:  29.42s, 462.5Mb
> D GDC:  25.28s, 415.3Mb
> Nim:    14.26s, 468.9Mb
>
> With small changes:
> Golang: 22.74s, 364.1Mb
> D LDC2: 15.90s, 389.8Mb
> D DMD:  16.86s, 387.3Mb
> D GDC:  19.48s, 403.8Mb
> Nim:    14.26s, 468.9Mb

With full LTO, I'm seeing an additional 5% boost on Windows (-flto=full -defaultlib=phobos2-ldc-lto,druntime-ldc-lto). As they are using gcc LTO for the brainfuck2 benchmark too (https://github.com/kostya/benchmarks/blob/2777925c4e64987e83e9a53478910de080408057/brainfuck2/build.sh#L5), I wouldn't consider it cheating.
Nov 17 2019
On Saturday, 16 November 2019 at 16:45:02 UTC, Jacob Shtokolov wrote:
> On Saturday, 16 November 2019 at 16:34:58 UTC, Jacob Shtokolov wrote:
>> Just tried to compile and run Base64
>
> The Havlak test is closer to reality:
> ```
> Nim:     12.24s, 477.8Mb
> C++:     17.33s, 179.3Mb
> Golang:  21.58s, 358.0Mb
> D LDC2:  23.55s, 460.4Mb
> D DMD:   29.04s, 461.9Mb
> ```
> Nim is the winner. But here I would look into the code to see what makes LDC produce such a poorly optimized binary.

C++ memory consumption is way lower than the rest. Is this because of the tracing GC penalty? It would have been interesting to see Rust here, since it doesn't use a GC, and whether it would get close to the C++ memory consumption.
Nov 17 2019
On Saturday, 16 November 2019 at 16:34:58 UTC, Jacob Shtokolov wrote:
> On Saturday, 16 November 2019 at 16:07:29 UTC, Jacob Shtokolov wrote:
>> Haven't tried other tests
>
> Just tried to compile and run Base64. The results are:
> ```
> C:      1.46s, 1.9Mb
> Rust:   1.49s, 2.4Mb
> D LDC2: 1.98s, 4.2Mb
> Golang: 2.89s, 10.9Mb
> C++:    3.18s, 4.8Mb
> ```
> This test is closer to the author's numbers, but the Rust implementation isn't faster than the C implementation on my machine. Golang here is faster than C++. The D version was built with bounds checks this time.

Why is C++ doing so badly? Is it because of inefficient buffer usage, and because it doesn't natively support slices?
Nov 16 2019
On Saturday, 16 November 2019 at 16:47:40 UTC, IGotD- wrote:
> Why is C++ doing so badly?

It looks like that's because they're using some libcrypto APIs (like BIO). Also, my C++ compiler is not the latest one: it's GCC 7.4.0, while the benchmark claims GCC 9.2.1. But the rest of the compilers are up to date and are actually the same versions as in the benchmark's README file.
Nov 16 2019