digitalmars.D - Humble benchmark (fisher's exact test)
- Ki Rill (6/6) Aug 13 2021 It's a simple benchmark examining:
- Ali Çehreli (9/17) Aug 13 2021 The most obvious improvement I can see is removing the dynamic array
- zjh (3/4) Aug 13 2021 `Das betterC` is very competitive.`d` should invest 'resources'
- John Colvin (4/10) Aug 14 2021 Lots of things to improve there.
- Ki Rill (2/13) Aug 14 2021 Thanks!
- Ki Rill (14/25) Aug 15 2021 I have added the proposed changes. The performance of D increased
- max haughton (6/35) Aug 15 2021 I could be wrong but I think our routines internally use the max
- Jacob Shtokolov (4/5) Aug 14 2021 Regarding the binary size: please make sure that you're using
- bachmeier (17/23) Aug 14 2021 I'm skeptical that you're measuring what you think you're
- Ki Rill (6/18) Aug 14 2021 It happens at the end of the program only once and takes a
- bachmeier (12/21) Aug 23 2021 That might have been the point of your benchmark, but that
- russhy (11/39) Aug 23 2021 JIT isn't something you want if you need fast execution time
- bachmeier (4/5) Aug 23 2021 ?
- russhy (3/8) Aug 23 2021 that's why they are now spending their time writing an AOT
- Alexandru Ermicioi (2/4) Aug 23 2021 What does go have to do with aot development?
- Bienlein (4/15) Aug 24 2021 AOT in C#/Java is only to speed up startup times. It doesn't make
- russhy (4/20) Aug 24 2021 R2R is not true AOT, it still ship JIT and IL and recompile code
- Paulo Pinto (26/42) Aug 24 2021 That doesn't use all AOT options available to C# and Java.
- russhy (5/50) Aug 24 2021 Android is the worst of all
- Bienlein (10/15) Aug 24 2021 JIT is a very good compromise for programming languages that are
- Guillaume Piolat (3/9) Aug 14 2021 Using the `intel-intrinsics` package you can do 4x exp or log
- Tejas (15/25) Aug 14 2021 I know both D and C can theoretically reach the same level of
- Guillaume Piolat (12/19) Aug 14 2021 If you pay me I can produce a faster D version of whatever small
- max haughton (3/16) Aug 14 2021 ICC *has* moved to LLVM. Past tense now, sadly.
- max haughton (7/13) Aug 14 2021 If anyone is wondering why the GDC results look a bit weird: It's
- max haughton (10/24) Aug 14 2021 A little more: I got the performance down to be less awful by
- Imperatorn (3/9) Aug 23 2021 Interesting. I know people say benchmarks aren't important, but I
- russhy (2/12) Aug 23 2021 I agree
- H. S. Teoh (53/58) Aug 23 2021 I wouldn't say benchmarks aren't *important*, I think the issue is how
It's a simple benchmark examining: * execution time (sec) * memory consumption (kb) * binary size (kb) * conciseness of a programming language (lines of code) [Link](https://github.com/rillki/humble-benchmarks/tree/main/fishers-exact-test)
Aug 13 2021
On 8/13/21 7:19 PM, Ki Rill wrote:It's a simple benchmark examining: * execution time (sec) * memory consumption (kb) * binary size (kb) * conciseness of a programming language (lines of code) [Link](https://github.com/rillki/humble-benchmarks/tree/main/fishers-exact-test)The most obvious improvement I can see is removing the dynamic array creations in a loop. Since that array seems to be very short, using a static array would improve performance. (Ok, now I see that you already do that for the betterC and nogc versions.) Also, I wonder how disabling GC collections would affect execution time and memory consumption: https://dlang.org/spec/garbage.html#gc_config Ali
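The suggestion about hoisting short-lived allocations out of the hot loop generalizes beyond D's GC. A hedged C sketch of the same idea (hypothetical function names, scratch size fixed at 4 elements as an assumption):

```c
#include <stdlib.h>
#include <string.h>

/* heap-allocates a small scratch array on every iteration:
   one malloc/free round-trip per chunk (analogous to GC churn) */
long sum_chunks_heap(const int *data, size_t n) {
    long total = 0;
    for (size_t i = 0; i + 4 <= n; i += 4) {
        int *scratch = malloc(4 * sizeof *scratch);
        memcpy(scratch, data + i, 4 * sizeof *scratch);
        for (int j = 0; j < 4; j++) total += scratch[j];
        free(scratch);
    }
    return total;
}

/* identical logic with a fixed-size stack array: no allocator traffic */
long sum_chunks_stack(const int *data, size_t n) {
    long total = 0;
    int scratch[4];                 /* the array is known to be short */
    for (size_t i = 0; i + 4 <= n; i += 4) {
        memcpy(scratch, data + i, sizeof scratch);
        for (int j = 0; j < 4; j++) total += scratch[j];
    }
    return total;
}
```

Both functions compute the same sum; only the allocation strategy differs, which is the whole point of the static-array change in the betterC and nogc versions.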
Aug 13 2021
On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:It's a simple benchmark examining:`Das betterC` is very competitive. `d` should invest 'resources' in this.
Aug 13 2021
On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:It's a simple benchmark examining: * execution time (sec) * memory consumption (kb) * binary size (kb) * conciseness of a programming language (lines of code) [Link](https://github.com/rillki/humble-benchmarks/tree/main/fishers-exact-test)Lots of things to improve there. https://github.com/rillki/humble-benchmarks/pull/4 A nice quick morning exercise :)
Aug 14 2021
On Saturday, 14 August 2021 at 10:26:52 UTC, John Colvin wrote:On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:Thanks!It's a simple benchmark examining: * execution time (sec) * memory consumption (kb) * binary size (kb) * conciseness of a programming language (lines of code) [Link](https://github.com/rillki/humble-benchmarks/tree/main/fishers-exact-test)Lots of things to improve there. https://github.com/rillki/humble-benchmarks/pull/4 A nice quick morning exercise :)
Aug 14 2021
On Saturday, 14 August 2021 at 10:26:52 UTC, John Colvin wrote:On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:I have added the proposed changes. The performance of D increased to almost that of C, with a ~1-2 second difference when using LDC! The betterC version is still slightly faster though. To sum up:
```
Clang C          9.1 s
Clang C++        9.4 s
LDC Das betterC 10.3 s
LDC D libC math 12.2 s
Rust            13 s
```
Thank you John for your invaluable help! I didn't know that Phobos math is twice as slow as libC math.It's a simple benchmark examining: * execution time (sec) * memory consumption (kb) * binary size (kb) * conciseness of a programming language (lines of code) [Link](https://github.com/rillki/humble-benchmarks/tree/main/fishers-exact-test)Lots of things to improve there. https://github.com/rillki/humble-benchmarks/pull/4 A nice quick morning exercise :)
Aug 15 2021
On Sunday, 15 August 2021 at 09:20:56 UTC, Ki Rill wrote:On Saturday, 14 August 2021 at 10:26:52 UTC, John Colvin wrote:I could be wrong but I think our routines internally use the max precision when they can, so they are slower but they are also more precise in the internals (where allowed by the platform). You could probably test this by running these benchmarks on ARM or similar.On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:I have added the proposed changes. The performance of D increased to almost that of C with ~1-2 seconds difference if using LDC! The betterC version is still slightly faster though. To sum up: ``` Clang C 9.1 s Clang C++ 9.4 s LDC Das betterC 10.3 s LDC D libC math 12.2 s Rust 13 s ``` Thank you John for you invaluable help! I didn't know that Phobos math is twice as slow as libC math.It's a simple benchmark examining: * execution time (sec) * memory consumption (kb) * binary size (kb) * conciseness of a programming language (lines of code) [Link](https://github.com/rillki/humble-benchmarks/tree/main/fishers-exact-test)Lots of things to improve there. https://github.com/rillki/humble-benchmarks/pull/4 A nice quick morning exercise :)
Aug 15 2021
On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:* binary size (kb)Regarding the binary size: please make sure that you're using dynamic linking for the D package, as by default it always links statically, while libc and libc++ are always linked dynamically.
Aug 14 2021
On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:It's a simple benchmark examining: * execution time (sec) * memory consumption (kb) * binary size (kb) * conciseness of a programming language (lines of code) [Link](https://github.com/rillki/humble-benchmarks/tree/main/fishers-exact-test)I'm skeptical that you're measuring what you think you're measuring. I say this because the R version shouldn't be that much slower than the C version. All that happens when you call `fisher.test` is that it checks which case it's handling and then calls the builtin C function. For example, [this line](https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/fisher.test.R#L120). More likely a chunk of your C code is being eliminated by the optimizer. Another thing is that printing to the screen is much slower in R than in C. You shouldn't benchmark printing to the screen since that is not something you would ever do in practice. If you really want performance, you can determine which case applies to your code and then make the underlying `.Call` yourself. If you don't do that, you're comparing Fisher's exact test against a routine that does a lot more than Fisher's exact test. In any event, you're not comparing against an R implementation of this test.
Aug 14 2021
On Saturday, 14 August 2021 at 12:48:05 UTC, bachmeier wrote:On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote: More likely a chunk of your C code is being eliminated by the optimizer. Another thing is that printing to the screen is much slower in R than in C. You shouldn't benchmark printing to the screen since that is not something you would ever do in practice.It happens at the end of the program only once and takes a fraction of a second. I consider it to be irrelevant here.If you really want performance, you can determine which case applies to your code and then make the underlying `.Call` yourself. If you don't do that, you're comparing Fisher's exact test against a routine that does a lot more than Fisher's exact test. In any event, you're not comparing against an R implementation of this test.That is the point of this benchmark, to test it against the Python/R implementations irrespective of what they do additionally. And to test compiled languages in general.
Aug 14 2021
On Saturday, 14 August 2021 at 14:08:05 UTC, Ki Rill wrote:That might have been the point of your benchmark, but that doesn't mean the benchmark is meaningful, in this case for at least three reasons: 1. You're measuring the performance of completely different tasks in R and C, where the R task is much bigger. 2. What you've done is only one way to use R. Anyone that wanted performance would use .Call rather than what you're doing. 3. R has a JIT compiler, and you're likely not making use of it. The comparison against R is not what you're after anyway. If you don't want to do it in a way that's meaningful - and that's perfectly understandable - it's best to delete it.If you really want performance, you can determine which case applies to your code and then make the underlying `.Call` yourself. If you don't do that, you're comparing Fisher's exact test against a routine that does a lot more than Fisher's exact test. In any event, you're not comparing against an R implementation of this test.That is the point of this benchmark, to test it against Python/R implementation irrespective of what it does additionally. And to test compiled languages in general.
Aug 23 2021
On Monday, 23 August 2021 at 13:12:21 UTC, bachmeier wrote:On Saturday, 14 August 2021 at 14:08:05 UTC, Ki Rill wrote:JIT isn't something you want if you need fast execution time. And nobody is going to warm up a JIT 1000000 times to call a task; you want the result immediately. JITs are only reliable if the program calls the same code 100000000 times, which never happens, except under heavy load, which also almost never happens for most use cases other than webdev; and even then you get poor execution time because of cold startup. This benchmark even mentions it:That might have been the point of your benchmark, but that doesn't mean the benchmark is meaningful, in this case for at least three reasons: 1. You're measuring the performance of completely different tasks in R and C, where the R task is much bigger. 2. What you've done is only one way to use R. Anyone that wanted performance would use .Call rather than what you're doing. 3. R has a JIT compiler, and you're likely not making use of it. The comparison against R is not what you're after anyway. If you don't want to do it in a way that's meaningful - and that's perfectly understandable - it's best to delete it.If you really want performance, you can determine which case applies to your code and then make the underlying `.Call` yourself. If you don't do that, you're comparing Fisher's exact test against a routine that does a lot more than Fisher's exact test. In any event, you're not comparing against an R implementation of this test.That is the point of this benchmark, to test it against Python/R implementation irrespective of what it does additionally. And to test compiled languages in general.It's a simple benchmark examining: * execution time (sec) * memory consumption (kb) * binary size (kb) * conciseness of a programming language (lines of code)
Aug 23 2021
On Monday, 23 August 2021 at 17:35:59 UTC, russhy wrote:JIT isn't something you want if you need fast execution time? I suppose they spent all those hours writing their JIT compilers because they had nothing else to do with their time.
Aug 23 2021
On Monday, 23 August 2021 at 22:06:39 UTC, bachmeier wrote:On Monday, 23 August 2021 at 17:35:59 UTC, russhy wrote:that's why they are now spending their time writing an AOT compiler, after Go started to eat their cake ;)JIT isn't something you want if you need fast execution time? I suppose they spent all those hours writing their JIT compilers because they had nothing else to do with their time.
Aug 23 2021
On Monday, 23 August 2021 at 22:27:02 UTC, russhy wrote:that's why they are now spending their time writing an AOT compiler after GO started to ate their cake ;)What does Go have to do with AOT development?
Aug 23 2021
On Monday, 23 August 2021 at 22:27:02 UTC, russhy wrote:On Monday, 23 August 2021 at 22:06:39 UTC, bachmeier wrote:AOT in C#/Java is only to speed up startup times. It doesn't make anything faster, see https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/csharpcore-csharpaot.htmlOn Monday, 23 August 2021 at 17:35:59 UTC, russhy wrote:that's why they are now spending their time writing an AOT compiler after GO started to ate their cake ;)JIT isn't something you want if you need fast execution time? I suppose they spent all those hours writing their JIT compilers because they had nothing else to do with their time.
Aug 24 2021
On Tuesday, 24 August 2021 at 09:29:03 UTC, Bienlein wrote:On Monday, 23 August 2021 at 22:27:02 UTC, russhy wrote:R2R is not true AOT; it still ships the JIT and IL and recompiles code at runtime. That benchmarks game result is flawed and a pure lie. For true AOT you need to use NativeAOT (formerly CoreRT)On Monday, 23 August 2021 at 22:06:39 UTC, bachmeier wrote:make anything faster, see https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/csharpcore-csharpaot.htmlOn Monday, 23 August 2021 at 17:35:59 UTC, russhy wrote:that's why they are now spending their time writing an AOT compiler after GO started to ate their cake ;)JIT isn't something you want if you need fast execution time? I suppose they spent all those hours writing their JIT compilers because they had nothing else to do with their time.
Aug 24 2021
On Tuesday, 24 August 2021 at 09:29:03 UTC, Bienlein wrote:On Monday, 23 August 2021 at 22:27:02 UTC, russhy wrote:That doesn't use all AOT options available to C# and Java. There are the Burst compiler, UWP .NET Native, Windows 8 Bartok, and several community projects. Full AOT on regular .NET is coming in .NET 6, with final touches in .NET 7. Regarding Java, like C and C++, there are plenty of implementations to choose from, including a couple of commercial JDKs with proper AOT. Then both OpenJDK and OpenJ9 cache JIT code between runs, which gets improved each time the application runs thanks to PGO profiles. OpenJ9 and Azul go one step further by having AOT/JIT compiler daemons that generate native code with PGO data from the whole cluster. Finally Android, despite not being really Java, uses a hand-written assembly interpreter for fast startup, then JIT, and when the device is idle, the JIT code gets AOT-compiled with PGO gathered during each run. Starting with Android 10 the PGO profiles are shared across devices via the Play Store, so that AOT compilation can be done right away, skipping the whole interpreter/JIT step. Really, the benchmarks game is a joke, because they only use the basic FOSS tooling available to them. And after 20-25 years apparently plenty still don't know the Java and .NET ecosystems as they should.On Monday, 23 August 2021 at 22:06:39 UTC, bachmeier wrote:make anything faster, see https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/csharpcore-csharpaot.htmlOn Monday, 23 August 2021 at 17:35:59 UTC, russhy wrote:that's why they are now spending their time writing an AOT compiler after GO started to ate their cake ;)JIT isn't something you want if you need fast execution time? I suppose they spent all those hours writing their JIT compilers because they had nothing else to do with their time.
Aug 24 2021
On Tuesday, 24 August 2021 at 20:00:06 UTC, Paulo Pinto wrote:On Tuesday, 24 August 2021 at 09:29:03 UTC, Bienlein wrote:Android is the worst of all: a compiler on a supposedly resource-constrained device that runs on a limited power source (battery). Good on Apple for forbidding JIT in their store, one of their best movesOn Monday, 23 August 2021 at 22:27:02 UTC, russhy wrote:and Burst compiler, UWP .NET Native, Windows 8 Bartok, and several community projects. Full AOT on regular .NET is coming in .NET 6 with final touches in .NET 7. Regarding Java, like C and C++, there are plenty of implementations to choose from, a couple of commercial JDKs with proper AOT. Then both OpenJDK and OpenJ9 do JIT cache between runs, which gets improved each time the application runs thanks PGO profiles. OpenJ9 and Azul go one step further by having AOT/JIT compiler daemons that generates native code with PGO data from the whole cluster. Finally Android, despite not being really Java, uses an hand written Assembly interpreter for fast startup, then JIT, and when the device is idle, the JIT code gets AOT compiled with PGO gathered during each run. Starting with Android 10 the PGO profiles are shared across devices via the play store, so that AOT compilation can be done right away skipping the whole interpreter/JIT step. Really, the benchmarks game is a joke, because they only use the basic FOSS tooling available to them. And after 25/20 years apparently plenty still don't know Java and .NET ecosystems as they should.On Monday, 23 August 2021 at 22:06:39 UTC, bachmeier wrote:make anything faster, see https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/csharpcore-csharpaot.htmlOn Monday, 23 August 2021 at 17:35:59 UTC, russhy wrote:that's why they are now spending their time writing an AOT compiler after GO started to ate their cake ;)JIT isn't something you want if you need fast execution time? 
I suppose they spent all those hours writing their JIT compilers because they had nothing else to do with their time.
Aug 24 2021
On Monday, 23 August 2021 at 22:06:39 UTC, bachmeier wrote:On Monday, 23 August 2021 at 17:35:59 UTC, russhy wrote:JIT is a very good compromise for programming languages that are intended for application development and also server side development. also working on making calls to C functions easy. For application development using a jitter and moving performance intensive features into C programs is an approach that has shown to work well already in the old times about 20 years ago when Smalltalk had a time where it was doing commercially well.JIT isn't something you want if you need fast execution time? I suppose they spent all those hours writing their JIT compilers because they had nothing else to do with their time.
Aug 24 2021
On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:It's a simple benchmark examining: * execution time (sec) * memory consumption (kb) * binary size (kb) * conciseness of a programming language (lines of code) [Link](https://github.com/rillki/humble-benchmarks/tree/main/fishers-exact-test)Using the `intel-intrinsics` package you can do 4x exp or log operations at once.
Aug 14 2021
On Saturday, 14 August 2021 at 14:14:08 UTC, Guillaume Piolat wrote:On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:I know both D and C can theoretically reach the same level of performance, but why does C **always** lead by a few milliseconds? What is it that we aren't doing? Is it the implementation's fault? The optimizer? What can we do for those precious few milliseconds? It's so frustrating to see C/C++ always being the winners in the **absolute** sense, and we always end up making the argument about how much more painstaking it is to actually create a complete program in those languages only for negligibly better performance. Do these benchmarks even matter if it's all about the quality of implementation? Sorry if I'm sounding a little bitter.It's a simple benchmark examining: * execution time (sec) * memory consumption (kb) * binary size (kb) * conciseness of a programming language (lines of code) [Link](https://github.com/rillki/humble-benchmarks/tree/main/fishers-exact-test)Using the `intel-intrinsics` package you can do 4x exp or log operations at once.
Aug 14 2021
On Saturday, 14 August 2021 at 16:20:21 UTC, Tejas wrote:It's so frustrating to see C/C++ always being the winners in the **absolute** sense, and we always end up making the argument about how much more painstaking it is to actually create a complete program in those languages only for negligibly better performance. Do these benchmarks even matter if it's all about the quality of implementation?If you pay me I can produce a faster D version of whatever small program you want. But the reality is that no one really _needs_ those benchmark programs, and thinking about optimizing them is an adequate punishment for writing them. The only thing I can think of where C++ could win a bit against D was that the ICC compiler could auto-vectorize transcendentals, like logf in a loop, which LLVM doesn't do. But the ICC compiler has been moving to LLVM recently. When your compiler sees the same IR from different front-end languages, in the end it is the same codegen.
Aug 14 2021
On Saturday, 14 August 2021 at 19:28:42 UTC, Guillaume Piolat wrote:On Saturday, 14 August 2021 at 16:20:21 UTC, Tejas wrote:ICC *has* moved to LLVM. Past tense now, sadly.[...]If you pay me I can produce a faster D version of whatever small program you want. But the reality is that noone really _needs_ those benchmark programs, and thinking about optimizing them is an adequate punishment for writing them. The only things I can think of where C++ could wins a bit against D was that the ICC compiler could auto-vectorize transcendentals. Like logf in a loop, which LLVM doesn't do. But the ICC compiler has been moving to LLVM recently. When your compiler see the same IR from different front-end language, in the end it is the same codegen.
Aug 14 2021
On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:It's a simple benchmark examining: * execution time (sec) * memory consumption (kb) * binary size (kb) * conciseness of a programming language (lines of code) [Link](https://github.com/rillki/humble-benchmarks/tree/main/fishers-exact-test)If anyone is wondering why the GDC results look a bit weird: It's because GDC doesn't actually inline unless you compile with LTO or enable whole program optimization (The rationale is due to the interaction of linking with templates). https://godbolt.org/z/Gj8hMjEch play with removing the '-fwhole-program' flag on that link.
Aug 14 2021
On Saturday, 14 August 2021 at 14:29:16 UTC, max haughton wrote:On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:A little more: I got the performance down to be less awful by using LTO (and found an LTO ICE in the process...), but as far as I can tell the limiting factor for GDC is that its standard library by default doesn't seem to be compiled with either inlining or LTO support enabled, so cycles are being wasted on (say) calling isNaN, sadly. I also note that X87 code is generated in Phobos, which could hypothetically be required for the necessary precision on a generic target, but is probably quite slow.It's a simple benchmark examining: * execution time (sec) * memory consumption (kb) * binary size (kb) * conciseness of a programming language (lines of code) [Link](https://github.com/rillki/humble-benchmarks/tree/main/fishers-exact-test)If anyone is wondering why the GDC results look a bit weird: It's because GDC doesn't actually inline unless you compile with LTO or enable whole program optimization (The rationale is due to the interaction of linking with templates). https://godbolt.org/z/Gj8hMjEch play with removing the '-fwhole-program' flag on that link.
Aug 14 2021
On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:It's a simple benchmark examining: * execution time (sec) * memory consumption (kb) * binary size (kb) * conciseness of a programming language (lines of code) [Link](https://github.com/rillki/humble-benchmarks/tree/main/fishers-exact-test)Interesting. I know people say benchmarks aren't important, but I disagree. I think it's healthy to compare from time to time 👍
Aug 23 2021
On Monday, 23 August 2021 at 07:52:01 UTC, Imperatorn wrote:On Saturday, 14 August 2021 at 02:19:02 UTC, Ki Rill wrote:I agreeIt's a simple benchmark examining: * execution time (sec) * memory consumption (kb) * binary size (kb) * conciseness of a programming language (lines of code) [Link](https://github.com/rillki/humble-benchmarks/tree/main/fishers-exact-test)Interesting. I know people say benchmarks aren't important, but I disagree. I think it's healthy to compare from time to time 👍
Aug 23 2021
On Mon, Aug 23, 2021 at 05:37:01PM +0000, russhy via Digitalmars-d wrote:On Monday, 23 August 2021 at 07:52:01 UTC, Imperatorn wrote:[...]I wouldn't say benchmarks aren't *important*, I think the issue is how you interpret the results. Sometimes people read too much into it, i.e., over-generalize it far beyond the scope of the actual test, forgetting that a benchmark is simply just that: a benchmark. It's a rough estimation of performance in a contrived use case that may or may not reflect real-world usage. Many of the simpler benchmarks consist of repeating a task 100000... times, which can be useful to measure, but hardly represents real-world usage. You can optimize the heck out of memcpy (substitute your favorite function to measure here) until it beats everybody else in the benchmark, but that does not necessarily mean it will actually make your real-life program run faster. It may, it may not. Because after running the 100th time, all the code/data you're using has gone into cache, so it will run much faster than the first few iterations. But in a real program, you usually don't need to call memcpy (etc) 100000 times in a row. Usually you need to do something else in between, and that something else may cause the memcpy-related data to be evicted from cache, change the state of the branch predictor, etc.. So your ultra-optimized code may not behave the same way in a real-life program vs. the benchmark, and may turn out to be actually slower than expected. Note that this does not mean optimizing for a benchmark is completely worthless; usually the first few iterations represents actual bottlenecks in the code, and improving that will improve performance in a real-life scenario. But up to a certain point. Trying to optimize beyond that, and you risk the danger of optimizing for your specific benchmark instead of the general use-case, and therefore may end up pessimizing the code instead of optimizing it. 
Real-life example: GNU wc, which counts the number of lines in a text file. Some time ago I did a comparison with several different D implementations of the same program that I wrote, and discovered that because GNU wc uses glibc's memchr, which was ultra-optimized for scanning large buffers, if your text files contains long lines then wc will run faster; but if the text file contains many short lines, a naïve File.byLine / foreach (ch; line) D implementation will actually beat wc. The reason is that glibc's memchr implementation is optimized for large buffers, and has a rather expensive setup overhead for the main fast-scanning loop. For long lines, the fast-scan more than outweighs this overhead, but for short lines, the overhead dominates the running time, so a naïve char-by-char implementation works faster. I don't know the history behind glibc's memchr implementation, but I'll bet that at some point somebody came up with a benchmark that tests scanning of large buffers and said, look, this complicated bit-hack fast-scanning loop improves the benchmark by X%! So the code change was accepted. But not realizing that this fast scanning of large buffers comes at the cost of pessimizing the small buffers case. Past a certain point, optimization becomes a trade-off, and which route you go depends on your specific application, not some general benchmarks that do not represent real-world usage patterns. T -- Famous last words: I *think* this will work...Interesting. I know people say benchmarks aren't important, but I disagree. I think it's healthy to compare from time to time 👍I agree
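The two strategies can be sketched in C (hypothetical function names, my own sketch). On the same buffer they must agree on the count; only their cost profiles differ with line length, as described above:

```c
#include <string.h>
#include <stddef.h>

/* line counting via memchr: the library's fast-scanning loop wins on
   long lines, but its per-call setup overhead is paid once per line */
size_t count_lines_memchr(const char *buf, size_t len) {
    size_t n = 0;
    const char *p = buf, *end = buf + len;
    while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        n++;
        p++;
    }
    return n;
}

/* naive char-by-char scan: no per-call setup, so it often wins when
   lines are short and memchr's overhead dominates */
size_t count_lines_naive(const char *buf, size_t len) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\n')
            n++;
    return n;
}
```

Benchmarking either one in isolation on a single line-length distribution is precisely the trap described above: each implementation looks "fastest" on the inputs it was tuned for.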
Aug 23 2021