digitalmars.D - Standard D, Mir D benchmarks against Numpy (BLAS)
- Pavel Shkadzko (34/34) Mar 12 2020 I have done several benchmarks against Numpy for various 2D
- rikki cattermole (3/3) Mar 12 2020 You forgot to disable the GC for both benches.
- Pavel Shkadzko (7/10) Mar 12 2020 Thank you.
- rikki cattermole (7/18) Mar 12 2020 Okay that means no GC collection was triggered during your benchmarks.
- 9il (2/36) Mar 12 2020 Haha
- 9il (8/13) Mar 12 2020 Generally speaking, the D/Mir code of the benchmark is slow by
- 9il (3/18) Mar 12 2020 Ah, never mind, the forum table didn't show the Mir numbers aligned.
- jmh530 (6/14) Mar 12 2020 I saw your subsequent post about not seeing the numbers, but I
- Pavel Shkadzko (8/23) Mar 12 2020 Didn't understand. You argue against D/Mir usage when talking to
- 9il (9/33) Mar 12 2020 Agreed. I just misunderstood the table at the forum, it was
- Pavel Shkadzko (3/15) Mar 12 2020 Thank you for the comments!
- 9il (5/23) Mar 12 2020 Phobos sort bench bug report:
- p.shkadzko (6/36) Mar 12 2020 I am actually intrigued with the timings of huge matrices. Why
- bachmeier (7/12) Mar 12 2020 Been quite a while since I worked with numpy, but I think that's
- Patrick Schluter (3/10) Mar 13 2020 The interpreter getting in the way of the hardware prefetcher,
- jmh530 (13/14) Mar 12 2020 Looked into some of those that aren't faster than numpy:
- Pavel Shkadzko (2/17) Mar 12 2020 Numpy uses BLAS "gemm" and D uses OpenBlas "gemm".
- 9il (4/7) Mar 12 2020 Depending on the system they can use the same or configure
- drug (12/15) Mar 12 2020 How long does the benchmark run? It has already taken 20 min and continues to...
- Pavel Shkadzko (4/19) Mar 12 2020 For Numpy Python it's ~1m 30s, but for all D benchmarks it takes
- drug (2/5) Mar 12 2020 Hmm, I see. In my case the benchmark hangs up in dgemm_ infinitely(
- Jacob Carlborg (6/15) Mar 14 2020 Have you tried to compile with LTO (Link Time Optimization) and PGO
- 9il (4/19) Mar 14 2020 The problem is that Numpy uses its own version of OpenBLAS, that
- Pavel Shkadzko (4/16) Mar 15 2020 My version of NumPy is installed with anaconda and it looks like
- Pavel Shkadzko (6/21) Mar 15 2020 If for LTO the dub.json dflags-ldc: ["-flto=full"] is enough then
- Jon Degenhardt (29/52) Mar 15 2020 Try:
- 9il (4/10) Mar 15 2020 LTO and PGO are useless for this kind of stuff. Nothing to
I have done several benchmarks against Numpy for various 2D matrix operations. The purpose was mere curiosity and to spread the word about the Mir D library among the office data engineers. Since I am not a D expert, I would be happy if someone could take a second look and double-check.

https://github.com/tastyminerals/mir_benchmarks

Compile and run the project via: dub run --compiler=ldc --build=release

*Table descriptions reduced to fit into post width.

+---------------------------------+---------------------+--------------------+---------------------+
| Description                     | Numpy (BLAS) (sec.) | Standard D (sec.)  | Mir D (sec.)        |
+---------------------------------+---------------------+--------------------+---------------------+
| sum of two 250x200 (50 loops)   | 0.00115             | 0.00400213(x3.5)   | 0.00014372(x1/8)    |
| mult of two 250x200 (50 loops)  | 0.0011578           | 0.0132323(x11.4)   | 0.00013852(x1/8.3)  |
| sum of two 500x600 (50 loops)   | 0.0101275           | 0.016496(x1.6)     | 0.00021556(x1/47)   |
| mult of two 500x600 (50 loops)  | 0.010182            | 0.06857(x6.7)      | 0.00021717(x1/47)   |
| sum of two 1k x 1k (50 loops)   | 0.0493201           | 0.0614544(x1.3)    | 0.000422135(x1/117) |
| mult of two 1k x 1k (50 loops)  | 0.0493693           | 0.233827(x4.7)     | 0.000453535(x1/109) |
| Scalar product of two 30k       | 0.0152186           | 0.0227465(x1.5)    | 0.0198812(x1.3)     |
| Dot product of 5k x 6k, 6k x 5k | 1.6084685           | --------------     | 2.03398(x1.2)       |
| L2 norm of 5k x 6k              | 0.0072423           | 0.0160546(x2.2)    | 0.0110136(x1.6)     |
| Quicksort of 5k x 6k            | 2.6516816           | 0.178071(x1/14.8)  | 1.52406(x1/0.6)     |
+---------------------------------+---------------------+--------------------+---------------------+
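To give a feel for the shape of the code being measured, here is a minimal sketch of the element-wise sum in Mir (not the repository's exact code; the shape and fill values are arbitrary):

```
import mir.ndslice;

void main()
{
    // two 250x200 double matrices, filled from a lazy index range
    auto a = [250, 200].iota.as!double.slice;
    auto b = [250, 200].iota.as!double.slice;

    // element-wise sum: copy a, then add b in place
    auto c = a.slice;
    c[] += b;
}
```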
Mar 12 2020
You forgot to disable the GC for both benches. Also, fastmath for standard_ops_bench. FYI: standard_ops_bench does a LOT of memory allocations.
Mar 12 2020
On Thursday, 12 March 2020 at 13:18:41 UTC, rikki cattermole wrote:
> You forgot to disable the GC for both benches. Also, fastmath for
> standard_ops_bench. FYI: standard_ops_bench does a LOT of memory
> allocations.

Thank you. Add GC.disable; inside the main function, right? It didn't really change anything for any of the benchmarks; maybe I did it wrong. Does fastmath work only on functions with plain loops, or on everything with math ops? It is not clear from the LDC docs.
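A minimal sketch of both suggestions, assuming an LDC build (@fastmath comes from the LDC-only ldc.attributes module and applies per function; the function here is hypothetical):

```
import core.memory : GC;
import ldc.attributes : fastmath;

// fastmath relaxes IEEE floating-point semantics for this function only,
// so every hot loop needs its own annotation.
@fastmath double sumAll(double[] data)
{
    double total = 0;
    foreach (x; data)
        total += x;
    return total;
}

void main()
{
    GC.disable(); // no collection cycles during the timed runs
    auto data = new double[1_000_000];
    data[] = 1.5;
    auto total = sumAll(data);
}
```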
Mar 12 2020
On 13/03/2020 3:27 AM, Pavel Shkadzko wrote:
> On Thursday, 12 March 2020 at 13:18:41 UTC, rikki cattermole wrote:
>> [...]
>
> Thank you. Add GC.disable; inside the main function, right? It didn't
> really change anything for any of the benchmarks; maybe I did it wrong.

Okay, that means no GC collection was triggered during your benchmarks. This is good to know: it means the performance problems are indeed on your end and not runtime related.

> Does fastmath work only on functions with plain loops, or on everything
> with math ops? It is not clear from the LDC docs.

Try it :) I have no idea how much it'll help. You have used it on one but not the other, so it seems odd not to do it on both.
Mar 12 2020
On Thursday, 12 March 2020 at 12:59:41 UTC, Pavel Shkadzko wrote:
> I have done several benchmarks against Numpy for various 2D matrix
> operations. [...]

Haha
Mar 12 2020
On Thursday, 12 March 2020 at 12:59:41 UTC, Pavel Shkadzko wrote:
> I have done several benchmarks against Numpy for various 2D matrix
> operations. The purpose was mere curiosity and to spread the word about
> the Mir D library among the office data engineers. Since I am not a D
> expert, I would be happy if someone could take a second look and
> double-check.

Generally speaking, the D/Mir code of the benchmark is slow because of how it has been written. I am not urging you to use D/Mir. Furthermore, sometimes I advise my clients not to use it if they can avoid it. On commercial request, I can write the benchmark or an applied algorithm so that D/Mir will beat numpy in all the tests, including gemm.

--Ilya
Mar 12 2020
On Thursday, 12 March 2020 at 14:00:48 UTC, 9il wrote:
> Generally speaking, the D/Mir code of the benchmark is slow because of
> how it has been written. [...]

Ah, never mind, the forum table didn't show the Mir numbers aligned. Thank you for the work. I will open an MR with a few addons.
Mar 12 2020
On Thursday, 12 March 2020 at 14:00:48 UTC, 9il wrote:
> [snip]
>
> Generally speaking, the D/Mir code of the benchmark is slow because of
> how it has been written. I am not urging you to use D/Mir. Furthermore,
> sometimes I advise my clients not to use it if they can avoid it. On
> commercial request, I can write the benchmark or an applied algorithm
> so that D/Mir will beat numpy in all the tests, including gemm. --Ilya

I saw your subsequent post about not seeing the numbers, but I think my broader response is that most people don't need to get every single drop of performance. Typical performance for numpy versus typical performance for mir is still valuable information for people to know.
Mar 12 2020
On Thursday, 12 March 2020 at 14:00:48 UTC, 9il wrote:
> Generally speaking, the D/Mir code of the benchmark is slow because of
> how it has been written. I am not urging you to use D/Mir. Furthermore,
> sometimes I advise my clients not to use it if they can avoid it. [...]

I didn't understand. You argue against D/Mir usage when talking to your clients?

Actually, I feel like it is also useful to have unoptimized D code benchmarked, because this is how most people will write their code when they first write it. Although, I can hardly call these benchmarks unoptimized, because I use LDC optimization flags as well as some tips from you.
Mar 12 2020
On Thursday, 12 March 2020 at 14:37:13 UTC, Pavel Shkadzko wrote:
> I didn't understand. You argue against D/Mir usage when talking to your
> clients?

It depends on the problem they wanted me to solve.

> Actually, I feel like it is also useful to have unoptimized D code
> benchmarked, because this is how most people will write their code when
> they first write it. [...]

Agreed. I just misunderstood the table at the forum; it was misaligned for me. The numbers look cool, thank you for the benchmark. Mir sorting looks slower than Phobos, which is interesting and needs a fix. You can use Phobos sorting with ndslice the same way, with `each` (a sketch follows below).

Minor updates: https://github.com/tastyminerals/mir_benchmarks/pull/1
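A minimal sketch of that suggestion, assuming mir-algorithm and a mutable contiguous matrix (the shape and fill are arbitrary):

```
import mir.ndslice;
import mir.algorithm.iteration : each;
import std.algorithm.sorting : sort;

void main()
{
    // 4x5 matrix counting down from 20, so there is something to sort
    auto m = [4, 5].iota(20, -1).slice;

    // apply Phobos sort to every row of the ndslice
    m.byDim!0.each!sort;
}
```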
Mar 12 2020
On Thursday, 12 March 2020 at 15:34:58 UTC, 9il wrote:
> [...]
>
> Minor updates: https://github.com/tastyminerals/mir_benchmarks/pull/1

Thank you for the comments! Looks like I will be updating the benchmark tables today :)
Mar 12 2020
On Thursday, 12 March 2020 at 15:46:47 UTC, Pavel Shkadzko wrote:
> Thank you for the comments! Looks like I will be updating the benchmark
> tables today :)

Phobos sort bench bug report: https://github.com/tastyminerals/mir_benchmarks/issues/2

Another small update that changes the ratio a lot: https://github.com/tastyminerals/mir_benchmarks/pull/3
Mar 12 2020
On Thursday, 12 March 2020 at 15:34:58 UTC, 9il wrote:
> [...]

I am actually intrigued by the timings for huge matrices. Why are Mir D and Standard D so much better than NumPy? Once we get to the 500x600 and 1000x1000 sizes, there is a huge drop in performance for NumPy and not so much for D. You mentioned the L3 cache, but the CPU architecture is the same for all the benchmarks, so what's going on?
Mar 12 2020
On Thursday, 12 March 2020 at 20:39:59 UTC, p.shkadzko wrote:
> I am actually intrigued by the timings for huge matrices. Why are Mir D
> and Standard D so much better than NumPy? [...]

It's been quite a while since I worked with numpy, but I think that's where you're hitting memory limits (easier to do with Python than with D), and that causes performance to deteriorate quickly. I had those problems with R, and I believe it's relatively easy to hit that constraint with numpy as well, but you definitely want to find a numpy expert to confirm -- something I definitely am not.
Mar 12 2020
On Thursday, 12 March 2020 at 20:39:59 UTC, p.shkadzko wrote:
> [...]

The interpreter getting in the way of the hardware prefetcher, maybe.
Mar 13 2020
On Thursday, 12 March 2020 at 12:59:41 UTC, Pavel Shkadzko wrote:
> [snip]

I looked into some of those that aren't faster than numpy:

For the dot product (what I would just call matrix multiplication), both functions are using gemm. There might be some quirks that have caused a difference in performance, but otherwise I would expect them to be pretty close, and they are. It looks like you are allocating the output matrix with the GC, which could be a driver of the difference.

For the L2 norm, you are calculating it entry-wise as a Frobenius norm. That should be the same as the default for numpy. For numpy, the only difference I can tell between yours and theirs is that numpy re-uses its dot product function. Otherwise it looks the same.
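For reference, a minimal sketch of the entry-wise Frobenius computation using mir-algorithm's reduce (this is not the benchmark's actual code; the matrix here is arbitrary):

```
import mir.ndslice : iota, slice;
import mir.algorithm.iteration : reduce;
import mir.math.common : sqrt;

// Frobenius norm: square root of the sum of squared entries
double frobenius(S)(S matrix)
{
    return reduce!((acc, x) => acc + double(x) * x)(0.0, matrix).sqrt;
}

void main()
{
    auto m = [3, 4].iota(1).slice; // 3x4 matrix holding 1..12
    auto norm = frobenius(m);      // sqrt(1^2 + 2^2 + ... + 12^2)
}
```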
Mar 12 2020
On Thursday, 12 March 2020 at 14:12:14 UTC, jmh530 wrote:
> [snip]

Numpy uses BLAS "gemm" and D uses OpenBLAS "gemm".
Mar 12 2020
On Thursday, 12 March 2020 at 15:18:43 UTC, Pavel Shkadzko wrote:
> Numpy uses BLAS "gemm" and D uses OpenBLAS "gemm".

Depending on the system, they can use the same backend or be configured with a specific one, like OpenBLAS or Intel MKL (certain for Mir; NumPy likely allows this as well).
Mar 12 2020
On 3/12/20 3:59 PM, Pavel Shkadzko wrote:
> [snip]

How long does the benchmark run? It has already taken 20 min and continues to run in the "Mir D" stage.

P.S. Probably the reason is that I use
```
"subConfigurations": {"mir-blas": "blas"},
```
instead of
```
"subConfigurations": {"mir-blas": "twolib"},
```
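For context, a minimal dub.json sketch showing where that key lives (the package name and wildcard versions are placeholders):

```
{
    "name": "mir_benchmarks",
    "dependencies": {
        "mir-algorithm": "*",
        "mir-blas": "*"
    },
    "subConfigurations": {
        "mir-blas": "twolib"
    }
}
```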
Mar 12 2020
On Thursday, 12 March 2020 at 14:26:14 UTC, drug wrote:
> How long does the benchmark run? It has already taken 20 min and
> continues to run in the "Mir D" stage. [...]

For Numpy Python it's ~1m 30s, but all the D benchmarks together take around ~2m on my machine, which I think is the real benchmark here :)
Mar 12 2020
On 3/12/20 5:30 PM, Pavel Shkadzko wrote:
> For Numpy Python it's ~1m 30s, but all the D benchmarks together take
> around ~2m on my machine, which I think is the real benchmark here :)

Hmm, I see. In my case the benchmark hangs in dgemm_ indefinitely :(
Mar 12 2020
On 2020-03-12 13:59, Pavel Shkadzko wrote:
> I have done several benchmarks against Numpy for various 2D matrix
> operations. [...]
>
> https://github.com/tastyminerals/mir_benchmarks
>
> Compile and run the project via: dub run --compiler=ldc --build=release

Have you tried to compile with LTO (Link Time Optimization) and PGO (Profile Guided Optimization) enabled? You should also link with the versions of Phobos and druntime that have been compiled with LTO.

--
/Jacob Carlborg
Mar 14 2020
On Saturday, 14 March 2020 at 08:01:33 UTC, Jacob Carlborg wrote:
> Have you tried to compile with LTO (Link Time Optimization) and PGO
> (Profile Guided Optimization) enabled? [...]

The problem is that Numpy uses its own version of OpenBLAS, which is multithreaded, including for Level 1 BLAS operations like the L2 norm and dot product, while the D code is single-threaded.
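For a fair single-threaded comparison, the BLAS side can be pinned to one thread. A sketch, assuming the D binary links against OpenBLAS (openblas_set_num_threads is part of OpenBLAS's C API):

```
// declare the OpenBLAS C entry point; no extra binding package needed
extern (C) void openblas_set_num_threads(int num_threads);

void main()
{
    // pin OpenBLAS to one thread so its gemm/nrm2 timings are
    // comparable with the single-threaded D loops
    openblas_set_num_threads(1);
    // ... run the benchmarks ...
}
```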
Mar 14 2020
On Saturday, 14 March 2020 at 09:34:55 UTC, 9il wrote:
> The problem is that Numpy uses its own version of OpenBLAS, which is
> multithreaded, including for Level 1 BLAS operations like the L2 norm
> and dot product, while the D code is single-threaded.

My version of NumPy is installed with anaconda, and it looks like the anaconda numpy package comes with the MKL libraries. I have updated the benchmarks with respect to single/multi thread.
Mar 15 2020
On Saturday, 14 March 2020 at 08:01:33 UTC, Jacob Carlborg wrote:
> Have you tried to compile with LTO (Link Time Optimization) and PGO
> (Profile Guided Optimization) enabled? You should also link with the
> versions of Phobos and druntime that have been compiled with LTO.

If the dub.json dflags-ldc: ["-flto=full"] is enough for LTO, then it doesn't improve anything. For PGO, I am a bit confused about how to use it with dub -- dflags-ldc: ["-O3"]? It compiles, but I see no difference. By default, ldc2 should be using O2 -- good optimizations.
Mar 15 2020
On Sunday, 15 March 2020 at 12:13:39 UTC, Pavel Shkadzko wrote:
> If the dub.json dflags-ldc: ["-flto=full"] is enough for LTO, then it
> doesn't improve anything.

Try:

"dflags-ldc" : ["-flto=thin", "-defaultlib=phobos2-ldc-lto,druntime-ldc-lto", "-singleobj"]

The "-defaultlib=..." parameter engages LTO for Phobos and druntime. You can also use "-flto=full" rather than "thin"; I've had good results with "thin". Not sure if the "-singleobj" parameter helps.

> For PGO, I am a bit confused about how to use it with dub -- dflags-ldc:
> ["-O3"]? It compiles, but I see no difference. By default, ldc2 should
> be using O2 -- good optimizations.

PGO (profile guided optimization) is a multi-step process. The first step is to create an instrumented build (-fprofile-instr-generate). The second step is to run the instrumented binary on a representative workload. The last step is to use the resulting profile data in the final build (-fprofile-instr-use). For information on PGO, see Johan Engelen's blog page: https://johanengelen.github.io/ldc/2016/07/15/Profile-Guided-Optimization-with-LDC.html

I have done studies on LTO and PGO and found both beneficial, often significantly. The largest gains came in code running in tight loops that included code pulled from libraries (e.g. Phobos, druntime). It was hard to predict which code was going to benefit from LTO/PGO.

I've found it tricky to use dub for the full PGO process (creating the instrumented build, generating the profile data, and using it in the final build). Mostly I've used make for this. I did get it to work in a simple performance test app: https://github.com/jondegenhardt/dcat-perf. It doesn't document how the PGO steps work, but its dub.json file is relatively short, and the repository README.md contains the build instructions for both LTO and LTO plus PGO.

--Jon
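Pulling those flags together, a minimal dub.json sketch with LTO in a dedicated build type (the build type name is arbitrary; the -lto libraries ship with LDC):

```
{
    "name": "mir_benchmarks",
    "buildTypes": {
        "release-lto": {
            "buildOptions": ["releaseMode", "optimize", "inline"],
            "dflags-ldc": [
                "-flto=thin",
                "-defaultlib=phobos2-ldc-lto,druntime-ldc-lto",
                "-singleobj"
            ]
        }
    }
}
```

Built this way, the run command becomes: dub run --compiler=ldc2 --build=release-lto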
Mar 15 2020
On Sunday, 15 March 2020 at 20:15:07 UTC, Jon Degenhardt wrote:
> Try:
>
> "dflags-ldc" : ["-flto=thin", "-defaultlib=phobos2-ldc-lto,druntime-ldc-lto", "-singleobj"]
>
> [...]

LTO and PGO are useless for this kind of stuff. There is nothing to inline; the code is too simple and generic. There is nothing to apply this technology to.
Mar 15 2020