digitalmars.D.announce - MIR vs. Numpy
- Tobias Schmidt (10/10) Nov 18 2020 Dear all,
- Bastiaan Veelo (12/18) Nov 18 2020 Nice numbers. I’m not a Python guy but I was under the impression
- John Colvin (18/38) Nov 18 2020 A lot of numpy is in C, C++, fortran, asm etc....
- jmh530 (7/17) Nov 18 2020 Very nice write up.
- 9il (2/6) Nov 18 2020 -O is added by DUB
- Max Haughton (3/10) Nov 18 2020 Just -O? LDC is quite impressive with lto and
- jmh530 (2/4) Nov 18 2020 Ah, the -release-nobounds
- Tobias Schmidt (6/8) Nov 20 2020 The number was meant as the number of used threads in our runs.
- 9il (12/22) Nov 18 2020 Thank you a lot! It is a huge benefit for Mir and D to have so
Dear all,
to compare MIR and Numpy in the HPC context, we implemented a multigrid solver in Python using Numpy and in D using Mir, and performed some benchmarks with them. You can find our code and results here: https://github.com/typohnebild/numpy-vs-mir
Feedback is very welcome. Please feel free to open issues, pull requests or simply post your thoughts below.
Kind regards,
Tobias
Nov 18 2020
On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt wrote:
> Dear all, to compare MIR and Numpy in the HPC context, we implemented a multigrid solver in Python using Numpy and in D using Mir and performed some benchmarks with them. You can find our code and results here: https://github.com/typohnebild/numpy-vs-mir

Nice numbers. I'm not a Python guy, but I was under the impression that Numpy is actually written in C, so that when you benchmark Numpy you're mostly benchmarking C, not Python. I had therefore expected the Numpy performance to be much closer to D's. An important factor, I think, which I'm not sure you have discussed (I didn't look too closely), is the compiler backend that was used to compile D and Numpy. Then again, as a user one is mostly interested in the out-of-the-box performance, which this seems to be a good measure of.

Bastiaan.
Nov 18 2020
On Wednesday, 18 November 2020 at 13:01:42 UTC, Bastiaan Veelo wrote:
> On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt wrote:
>> Dear all, to compare MIR and Numpy in the HPC context, we implemented a multigrid solver in Python using Numpy and in D using Mir and performed some benchmarks with them. You can find our code and results here: https://github.com/typohnebild/numpy-vs-mir
> Nice numbers. I'm not a Python guy but I was under the impression that Numpy actually is written in C, so that when you benchmark Numpy you're mostly benchmarking C, not Python. Therefore I had expected the Numpy performance to be much closer to D's. An important factor I think, which I'm not sure you have discussed (didn't look too closely), is the compiler backend that was used to compile D and Numpy. Then again, as a user one is mostly interested in the out-of-the-box performance, which this seems to be a good measure of.
> Bastiaan.

A lot of numpy is in C, C++, Fortran, asm etc. But when you chain a bunch of things together, you are going via Python. The language boundary (and Python being slow) means that internal iteration in native code is a requirement for performance, which leads to eager allocation for composability via Python, which then hurts performance. Numpy makes a very good effort, but is always constrained by this.

Clever schemes with laziness, where operations in Python are actually just composing operations for execution later/on demand, can work as an alternative, but a) that's hard and b) even if you can completely avoid calling back into Python during iteration, you would still need JIT to really unlock the performance.

Julia fixes this by having all/most in one language which is JIT'd. D can do the same with templates AOT, like C++/Eigen does, but with more flexible and less terrifying code. That's (one part of) what mir provides.
Nov 18 2020
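The eager-allocation point made above can be sketched in a few lines. This is only a structural illustration under stated assumptions: plain Python lists stand in for arrays, and all function names (eager_add, eager_scale, chained_eager, fused) are hypothetical, not part of Numpy or Mir. It shows the shape of the problem (one temporary and one pass per chained step versus a single fused pass), not its performance, since here both variants still run in the Python interpreter.

```python
# Lists stand in for arrays; every name below is hypothetical.

def eager_add(a, b):
    """Element-wise add that materialises a full result list (a temporary)."""
    return [x + y for x, y in zip(a, b)]


def eager_scale(a, factor):
    """Element-wise scale that also materialises a full result list."""
    return [x * factor for x in a]


def chained_eager(a, b, c):
    """(a + b) * 0.5 + c as separate steps: two temporaries, three passes."""
    tmp1 = eager_add(a, b)          # first temporary
    tmp2 = eager_scale(tmp1, 0.5)   # second temporary
    return eager_add(tmp2, c)       # final result


def fused(a, b, c):
    """The same expression in one pass with no intermediate lists."""
    return [(x + y) * 0.5 + z for x, y, z in zip(a, b, c)]


a = [1.0, 2.0, 3.0]
b = [3.0, 4.0, 5.0]
c = [10.0, 20.0, 30.0]

# Both variants compute the same values; only the number of passes and
# intermediate allocations differs.
assert chained_eager(a, b, c) == fused(a, b, c)
```

In a library driven from Python, each chained step would typically run in native code but still produce a full intermediate array, which resembles chained_eager. A compiler that sees the whole expression, ahead of time or via JIT, is in a position to generate something shaped like fused.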
On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt wrote:
> Dear all, to compare MIR and Numpy in the HPC context, we implemented a multigrid solver in Python using Numpy and in D using Mir and performed some benchmarks with them. You can find our code and results here: https://github.com/typohnebild/numpy-vs-mir
> Feedback is very welcome. Please feel free to open issues, pull requests or simply post your thoughts below.
> Kind regards, Tobias

Very nice write-up. It's been a while since I've used numba, so I was a little confused by the numba 1 and numba 8 runs. It also looks like you are compiling on ldc with -mcpu=native --boundscheck=off. Why not -O as well?
Nov 18 2020
On Wednesday, 18 November 2020 at 13:14:37 UTC, jmh530 wrote:
> It also looks like you are compiling on ldc with -mcpu=native --boundscheck=off. Why not -O as well?

-O is added by DUB
Nov 18 2020
On Wednesday, 18 November 2020 at 15:20:19 UTC, 9il wrote:
> On Wednesday, 18 November 2020 at 13:14:37 UTC, jmh530 wrote:
>> It also looks like you are compiling on ldc with -mcpu=native --boundscheck=off. Why not -O as well?
> -O is added by DUB

Just -O? LDC is quite impressive with LTO and cross-module inlining turned on.
Nov 18 2020
On Wednesday, 18 November 2020 at 15:20:19 UTC, 9il wrote:
> [snip]
> -O is added by DUB

Ah, the -release-nobounds
Nov 18 2020
Thanks for all of your feedback!

On Wednesday, 18 November 2020 at 13:14:37 UTC, jmh530 wrote:
> It's been a while since I've used numba, so I was a little confused on the numba 1 and numba 8 runs.

The number is the number of threads used in our runs. The prefix indicates whether numba was used (numba) or not (nonumba). We have added a section to clarify this. Thanks for the hint.
Nov 20 2020
On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt wrote:
> Dear all, to compare MIR and Numpy in the HPC context, we implemented a multigrid solver in Python using Numpy and in D using Mir and performed some benchmarks with them. You can find our code and results here: https://github.com/typohnebild/numpy-vs-mir
> Feedback is very welcome. Please feel free to open issues, pull requests or simply post your thoughts below.
> Kind regards, Tobias

Thank you a lot! It is a huge benefit for Mir and D to have such quality benchmarks.

Python's sweep_3D accesses memory only once per element computation, while the old D sweep_slice accesses it 7 times. A PR [1] with a new version of sweep_slice has been added; I expect it to be at least twice as fast. The new sweep_slice uses a more D-ish approach and a single memory access per computed element.

[1] https://github.com/typohnebild/numpy-vs-mir/pull/1

Cheers,
Ilya
Nov 18 2020
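For readers unfamiliar with the kind of kernel being discussed, here is a minimal sketch of a 7-point stencil sweep on a 3D grid. It is not the repository's sweep_3D or sweep_slice. It assumes the model problem Δu = f on a uniform cubic grid with spacing h, uses nested lists instead of arrays, and performs a plain in-place Gauss-Seidel sweep; the names sweep and make_grid are hypothetical. Each update touches the six neighbours of a point plus the right-hand side, which is why the way those values are addressed matters for a memory-bound kernel.

```python
# Hypothetical names; nested lists stand in for 3D arrays.

def make_grid(n, value):
    """Build an n x n x n grid filled with `value`."""
    return [[[value for _ in range(n)] for _ in range(n)] for _ in range(n)]


def sweep(u, f, h):
    """One in-place Gauss-Seidel sweep over the interior points of u.

    Assumes the discretisation (sum of six neighbours - 6*u) / h^2 = f.
    """
    n = len(u)
    h2 = h * h
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                neighbours = (
                    u[i - 1][j][k] + u[i + 1][j][k]
                    + u[i][j - 1][k] + u[i][j + 1][k]
                    + u[i][j][k - 1] + u[i][j][k + 1]
                )
                u[i][j][k] = (neighbours - h2 * f[i][j][k]) / 6.0
    return u


# Smallest possible case: a 3x3x3 grid has exactly one interior point.
n = 3
u = make_grid(n, 1.0)   # boundary values are all 1.0
u[1][1][1] = 0.0        # initial guess for the interior point
f = make_grid(n, 0.0)   # zero right-hand side
sweep(u, f, h=0.5)
```

With a zero right-hand side, the single interior point is replaced by the average of its six neighbours.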