digitalmars.D - Mir GLAS vs Intel MKL: which is faster?
- Ilya Yaroshenko (9/9) Sep 24 2016 Yesterday I announced [1] blog post [2] about Mir [3] benchmark.
- Martin Nowak (6/11) Sep 24 2016 Let me run that on a desktop machine before you publish the
- Ilya Yaroshenko (13/25) Sep 24 2016 This will be good addition! Thank you!
- Joseph Rushton Wakeling (9/11) Sep 26 2016 Is this what you mean by your description of the results as e.g.
- Ilya Yaroshenko (4/15) Sep 26 2016 I mean that for single precision numbers I have 2 charts (normal
- Joseph Rushton Wakeling (5/7) Sep 26 2016 Ah, OK. Would still be nice to have a note, though, on how the
- Ilya Yaroshenko (4/11) Sep 26 2016 The data is the same. The first chart represents absolute values,
- rikki cattermole (3/11) Sep 24 2016 For giggles can we get a comparison against dmc for Intel MKL assuming
- Ilya Yaroshenko (6/23) Sep 24 2016 Intel MKL is closed source. In the same time I don't think that a
- Andrei Alexandrescu (34/42) Sep 24 2016 Awesome. Good to see that most of the graphs have a nice blue envelope
- Andrei Alexandrescu (2/2) Sep 24 2016 Also the linkedin photo is much better than the one at the bottom of the...
- John Colvin (7/9) Sep 24 2016 That's just BLAS (so could be mkl, could be openBLAS, could be
- Andrei Alexandrescu (5/13) Sep 24 2016 I see, thanks. To the extent the Python-specific overheads are
- jmh530 (9/11) Sep 24 2016 Here are some benchmarks from Eigen and Blaze for comparison
- Andrei Alexandrescu (8/20) Sep 24 2016 OK. Yah, native Python wouldn't make sense. It may be worth mentioning
- Ilya Yaroshenko (5/25) Sep 24 2016 Eigen was added (but only data, still need to write text).
- ZombineDev (7/37) Sep 24 2016 It would also be interesting to compare the results to Blaze [1].
- Ilya Yaroshenko (3/17) Sep 25 2016 It has not CBLAS interface like Eigen, so additional efforts are
- Andrei Alexandrescu (18/20) Sep 24 2016 Looks awesome. Couple more nits after one more pass:
- Ilya Yaroshenko (5/11) Sep 25 2016 Thank for the review! I have added notes about Eigen and CBLAS
- Ali Çehreli (18/30) Sep 25 2016 Some more:
- Ilya Yaroshenko (2/29) Sep 25 2016 Thank you, fixed
- Joseph Rushton Wakeling (2/6) Sep 26 2016 "lastest" -> "latest" ... ?
- Joseph Rushton Wakeling (10/12) Sep 26 2016 One extra suggestion:
- Ilya Yaroshenko (3/15) Sep 26 2016 Thank you, added
- Ilya Yaroshenko (4/13) Sep 24 2016 Seems like libeigen_blas.dylib and libeigen_blas_static.a does
- Ilya Yaroshenko (3/19) Sep 24 2016 Fixed with Netlib CBLAS
- dextorious (18/18) Sep 24 2016 First of all, awesome work. It's great to see that it's possible
- Ilya Yaroshenko (3/9) Sep 24 2016 Thank you !!! --Ilya
- WebFreak001 (5/15) Sep 24 2016 I think you should put the Mir.GLAS graph in front of all the
- Andrei Alexandrescu (5/20) Sep 24 2016 Also, one other class of plots that would be informative: performance of...
- Joseph Rushton Wakeling (13/15) Sep 26 2016 One other place that a little more explanation could be helpful
- Ilya Yaroshenko (11/26) Sep 26 2016 Updated:
- Edwin van Leeuwen (4/8) Sep 26 2016 It doesn't really require LDC though, it just requires it to get
- Edwin van Leeuwen (4/12) Sep 26 2016 I would say something like:
- Ilya Yaroshenko (9/17) Sep 26 2016 No, LDC is required. I plan to update DUB for quick testing
- Joseph Rushton Wakeling (8/18) Sep 26 2016 Hmmm, I was thinking more along the lines of just describing
- Ilya Yaroshenko (3/5) Sep 26 2016 Thanks, fixed
- Johan Engelen (14/17) Sep 26 2016 I guess this is my terrain. I'll think about writing that blog
- Edwin van Leeuwen (5/23) Sep 26 2016 Ah, I was not aware that DMD support was dropped completely. I
- Johan Engelen (4/9) Sep 26 2016 "_much_"
- Edwin van Leeuwen (4/13) Sep 26 2016 I love LDC, I just also tend to use DMD for testing and won't
- Ilya Yaroshenko (9/41) Sep 26 2016 Shame is that D is not popular. I think that Mir can replace C /
- Ilya Yaroshenko (3/20) Sep 26 2016 EDIT: that Mir can help D to replace ...
- Andrei Alexandrescu (2/40) Sep 26 2016 I think we need to make it a point to support Mir in dmd. -- Andrei
- jmh530 (3/5) Sep 26 2016 +1, even if it's slow.
- Johan Engelen (10/16) Sep 26 2016 I thought so too but if the algorithm is 50x slower, it probably
- Ilya Yaroshenko (4/6) Sep 26 2016 new thread
Yesterday I announced [1] a blog post [2] about the Mir [3] benchmark. Intel MKL and Apple Accelerate were added to the benchmark today. Please help improve the blog post during this weekend. It will be announced on Reddit. [1] http://forum.dlang.org/thread/yhfbuxnrqkiqtvsnvngf forum.dlang.org [2] http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html [3] http://mir.dlang.io
Sep 24 2016
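For readers skimming the thread: GEMM, the operation benchmarked throughout, computes C ← αAB + βC. A minimal pure-Python reference implementation for illustration only; the libraries under test use blocked, vectorized kernels rather than anything like this:

```python
# Naive reference for the GEMM operation benchmarked in this thread:
# C <- alpha * A @ B + beta * C, with A (m x k), B (k x n), C (m x n).
def gemm(alpha, a, b, beta, c):
    m, k = len(a), len(a[0])
    n = len(b[0])
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = alpha * acc + beta * c[i][j]
    return c

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
print(gemm(2.0, a, b, 1.0, c))  # scales A*B by 2 and adds the old C
```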
On Saturday, 24 September 2016 at 07:20:25 UTC, Ilya Yaroshenko wrote:Yesterday I announced [1] blog post [2] about Mir [3] benchmark. Intel MKL and Apple Accelerate was added to the benchmark today. Please help to improve the blog post during this weekend. It will be announced in the Reddit.Let me run that on a desktop machine before you publish the results; I have a Core i7-6700 w/ 2133 MHz DDR4 RAM here. Mobile CPUs often don't reproduce the same numbers, e.g. https://github.com/dlang/druntime/pull/1603#issuecomment-231543115.
Sep 24 2016
On Saturday, 24 September 2016 at 08:13:22 UTC, Martin Nowak wrote:On Saturday, 24 September 2016 at 07:20:25 UTC, Ilya Yaroshenko wrote:This will be a good addition! Thank you! Please use `dub build ...` and then run the report at least 2 times, and choose the better one. GEMM uses the CPU cache intensively, and the OS and other apps may significantly hurt performance. So it makes sense to rerun a test if something looks wrong. Benchmark code: https://github.com/libmir/mir/blob/master/benchmarks/glas/gemm_report.d You can ping us on Gitter or open an issue if you need help with the benchmark setup. Gitter: https://gitter.im/libmir/publicYesterday I announced [1] blog post [2] about Mir [3] benchmark. Intel MKL and Apple Accelerate was added to the benchmark today. Please help to improve the blog post during this weekend. It will be announced in the Reddit.Let me run that on a desktop machine before you publish the results, I have a Core i7-6700 w/ 2133 MHz DDR4 RAM here. Mobile CPU often don't reproduce the same numbers, e.g. https://github.com/dlang/druntime/pull/1603#issuecomment-231543115.
Sep 24 2016
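The "run at least twice and take the better result" advice above can be sketched as a generic best-of-N timing helper. This is a Python illustration of the idea, not the thread's actual D benchmark harness:

```python
import time

def best_of(n, func, *args):
    """Run func n times and return (best_seconds, result).

    Taking the minimum, rather than the mean, filters out runs where
    the OS or other processes polluted the CPU cache, as described in
    the post above.
    """
    best = float("inf")
    result = None
    for _ in range(n):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result

seconds, value = best_of(2, sum, range(1_000_000))
print(f"best of 2 runs: {seconds:.6f}s, result {value}")
```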
On Saturday, 24 September 2016 at 09:14:38 UTC, Ilya Yaroshenko wrote:Please use `dub build ...` and then run report at least 2 times, and choice a better one.Is this what you mean by your description of the results as e.g. "single precision numbers x2", "double precision numbers x2", etc.? Might be better, instead of the "x2", to offer a small one-sentence description, e.g. "Each benchmark was run twice for each matrix size, and the better of the two runs was chosen in each case."
Sep 26 2016
On Monday, 26 September 2016 at 09:46:50 UTC, Joseph Rushton Wakeling wrote:On Saturday, 24 September 2016 at 09:14:38 UTC, Ilya Yaroshenko wrote:I mean that for single precision numbers I have 2 charts (normal and normalized).Please use `dub build ...` and then run report at least 2 times, and choice a better one.Is this what you mean by your description of the results as e.g. "single precision numbers x2", "double precision numbers x2", etc.? Might be better, instead of the "x2", to offer a small one-sentence description, e.g. "Each benchmark was run twice for each matrix size, and the better of the two runs was chosen in each case."
Sep 26 2016
On Monday, 26 September 2016 at 10:01:44 UTC, Ilya Yaroshenko wrote:I mean that for single precision numbers I have 2 charts (normal and normalized).Ah, OK. Would still be nice to have a note, though, on how the numbers in the charts are generated, i.e. are they the result of a single run, best of N, average of N ... ?
Sep 26 2016
On Monday, 26 September 2016 at 11:03:40 UTC, Joseph Rushton Wakeling wrote:On Monday, 26 September 2016 at 10:01:44 UTC, Ilya Yaroshenko wrote:The data is the same. The first chart represents absolute values, the second chart represents normalised values. I will add a note.I mean that for single precision numbers I have 2 charts (normal and normalized).Ah, OK. Would still be nice to have a note, though, on how the numbers in the charts are generated, i.e. are they the result of a single run, best of N, average of N ... ?
Sep 26 2016
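The two chart types discussed here (absolute GFLOPS, and values normalized to Mir GLAS at 100%, as Andrei suggests later in the thread) can be derived from the same timing data. A sketch with hypothetical, made-up timings:

```python
def gemm_gflops(m, n, k, seconds):
    # A general matrix-matrix multiply performs about 2*m*n*k flops
    # (one multiply and one add per inner-product term).
    return 2.0 * m * n * k / seconds / 1e9

# Hypothetical timings (seconds) for a 1000x1000x1000 GEMM; not measured data.
timings = {"Mir GLAS": 0.060, "OpenBLAS": 0.055, "Eigen": 0.080}
absolute = {lib: gemm_gflops(1000, 1000, 1000, t) for lib, t in timings.items()}
# Normalized chart: each library as a percentage of Mir GLAS.
relative = {lib: 100.0 * g / absolute["Mir GLAS"] for lib, g in absolute.items()}
for lib in timings:
    print(f"{lib}: {absolute[lib]:.1f} GFLOPS, {relative[lib]:.0f}% of Mir")
```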
On 24/09/2016 7:20 PM, Ilya Yaroshenko wrote:Yesterday I announced [1] blog post [2] about Mir [3] benchmark. Intel MKL and Apple Accelerate was added to the benchmark today. Please help to improve the blog post during this weekend. It will be announced in the Reddit. [1] http://forum.dlang.org/thread/yhfbuxnrqkiqtvsnvngf forum.dlang.org [2] http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html [3] http://mir.dlang.ioFor giggles can we get a comparison against dmc for Intel MKL assuming of course it compiles?
Sep 24 2016
On Saturday, 24 September 2016 at 12:08:33 UTC, rikki cattermole wrote:On 24/09/2016 7:20 PM, Ilya Yaroshenko wrote:Intel MKL is closed source. At the same time, I don't think the choice of compiler matters for OpenBLAS, Intel MKL, and Apple Accelerate, because their computation kernels are written in assembly.Yesterday I announced [1] blog post [2] about Mir [3] benchmark. Intel MKL and Apple Accelerate was added to the benchmark today. Please help to improve the blog post during this weekend. It will be announced in the Reddit. [1] http://forum.dlang.org/thread/yhfbuxnrqkiqtvsnvngf forum.dlang.org [2] http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html [3] http://mir.dlang.ioFor giggles can we get a comparison against dmc for Intel MKL assuming of course it compiles?
Sep 24 2016
On 9/24/16 3:20 AM, Ilya Yaroshenko wrote:Yesterday I announced [1] blog post [2] about Mir [3] benchmark. Intel MKL and Apple Accelerate was added to the benchmark today. Please help to improve the blog post during this weekend. It will be announced in the Reddit. [1] http://forum.dlang.org/thread/yhfbuxnrqkiqtvsnvngf forum.dlang.org [2] http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html [3] http://mir.dlang.ioAwesome. Good to see that most of the graphs have a nice blue envelope :o). Could you also add a comparison with SciPy? People often say it's just fine for scientific computing. A few correx: "The post represents performance benchmark" -> "This post presents performance benchmarks" "most of numerical" -> "most numerical" "for example Julia Programing Language" -> "for example the Julia Programming Language" "Mir GLAS is Generic Linear Algebra Subroutines. It has single generic kernel for all targets, all floating point and complex types." -> "Mir GLAS (Generic Linear Algebra Subroutines) has a single generic kernel for all CPU targets, all floating point types, and all complex types." "In addition, Mir GLAS Level 3 kernels are not unrolled and produce tiny binary code." -> "In addition, Mir GLAS Level 3 kernels are not unrolled and produce tiny binary code, so they put less pressure on the instruction cache in large applications." "To add new architecture" -> "To add a new architecture" "needs to extend small GLAS configuration file" -> "needs to extend one small GLAS configuration file" "configuration is available for" -> "configurations are available for" "Mir GLAS has native mir.ndslice interface." -> "Mir GLAS offers a native interface in module mir.ndslice." "for almost all cases" -> "for virtually all benchmarks and parameters" "Ilya is IT consultant, statistician. He has experience in distributed High Load services, business process analyses. He is the author of std.experimental.ndslice and Mir founder. 
He was a GSoC mentor for the D Language Foundation and Mir project." -> "Ilya is an IT consultant with a background in statistics. He has experience in distributed high-load services and business process analyses. He is the creator of the Mir library, including std.experimental.ndslice in the D Standard Library. He mentored a related GSoC project for the D Language Foundation." Andrei
Sep 24 2016
Also the linkedin photo is much better than the one at the bottom of the benchmark page. -- Andrei
Sep 24 2016
On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei Alexandrescu wrote:Could you also add a comparison with SciPy? People often say it's just fine for scientific computing.That's just BLAS (so could be mkl, could be openBLAS, could be netlib, etc. just depends on the system and compilation choices) under the hood, you'd just see a small overhead from the python wrapping. Basically, everyone uses a BLAS or Eigen. An Eigen comparison would be interesting.
Sep 24 2016
On 9/24/16 9:18 AM, John Colvin wrote:On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei Alexandrescu wrote:I see, thanks. To the extent the Python-specific overheads are measurable, it might make sense to include the benchmark.Could you also add a comparison with SciPy? People often say it's just fine for scientific computing.That's just BLAS (so could be mkl, could be openBLAS, could be netlib, etc. just depends on the system and compilation choices) under the hood, you'd just see a small overhead from the python wrapping. Basically, everyone uses a BLAS or Eigen.An Eigen comparison would be interesting.That'd be awesome especially since the article text refers to it. Andrei
Sep 24 2016
On Saturday, 24 September 2016 at 13:49:35 UTC, Andrei Alexandrescu wrote:I see, thanks. To the extent the Python-specific overheads are measurable, it might make sense to include the benchmark.Here are some benchmarks from Eigen and Blaze for comparison http://eigen.tuxfamily.org/index.php?title=Benchmark https://bitbucket.org/blaze-lib/blaze/wiki/Benchmarks They don't include Python, for the reason mentioned above (no one would use native python implementation of matrix multiplication, it just calls some other library). I don't see a reason to include it here.
Sep 24 2016
On 09/24/2016 10:26 AM, jmh530 wrote:On Saturday, 24 September 2016 at 13:49:35 UTC, Andrei Alexandrescu wrote:OK. Yah, native Python wouldn't make sense. It may be worth mentioning that SciPy uses BLAS so it has the same performance profile. Also, a great idea for a followup would be a blog post comparing the source code for a typical linear algebra real-world task. The idea being, yes the D version has parity with Intel, but there _is_ a reason to switch to it because of its ease of use. AndreiI see, thanks. To the extent the Python-specific overheads are measurable, it might make sense to include the benchmark.Here are some benchmarks from Eigen and Blaze for comparison http://eigen.tuxfamily.org/index.php?title=Benchmark https://bitbucket.org/blaze-lib/blaze/wiki/Benchmarks They don't include Python, for the reason mentioned above (no one would use native python implementation of matrix multiplication, it just calls some other library). I don't see a reason to include it here.
Sep 24 2016
On Saturday, 24 September 2016 at 13:49:35 UTC, Andrei Alexandrescu wrote:On 9/24/16 9:18 AM, John Colvin wrote:Eigen was added (but only the data; I still need to write the text). Relative charts were added. You were added to the "Acknowledgements" section --IlyaOn Saturday, 24 September 2016 at 12:52:09 UTC, Andrei Alexandrescu wrote:I see, thanks. To the extent the Python-specific overheads are measurable, it might make sense to include the benchmark.Could you also add a comparison with SciPy? People often say it's just fine for scientific computing.That's just BLAS (so could be mkl, could be openBLAS, could be netlib, etc. just depends on the system and compilation choices) under the hood, you'd just see a small overhead from the python wrapping. Basically, everyone uses a BLAS or Eigen.An Eigen comparison would be interesting.That'd be awesome especially since the article text refers to it. Andrei
Sep 24 2016
On Saturday, 24 September 2016 at 17:46:07 UTC, Ilya Yaroshenko wrote:On Saturday, 24 September 2016 at 13:49:35 UTC, Andrei Alexandrescu wrote:It would also be interesting to compare the results to Blaze [1]. According to https://www.youtube.com/watch?v=hfn0BVOegac it is faster than Eigen and on some instances faster than even Intel MKL. [1]: https://bitbucket.org/blaze-lib/blazeOn 9/24/16 9:18 AM, John Colvin wrote:Eigen was added (but only data, still need to write text). Relative charts was added. You was added "Acknowledgements" section --IlyaOn Saturday, 24 September 2016 at 12:52:09 UTC, Andrei Alexandrescu wrote:I see, thanks. To the extent the Python-specific overheads are measurable, it might make sense to include the benchmark.Could you also add a comparison with SciPy? People often say it's just fine for scientific computing.That's just BLAS (so could be mkl, could be openBLAS, could be netlib, etc. just depends on the system and compilation choices) under the hood, you'd just see a small overhead from the python wrapping. Basically, everyone uses a BLAS or Eigen.An Eigen comparison would be interesting.That'd be awesome especially since the article text refers to it. Andrei
Sep 24 2016
On Saturday, 24 September 2016 at 18:15:30 UTC, ZombineDev wrote:On Saturday, 24 September 2016 at 17:46:07 UTC, Ilya Yaroshenko wrote:Blaze has no CBLAS interface like Eigen does, so additional effort is required.On Saturday, 24 September 2016 at 13:49:35 UTC, Andrei Alexandrescu wrote:It would also be interesting to compare the results to Blaze [1]. According to https://www.youtube.com/watch?v=hfn0BVOegac it is faster than Eigen and on some instances faster than even Intel MKL. [1]: https://bitbucket.org/blaze-lib/blaze[...]Eigen was added (but only data, still need to write text). Relative charts was added. You was added "Acknowledgements" section --Ilya
Sep 25 2016
On 09/24/2016 01:46 PM, Ilya Yaroshenko wrote:Eigen was added (but only data, still need to write text). Relative charts was added.Looks awesome. Couple more nits after one more pass: "numerical and scientific projects" -> "numeric and scientific projects" "OpenBLAS Haswell computation kernels" -> "The OpenBLAS Haswell computation kernels" "To add a new architecture or target an engineer" -> "To add a new architecture or target, an engineer" "configurations are available for X87, SSE2, AVX, and AVX2 instruction sets" -> "configurations are available for the X87, SSE2, AVX, and AVX2 instruction sets" In the machine, you may want to specify the amount of L2 cache (I think it's 6 MB) Instead of "Recent" MKL, a version number would be more precise Relative performance plots should specify "percent", i.e. "Performance relative to Mir" -> "Performance relative to Mir [%]" "General Matrix-matrix Multiplication" -> "General Matrix-Matrix Multiplication" Andrei
Sep 24 2016
On Saturday, 24 September 2016 at 19:01:47 UTC, Andrei Alexandrescu wrote:On 09/24/2016 01:46 PM, Ilya Yaroshenko wrote:Thanks for the review! I have added notes about Eigen and a CBLAS interface example. Ilya[...]Looks awesome. Couple more nits after one more pass: "numerical and scientific projects" -> "numeric and scientific projects" [...]
Sep 25 2016
On 09/25/2016 03:45 AM, Ilya Yaroshenko wrote:On Saturday, 24 September 2016 at 19:01:47 UTC, Andrei Alexandrescu wrote:Some more: In the same time, CBLAS interface is unwieldy -> On the other hand, CBLAS interface is unwieldy (Or something better?) GLAS calling conversion -> GLAS calling convention single precisions -> single precision (Several occurrences) double precisions -> double precision (Several occurrences) Stay in touch with the lastest developments in scientific computing for D. -> (I will let others recommend something better there but neither "stay in touch" nor "lastest" sounds right to my ears. :) ) AliOn 09/24/2016 01:46 PM, Ilya Yaroshenko wrote:Thank for the review! I have added notes about Eigen and CBLAS interface example. Ilya[...]Looks awesome. Couple more nits after one more pass: "numerical and scientific projects" -> "numeric and scientific projects" [...]
Sep 25 2016
On Sunday, 25 September 2016 at 23:03:27 UTC, Ali Çehreli wrote:On 09/25/2016 03:45 AM, Ilya Yaroshenko wrote:Thank you, fixedOn Saturday, 24 September 2016 at 19:01:47 UTC, Andrei Alexandrescu wrote:Some more: In the same time, CBLAS interface is unwieldy -> On the other hand, CBLAS interface is unwieldy (Or something better?) GLAS calling conversion -> GLAS calling convention single precisions -> single precision (Several occurrences) double precisions -> double precision (Several occurrences) Stay in touch with the lastest developments in scientific computing for D. -> (I will let others recommend something better there but neither "stay in touch" nor "lastest" sounds right to my ears. :) ) Ali[...]Thank for the review! I have added notes about Eigen and CBLAS interface example. Ilya
Sep 25 2016
On Sunday, 25 September 2016 at 23:03:27 UTC, Ali Çehreli wrote:Stay in touch with the lastest developments in scientific computing for D. -> (I will let others recommend something better there but neither "stay in touch" nor "lastest" sounds right to my ears. :) )"lastest" -> "latest" ... ?
Sep 26 2016
On Sunday, 25 September 2016 at 10:45:35 UTC, Ilya Yaroshenko wrote:Thank for the review! I have added notes about Eigen and CBLAS interface example.One extra suggestion: "Mir GLAS has native mir.ndslice interface" -> "Mir GLAS has a native mir.ndslice interface" I would also suggest adding a small note on what `ndslice` is, e.g. "mir.ndslice is a development version of std.experimental.ndslice, which provides an N-dimensional equivalent of D's built-in array slicing."
Sep 26 2016
On Monday, 26 September 2016 at 08:57:06 UTC, Joseph Rushton Wakeling wrote:On Sunday, 25 September 2016 at 10:45:35 UTC, Ilya Yaroshenko wrote:Thank you, addedThank for the review! I have added notes about Eigen and CBLAS interface example.One extra suggestion: "Mir GLAS has native mir.ndslice interface" -> "Mir GLAS has a native mir.ndslice interface" I would also suggest adding a small note on what `ndslice` is, e.g. "mir.ndslice is a development version of std.experimental.ndslice, which provides an N-dimensional equivalent of D's built-in array slicing."
Sep 26 2016
On Saturday, 24 September 2016 at 13:18:14 UTC, John Colvin wrote:On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei Alexandrescu wrote:It seems that libeigen_blas.dylib and libeigen_blas_static.a do not contain the _cblas_sgemm symbol, for example. Do they work for you?Could you also add a comparison with SciPy? People often say it's just fine for scientific computing.That's just BLAS (so could be mkl, could be openBLAS, could be netlib, etc. just depends on the system and compilation choices) under the hood, you'd just see a small overhead from the python wrapping. Basically, everyone uses a BLAS or Eigen. An Eigen comparison would be interesting.
Sep 24 2016
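The missing-symbol problem above can be diagnosed without running a benchmark at all (on macOS, `nm libeigen_blas.dylib | grep cblas_sgemm` also works). A Python sketch using ctypes, demonstrated against the C math library since the Eigen build is not assumed to be present; the library name "eigen_blas" in the comment is hypothetical:

```python
import ctypes
import ctypes.util

def exports_symbol(libname, symbol):
    """Return True if the shared library is found and exports `symbol`."""
    path = ctypes.util.find_library(libname)
    if path is None:
        return False
    lib = ctypes.CDLL(path)
    # Attribute lookup performs the dlsym; it fails for missing symbols.
    return hasattr(lib, symbol)

# Demonstrated on the C math library. For the Eigen case one would call
# exports_symbol("eigen_blas", "cblas_sgemm") with the library on the path.
print(exports_symbol("m", "cos"))
print(exports_symbol("m", "cblas_sgemm"))
```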
On Saturday, 24 September 2016 at 14:59:32 UTC, Ilya Yaroshenko wrote:On Saturday, 24 September 2016 at 13:18:14 UTC, John Colvin wrote:Fixed with Netlib CBLASOn Saturday, 24 September 2016 at 12:52:09 UTC, Andrei Alexandrescu wrote:Seems like libeigen_blas.dylib and libeigen_blas_static.a does not contain _cblas_sgemm symbol for example. Does they work for you?Could you also add a comparison with SciPy? People often say it's just fine for scientific computing.That's just BLAS (so could be mkl, could be openBLAS, could be netlib, etc. just depends on the system and compilation choices) under the hood, you'd just see a small overhead from the python wrapping. Basically, everyone uses a BLAS or Eigen. An Eigen comparison would be interesting.
Sep 24 2016
First of all, awesome work. It's great to see that it's possible to match or even exceed the performance of hand-crafted assembly implementations with generic code. I would suggest adding more information on how the Eigen results were obtained. Unlike OpenBLAS, Eigen performance does often vary by compiler and varies greatly depending on the kind of preprocessor macros that are defined. In particular, EIGEN_NO_DEBUG is defined by default and reduces performance, EIGEN_FAST_MATH is not defined by default but can often increase performance and EIGEN_STACK_ALLOCATION_LIMIT matters greatly for performance on very small matrices (where MKL and especially OpenBLAS are very inefficient). It's been a while since I've used Eigen, so I may have forgotten one or two. It may also be worth noting in the blog post that these are all single threaded comparisons and multithreaded implementations are on the way. This is obvious to anyone who's followed the development of Mir, but a general audience on Reddit will likely point it out as a deficiency unless stated upfront.
Sep 24 2016
On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei Alexandrescu wrote:On 9/24/16 3:20 AM, Ilya Yaroshenko wrote:Thank you !!! --Ilya[...]Awesome. Good to see that most of the graphs have a nice blue envelope :o). Could you also add a comparison with SciPy? People often say it's just fine for scientific computing. [...]
Sep 24 2016
On Saturday, 24 September 2016 at 07:20:25 UTC, Ilya Yaroshenko wrote:Yesterday I announced [1] blog post [2] about Mir [3] benchmark. Intel MKL and Apple Accelerate was added to the benchmark today. Please help to improve the blog post during this weekend. It will be announced in the Reddit. [1] http://forum.dlang.org/thread/yhfbuxnrqkiqtvsnvngf forum.dlang.org [2] http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html [3] http://mir.dlang.ioI think you should put the Mir.GLAS graph in front of all the other graphs; right now the others are overlapping it. It would probably look a bit better if Mir.GLAS were in the front
Sep 24 2016
On 9/24/16 8:59 AM, WebFreak001 wrote:On Saturday, 24 September 2016 at 07:20:25 UTC, Ilya Yaroshenko wrote:Also, one other class of plots that would be informative: performance of all other libraries normalized to Mir. The Y axis would be in percentages with Mir at 100%. Then people can easily see what relative gains to expect. -- AndreiYesterday I announced [1] blog post [2] about Mir [3] benchmark. Intel MKL and Apple Accelerate was added to the benchmark today. Please help to improve the blog post during this weekend. It will be announced in the Reddit. [1] http://forum.dlang.org/thread/yhfbuxnrqkiqtvsnvngf forum.dlang.org [2] http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html [3] http://mir.dlang.ioI think you should put the Mir.GLAS graph in front of all the other graphs, right now they are overlapping on that graph. Would probably look a bit better if Mir.GLAS was in the front
Sep 24 2016
On Saturday, 24 September 2016 at 07:20:25 UTC, Ilya Yaroshenko wrote:Please help to improve the blog post during this weekend. It will be announced in the Reddit.One other place that a little more explanation could be helpful is this sentence: "It is written completely in D for LDC (LLVM D Compiler), without any assembler blocks." It would be nice to describe (if it can be summarized in a sentence) why Mir GLAS relies on LDC and/or LLVM, and what differences in outcome can be expected if one uses a different compiler (will it not work at all, or just not as well?). The broader topic of what compiler features Mir GLAS uses could be the topic of an entire blog post in its own right, and might be very interesting.
Sep 26 2016
On Monday, 26 September 2016 at 11:11:20 UTC, Joseph Rushton Wakeling wrote:On Saturday, 24 September 2016 at 07:20:25 UTC, Ilya Yaroshenko wrote:Updated: Mir is LLVM-Accelerated Generic Numerical Library for Science and Machine Learning. It requires LDC (LLVM D Compiler) for compilation. Mir GLAS (Generic Linear Algebra Subprograms) has a single generic kernel for all CPU targets, all floating point types, and all complex types. It is written completely in D, without any assembler blocks. In addition, Mir GLAS Level 3 kernels are not unrolled and produce tiny binary code, so they put less pressure on the instruction cache in large applications.Please help to improve the blog post during this weekend. It will be announced in the Reddit.One other place that a little more explanation could be helpful is this sentence: "It is written completely in D for LDC (LLVM D Compiler), without any assembler blocks." It would be nice to describe (if it can be summarized in a sentence) why Mir GLAS relies on LDC and/or LLVM, and what differences in outcome can be expected if one uses a different compiler (will it not work at all, or just not as well?). The broader topic of what compiler features Mir GLAS uses could be the topic of an entire blog post in its own right, and might be very interesting.
Sep 26 2016
On Monday, 26 September 2016 at 11:32:20 UTC, Ilya Yaroshenko wrote:Updated: Mir is LLVM-Accelerated Generic Numerical Library for Science and Machine Learning. It requires LDC (LLVM D Compiler) for compilation.It doesn't really require LDC though, it just requires it to get good performance? I can still use DMD for quick testing?
Sep 26 2016
On Monday, 26 September 2016 at 11:36:11 UTC, Edwin van Leeuwen wrote:On Monday, 26 September 2016 at 11:32:20 UTC, Ilya Yaroshenko wrote:I would say something like: For optimal performance it should be compiled using LDC.Updated: Mir is LLVM-Accelerated Generic Numerical Library for Science and Machine Learning. It requires LDC (LLVM D Compiler) for compilation.It doesn't really require LDC though, it just requires it to get good performance? I can still use DMD for quick testing?
Sep 26 2016
On Monday, 26 September 2016 at 11:36:11 UTC, Edwin van Leeuwen wrote:On Monday, 26 September 2016 at 11:32:20 UTC, Ilya Yaroshenko wrote:No, LDC is required. I plan to update DUB to allow quick testing without binary compilation. The reason DMD support was dropped is that it generates 10-20 times slower code for matrix multiplication. My opinion is that the D community is too small to maintain 3 compilers, and we should move forward with LDC. IlyaUpdated: Mir is LLVM-Accelerated Generic Numerical Library for Science and Machine Learning. It requires LDC (LLVM D Compiler) for compilation.It doesn't really require LDC though, it just requires it to get good performance? I can still use DMD for quick testing?
Sep 26 2016
On Monday, 26 September 2016 at 11:32:20 UTC, Ilya Yaroshenko wrote:Updated: Mir is LLVM-Accelerated Generic Numerical Library for Science and Machine Learning. It requires LDC (LLVM D Compiler) for compilation. Mir GLAS (Generic Linear Algebra Subprograms) has a single generic kernel for all CPU targets, all floating point types, and all complex types. It is written completely in D, without any assembler blocks. In addition, Mir GLAS Level 3 kernels are not unrolled and produce tiny binary code, so they put less pressure on the instruction cache in large applications.Hmmm, I was thinking more along the lines of just describing (very briefly) what features of LLVM Mir GLAS relies on. But I think this might run the risk of endless re-revision. One minor tweak: "Mir is LLVM-Accelerated Generic Numerical Library" -> "Mir is an LLVM-Accelerated Generic Numerical Library"
Sep 26 2016
On Monday, 26 September 2016 at 12:20:25 UTC, Joseph Rushton Wakeling wrote:"Mir is LLVM-Accelerated Generic Numerical Library" -> "Mir is an LLVM-Accelerated Generic Numerical Library"Thanks, fixed
Sep 26 2016
On Monday, 26 September 2016 at 11:11:20 UTC, Joseph Rushton Wakeling wrote:The broader topic of what compiler features Mir GLAS uses could be the topic of an entire blog post in its own right, and might be very interesting.I guess this is my terrain. I'll think about writing that blog post :) Specific LDC features that I see in GLAS are: - __traits(targetHasFeature, ...) , see https://wiki.dlang.org/LDC-specific_language_changes#targetHasFeature - fastmath, see https://wiki.dlang.org/LDC-specific_language_changes#.40.28ldc.attributes.fastmath.29 - Modules ldc.simd and ldc.intrinsics. - Extended allowed sizes for __vector (still very limited) To get an idea of what is different for LDC and DMD, this PR removed support for DMD: https://github.com/libmir/mir/pull/347 -Johan
Sep 26 2016
On Monday, 26 September 2016 at 11:46:19 UTC, Johan Engelen wrote:On Monday, 26 September 2016 at 11:11:20 UTC, Joseph Rushton Wakeling wrote:Ah, I was not aware that DMD support was dropped completely. I think that is a real shame, and it makes it _much_ less likely that I will use mir in my own projects, let alone as a dependency in another library.The broader topic of what compiler features Mir GLAS uses could be the topic of an entire blog post in its own right, and might be very interesting.I guess this is my terrain. I'll think about writing that blog post :) Specific LDC features that I see in GLAS are: - __traits(targetHasFeature, ...) , see https://wiki.dlang.org/LDC-specific_language_changes#targetHasFeature - fastmath, see https://wiki.dlang.org/LDC-specific_language_changes#.40.28ldc.attributes.fastmath.29 - Modules ldc.simd and ldc.intrinsics. - Extended allowed sizes for __vector (still very limited) To get an idea of what is different for LDC and DMD, this PR removed support for DMD: https://github.com/libmir/mir/pull/347 -Johan
Sep 26 2016
On Monday, 26 September 2016 at 11:56:39 UTC, Edwin van Leeuwen wrote:Ah, I was not aware that DMD support was dropped completely. I think that is a real shame, and it makes it _much_ less likely that I will use mir in my own projects, let alone as a dependency in another library."_much_" :'( :'( Please don't write that to LDC devs.
Sep 26 2016
On Monday, 26 September 2016 at 11:59:57 UTC, Johan Engelen wrote:On Monday, 26 September 2016 at 11:56:39 UTC, Edwin van Leeuwen wrote:I love LDC, I just also tend to use DMD for testing and won't force people to use ldc over dmd if they want to use a library I build.Ah, I was not aware that DMD support was dropped completely. I think that is a real shame, and it makes it _much_ less likely that I will use mir in my own projects, let alone as a dependency in another library."_much_" :'( :'( Please don't write that to LDC devs.
Sep 26 2016
On Monday, 26 September 2016 at 11:56:39 UTC, Edwin van Leeuwen wrote:On Monday, 26 September 2016 at 11:46:19 UTC, Johan Engelen wrote:The shame is that D is not popular. I think that Mir can replace C / C++ for high performance applications and become the best industry systems language. My goal is not a package for the D community. My goal is a library for industry. A library that can attract newcomers and grow the D community many times over. IlyaOn Monday, 26 September 2016 at 11:11:20 UTC, Joseph Rushton Wakeling wrote:Ah, I was not aware that DMD support was dropped completely. I think that is a real shame, and it makes it _much_ less likely that I will use mir in my own projects, let alone as a dependency in another library.The broader topic of what compiler features Mir GLAS uses could be the topic of an entire blog post in its own right, and might be very interesting.[...] -Johan
Sep 26 2016
On Monday, 26 September 2016 at 12:11:16 UTC, Ilya Yaroshenko wrote:On Monday, 26 September 2016 at 11:56:39 UTC, Edwin van Leeuwen wrote:EDIT: that Mir can help D to replace ...On Monday, 26 September 2016 at 11:46:19 UTC, Johan Engelen wrote:The shame is that D is not popular. I think that Mir can replace C / C++ for high performance applications and become the best industry systems language. My goal is not a package for the D community. My goal is a library for industry. A library that can attract newcomers and grow the D community many times over. Ilya[...]Ah, I was not aware that DMD support was dropped completely. I think that is a real shame, and it makes it _much_ less likely that I will use mir in my own projects, let alone as a dependency in another library.
Sep 26 2016
On 9/26/16 2:11 PM, Ilya Yaroshenko wrote:On Monday, 26 September 2016 at 11:56:39 UTC, Edwin van Leeuwen wrote:I think we need to make it a point to support Mir in dmd. -- AndreiOn Monday, 26 September 2016 at 11:46:19 UTC, Johan Engelen wrote:The shame is that D is not popular. I think that Mir can replace C / C++ for high performance applications and become the best industry systems language. My goal is not a package for the D community. My goal is a library for industry. A library that can attract newcomers and grow the D community many times over.On Monday, 26 September 2016 at 11:11:20 UTC, Joseph Rushton Wakeling wrote:Ah, I was not aware that DMD support was dropped completely. I think that is a real shame, and it makes it _much_ less likely that I will use mir in my own projects, let alone as a dependency in another library.The broader topic of what compiler features Mir GLAS uses could be the topic of an entire blog post in its own right, and might be very interesting.[...] -Johan
Sep 26 2016
On Monday, 26 September 2016 at 16:55:02 UTC, Andrei Alexandrescu wrote:I think we need to make it a point to support Mir in dmd. -- Andrei+1, even if it's slow.
Sep 26 2016
On Monday, 26 September 2016 at 18:27:15 UTC, jmh530 wrote:On Monday, 26 September 2016 at 16:55:02 UTC, Andrei Alexandrescu wrote:I thought so too but if the algorithm is 50x slower, it probably means you can't develop that algorithm any more (I wouldn't). I think the common use-case for Mir is a calculation that takes seconds, so 50x turns a test into a run of several minutes... (defeating the compilation speed advantage of DMD) It is easy to want something, but someone else has to do it and live with it too. It's up to the Mir devs (**volunteers!**) to choose which compilers they support. As you can see from the PR that removed DMD support, the extra burden is substantial.I think we need to make it a point to support Mir in dmd. -- Andrei+1, even if it's slow.
Sep 26 2016
On Monday, 26 September 2016 at 16:55:02 UTC, Andrei Alexandrescu wrote:I think we need to make it a point to support Mir in dmd. -- Andreinew thread: https://forum.dlang.org/thread/pqgtvxklmedxuztopwiq@forum.dlang.org
Sep 26 2016