digitalmars.D - Benchmarking mir.ndslice + lubeck against numpy and Julia
- p.shkadzko (144/144) Jan 11 2020 Today I decided to write a couple of benchmarks to compare D mir
- user1234 (5/9) Jan 11 2020 Can you specify the command line used for D ?
- p.shkadzko (5/16) Jan 11 2020 All D code was compiled with ldc2
- Jon Degenhardt (8/25) Jan 11 2020 A useful set of flags to try:
- JN (6/14) Jan 11 2020 Meh. I don't like this kind of table. I think it should at least
- Doc Andrew (8/13) Jan 11 2020 C? Try Fortran!
- p.shkadzko (4/19) Jan 12 2020 Yes, technically we are benchmarking against C numpy and
- Dennis (99/102) Jan 12 2020 No, I think the names in your comparison are fair.
- p.shkadzko (4/11) Jan 12 2020 Thanks! Now it becomes much clearer. It makes sense that in the
- bachmeier (8/12) Jan 12 2020 You left out an important detail in your description. gesdd is
- Dennis (7/10) Jan 12 2020 Interesting, I didn't actually know that. I just quoted the scipy
- Arine (15/15) Jan 11 2020 I ran your dot product test for python and D.
- Arine (5/20) Jan 11 2020 ops `--config=release` should be `--build=release` cause you know
- Ola Fosheim Grostad (11/14) Jan 11 2020 A decent optimizer would remove all your code except the print
- p.shkadzko (18/32) Jan 12 2020 Yes, it all should be there, I was impatient to share the timings.
- bachmeier (10/38) Jan 12 2020 These things can make *some* difference, but not the kind of
- bachmeier (8/14) Jan 12 2020 I was going to say the same thing. I don't even need to look at
- Joseph Rushton Wakeling (29/32) Jan 13 2020 Thanks for running these benchmarks. A few notes/questions that
- jmh530 (9/10) Jan 13 2020 On 1 & 2, you are correct for wherever blas/lapack are used,
- Joseph Rushton Wakeling (11/17) Jan 13 2020 Yes, in principle that's what `mir-glas` is supposed to offer,
- bachmeier (6/12) Jan 13 2020 All you have to do is look at the timings. Julia calls into
- jmh530 (2/8) Jan 13 2020 Yes, that was my initial reaction as well.
- Joseph Rushton Wakeling (7/16) Jan 13 2020 Well, see the link I posted for some details on how they achieve
- jmh530 (9/16) Jan 13 2020 ...take a look at the Julia benchmark in the first post. They are
- Joseph Rushton Wakeling (4/8) Jan 13 2020 Fair. Is it possible that Julia is evaluating these calculations
- Pavel Shkadzko (9/21) Jan 13 2020 Indeed, my experience with Julia is zero and I don't know what
- 9il (5/9) Jan 14 2020 Lubeck doesn't use its own SVD implementation. Instead it uses
Today I decided to write a couple of benchmarks to compare D mir with lubeck against Python numpy, then I also added Julia snippets. The results appeared to be quite interesting.

Allocation and SVD of [5000 x 10] matrix:

+--------+--------------------+------------+
| lang   | libs               | time (sec) |
+--------+--------------------+------------+
| Python | numpy+scipy        | 0.5        |
| Julia  | LinearAlgebra      | 0.0014     |
| D      | mir.ndslice+lubeck | 0.6        |
+--------+--------------------+------------+

Allocation and SVD of [5000 x 100] matrix:

+--------+--------------------+------------+
| lang   | libs               | time (sec) |
+--------+--------------------+------------+
| Python | numpy+scipy        | 4.5        |
| Julia  | LinearAlgebra      | 0.024      |
| D      | mir.ndslice+lubeck | 5.2        |
+--------+--------------------+------------+

Allocation and SVD of [5000 x 1000] matrix:

+--------+--------------------+------------+
| lang   | libs               | time (sec) |
+--------+--------------------+------------+
| Python | numpy+scipy        | 1.4        |
| Julia  | LinearAlgebra      | 1          |
| D      | mir.ndslice+lubeck | 12.85      |
+--------+--------------------+------------+

Allocation and SVD of [500 x 10000] matrix:

+--------+--------------------+------------+
| lang   | libs               | time (sec) |
+--------+--------------------+------------+
| Python | numpy+scipy        | 2.34       |
| Julia  | LinearAlgebra      | 1.1        |
| D      | mir.ndslice+lubeck | 25         |
+--------+--------------------+------------+

Matrices allocation and dot product A [3000 x 3000] * B [3000 x 3000]:

+--------+--------------------+------------+
| lang   | libs               | time (sec) |
+--------+--------------------+------------+
| Python | numpy+scipy        | 0.62       |
| Julia  | LinearAlgebra      | 0.215      |
| D      | mir.ndslice+lubeck | 1.5        |
+--------+--------------------+------------+

D lubeck's svd method is quite slow and gets even slower with growth of the second dimension, while scipy's svd becomes surprisingly faster. The dot product is unfortunately also disappointing. I can only compliment Julia on such amazing results. Below is the code I was using.
Allocation and SVD of [A x B] matrix:

Python
------
import numpy as np
from scipy.linalg import svd
import timeit

def svd_fun():
    data = np.random.randint(0, 1024, 5000000).reshape((5000, 1000))
    u, s, v = svd(data)

print(timeit.timeit(svd_fun, number=1))

Julia
-----
using LinearAlgebra
using BenchmarkTools

function svdFun()
    a = rand([0, 1024], 5000, 1000)
    res = svd(a)
end

@btime svdFun()

D mir.ndslice+lubeck
--------------------
/+dub.sdl:
dependency "mir" version="~>3.2.0"
dependency "lubeck" version="~>1.1.7"
libs "lapack" "openblas"
+/
import std.datetime;
import std.datetime.stopwatch : benchmark;
import std.stdio;
import std.random : Xorshift, unpredictableSeed, uniform;
import std.array : array;
import std.range : generate, take;
import mir.ndslice;
import mir.math.common : optmath;
import lubeck;

void svdFun()
{
    Xorshift rnd;
    rnd.seed(unpredictableSeed);
    auto matrix = generate(() => uniform(0, 1024, rnd))
        .take(5000_000)
        .array
        .sliced(5000, 1000);
    auto svdResult = matrix.svd;
}

void main()
{
    auto svdTime = benchmark!(svdFun)(1);
    writeln(svdTime);
}

Matrices allocation and dot product A [3000 x 3000] * B [3000 x 3000]

Python
------
def dot_fun():
    a = np.random.random(9000000).reshape((3000, 3000))
    b = np.random.random(9000000).reshape((3000, 3000))
    c = np.dot(a, b)

print(timeit.timeit(dot_fun, number=10)/10)

Julia
-----
function dotFun()
    a = rand([0, 1.0], 3000, 3000)
    b = rand([0, 1.0], 3000, 3000)
    c = dot(a, b)
end

@btime dotFun()

D mir.ndslice+lubeck
--------------------
static @optmath auto rndMatrix(T)(const T maxN, in int dimA, in int dimB)
{
    Xorshift rnd;
    rnd.seed(unpredictableSeed);
    const amount = dimA * dimB;
    return generate(() => uniform(0, maxN, rnd))
        .take(amount)
        .array
        .sliced(dimA, dimB);
}

static @optmath T fmuladd(T, Z)(const T a, Z z)
{
    return a + z.a * z.b;
}

void dotFun()
{
    auto matrixA = rndMatrix!double(1.0, 3000, 3000);
    auto matrixB = rndMatrix!double(1.0, 3000, 3000);
    auto zipped = zip!true(matrixA, matrixB);
    auto dot = reduce!fmuladd(0.0, zipped);
}

void main()
{
    auto dotTime = benchmark!(dotFun)(1);
    writeln(dotTime);
}
Jan 11 2020
On Saturday, 11 January 2020 at 21:54:13 UTC, p.shkadzko wrote:
> Today I decided to write a couple of benchmarks to compare D mir with lubeck against Python numpy, then I also added Julia snippets. The results appeared to be quite interesting.
> [...]

Can you specify the command line used for D? For now it's not clear whether you use LDC, GDC or DMD, nor which options were used (assertions compiled in or not?). DMD is not a good choice for benchmarking.
Jan 11 2020
On Saturday, 11 January 2020 at 22:21:18 UTC, user1234 wrote:
> On Saturday, 11 January 2020 at 21:54:13 UTC, p.shkadzko wrote:
>> Today I decided to write a couple of benchmarks to compare D mir with lubeck against Python numpy, then I also added Julia snippets. The results appeared to be quite interesting.
>> [...]
>
> Can you specify the command line used for D? For now it's not clear whether you use LDC, GDC or DMD, nor which options were used (assertions compiled in or not?). DMD is not a good choice for benchmarking.

All D code was compiled with ldc2:

dub build --compiler=ldc2 --single matrix_ops.d

I also tried using $DFLAGS="--O3" but that didn't produce better results.
Jan 11 2020
On Saturday, 11 January 2020 at 22:50:46 UTC, p.shkadzko wrote:
> All D code was compiled with ldc2:
>
> dub build --compiler=ldc2 --single matrix_ops.d
>
> I also tried using $DFLAGS="--O3" but that didn't produce better results.

A useful set of flags to try:

-O -release -flto=thin -defaultlib=phobos2-ldc-lto,druntime-ldc-lto

The '-flto' option turns on LTO. Doesn't always help, but sometimes makes a significant difference, especially when library code is used in tight loops. Another option is to disable bounds checking in @safe code (--boundscheck=off).
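For reference, one way to combine those flags with the dub invocation used earlier in the thread is the DFLAGS environment variable, which dub forwards to the compiler (a sketch, not verified against this exact setup):

DFLAGS="-O -release -flto=thin -defaultlib=phobos2-ldc-lto,druntime-ldc-lto --boundscheck=off" dub build --compiler=ldc2 --single matrix_ops.d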
Jan 11 2020
On Saturday, 11 January 2020 at 21:54:13 UTC, p.shkadzko wrote:
> Allocation and SVD of [5000 x 10] matrix:
>
> +--------+--------------------+------------+
> | lang   | libs               | time (sec) |
> +--------+--------------------+------------+
> | Python | numpy+scipy        | 0.5        |
> | Julia  | LinearAlgebra      | 0.0014     |
> | D      | mir.ndslice+lubeck | 0.6        |
> +--------+--------------------+------------+

Meh. I don't like this kind of table. I think it should at least say "Python/C". Imagine a newcomer encountering this table and being like "wow, D is slower even than Python, what a mess!". But numpy/scipy are running C underneath, so D is not that bad in comparison.
Jan 11 2020
On Sunday, 12 January 2020 at 00:25:44 UTC, JN wrote:
> Meh. I don't like this kind of table. I think it should at least say "Python/C". Imagine a newcomer encountering this table and being like "wow, D is slower even than Python, what a mess!". But numpy/scipy are running C underneath, so D is not that bad in comparison.

C? Try Fortran!

https://github.com/scipy/scipy/blob/master/scipy/linalg/src/id_dist/src/idz_svd.f

(I think that's the underlying code being called in the OP's example)

But your point still stands. Most of the heavy-duty lifting done by Python scientific computing libs is done by carefully-tuned, long-standing native code. There's nothing stopping D from linking to said libraries...

-Doc
Jan 11 2020
On Sunday, 12 January 2020 at 00:25:44 UTC, JN wrote:
> Meh. I don't like this kind of table. I think it should at least say "Python/C". Imagine a newcomer encountering this table and being like "wow, D is slower even than Python, what a mess!". But numpy/scipy are running C underneath, so D is not that bad in comparison.

Yes, technically we are benchmarking against C numpy and C/Fortran scipy through a Python syntax layer. It should have been "C/Fortran".
Jan 12 2020
On Sunday, 12 January 2020 at 09:43:46 UTC, p.shkadzko wrote:
> Yes, technically we are benchmarking against C numpy and C/Fortran scipy through a Python syntax layer. It should have been "C/Fortran".

No, I think the names in your comparison are fair. You are not comparing the languages themselves, but the available solutions for scientific computing in each language.

Let me focus on your 5000 x 1000 SVD example. The thing is, all these languages / packages are just using LAPACK for that (see [1], [2] and [3]), so what you are actually comparing is:

- The default settings
- The LAPACK implementation that is used
- A bit of luck (your matrices are randomly created)
- The wrapping code (this is where the programming language actually matters)

The first important default setting is the algorithm. LAPACK offers 'gesdd', a 'more efficient divide-and-conquer approach', and 'gesvd', a 'general rectangular approach' [4]. Python and Julia default to gesdd, while D's Lubeck defaults to gesvd. I factored out the matrix generation code from the benchmark and switched the D algorithm to 'gesdd', and the time went from ~17 seconds to ~6 on my PC. Comparing that with Python, I got these times for some subsequent runs:

Python: 5.7, 6.3, 6.4, 5.3, 6.3
D (LDC release): 5.9, 5.8, 5.8, 5.9, 7.0, 5.7
D (DMD debug): 6.0, 6.5, 6.9, 6.7, 6.0

It can be seen that:

- The times vary between runs (possibly because of the randomized matrices, and other applications running on my PC)
- The times between scipy and Lubeck are comparable
- Compiler settings are not that important; the extra compilation time is worse than the gained speed (and when doing scientific computing you recompile a lot)

The second important setting is whether to compute the full matrices U and Vt. When computing the SVD of a matrix A of size 5000x1000, U*S*Vt have sizes 5000x5000, 5000x1000, and 1000x1000 respectively. However, since S is just a diagonal matrix with 1000 entries and the rest all 0, you can choose to compute U as only a 5000x1000 matrix. This obviously saves a lot of work. Julia does this by default, while Python and D compute the full U by default. When not computing the full matrices in Python and D, I get:

D: roughly 1.7 s
Python: roughly 1.2 s

I don't have Julia, but I bet that it will be in the same order of magnitude on my PC as well. The new code can be found below.

So in conclusion, Lubeck has some unfortunate default settings for benchmarking, and even with comparable settings D can be a little slower. This is either because of inferior wrapping code, or because I am using a different LAPACK implementation. For D I use the LAPACK implementation of OpenBLAS that I compiled myself a while ago; I don't know what Python ships with.

In any case, while your benchmark is not representative of the actual languages, it is indeed unfortunate when people try something in D, find it significantly slower than other languages, and someone on the forum has to point out all the steps to get D performance on par.
---
[1] https://github.com/scipy/scipy/blob/adc4f4f7bab120ccfab9383aba272954a0a12fb0/scipy/linalg/decomp_svd.py#L16-L139
[2] https://github.com/JuliaLang/julia/blob/aa35ee2d30065803410718378e5673c7f845da62/stdlib/LinearAlgebra/src/svd.jl#L93
[3] https://github.com/kaleidicassociates/lubeck/blob/72091ecdd545f9524a1d80e5880cda19845143d0/source/lubeck.d#L356
[4] https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.svd.html

New Python code:
```
import numpy as np
from scipy.linalg import svd
import timeit

h = 5000
w = 1000
data = np.random.randint(0, 1024, w*h).reshape((h, w))

def svd_fun():
    u, s, v = svd(data, overwrite_a=False, full_matrices=False, lapack_driver='gesdd')

print(timeit.timeit(svd_fun, number=1))
```

New D code:
```
import std, mir.ndslice, lubeck;
import std.datetime.stopwatch : benchmark;

enum h = 5000;
enum w = 1000;

auto getMtx()
{
    Xorshift rnd;
    rnd.seed(unpredictableSeed);
    auto matrix = generate(() => uniform(0, 1024, rnd))
        .take(w * h).array.sliced(h, w);
    return matrix;
}

auto svdFun(T)(T mtx)
{
    auto result = mtx.svd!(No.allowDestroy, "gesdd")(Yes.slim); // gesvd gesdd
}

void main()
{
    auto svdTime = benchmark!(() => getMtx.svdFun())(1);
    writeln(svdTime);
}
```
Jan 12 2020
On Sunday, 12 January 2020 at 12:46:44 UTC, Dennis wrote:
> [1] https://github.com/scipy/scipy/blob/adc4f4f7bab120ccfab9383aba272954a0a12fb0/scipy/linalg/decomp_svd.py#L16-L139
> [2] https://github.com/JuliaLang/julia/blob/aa35ee2d30065803410718378e5673c7f845da62/stdlib/LinearAlgebra/src/svd.jl#L93
> [3] https://github.com/kaleidicassociates/lubeck/blob/72091ecdd545f9524a1d80e5880cda19845143d0/source/lubeck.d#L356
> [...]

Thanks! Now it becomes much clearer. It makes sense that at the end of the day it is BLAS+LAPACK for any language engaging in matrix calculations.
Jan 12 2020
On Sunday, 12 January 2020 at 12:46:44 UTC, Dennis wrote:
> The first important default setting is the algorithm. LAPACK offers 'gesdd', a 'more efficient divide-and-conquer approach', and 'gesvd', a 'general rectangular approach' [4]. Python and Julia default to gesdd, while D's Lubeck defaults to gesvd.

You left out an important detail in your description. gesdd is more efficient, but at the expense of being less accurate, and it can easily fail on you. I have a strong preference for correctness as the default, as I've run into problems before with defaults that were optimized for speed. Nobody ever digs in and understands the tradeoffs involved in the default.

https://savannah.gnu.org/bugs/?55564
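For anyone who wants the robust driver explicitly rather than relying on defaults, lubeck exposes the choice as a template argument. The call shape below follows Dennis's snippet later in the thread; the `sigma` field name is assumed from lubeck's SvdResult and worth double-checking:

```
import std.stdio : writeln;
import std.typecons : No, Yes;
import mir.ndslice : sliced;
import lubeck : svd;

void main()
{
    auto a = [3.0, 1, 1, 3].sliced(2, 2);
    // "gesvd": slower general rectangular approach, more robust;
    // "gesdd": faster divide-and-conquer, but can lose accuracy.
    auto r = a.svd!(No.allowDestroy, "gesvd")(Yes.slim);
    writeln(r.sigma); // singular values (field name assumed)
}
```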
Jan 12 2020
On Sunday, 12 January 2020 at 14:41:11 UTC, bachmeier wrote:
> You left out an important detail in your description. gesdd is more efficient, but at the expense of being less accurate, and it can easily fail on you.

Interesting, I didn't actually know that. I just quoted the scipy documentation. I have only used SVD once before today, and it was on a 3x3 matrix for point set registration. Performance and accuracy weren't a problem. I was wondering why Matlab and Octave default to the slower method too, but that explains why.
Jan 12 2020
I ran your dot product test for python and D.

$ python main.py
0.6625702600000001

$ ./main
[100 ms, 916 µs, and 2 hnsecs] // 0.1009162 secs

Some things I guess. Dub is horrible. The defaults are horrible. A lot of the way it functions is horrible. So I'm not surprised you got worse results, even though it seems you have a faster PC than me.

dub build --config=release --compiler=ldc2 --arch=x86_64 --single main.d

Dub sometimes defaults to x86, cause why not, it's not a dying platform. Which can give worse codegen. It also defaults to debug, cause, you know, if you run something through dub it's obviously being run through a debugger.
Jan 11 2020
On Sunday, 12 January 2020 at 05:05:15 UTC, Arine wrote:
> dub build --config=release --compiler=ldc2 --arch=x86_64 --single main.d

Oops, `--config=release` should be `--build=release`, cause you know, dub doesn't have confusing argument names and a really horrible documentation page that literally just repeats all the same arguments 20 times.
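Putting the two posts together, the corrected invocation reads:

dub build --build=release --compiler=ldc2 --arch=x86_64 --single main.d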
Jan 11 2020
On Saturday, 11 January 2020 at 21:54:13 UTC, p.shkadzko wrote:
> Today I decided to write a couple of benchmarks to compare D mir with lubeck against Python numpy, then I also added Julia snippets. The results appeared to be quite interesting.

A decent optimizer would remove all your code except the print statement. Make sure to output the result of the computation.

Also make sure you use the same algorithms and accuracy. If you write your own inner product in one language then you should do so in the other languages as well, and require the result to follow IEEE 754 by evaluating the result.

Please note that floating point code cannot be fully restructured by the optimizer without setting the optimizer to less predictable fast-math settings. So it cannot even in theory approach hand-tuned library code.
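On Ola's first point, one small tweak to the OP's D snippet would make the result observable so the optimizer cannot drop the computation. This is a sketch that reuses the OP's rndMatrix, fmuladd, and imports rather than a complete program:

```
double dotFun()
{
    auto matrixA = rndMatrix!double(1.0, 3000, 3000);
    auto matrixB = rndMatrix!double(1.0, 3000, 3000);
    auto zipped = zip!true(matrixA, matrixB);
    // Return the reduction result so it is observable and
    // cannot be eliminated as dead code.
    return reduce!fmuladd(0.0, zipped);
}

void main()
{
    double sink = 0;
    auto dotTime = benchmark!(() { sink += dotFun(); })(1);
    writeln(sink);   // consuming the result keeps the work alive
    writeln(dotTime);
}
```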
Jan 11 2020
On Sunday, 12 January 2020 at 07:04:00 UTC, Ola Fosheim Grostad wrote:
> A decent optimizer would remove all your code except the print statement. Make sure to output the result of the computation.
> [...]

Yes, it all should be there, I was impatient to share the timings. On the other hand, I believe that stating that D can be faster at something, but only if you know which compiler to use and the correct flags, and maybe not to use dub because its defaults are bad, is never going to work. People will not spend time trying to figure all this out. There is already some compiler heritage one should be aware of when using D, and I suspect a lot of other things.

True, in order to make a fair comparison you should be an expert in all these things, but then the results won't be representative of beginners' code. Python out-of-the-box gives you such guarantees, saying: "here is a programming language for you that is easy as a toaster and btw be assured that it will run fast no matter what stupid thing you do". On top of that you have quite detailed and beginner friendly docs.

I shall keep checking out mir libs and lubeck to see what's more in there.
Jan 12 2020
On Sunday, 12 January 2020 at 10:25:09 UTC, p.shkadzko wrote:
> Yes, it all should be there, I was impatient to share the timings. On the other hand, I believe that stating that D can be faster at something, but only if you know which compiler to use and the correct flags, and maybe not to use dub because its defaults are bad, is never going to work.

These things can make *some* difference, but not the kind of difference you're claiming to have uncovered.

> Python out-of-the-box gives you such guarantees, saying: "here is a programming language for you that is easy as a toaster and btw be assured that it will run fast no matter what stupid thing you do".

If you're doing simple things, sure, Python will just forward your requests to underlying C/C++/Fortran libraries, but any language - including D - can do that. You could have achieved the same results using Ruby, Tcl, or Perl as you did with your Python code. As soon as you start doing something involved, where some parts of your code are pure Python and some parts are C/C++/Fortran, performance falls apart in a hurry.
Jan 12 2020
On Sunday, 12 January 2020 at 07:04:00 UTC, Ola Fosheim Grostad wrote:
> A decent optimizer would remove all your code except the print statement.

I was going to say the same thing. I don't even need to look at the code (and I wouldn't have time to do so anyway). If all you're doing is comparing the execution speed of the underlying libraries, Julia is not going to beat Python by such a wide margin. Julia is "fast" because it's not carrying out the claimed operations.
Jan 12 2020
On Saturday, 11 January 2020 at 21:54:13 UTC, p.shkadzko wrote:
> Today I decided to write a couple of benchmarks to compare D mir with lubeck against Python numpy, then I also added Julia snippets. The results appeared to be quite interesting.

Thanks for running these benchmarks. A few notes/questions that may be interesting:

(1) Do I understand right that both numpy and lubeck just call into external linear algebra libraries (BLAS and LAPACK) for the actual number crunching? (I believe so.)

(2) Assuming that is true, then compiler flags will probably only make a difference to the initialization and allocation of the matrices. (IIRC, Xorshift's performance is very different when optimized versus un-optimized.) That likely explains why D and Python performance are near-identical once matrix initialization is taken out of the equation: the actual number-crunching is being done by the same external libraries.

(3) The much speedier Julia results can probably be explained by its use of dedicated data structures for various stages of calculation, see §5.2 of https://arxiv.org/pdf/1411.1607.pdf.

(4) Just as a general remark for your original benchmarks: while obviously a "just getting started" comparison is perfectly valid, it's unlikely you're comparing like with like. Leaving aside the compiler optimization flags, odds are that e.g. the randomized initialization of matrix elements is being done very differently in the 3 different languages (probably a different underlying RNG in each case, and quite possibly some extra tricks in numpy and Julia to handle the case of randomly initializing the contents of an entire matrix or array).

**Principal takeaway**: anyone who wants to match Julia for speed needs to follow its lead in terms of dedicated data structures for linear algebra, not just hook into long-running standards like BLAS and LAPACK. There are opportunities here for libmir :-)
Jan 13 2020
On Monday, 13 January 2020 at 11:51:20 UTC, Joseph Rushton Wakeling wrote:
> [snip]

On 1 & 2, you are correct for wherever blas/lapack are used, which is in a lot of numpy, lubeck, R, etc. However, mir-glas is intended to be an alternative to blas, though I don't think much work has been done on it recently.

On 3, I didn't have time to look into the Julia results, but someone above made the comment that Julia was optimizing away the calculation itself. Dennis also had some interesting points above.
Jan 13 2020
On Monday, 13 January 2020 at 13:34:40 UTC, jmh530 wrote:
> However, mir-glas is intended to be an alternative to blas, though I don't think much work has been done on it recently.

Yes, in principle that's what `mir-glas` is supposed to offer, but the current benchmarks only cover lubeck (which doesn't use `mir-glas`), and as you say, `mir-glas` itself is not feature-complete when it comes to linear algebra APIs. If anyone does get round to trying to benchmark functionality from `mir-glas`, note that you will have to pay quite a lot of attention to optimization flags if you want to get the best results.

> On 3, I didn't have time to look into the Julia results, but someone above made the comment that Julia was optimizing away the calculation itself. Dennis also had some interesting points above.

Yes, I would imagine that might also be part of it. Again, I would anticipate that being something where D ought to be able to readily implement similar features.
Jan 13 2020
On Monday, 13 January 2020 at 13:34:40 UTC, jmh530 wrote:
> On 3, I didn't have time to look into the Julia results, but someone above made the comment that Julia was optimizing away the calculation itself. Dennis also had some interesting points above.

All you have to do is look at the timings. Julia calls into lapack to do these operations, just like everybody else. No amount of optimization will result in the timings reported for Julia - it would be a revolution unlike any ever seen in computing if they were accurate.
Jan 13 2020
On Monday, 13 January 2020 at 14:26:55 UTC, bachmeier wrote:
> [snip]
> All you have to do is look at the timings. Julia calls into lapack to do these operations, just like everybody else. No amount of optimization will result in the timings reported for Julia - it would be a revolution unlike any ever seen in computing if they were accurate.

Yes, that was my initial reaction as well.
Jan 13 2020
On Monday, 13 January 2020 at 14:52:59 UTC, jmh530 wrote:
> Yes, that was my initial reaction as well.

Well, see the link I posted for some details on how they achieve that -- for example, when doing QR or LU decomposition, instead of doing in-place calculations where they have to replace every element of an m*n matrix, they define custom types that store the matrix factorizations in packed representations that only include the non-zero elements.
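To make the "packed representation" idea concrete: LAPACK's dgeqrf already returns QR compactly, with R in the upper triangle, the Householder reflectors below it, and a separate tau vector of scalars. A D wrapper type for that might look roughly like this; PackedQR and its field names are hypothetical, not a real mir/lubeck API:

```
import mir.ndslice;

/// Hypothetical compact QR result, mirroring LAPACK's geqrf output:
/// the upper triangle of `factors` holds R, the strict lower triangle
/// holds the Householder vectors, and `tau` holds their scaling factors.
struct PackedQR(T)
{
    Slice!(T*, 2) factors; // m x n: R and the reflectors share storage
    T[] tau;               // min(m, n) Householder scalars
}
```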
Jan 13 2020
On Monday, 13 January 2020 at 17:09:29 UTC, Joseph Rushton Wakeling wrote:
> [snip]
> Well, see the link I posted for some details on how they achieve that -- for example, when doing QR or LU decomposition, instead of doing in-place calculations where they have to replace every element of an m*n matrix, they define custom types that store the matrix factorizations in packed representations that only include the non-zero elements.

...take a look at the Julia benchmark in the first post. They are about 350x faster than the Numpy and D versions that are basically just calling C code. Do you really think that the people who write linear algebra code are missing 350x improvements? Maybe their algorithm is faster than what lapack does, but I'm skeptical that - properly benchmarked - it could be that much faster.
Jan 13 2020
On Monday, 13 January 2020 at 17:33:16 UTC, jmh530 wrote:
> Do you really think that the people who write linear algebra code are missing 350x improvements? Maybe their algorithm is faster than what lapack does, but I'm skeptical that - properly benchmarked - it could be that much faster.

Fair. Is it possible that Julia is evaluating these calculations lazily, so you don't pay for the factorization or SVD until you genuinely start using the numbers?
Jan 13 2020
On Monday, 13 January 2020 at 14:26:55 UTC, bachmeier wrote:
> All you have to do is look at the timings. Julia calls into lapack to do these operations, just like everybody else. No amount of optimization will result in the timings reported for Julia - it would be a revolution unlike any ever seen in computing if they were accurate.

Indeed, my experience with Julia is zero and I don't know what @btime is actually testing. I just copied it from a "how to benchmark Julia code" page. I would honestly test the time it takes the actual script to run, like "$ time julia demo.jl", because it does take some time before Julia precompiles the code, so you don't feel those reported 0.1 seconds at all. And if you do data processing and scripting, console responsiveness and processing speed are all that matter, to me at least.
Jan 13 2020
On Saturday, 11 January 2020 at 21:54:13 UTC, p.shkadzko wrote:
> Today I decided to write a couple of benchmarks to compare D mir with lubeck against Python numpy, then I also added Julia snippets. The results appeared to be quite interesting.
> [...]

Lubeck doesn't use its own SVD implementation. Instead it uses the native library, so the speed depends on the kind of BLAS library used. For fast programs it's also recommended to use mir-lapack paired with Intel MKL instead of Lubeck.
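If you want to try that, the change relative to the OP's single-file dub header would be roughly the following. Note that "mkl_rt" (MKL's single dynamic runtime library) and the loose version constraint are my assumptions, not something 9il specified:

```
/+dub.sdl:
dependency "mir-lapack" version="*"  // illustrative constraint; pin a real version
libs "mkl_rt"                        // assumed MKL runtime, replacing "lapack" "openblas"
+/
```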
Jan 14 2020