digitalmars.D.announce - 2016Q1: std.blas
- Ilya Yaroshenko (29/29) Dec 26 2015 Hi,
- Ilya Yaroshenko (2/2) Dec 26 2015 Related questions about LDC
- Nordlöw (6/8) Mar 24 2016 Sounds amazing. I can't wait either ;)
- Andrei Amatuni (3/6) Dec 26 2015 Just want to thank you in advance. Can't wait!
- Basile B. (7/9) Dec 26 2015 Do you mean using std.experimental.allocator and something like
- Ilya Yaroshenko (7/17) Dec 26 2015 Mallocator is only a base to build various user-defined allocators
- Andrei Alexandrescu (2/10) Dec 27 2015 There are also Mmap- and Sbrk-based allocators. -- Andrei
- Charles McAnany (19/48) Dec 26 2015 I am absolutely thrilled! I've been using scid
- Ilya Yaroshenko (23/55) Dec 26 2015 [...]
- Russel Winder via Digitalmars-d-announce (26/61) Dec 27 2015 Shouldn't the goal of a project like this be to be something that
- Ilya Yaroshenko (26/61) Dec 27 2015 It depends on what you mean by "something like this". OpenBLAS
- jmh530 (3/5) Dec 30 2015 Cool.
- Nordlöw (3/4) Mar 24 2016 Is there a repo where I can track progress?
- 9il (2/6) Mar 24 2016 I will post Level 1 to Mir during this week.
- Nordlöw (2/3) Mar 24 2016 Great!
- 9il (2/5) Mar 28 2016 http://forum.dlang.org/thread/xnqazcivzbwlpmbymnze@forum.dlang.org Only ...
Hi,

I will write GEMM and GEMV families of BLAS for Phobos.

Goals:
- code without assembler
- code based on SIMD instructions
- DMD/LDC/GDC support
- kernel-based architecture like OpenBLAS
- 85-100% of OpenBLAS FLOPS (OpenBLAS = 100%)
- tiny generic code compared with OpenBLAS
- ability to define user kernels
- allocator support (GEMM requires small internal allocations)
- @nogc nothrow pure template functions (depends on allocator)
- optional multithreading
- ability to work with `Slice` multidimensional arrays when the stride between elements in a vector is greater than 1. In common BLAS the strides between matrix rows or columns always equal 1.

Implementation details:

LDC     all   : very generic D/LLVM IR kernels. AVX/AVX2/AVX-512/NEON support out of the box.
DMD/GDC x86   : kernels for  8 XMM registers based on core.simd
DMD/GDC x86_64: kernels for 16 XMM registers based on core.simd
DMD/GDC other : generic kernels without SIMD instructions. AVX/AVX2/AVX-512 support can be added in the future.

References:
[1] Anatomy of High-Performance Matrix Multiplication: http://www.cs.utexas.edu/users/pingali/CS378/2008sp/papers/gotoPaper.pdf
[2] OpenBLAS: https://github.com/xianyi/OpenBLAS

Happy New Year!

Ilya
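For readers who want the contract pinned down, here is a minimal reference sketch in D of what GEMM computes, C = alpha*A*B + beta*C over flat arrays with explicit leading dimensions. It only illustrates the semantics, not the optimized kernel design proposed above; the function and parameter names are hypothetical.

import std.stdio : writeln;

// Naive GEMM: C = alpha*A*B + beta*C, row-major, with leading
// dimensions lda/ldb/ldc so that strided submatrices also work.
void gemmNaive(T)(T alpha, const(T)[] a, size_t lda,
                  const(T)[] b, size_t ldb,
                  T beta, T[] c, size_t ldc,
                  size_t m, size_t n, size_t k) @nogc nothrow pure
{
    foreach (i; 0 .. m)
        foreach (j; 0 .. n)
        {
            T acc = 0;
            foreach (l; 0 .. k)
                acc += a[i * lda + l] * b[l * ldb + j];
            c[i * ldc + j] = alpha * acc + beta * c[i * ldc + j];
        }
}

void main()
{
    double[] a = [1, 2, 3, 4]; // 2x2, row-major
    double[] b = [5, 6, 7, 8]; // 2x2, row-major
    auto c = new double[4];
    c[] = 0;
    gemmNaive(1.0, a, 2, b, 2, 0.0, c, 2, 2, 2, 2);
    writeln(c); // [19, 22, 43, 50]
}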
Dec 26 2015
Related questions about LDC http://forum.dlang.org/thread/lcrquwrehuezpxxvquhs@forum.dlang.org
Dec 26 2015
On Saturday, 26 December 2015 at 20:51:06 UTC, Ilya Yaroshenko wrote:
> Related questions about LDC
> http://forum.dlang.org/thread/lcrquwrehuezpxxvquhs@forum.dlang.org

Sounds amazing. I can't wait either ;)

Thanks in advance. I have some least-squares data fitting algorithms I would like to port to D and perhaps also to Phobos.
Mar 24 2016
On Saturday, 26 December 2015 at 19:57:19 UTC, Ilya Yaroshenko wrote:
> Hi, I will write GEMM and GEMV families of BLAS for Phobos.
> [...]

Just want to thank you in advance. Can't wait!
Dec 26 2015
On Saturday, 26 December 2015 at 19:57:19 UTC, Ilya Yaroshenko wrote:
> - allocator support (GEMM requires small internal allocations)
> - @nogc nothrow pure template functions (depends on allocator)

Do you mean using std.experimental.allocator and something like (IAllocator alloc) as a template parameter? If so, this will mostly not work. Only Mallocator is really @nogc (on phobos master), and maybe only from the next DMD release on, so not in GDC and LDC for months.
Dec 26 2015
On Sunday, 27 December 2015 at 05:23:27 UTC, Basile B. wrote:
> Do you mean using std.experimental.allocator and something like
> (IAllocator alloc) as a template parameter? If so, this will mostly
> not work. Only Mallocator is really @nogc (on phobos master), and
> maybe only from the next DMD release on, so not in GDC and LDC for
> months.

Mallocator is only a base on which to build various user-defined allocators with building blocks like FreeList. I hope to create a std.blas module without Phobos and core.memory/core.thread dependencies, so it can be used like a C library. std.allocator usage is optional.

Ilya
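A hedged sketch of that composition, assuming the std.experimental.allocator module layout as it later stabilised: a FreeList building block recycling small GEMM workspaces on top of Mallocator. The variable and function names are illustrative only.

import std.experimental.allocator.building_blocks.free_list : FreeList;
import std.experimental.allocator.mallocator : Mallocator;

// Recycle freed blocks of 64 bytes .. 4 KiB through a free list;
// other sizes fall through to malloc/free. No GC involvement.
FreeList!(Mallocator, 64, 4096) workspace;

void withPackingBuffer()
{
    auto buf = workspace.allocate(1024);
    scope(exit) workspace.deallocate(buf);
    // ... use buf as an internal GEMM packing buffer ...
}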
Dec 26 2015
On 12/27/15 12:23 AM, Basile B. wrote:
> Do you mean using std.experimental.allocator and something like
> (IAllocator alloc) as a template parameter?
> [...]

There are also Mmap- and Sbrk-based allocators. -- Andrei
Dec 27 2015
On Saturday, 26 December 2015 at 19:57:19 UTC, Ilya Yaroshenko wrote:
> Hi,
>
> I will write GEMM and GEMV families of BLAS for Phobos.
> [...]

I am absolutely thrilled! I've been using scid (https://github.com/kyllingstad/scid) and cblas (https://github.com/DlangScience/cblas) in a project, and I can't wait to see a smooth integration in the standard library.

A couple of questions:

Why will the functions be nothrow? It seems that if you try to take the determinant of a 3x5 matrix, you should get an exception.

By 'tiny generic code', do you mean that DGEMM, SSYMM, CTRMM, etc. all become one function, basically?

You mention GEMM and GEMV in your features; do you think we'll get a more complete slice of BLAS/LAPACK in the future, like GESVD and GEES? If it's not in the plan, I'd be happy to work on re-tooling scid and cblas to feel like std.blas. (That is, mimic how you choose to represent a matrix, throw the same types of exceptions, etc. But still use external libraries.)

Thanks again for this!
Dec 26 2015
On Sunday, 27 December 2015 at 05:43:47 UTC, Charles McAnany wrote:
> Why will the functions be nothrow? It seems that if you try to take
> the determinant of a 3x5 matrix, you should get an exception.

The determinant is part of the LAPACK API, not the BLAS API. BTW, D scientific code should not throw exceptions if possible, because it may be integrated into C projects.

> By 'tiny generic code', do you mean that DGEMM, SSYMM, CTRMM, etc.
> all become one function, basically?

No, it is about portability and optimisation. OpenBLAS has a huge code base written in assembler for various platforms. I want to generalise the optimisation logic.

> You mention GEMM and GEMV in your features; do you think we'll get a
> more complete slice of BLAS/LAPACK in the future, like GESVD and
> GEES?

LAPACK can be implemented as a standalone package. I hope that I will have time to work on it. Another way is to define a new part of Phobos under a sci.* prefix.

> If it's not in the plan, I'd be happy to work on re-tooling scid and
> cblas to feel like std.blas. [...]

It would be cool to see scid with the Matrix type replaced by Slice!(2, double*) / Slice!(2, float*). I will argue against using any kind of Matrix type in favour of the upcoming Slice: https://github.com/D-Programming-Language/phobos/pull/3397. Slice!(2, double*) is a generalisation of a matrix type with two strides, one for rows and one for columns. std.blas can be implemented to support this feature out of the box. Slice!(2, double*) does not need a transposed flag (the transpose operator only swaps strides and lengths), and the Fortran-vs-C flag (column-major vs row-major) is a deprecated rudiment.

Ilya
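To make the two-strides point concrete, here is a self-contained sketch; `Strided2D` below is a hypothetical stand-in for Slice!(2, T*), not the actual Slice implementation.

// A 2D view described by a pointer, two lengths and two strides.
struct Strided2D(T)
{
    T* ptr;
    size_t[2] lengths; // rows, columns
    size_t[2] strides; // element step per row, per column

    ref T opIndex(size_t i, size_t j)
    {
        return ptr[i * strides[0] + j * strides[1]];
    }

    // Transposing swaps metadata only; no element is moved or copied.
    Strided2D transposed()
    {
        return Strided2D(ptr,
                         [lengths[1], lengths[0]],
                         [strides[1], strides[0]]);
    }
}

unittest
{
    double[6] data = [1, 2, 3, 4, 5, 6]; // 2x3, row-major
    auto m = Strided2D!double(data.ptr, [2, 3], [3, 1]);
    auto t = m.transposed;
    assert(m[0, 2] == 3 && t[2, 0] == 3);
}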
Dec 26 2015
On Sat, 2015-12-26 at 19:57 +0000, Ilya Yaroshenko via Digitalmars-d-announce wrote:
> Hi,
>
> I will write GEMM and GEMV families of BLAS for Phobos.
>
> Goals:
> [...]

Shouldn't the goal of a project like this be to be something that OpenBLAS isn't? Given D's ability to call C and C++ code, it is not clear to me that simply rewriting OpenBLAS in D has any goal for the D or BLAS communities per se. Doesn't stop it being a fun activity for the programmer, obviously, but unless there is something that isn't in OpenBLAS, I cannot see this ever being competition and so building a community around the project.

Now if the threads/OpenCL/CUDA was front and centre so that a goal was to be Nx faster than OpenBLAS, that could be a goal worth standing behind. Not to mention full N-dimension vectors so that D could seriously compete against Numpy in the Python world.

--
Russel.
Dr Russel Winder     t: +44 20 7585 2200  voip: sip:russel.winder@ekiga.net
41 Buckmaster Road   m: +44 7770 465 077  xmpp: russel@winder.org.uk
London SW11 1EN, UK  w: www.russel.org.uk skype: russel_winder
Dec 27 2015
On Sunday, 27 December 2015 at 10:28:53 UTC, Russel Winder wrote:
> Shouldn't the goal of a project like this be to be something that
> OpenBLAS isn't? [...] unless there is something that isn't in
> OpenBLAS, I cannot see this ever being competition and so building
> a community around the project.

It depends on what you mean by "something like this". OpenBLAS is a _huge_ amount of assembler code. For _each_ platform, for _each_ CPU generation, for _each_ floating-point/complex type it has a kernel or a few kernels. It is 30 MB of assembler code. Not only can D code call C/C++, but C/C++ (and so any other language) can also call D code. So std.blas may be used in C/C++ projects, like Julia.

> Now if the threads/OpenCL/CUDA was front and centre so that a goal
> was to be Nx faster than OpenBLAS, that could be a goal worth
> standing behind.

That can be a goal for a standalone project. But the standard library should be portable to any platform without significant problems (especially without problems caused by matrix multiplication). So my goal is a tiny and portable project like ATLAS, but fast like OpenBLAS. BTW, threads in std.blas would be optional, like in OpenBLAS. Furthermore, std.blas will allow a user to write his own kernels.

> Not to mention full N-dimension vectors so that D could seriously
> compete against Numpy in the Python world.

I am not sure how D can compete against Numpy in the Python world, but it can compete with Python in the world of programming languages. BTW, N-dimensional ranges/arrays/vectors are already implemented for Phobos:

PR: https://github.com/D-Programming-Language/phobos/pull/3397
Updated docs: http://dtest.thecybershadow.net/artifact/website-76234ca0eab431527327d5ce1ec0ad74c6421533-fedfc857090c1c873b17e7a1e4cf853c/web/phobos-prerelease/std_experimental_ndslice.html

Please participate in the voting (the time constraint has been extended) :-)
http://forum.dlang.org/thread/nexiojzouxtawdwnlfvt@forum.dlang.org

Ilya
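A tiny usage sketch of the linked std.experimental.ndslice API, assuming the module as it was merged into Phobos (it later moved to the Mir project):

import std.experimental.ndslice : sliced, transposed;

void main()
{
    auto data = new double[12];
    auto m = data.sliced(3, 4); // 3x4 matrix view over a flat array
    m[1, 2] = 42;
    auto t = m.transposed;      // O(1): only lengths and strides swap
    assert(t[2, 1] == 42);
}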
Dec 27 2015
On Saturday, 26 December 2015 at 19:57:19 UTC, Ilya Yaroshenko wrote:
> Hi, I will write GEMM and GEMV families of BLAS for Phobos.

Cool.
Dec 30 2015
On Saturday, 26 December 2015 at 19:57:19 UTC, Ilya Yaroshenko wrote:
> I will write GEMM and GEMV families of BLAS for Phobos.

Is there a repo where I can track progress?
Mar 24 2016
On Thursday, 24 March 2016 at 08:20:30 UTC, Nordlöw wrote:
> Is there a repo where I can track progress?

I will post Level 1 to Mir during this week.
Mar 24 2016
On Thursday, 24 March 2016 at 09:30:04 UTC, 9il wrote:
> I will post Level 1 to Mir during this week.

Great!
Mar 24 2016
On Thursday, 24 March 2016 at 10:52:39 UTC, Nordlöw wrote:
> Great!

http://forum.dlang.org/thread/xnqazcivzbwlpmbymnze@forum.dlang.org

Only summation for now :-/

--Ilya
Mar 28 2016