digitalmars.D - std.math API rework
- Ilya Yaroshenko (18/18) Oct 06 2016 Effective work with std.experimental.ndslice and
- Ilya Yaroshenko (3/5) Oct 06 2016 EDIT: mir.ndslice.algorithm
- Iain Buclaw via Digitalmars-d (15/30) Oct 06 2016 If you can prove that llvm intrinsics are pure (gcc math intrinsics
- Ilya Yaroshenko (3/12) Oct 06 2016 LLVM math functions are pure :P http://llvm.org/docs/LangRef.html
- Iain Buclaw via Digitalmars-d (18/33) Oct 06 2016 I picked a random example.
- Ilya Yaroshenko (27/65) Oct 06 2016 Current code is (please look in LDC's fork):
- Iain Buclaw via Digitalmars-d (7/79) Oct 06 2016 Well, sure, I could mark all gcc intrinsics as pure so you can use
- kinke (5/6) Oct 06 2016 No, Iain is right. These LLVM intrinsics are most often simple
- Andrei Alexandrescu (15/23) Oct 06 2016 I'd love to understand this point better. In particular, how do you
- Ilya Yaroshenko (46/76) Oct 07 2016 For example, SUM_i of sqrt(fabs(a[i])) can be vectorised using
- Andrei Alexandrescu (40/77) Oct 07 2016 This is also the case for C++ - most math functions are linked from the
- Johan Engelen (8/16) Oct 07 2016 That trivial non-template functions are not cross-module inlined
- Ilya Yaroshenko (182/287) Oct 09 2016 1) BLAS-like API requires only sqrt and fabs. The solutions used
Effective work with std.experimental.ndslice and mir.ndslice.array requires half of std.math to be exact aliases to LLVM intrinsics (for LDC). To enable vectorization for mir.ndslice.algorithm I created an internal math module [1] in Mir. But this is weird, because third-party packages like DCV [2] are required to use that module too. Also, some optimisations for std.complex and the future std.experimental.color would be very ugly without the proposed change.

The proposed change is very simple: each math function listed in [1] should be a template for DMD/GDC and an alias for LDC in std.math. If someone has strong arguments against it, please let me know now.

[1] https://github.com/libmir/mir/blob/master/source/mir/internal/math.d
[2] https://github.com/ljubobratovicrelja/dcv

Best regards,
Ilya
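For concreteness, here is a minimal sketch of the proposed shape, assuming LDC's `ldc.intrinsics` wrappers and the existing `core.math` intrinsics (simplified, hypothetical code, not the actual Phobos source):

```d
// Sketch only: a template for DMD/GDC so the body can inline away,
// and a direct alias to the LLVM intrinsic for LDC.
version(LDC)
{
    import ldc.intrinsics : llvm_sqrt;
    alias sqrt = llvm_sqrt; // resolves straight to the intrinsic
}
else
{
    static import core.math;
    T sqrt(T)(T x) @safe pure nothrow @nogc
    {
        return core.math.sqrt(x); // forwards to the compiler intrinsic
    }
}
```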
Oct 06 2016
On Thursday, 6 October 2016 at 16:53:54 UTC, Ilya Yaroshenko wrote:
> Effective work with std.experimental.ndslice and mir.ndslice.array requires half of std.math to be exact [...]

EDIT: mir.ndslice.algorithm
Oct 06 2016
On 6 October 2016 at 18:53, Ilya Yaroshenko via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> Effective work with std.experimental.ndslice and mir.ndslice.array requires half of std.math to be exact aliases to LLVM intrinsics (for LDC). [...]
> The proposed change is very simple: each math function listed in [1] should be a template for DMD/GDC and an alias for LDC in std.math. If someone has strong arguments against it, please let me know now.
> [...]

If you can prove that llvm intrinsics are pure (gcc math intrinsics are not) and that llvm intrinsics pass the unittests (gcc math intrinsics aren't guaranteed to, due to the vagaries of libm implementations and quirky cpu support that trades correctness for efficiency), then perhaps.

I have reasonable grounds to say that the answer is no on both parts. Even if some llvm intrinsics lower to native instructions on x86, most other platforms will just forward them to an impure libm that is a mixed bag of long double support. :-)

If you need it specialized, do it yourself. Phobos seems more of a place for generalized application support, from what I gather, and that is how I approach it.

Iain.
Oct 06 2016
On Thursday, 6 October 2016 at 20:07:19 UTC, Iain Buclaw wrote:
> On 6 October 2016 at 18:53, Ilya Yaroshenko via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>> [...]
> If you can prove that llvm intrinsics are pure (gcc math intrinsics are not) and that llvm intrinsics pass the unittests (gcc math intrinsics aren't guaranteed to, due to the vagaries of libm implementations and quirky cpu support that trades correctness for efficiency).
> [...]

LLVM math functions are pure :P http://llvm.org/docs/LangRef.html

I can do a Phobos fork. But I hope I can fix it.
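Note that the purity claim here is about the declared signatures: LDC's intrinsic wrappers are callable from `pure` code, which the `fabs`/`cos` wrappers quoted later in this thread rely on. A minimal check, assuming LDC (whether the semantics are truly pure is exactly what Iain disputes below):

```d
import ldc.intrinsics : llvm_sin;

// compiles with LDC only because llvm_sin is declared callable from
// @safe pure nothrow @nogc code
double f(double x) @safe pure nothrow @nogc
{
    return llvm_sin(x);
}
```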
Oct 06 2016
On 6 October 2016 at 22:31, Ilya Yaroshenko via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Thursday, 6 October 2016 at 20:07:19 UTC, Iain Buclaw wrote:
>> [...]
> LLVM math functions are pure :P http://llvm.org/docs/LangRef.html
> [...]

I picked a random example.

http://llvm.org/docs/LangRef.html#llvm-sin-intrinsic

"""
Semantics: This function returns the sine of the specified operand, returning the same values as the libm sin functions would, and handles error conditions in the same way.
"""

This would have me believe that they are in fact not pure. ;-)

But I've never looked under the hood of LLVM, so I can only believe those who have.

In any case, IMO, you should focus on getting this into core.math. That's where compiler intrinsics should go. The intrinsics of std.math are historical baggage and are probably due a deprecation - that is, in the sense that their symbols should be converted into aliases.

Iain.
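A sketch of the deprecation-to-alias shape Iain describes, using the intrinsics that already live in `core.math` (hypothetical std.math code):

```d
// std.math (sketch): the symbol becomes a thin alias to the druntime
// intrinsic instead of a separate wrapper function
static import core.math;

alias sqrt = core.math.sqrt; // covers the float, double and real overloads
```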
Oct 06 2016
On Thursday, 6 October 2016 at 20:45:24 UTC, Iain Buclaw wrote:
> [...]
> In any case, IMO, you should focus on getting this into core.math. That's where compiler intrinsics should go. The intrinsics of std.math are historical baggage and are probably due a deprecation - that is, in the sense that their symbols should be converted into aliases.

The current code is (please look in LDC's fork):

```d
version(LDC)
{
    real cos(real x) @safe pure nothrow @nogc { return llvm_cos(x); }
    ///ditto
    double cos(double x) @safe pure nothrow @nogc { return llvm_cos(x); }
    ///ditto
    float cos(float x) @safe pure nothrow @nogc { return llvm_cos(x); }
}
else
{
    real cos(real x) @safe pure nothrow @nogc
    {
        pragma(inline, true);
        return core.math.cos(x);
    }
    //FIXME
    ///ditto
    double cos(double x) @safe pure nothrow @nogc { return cos(cast(real) x); }
    //FIXME
    ///ditto
    float cos(float x) @safe pure nothrow @nogc { return cos(cast(real) x); }
}
```

So, I don't see a reason why this change would break anything, hehe.
Oct 06 2016
On 6 October 2016 at 22:55, Ilya Yaroshenko via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> The current code is (please look in LDC's fork):
> [...]
> So, I don't see a reason why this change would break anything, hehe.

Well, sure, I could mark all gcc intrinsics as pure so you can use __builtin_printf() or malloc() in pure code. Doesn't mean the compiler is honest in allowing it. ;-)

Get this into core.math; there's no place for compiler-specific code in phobos.

Iain.
Oct 06 2016
On Thursday, 6 October 2016 at 20:55:55 UTC, Ilya Yaroshenko wrote:
> So, I don't see a reason why this change would break anything, hehe.

No, Iain is right. These LLVM intrinsics are most often simple forwarders to the C runtime functions; I was rather negatively surprised to find that out a while ago.
Oct 06 2016
On 10/6/16 12:53 PM, Ilya Yaroshenko wrote:
> Effective work with std.experimental.ndslice and mir.ndslice.array requires half of std.math to be exact aliases to LLVM intrinsics (for LDC).

Why?

> To enable vectorization for mir.ndslice.algorithm I created an internal math module [1] in Mir. But this is weird, because third-party packages like DCV [2] are required to use that module too. Also, some optimisations for std.complex and the future std.experimental.color would be very ugly without the proposed change.

I'd love to understand this point better. In particular, how do you reconcile it with kinke's assertion that some of these intrinsics simply forward to C routines?

Our high-level view is that doing efficient work should not require one to fork the standard library. On the other hand, the traditional place for compiler-specific code is in the core runtime, not the standard library. (There is a tiny bit of stdlib code that depends on dmd, to be fair.) So I'd like to be reasonably confident the right rocks are put in the right places.

Have you considered (per Iain) migrating these symbols to core.math and then forwarding the stdlib symbols to them?

Thanks,

Andrei
Oct 06 2016
On Friday, 7 October 2016 at 01:53:27 UTC, Andrei Alexandrescu wrote:
> On 10/6/16 12:53 PM, Ilya Yaroshenko wrote:
>> [...]
> I'd love to understand this point better. In particular, how do you reconcile it with kinke's assertion that some of these intrinsics simply forward to C routines?
> [...]
> Have you considered (per Iain) migrating these symbols to core.math and then forwarding the stdlib symbols to them?

For example, SUM_i of sqrt(fabs(a[i])) can be vectorised using mir.ndslice.algorithm: the vxorps instruction can be used for fabs and the vsqrtps instruction for sqrt, and LDC's fastmath allows the summation elements to be re-associated. Depending on the data cache level, this speeds up the iteration 8 times for single precision floating point numbers with AVX (16 times with AVX512?). Furthermore, at least for x86, the fastmath flag does not break any math logic here; it only allows the elements to be re-associated (I mean exactly this example for x86). A sketch of such a loop is given at the end of this post.

Current std.math has the following problems:

1. Math functions are not templates -> Phobos must be linked.

1.a I have firmly decided to move forward without DRuntime. Phobos as a source library is partially OK, but there should be no linking dependencies. BetterC mode is what is required for Mir to replace OpenBLAS and Eigen. A new cpuid, threads and mutexes should be provided too. The new cpuid [1] is already implemented (I just need to replace the module constructor with an explicit initialization function). My strong opinion is that a D library only for D is the wrong direction: a numeric D library should be a product for other languages too, like many C libraries are. One of my clients is considering investing in nothrow @nogc async I/O for production, so that may help move things in the betterC direction too.

1.b In the context of 1.a, linking multiple binaries compiled with different DRuntime/Phobos versions may cause significant problems. DRuntime is not as stable as the C standard library. One may say that I am doing something wrong if I need to link libraries compiled with different DRuntimes, but this is what will happen often with D in the real world if D starts to replace C libraries (1.a). So betterC, without DRuntime/Phobos linking dependencies, is the direction to move forward in. nothrow @nogc generic Phobos code seems to be OK.

2. Math functions are not templates -> they are not inlined -> no vectorization, plus function calls in the loop body. One day this may be fixed, but see 1.a and 1.b.

3. Math functions are not aliases for LDC -> LDC's fastmath does not work for them. To enable fastmath for these functions, they would have to be annotated with @fastmath themselves, which is not acceptable. If a function is an alias for an LLVM intrinsic, then the fastmath flag can be applied to the function that calls it.

[1] https://github.com/libmir/cpuid

Best regards,
Ilya
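For concreteness, a minimal sketch of the loop described at the top of this post, assuming LDC's `@fastmath` attribute and the `ldc.intrinsics` wrappers (function name hypothetical):

```d
import ldc.attributes : fastmath;
import ldc.intrinsics : llvm_fabs, llvm_sqrt;

// @fastmath lets LLVM re-associate the additions, so the loop can be
// vectorized with vsqrtps (sqrt) and vxorps (fabs via sign-bit masking)
@fastmath float sumSqrtFabs(scope const(float)[] a)
{
    float s = 0;
    foreach (x; a)
        s += llvm_sqrt(llvm_fabs(x));
    return s;
}
```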
Oct 07 2016
On 10/07/2016 03:42 AM, Ilya Yaroshenko wrote:
> For example, SUM_i of sqrt(fabs(a[i])) can be vectorised using mir.ndslice.algorithm [...] Depending on the data cache level, this speeds up the iteration 8 times for single precision floating point numbers with AVX (16 times with AVX512?).

Yah, 8 times is large enough to justify an important change.

> Current std.math has the following problems:
> 1. Math functions are not templates -> Phobos must be linked.

This is also the case for C++ - most math functions are linked from the C standard library. How do typical linear algebra libraries similar in functionality to Mir (such as Eigen) deal with this situation?

Also, one question is how does the existence of unused functions impede the working of faster functions provided separately? Is it a sticky point that std.math is the exact module used?

Trying to get a good grip on the matter. Generally you'd have a very easy time convincing me that templates are a better way to go :o). But we need to have a good motivation. Do you have a brief example illustrating one proposed template and how it is better than the old ways?

> 1.a I have firmly decided to move forward without DRuntime. [...] The new cpuid [1] is already implemented (I just need to replace the module constructor with an explicit initialization function).

Do you think you can integrate the new cpuid implementation with the existing interface (most likely greatly enhancing it) without breaking the existing clients? Same question for threads. Same question for mutexes.

> My strong opinion is that a D library only for D is the wrong direction: a numeric D library should be a product for other languages too, like many C libraries are. [...]

Sure. A different way to frame this is to make D friendlier toward linking with other languages. The way I see it, if we get alternatives for cpuid, threads, and mutexes in Mir, that would benefit clients interested in linear algebra. If we get them in druntime, that would benefit clients interested in linear algebra and everything else. Clearly the impact would be much larger.

> 1.b In the context of 1.a, linking multiple binaries compiled with different DRuntime/Phobos versions may cause significant problems. [...]

Hmmm... well I seem to recall the C std lib in gcc has large interoperability issues with its own previous versions, even across minor releases. This has caused numerous headaches at Facebook because the breakages always come without warning and manifest themselves in obscure ways. On the Microsoft side things are even worse, because they virtually guarantee that a version of VS is not binary compatible with the previous ones (I'm not kidding; it's deliberate). That sets a rather low baseline for us :o). Clearly we'd want to do better, and we probably can. But I think it would be an exaggeration to worry too much about such scenarios.

> 2. Math functions are not templates -> they are not inlined -> no vectorization, plus function calls in the loop body. One day this may be fixed, but see 1.a and 1.b.

How do the likes of Eigen do it? Do they provide their own templated implementation of <math.h>? Have you investigated the much-hailed link-time inlining?

> 3. Math functions are not aliases for LDC -> LDC's fastmath does not work for them. [...]

Not sure I understand this, but it seems to me making the math functions templates would solve it?

Thanks,

Andrei
Oct 07 2016
On Friday, 7 October 2016 at 17:02:02 UTC, Andrei Alexandrescu wrote:
> On 10/07/2016 03:42 AM, Ilya Yaroshenko wrote:
>> 2. Math functions are not templates -> they are not inlined -> no vectorization, plus function calls in the loop body. One day this may be fixed, but see 1.a and 1.b.
> How do the likes of Eigen do it? Do they provide their own templated implementation of <math.h>?

That trivial non-template functions are not cross-module inlined by LDC is something I am working on (use `-enable-cross-module-inlining` with 1.1.0). I wouldn't use it as an argument for significant changes.

> Have you investigated the much-hailed link-time inlining?

Also a work in progress. It would at the very least require a special build of Phobos, something we don't do yet.
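For context, a minimal sketch of the cross-module situation (two files shown in one listing; module and function names are hypothetical, the flag is the one named above):

```d
// math_impl.d: a trivial non-template function in its own module
module math_impl;

float myFabs(float x) @safe pure nothrow @nogc
{
    return x < 0 ? -x : x;
}

// app.d: at -O2 this call normally stays a real call across the module
// boundary; building with
//     ldc2 -O2 -enable-cross-module-inlining app.d math_impl.d
// (LDC 1.1.0, per the post above) allows it to be inlined
module app;

import math_impl;

float f(float x) @safe pure nothrow @nogc
{
    return myFabs(x);
}
```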
Oct 07 2016
On Friday, 7 October 2016 at 17:02:02 UTC, Andrei Alexandrescu wrote:
> On 10/07/2016 03:42 AM, Ilya Yaroshenko wrote:
>> For example, SUM_i of sqrt(fabs(a[i])) can be vectorised using mir.ndslice.algorithm [...]
> Yah, 8 times is large enough to justify an important change.
>> Current std.math has the following problems:
>> 1. Math functions are not templates -> Phobos must be linked.
> This is also the case for C++ - most math functions are linked from the C standard library. How do typical linear algebra libraries similar in functionality to Mir (such as Eigen) deal with this situation?

1) A BLAS-like API requires only sqrt and fabs. The solutions used in Eigen depend on the compiler. For example, the following code can be found:

```c++
template<> EIGEN_DEVICE_FUNC inline float4 pabs<float4>(const float4& a)
{
    return make_float4(fabsf(a.x), fabsf(a.y), fabsf(a.z), fabsf(a.w));
}

template<> EIGEN_DEVICE_FUNC inline double2 pabs<double2>(const double2& a)
{
    return make_double2(fabs(a.x), fabs(a.y));
}
```

2) Eigen, uBLAS and others use Expression Templates [1], which are used to compose a few multiplications, additions/subtractions and maybe some per-element operations on matrices and vectors. At the same time, I have never seen a lambda being passed: C/C++ high performance libraries use macros/templates for type specification, but lambdas are not used. This makes the upcoming ndslice.algorithm a unique solution that is more flexible, fast, and universal compared with C++ Expression Templates. It still requires some rework, and an LDC based on DMD 2.072 for further optimization.

> Also, one question is how does the existence of unused functions impede the working of faster functions provided separately? Is it a sticky point that std.math is the exact module used?

Of course, a separate module or dub package can be provided instead. In addition, std.math should be split into a package and reworked. So, instead of modifying std.math, we can start a new math package.

> Trying to get a good grip on the matter. Generally you'd have a very easy time convincing me that templates are a better way to go :o). But we need to have a good motivation. Do you have a brief example illustrating one proposed template and how it is better than the old ways?

Yes, the example can be found at [2]. The first template is better for BetterC mode. The example contains a C program; the last paragraph in this post contains the second part of this example. The first part:

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

float mir_alg_bar(float, float, float);

int main(int argc, char const *argv[])
{
    if (argc < 4)
    {
        puts("Usage: app number_a number_b number_c");
        return 1;
    }
    float a = atof(argv[1]);
    float b = atof(argv[2]);
    float c = atof(argv[3]);
    float d = mir_alg_bar(a, b, c);
    printf("%f\n", d);
    return 0;
}
```

This program should be linked with the BetterC library:

```sh
clang app.c alg/libmir-alg.a
```

`mir-alg` is a small betterC library, which uses a generic `mir` dummy (not the real Mir, for example simplicity). It can be linked like a common C library and has an extern(C) nothrow @nogc interface.
```d
module alg_bar;

pragma(LDC_no_moduleinfo);

import ldc.attributes : fastmath;
import mir.alg;

extern(C) nothrow @nogc @fastmath:

float mir_alg_bar(float a, float b, float c)
{
    return alg1!bar(a, b, c);
}
```

The `mir` dummy contains 3 implementations: `alg1`, `alg2`, and `alg3`.

```d
module mir.alg;

import ldc.intrinsics : llvm_fabs;
import ldc.attributes : fastmath;

pragma(LDC_no_moduleinfo);

@fastmath
{
    auto alg1(alias f)(float a, float b, float c)
    {
        return f(a, llvm_fabs(b), c);
    }

    auto alg2(alias f)(float a, float b, float c)
    {
        return f(a, fabs(b), c);
    }

    auto alg3(alias f)(float a, float b, float c)
    {
        import std.math;
        return f(a, std.math.fabs(b), c);
    }
}

@fastmath auto bar()(float a, float b, float c)
{
    return a * b + c;
}

float fabs(float x) @safe pure nothrow @nogc
{
    return llvm_fabs(x);
}
```

The `fabs` function declaration is the same as in LDC's Phobos fork. `alg1` can be linked into a C program in any optimization mode. `alg2` and `alg3` use function declarations and require linking the `libmir` dummy or `libphobos2` respectively; making `fabs` a template solves this problem. LDC can inline `fabs` for `alg2` and `alg3`, but the `-O2` flag is required.

> 1.a I have firmly decided to move forward without DRuntime. [...] The new cpuid [1] is already implemented (I just need to replace the module constructor with an explicit initialization function).
> Do you think you can integrate the new cpuid implementation with the existing interface (most likely greatly enhancing it) without breaking the existing clients?

The new cpuid has a low level and a high level API. The high level API will be reworked into an intermediate level API without the module constructor; this is required for BetterC mode. The current DRuntime cpuid API can be implemented on top of the new cpuid's low level interface. However, the current DRuntime API can not be used for Mir. The reasons are:

1. It is not compatible with betterC mode.
2. It performs additional weird computations for cache level sizes, which makes it maddening to predict what a returned value means. If an engineer asks for the Level 3 cache size, the Level 3 cache size should be returned, instead of the current hell. See also Issue 16028 [3].
3. It can not represent complex CPU topologies, which is required for ARM (especially server ARM CPUs). CPU information is protected on ARM CPUs, but it can be predefined by a user or fetched from an OS.

> Same question for threads. Same question for mutexes.

The current DRuntime mutexes and threads can be implemented on top of nothrow @nogc successors.

>> My strong opinion is that a D library only for D is the wrong direction: a numeric D library should be a product for other languages too, like many C libraries are. [...]
> Sure. A different way to frame this is to make D friendlier toward linking with other languages. [...] If we get them in druntime, that would benefit clients interested in linear algebra and everything else. Clearly the impact would be much larger.

We do not need DRuntime for the future, but existing users do. A fat runtime (except for the generic algorithms) is a red flag for software developers who need to create something like Eigen or a high performance web server. The number of such libraries is always small; at the same time, these libraries make the weather, and a lot of packages get built on top of them after a while. Dub allows dependency versions to be overridden in dub.selections.json, which is what is required for continuous development.

Assume you manage a set of integrated infrastructure projects, which use a set of third-party DUB projects, which depend on DRuntime. Part of this infrastructure is open source. Consultancy for clients is the main income. And you want to add support for a modern CPU, or add a new system API for yet another appleOS.
A release has time constraints, so you cannot wait for a new compiler release. Also, clients want new features and backward compatibility with older compilers at the same time. Plus, testing complex infrastructure against a compiler fork requires additional effort and time, and clients would not be happy to deploy your compiler fork into their infrastructure and to their clients. In addition, you would need to update the forked DRuntime API usage in the third-party projects. This is a stalemate situation and a red flag for business.

Cpuid, threads, mutexes, an event loop, async I/O, and numeric software as low level DUB packages with good community support and a short release cycle are what we really need. I am not against a high level API; furthermore, bindings to other languages are an option for providing a simple and familiar API for users. But the low level API is required. Users do not care about a `std`/`core` or other prefix; they want good support. Business requires reliability and flexibility; bugs are not a huge problem if the architecture allows them to be found and fixed. The really huge problem is a high level, object-oriented, GC-oriented, x86-oriented DRuntime as a dependency almost everywhere. I would like to see `std.glas` instead of `mir.glas`, but it should be provided as a common dub project.

> 2. Math functions are not templates -> they are not inlined -> no vectorization, plus function calls in the loop body. One day this may be fixed, but see 1.a and 1.b.

It seems recent LDC fixes this problem. Many thanks to our LDC team!

> How do the likes of Eigen do it? Do they provide their own templated implementation of <math.h>?

Eigen code is very weird: it uses templates and macros at the same time, with specializations for different compilers and C libraries, including Intel MKL.

> Have you investigated the much-hailed link-time inlining?

This probably would not work for loop vectorization.

> 3. Math functions are not aliases for LDC -> LDC's fastmath does not work for them. [...]
> Not sure I understand this, but it seems to me making the math functions templates would solve it?

Yes. The templates can be replaced with aliases to the intrinsics for `version(LDC)`. For the example above, with all optimizations turned on, only `alg1`, which calls `llvm_fabs` directly, gets fused operations. The reason is that fma composition happens at the end of the LLVM optimization pipeline: if an inlined function (bar) and the root function (alg2) have `@fastmath`, but another inlined function (fabs) does not have `@fastmath`, the code for the root will have fma, but the inlined code for _both_ functions will not. To perform other optimizations like vectorization, LLVM needs to decompose fma and recompose it later.

Best regards,
Ilya

[1] https://en.wikipedia.org/wiki/Expression_templates
[2] https://github.com/libmir/temporary_experiments/tree/master/alias_vs_fun
[3] https://issues.dlang.org/show_bug.cgi?id=16028
Oct 09 2016