digitalmars.D - Is sorted using SIMD instructions

Per =?UTF-8?B?Tm9yZGzDtnc=?= (14/14) Apr 12 2018 Neither GCC, LLVM nor ICC can auto-vectorize (and use SIMD) the

Stefan Koch (7/21) Apr 12 2018 I highly doubt it.

Marco Leise (25/53) Apr 12 2018 I had it triggered in gcc one day, when I changed from 3 ubyte

rikki cattermole (3/22) Apr 12 2018 Just checked, change int to float and it uses ucomiss (SIMD, ldc -O3).

David Bennett (14/17) Apr 12 2018 Yep logic is much easier with floats and is available on SSE.

Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:

Neither GCC, LLVM nor ICC can auto-vectorize (and use SIMD) the 
seemly simple function

bool is_sorted(const int32_t* input, size_t n) {
     if (n < 2) {
         return true;
     }

     for (size_t i=0; i < n - 1; i++) {
         if (input[i] > input[i + 1])
             return false;
     }

     return true;
}

Can D's compilers do better?

See http://0x80.pl/notesen/2018-04-11-simd-is-sorted.html

Apr 12 2018

Stefan Koch <uplink.coder googlemail.com> writes:

On Thursday, 12 April 2018 at 07:25:27 UTC, Per Nordlöw wrote:
 Neither GCC, LLVM nor ICC can auto-vectorize (and use SIMD) the 
 seemly simple function

 bool is_sorted(const int32_t* input, size_t n) {
     if (n < 2) {
         return true;
     }

     for (size_t i=0; i < n - 1; i++) {
         if (input[i] > input[i + 1])
             return false;
     }

     return true;
 }

 Can D's compilers do better?

 See http://0x80.pl/notesen/2018-04-11-simd-is-sorted.html

I highly doubt it.
this cannot be auto-vectorized.
There is no way to proof the loop dimensions.

I'd be a different story if you unrolled the loop by hand.
I guess then you'd see gcc and clang putting some simd in there 
maybe.

Apr 12 2018

Marco Leise <Marco.Leise gmx.de> writes:

Am Thu, 12 Apr 2018 09:37:58 +0000
schrieb Stefan Koch <uplink.coder googlemail.com>:

 On Thursday, 12 April 2018 at 07:25:27 UTC, Per Nordl=C3=B6w wrote:
 Neither GCC, LLVM nor ICC can auto-vectorize (and use SIMD) the=20
 seemly simple function

 bool is_sorted(const int32_t* input, size_t n) {
     if (n < 2) {
         return true;
     }

     for (size_t i=3D0; i < n - 1; i++) {
         if (input[i] > input[i + 1])
             return false;
     }

     return true;
 }

 Can D's compilers do better?

 See http://0x80.pl/notesen/2018-04-11-simd-is-sorted.html =20

=20
 I highly doubt it.
 this cannot be auto-vectorized.
 There is no way to proof the loop dimensions.
=20
 I'd be a different story if you unrolled the loop by hand.
 I guess then you'd see gcc and clang putting some simd in there=20
 maybe.

I had it triggered in gcc one day, when I changed from 3 ubyte
color components to 4 in a struct and I do believe in both
cases it was padded to 4 bytes. One should probably read a bit
into what the respective backends can detect and modify code
to match that. D compilers are ultimately limited by what the
backends offer in terms of auto-vectorizing non-SIMD code.

It is easy to vectorize a loop without pointer aliasing that
performs C=3DA+B, but your code above requires complex logic.
There is no SIMD instruction in the SSE series that would
check if an array is sorted as far as I know. Instead you'd
load one vector starting at index 0 and another starting at
index 1 of the same "input" array and compare them
piece-wise. That's an difficult problem for the compiler:
* one of the vectors is always going to be an unaligned load,
  which may be degrade performance
* most of the data is loaded twice
You could alternatively load only one vector and shuffle that
to compare N-1 items at a time, but at that point it feels
like asking the compiler to automatically create source code
for your problem. As if one just says: "Hey, I need to print
all primes up to 100." And the compiler understands what to do.

--=20
Marco

Apr 12 2018

rikki cattermole <rikki cattermole.co.nz> writes:

On 12/04/2018 7:25 PM, Per Nordlöw wrote:
 Neither GCC, LLVM nor ICC can auto-vectorize (and use SIMD) the seemly 
 simple function
 
 bool is_sorted(const int32_t* input, size_t n) {
      if (n < 2) {
          return true;
      }
 
      for (size_t i=0; i < n - 1; i++) {
          if (input[i] > input[i + 1])
              return false;
      }
 
      return true;
 }
 
 Can D's compilers do better?
 
 See http://0x80.pl/notesen/2018-04-11-simd-is-sorted.html

Just checked, change int to float and it uses ucomiss (SIMD, ldc -O3).

There may not be an instruction for int's.

Apr 12 2018

David Bennett <davidbennett bravevision.com> writes:

On Thursday, 12 April 2018 at 10:14:55 UTC, rikki cattermole 
wrote:
 Just checked, change int to float and it uses ucomiss (SIMD, 
 ldc -O3).

 There may not be an instruction for int's.

Yep logic is much easier with floats and is available on SSE.

It's hard to do logic on ints until SSE4.1 as theres no easy way 
to jump.
(With SSE I think it would require putting the result on the 
stack and checking it for zeros using 64bit regs)

Even with SSE4.1 it requires at lest one extra instruction as 
only PTEST sets flags.

That said, is it possible to get the result of PTEST as a bool 
using core.simd? I was wondering if it would be possible write 
the example using that.

Normally I just write simple functions like this in yasm and link 
them in extern(C).

Apr 12 2018

D Programming

C/C++ Programming

Other

digitalmars.D - Is sorted using SIMD instructions