digitalmars.D - What is the better signature for this?
- Guillaume Piolat (25/25) Oct 09 2022 Consider the following "intrinsic" signature.
- Guillaume Piolat (2/8) Oct 09 2022 Erratum: it is `long[4]`, not `float[4]`
- Bruce Carneal (6/11) Oct 10 2022 Could using the static array representation type of the vector
- Guillaume Piolat (11/25) Oct 10 2022 That is solution C.
- Johan (17/20) Oct 11 2022 I think you should be able to define the unaligned type like this:
- Kagamin (13/20) Oct 11 2022 Static array works like this:
- Bruce Carneal (6/20) Oct 11 2022 Yes. Starting at line 57 you'll find examples of the above for a
Consider the following "intrinsic" signature.

    __m256i _mm256_loadu_si256 (const(__m256i)* mem_addr) pure @trusted; // (A)

The Intel intrinsics signatures have the problem that you must pass a pointer to an implicitly aligned `__m256i` (aka `long4`), even though the pointer doesn't need to be aligned for an unaligned load. So this is playing a bit with the type system: inside the "intrinsic" implementation, nothing should rely on that non-existent alignment. So far it hasn't blown up.

It is tempting to fix that and just take a `long*` or `void*` instead.

    __m256i _mm256_loadu_si256 (const(void)* mem_addr) pure @system; // (B)

However, in that case the function is not `@trusted` anymore, but becomes `@system`. Indeed, it is safe to dereference a pointer, but not to index from it.

What about `float[4]` then? We can get back `@trusted`.

    __m256i _mm256_loadu_si256 (const(float[4])* mem_addr) pure @trusted; // (C)

Then we lose compatibility with intrinsics code originally written in C++. Casting to `const(float[4])*` is even more annoying to type than casting to `const(__m256i)*`.

What do you think is the better signature? I'd prefer to go A > B > C, but figured I might be missing something.
Oct 09 2022
On Sunday, 9 October 2022 at 19:44:13 UTC, Guillaume Piolat wrote:
> What about `float[4]` then? We can get back `@trusted`.
>
>     __m256i _mm256_loadu_si256 (const(float[4])* mem_addr) pure @trusted; // (C)
>
> Then we lose compatibility with intrinsics code originally written in C++. Casting to `const(float[4])*` is even more annoying to type than casting to `const(__m256i)*`.

Erratum: it is `long[4]`, not `float[4]`
Oct 09 2022
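To make the trade-off between (B) and (C) concrete, here is a minimal sketch, not the actual intrinsics implementation. `Vec256`, `loadC`, and `loadB` are hypothetical names introduced only for illustration; `Vec256` stands in for `__m256i` so the code builds without `core.simd`, and the byte-copy loop is an assumption about how a portable fallback could look.

```d
// Hypothetical stand-in for __m256i so the sketch builds without core.simd.
struct Vec256
{
    long[4] lanes;
}

// Variant (C): the parameter type already promises 32 readable bytes, so the
// body only dereferences the pointer and never indexes past it.
Vec256 loadC(const(long[4])* mem) pure @trusted
{
    Vec256 r;
    r.lanes = *mem;
    return r;
}

// Variant (B): an untyped pointer carries no length, so the caller has to
// guarantee 32 readable bytes; that is why it can only be @system.
Vec256 loadB(const(void)* mem) pure @system
{
    Vec256 r;
    auto src = cast(const(ubyte)*) mem;
    auto dst = cast(ubyte*) r.lanes.ptr;
    foreach (i; 0 .. Vec256.sizeof)
        dst[i] = src[i];
    return r;
}

// main is left unattributed (@system by default) so that taking the address
// of a local is allowed without extra compiler switches.
void main()
{
    long[4] buf = [1, 2, 3, 4];
    assert(loadC(&buf).lanes == buf);
    assert(loadB(buf.ptr).lanes == buf);
}
```

With (C) the length lives in the type, which is what lets the function claim `@trusted`; with (B) the same guarantee has to come from every caller.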
On Sunday, 9 October 2022 at 19:44:13 UTC, Guillaume Piolat wrote:
> Consider the following "intrinsic" signature. ... What do you think is the better signature? I'd prefer to go A > B > C, but figured I might be missing something.

Could using the static array representation type of the vector (`.array`) be a useful idiom here? I ask because I don't know the constraints/preferences of veteran intrinsic programmers. That idiom does work well in other SIMD formulations but may not be well suited here.
Oct 10 2022
On Monday, 10 October 2022 at 12:31:04 UTC, Bruce Carneal wrote:
> On Sunday, 9 October 2022 at 19:44:13 UTC, Guillaume Piolat wrote:
>> Consider the following "intrinsic" signature. ... What do you think is the better signature? I'd prefer to go A > B > C, but figured I might be missing something.
>
> Could using the static array representation type of the vector (`.array`) be a useful idiom here? I ask because I don't know the constraints/preferences of veteran intrinsic programmers. That idiom does work well in other SIMD formulations but may not be well suited here.

That is solution C. It could work.

The slight problem is that functions that take `__m128i*` use it to mean "any packed integer taking 128-bit" space, and it's not immediately obvious that `__m128i` is `int4` and `__m256i` is `long4`; it's rather counterintuitive. Semantically, it could be `short8` or `byte16`...

GCC vectors can be unaligned, and there are types for that (e.g. `__m128i_u`), but I don't think the other compilers can do that. That would be a prime contender.
Oct 10 2022
On Monday, 10 October 2022 at 12:44:18 UTC, Guillaume Piolat wrote:
> GCC vectors can be unaligned, and there are types for that (e.g. `__m128i_u`), but I don't think the other compilers can do that. That would be a prime contender.

I think you should be able to define the unaligned type like this:

```
struct __m128u
{
    align(1) __m128 data;
    alias data this;
}
```

It works, but I am not 100% sure that this type will always behave the same (ABI-wise) as `__m128` when used as a value, e.g. when passed to a function (`void fun(__m128u a, __m128u b)`: are the arguments passed in SIMD registers?).

Unfortunately, it currently runs into this LDC bug: https://github.com/ldc-developers/ldc/issues/4236 .

cheers,
Johan
Oct 11 2022
On Monday, 10 October 2022 at 12:44:18 UTC, Guillaume Piolat wrote:
> That is solution C. It could work.

Static array works like this:

```
int Load4LE(in ref ubyte[4] b) pure
{
    return (b[3]<<24)|(b[2]<<16)|(b[1]<<8)|b[0];
}

ubyte[] data;
int val=Load4LE(data[0..4]);
```

It's safe, bounds-checked, CTFE-able, no casts.

> The slight problem is that functions that take `__m128i*` use it to mean "any packed integer taking 128-bit" space, and it's not immediately obvious that `__m128i` is `int4` and `__m256i` is `long4`; it's rather counterintuitive. Semantically, it could be `short8` or `byte16`...

There's no solution, only tradeoffs.
Oct 11 2022
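The "safe, CTFE-able, no casts" claim can be checked with a small sketch. This is only an illustration under assumptions: `load4LE` and `demo` are hypothetical names, the parameter is written as `ref const` rather than `in ref`, and the caller passes a static array directly instead of a slice.

```d
// Same little-endian assembly as above, taking the static array by reference.
int load4LE(ref const ubyte[4] b) pure @safe
{
    return (b[3] << 24) | (b[2] << 16) | (b[1] << 8) | b[0];
}

// Hypothetical helper so the call can be evaluated at compile time.
int demo() pure @safe
{
    ubyte[4] bytes = [0x78, 0x56, 0x34, 0x12];
    return load4LE(bytes);
}

// Evaluated by CTFE: the bytes are assembled lowest address first.
static assert(demo() == 0x12345678);

void main() @safe
{
    assert(demo() == 0x12345678);
}
```

Because the length is part of the type, every index in `load4LE` is known to be in range, and nothing in the call chain needs `@trusted` or a cast.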
On Tuesday, 11 October 2022 at 12:51:13 UTC, Kagamin wrote:
> On Monday, 10 October 2022 at 12:44:18 UTC, Guillaume Piolat wrote:
>> That is solution C. It could work.
>
> Static array works like this:
>
> ```
> int Load4LE(in ref ubyte[4] b) pure
> {
>     return (b[3]<<24)|(b[2]<<16)|(b[1]<<8)|b[0];
> }
>
> ubyte[] data;
> int val=Load4LE(data[0..4]);
> ```
>
> It's safe, bounds-checked, CTFE-able, no casts.

Yes. Starting at line 57 you'll find examples of the above for a target-adaptive/generic environment: https://godbolt.org/z/qW6PYT3Yd

I've not found a way to trigger those one-instruction unaligned loads from DMD, but LDC and GDC are doing great.
Oct 11 2022