
digitalmars.D - Targeting Vulkan and SPIR-V

reply "Joakim" <dlang joakim.fea.st> writes:
The ground-up redesign of OpenGL, now called Vulkan, has been 
announced at GDC:

http://www.phoronix.com/scan.php?page=article&item=khronos-vulcan-spirv

Both graphics shaders and the latest version of OpenCL, which 
enables computation on the GPU, will target a new IR called 
SPIR-V:

http://www.anandtech.com/show/9039/khronos-announces-opencl-21-c-comes-to-opencl

Rather than being forced to use C-like languages like GLSL or 
OpenCL in the past, this new IR will allow writing graphics 
shaders and OpenCL code using any language, including a subset of 
C++14 stripped of exceptions, function pointers, and virtual 
functions.

This would be a good opportunity for D, if ldc or gdc could be 
made to target SPIR-V.  Ldc would seem to have a leg up, since 
SPIR was originally based on LLVM IR before SPIR-V diverged from it.
Mar 06 2015
next sibling parent reply Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 6 Mar 2015 23:30, "Joakim via Digitalmars-d" <digitalmars-d puremagic.com>
wrote:
 The ground-up redesign of OpenGL, now called Vulkan, has been 
 announced at GDC:

 http://www.phoronix.com/scan.php?page=article&item=khronos-vulcan-spirv

 Both graphics shaders and the latest version of OpenCL, which 
 enables computation on the GPU, will target a new IR called SPIR-V:

 http://www.anandtech.com/show/9039/khronos-announces-opencl-21-c-comes-to-opencl

 Rather than being forced to use C-like languages like GLSL or 
 OpenCL in the past, this new IR will allow writing graphics 
 shaders and OpenCL code using any language, including a subset of 
 C++14 stripped of exceptions, function pointers, and virtual 
 functions.

 This would be a good opportunity for D, if ldc or gdc could be 
 made to target SPIR-V. Ldc would seem to have a leg up, since 
 SPIR was originally based on LLVM IR before SPIR-V diverged from it.

Unlike LDC, GDC doesn't need to be *made* to target anything. Its IR 
is high level enough that you don't need to think (nor care) about 
your backend target.

GCC itself will need a backend to support it though. ;)

Iain
Mar 06 2015
next sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Saturday, 7 March 2015 at 02:18:22 UTC, Iain Buclaw wrote:
 Unlike LDC, GDC doesn't need to be *made* to target anything. Its 
 IR is high level enough that you don't need to think (nor care) 
 about your backend target.

 GCC itself will need a backend to support it though.  ;)

 Iain
Why is that unlike LDC? LLVM IR is fairly high level (for a compiler IR); is there some specific blocker you are aware of?
Mar 06 2015
next sibling parent Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 7 Mar 2015 04:00, "deadalnix via Digitalmars-d" <
digitalmars-d puremagic.com> wrote:
 On Saturday, 7 March 2015 at 02:18:22 UTC, Iain Buclaw wrote:
  Unlike LDC, GDC doesn't need to be *made* to target anything. Its 
  IR is high level enough that you don't need to think (nor care) 
  about your backend target.

  GCC itself will need a backend to support it though.  ;)

  Iain

 Why is that unlike LDC? LLVM IR is fairly high level (for a compiler 
 IR); is there some specific blocker you are aware of?

The necessity of the changes in PR 768 - in fact, just the fact that 
LDC needs them raises eyebrows. :)

Iain
Mar 06 2015
prev sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Saturday, 7 March 2015 at 03:57:15 UTC, deadalnix wrote:
 On Saturday, 7 March 2015 at 02:18:22 UTC, Iain Buclaw wrote:
 Unlike LDC, GDC doesn't need to be *made* to target anything. Its 
 IR is high level enough that you don't need to think (nor care) 
 about your backend target.

 GCC itself will need a backend to support it though.  ;)

 Iain
 Why is that unlike LDC? LLVM IR is fairly high level (for a compiler 
 IR); is there some specific blocker you are aware of?
Bitcode has target-dependent opcodes.

http://llvm.org/devmtg/2011-09-16/EuroLLVM2011-MoreTargetIndependentLLVMBitcode.pdf
Mar 06 2015
prev sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Saturday, 7 March 2015 at 02:18:22 UTC, Iain Buclaw wrote:
 On 6 Mar 2015 23:30, "Joakim via Digitalmars-d" 
 <digitalmars-d puremagic.com> wrote:
  The ground-up redesign of OpenGL, now called Vulkan, has been 
  announced at GDC:

  http://www.phoronix.com/scan.php?page=article&item=khronos-vulcan-spirv

  Both graphics shaders and the latest version of OpenCL, which 
  enables computation on the GPU, will target a new IR called SPIR-V:

  http://www.anandtech.com/show/9039/khronos-announces-opencl-21-c-comes-to-opencl

  Rather than being forced to use C-like languages like GLSL or 
  OpenCL in the past, this new IR will allow writing graphics 
  shaders and OpenCL code using any language, including a subset of 
  C++14 stripped of exceptions, function pointers, and virtual 
  functions.

  This would be a good opportunity for D, if ldc or gdc could be 
  made to target SPIR-V. Ldc would seem to have a leg up, since 
  SPIR was originally based on LLVM IR before SPIR-V diverged from it.

 Unlike LDC, GDC doesn't need to be *made* to target anything. Its 
 IR is high level enough that you don't need to think (nor care) 
 about your backend target.

 GCC itself will need a backend to support it though.  ;)

 Iain
Relevant: https://gcc.gnu.org/ml/gcc/2015-03/msg00020.html
Mar 12 2015
parent reply Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 12 March 2015 at 15:57, John Colvin via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On Saturday, 7 March 2015 at 02:18:22 UTC, Iain Buclaw wrote:
  On 6 Mar 2015 23:30, "Joakim via Digitalmars-d" 
  <digitalmars-d puremagic.com> wrote:
  […]

  Unlike LDC, GDC doesn't need to be *made* to target anything. Its 
  IR is high level enough that you don't need to think (nor care) 
  about your backend target.

  GCC itself will need a backend to support it though.  ;)

 Relevant: https://gcc.gnu.org/ml/gcc/2015-03/msg00020.html

David is an awesome guy. Would be great if he picks up the baton on 
this. I reckon most things would be hashed out via GCC builtins, 
which someone then writes a library around.
Mar 12 2015
parent reply "karl" <ultrano hotmail.com> writes:
Spir-V may be producible from HLL tools, but that doesn't mean 
it's perfectly ok to use any HLL. Capability for HLL-to-spir is 
exposed mainly for syntax sugar and shallow precompile 
optimisations, but mostly to avoid vendor-specific HLL bugs that 
have plagued GLSL and HLSL (those billion d3dx_1503.dll files on 
your system are bugfixes). Plus, to give the community access to 
one or several opensource HLL compilers that they can find issues 
with and submit fixes to, for everyone's benefit. So, it's mostly 
to get a flawless opensource GLSL compiler. Dlang's strengths are 
simply not directly applicable, though with a bit of work they 
can actually be applied completely. (I've done them in/with our 
GLSL/backend compilers)

- malloc. SpirV and such don't have malloc. Fix: Preallocate a 
big chunk of memory, and implement a massively-parallel allocator 
yourself (it should handle ~2000 requests to allocate per cycle, 
that's the gist of it). "atomic_add" on a memory location will 
help. If you don't want to preallocate too much, have a cpu 
thread poll while a gpu thread stalls (it should stall itself and 
60000 other threads) until the cpu allocates a new chunk for the 
heap and provides a base address. (Hope the cpu thread responds 
quickly enough, or your gpu tasks will be mercilessly killed.) 
See the allocator sketch at the end of this post.

- function-pointers, largely a no-no. Extensions might give you 
that capability, but otherwise implement them as big switch-case 
tables (see the dispatch sketch at the end of this post). With 
the extensions, you will need to guarantee that an arbitrary 
number (64) of threads all happened to call the same actual 
function.

- stack. I don't know how to break it to you, there's no stack. 
Only around 256 dwords, that 8-200 threads get to allocate from. 
Your notion of a stack gets statically flattened by the 
compilers. So, your whole program has e.g. 4 dwords to play 
around with while 64 threads hide latency; or 64 dwords but only 
4 threads to hide latency, making it 2-4x slower for rudimentary 
things (and utterly failing at latency hiding, becoming 50 times 
slower with memory accesses); or 1 thread with 256 dwords, which 
is 8-16 times slower at rudimentary stuff and 50+ times slower if 
you access memory, even if cached. Add a manually-managed 
programmable memory-stack, and your performance goes poof.

- exceptions. A combined issue of the things above.

Combine the limitations of function-pointers and stack, and I 
hope you get the point - or rather, how pointless the exercise of 
getting Dlang as we know and love it onto a gpu is. A 
single-threaded javascript app on a cpu will beat it at 
performance on everything that's not trivial.
Mar 13 2015
parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Friday, 13 March 2015 at 18:44:18 UTC, karl wrote:
 Spir-V may be producible from HLL tools, but that doesn't mean 
 it's perfectly ok to use any HLL. Capability for HLL-to-spir is 
 exposed mainly for syntax sugar and shallow precompile 
 optimisations, but mostly to avoid vendor-specific HLL bugs 
 that have plagued GLSL and HLSL (those billion d3dx_1503.dll 
 files on your system are bugfixes). Plus, to give the community 
 access to one or several opensource HLL compilers that they can 
 find issues with and submit fixes to, for everyone's benefit. 
 So, it's mostly to get a flawless opensource GLSL compiler. 
 Dlang's strengths are simply not directly applicable, though 
 with a bit of work they can actually be applied completely. 
 (I've done them in/with our GLSL/backend compilers)

 - malloc. SpirV and such don't have malloc. Fix: Preallocate a 
 big chunk of memory, and implement a massively-parallel 
 allocator yourself (it should handle ~2000 requests to allocate 
 per cycle, that's the gist of it). "atomic_add" on a memory 
 location will help. If you don't want to preallocate too much, 
 have a cpu thread poll while a gpu thread stalls (it should 
 stall itself and 60000 other threads) until the cpu allocates a 
 new chunk for the heap and provides a base address. (Hope the 
 cpu thread responds quickly enough, or your gpu tasks will be 
 mercilessly killed.)

 - function-pointers, largely a no-no. Extensions might give you 
 that capability, but otherwise implement them as big 
 switch-case tables. With the extensions, you will need to 
 guarantee that an arbitrary number (64) of threads all happened 
 to call the same actual function.

 - stack. I don't know how to break it to you, there's no stack. 
 Only around 256 dwords, that 8-200 threads get to allocate 
 from. Your notion of a stack gets statically flattened by the 
 compilers. So, your whole program has e.g. 4 dwords to play 
 around with while 64 threads hide latency; or 64 dwords but 
 only 4 threads to hide latency, making it 2-4x slower for 
 rudimentary things (and utterly failing at latency hiding, 
 becoming 50 times slower with memory accesses); or 1 thread 
 with 256 dwords, which is 8-16 times slower at rudimentary 
 stuff and 50+ times slower if you access memory, even if 
 cached. Add a manually-managed programmable memory-stack, and 
 your performance goes poof.

 - exceptions. A combined issue of the things above.

 Combine the limitations of function-pointers and stack, and I 
 hope you get the point - or rather, how pointless the exercise 
 of getting Dlang as we know and love it onto a gpu is. A 
 single-threaded javascript app on a cpu will beat it at 
 performance on everything that's not trivial.
The reason to use D for kernels / shaders would be for its 
metaprogramming, code-generation abilities and type-system (slices 
and structs in particular). Of course you wouldn't be allocating heap 
memory, using function pointers or exceptions. There's still a lot 
that D has to offer without those. I regularly write thousands of 
lines of D in that subset.

P.S. D is in pretty much the same boat as any other C-based language 
w.r.t. stack space. You have to be careful with the stack in OpenCL 
C, and you would have to be careful with the stack in SPIR-D.
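
For a flavour of that subset, a small sketch; Vec and dot are invented 
names, not an existing library:

    struct Vec(T, size_t n)
    {
        T[n] data;  // fixed-size payload, no heap allocation
    }

    // The template generates one specialized, inlinable function per
    // element type and vector length at compile time.
    T dot(T, size_t n)(in Vec!(T, n) a, in Vec!(T, n) b) @nogc nothrow
    {
        T sum = 0;
        foreach (i; 0 .. n)  // fixed trip count, trivially unrollable
            sum += a.data[i] * b.data[i];
        return sum;
    }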
Mar 14 2015
prev sibling next sibling parent reply Russel Winder via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sat, 2015-03-07 at 02:18 +0000, Iain Buclaw via Digitalmars-d wrote:
[…]

 Unlike LDC, GDC doesn't need to be *made* to target anything. Its 
 IR is high level enough that you don't need to think (nor care) 
 about your backend target.

 GCC itself will need a backend to support it though.  ;)

All Apple's effort will go into Clang, and I suspect they are one of 
the driving forces behind Vulkan, as they were the initiators and 
driving force behind OpenCL. Thus LDC should be able to get all the 
work about as "for free" as it gets.

The question is whether NVIDIA and Intel will put effort into GCC. If 
they do then GDC gets this about as "for free" as it gets.

No-one other than the D community will do this for DMD.

It is not clear how quickly Vulkan compliant hardware will appear, a 
lot faster than the compilers most likely, but they will get 
hamstrung with OpenGL and OpenCL compliance – which may end up very 
annoying, albeit necessary.

Also, of course, there is the huge problem of moving the AAA games 
world over to all this. So I suspect we have a few weeks of time (*) 
to mull over this before it is all in everyone's face. But I can see 
this being big because it is a new thing that games and hardware 
manufacturers can use for marketing. There is nothing so useful to 
marketing as something that is genuinely new (**).

(*) Well, tens of weeks probably.

(**) OK, so Vulkan is only new-ish, but the marketers won't care.

-- 
Russel.
Mar 06 2015
next sibling parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Russel Winder via Digitalmars-d"  wrote in message 
news:mailman.7407.1425714258.9932.digitalmars-d puremagic.com...

 No-one other than the D community will do this for DMD.
No-one anywhere will do this for DMD.
Mar 07 2015
parent reply Russel Winder via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sat, 2015-03-07 at 19:16 +1100, Daniel Murphy via Digitalmars-d
wrote:
 "Russel Winder via Digitalmars-d" wrote in message 
 news:mailman.7407.1425714258.9932.digitalmars-d puremagic.com...

  No-one other than the D community will do this for DMD.

 No-one anywhere will do this for DMD.

Which would mean that anyone interested in CPU/GPU computing will 
have to eschew DMD in favour of LDC and GDC.

-- 
Russel.
Mar 07 2015
parent "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Russel Winder via Digitalmars-d"  wrote in message 
news:mailman.7408.1425716535.9932.digitalmars-d puremagic.com...

 Which would mean that anyone interested in CPU/GPU computing will have
 to eschew DMD in favour of LDC and GDC.

Yes. Or for any of the other dozens of platforms that dmd will never 
support.
Mar 07 2015
prev sibling parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Saturday, 7 March 2015 at 07:44:18 UTC, Russel Winder wrote:
 It is not clear how quickly Vulkan compliant hardware will 
 appear, a lot […]

It already exists. Even PowerVR has an (experimental) 
implementation already...

 Also, of course, there is the huge problem of moving the AAA 
 games world over to all this.

Not really, since it is a lot like Mantle... Vulkan isn't 
particularly innovative, it is a trip back in time where you have to 
target the hardware and not the API. Manufacturers like it because 
there is no point in having high powered GPUs if the applications 
are CPU bound, and Vulkan gets around that.

But not having an abstraction over the hardware means that devs will 
suffer, do manual memory management on the GPU, write their own 
mipmap format routines etc. And you bet the vendors will add all 
kinds of extensions to it to stay competitive and we will end up 
with a mess.
Mar 07 2015
parent reply "Paulo Pinto" <pjmlp progtools.org> writes:
On Saturday, 7 March 2015 at 08:41:26 UTC, Ola Fosheim Grøstad 
wrote:
 On Saturday, 7 March 2015 at 07:44:18 UTC, Russel Winder wrote:
  It is not clear how quickly Vulkan compliant hardware will 
  appear, a lot […]

 It already exists. Even PowerVR has an (experimental) 
 implementation already...

  Also, of course, there is the huge problem of moving the AAA 
  games world over to all this.

 Not really, since it is a lot like Mantle... Vulkan isn't 
 particularly innovative, it is a trip back in time where you have 
 to target the hardware and not the API. Manufacturers like it 
 because there is no point in having high powered GPUs if the 
 applications are CPU bound, and Vulkan gets around that.

 But not having an abstraction over the hardware means that devs 
 will suffer, do manual memory management on the GPU, write their 
 own mipmap format routines etc. And you bet the vendors will add 
 all kinds of extensions to it to stay competitive and we will end 
 up with a mess.

I saw a comment in a random forum entry that I cannot recall, which 
stated that a Hello World (triangle) is around 800 lines of C code.

Of course, this doesn't matter when using engines, which every sane 
developer should do anyway. Any application coded straight to 
graphics APIs ends up being a use case specific mini engine.

-- 
Paulo
Mar 07 2015
parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Saturday, 7 March 2015 at 09:05:03 UTC, Paulo Pinto wrote:
 Of course, this doesn't matter when using engines, which every 
 sane developer should do anyway.

 Any applications coded straight to graphics APIs ends up being 
 a use case specific mini engine.
We'll see, but the downside to having a slim driver is that you risk ending up writing the application engine N times for each GPU rather than once. With a buffering high level driver you get some optimization for free, done by the manufacturer using inside knowledge.
Mar 07 2015
prev sibling parent reply "ponce" <contact gam3sfrommars.fr> writes:
On Friday, 6 March 2015 at 23:25:40 UTC, Joakim wrote:
 The ground-up redesign of OpenGL, now called Vulkan, has been 
 announced at GDC:

 http://www.phoronix.com/scan.php?page=article&item=khronos-vulcan-spirv

 Both graphics shaders and the latest version of OpenCL, which 
 enables computation on the GPU, will target a new IR called 
 SPIR-V:

 http://www.anandtech.com/show/9039/khronos-announces-opencl-21-c-comes-to-opencl

 Rather than being forced to use C-like languages like GLSL or 
 OpenCL in the past, this new IR will allow writing graphics 
 shaders and OpenCL code using any language, including a subset 
 of C++14 stripped of exceptions, function pointers, and virtual 
 functions.

 This would be a good opportunity for D, if ldc or gdc could be 
 made to target SPIR-V.  Ldc would seem to have a leg up, since 
 SPIR was originally based on LLVM IR before SPIR-V diverged 
 from it.
Sure, you might target SPIR-V with a C-like language, but how will 
you generate the IR corresponding to:

- texture accesses
- local memory vs global memory vs mapped pinned host memory. Looks 
like you need annotations for your pointers.
- sub-block operations made core in OpenCL 2.x

All things that OpenCL C or GLSL are aware of.

Having a GPU backend doesn't make general code fit for high levels 
of parallelism. GPUs are not designed to work around the poor 
efficiency of the programs they run.
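
A D compiler targeting SPIR-V would have to surface those distinctions 
somehow, presumably as annotations. A purely hypothetical sketch using 
UDAs; neither global/local below nor anything like it exists in ldc or 
gdc today:

    struct global {}  // could lower to SPIR-V CrossWorkgroup (OpenCL __global)
    struct local  {}  // could lower to SPIR-V Workgroup (OpenCL __local)

    // The backend would have to check that each pointer is only ever
    // dereferenced in its declared address space.
    void scaleRows(@global const(float)* src, @global float* dst,
                   @local float* scratch, size_t width);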
Mar 07 2015
parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Saturday, 7 March 2015 at 11:35:59 UTC, ponce wrote:
 On Friday, 6 March 2015 at 23:25:40 UTC, Joakim wrote:
 The ground-up redesign of OpenGL, now called Vulkan, has been 
 announced at GDC:

 http://www.phoronix.com/scan.php?page=article&item=khronos-vulcan-spirv

  Both graphics shaders and the latest version of OpenCL, which 
 enables computation on the GPU, will target a new IR called 
 SPIR-V:

 http://www.anandtech.com/show/9039/khronos-announces-opencl-21-c-comes-to-opencl

 Rather than being forced to use C-like languages like GLSL or 
 OpenCL in the past, this new IR will allow writing graphics 
 shaders and OpenCL code using any language, including a subset 
 of C++14 stripped of exceptions, function pointers, and 
 virtual functions.

 This would be a good opportunity for D, if ldc or gdc could be 
 made to target SPIR-V.  Ldc would seem to have a leg up, since 
  SPIR was originally based on LLVM IR before SPIR-V diverged 
  from it.
 Sure, you might target SPIR-V with a C-like language, but how 
 will you generate the IR corresponding to:

 - texture accesses
 - local memory vs global memory vs mapped pinned host memory. 
 Looks like you need annotations for your pointers.
 - sub-block operations made core in OpenCL 2.x

 All things that OpenCL C or GLSL are aware of.

 Having a GPU backend doesn't make general code fit for high 
 levels of parallelism. GPUs are not designed to work around the 
 poor efficiency of the programs they run.

The same way the Haskell, Java, Python and .NET implementations 
targeting CUDA PTX and HSAIL do.

-- 
Paulo
Mar 07 2015