digitalmars.D - Has anyone used D with Nvidia's Cuda?
- Walter Bright (1/1) Apr 03 2015 http://www.nvidia.com/object/cuda_home_new.html
- Rikki Cattermole (5/6) Apr 03 2015 Honestly, I don't think anyone has even tried to create bindings. Let
- weaselcat (2/8) Apr 03 2015 Derelict offers cuda bindings.
- Walter Bright (7/17) Apr 04 2015 Ahh, I see:
- weaselcat (3/6) Apr 04 2015 Top 3 results for me for `dlang derelict` are all his github
- Walter Bright (15/20) Apr 04 2015 `D programming language derelict`
- weaselcat (11/34) Apr 04 2015 AFAIK almost all derelict repos are maintained almost solely by
- Walter Bright (11/12) Apr 04 2015 Exactly!
- Rikki Cattermole (10/23) Apr 04 2015 On that idea, just a thought. But DMD-FE is using the visitor pattern
- weaselcat (8/22) Apr 04 2015 I really think you're barking up the wrong tree here - cuda is a
- Walter Bright (8/13) Apr 04 2015 That's right. On the other hand,
- ponce (3/15) Apr 04 2015 If NVIDIA wants full support for NPP, cuBLAS, and the myriad of
- ponce (2/16) Apr 04 2015 A good OpenCL wrapper library like cl4d would do wonders.
- Dmitri Makarov (6/7) Apr 04 2015 I can contribute at least three examples running code on a GPU
- ponce (16/20) Apr 04 2015 The problem with example is that someone have to maintain them.
- Walter Bright (12/29) Apr 04 2015 Oh, I understand that keeping things up to date is always a problem with...
- ponce (5/41) Apr 04 2015 They doesn't seem to have deprecated any function indeed. That
- Walter Bright (3/5) Apr 04 2015 Thanks. I think I'll give it a try and see what it takes to get a simple...
- Dmitri Makarov (5/6) Apr 03 2015 No, but I'm building an embedded dsl that will allow to generate
- Vlad Levenfeld (2/8) Apr 04 2015 How would it be used? At the client level, I mean.
- Dmitri Makarov via Digitalmars-d (36/48) Apr 04 2015 The programmer describes the computations to be done on a device,
- John Colvin (2/3) Apr 04 2015 http://code.dlang.org/packages/derelict-cuda
- Walter Bright (2/5) Apr 04 2015 I know you have interest in CUDA, have you gotten any D code to work wit...
- John Colvin (4/11) Apr 04 2015 I use OpenCL as I don't want to be locked to one vendor's
- Walter Bright (2/13) Apr 04 2015 A reasonable viewpoint.
- ponce (7/8) Apr 04 2015 I wrote the Driver and Runtime API bindings for
- Walter Bright (6/18) Apr 04 2015 It's slower:
- Dmitri Makarov (7/8) Apr 04 2015 However, it's an open standard, will improve, and will be
- ponce (21/48) Apr 04 2015 Not far. I'm currently trying to bootstrap a business solo
- ponce (7/7) Apr 04 2015 Also consider costs: NVIDIA will artificially limit the speed of
- Walter Bright (10/15) Apr 04 2015 The only thing I can add to that is the people who really want performan...
http://www.nvidia.com/object/cuda_home_new.html
Apr 03 2015
On 4/04/2015 3:49 p.m., Walter Bright wrote:
> http://www.nvidia.com/object/cuda_home_new.html

Honestly, I don't think anyone has even tried to create bindings, let alone use it. Although I think there are OpenCL bindings floating around, which serve a similar purpose.
Apr 03 2015
On Saturday, 4 April 2015 at 02:59:46 UTC, Rikki Cattermole wrote:
> Honestly, I don't think anyone has even tried to create bindings. Let alone use it. Although I think there are OpenCL bindings floating around which has a similar purpose.

Derelict offers CUDA bindings.
Apr 03 2015
On 4/3/2015 11:12 PM, weaselcat wrote:
> Derelict offers cuda bindings.

Ahh, I see:

https://github.com/DerelictOrg/DerelictCUDA

I don't see it here:

http://svn.dsource.org/projects/derelict/branches/Derelict2/doc/index.html

If the latter is obsolete, it should perhaps be updated to point to the newer one. The svn one is the first google hit for Derelict.
Apr 04 2015
On Saturday, 4 April 2015 at 09:24:07 UTC, Walter Bright wrote:
> If the latter is obsolete, it should perhaps be updated to point to the newer one. The svn one is the first google hit for Derelict.

The top 3 results for me for `dlang derelict` are all his github page/projects. Did you just google `derelict`?
Apr 04 2015
On 4/4/2015 2:34 AM, weaselcat wrote:
> Top 3 results for me for `dlang derelict` are all his github page/projects, did you just google `derelict` or?

`D programming language derelict`

In any case, the dsource.org page should be fixed or removed. The github page also has problems:

* the "Using Derelict" link is dead
* "DerelictUtil for Users" has zero information about using D with CUDA, and seems completely irrelevant
* no link for "DerelictUtil Wiki"
* the example shown is useless
* there are no examples of actually running code on a GPU

It looks like nothing more than a couple of header files (which is a great start, but that's all). In contrast, there's a package to use CUDA with Go:

https://archive.fosdem.org/2014/schedule/event/hpc_devroom_go/

which is still pretty thin, but much further along.
Apr 04 2015
On Saturday, 4 April 2015 at 09:50:09 UTC, Walter Bright wrote:
> In any case, the dsource.org page should be fixed or removed. The github page also has problems [...]

PR?

AFAIK almost all derelict repos are maintained almost solely by aldacron, and he maintains a _lot_ of them.

https://github.com/DerelictOrg

p.s., googling "golang cuda" comes up with almost nothing useful at the top - 4-5 links to the FOSDEM video and some pdfs. I'm not being biased, I seriously can't figure out anything beyond the fosdem video for cuda with go. The first result for "dlang cuda" for me is the dub repo for derelict cuda.
Apr 04 2015
On 4/4/2015 3:04 AM, weaselcat wrote:
> PR?

Exactly!

The idea is that GPUs can greatly accelerate code (2x to 1000x), and if D wants to appeal to high performance computing programmers, we need to have a workable way to program the GPU. At this point, it doesn't have to be slick or great, but it has to be doable.

Nvidia appears to have put a lot of effort into CUDA, and it shouldn't be hard to work with CUDA given the Derelict D headers. That will give us an answer for D users who want to leverage the GPU.

It would also be dazz if someone were to look at std.algorithm and see what could be accelerated with GPU code.
Apr 04 2015
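To make the std.algorithm idea concrete, here is a hypothetical CPU-only sketch (no GPU backend exists for this) of the shape of code that would qualify: an element-wise map feeding a reduction, the classic data-parallel pattern that GPU libraries such as Thrust and Bolt accelerate.

```d
import std.algorithm : map, sum;

// A map + reduce pipeline: each element is transformed independently,
// then the results are combined. This data-parallel shape is exactly
// what a GPU backend for std.algorithm could offload as a kernel.
double sumOfSquares(const(double)[] xs)
{
    return xs.map!(x => x * x).sum;
}
```

Today this runs lazily on the CPU; the point is that nothing in the expression depends on evaluation order, which is what makes it an offload candidate.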
On 4/04/2015 11:26 p.m., Walter Bright wrote:
> It would also be dazz if someone were to look at std.algorithm and see what could be accelerated with GPU code.

On that idea, just a thought. DMD-FE uses the visitor pattern quite a lot to allow the backend to hook into it easily. What if we exposed a set block of code to CTFE that acted like a backend, but only transformed for the real backend? In other words, allow CTFE to extend the compiler a little like the backend does, to add language features such as transforming code into OpenCL code and having it wrapped nicely into D code.

Theoretically, if this was done, we could move the iasm into a library. Because of CTFE, surely this wouldn't add much code to the front end?
Apr 04 2015
On Saturday, 4 April 2015 at 10:26:27 UTC, Walter Bright wrote:
> Nvidia appears to have put a lot of effort into CUDA, and it shouldn't be hard to work with CUDA given the Derelict D headers, and will give us an answer to D users who want to leverage the GPU.

I really think you're barking up the wrong tree here - cuda is a closed proprietary solution only implemented by one vendor, effectively cutting off anyone that doesn't work with nvidia hardware.

Also, the std.algorithm thing sounds a lot like the C++ library Bolt/Thrust:

https://github.com/HSA-Libraries/Bolt
Apr 04 2015
On 4/4/2015 3:45 AM, weaselcat wrote:
> I really think you're barking up the wrong tree here - cuda is a closed proprietary solution only implemented by one vendor effectively cutting off anyone that doesn't work with nvidia hardware.

That's right. On the other hand:

1. Nvidia hardware is pervasive, and CUDA has been around for many years. I doubt it is going away anytime soon.
2. It is little effort on our part to support it.
3. We'd have some co-marketing opportunities with Nvidia if we support it.
4. Supporting CUDA doesn't impede supporting OpenCL.

> also, the std.algorithm thing sounds a lot like the C++ library Bolt/Thrust https://github.com/HSA-Libraries/Bolt

Yup.
Apr 04 2015
On Saturday, 4 April 2015 at 17:21:45 UTC, Walter Bright wrote:
> 2. It is little effort on our part to support it.

If NVIDIA wants full support for NPP, cuBLAS, and the myriad of libraries depending on the Runtime API, then it's more effort.
Apr 04 2015
On Saturday, 4 April 2015 at 10:26:27 UTC, Walter Bright wrote:
> The idea is that GPUs can greatly accelerate code (2x to 1000x), and if D wants to appeal to high performance computing programmers, we need to have a workable way to program the GPU.

A good OpenCL wrapper library like cl4d would do wonders.
Apr 04 2015
On Saturday, 4 April 2015 at 09:50:09 UTC, Walter Bright wrote:
> * there are no examples of actually running code on a GPU

I can contribute at least three examples running code on a GPU (the domains are neural networks, bioinformatics, and grid traversal -- these are my ports to D/OpenCL of the Rodinia benchmarks, http://www.cs.virginia.edu/~skadron/wiki/rodinia/index.php/Rodinia:Accelerating_Compute-Intensive_Applications_with_Accelerators), but these examples use OpenCL, not CUDA.
Apr 04 2015
On Saturday, 4 April 2015 at 09:50:09 UTC, Walter Bright wrote:
> * the example shown is useless

The problem with examples is that someone has to maintain them. For DerelictBgfx we removed all translated examples. So the Derelict policy is to remove examples to avoid them becoming out of date.

For the record, Aldacron maintains approx. 22 Derelict bindings and I maintain 7 of them, in our free time. Keeping up with every library change is impossible if everyone expects everything to be up-to-date and with examples.

> * there are no examples of actually running code on a GPU

Because it's similar to using the Driver/Runtime API in C++, you have to read the CUDA documentation.

> It looks like nothing more than a couple header files (which is a great start, but that's all).

Maybe we can delete them so that it's not too embarrassing? Serious proposal.

In my opinion, the couple of header files provide all you need to use CUDA, if you know what you are doing. If you don't, don't do GPGPU.
Apr 04 2015
On 4/4/2015 4:29 AM, ponce wrote:
> The problem with example is that someone have to maintain them. [...] Keeping up with all library change is impossible if everyone excpect everything to be up-to-date and with examples.

Oh, I understand that keeping things up to date is always a problem with every third party tool. On the plus side, however, Nvidia seems very good with backwards compatibility, meaning that when the D bindings get out of date, they will still work. They just won't work with new features.

> Because it's similar to using the Driver/Runtime API in C++, you have to read CUDA documentation.

Of course. But having a couple of examples to show it really does work will go a long way. I am not suggesting making any attempt to duplicate Nvidia's documentation in D.

> Maybe we can delete them so that it's not too embarrassing? Serious proposal.

If that's the state of things, I'd be happy to take them over and put them in Deimos.

> In my opinion, the couple header files provide all you need to use CUDA, if you know what you are doing. If you don't, don't do GPGPU.

That does work for someone who really wants to use CUDA, but not much for someone who is evaluating D and wants to use the GPU with CUDA.
Apr 04 2015
On Saturday, 4 April 2015 at 17:16:19 UTC, Walter Bright wrote:
> On the plus side, however, Nvidia seems very good with backwards compatiblity, meaning that when the D bindings get out of date, they will still work. They just won't work with new features.

They don't seem to have deprecated any functions, indeed. That could make examples practical.

> If that's the state of things, I'd be happy to take them over and put them in Deimos.

Sure, the licensing of Derelict probably allows it, and Deimos and Derelict are complementary anyway.
Apr 04 2015
On 4/4/2015 2:59 PM, ponce wrote:
> Sure, the licensing of Derelict probably allows it, and deimos and Derelict are complimentary anyway.

Thanks. I think I'll give it a try and see what it takes to get a simple example working.
Apr 04 2015
On Saturday, 4 April 2015 at 02:49:16 UTC, Walter Bright wrote:
> http://www.nvidia.com/object/cuda_home_new.html

No, but I'm building an embedded DSL that will allow generating OpenCL kernels and the supporting boilerplate OpenCL API calls at compile time. It's called CLOP (openCL OPtimizer). It uses the derelict.opencl bindings.
Apr 03 2015
On Saturday, 4 April 2015 at 06:36:49 UTC, Dmitri Makarov wrote:
> No, but I'm building an embedded dsl that will allow to generate opencl kernels and supporting boilerplate opencl api calls at compile-time. it's called clop (openCL OPtimizer). It uses derelict.opencl bindings.

How would it be used? At the client level, I mean.
Apr 04 2015
On Sat, Apr 4, 2015 at 9:00 AM, Vlad Levenfeld via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> How would it be used? At the client level, I mean.

The programmer describes the computations to be done on a device, and invokes the CLOP compiler via a mixin expression, passing a string describing the computations in an OpenCL-like syntax. The compiler returns D code that includes the generated OpenCL kernel and all the boilerplate code. The computations can refer to variables declared in the host application; CLOP will generate the necessary CL buffers and kernel arguments.

Here's an example:

// use CLOP DSL to generate OpenCL kernel and API calls.
mixin( compile( q{
    int max3( int a, int b, int c )
    {
        int k = a > b ? a : b;
        return k > c ? k : c;
    }
    Antidiagonal NDRange( c : 1 .. cols, r : 1 .. rows )
    {
        F[c, r] = max3( F[c - 1, r - 1] + S[c + cols * r],
                        F[c - 1, r] - penalty,
                        F[c, r - 1] - penalty );
    }
    apply( rectangular_blocking( 8, 8 ) )
} ) );

This implements the Needleman-Wunsch algorithm in CLOP. It says that the computation is to be done over the 2D index space 1..cols by 1..rows. It requires an anti-diagonal synchronization pattern, meaning that the elements on every anti-diagonal of the index space can be processed in parallel, but there is a global synchronization point between the diagonals. Also, the user requests to optimize this using rectangular blocking. The variables cols, rows, S, F, and penalty are normal D variables declared and defined in the application that contains the above mixin statement.

You can look at my github repository for more examples:

https://github.com/dmakarov/clop

but the project is in a very early stage and not yet usable.

Regards,
Dmitri
Apr 04 2015
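For reference, the recurrence that the CLOP snippet expresses can be sketched in plain D, run sequentially on the host. This is a hypothetical equivalent, not part of CLOP itself: it ignores the anti-diagonal parallelism and blocking, and flattens F and S row-major so that F[c, r] in the DSL corresponds to F[c + cols * r].

```d
import std.algorithm : max;

// Sequential host-side version of the F[c, r] recurrence from the
// CLOP example (the Needleman-Wunsch score matrix fill).
void nwScore(int[] F, const(int)[] S, int cols, int rows, int penalty)
{
    foreach (r; 1 .. rows)
        foreach (c; 1 .. cols)
            F[c + cols * r] = max(F[(c - 1) + cols * (r - 1)] + S[c + cols * r],
                                  F[(c - 1) + cols * r] - penalty,
                                  F[c + cols * (r - 1)] - penalty);
}
```

On a GPU, the cells along each anti-diagonal depend only on earlier diagonals and can be computed in parallel, which is what the Antidiagonal NDRange in the DSL captures.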
On Saturday, 4 April 2015 at 02:49:16 UTC, Walter Bright wrote:
> http://www.nvidia.com/object/cuda_home_new.html

http://code.dlang.org/packages/derelict-cuda
Apr 04 2015
On 4/4/2015 2:16 AM, John Colvin wrote:
> http://code.dlang.org/packages/derelict-cuda

I know you have an interest in CUDA. Have you gotten any D code to work with it?
Apr 04 2015
On Saturday, 4 April 2015 at 10:07:16 UTC, Walter Bright wrote:
> I know you have interest in CUDA, have you gotten any D code to work with it?

I use OpenCL, as I don't want to be locked to one vendor's hardware. It's hard enough to write portable, efficient GPGPU code without swapping frameworks as well.
Apr 04 2015
On 4/4/2015 3:58 AM, John Colvin wrote:
> I use OpenCL as I don't want to be locked to one vendor's hardware. It's hard enough to write portable, efficient GPGPU code without swapping frameworks as well.

A reasonable viewpoint.
Apr 04 2015
On Saturday, 4 April 2015 at 02:49:16 UTC, Walter Bright wrote:
> http://www.nvidia.com/object/cuda_home_new.html

I wrote the Driver and Runtime API bindings for

https://github.com/DerelictOrg/DerelictCUDA

and the one thing I've done with them is load the functions, create a context, and destroy it. So yes, I think using CUDA with D is possible.

OpenCL 2.x is much more interesting, though.
Apr 04 2015
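The lifecycle described above (load the bindings, create a context, destroy it) would look roughly like the following with DerelictCUDA's driver API bindings. This is a hedged sketch: the module name and loader symbol are assumptions about the binding's layout, the driver API function names follow NVIDIA's CUDA Driver API, error checking is omitted, and actually running it requires NVIDIA hardware with the CUDA driver installed.

```d
// Hypothetical minimal DerelictCUDA driver-API lifecycle.
// Module path and loader name are assumptions; verify against the
// DerelictCUDA sources before use.
import derelict.cuda;

void main()
{
    // Bind libcuda (the driver API) at runtime via Derelict's loader.
    DerelictCUDADriver.load();

    cuInit(0);                 // initialize the driver API

    CUdevice device;
    cuDeviceGet(&device, 0);   // first CUDA-capable device

    CUcontext context;
    cuCtxCreate(&context, 0, device);  // create a context on it

    // ... memory allocation and kernel launches would go here ...

    cuCtxDestroy(context);     // tear down the context
}
```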
On 4/4/2015 2:35 AM, ponce wrote:
> I wrote the Driver and Runtime API bindings for https://github.com/DerelictOrg/DerelictCUDA

Thank you. How far are you interested in going with it?

> OpenCL 2.x is much more interesting though.

It's slower:

> Furthermore, in studies of straightforward translation of CUDA programs to OpenCL C programs, CUDA has been found to outperform OpenCL;[83][86] but the performance differences can mostly be attributed to differences in the programming model (especially the memory model) and in the optimizations that OpenCL C compilers performed as compared to those in the CUDA compiler.

-- http://en.wikipedia.org/wiki/OpenCL#Portability.2C_performance_and_alternatives

No reason not to support both, however.
Apr 04 2015
On Saturday, 4 April 2015 at 10:03:56 UTC, Walter Bright wrote:
> It's slower:

However, it's an open standard, will improve, and will be available on devices of any vendor interested in implementing the compiler and the runtime API, which is essentially every vendor of compute devices (CPU, GPU, FPGA, or other accelerators). CUDA will be for Nvidia hardware only. (Not that I am against providing CUDA support for D programmers.)
Apr 04 2015
On Saturday, 4 April 2015 at 10:03:56 UTC, Walter Bright wrote:
> Thank you. How far are you interested in going with it?

Not far. I'm currently trying to bootstrap a business solo (hopefully with the help of D), and available time has become significantly scarcer. I'd much prefer to spend time on the Derelict OpenCL bindings (brought to you by MeinMein), if time were an option.

> It's slower:

It used to be that CUDA had warps and pinned memory and OpenCL didn't. Now OpenCL 2.0 has several driver providers, and also has warps ("sub-groups") and associated warp operations, which are super useful for performance. To the extent that I wouldn't recommend building anything new in CUDA.

I don't really see what could make OpenCL slower. But I see really well what is dangerous in making new projects in CUDA nowadays. I was certainly burned by it to some extent. The newest CUDA features don't improve performance (Unified Memory Addressing, Peer copy, and friends).

OpenCL compiles to FPGAs, CPUs, and GPUs, and has no missing features anymore. We must now forget what was once true about it. With the Intel OpenCL SDK, even the tooling is on par with NVIDIA's.

> No reason not to support both, however.

Yep.
Apr 04 2015
Also consider costs: NVIDIA will artificially limit the speed of pinned memory transfers to push you to buy expensive $3000 discrete GPUs. They have segmented the market to make the most of performance-starved people. It goes to the point that you are left with $3000 GPUs that are slower than $300 ones, just to get the right driver. Hopefully the market will correct them after so much milking.
Apr 04 2015
On 4/4/2015 4:16 AM, ponce wrote:
> Also consider costs: NVIDIA will artificially limit the speed of pinned memory transferts to push you to buy expensive $3000 discrete GPUs.

The only thing I can add to that is that the people who really want performance will be more than willing to buy the GPU to do it, and the $3000 means nothing to them. I.e. people to whom microseconds mean money, such as trading software. I don't want to leave any tern unstoned.

Also, it seems that we are 95% there in supporting CUDA already, thanks to your header work. We just need to write some examples to make sure it works, and write a few pages of "how to do it". Once that is done, we can approach Nvidia and get them to mention on their site that D supports CUDA. Nvidia is really pushing CUDA, and it will be of mutual benefit for them to promote D and for us to support CUDA.
Apr 04 2015