digitalmars.D - Has anyone used D with Nvidia's Cuda?
- Walter Bright (1/1) Apr 03 2015 http://www.nvidia.com/object/cuda_home_new.html
- Rikki Cattermole (5/6) Apr 03 2015 Honestly, I don't think anyone has even tried to create bindings. Let
- weaselcat (2/8) Apr 03 2015 Derelict offers cuda bindings.
- Walter Bright (7/17) Apr 04 2015 Ahh, I see:
- weaselcat (3/6) Apr 04 2015 Top 3 results for me for `dlang derelict` are all his github
- Walter Bright (15/20) Apr 04 2015 `D programming language derelict`
- weaselcat (11/34) Apr 04 2015 AFAIK almost all derelict repos are maintained almost solely by
- Walter Bright (11/12) Apr 04 2015 Exactly!
- Rikki Cattermole (10/23) Apr 04 2015 On that idea, just a thought. But DMD-FE is using the visitor pattern
- weaselcat (8/22) Apr 04 2015 I really think you're barking up the wrong tree here - cuda is a
- Walter Bright (8/13) Apr 04 2015 That's right. On the other hand,
- ponce (3/15) Apr 04 2015 If NVIDIA wants full support for NPP, cuBLAS, and the myriad of
- ponce (2/16) Apr 04 2015 A good OpenCL wrapper library like cl4d would do wonders.
- Dmitri Makarov (6/7) Apr 04 2015 I can contribute at least three examples running code on a GPU
- ponce (16/20) Apr 04 2015 The problem with example is that someone have to maintain them.
- Walter Bright (12/29) Apr 04 2015 Oh, I understand that keeping things up to date is always a problem with...
- ponce (5/41) Apr 04 2015 They doesn't seem to have deprecated any function indeed. That
- Walter Bright (3/5) Apr 04 2015 Thanks. I think I'll give it a try and see what it takes to get a simple...
- Dmitri Makarov (5/6) Apr 03 2015 No, but I'm building an embedded dsl that will allow to generate
- Vlad Levenfeld (2/8) Apr 04 2015 How would it be used? At the client level, I mean.
- Dmitri Makarov via Digitalmars-d (36/48) Apr 04 2015 The programmer describes the computations to be done on a device,
- John Colvin (2/3) Apr 04 2015 http://code.dlang.org/packages/derelict-cuda
- Walter Bright (2/5) Apr 04 2015 I know you have interest in CUDA, have you gotten any D code to work wit...
- John Colvin (4/11) Apr 04 2015 I use OpenCL as I don't want to be locked to one vendor's
- Walter Bright (2/13) Apr 04 2015 A reasonable viewpoint.
- ponce (7/8) Apr 04 2015 I wrote the Driver and Runtime API bindings for
- Walter Bright (6/18) Apr 04 2015 It's slower:
- Dmitri Makarov (7/8) Apr 04 2015 However, it's an open standard, will improve, and will be
- ponce (21/48) Apr 04 2015 Not far. I'm currently trying to bootstrap a business solo
- ponce (7/7) Apr 04 2015 Also consider costs: NVIDIA will artificially limit the speed of
- Walter Bright (10/15) Apr 04 2015 The only thing I can add to that is the people who really want performan...
http://www.nvidia.com/object/cuda_home_new.html
Apr 03 2015
On 4/04/2015 3:49 p.m., Walter Bright wrote:
> http://www.nvidia.com/object/cuda_home_new.html

Honestly, I don't think anyone has even tried to create bindings, let alone use it. Although I think there are OpenCL bindings floating around, which serve a similar purpose.
Apr 03 2015
On Saturday, 4 April 2015 at 02:59:46 UTC, Rikki Cattermole wrote:
> Honestly, I don't think anyone has even tried to create bindings. Let alone use it. Although I think there are OpenCL bindings floating around which has a similar purpose.

Derelict offers CUDA bindings.
Apr 03 2015
On 4/3/2015 11:12 PM, weaselcat wrote:
> Derelict offers cuda bindings.

Ahh, I see:

https://github.com/DerelictOrg/DerelictCUDA

I don't see it here:

http://svn.dsource.org/projects/derelict/branches/Derelict2/doc/index.html

If the latter is obsolete, it should perhaps be updated to point to the newer one. The svn one is the first google hit for Derelict.
Apr 04 2015
On Saturday, 4 April 2015 at 09:24:07 UTC, Walter Bright wrote:
> If the latter is obsolete, it should perhaps be updated to point to the newer one. The svn one is the first google hit for Derelict.

The top 3 results for me for `dlang derelict` are all his github page/projects. Did you just google `derelict`?
Apr 04 2015
On 4/4/2015 2:34 AM, weaselcat wrote:
> Top 3 results for me for `dlang derelict` are all his github page/projects, did you just google `derelict` or?

`D programming language derelict`

In any case, the dsource.org page should be fixed or removed. The github page also has problems:

* the "Using Derelict" link is dead
* "DerelictUtil for Users" has zero information about using D with CUDA, and seems completely irrelevant
* no link for "DerelictUtil Wiki"
* the example shown is useless
* there are no examples of actually running code on a GPU

It looks like nothing more than a couple of header files (which is a great start, but that's all). In contrast, there's a package to use CUDA with Go:

https://archive.fosdem.org/2014/schedule/event/hpc_devroom_go/

which is still pretty thin, but much further along.
Apr 04 2015
On Saturday, 4 April 2015 at 09:50:09 UTC, Walter Bright wrote:
> In any case, the dsource.org page should be fixed or removed. The github page also has problems [...]

PR?

AFAIK almost all derelict repos are maintained almost solely by aldacron, and he maintains a _lot_ of them.

https://github.com/DerelictOrg

p.s., googling "golang cuda" comes up with almost nothing useful at the top - 4-5 links to the FOSDEM video and some pdfs. I'm not being biased, I seriously can't figure out anything beyond the fosdem video for cuda with go. The first result for "dlang cuda" for me is the dub repo for derelict cuda.
Apr 04 2015
On 4/4/2015 3:04 AM, weaselcat wrote:
> PR?

Exactly!

The idea is that GPUs can greatly accelerate code (2x to 1000x), and if D wants to appeal to high performance computing programmers, we need to have a workable way to program the GPU. At this point, it doesn't have to be slick or great, but it has to be doable.

Nvidia appears to have put a lot of effort into CUDA, and it shouldn't be hard to work with CUDA given the Derelict D headers. That will give us an answer for D users who want to leverage the GPU.

It would also be dazz if someone were to look at std.algorithm and see what could be accelerated with GPU code.
Apr 04 2015
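To make the std.algorithm idea concrete, here is a hypothetical CPU-only sketch (no GPU backend exists for this) of the shape of code that would qualify: an element-wise map feeding a reduction, the classic data-parallel pattern that GPU libraries such as Thrust and Bolt accelerate.

```d
import std.algorithm : map, sum;

// A map + reduce pipeline: each element is transformed independently,
// then the results are combined. This data-parallel shape is exactly
// what a GPU backend for std.algorithm could offload as a kernel.
double sumOfSquares(const(double)[] xs)
{
    return xs.map!(x => x * x).sum;
}
```

Today this runs lazily on the CPU; the point is that nothing in the expression depends on evaluation order, which is what makes it an offload candidate.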
On 4/04/2015 11:26 p.m., Walter Bright wrote:
> It would also be dazz if someone were to look at std.algorithm and see what could be accelerated with GPU code.

On that idea, just a thought. DMD-FE uses the visitor pattern quite a lot to allow the backend to hook into it easily. What if we exposed a set block of code to CTFE that acted like a backend, but only transformed for the real backend? In other words, allow CTFE to extend the compiler a little like the backend does, to add language features such as transforming code into OpenCL code and having it wrapped nicely into D code.

Theoretically, if this was done, we could move the iasm into a library. Because of CTFE, surely this wouldn't add much code to the front end?
Apr 04 2015
On Saturday, 4 April 2015 at 10:26:27 UTC, Walter Bright wrote:
> Nvidia appears to have put a lot of effort into CUDA, and it shouldn't be hard to work with CUDA given the Derelict D headers, and will give us an answer to D users who want to leverage the GPU.

I really think you're barking up the wrong tree here - cuda is a closed proprietary solution only implemented by one vendor, effectively cutting off anyone that doesn't work with nvidia hardware.

Also, the std.algorithm thing sounds a lot like the C++ library Bolt/Thrust:

https://github.com/HSA-Libraries/Bolt
Apr 04 2015
On 4/4/2015 3:45 AM, weaselcat wrote:
> I really think you're barking up the wrong tree here - cuda is a closed proprietary solution only implemented by one vendor effectively cutting off anyone that doesn't work with nvidia hardware.

That's right. On the other hand:

1. Nvidia hardware is pervasive, and CUDA has been around for many years. I doubt it is going away anytime soon.
2. It is little effort on our part to support it.
3. We'd have some co-marketing opportunities with Nvidia if we support it.
4. Supporting CUDA doesn't impede supporting OpenCL.

> also, the std.algorithm thing sounds a lot like the C++ library Bolt/Thrust https://github.com/HSA-Libraries/Bolt

Yup.
Apr 04 2015
On Saturday, 4 April 2015 at 17:21:45 UTC, Walter Bright wrote:
> 2. It is little effort on our part to support it.

If NVIDIA wants full support for NPP, cuBLAS, and the myriad of libraries depending on the Runtime API, then it's more effort.
Apr 04 2015
On Saturday, 4 April 2015 at 10:26:27 UTC, Walter Bright wrote:
> The idea is that GPUs can greatly accelerate code (2x to 1000x), and if D wants to appeal to high performance computing programmers, we need to have a workable way to program the GPU.

A good OpenCL wrapper library like cl4d would do wonders.
Apr 04 2015
On Saturday, 4 April 2015 at 09:50:09 UTC, Walter Bright wrote:
> * there are no examples of actually running code on a GPU

I can contribute at least three examples running code on a GPU (the domains are neural networks, bioinformatics, and grid traversal -- these are my ports to D/OpenCL of the Rodinia benchmarks, http://www.cs.virginia.edu/~skadron/wiki/rodinia/index.php/Rodinia:Accelerating_Compute-Intensive_Applications_with_Accelerators), but these examples use OpenCL, not CUDA.
Apr 04 2015
On Saturday, 4 April 2015 at 09:50:09 UTC, Walter Bright wrote:
> * the example shown is useless

The problem with examples is that someone has to maintain them. For DerelictBgfx we removed all translated examples. So the Derelict policy is to remove examples to avoid them becoming out of date.

For the record, Aldacron maintains approx. 22 Derelict bindings and I maintain 7 of them, in our free time. Keeping up with every library change is impossible if everyone expects everything to be up-to-date and with examples.

> * there are no examples of actually running code on a GPU

Because it's similar to using the Driver/Runtime API in C++, you have to read the CUDA documentation.

> It looks like nothing more than a couple header files (which is a great start, but that's all).

Maybe we can delete them so that it's not too embarrassing? Serious proposal.

In my opinion, the couple of header files provide all you need to use CUDA, if you know what you are doing. If you don't, don't do GPGPU.
Apr 04 2015
On 4/4/2015 4:29 AM, ponce wrote:
> The problem with example is that someone have to maintain them. [...] Keeping up with all library change is impossible if everyone excpect everything to be up-to-date and with examples.

Oh, I understand that keeping things up to date is always a problem with every third party tool. On the plus side, however, Nvidia seems very good with backwards compatibility, meaning that when the D bindings get out of date, they will still work. They just won't work with new features.

> Because it's similar to using the Driver/Runtime API in C++, you have to read CUDA documentation.

Of course. But having a couple of examples to show it really does work will go a long way. I am not suggesting making any attempt to duplicate Nvidia's documentation in D.

> Maybe we can delete them so that it's not too embarrassing? Serious proposal.

If that's the state of things, I'd be happy to take them over and put them in Deimos.

> In my opinion, the couple header files provide all you need to use CUDA, if you know what you are doing. If you don't, don't do GPGPU.

That does work for someone who really wants to use CUDA, but not much for someone who is evaluating D and wants to use the GPU with CUDA.
Apr 04 2015
On Saturday, 4 April 2015 at 17:16:19 UTC, Walter Bright wrote:
> On the plus side, however, Nvidia seems very good with backwards compatiblity, meaning that when the D bindings get out of date, they will still work. They just won't work with new features.

They don't seem to have deprecated any functions, indeed. That could make examples practical.

> If that's the state of things, I'd be happy to take them over and put them in Deimos.

Sure, the licensing of Derelict probably allows it, and Deimos and Derelict are complementary anyway.
Apr 04 2015
On 4/4/2015 2:59 PM, ponce wrote:
> Sure, the licensing of Derelict probably allows it, and deimos and Derelict are complimentary anyway.

Thanks. I think I'll give it a try and see what it takes to get a simple example working.
Apr 04 2015
On Saturday, 4 April 2015 at 02:49:16 UTC, Walter Bright wrote:
> http://www.nvidia.com/object/cuda_home_new.html

No, but I'm building an embedded DSL that will allow generating OpenCL kernels and the supporting boilerplate OpenCL API calls at compile time. It's called CLOP (openCL OPtimizer). It uses the derelict.opencl bindings.
Apr 03 2015
On Saturday, 4 April 2015 at 06:36:49 UTC, Dmitri Makarov wrote:
> No, but I'm building an embedded dsl that will allow to generate opencl kernels and supporting boilerplate opencl api calls at compile-time. it's called clop (openCL OPtimizer). It uses derelict.opencl bindings.

How would it be used? At the client level, I mean.
Apr 04 2015
On Sat, Apr 4, 2015 at 9:00 AM, Vlad Levenfeld via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> How would it be used? At the client level, I mean.

The programmer describes the computations to be done on a device, and invokes the CLOP compiler via a mixin expression, passing a string describing the computations in an OpenCL-like syntax. The compiler returns D code that includes the generated OpenCL kernel and all the boilerplate code. The computations can refer to variables declared in the host application; CLOP will generate the necessary CL buffers and kernel arguments.

Here's an example:

// use CLOP DSL to generate OpenCL kernel and API calls.
mixin( compile( q{
    int max3( int a, int b, int c )
    {
        int k = a > b ? a : b;
        return k > c ? k : c;
    }
    Antidiagonal NDRange( c : 1 .. cols, r : 1 .. rows )
    {
        F[c, r] = max3( F[c - 1, r - 1] + S[c + cols * r],
                        F[c - 1, r] - penalty,
                        F[c, r - 1] - penalty );
    }
    apply( rectangular_blocking( 8, 8 ) )
} ) );

This implements the Needleman-Wunsch algorithm in CLOP. It says that the computation is to be done over the 2D index space 1..cols by 1..rows. It requires an anti-diagonal synchronization pattern, meaning that the elements on every anti-diagonal of the index space can be processed in parallel, but there is a global synchronization point between the diagonals. Also, the user requests to optimize this using rectangular blocking. The variables cols, rows, S, F, and penalty are normal D variables declared and defined in the application that contains the above mixin statement.

You can look at my github repository for more examples:

https://github.com/dmakarov/clop

but the project is in a very early stage and not yet usable.

Regards,
Dmitri
Apr 04 2015
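For reference, the recurrence that the CLOP snippet expresses can be sketched in plain D, run sequentially on the host. This is a hypothetical equivalent, not part of CLOP itself: it ignores the anti-diagonal parallelism and blocking, and flattens F and S row-major so that F[c, r] in the DSL corresponds to F[c + cols * r].

```d
import std.algorithm : max;

// Sequential host-side version of the F[c, r] recurrence from the
// CLOP example (the Needleman-Wunsch score matrix fill).
void nwScore(int[] F, const(int)[] S, int cols, int rows, int penalty)
{
    foreach (r; 1 .. rows)
        foreach (c; 1 .. cols)
            F[c + cols * r] = max(F[(c - 1) + cols * (r - 1)] + S[c + cols * r],
                                  F[(c - 1) + cols * r] - penalty,
                                  F[c + cols * (r - 1)] - penalty);
}
```

On a GPU, the cells along each anti-diagonal depend only on earlier diagonals and can be computed in parallel, which is what the Antidiagonal NDRange in the DSL captures.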
On Saturday, 4 April 2015 at 02:49:16 UTC, Walter Bright wrote:
> http://www.nvidia.com/object/cuda_home_new.html

http://code.dlang.org/packages/derelict-cuda
Apr 04 2015
On 4/4/2015 2:16 AM, John Colvin wrote:
> http://code.dlang.org/packages/derelict-cuda

I know you have an interest in CUDA. Have you gotten any D code to work with it?
Apr 04 2015
On Saturday, 4 April 2015 at 10:07:16 UTC, Walter Bright wrote:
> I know you have interest in CUDA, have you gotten any D code to work with it?

I use OpenCL, as I don't want to be locked to one vendor's hardware. It's hard enough to write portable, efficient GPGPU code without swapping frameworks as well.
Apr 04 2015
On 4/4/2015 3:58 AM, John Colvin wrote:
> I use OpenCL as I don't want to be locked to one vendor's hardware. It's hard enough to write portable, efficient GPGPU code without swapping frameworks as well.

A reasonable viewpoint.
Apr 04 2015
On Saturday, 4 April 2015 at 02:49:16 UTC, Walter Bright wrote:
> http://www.nvidia.com/object/cuda_home_new.html

I wrote the Driver and Runtime API bindings for

https://github.com/DerelictOrg/DerelictCUDA

and the one thing I've done with them is load the functions, create a context, and destroy it. So yes, I think using CUDA with D is possible.

OpenCL 2.x is much more interesting, though.
Apr 04 2015
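The lifecycle described above (load the bindings, create a context, destroy it) would look roughly like the following with DerelictCUDA's driver API bindings. This is a hedged sketch: the module name and loader symbol are assumptions about the binding's layout, the driver API function names follow NVIDIA's CUDA Driver API, error checking is omitted, and actually running it requires NVIDIA hardware with the CUDA driver installed.

```d
// Hypothetical minimal DerelictCUDA driver-API lifecycle.
// Module path and loader name are assumptions; verify against the
// DerelictCUDA sources before use.
import derelict.cuda;

void main()
{
    // Bind libcuda (the driver API) at runtime via Derelict's loader.
    DerelictCUDADriver.load();

    cuInit(0);                 // initialize the driver API

    CUdevice device;
    cuDeviceGet(&device, 0);   // first CUDA-capable device

    CUcontext context;
    cuCtxCreate(&context, 0, device);  // create a context on it

    // ... memory allocation and kernel launches would go here ...

    cuCtxDestroy(context);     // tear down the context
}
```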
On 4/4/2015 2:35 AM, ponce wrote:
> I wrote the Driver and Runtime API bindings for https://github.com/DerelictOrg/DerelictCUDA

Thank you. How far are you interested in going with it?

> OpenCL 2.x is much more interesting though.

It's slower:

> Furthermore, in studies of straightforward translation of CUDA programs to OpenCL C programs, CUDA has been found to outperform OpenCL;[83][86] but the performance differences can mostly be attributed to differences in the programming model (especially the memory model) and in the optimizations that OpenCL C compilers performed as compared to those in the CUDA compiler.

-- http://en.wikipedia.org/wiki/OpenCL#Portability.2C_performance_and_alternatives

No reason not to support both, however.
Apr 04 2015
On Saturday, 4 April 2015 at 10:03:56 UTC, Walter Bright wrote:
> It's slower:

However, it's an open standard, will improve, and will be available on devices of any vendor interested in implementing the compiler and the runtime API, which is essentially every vendor of compute devices (CPU, GPU, FPGA, or other accelerators). CUDA will be for Nvidia hardware only. (Not that I am against providing CUDA support for D programmers.)
Apr 04 2015
On Saturday, 4 April 2015 at 10:03:56 UTC, Walter Bright wrote:
> Thank you. How far are you interested in going with it?

Not far. I'm currently trying to bootstrap a business solo (hopefully with the help of D), and available time has become significantly scarcer. I'd much prefer to spend time on the Derelict OpenCL bindings (brought to you by MeinMein), if time were an option.

> It's slower:

It used to be that CUDA had warps and pinned memory and OpenCL didn't. Now OpenCL 2.0 has several driver providers, and also has warps ("sub-groups") and associated warp operations, which are super useful for performance. To the extent that I wouldn't recommend building anything new in CUDA.

I don't really see what could make OpenCL slower. But I see really well what is dangerous in making new projects in CUDA nowadays. I was certainly burned by it to some extent. The newest CUDA features don't improve performance (Unified Memory Addressing, Peer copy, and friends).

OpenCL compiles to FPGAs, CPUs, and GPUs, and has no missing features anymore. We must now forget what was once true about it. With the Intel OpenCL SDK, even the tooling is on par with NVIDIA's.

> No reason not to support both, however.

Yep.
Apr 04 2015
Also consider costs: NVIDIA will artificially limit the speed of pinned memory transfers to push you to buy expensive $3000 discrete GPUs. They have segmented the market to make the most of performance-starved people. It goes to the point that you are left with $3000 GPUs that are slower than $300 ones, just to get the right driver. Hopefully the market will correct them after so much milking.
Apr 04 2015
On 4/4/2015 4:16 AM, ponce wrote:
> Also consider costs: NVIDIA will artificially limit the speed of pinned memory transferts to push you to buy expensive $3000 discrete GPUs.

The only thing I can add to that is that the people who really want performance will be more than willing to buy the GPU to do it, and the $3000 means nothing to them. I.e. people to whom microseconds mean money, such as trading software. I don't want to leave any tern unstoned.

Also, it seems that we are 95% there in supporting CUDA already, thanks to your header work. We just need to write some examples to make sure it works, and write a few pages of "how to do it". Once that is done, we can approach Nvidia and get them to mention on their site that D supports CUDA. Nvidia is really pushing CUDA, and it will be of mutual benefit for them to promote D and for us to support CUDA.
Apr 04 2015