digitalmars.D.announce - DCompute: First kernels run successfully
- Nicholas Wilson (66/66) Sep 11 2017 I'm pleased to announce that I have run the first dcompute kernel
- jmh530 (3/5) Sep 11 2017 Keep up the good work.
- kerdemdemir (12/16) Sep 11 2017 Hi Wilson,
- Nicholas Wilson (13/30) Sep 11 2017 Hi Erdem
- Walter Bright (3/5) Sep 11 2017 Excellent!
- Nicholas Wilson (5/10) Sep 11 2017 Indeed, let the world domination begin!
I'm pleased to announce that I have run the first dcompute kernel and it was a success! There is still a fair bit of polish needed on the driver to make the API sane and more complete, not to mention more similar to the (untested) OpenCL driver API. But it works! (Contributions are of course greatly welcomed.)

The kernel:

```
@compute(CompileFor.deviceOnly) module dcompute.tests.dummykernels;

import ldc.dcompute;
import dcompute.std.index;

@kernel void saxpy(GlobalPointer!(float) res,
                   float alpha,
                   GlobalPointer!(float) x,
                   GlobalPointer!(float) y,
                   size_t N)
{
    auto i = GlobalIndex.x;
    if (i >= N) return;
    res[i] = alpha * x[i] + y[i];
}
```

The host code:

```
import dcompute.driver.cuda;
import dcompute.tests.dummykernels : saxpy;

Platform.initialise();
auto devs = Platform.getDevices(theAllocator);
auto ctx  = Context(devs[0]); scope(exit) ctx.detach();

// Change the file to match your GPU.
Program.globalProgram = Program.fromFile("./.dub/obj/kernels_cuda210_64.ptx");

auto q = Queue(false);

enum size_t N = 128;
float alpha = 5.0;
float[N] res, x, y;
foreach (i; 0 .. N)
{
    x[i] = N - i;
    y[i] = i * i;
}

Buffer!(float) b_res, b_x, b_y;
b_res = Buffer!(float)(res[]); scope(exit) b_res.release();
b_x   = Buffer!(float)(x[]);   scope(exit) b_x.release();
b_y   = Buffer!(float)(y[]);   scope(exit) b_y.release();

b_x.copy!(Copy.hostToDevice); // not quite sold on this interface yet.
b_y.copy!(Copy.hostToDevice);

q.enqueue!(saxpy)                // <-- the main magic happens here
    ([N, 1, 1], [1, 1, 1])       // the grid
    (b_res, alpha, b_x, b_y, N); // the kernel arguments

b_res.copy!(Copy.deviceToHost);
foreach (i; 0 .. N)
    enforce(res[i] == alpha * x[i] + y[i]);
writeln(res[]); // [640, 636, ... 16134]
```

Simple as that!

Dcompute, as always, is at https://github.com/libmir/dcompute and on dub. To successfully run the dcompute CUDA test you will need a very recent LDC (less than two days old) with the NVPTX backend* enabled, along with a CUDA environment and an Nvidia GPU.

*Or wait for the LDC 1.4 release, coming real soon(™).
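As a side note, the printed values follow directly from the host setup: since x[i] = N - i, y[i] = i*i, and alpha = 5, the result is res[i] = 5*(N - i) + i*i. A quick shell sketch confirming the first and last values shown in the `writeln` output:

```shell
# Sanity-check the saxpy arithmetic from the host code above:
# x[i] = N - i, y[i] = i*i, alpha = 5, so res[i] = 5*(N - i) + i*i.
N=128
for i in 0 1 127; do
  echo "res[$i] = $(( 5 * (N - i) + i * i ))"
done
# res[0] = 640, res[1] = 636, res[127] = 16134 -- matching the writeln output.
```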
Thanks to the LDC folks for putting up with me ;) Have fun GPU programming, Nic
Sep 11 2017
On Monday, 11 September 2017 at 12:23:16 UTC, Nicholas Wilson wrote:

> I'm pleased to announce that I have run the first dcompute kernel and it was a success!

Keep up the good work.
Sep 11 2017
Hi Wilson,

Since I believe GPU-CPU hybrid programming is the future, I believe you are doing a great job for your own and D's future.

> To successfully run the dcompute CUDA test you will need a very recent LDC (less than two days old) with the NVPTX backend* enabled, along with a CUDA environment and an Nvidia GPU.
>
> *Or wait for the LDC 1.4 release, coming real soon(™).

Can you please describe a bit, for starters like me, how to build a recent LDC? Is this "NVPTX backend" a cmake option? And what should I do to make my "CUDA environment" ready? Which packages should I install?

Sorry if my questions are too basic; I hope I will be able to add an example.

Regards
Erdem
Sep 11 2017
On Monday, 11 September 2017 at 20:45:43 UTC, kerdemdemir wrote:

> Hi Wilson,
>
> Since I believe GPU-CPU hybrid programming is the future I believe you are doing a great job for your and D lang's future.
>
> Can you please describe a bit, for starters like me, how to build a recent LDC? Is this "NVPTX backend" a cmake option? And what should I do to make my "CUDA environment" ready? Which packages should I install?
>
> Regards
> Erdem

Hi Erdem,

Sorry, I've been a bit busy with uni.

To build LDC, just clone ldc, run `git submodule update --init`, and then run cmake, setting LLVM_CONFIG to /path/to/llvm/build/bin/llvm-config and LLVM_INTRINSIC_TD_PATH to /path/to/llvm/source/include/llvm/IR.

The NVPTX backend is enabled by setting LLVM's cmake variable LLVM_TARGETS_TO_BUILD to either "all", or "X86;NVPTX" along with any other archs you want to enable (without the quotes), and then building LLVM with cmake. This will get picked up by LDC automatically.

I just installed the CUDA SDK in its entirety, but I'm sure you don't need everything from it.
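The steps above might look roughly like this. This is a sketch only: all `/path/to/...` paths are placeholders, and the exact cmake options can differ between LLVM and LDC versions.

```shell
# 1. Build LLVM with the NVPTX backend enabled (placeholder paths).
mkdir llvm-build && cd llvm-build
cmake /path/to/llvm/source -DLLVM_TARGETS_TO_BUILD="X86;NVPTX"
make
cd ..

# 2. Build LDC against that LLVM.
git clone https://github.com/ldc-developers/ldc.git
cd ldc && git submodule update --init && cd ..
mkdir ldc-build && cd ldc-build
cmake ../ldc \
      -DLLVM_CONFIG=/path/to/llvm/build/bin/llvm-config \
      -DLLVM_INTRINSIC_TD_PATH=/path/to/llvm/source/include/llvm/IR
make
```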
Sep 11 2017
On 9/11/2017 5:23 AM, Nicholas Wilson wrote:

> I'm pleased to announce that I have run the first dcompute kernel and it was a success!

Excellent!

https://media.licdn.com/mpr/mpr/AAEAAQAAAAAAAAgvAAAAJDY4OTI4MmE0LTVlZDgtNGQzYy1iN2U1LWU5Nzk1NjlhNzIwNg.jpg
Sep 11 2017
On Monday, 11 September 2017 at 22:40:02 UTC, Walter Bright wrote:

> On 9/11/2017 5:23 AM, Nicholas Wilson wrote:
>
>> I'm pleased to announce that I have run the first dcompute kernel and it was a success!
>
> Excellent!
>
> https://media.licdn.com/mpr/mpr/AAEAAQAAAAAAAAgvAAAAJDY4OTI4MmE0LTVlZDgtNGQzYy1iN2U1LWU5Nzk1NjlhNzIwNg.jpg

Indeed, let the world domination begin!

I just need to get some OpenCL 2.0 capable hardware to test that and we'll be well on the way.

Also, LDC 1.4 was just released. Yay!
Sep 11 2017