digitalmars.D.announce - Libdivide ported to D
- Tomer Filiba (18/18) May 14 2017 https://code.dlang.org/packages/divide
- Walter Bright (4/13) May 14 2017 I hate to say this, but modern compilers already do this for generated r...
- David Nadlinger (5/11) May 14 2017 As Tomer points out, a runtime implementation can still be useful
- Jonathan M Davis via Digitalmars-d-announce (15/26) May 15 2017 Liran was telling me last year about how the folks at Weka had used this...
- Walter Bright (10/22) May 15 2017 One can do things like this:
https://code.dlang.org/packages/divide Libdivide (http://libdivide.com/) allows converting the DIV instruction (in runtime) to a series of shifts and MULs, which is much more efficient in execution time. It works by taking a number (the divisor or "denominator") and doing some preprocessing to it, after which dividing by it can be ~8 times faster (my own measurements). I use it to divide CPU cycles by the CPU frequency (i.e., two big ugly numbers) to obtain wall time from it. Of course it only applies to runtime division -- the compiler can do the same if the divisor is known in compile time. Notes: * It's a header-only library so I ported the code itself * I tried to keep my port as mechanical as possible; I can't really say I know what's going on there * I only ported the POSIX x86-64 code because that's what I needed * Signes-ness is a big issue, be sure you use the right variant -tomer
May 14 2017
On 5/14/2017 3:39 AM, Tomer Filiba wrote:https://code.dlang.org/packages/divide Libdivide (http://libdivide.com/) allows converting the DIV instruction (in runtime) to a series of shifts and MULs, which is much more efficient in execution time. It works by taking a number (the divisor or "denominator") and doing some preprocessing to it, after which dividing by it can be ~8 times faster (my own measurements). I use it to divide CPU cycles by the CPU frequency (i.e., two big ugly numbers) to obtain wall time from it. Of course it only applies to runtime division -- the compiler can do the same if the divisor is known in compile time.I hate to say this, but modern compilers already do this for generated runtime code when dividing by a constant. Here's dmd's version: https://github.com/dlang/dmd/blob/master/src/ddmd/backend/divcoeff.c
May 14 2017
On Sunday, 14 May 2017 at 15:30:19 UTC, Walter Bright wrote:On 5/14/2017 3:39 AM, Tomer Filiba wrote:As Tomer points out, a runtime implementation can still be useful if the divisor is only known dynamically (but constant across number of operations). — DavidOf course it only applies to runtime division -- the compiler can do the same if the divisor is known in compile time.I hate to say this, but modern compilers already do this for generated runtime code when dividing by a constant. […]
May 14 2017
On Sunday, May 14, 2017 16:20:21 David Nadlinger via Digitalmars-d-announce wrote:On Sunday, 14 May 2017 at 15:30:19 UTC, Walter Bright wrote:Liran was telling me last year about how the folks at Weka had used this to speed up the stuff in core.time and std.datetime in their local branch and wanted me to look into updating the official implementation to use it (unfortunately, I haven't had the time to spend on D that I would have liked and haven't managed to look into that yet - though that would require putting at least some of this in druntime). I confess though that I was highly confused about the whole thing, because it sounded like this was an optimization that the compiler already did, and yet the Weka guys were having to use libdivide some portion of the time. I suppose that it makes sense though if the issue is that the divisor is not known until runtime. But unfortunately, my understanding of compiler optimizations like this is fairly poor. - Jonathan M DavisOn 5/14/2017 3:39 AM, Tomer Filiba wrote:As Tomer points out, a runtime implementation can still be useful if the divisor is only known dynamically (but constant across number of operations).Of course it only applies to runtime division -- the compiler can do the same if the divisor is known in compile time.I hate to say this, but modern compilers already do this for generated runtime code when dividing by a constant. […]
May 15 2017
On 5/15/2017 3:51 AM, Jonathan M Davis via Digitalmars-d-announce wrote:Liran was telling me last year about how the folks at Weka had used this to speed up the stuff in core.time and std.datetime in their local branch and wanted me to look into updating the official implementation to use it (unfortunately, I haven't had the time to spend on D that I would have liked and haven't managed to look into that yet - though that would require putting at least some of this in druntime). I confess though that I was highly confused about the whole thing, because it sounded like this was an optimization that the compiler already did, and yet the Weka guys were having to use libdivide some portion of the time. I suppose that it makes sense though if the issue is that the divisor is not known until runtime. But unfortunately, my understanding of compiler optimizations like this is fairly poor.One can do things like this: if (divisor == 10) foreach (i; 1..1000) result += i / 10; // compiler generates faster code here else foreach (i; 1..1000) result += i / divisor; if one knows in advance popular values of divisor. This sort of thing is already done in Phobos when converting numbers <==> strings (optimizing for radix==10).
May 15 2017