digitalmars.D.learn - Faster Dlang Execution
- seany (10/10) Jun 08 2021 Hello
- mw (5/11) Jun 08 2021 You need to write parallel code yourself, the compiler won't know
- Basile B. (10/21) Jun 08 2021 try `dub build -b release --compiler=ldc2`
- Jack (3/13) Jun 08 2021 also there's the --parallel flag itself supported by ldc2, I have
- H. S. Teoh (22/24) Jun 08 2021 [...]
Hello

How can I increase the speed of executable files created via: `dub build -b release`

I am unable to parallelise all of it, as it depends on part of the result being calculated before something else can be calculated. I have many `nonsafe, nonpure` functions. Classes are virtual by default. Profiling doesn't help, because different input is causing different parts of the program to become slow.

Thank you.
Jun 08 2021
On Tuesday, 8 June 2021 at 17:10:47 UTC, seany wrote:
> Hello
>
> How can I increase the speed of executable files created via: `dub build -b release`
>
> I am unable to parallelise all of it, as it depends on part of the result being calculated before something else can be calculated.

You need to write parallel code yourself; the compiler won't know your app's logic.

Here is how the Dlang std lib can help: https://tour.dlang.org/tour/en/multithreading/std-parallelism
Jun 08 2021
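A minimal sketch of what the `std.parallelism` suggestion could look like when only part of the work is independent. The function names (`expensive`, `stageOne`, `stageTwo`) are hypothetical stand-ins, not the original poster's code; the assumption is that one stage consists of independent items and a later stage needs all of them.

```d
import std.algorithm : sum;
import std.math : sqrt;
import std.parallelism : parallel;
import std.stdio : writeln;

// Hypothetical stand-in for an expensive computation on one independent item.
double expensive(size_t i)
{
    return sqrt(cast(double) i);
}

// Stage 1: every element is computed independently, so the loop body
// can be distributed over the default task pool.
double[] stageOne(size_t n)
{
    auto results = new double[](n);
    foreach (i, ref r; parallel(results))
        r = expensive(i);
    return results;
}

// Stage 2: needs the complete output of stage 1, so it runs serially
// after the parallel foreach has finished.
double stageTwo(const(double)[] stage1)
{
    return stage1.sum;
}

void main()
{
    auto stage1 = stageOne(1_000);
    writeln(stageTwo(stage1));
}
```

The parallel `foreach` only returns once all iterations are done, so the dependent stage can simply follow it in program order. Each iteration writes only to its own element, which is what makes the loop body safe to run concurrently.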
On Tuesday, 8 June 2021 at 17:10:47 UTC, seany wrote:
> Hello
>
> How can I increase the speed of executable files created via: `dub build -b release`

try `dub build -b release --compiler=ldc2`

Then you can set some specific DFlags for ldc, like -O3 or --mcpu

> I am unable to parallelise all of it, as it depends on part of the result being calculated before something else can be calculated. I have many `nonsafe, nonpure` functions.

`nothrow` presumably opens optimisation opportunities with the stack management, although it's not verified

> Classes are virtual by default.

set them final when possible. When not possible set the virtual methods that are not overridden `final`.

> Profiling doesn't help, because different input is causing different parts of the program to become slow.

if you're on linux, you can try callgrind + kcachegrind instead of builtin instrumentation.

> Thank you.
Jun 08 2021
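To illustrate the `final` advice, here is a small sketch. The class names (`Shape`, `Square`) are made up for the example; the point is only where `final` can go.

```d
import std.stdio : writeln;

class Shape
{
    // Virtual by default: subclasses may override it, so calls through a
    // Shape reference are dispatched via the vtable.
    double area() const { return 0.0; }

    // final method: cannot be overridden, so the compiler is free to call
    // it directly (and possibly inline it).
    final string kind() const { return "shape"; }
}

// final class: nothing can derive from Square, so none of its methods
// can be overridden further.
final class Square : Shape
{
    private double side;

    this(double side) { this.side = side; }

    override double area() const { return side * side; }
}

void main()
{
    auto sq = new Square(3.0);
    writeln(sq.kind(), " ", sq.area());
}
```

Whether this gives a measurable speed-up depends on how hot the calls are, so it is worth checking with a profiler before and after.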
On Tuesday, 8 June 2021 at 17:40:19 UTC, Basile B. wrote:
> On Tuesday, 8 June 2021 at 17:10:47 UTC, seany wrote:
>> Hello
>>
>> How can I increase the speed of executable files created via: `dub build -b release`
>
> try `dub build -b release --compiler=ldc2`
>
> Then you can set some specific DFlags for ldc, like -O3 or --mcpu

also there's the --parallel flag itself supported by ldc2, I have quite a while ago but I'm pretty sure it still is there
Jun 08 2021
On Tue, Jun 08, 2021 at 05:10:47PM +0000, seany via Digitalmars-d-learn wrote:
[...]
> Profiling doesn't help, because different input is causing different parts of the program to become slow.
[...]

Do you have any more specific information about what kind of inputs cause which parts of the program to slow down? Without more details it's hard to say what the problem is.

But I'd say, if you care about performance you should fix *all* of the slow parts that your profiler finds.

There are some performance best practices that you should follow, such as reduce frequent GC allocations, avoid expensive algorithms (like O(n^2) or worse) where possible, avoid excessive copying of data, perform I/O in larger blocks instead of small bits at a time, avoid excessive indirection (final methods where they don't need to be virtual, by-value types instead of deep dereferencing, etc.), cache frequently-computed results, etc.

But more importantly, if you can elaborate a bit more on what your program is trying to do, it would help us give more specific recommendations. There may be domain-specific optimizations that you could apply as well.

T

--
MS Windows: 64-bit rehash of 32-bit extensions and a graphical shell for a 16-bit patch to an 8-bit operating system originally coded for a 4-bit microprocessor, written by a 2-bit company that can't stand 1-bit of competition.
Jun 08 2021
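Two of the practices listed above (fewer GC allocations, caching frequently computed results) can be sketched with Phobos alone. The function `slowSquareSum` is a hypothetical placeholder for an expensive computation; `memoize` and `appender` are the standard-library pieces being shown.

```d
import std.array : appender;
import std.functional : memoize;
import std.stdio : writeln;

// Hypothetical expensive function: sum of squares of 0 .. n-1.
ulong slowSquareSum(uint n)
{
    ulong acc = 0;
    foreach (i; 0 .. n)
        acc += cast(ulong) i * i;
    return acc;
}

// Caches results per distinct argument, so repeated inputs are
// computed only once. The cache is thread-local.
alias cachedSquareSum = memoize!slowSquareSum;

// Collects results into a buffer that is reserved once up front,
// instead of growing an array with repeated ~= appends.
ulong[] collect(const uint[] inputs)
{
    auto buf = appender!(ulong[])();
    buf.reserve(inputs.length);
    foreach (n; inputs)
        buf.put(cachedSquareSum(n));
    return buf.data;
}

void main()
{
    writeln(collect([4u, 4u, 3u]));
}
```

Caching only pays off when the same arguments really do recur and the function has no side effects, so this is an assumption to check against the actual workload.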
On Tuesday, 8 June 2021 at 18:03:32 UTC, H. S. Teoh wrote:
> But more importantly, if you can elaborate a bit more on what your program is trying to do, it would help us give more specific recommendations. There may be domain-specific optimizations that you could apply as well.
>
> T

Hi

The program is trying to categorize GPS tracks. It has to identify tracks that count as (somewhat) parallel (this is difficult to define). So I draw lines through points that have at most 5 m (as measured by Vincenty's formula) RMS error from the trend line.

Then I look for lines that can be considered "turn lines" (a turn joining two parallel lines). Then I draw a best fit boundary around it. I lay a square grid, and remove the squares where no line can be found.

Then I use this algorithm: https://stackoverflow.com/questions/50885339/polygon-from-a-grid-of-squares

This runs at O(N²) for sure.

Does this help?
Jun 08 2021
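For readers unfamiliar with the "RMS error from the trend line" step, here is a rough sketch under simplifying assumptions that are not from the post: points are already projected to planar metre coordinates (no Vincenty distances), the trend line is an ordinary least-squares fit of y on x, and the error is measured vertically. `Point` and `rmsFromTrendLine` are hypothetical names.

```d
import std.math : isNaN, sqrt;
import std.stdio : writeln;

// Assumed planar coordinates in metres (a simplification).
struct Point
{
    double x, y;
}

// Fits y = a + b*x by least squares and returns the root-mean-square
// of the vertical residuals. Returns NaN when the fit is undefined
// (fewer than two points, or all points share the same x).
double rmsFromTrendLine(const Point[] pts)
{
    if (pts.length < 2)
        return double.nan;

    immutable n = cast(double) pts.length;
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    foreach (p; pts)
    {
        sx += p.x;
        sy += p.y;
        sxx += p.x * p.x;
        sxy += p.x * p.y;
    }

    immutable denom = n * sxx - sx * sx;
    if (denom == 0)
        return double.nan;

    immutable b = (n * sxy - sx * sy) / denom; // slope
    immutable a = (sy - b * sx) / n;           // intercept

    double ss = 0;
    foreach (p; pts)
    {
        immutable r = p.y - (a + b * p.x);
        ss += r * r;
    }
    return sqrt(ss / n);
}

void main()
{
    auto track = [Point(0, 0), Point(1, 1.1), Point(2, 1.9), Point(3, 3.0)];
    // A segment would be accepted if this stays at or below the 5 m threshold.
    writeln(rmsFromTrendLine(track) <= 5.0);
}
```

The fit is a single pass over the points, so this step is O(n) per candidate segment; the quadratic cost mentioned in the post comes from the grid-to-polygon stage, not from here.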
On Tuesday, 8 June 2021 at 22:04:26 UTC, seany wrote:
> The program is trying to categorize GPS tracks. It has to identify tracks that count as (somewhat) parallel (this is difficult to define).

Maybe this is what you're looking for: https://en.wikipedia.org/wiki/Dynamic_time_warping

and you can run it on a GPU with this: https://github.com/garrettwrong/cuTWED
Jun 08 2021