digitalmars.D.learn - Faster Dlang Execution

seany <seany uni-bonn.de> writes:

Hello

How can I increase the speed of executable files created via :

`dub build -b release`

I am unable to parallelise all of it, as it depends on part of 
the result being calculated before something else can be 
calculated.

I have many `nonsafe, nonpure` functions. Classes are virtual by 
default. Profiling doesn't help, because different input is 
causing different parts of the program to become slow.

Thank you.

Jun 08 2021

mw <mingwu gmail.com> writes:

On Tuesday, 8 June 2021 at 17:10:47 UTC, seany wrote:
 Hello

 How can I increase the speed of executable files created via :

 `dub build -b release`

 I am unable to parallelise all of it, as it depends on part of 
 the result being calculated before something else can be 
 calculated.


You need to write parallel code yourself, the compiler won't know 
your app's logic.

Here is how the Dlang std lib can help:

https://tour.dlang.org/tour/en/multithreading/std-parallelism
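
As a rough sketch of what that can look like when only some stages are independent: the per-item work below runs through `taskPool.amap` from `std.parallelism`, while the step that needs all results stays sequential. `scoreTrack` and its arithmetic are hypothetical placeholders, not anything from the original program.

```d
import std.algorithm : sum;
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical stand-in for an expensive computation that depends only on
// its own input, so separate calls can safely run in parallel.
double scoreTrack(int id)
{
    double acc = 0;
    foreach (i; 1 .. 1_000)
        acc += (id % 7 + 1) / cast(double) i;
    return acc;
}

void main()
{
    auto ids = iota(0, 64);

    // Stage 1: independent work items, spread over the default task pool.
    // amap evaluates eagerly and returns an array of results.
    double[] scores = taskPool.amap!scoreTrack(ids);

    // Stage 2: needs every stage-1 result, so it runs sequentially afterwards.
    writeln("total = ", scores.sum);
}
```

The stage that depends on earlier results is left untouched; only the loop whose iterations do not depend on each other is handed to the pool.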

Jun 08 2021

Basile B. <b2.temp gmx.com> writes:

On Tuesday, 8 June 2021 at 17:10:47 UTC, seany wrote:
 Hello

 How can I increase the speed of executable files created via :

 `dub build -b release`

try `dub build -b release --compiler=ldc2`

Then you can set some specific DFlags for ldc, like -O3 or --mcpu
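
One way to pass such flags is a compiler-specific `dflags` entry in `dub.json`. This is a minimal sketch: the package name `myapp` is hypothetical, and `-mcpu=native` tunes for the machine doing the build, so the resulting binary may not run on older CPUs.

```json
{
    "name": "myapp",
    "dflags-ldc": ["-O3", "-mcpu=native"]
}
```

With this in place, `dub build -b release --compiler=ldc2` should pick the extra flags up only when building with ldc.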

 I am unable to parallelise all of it, as it depends on part of 
 the result being calculated before something else can be 
 calculated.

 I have many `nonsafe, nonpure` functions.

`nothrow` presumably opens optimisation opportunities with the 
stack management, although it's not verified
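
For illustration only, this is what annotating a small hot-path function looks like. `dist2` is a hypothetical helper, and whether the attributes give a measurable speed-up is, as said, not verified.

```d
import std.stdio : writeln;

// Hypothetical hot-path helper: squared planar distance between two points.
// `nothrow` promises that no exception escapes; `pure`, `@nogc` and `@safe`
// are further guarantees that the compiler checks at compile time.
double dist2(double x1, double y1, double x2, double y2) pure nothrow @nogc @safe
{
    immutable dx = x2 - x1;
    immutable dy = y2 - y1;
    return dx * dx + dy * dy;
}

void main()
{
    writeln(dist2(0, 0, 3, 4));
}
```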

 Classes are virtual by default.

Set them `final` when possible. When that is not possible, mark the 
virtual methods that are not overridden as `final`.
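
A small sketch of both variants, using a hypothetical `Shape`/`Square` hierarchy:

```d
import std.stdio : writeln;

class Shape
{
    // Virtual by default: subclasses may override it, so calls through a
    // Shape reference are dispatched via the vtable.
    double area() const { return 0; }

    // Never overridden, so it is marked final; the compiler may call it
    // directly and inline it.
    final double doubledArea() const { return 2 * area(); }
}

// A final class cannot be subclassed, so its methods cannot be overridden
// any further.
final class Square : Shape
{
    double side;

    this(double side) { this.side = side; }

    override double area() const { return side * side; }
}

void main()
{
    auto sq = new Square(3);
    writeln(sq.doubledArea());
}
```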

 Profiling doesn't help, because different input is causing 
 different parts of the program to become slow.

If you're on Linux, you can try callgrind + kcachegrind instead 
of the built-in instrumentation.

 Thank you.

Jun 08 2021

Jack <jckj33 gmail.com> writes:

On Tuesday, 8 June 2021 at 17:40:19 UTC, Basile B. wrote:
 On Tuesday, 8 June 2021 at 17:10:47 UTC, seany wrote:
 Hello

 How can I increase the speed of executable files created via :

 `dub build -b release`

 try `dub build -b release --compiler=ldc2`

 Then you can set some specific DFlags for ldc, like -O3 or 
 --mcpu

Also, there's the --parallel flag itself, supported by ldc2. I used it 
quite a while ago, but I'm pretty sure it is still there.

Jun 08 2021

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Jun 08, 2021 at 05:10:47PM +0000, seany via Digitalmars-d-learn wrote:
[...]
 Profiling doesn't help, because different input is causing different
 parts of the program to become slow.

[...]

Do you have any more specific information about what kind of inputs
cause which parts of the program to slow down?  Without more details
it's hard to say what the problem is.

But I'd say, if you care about performance you should fix *all* of the
slow parts that your profiler finds.

There are some performance best practices that you should follow, such
as reducing frequent GC allocations, avoiding expensive algorithms (like
O(n^2) or worse) where possible, avoiding excessive copying of data,
performing I/O in larger blocks instead of small bits at a time, avoiding
excessive indirection (final methods where they don't need to be
virtual, by-value types instead of deep dereferencing, etc.), and caching
frequently-computed results.
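
To make the first item concrete, here is a minimal sketch of cutting GC allocations in a hot loop by reusing a buffer. Both functions and their names are hypothetical, chosen only for illustration.

```d
import std.stdio : writeln;

// Appends into a fresh array on every call, which creates GC pressure when
// called from a hot loop.
double[] squaresNaive(const double[] xs)
{
    double[] result;
    foreach (x; xs)
        result ~= x * x; // may reallocate several times per call
    return result;
}

// Writes into a caller-provided buffer instead. Once the buffer has grown to
// the largest size needed, later calls allocate nothing.
double[] squaresInto(const double[] xs, ref double[] buffer)
{
    if (buffer.length < xs.length)
        buffer.length = xs.length;
    foreach (i, x; xs)
        buffer[i] = x * x;
    return buffer[0 .. xs.length];
}

void main()
{
    const double[] input = [1.0, 2.0, 3.0];
    double[] buffer;
    foreach (round; 0 .. 1_000)
    {
        // The returned slice aliases `buffer` and is overwritten next round.
        auto sq = squaresInto(input, buffer);
        assert(sq == squaresNaive(input));
    }
    writeln("done");
}
```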

But more importantly, if you can elaborate a bit more on what your
program is trying to do, it would help us give more specific
recommendations. There may be domain-specific optimizations that you
could apply as well.


T

-- 
MS Windows: 64-bit rehash of 32-bit extensions and a graphical shell for a
16-bit patch to an 8-bit operating system originally coded for a 4-bit
microprocessor, written by a 2-bit company that can't stand 1-bit of
competition.

Jun 08 2021

seany <seany uni-bonn.de> writes:

On Tuesday, 8 June 2021 at 18:03:32 UTC, H. S. Teoh wrote:

 But more importantly, if you can elaborate a bit more on what 
 your program is trying to do, it would help us give more 
 specific recommendations. There may be domain-specific 
 optimizations that you could apply as well.


 T


Hi

The program is trying to categorize GPS tracks.
It has to identify tracks that count as (somewhat) parallel (this 
is difficult to define).

So I draw lines through points that have at most 5 m (as measured 
by Vincenty's formula) RMS error from the trend line. Then I 
look for lines that can be considered "turn lines" (a turn 
joining two parallel lines).
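
For readers following along, the trend-line check can be sketched roughly as below. This is not the actual program. It assumes the points have already been projected into a local planar frame in metres (so no Vincenty distances are computed here), uses an ordinary least-squares fit, and measures vertical rather than perpendicular residuals. `Point` and `rmsFromTrendLine` are hypothetical names.

```d
import std.math : sqrt;
import std.stdio : writefln;

struct Point { double x, y; } // assumed: local planar coordinates in metres

// Fits y = intercept + slope * x by ordinary least squares and returns the
// RMS of the vertical residuals. Assumes at least two points with distinct
// x values (a vertical track would need a different parametrisation).
double rmsFromTrendLine(const Point[] pts, out double slope, out double intercept)
{
    immutable n = cast(double) pts.length;
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    foreach (p; pts)
    {
        sx += p.x;
        sy += p.y;
        sxx += p.x * p.x;
        sxy += p.x * p.y;
    }
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    intercept = (sy - slope * sx) / n;

    double sumSq = 0;
    foreach (p; pts)
    {
        immutable r = p.y - (intercept + slope * p.x);
        sumSq += r * r;
    }
    return sqrt(sumSq / n);
}

void main()
{
    // Hypothetical sample track, roughly along y = x.
    const Point[] track = [Point(0, 0.2), Point(10, 9.7),
                           Point(20, 20.4), Point(30, 29.9)];
    double slope, intercept;
    immutable rms = rmsFromTrendLine(track, slope, intercept);
    writefln("slope=%.3f intercept=%.3f rms=%.3f m, within 5 m: %s",
             slope, intercept, rms, rms <= 5.0);
}
```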

Then I draw a best fit boundary around it. I lay a square grid, 
and remove the squares where no line can be found.

Then I use this algorithm :
https://stackoverflow.com/questions/50885339/polygon-from-a-grid-of-squares

This runs at O(N²) for sure.

Does this help?

Jun 08 2021

mw <mingwu gmail.com> writes:

On Tuesday, 8 June 2021 at 22:04:26 UTC, seany wrote:
 The program is trying to categorize GPS tracks.
 It has to identify tracks that count as (somewhat) parallel 
 (this is difficult to define) .

Maybe this is what you're looking for:

https://en.wikipedia.org/wiki/Dynamic_time_warping

and you can run on GPU with this:

https://github.com/garrettwrong/cuTWED
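
For reference, a minimal CPU sketch of the classic dynamic time warping distance in D. It works on plain 1-D sequences with `|a - b|` as the local cost, which is an assumption made for brevity; real GPS tracks would need a 2-D point distance instead. It is unrelated to the cuTWED code linked above.

```d
import std.algorithm : min;
import std.math : abs;
import std.stdio : writeln;

// Classic O(n*m) dynamic time warping distance between two sequences.
// cost[i][j] holds the cheapest alignment of a[0 .. i] with b[0 .. j].
double dtw(const double[] a, const double[] b)
{
    auto cost = new double[][](a.length + 1, b.length + 1);
    foreach (row; cost)
        row[] = double.infinity;
    cost[0][0] = 0;

    foreach (i; 1 .. a.length + 1)
        foreach (j; 1 .. b.length + 1)
        {
            immutable d = abs(a[i - 1] - b[j - 1]);
            // Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]);
        }
    return cost[a.length][b.length];
}

void main()
{
    // The second sequence is a time-stretched copy of the first.
    writeln(dtw([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]));
}
```

Note that this table is itself O(n*m) in time and memory per pair of tracks, so it only pays off if it replaces something more expensive.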

Jun 08 2021
