
digitalmars.D - Tracy Profiler Integration

reply Stefan Koch <uplink.coder googlemail.com> writes:
Hello Folks,

I am currently integrating the tracy profiler 
(https://github.com/wolfpld/tracy)
with dmd.

The goal is that Tracy can be used for instrumented profiling instead of the profiler in druntime.


The current progress is here: 
https://github.com/dlang/dmd/compare/master...UplinkCoder:tracy?expand=1
  there are still some teething problems, but I am confident that 
it will be useful soon.

Cheers,

Stefan
Aug 03 2020
next sibling parent reply DepresseD <DepresseD nohope.org> writes:
On Monday, 3 August 2020 at 20:36:51 UTC, Stefan Koch wrote:
 Hello Folks,

 I am currently integrating the tracy profiler 
 (https://github.com/wolfpld/tracy)
 with dmd.

 Such that instead of the profiler in druntime, tracy can be 
 used for instrumented profiling.


 The current progress is here: 
 https://github.com/dlang/dmd/compare/master...UplinkCoder:tracy?expand=1
  there are still some teething problems, but I am confident 
 that it will be useful soon.

 Cheers,

 Stefan
Wouldn't it be more useful for everyone, and for you too, to complete a project instead of starting dozens at the same time? I have completely lost hope of ever seeing the benefits of newCTFE.
Aug 04 2020
next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 4 August 2020 at 08:13:15 UTC, DepresseD wrote:
 On Monday, 3 August 2020 at 20:36:51 UTC, Stefan Koch wrote:
 Hello Folks,

 I am currently integrating the tracy profiler 
 (https://github.com/wolfpld/tracy)
 with dmd.

 Such that instead of the profiler in druntime, tracy can be 
 used for instrumented profiling.


 The current progress is here: 
 https://github.com/dlang/dmd/compare/master...UplinkCoder:tracy?expand=1
  there are still some teething problems, but I am confident 
 that it will be useful soon.

 Cheers,

 Stefan
Wouldn't it be more useful for everyone, and for you too, to complete a project instead of starting dozens at the same time? I have completely lost hope of being able to see benefits from the newCTFE
newCTFE, in the end, has a lot less benefit than I first assumed. Currently it is missing exception handling and associative array support, and of course there are some bugs in the feature set that is supported. If I had had a good integrated profiler back then, and some of the code which I have access to now, I would probably never have started newCTFE, and would have tried to fix the template system itself.
Aug 04 2020
next sibling parent Sebastiaan Koppe <mail skoppe.eu> writes:
On Tuesday, 4 August 2020 at 08:44:08 UTC, Stefan Koch wrote:
 newCTFE in the end, has a lot less benefit than I first assumed.
The speedup will certainly be useful to me since I am doing more and more at CTFE.
 If I had had a good integrated profiler back then, and some of 
 the code which I have access to now, I would probably never 
 have started newCTFE, and would have tried to fix the template 
 system itself.
There is still time :)
Aug 04 2020
prev sibling next sibling parent reply WebFreak001 <d.forum webfreak.org> writes:
On Tuesday, 4 August 2020 at 08:44:08 UTC, Stefan Koch wrote:
 [...]

 newCTFE in the end, has a lot less benefit than I first assumed.

 Currently it is missing execption handling, and associative 
 array support.
 And of course there are some bugs in the featureset which is 
 supported.

 If I had had a good integrated profiler back then, and some of 
 the code which I have access to now, I would probably never 
 have started newCTFE, and would have tried to fix the template 
 system itself.
I think it's great that you are doing this. Tracing is an area where D is sorely lacking, so I am grateful for what you have started here. I'm curious though: why does this need to be a compiler change instead of a library addition? The user manual says that "There can be no more than 65534 unique source locations", so wouldn't automatically inserting the boilerplate into every function pretty quickly blow that up? Also, where does the referenced TracyClientNoExit.o file in your changes come from?
Aug 04 2020
next sibling parent Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 4 August 2020 at 09:09:41 UTC, WebFreak001 wrote:
 On Tuesday, 4 August 2020 at 08:44:08 UTC, Stefan Koch wrote:
 [...]
I think it's great that you are doing this. Tracing is an extremely lacking part in D, so I am grateful for what you have started here. I'm curious though, why does this need to be a compiler change instead of a library addition? It says in the user manual that "There can be no more than 65534 unique source locations", so wouldn't automatically inserting the boilerplate into every function pretty quickly blow that up?
Yeah I am running into that limit right now :p I'll have to hack tracy :)
 Also where does the referenced TracyClientNoExit.o file in your 
 changes come from?
TracyClientNoExit.o is produced by a build of the tracy C++ codebase. I'll provide a script and instructions later.
Aug 04 2020
prev sibling parent Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 4 August 2020 at 09:09:41 UTC, WebFreak001 wrote:
 I'm curious though, why does this need to be a compiler change 
 instead of a library addition?
Ah sorry, I missed this question. The interface druntime has to the profiler does not expose all the information Tracy needs: you can attach the Tracy frontend to your program at runtime and see how long each function currently takes to execute. Tracy needs location information to be provided with each measurement because it is a real-time frame profiler.
Aug 05 2020
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/4/20 4:44 AM, Stefan Koch wrote:
 newCTFE in the end, has a lot less benefit than I first assumed.
If CTFE becomes way less expensive (in CPU usage and memory usage), then the template problem becomes easier as well, as we can do more CTFE to replace templates.
 If I had had a good integrated profiler back then, and some of the code 
 which I have access to now, I would probably never have started newCTFE, 
 and would have tried to fix the template system itself.
I still think newCTFE has worthwhile benefit, even if it alone cannot fix all the problems. I think newCTFE and type functions (along with Manu's ... DIP) would be a good combination to attack the problem. -Steve
Aug 04 2020
parent reply Avrina <avrina12309412342 gmail.com> writes:
On Tuesday, 4 August 2020 at 12:47:54 UTC, Steven Schveighoffer 
wrote:
 On 8/4/20 4:44 AM, Stefan Koch wrote:
 newCTFE in the end, has a lot less benefit than I first 
 assumed.
If CTFE becomes way less expensive (in CPU usage and memory usage), then the template problem becomes easier as well, as we can do more CTFE to replace templates.
CTFE takes 500 ms for my project. It takes a total of about 10 seconds for the frontend to do everything, without -inline. The more significant problem is definitely templates: their expansion, and how everything is processed back into an AST. Even if you run CTFE, even "newCTFE", the result is still going to have to be expanded back into an AST, which is the core problem.
 If I had had a good integrated profiler back then, and some of 
 the code which I have access to now, I would probably never 
 have started newCTFE, and would have tried to fix the template 
 system itself.
I still think newCTFE has worthwhile benefit, even if it alone cannot fix all the problems. I think newCTFE and type functions (along with Manu's ... DIP) would be a good combination to attack the problem. -Steve
It doesn't help; it just introduces a whole lot more complexity into the compiler as well. Effectively it creates a second compiler within the compiler. It is much more complicated than the current solution, and I don't imagine the speedup is going to be that much, as in my case it will only be able to reduce the build time by a maximum of 500 ms.
Aug 04 2020
next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 4 August 2020 at 16:01:54 UTC, Avrina wrote:
 On Tuesday, 4 August 2020 at 12:47:54 UTC, Steven Schveighoffer 
 wrote:
 On 8/4/20 4:44 AM, Stefan Koch wrote:
 newCTFE in the end, has a lot less benefit than I first 
 assumed.
If CTFE becomes way less expensive (in CPU usage and memory usage), then the template problem becomes easier as well, as we can do more CTFE to replace templates.
CTFE takes 500 ms for my project. It takes a total of about 10 seconds for the frontend to do everything, without -inline. The more significant problem is definitely templates, their expansion and how everything is processed back into a AST. Such as the case if your run CTFE, even "newCTFE" the result is still going to have to be expanded back into an AST, which is the core problem.
It's not quite that AST insertion is slow. It's the fact that you have to do semantic processing piece by piece, which is expensive. If you have completely semantically processed nodes, linking them into the tree is quite painless.
 It doesn't help, it just introduces a whole lot of more 
 complexity into the compiler as well. Effectively it creates a 
 second compiler within the compiler. It is much more 
 complicated than the current solution, and I don't imagine the 
 speed up is going to be that much as, for my case it will only 
 be able to reduce the build time by a maximum of 500 ms.
What exactly do you mean? Type functions are a relatively simple extension of CTFE. "..." is a little more involved, but also not massively complicated. Both should stay under 1000 lines when implemented *fingers crossed*
Aug 04 2020
parent reply Per Nordlöw <per.nordlow gmail.com> writes:
On Tuesday, 4 August 2020 at 16:11:38 UTC, Stefan Koch wrote:
 It's not quite that AST insertion is slow. It's the fact that 
 you
 have to do semantic processing piece by piece, which is 
 expensive.
Can you elaborate a bit on this statement? Is this problem specific to `dmd`'s non-lazy implementation of semantic analysis, to D, or to templated statically typed languages in general? Further, is this problem related to the frontend only?
 If you have completely semantically processed nodes, linking 
 them into the tree is quite painless.
What do you mean by "semantically processed nodes"?
Aug 04 2020
parent reply Per Nordlöw <per.nordlow gmail.com> writes:
On Tuesday, 4 August 2020 at 20:54:01 UTC, Per Nordlöw wrote:
 What do you mean by "semantically processed nodes"?
I guess I understand what you mean by "semantically processed nodes". The other question remains unanswered, though.
Aug 04 2020
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 4 August 2020 at 20:55:11 UTC, Per Nordlöw wrote:
 On Tuesday, 4 August 2020 at 20:54:01 UTC, Per Nordlöw wrote:
 What do you mean by "semantically processed nodes"?
I guess I understand what you mean by "semantically processed nodes". The other question remains unanswered, though.
Completely processed means just that: they are finished and can simply be linked into the tree, i.e. they know what they are. In the example f(a, b), that would mean we know the type, size, location and meaning of f, a, and b, as well as anything f, a, and b might refer to. Determining this can be a very expensive process; just searching scopes upwards to learn the meaning of a name can take very long if you have deep nesting (this happens in recursive templates, for example).
 Is this problem specific to `dmd`'s non-lazy implementation of 
 semantic analysis, D or templated statically typed languages in 
 general?
That's a tricky question. I don't know. It is my strong belief, however, that templates (static polymorphism) as used in C++ or D are fundamentally hard to implement efficiently and fast. Don't quote me on that unless I turn out to be right ;p
Aug 05 2020
parent reply aberba <karabutaworld gmail.com> writes:
On Wednesday, 5 August 2020 at 10:34:55 UTC, Stefan Koch wrote:
 On Tuesday, 4 August 2020 at 20:55:11 UTC, Per Nordlöw wrote:
 On Tuesday, 4 August 2020 at 20:54:01 UTC, Per Nordlöw wrote:
i.e. they know what they are. In the example f(a, b). That would mean we know the type, size, location and meaning of f, a, and b. As well as anything f, a, and b might refer to. Determining this can be a very expensive process, just searching scopes upwards to know the meaning of a name can take very long, if you have deep nesting (this happens in recursive templates for example.)
How much of a part do non-templated nested functions/classes/structs play in this? And is it more about the scope where they are called, or where they are defined in the code?
Aug 05 2020
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 5 August 2020 at 10:44:24 UTC, aberba wrote:
 On Wednesday, 5 August 2020 at 10:34:55 UTC, Stefan Koch wrote:
 On Tuesday, 4 August 2020 at 20:55:11 UTC, Per Nordlöw wrote:
 On Tuesday, 4 August 2020 at 20:54:01 UTC, Per Nordlöw wrote:
i.e. they know what they are. In the example f(a, b). That would mean we know the type, size, location and meaning of f, a, and b. As well as anything f, a, and b might refer to. Determining this can be a very expensive process, just searching scopes upwards to know the meaning of a name can take very long, if you have deep nesting (this happens in recursive templates for example.)
How much of a part does non-templated nested function/classes/struct play in this? And is it more about the scope where they are called or where they are defined in code?
It's all about the point of definition. I doubt regular nested functions/aggregates ever have a nesting level of over 20, which is when this stuff starts to matter.
Aug 05 2020
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 8/4/20 12:01 PM, Avrina wrote:
 On Tuesday, 4 August 2020 at 12:47:54 UTC, Steven Schveighoffer wrote:
 On 8/4/20 4:44 AM, Stefan Koch wrote:
 newCTFE in the end, has a lot less benefit than I first assumed.
If CTFE becomes way less expensive (in CPU usage and memory usage), then the template problem becomes easier as well, as we can do more CTFE to replace templates.
CTFE takes 500 ms for my project. It takes a total of about 10 seconds for the frontend to do everything, without -inline. The more significant problem is definitely templates, their expansion and how everything is processed back into a AST. Such as the case if your run CTFE, even "newCTFE" the result is still going to have to be expanded back into an AST, which is the core problem.
I have faced rather different problems (mostly with memory consumption). Part of the problem with current CTFE is that it consumes lots of memory needlessly (at least that's my recollection). That is part (but not all) of the reason we use recursive templates instead of CTFE to solve our compile-time computation problems. I don't know whether newCTFE fixes ALL the problems or not, but it will still help.
 
 If I had had a good integrated profiler back then, and some of the 
 code which I have access to now, I would probably never have started 
 newCTFE, and would have tried to fix the template system itself.
I still think newCTFE has worthwhile benefit, even if it alone cannot fix all the problems. I think newCTFE and type functions (along with Manu's ... DIP) would be a good combination to attack the problem.
It doesn't help, it just introduces a whole lot of more complexity into the compiler as well. Effectively it creates a second compiler within the compiler. It is much more complicated than the current solution, and I don't imagine the speed up is going to be that much as, for my case it will only be able to reduce the build time by a maximum of 500 ms.
What is the "It" that you are talking about? I imagine part of the problem here is that CTFE is avoided because it doesn't deal with types and compile-time lists very well -- you have only one solution there. In which case, CTFE is not used in many places where it really is a natural fit. So really, if CTFE were more usable, it could replace the vast template usage that is likely causing your build to be slower, and then an optimized CTFE becomes more relevant. -Steve
Aug 04 2020
next sibling parent Adam D. Ruppe <destructionator gmail.com> writes:
On Tuesday, 4 August 2020 at 16:32:06 UTC, Steven Schveighoffer 
wrote:
 Part of the problem of current CTFE is that it consumes lots of 
 memory needlessly (at least that's my recollection).
Yes, indeed, but it is worth noting that with newer versions of the compiler *some* of it is freed in between instances, and if you preallocate memory and write your CTFE code carefully enough, this can be managed very effectively with the existing old CTFE. I'm still for improving it, of course; we can just do nicer things with the existing thing if done carefully.
Aug 04 2020
prev sibling parent Avrina <avrina12309412342 gmail.com> writes:
On Tuesday, 4 August 2020 at 16:32:06 UTC, Steven Schveighoffer 
wrote:
 On 8/4/20 12:01 PM, Avrina wrote:
 On Tuesday, 4 August 2020 at 12:47:54 UTC, Steven 
 Schveighoffer wrote:
 On 8/4/20 4:44 AM, Stefan Koch wrote:
 newCTFE in the end, has a lot less benefit than I first 
 assumed.
If CTFE becomes way less expensive (in CPU usage and memory usage), then the template problem becomes easier as well, as we can do more CTFE to replace templates.
CTFE takes 500 ms for my project. It takes a total of about 10 seconds for the frontend to do everything, without -inline. The more significant problem is definitely templates, their expansion and how everything is processed back into a AST. Such as the case if your run CTFE, even "newCTFE" the result is still going to have to be expanded back into an AST, which is the core problem.
I have faced much different problems (mostly with memory consumption). Part of the problem of current CTFE is that it consumes lots of memory needlessly (at least that's my recollection). That is part (but not all) of the reason we use recursive templates instead of CTFE to solve our compile-time computation problems.
If you use -lowmem and you still have memory issues, then it probably won't help if the cause is CTFE. The reason so much memory is being used is that at times the compiler is working backwards: even when it has a representation that is smaller and closer to what the final result should be, it has to convert it back into an AST.
 I don't know whether newCTFE fixes ALL the problems or not. But 
 it will still help.
It will very likely introduce new problems as well.
 If I had had a good integrated profiler back then, and some 
 of the code which I have access to now, I would probably 
 never have started newCTFE, and would have tried to fix the 
 template system itself.
I still think newCTFE has worthwhile benefit, even if it alone cannot fix all the problems. I think newCTFE and type functions (along with Manu's ... DIP) would be a good combination to attack the problem.
It doesn't help, it just introduces a whole lot of more complexity into the compiler as well. Effectively it creates a second compiler within the compiler. It is much more complicated than the current solution, and I don't imagine the speed up is going to be that much as, for my case it will only be able to reduce the build time by a maximum of 500 ms.
What is "It" that you are talking about? I imagine part of the problem here is that CTFE is avoided because it doesn't deal with types and compile-time lists very well -- you have only one solution there. In which case, CTFE is not used in many places where it really is a natural fit. So really, if CTFE was more usable, it could replace the vast template usage that is likely causing your build to be slower, and then an optimized CTFE becomes more relevant. -Steve
It is newCTFE. I haven't looked at type functions; maybe they can help, but the larger issue is just how the compiler is structured. They can help in some cases, but not all. Part of what takes so much memory in my project, and the build time, is in fact the runtime. I wonder how fast the compile time would be with -betterC, but I'm not going to modify my project that much to find out. Have you looked at newCTFE, how it is being implemented, and exactly what it affects? I think that's just overly positive optimism and not how it will end up working (without heavy modification, more than what is already being done in regards to newCTFE).
Aug 04 2020
prev sibling parent Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 4 August 2020 at 08:13:15 UTC, DepresseD wrote:
 On Monday, 3 August 2020 at 20:36:51 UTC, Stefan Koch wrote:
 Hello Folks,

 I am currently integrating the tracy profiler 
 (https://github.com/wolfpld/tracy)
 with dmd.

 Such that instead of the profiler in druntime, tracy can be 
 used for instrumented profiling.


 The current progress is here: 
 https://github.com/dlang/dmd/compare/master...UplinkCoder:tracy?expand=1
  there are still some teething problems, but I am confident 
 that it will be useful soon.

 Cheers,

 Stefan
Wouldn't it be more useful for everyone, and for you too, to complete a project instead of starting dozens at the same time? I have completely lost hope of being able to see benefits from the newCTFE
Of course tracy integration also helps to profile newCTFE ;) There's one more thing: I am almost done with tracy. It was like 2 days of work. On the timescale that newCTFE took, and still takes, that's nothing!
Aug 04 2020
prev sibling next sibling parent aberba <karabutaworld gmail.com> writes:
On Monday, 3 August 2020 at 20:36:51 UTC, Stefan Koch wrote:
 Hello Folks,

 I am currently integrating the tracy profiler 
 (https://github.com/wolfpld/tracy)
 with dmd.

 Such that instead of the profiler in druntime, tracy can be 
 used for instrumented profiling.


 The current progress is here: 
 https://github.com/dlang/dmd/compare/master...UplinkCoder:tracy?expand=1
  there are still some teething problems, but I am confident 
 that it will be useful soon.

 Cheers,

 Stefan
I agree with WebFreak on this. Tooling is very lacking in D, and a profiler like tracy (along with its ecosystem of tools) is a very necessary addition for code optimization. As a side note, one of the most talked-about browser dev tools is the JavaScript profiler.
Aug 04 2020
prev sibling next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Monday, 3 August 2020 at 20:36:51 UTC, Stefan Koch wrote:
 Hello Folks,

 I am currently integrating the tracy profiler 
 (https://github.com/wolfpld/tracy)
 with dmd.

 Such that instead of the profiler in druntime, tracy can be 
 used for instrumented profiling.


 The current progress is here: 
 https://github.com/dlang/dmd/compare/master...UplinkCoder:tracy?expand=1
  there are still some teething problems, but I am confident 
 that it will be useful soon.

 Cheers,

 Stefan
There is some news on this: here is a screenshot of the Tracy profiler integration being used for a "trivial" hello world example (in reality the druntime and phobos templates gave me a bit of trouble ...) https://ibb.co/0XKyJd3

The code on the left hand side can be compiled with the -tracy switch and run, provided you copy TracyClientNoExit.o into /tmp/TracyClientNoExit.o (you have to compile it yourself by checking out https://github.com/UplinkCoder/tracy/tree/fixit and running the build.sh from there).

Cheers everyone!

Note: currently I am using the C API for dynamic languages, which creates source locations every time; that makes the profiling a little more expensive than it would usually be. Hence even a no-op function will take about a microsecond ... but this can be improved later.

Using Tracy I found out that the unsignedTempString function in Phobos is somewhat slow ... no wonder, it divides in a loop. I'll replace it soon.
Aug 10 2020
parent Nils Lankila <NilsLankila gmx.us> writes:
On Monday, 10 August 2020 at 21:54:33 UTC, Stefan Koch wrote:
 On Monday, 3 August 2020 at 20:36:51 UTC, Stefan Koch wrote:
 Hello Folks,

 I am currently integrating the tracy profiler 
 (https://github.com/wolfpld/tracy)
 with dmd.

 Such that instead of the profiler in druntime, tracy can be 
 used for instrumented profiling.


 The current progress is here: 
 https://github.com/dlang/dmd/compare/master...UplinkCoder:tracy?expand=1
  there are still some teething problems, but I am confident 
 that it will be useful soon.

 Cheers,

 Stefan
There are some news of this: Here is a screenshot of the Tracy profiler integration being used, for a "trivial" hello world example, (in reliity the druntime and phobos templates gave me a bit of trouble ...) https://ibb.co/0XKyJd3
Nice
 The code on the left hand side can be compiled with the -tracy 
 switch.
 and run (provided you copy TracyClientNoExit.o) into 
 /tmp/TracyClientNoExit.o
 (which you have to compile yourself using 
 https://github.com/UplinkCoder/tracy/tree/fixit
 and running the build.sh from there).

 Cheers everyone!
 Note, currently I am using the c-api for dynamic langauges, 
 which creates source locations every time, that makes the 
 profiling a little more expensive than it would usually be.
 Hence even a no-op function will take about a microsecond ... 
 But this can be improved later.
That can't be slower than callgrind!
 Using tracy I found out that the unsignedTempString function in 
 Phobos is somewhat slow ...
 No wonder it divides in a loop.
 I'll replace it soon.
Aug 11 2020
prev sibling parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Monday, 3 August 2020 at 20:36:51 UTC, Stefan Koch wrote:
 Hello Folks,

 I am currently integrating the tracy profiler 
 (https://github.com/wolfpld/tracy)
 with dmd.

 Such that instead of the profiler in druntime, tracy can be 
 used for instrumented profiling.


 The current progress is here: 
 https://github.com/dlang/dmd/compare/master...UplinkCoder:tracy?expand=1
  there are still some teething problems, but I am confident 
 that it will be useful soon.

 Cheers,

 Stefan
Hi,

Looked at the source code and wondered: is it possible to have an abstraction layer between Tracy and the injected code?

That way we could eliminate the Tracy dependency from dmd and move it into a library solution, which would then allow us to replace Tracy with other tools if needed. It would also allow preprocessing, if needed, before an event is sent to the Tracy server.
Aug 12 2020
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 12 August 2020 at 14:51:08 UTC, Alexandru Ermicioi 
wrote:
 Hi,

 Looked at source code, and wondered, is it possible to have an 
 abstraction layer between tracy and injected code?

 That way we could eliminate tracy depenedency from dmd, and 
 move it into a library solution, which would then allow us to 
 replace tracy woth other tools if needed. It also would allow 
 preprocessing if needed before event would be sent to tracy 
 server.
I am not sure what you are talking about. There is no dependency between dmd and tracy. Are you talking about providing a general profiling interface? That might be possible, but it's outside the scope of what I am trying to do right now. Also keep in mind that every layer between the profiler and the profiled application diminishes the usefulness. To be honest, having to link with a library to use tracy already sucks. I'd rather inject all the profiling code verbatim, but that's too involved at the moment.
Aug 12 2020
parent Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Thursday, 13 August 2020 at 00:32:34 UTC, Stefan Koch wrote:
 On Wednesday, 12 August 2020 at 14:51:08 UTC, Alexandru 
 Ermicioi wrote:
 Hi,

 Looked at source code, and wondered, is it possible to have an 
 abstraction layer between tracy and injected code?

 That way we could eliminate tracy depenedency from dmd, and 
 move it into a library solution, which would then allow us to 
 replace tracy woth other tools if needed. It also would allow 
 preprocessing if needed before event would be sent to tracy 
 server.
I am not sure what you are talking about. There is no dependency between dmd and tracy. Are you talking about providing a general profiling interface? That might be possible but outside of the scope what I am trying to do right now. Also keep in mind that the number of layers between the profiler and the profiled application diminish the usefulness. To be honest having to link with a library to use tracy already sucks. I'd rather inject all the profiling code verbatim, but that's too involved at the moment.
There is a hardcoded one in your comparison branch, between the compiled binary, the tracy client and dmd (linking the tracy client, and injecting calls to it into the binary in dmd's semantic3 visitor).

If you make a *thin* abstraction between the tracy client and dmd, you can move the tracy client dependency to druntime instead of dmd, and hide it behind a version switch if needed. Then you won't need any -tracy flag in dmd itself, just the standard profiling switch. A D user would also be able to switch to whatever profiling agent they'd like by providing their own implementation in a custom druntime, which would be a lot easier than modifying dmd itself to add support for their profiler. This is also in the trend of moving compiler magic into a library solution, btw.

Also, tracy is a client-server application, right? You'll still require some lib or implementation of the tracy client side to communicate with the server, so I'm not sure what you mean by injecting the profiling code verbatim.

- Alex
Aug 13 2020