digitalmars.D - [std.concurrency] prioritySend is 1000 times slower than send?
- osa (58/58) Sep 29 2010 I started using std.concurrency in some projects and overall it feels
- Steven Schveighoffer (6/13) Sep 29 2010 Note, core.demangle will probably soon replace std.demangle, and is
- Denis Koroskin (4/18) Sep 29 2010 IIRC, core.demangle is already in dmd2.049 (i.e. no need for an svn
- Sean Kelly (2/6) Sep 30 2010 Thanks for this. I can tell you that prioritySend performs an extra all...
- Sean Kelly (6/14) Sep 30 2010 Okay, I've fixed one issue with priority messages that, aside from broke...
- osa (8/12) Sep 30 2010 I've also thought about switching to 'send' if the receiver queue is
- Sean Kelly (3/10) Sep 30 2010 The current API is designed to apply to in-process and out-of-process me...
- osa (9/19) Sep 30 2010 I see. It is reasonable if out-of-process messaging is going to be
- Sean Kelly (2/16) Sep 30 2010 It will be. But I want to get the bumps smoothed out for in-process mes...
- Sean Kelly (27/41) Oct 08 2010 thrown as PriorityMessage!(T), and this exception is generated when the ...
- osa (5/17) Oct 08 2010 Wow! This is a really good improvement. Thanks! I assume this is in
I started using std.concurrency in some projects and overall it feels like a solid (albeit minimalistic) design. However, current implementation has some issues. For example, I've noticed that using prioritySend slows everything considerably. Here is a simple benchmark to demonstrate the problem:
---------
import std.concurrency;
import std.date;
import std.stdio;

struct Message {}

void main() {
    enum TIME_LIMIT = 5 * ticksPerSecond;
    auto started = getUTCtime();
    d_time running = 0;
    long iterations = 0;
    while( running < TIME_LIMIT ) {
        version( priority ) {
            prioritySend( thisTid, Message() );
        } else {
            send( thisTid, Message() );
        }
        receive( (Message){} );
        if( ++iterations % 100 == 0 ) {
            running = getUTCtime() - started;
        }
    }
    auto seconds = cast(double)running / ticksPerSecond;
    writeln( "Benchmark: ", iterations, " iterations in ", seconds, " seconds (", iterations / seconds, "/second)" );
}
---------
Using dmd v2.049 on linux, this produces:
Benchmark: 4469600 iterations in 5 seconds (893920/second)

But when compiled with -version=priority, result is quite different:
Benchmark: 3700 iterations in 5.177 seconds (714.7/second)

This is about 1250 times slower than using send! Is there any reason for such penalty for using prioritySend?

Note that benchmark code is single-threaded. Initial version was using two threads (with similar discrepancy between send and prioritySend) but when I've tried to run it after compiling with -profile, it did not work. I assume that profiling is not supported for multi-threaded programs yet? So I profiled single-threaded benchmark and it seems that the main offender is PriorityMessageException constructor:

  Num          Tree        Func        Per
  Calls        Time        Time        Call
   1700   777986427   777986427      457639   class std.concurrency.PriorityMessageException!(struct concur1.Message).PriorityMessageException std.concurrency.PriorityMessageException!(struct concur1.Message).PriorityMessageException.__ctor(struct concur1.Message)

P.S. demangle program example at http://www.digitalmars.com/d/2.0/phobos/std_demangle.html is broken -- it does not compile.

P.P.S. std.demangle fails for some symbols, for example:
_D3std5array13__T5emptyTyaZ5emptyFNdxAyaZb
_D3std6format19__T10FormatSpecTyaZ10FormatSpec6__ctorMFNcxAyaZS3std6format19__T10FormatSpecTyaZ10FormatSpec
and many other.
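For readers on a current compiler: std.date was later removed from Phobos, so the benchmark above no longer builds as posted. The following is a rough sketch of the same round-trip loop using std.datetime.stopwatch instead. The function name roundTrips, the template flag usePriority, and the 500 ms limit are choices made for this illustration, not part of the original program, and the numbers it prints will not be comparable to the dmd 2.049 figures.

```d
import core.time : Duration, msecs;
import std.concurrency : prioritySend, receive, send, thisTid;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

struct Message {}

/// Sends messages to the current thread and receives them back until
/// `limit` has elapsed; returns the number of completed round trips.
/// (Hypothetical helper; `usePriority` replaces -version=priority.)
long roundTrips(bool usePriority)(Duration limit)
{
    auto sw = StopWatch(AutoStart.yes);
    Duration running = Duration.zero;
    long iterations = 0;
    while (running < limit)
    {
        static if (usePriority)
            prioritySend(thisTid, Message());
        else
            send(thisTid, Message());
        receive((Message _) {});
        // Read the clock only every 100 iterations, as the original does.
        if (++iterations % 100 == 0)
            running = sw.peek;
    }
    return iterations;
}

void main()
{
    auto limit = 500.msecs; // shorter than the original 5 seconds
    writefln("send:         %s iterations in %s", roundTrips!false(limit), limit);
    writefln("prioritySend: %s iterations in %s", roundTrips!true(limit), limit);
}
```

Running both variants in one binary avoids the separate -version=priority build, at the cost of the two loops sharing one mailbox and one GC heap.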
Sep 29 2010
On Wed, 29 Sep 2010 14:25:07 -0400, osa <osa aso.osa> wrote:
> P.S. demangle program example at http://www.digitalmars.com/d/2.0/phobos/std_demangle.html is broken -- it does not compile.
> P.P.S. std.demangle fails for some symbols, for example:
> _D3std5array13__T5emptyTyaZ5emptyFNdxAyaZb
> _D3std6format19__T10FormatSpecTyaZ10FormatSpec6__ctorMFNcxAyaZS3std6format19__T10FormatSpecTyaZ10FormatSpec
> and many other.

Note, core.demangle will probably soon replace std.demangle, and is actively being developed. You may need to download the svn version of druntime.
ref: http://lists.puremagic.com/pipermail/phobos/2010-September/002376.html
-Steve
Sep 29 2010
On Wed, 29 Sep 2010 22:31:53 +0400, Steven Schveighoffer <schveiguy yahoo.com> wrote:
> On Wed, 29 Sep 2010 14:25:07 -0400, osa <osa aso.osa> wrote:
>> P.S. demangle program example at http://www.digitalmars.com/d/2.0/phobos/std_demangle.html is broken -- it does not compile.
>> P.P.S. std.demangle fails for some symbols, for example:
>> _D3std5array13__T5emptyTyaZ5emptyFNdxAyaZb
>> _D3std6format19__T10FormatSpecTyaZ10FormatSpec6__ctorMFNcxAyaZS3std6format19__T10FormatSpecTyaZ10FormatSpec
>> and many other.
> Note, core.demangle will probably soon replace std.demangle, and is actively being developed. You may need to download the svn version of druntime.
> ref: http://lists.puremagic.com/pipermail/phobos/2010-September/002376.html
> -Steve

IIRC, core.demangle is already in dmd2.049 (i.e. no need for an svn version unless there are significant changes in core.demangle in trunk).
Sep 29 2010
osa Wrote:
> I started using std.concurrency in some projects and overall it feels like a solid (albeit minimalistic) design. However, current implementation has some issues. For example, I've noticed that using prioritySend slows everything considerably.

Thanks for this. I can tell you that prioritySend performs an extra allocation to account for a design requirement (if a priority message isn't received it's thrown as PriorityMessage!(T), and this exception is generated when the send occurs, since static type info isn't available at the receive side when it's needed for this). I had originally thought that the difference was just more garbage collections, but calling GC.disable only increases the number of priority messages sent by about 1000. I'll have to look at the code to see if I can figure out what's going on.
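To make the design requirement described above concrete, here is a small sketch of what happens when a priority message arrives and receive() has no handler for its type. It is written against current Phobos, where PriorityMessageException carries its payload in a Variant field named message, rather than against the templated 2010 version discussed here. The helper name unhandledPriorityPayload is made up for this example.

```d
import core.time : Duration;
import std.concurrency : PriorityMessageException, prioritySend, receive,
    receiveTimeout, thisTid;
import std.stdio : writeln;

/// Sends `value` to the current thread as a priority message, then calls
/// receive() with a handler that only accepts strings. Returns the payload
/// carried by the resulting PriorityMessageException.
/// (Hypothetical helper, for illustration only.)
int unhandledPriorityPayload(int value)
{
    prioritySend(thisTid, value);
    try
    {
        receive((string s) {});
    }
    catch (PriorityMessageException e)
    {
        // The unmatched message may still be queued; a zero-timeout receive
        // with a matching handler takes it out if so and never blocks.
        receiveTimeout(Duration.zero, (int _) {});
        return e.message.get!int;
    }
    assert(false, "expected a PriorityMessageException");
}

void main()
{
    writeln(unhandledPriorityPayload(42));
}
```

The point of the sketch is only that an unreceived priority message surfaces as an exception in the receiver, which is why some exception object has to be built somewhere along the way.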
Sep 30 2010
Sean Kelly Wrote:
> osa Wrote:
>> I started using std.concurrency in some projects and overall it feels like a solid (albeit minimalistic) design. However, current implementation has some issues. For example, I've noticed that using prioritySend slows everything considerably.
> Thanks for this. I can tell you that prioritySend performs an extra allocation to account for a design requirement (if a priority message isn't received it's thrown as PriorityMessage!(T), and this exception is generated when the send occurs, since static type info isn't available at the receive side when it's needed for this). I had originally thought that the difference was just more garbage collections, but calling GC.disable only increases the number of priority messages sent by about 1000. I'll have to look at the code to see if I can figure out what's going on.

Okay, I've fixed one issue with priority messages that, aside from broken behavior, has increased performance somewhat. Here are the timings:
Benchmark: 5944400 iterations in 5 seconds (1.18888e+06/second) -- built without -version=priority
Benchmark: 4900 iterations in 5.119 seconds (957.218/second) -- build with -version=priority before fix
Benchmark: 39700 iterations in 5.001 seconds (7938.41/second) -- built with version=priority after fix

The remaining issue has to do with the fact that the exception is constructed when the send is issued and when this exception is constructed a stack trace is generated as well. I'll have to modify Throwable so that derived classes can specify that no trace be generated. That or eliminate constructing the exception at the send site and change how that exception is represented.
Sep 30 2010
On 09/30/2010 01:45 PM, Sean Kelly wrote:
> Benchmark: 5944400 iterations in 5 seconds (1.18888e+06/second) -- built without -version=priority
> Benchmark: 4900 iterations in 5.119 seconds (957.218/second) -- build with -version=priority before fix
> Benchmark: 39700 iterations in 5.001 seconds (7938.41/second) -- built with version=priority after fix

Seems to be about an order of magnitude improvement. Not too bad.

> The remaining issue has to do with the fact that the exception is constructed when the send is issued and when this exception is constructed a stack trace is generated as well. I'll have to modify Throwable so that derived classes can specify that no trace be generated. That or eliminate constructing the exception at the send site and change how that exception is represented.

I've also thought about switching to 'send' if the receiver queue is empty, but there is no way in std.concurrency API to check for that. Is there any serious issue with adding such method? I understand that in multi-threaded environment an empty queue as told by 'isEmpty' call may become non-empty before that fact is used, but in some situations approximate result (means empty or almost empty) is fine.
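There is still no sender-side 'isEmpty' in std.concurrency. The nearest thing in current Phobos is on the receiving side: receiveTimeout with a zero duration returns false right away when nothing matching is waiting. The sketch below shows that polling pattern. It does not give a sender the check asked for above, and the helper name tryReceiveInt is invented for this example.

```d
import core.time : Duration;
import std.concurrency : receiveTimeout, send, thisTid;
import std.stdio : writeln;

/// Tries to take one int from the current thread's mailbox without blocking.
/// Returns true and stores the value in `result` if one was waiting.
/// (Hypothetical helper, for illustration only.)
bool tryReceiveInt(out int result)
{
    int got;
    immutable found = receiveTimeout(Duration.zero, (int v) { got = v; });
    result = got;
    return found;
}

void main()
{
    int value;
    writeln(tryReceiveInt(value));             // nothing has been sent yet
    send(thisTid, 5);
    writeln(tryReceiveInt(value), " ", value); // one int is now waiting
}
```

As with the proposed 'isEmpty', the answer is only a snapshot: another thread may send a message immediately after the call returns false.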
Sep 30 2010
osa Wrote:
> I've also thought about switching to 'send' if the receiver queue is empty, but there is no way in std.concurrency API to check for that. Is there any serious issue with adding such method? I understand that in multi-threaded environment an empty queue as told by 'isEmpty' call may become non-empty before that fact is used, but in some situations approximate result (means empty or almost empty) is fine.

The current API is designed to apply to in-process and out-of-process messaging, so a function like that doesn't really fit. I think this is really more of a tuning issue. And in fact, having the PriorityMessage exception be a template isn't feasible for out-of-process messaging, so this is an issue that has to be addressed at some point anyway. I think I'm going to change the exception to be generated within receive() only if needed, have it contain a variant instead of a templated type, and possibly also not generate a stack trace for it. I haven't decided whether a trace is meaningful in this context. Getting a PriorityMessage exception could imply a failure to receive() a type required by the application design so a trace might be a good indication of where the error is... or maybe that's just wrong.

I'm looking into the hang issue as well... it's just less obvious where the problem is there.
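The reason a variant helps here is that it moves the type check from compile time to run time: one non-templated exception class can carry any payload, and whoever catches it asks what is inside. A minimal sketch of that idea with std.variant follows. The describe function and the set of types it tests are assumptions made for the example, not code from std.concurrency.

```d
import std.stdio : writeln;
import std.variant : Variant;

struct Message {}

/// Names the type held by a type-erased payload, checking only the types
/// this hypothetical receiver cares about.
string describe(Variant payload)
{
    if (payload.convertsTo!int)
        return "int";
    if (payload.convertsTo!string)
        return "string";
    if (payload.convertsTo!Message)
        return "Message";
    return "unknown";
}

void main()
{
    writeln(describe(Variant(42)));
    writeln(describe(Variant("hello")));
    writeln(describe(Variant(Message())));
    writeln(describe(Variant(3.5)));
}
```

Because nothing in describe depends on a template parameter of the container, the same shape would work for a payload that arrived from another process.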
Sep 30 2010
On 09/30/2010 03:33 PM, Sean Kelly wrote:
> osa Wrote:
>> I've also thought about switching to 'send' if the receiver queue is empty, but there is no way in std.concurrency API to check for that. Is there any serious issue with adding such method? I understand that in multi-threaded environment an empty queue as told by 'isEmpty' call may become non-empty before that fact is used, but in some situations approximate result (means empty or almost empty) is fine.
> The current API is designed to apply to in-process and out-of-process messaging, so a function like that doesn't really fit.

I see. It is reasonable if out-of-process messaging is going to be implemented.

> Getting a PriorityMessage exception could imply a failure to receive() a type required by the application design so a trace might be a good indication of where the error is... or maybe that's just wrong.

I'd say that having a trace for exceptions thrown by receive may be useful only if you have many receive() calls scattered all over the code, with try...catch on the very top level. But my (limited) experience with the std.concurrency way of thread communication tells me that it is a bad idea; I'd use as few calls to receive() as possible and keep them close to each other. But people's mileage may vary.
Sep 30 2010
osa Wrote:
> On 09/30/2010 03:33 PM, Sean Kelly wrote:
>> osa Wrote:
>>> I've also thought about switching to 'send' if the receiver queue is empty, but there is no way in std.concurrency API to check for that. Is there any serious issue with adding such method? I understand that in multi-threaded environment an empty queue as told by 'isEmpty' call may become non-empty before that fact is used, but in some situations approximate result (means empty or almost empty) is fine.
>> The current API is designed to apply to in-process and out-of-process messaging, so a function like that doesn't really fit.
> I see. It is reasonable if out-of-process messaging is going to be implemented.

It will be. But I want to get the bumps smoothed out for in-process messaging first.
Sep 30 2010
== Quote from Sean Kelly (sean invisibleduck.org)'s article
> Sean Kelly Wrote:
>> osa Wrote:
>>> I started using std.concurrency in some projects and overall it feels like a solid (albeit minimalistic) design. However, current implementation has some issues. For example, I've noticed that using prioritySend slows everything considerably.
>> Thanks for this. I can tell you that prioritySend performs an extra allocation to account for a design requirement (if a priority message isn't received it's thrown as PriorityMessage!(T), and this exception is generated when the send occurs, since static type info isn't available at the receive side when it's needed for this). I had originally thought that the difference was just more garbage collections, but calling GC.disable only increases the number of priority messages sent by about 1000. I'll have to look at the code to see if I can figure out what's going on.
> Okay, I've fixed one issue with priority messages that, aside from broken behavior, has increased performance somewhat. Here are the timings:
> Benchmark: 5944400 iterations in 5 seconds (1.18888e+06/second) -- built without -version=priority
> Benchmark: 4900 iterations in 5.119 seconds (957.218/second) -- build with -version=priority before fix
> Benchmark: 39700 iterations in 5.001 seconds (7938.41/second) -- built with version=priority after fix
> The remaining issue has to do with the fact that the exception is constructed when the send is issued and when this exception is constructed a stack trace is generated as well. I'll have to modify Throwable so that derived classes can specify that no trace be generated. That or eliminate constructing the exception at the send site and change how that exception is represented.

I just made some functional changes to how priority messages are sent and added a few performance tweaks to messaging in general. The only visible difference should be that PriorityMessageException is no longer a template class but instead contains a Variant, which is something that would have been necessary for inter-process messaging anyway. Here are the timings:

--- Before ---
$ dmd -inline -release -O priority
Benchmark: 5749600 iterations in 5 seconds (1.14992e+06/second)
Benchmark: 5747800 iterations in 5 seconds (1.14956e+06/second)
Benchmark: 5748200 iterations in 5 seconds (1.14964e+06/second)
$ dmd -inline -release -O priority -version=priority
Benchmark: 39100 iterations in 5.01 seconds (7804.39/second)
Benchmark: 39100 iterations in 5.01 seconds (7804.39/second)
Benchmark: 39100 iterations in 5 seconds (7820/second)

--- After ---
$ dmd -inline -release -O priority
Benchmark: 7204200 iterations in 5 seconds (1.44084e+06/second)
Benchmark: 7167000 iterations in 5 seconds (1.4334e+06/second)
Benchmark: 7164400 iterations in 5 seconds (1.43288e+06/second)
$ dmd -inline -release -O priority -version=priority
Benchmark: 7442500 iterations in 5 seconds (1.4885e+06/second)
Benchmark: 7448600 iterations in 5 seconds (1.48972e+06/second)
Benchmark: 7421800 iterations in 5 seconds (1.48436e+06/second)
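For a sense of scale, the ratios behind these timings can be worked out directly. The sketch below uses the first run of each configuration as reported above; the function name speedup is just a label for this example.

```d
import std.stdio : writefln;

/// Ratio of two throughput figures (iterations per second).
double speedup(double before, double after)
{
    return after / before;
}

void main()
{
    // First run of each configuration, copied from the timings above.
    enum sendBefore = 1.14992e+06;
    enum sendAfter = 1.44084e+06;
    enum ptyBefore = 7804.39;
    enum ptyAfter = 1.4885e+06;

    writefln("send, after vs before:         %.2fx", speedup(sendBefore, sendAfter));
    writefln("prioritySend, after vs before: %.2fx", speedup(ptyBefore, ptyAfter));
    writefln("prioritySend vs send, after:   %.2fx", speedup(sendAfter, ptyAfter));
}
```

In other words, plain send gained roughly a quarter, while prioritySend went from two orders of magnitude behind send to about level with it in this single-threaded benchmark.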
Oct 08 2010
On 10/08/2010 04:29 PM, Sean Kelly wrote:
> I just made some functional changes to how priority messages are sent and added a few performance tweaks to messaging in general. The only visible difference should be that PriorityMessageException is no longer a template class but instead contains a Variant, which is something that would have been necessary for inter-process messaging anyway. Here are the timings:
> --- After ---
> $ dmd -inline -release -O priority
> Benchmark: 7204200 iterations in 5 seconds (1.44084e+06/second)
> Benchmark: 7167000 iterations in 5 seconds (1.4334e+06/second)
> Benchmark: 7164400 iterations in 5 seconds (1.43288e+06/second)
> $ dmd -inline -release -O priority -version=priority
> Benchmark: 7442500 iterations in 5 seconds (1.4885e+06/second)
> Benchmark: 7448600 iterations in 5 seconds (1.48972e+06/second)
> Benchmark: 7421800 iterations in 5 seconds (1.48436e+06/second)

Wow! This is a really good improvement. Thanks! I assume this is in phobos SVN already, so I'll try to build my application (not simplified benchmark) using updated std.concurrency to see how it performs now. I'll let you know if something is wrong ;)
Oct 08 2010