
digitalmars.D.learn - Phaser (single writer Disruptor) feedback

https://bitbucket.org/nriddick/phaser

So I made this thing for cross-thread data sharing/passing. It 
has the functionality I wanted out of it. Here's the gist of how 
I use it:

auto phaser = new Phaser!(type, power2_number);
foreach (i; 0 .. threads) {
	spawn(&pull_function, phaser.subscribe);
}
phaser.put(stuff);

void pull_function(Subscription!type sub) {
	// either chomp a view and advance past it...
	auto bite = sub.chomp;
	//stuff
	sub.advance(bite.chompLength);
	// ...or pull straight into an output range
	sub.pull(SomeOutputRange);
	//stuff
}

See the p_blah_setup functions in handy.phaser for a more 
complete picture.

Basically you make a buffer, you make subscriptions to that 
buffer and pass those subscriptions around. I have no idea what 
happens if two readers share a subscription, but you probably 
shouldn't do it.

I have no idea if it is really thread-safe, but I haven't seen a 
source/sink mismatch since I hashed out the structure and that 
Works For Me.

On my i5 2500K, using DMD 2.065 with -m64 under 64-bit Windows 7, 
typical test bench output looks like this:

secs push uint[1]x2097153 buflen 1 (factor 2.09715e+06) threads 1:
	avg:0.886566 min:0.87332 max:0.915093
secs push uint[1]x2097153 buflen 1 (factor 2.09715e+06) threads 3:
	avg:1.24023 min:1.15614 max:1.30319
secs push uint[1]x67108864 buflen 65536 (factor 1024) threads 1:
	avg:0.109152 min:0.0946374 max:0.123237
secs push uint[1]x67108864 buflen 65536 (factor 1024) threads 3:
	avg:0.175861 min:0.108439 max:0.383244
secs push long[1]x33554432 buflen 32768 (factor 1024) threads 1:
	avg:0.0576159 min:0.0573254 max:0.0578606
rate push uint[1]x8193 buflen 65536 threads 1:
	avg:4.19864e+09 min:4.18432e+09 max:4.24545e+09
rate push uint[1]x8193 buflen 65536 threads 3:
	avg:3.96824e+09 min:3.9259e+09 max:4.00455e+09
rate push long[1]x4097 buflen 65536 threads 1:
	avg:2.0372e+09 min:1.99014e+09 max:2.0575e+09
rate push long[1]x4097 buflen 65536 threads 3:
	avg:1.94786e+09 min:1.93523e+09 max:1.95909e+09

So pushing 2^21 uints through a buffer of size 1 takes an average 
of 422ns per element for 1 thread -> 1 thread, or 591ns for 1 
thread -> 3 threads. I haven't tried to precisely time the 
one-way visibility latency of putting a single element.
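
Just to show the arithmetic, here is a self-contained D check of those 
per-element figures, with the times and element count taken straight 
from the bench output above:

```d
import std.stdio;

void main() {
	enum elems = 2_097_153;   // uint[1] x 2097153 from the bench
	double avg1 = 0.886566;   // 1-reader average, seconds
	double avg3 = 1.24023;    // 3-reader average, seconds
	// seconds / elements * 1e9 gives ns per element:
	writefln("1 reader:  %.0f ns/element", avg1 / elems * 1e9); // ~422
	writefln("3 readers: %.0f ns/element", avg3 / elems * 1e9); // ~591
}
```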

1->1 bulk push: I get about 600-700 million uint/s of throughput on 
the next two tests (2^26 elements through a buffer of 2^16). Going for 
really, really big numbers here eats enough memory to start swapping 
to disk.

Then there are the "absolute" throughput tests, where the writer 
spins pushing the same chunk over and over and the readers just 
chomp, advance, and add the length to a counter. 1->1, this gets 
about 4.2 billion uint/s, or 15.64GB/s; ~15.5GB/s holds up when 
switching to longs.

Maybe there's some middle ground between copying through the buffer 
and not touching the read elements at all, for a more useful 
"absolute" throughput figure.
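
One such middle ground might be a reader that touches every element 
in place without copying it out, e.g. by summing it. This is only a 
sketch against the chomp/advance API shown above, not compiled against 
the library; it assumes chomp returns a foreach-able view, as the 
earlier snippets suggest:

```d
// Hypothetical reader for a "touch but don't copy" throughput test.
void summing_pull(Subscription!uint sub) {
	ulong total;
	for (;;) {
		auto bite = sub.chomp;           // view of whatever is readable
		foreach (x; bite) total += x;    // read each element in place
		sub.advance(bite.chompLength);   // release the slots to the writer
	}
}
```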

Fire at will.
Aug 09 2014