digitalmars.D.learn - Phaser (single writer Disruptor) feedback
- Low Functioning (64/64) Aug 09 2014 https://bitbucket.org/nriddick/phaser
https://bitbucket.org/nriddick/phaser

So I made this thing for cross-thread data sharing/passing. It has the functionality I wanted out of it. Here's the gist of how I use it:

    auto phaser = new Phaser!(type, power2_number);
    foreach (i; 0 .. threads)
    {
        spawn(&pull_function, phaser.subscribe);
    }
    phaser.put(stuff);

    void pull_function(Subscription!type sub)
    {
        // something like this
        auto bite = sub.chomp;
        // stuff
        sub.advance(bite.chompLength);

        // or this
        sub.pull(SomeOutputRange);
        // stuff
    }

See the p_blah_setup functions in handy.phaser for a more complete picture. Basically you make a buffer, you make subscriptions to that buffer, and you pass those subscriptions around. I have no idea what happens if two readers share a subscription, but you probably shouldn't do it. I have no idea if it is really thread-safe either, but I haven't seen a source/sink mismatch since I hashed out the structure, and that Works For Me.

On my i5 2500K, using DMD 2.065 -m64 under Windows 7 64-bit, typical test bench output looks like this:

    secs push uint[1]x2097153  buflen 1     (factor 2.09715e+06) threads 1: avg:0.886566  min:0.87332    max:0.915093
    secs push uint[1]x2097153  buflen 1     (factor 2.09715e+06) threads 3: avg:1.24023   min:1.15614    max:1.30319
    secs push uint[1]x67108864 buflen 65536 (factor 1024)        threads 1: avg:0.109152  min:0.0946374  max:0.123237
    secs push uint[1]x67108864 buflen 65536 (factor 1024)        threads 3: avg:0.175861  min:0.108439   max:0.383244
    secs push long[1]x33554432 buflen 32768 (factor 1024)        threads 1: avg:0.0576159 min:0.0573254  max:0.0578606
    rate push uint[1]x8193 buflen 65536 threads 1: avg:4.19864e+09 min:4.18432e+09 max:4.24545e+09
    rate push uint[1]x8193 buflen 65536 threads 3: avg:3.96824e+09 min:3.9259e+09  max:4.00455e+09
    rate push long[1]x4097 buflen 65536 threads 1: avg:2.0372e+09  min:1.99014e+09 max:2.0575e+09
    rate push long[1]x4097 buflen 65536 threads 3: avg:1.94786e+09 min:1.93523e+09 max:1.95909e+09

So pushing 2^21 uints through a buffer of size 1 takes an average of 422 ns per element for 1 thread -> 1 thread, or 591 ns for 1 thread -> 3 threads. I haven't tried to precisely time the one-way visibility latency of putting a single element.

1->1 bulk push: I get about 600-700 million uint/s throughput on the next two tests (2^26 elements through a buffer of 2^16). Going for really, really big numbers here eats enough memory to start swapping to disk.

Then there are the "absolute" throughput tests, where the writer spins pushing the same chunk over and over and the readers just chomp, advance, and increment a counter by the length. 1->1, this gets about 4.2 billion uint/s, or 15.64 GB/s; ~15.5 GB/s stays consistent when switching to longs. Maybe there's some medium between copying through the buffer and not even touching the read elements that would give a more useful "absolute" throughput figure.

Fire at will.
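As a sanity check, the per-element and GB/s figures quoted above can be re-derived from the raw bench output (numbers copied from the post; GB/s taken as 2^30 bytes/s, and the ~422 ns figure appears to be truncated rather than rounded):

```python
# Re-derive the summary figures from the raw bench averages above.

elements = 2_097_153                        # uint[1] x 2097153, buflen 1

latency_1to1 = 0.886566 / elements * 1e9    # avg secs -> ns per element
latency_1to3 = 1.24023 / elements * 1e9

rate_uint = 4.19864e9                       # "absolute" 1->1 uint/s
throughput_gb = rate_uint * 4 / 2**30       # 4 bytes per uint

print(f"1->1: {latency_1to1:.1f} ns/element")   # ~422.7 ns
print(f"1->3: {latency_1to3:.1f} ns/element")   # ~591.4 ns
print(f"absolute: {throughput_gb:.2f} GB/s")    # ~15.64 GB/s
```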
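For anyone unfamiliar with the single-writer Disruptor pattern the library is built around, here is a toy single-threaded sketch of the general idea (my own illustration, not the actual Phaser code): a power-of-two ring lets ever-increasing sequence numbers wrap with a cheap bitmask, the lone writer owns one publish cursor, and each reader tracks its own consume cursor, which is roughly what `chomp`/`advance` correspond to.

```python
# Toy single-writer ring in the Disruptor style (a sketch of the pattern,
# NOT the actual Phaser implementation; names here are made up).

class Ring:
    def __init__(self, power2_len):
        assert power2_len & (power2_len - 1) == 0, "capacity must be 2^n"
        self.buf = [None] * power2_len
        self.mask = power2_len - 1
        self.head = 0                 # next sequence the writer will fill

    def put(self, item, reader_cursors):
        # The writer must not lap the slowest reader.
        while self.head - min(reader_cursors) > self.mask:
            pass                      # real code would spin/yield/park here
        self.buf[self.head & self.mask] = item
        self.head += 1                # publish (a release store in real code)

    def chomp(self, cursor):
        # Everything between this reader's cursor and head is readable.
        return [self.buf[i & self.mask] for i in range(cursor, self.head)]

ring = Ring(4)
cursors = [0]                         # one reader
for x in range(3):
    ring.put(x, cursors)
bite = ring.chomp(cursors[0])         # -> [0, 1, 2]
cursors[0] += len(bite)               # like sub.advance(bite.chompLength)
```

The single-threaded demo obviously skips the memory-ordering work that makes the real thing thread-safe; it is only meant to show the cursor arithmetic.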