
digitalmars.D.learn - improve concurrent queue

reply "luminousone" <rd.hunt gmail.com> writes:
I have been working on a project and needed a good concurrent 
queue, so I wrote one. Is there anyone more familiar with the 
atomic arts who could tell me whether there is any way I can 
further improve it?

module container.concurrentqueue;

import std.typetuple;
import core.atomic;

class ConcurrentQueue( items...  ) {
	
	align(64) class nodeType {
		align(1):
		this( ) {
			atomicStore( this.next, cast(shared nodeType) null );
		}
		this( TypeTuple!items value ) {
			foreach( k, v ; value ) {
				this.value[k] = v;
			}
			this();
		}
		
		TypeTuple!items value;
		shared nodeType next;
	}
	
	class ConsumerResult {
		TypeTuple!items value;
		
		alias value this;
	}
	
	public this() {
		shared nodeType temp = cast(shared)new nodeType( );
		
		atomicStore( first, temp );
		atomicStore( last , temp );
		atomicStore( producerLock, false );
		atomicStore( consumerLock, false );
	}	
	
	public void Produce( items item ) {
		
		TypeTuple!items t = item;
		shared nodeType temp = cast(shared)new nodeType ( t );
		
		while( !cas(&producerLock, false, true ) ) { }
		
		// Load the current tail once, then link and publish the new node.
		shared nodeType tail = atomicLoad( last );
		atomicStore( tail.next   , temp );
		atomicStore( last        , temp );
		atomicStore( producerLock, false );
	}
	
	public ConsumerResult Consume( ) {
		while( !cas(&consumerLock, false, true ) ) { }
		
		shared nodeType temp = cast(shared)atomicLoad( first );
		shared nodeType next = cast(shared)atomicLoad( temp.next );
		
		ConsumerResult result = new ConsumerResult();
		
		if( next !is null ) {
			foreach( k, v ; items ) {
				result[k] = cast(v)next.value[k];
			}
			// first is shared, so store the new head atomically as well.
			atomicStore( first, next );
			atomicStore( consumerLock, false );
			return result;
		}
		atomicStore( consumerLock, false );
		return null;
	}
	
	private shared nodeType first;
	
	private byte[64] padd1;
	
	private shared nodeType last;
	
	private byte[64] padd2;
	
	private shared bool consumerLock;
	
	private byte[64] padd3;
	
	private shared bool producerLock;
}
Aug 26 2013
parent reply "qznc" <qznc web.de> writes:
On Monday, 26 August 2013 at 19:35:28 UTC, luminousone wrote:
 I have been working on a project and needed a good concurrent 
 queue
What does "good" mean? Concurrent queues have so many design 
parameters (single reader? single writer? blocking? bounded? 
etc.), and there is no queue that performs optimally for every 
case. I can recommend this paper (paywalled, though): 
http://dl.acm.org/citation.cfm?id=248106

Nevertheless, I believe you rather wanted some comments on your 
specific code. I wonder why you implemented the spinlocks 
yourself? If you write a blocking algorithm, you should probably 
use the Phobos locks.
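For comparison, a blocking variant built on the druntime mutex 
could look roughly like this (a minimal sketch, not your API; 
LockedQueue and its members are made-up names):

```d
import core.sync.mutex : Mutex;

/// Minimal blocking multi-producer/multi-consumer queue,
/// guarded by a single druntime Mutex instead of hand-rolled spinlocks.
class LockedQueue(T) {
    private Mutex m;
    private T[] buf;

    this() { m = new Mutex; }

    void produce(T item) {
        m.lock();
        scope(exit) m.unlock();
        buf ~= item;
    }

    /// Returns false if the queue was empty.
    bool consume(out T item) {
        m.lock();
        scope(exit) m.unlock();
        if (buf.length == 0) return false;
        item = buf[0];
        buf = buf[1 .. $];
        return true;
    }
}
```

The uncontended lock/unlock pair is cheap, and under contention 
waiters sleep instead of burning a core.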
Aug 27 2013
next sibling parent reply "luminousone" <rd.hunt gmail.com> writes:
On Tuesday, 27 August 2013 at 17:35:13 UTC, qznc wrote:
 On Monday, 26 August 2013 at 19:35:28 UTC, luminousone wrote:
 I have been working on a project and needed a good concurrent 
 queue
What does "good" mean? Concurrent queues have so many design parameters (single reader? single writer? blocking? bounded? etc.), and there is no queue that performs optimally for every case.
Generally, multiple writers and a single reader, but multiple writers and multiple readers may also occur.
 I can recommand this paper (paywalled though):
 http://dl.acm.org/citation.cfm?id=248106

 Nevertheless, I believe you rather wanted some comments on your 
 specific code. I wonder why you implemented spinlocks yourself? 
 If you write a blocking algorithm, you should probably use the 
 phobos locks.
I was under the impression that an atomic spinlock has lower 
latency for any waiters than a mutex does when it is unlocked?

I am using this as a temporary or, depending on performance, a 
permanent replacement for std.concurrency send/receive, until 
such time as the bugs in std.variant are fixed.

I have a thread with an OpenGL context, and I am using the queue 
to send messages to that thread containing interface Drawable 
objects, which have had any needed preprocessing done on them, 
such that only OpenGL calls have to be made to the video card to 
render them.
Aug 27 2013
parent Guillaume Piolat <first.last gmail.com> writes:
On Tuesday, 27 August 2013 at 19:50:03 UTC, luminousone wrote:
 I was under the impression that an atomic spinlock has lower 
 latency for any waiters than a mutex does when it is unlocked?

 I am using this as a temporary or, depending on performance, a 
 permanent replacement for std.concurrency send/receive until 
 such time as the bugs in std.variant are fixed.

 I have a thread with an OpenGL context, and I am using the 
 queue to send messages to that thread containing interface 
 Drawable objects, which have had any needed preprocessing done 
 on them, such that only OpenGL calls have to be made to the 
 video card to render them.
A mutex usually has a very fast uncontended path. It only blocks, 
and is only slow, when already taken. You can think of it as a 
spinlock plus a blocking safety net. In the uncontended case, a 
mutex with a fast path will hardly be any slower than a spinlock.

Use a spinlock instead only if:
- you are dead sure you won't spin for "too long"
- you don't need to be reentrant
- you don't want this performance hysteresis (fast while 
uncontended, slow when things begin to creep into thread queues)
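If you do keep a spinlock, a common way to limit the damage when 
the "too long" case happens anyway is to bound the spin and then 
yield to the scheduler (a sketch, not from this thread; the spin 
count is an arbitrary tuning constant):

```d
import core.atomic : cas, atomicLoad, atomicStore;
import core.thread : Thread;

/// Test-and-test-and-set lock: spin a bounded number of times,
/// then yield so a waiter does not burn a whole core.
void spinLock(ref shared bool locked) {
    enum spins = 1000;
    for (;;) {
        foreach (i; 0 .. spins) {
            // Read first; only attempt the expensive CAS when it looks free.
            if (!atomicLoad(locked) && cas(&locked, false, true))
                return;
        }
        Thread.yield();
    }
}

void spinUnlock(ref shared bool locked) {
    atomicStore(locked, false);
}
```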
Jun 04 2016
prev sibling parent Dejan Lekic <dejan.lekic gmail.com> writes:
On Tuesday, 27 August 2013 at 17:35:13 UTC, qznc wrote:
 I can recommand this paper (paywalled though):
 http://dl.acm.org/citation.cfm?id=248106
The research paper can actually be freely downloaded from http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA309412. Good article, thanks!
Jun 03 2016