
digitalmars.D.learn - improve concurrent queue

reply "luminousone" <rd.hunt gmail.com> writes:
I have been working on a project and needed a good concurrent 
queue, so I wrote one. Is there anyone more familiar with the 
atomic arts who could tell me whether there is any way I can 
further improve it?

module container.concurrentqueue;

import std.typetuple;
import core.atomic;

class ConcurrentQueue( items...  ) {
	
	align(64) class nodeType {
		align(1):
		this( ) {
			atomicStore( this.next, cast(shared nodeType) null );
		}
		this( TypeTuple!items value ) {
			foreach( k, v ; value ) {
				this.value[k] = v;
			}
			this();
		}
		
		TypeTuple!items value;
		shared nodeType next;
	}
	
	class ConsumerResult {
		TypeTuple!items value;
		
		alias value this;
	}
	
	public this() {
		shared nodeType temp = cast(shared)new nodeType( );
		
		atomicStore( first, temp );
		atomicStore( last , temp );
		atomicStore( producerLock, false );
		atomicStore( consumerLock, false );
	}	
	
	public void Produce( items item ) {
		
		TypeTuple!items t = item;
		shared nodeType temp = cast(shared)new nodeType ( t );
		
		while( !cas(&producerLock, false, true ) ) { }
		
		// Load the current tail once, then link and publish the new node.
		shared nodeType tail = atomicLoad( last );
		atomicStore( tail.next   , temp );
		atomicStore( last        , temp );
		atomicStore( producerLock, false );
	}
	
	public ConsumerResult Consume( ) {
		while( !cas(&consumerLock, false, true ) ) { }
		
		shared nodeType temp = cast(shared)atomicLoad( first );
		shared nodeType next = cast(shared)atomicLoad( temp.next );
		
		ConsumerResult result = new ConsumerResult();
		
		if( next !is null ) {
			foreach( k, v ; items ) {
				result[k] = cast(v)next.value[k];
			}
			// first is shared, so store the new head atomically as well.
			atomicStore( first, next );
			atomicStore( consumerLock, false );
			return result;
		}
		atomicStore( consumerLock, false );
		return null;
	}
	
	private shared nodeType first;
	
	private byte[64] padd1;
	
	private shared nodeType last;
	
	private byte[64] padd2;
	
	private shared bool consumerLock;
	
	private byte[64] padd3;
	
	private shared bool producerLock;
}
Aug 26 2013
parent reply "qznc" <qznc web.de> writes:
On Monday, 26 August 2013 at 19:35:28 UTC, luminousone wrote:
 I have been working on a project and needed a good concurrent 
 queue
What does "good" mean? Concurrent queues have so many design 
parameters (single reader? single writer? blocking? bounded? 
etc.), and there is no queue that performs optimally for every 
case. I can recommend this paper (paywalled, though): 
http://dl.acm.org/citation.cfm?id=248106

Nevertheless, I believe you rather wanted some comments on your 
specific code. I wonder why you implemented the spinlocks 
yourself? If you write a blocking algorithm, you should probably 
use the Phobos locks.
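For comparison, a blocking variant built on the druntime mutex 
could look roughly like this (a minimal sketch, not your API; 
LockedQueue and its members are made-up names):

```d
import core.sync.mutex : Mutex;

/// Minimal blocking multi-producer/multi-consumer queue,
/// guarded by a single druntime Mutex instead of hand-rolled spinlocks.
class LockedQueue(T) {
    private Mutex m;
    private T[] buf;

    this() { m = new Mutex; }

    void produce(T item) {
        m.lock();
        scope(exit) m.unlock();
        buf ~= item;
    }

    /// Returns false if the queue was empty.
    bool consume(out T item) {
        m.lock();
        scope(exit) m.unlock();
        if (buf.length == 0) return false;
        item = buf[0];
        buf = buf[1 .. $];
        return true;
    }
}
```

The uncontended lock/unlock pair is cheap, and under contention 
waiters sleep instead of burning a core.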
Aug 27 2013
next sibling parent reply "luminousone" <rd.hunt gmail.com> writes:
On Tuesday, 27 August 2013 at 17:35:13 UTC, qznc wrote:
 On Monday, 26 August 2013 at 19:35:28 UTC, luminousone wrote:
 I have been working on a project and needed a good concurrent 
 queue
What does "good" mean? Concurrent queues have so many design parameters (single reader? single writer? blocking? bounded? etc.), and there is no queue that performs optimally for every case.
Generally, multiple writers and a single reader, but multiple writers and multiple readers may also occur.
 I can recommand this paper (paywalled though):
 http://dl.acm.org/citation.cfm?id=248106

 Nevertheless, I believe you rather wanted some comments on your 
 specific code. I wonder why you implemented spinlocks yourself? 
 If you write a blocking algorithm, you should probably use the 
 phobos locks.
I was under the impression that an atomic spinlock has lower 
latency for any waiters than a mutex does when it is unlocked?

I am using this as a temporary or, depending on performance, a 
permanent replacement for std.concurrency send/receive, until 
such time as the bugs in std.variant are fixed.

I have a thread with an OpenGL context, and I am using the queue 
to send messages to that thread containing interface Drawable 
objects, which have had any needed preprocessing done on them, 
such that only OpenGL calls have to be made to the video card to 
render them.
Aug 27 2013
parent Guillaume Piolat <first.last gmail.com> writes:
On Tuesday, 27 August 2013 at 19:50:03 UTC, luminousone wrote:
 I was under the impression that an atomic spinlock has lower 
 latency for any waiters than a mutex does when it is unlocked?

 I am using this as a temporary or, depending on performance, a 
 permanent replacement for std.concurrency send/receive until 
 such time as the bugs in std.variant are fixed.

 I have a thread with an OpenGL context, and I am using the 
 queue to send messages to that thread containing interface 
 Drawable objects, which have had any needed preprocessing done 
 on them, such that only OpenGL calls have to be made to the 
 video card to render them.
A mutex usually has a very fast uncontended path. It only blocks, 
and is only slow, when already taken. You can think of it as a 
spinlock plus a blocking safety net. In the uncontended case, a 
mutex with a fast path will hardly be any slower than a spinlock.

Use a spinlock instead only if:
- you are dead sure you won't spin for "too long"
- you don't need to be reentrant
- you don't want this performance hysteresis (fast while 
uncontended, slow when things begin to creep into thread queues)
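If you do keep a spinlock, a common way to limit the damage when 
the "too long" case happens anyway is to bound the spin and then 
yield to the scheduler (a sketch, not from this thread; the spin 
count is an arbitrary tuning constant):

```d
import core.atomic : cas, atomicLoad, atomicStore;
import core.thread : Thread;

/// Test-and-test-and-set lock: spin a bounded number of times,
/// then yield so a waiter does not burn a whole core.
void spinLock(ref shared bool locked) {
    enum spins = 1000;
    for (;;) {
        foreach (i; 0 .. spins) {
            // Read first; only attempt the expensive CAS when it looks free.
            if (!atomicLoad(locked) && cas(&locked, false, true))
                return;
        }
        Thread.yield();
    }
}

void spinUnlock(ref shared bool locked) {
    atomicStore(locked, false);
}
```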
Jun 04 2016
prev sibling parent Dejan Lekic <dejan.lekic gmail.com> writes:
On Tuesday, 27 August 2013 at 17:35:13 UTC, qznc wrote:
 I can recommand this paper (paywalled though):
 http://dl.acm.org/citation.cfm?id=248106
The research paper can actually be freely downloaded from http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA309412. Good article, thanks!
Jun 03 2016