
digitalmars.D - Wait-free thread communication

reply Jin <nin-jin ya.ru> writes:
Idea: no mutex, no CAS, only one direction queues without any 
locks.

My prototype (https://github.com/nin-jin/go.d) is up to 10x 
faster than std.concurrency.send/receive

---
writers =512
readers =1
std.concurency milliseconds=1313
jin.go milliseconds=113
---

Implementation:

Queue! - a buffered, statically typed channel. Exactly one thread 
can push data into a queue, and exactly one thread can take data 
from it. A thread blocks when it takes from an empty queue or 
pushes to a full queue.

Queues! - a list of queues with round-robin balancing

go! - starts a new thread; in the future I want to implement 
something like goroutines, but I do not understand how yet.
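
To make the description concrete, here is a minimal sketch of the 
kind of single-producer/single-consumer ring buffer meant above. 
The names and the sleep-based blocking are illustrative assumptions 
rather than the actual jin.go source, and it deliberately ignores 
the memory-ordering questions raised later in this thread:

---
import core.thread : Thread;
import core.time : msecs;

// Illustrative single-producer/single-consumer ring buffer.
// Exactly one thread may call push, exactly one may call take.
struct Queue( T , size_t size = 64 )
{
    private T[ size ] messages;
    private size_t head; // read index, advanced only by the consumer
    private size_t tail; // write index, advanced only by the producer

    bool empty() const { return this.head == this.tail; }
    bool full() const { return ( this.tail + 1 ) % size == this.head; }

    void push( T value )
    {
        while( this.full ) Thread.sleep( 1.msecs ); // naive blocking
        this.messages[ this.tail ] = value;
        this.tail = ( this.tail + 1 ) % size;
    }

    T take()
    {
        while( this.empty ) Thread.sleep( 1.msecs ); // naive blocking
        auto value = this.messages[ this.head ];
        this.head = ( this.head + 1 ) % size;
        return value;
    }
}
---

Because each index is written by exactly one thread, no mutex is needed for the data itself; the open question, discussed below, is how the two threads' reads and writes of head and tail are ordered.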

I need some good advice; I am a newbie in D :-)

Problems:

For blocking a thread I use a loop with Thread.sleep - this is a 
bad decision IMHO.

On my system I can create only up to 1000 threads without errors. 
Fibers from core.thread or Tasks from std.parallelism could 
potentially resolve this problem, but I do not know how to 
integrate them gracefully.

Can the API of jin-go be better?

Which dub modules can help me?
Jan 08 2016
next sibling parent reply Nordlöw <per.nordlow gmail.com> writes:
On Friday, 8 January 2016 at 16:58:59 UTC, Jin wrote:
 Idea: no mutex, no CAS, only one direction queues without any 
 locks.

 My prototype (https://github.com/nin-jin/go.d) is up to 10x 
 faster than std.concurrency.send/receive
Very interesting. D has built-in unittests. You should add stress unittests to ensure that your logic is correct. You can start by searching for the keyword `unittest` in the std.concurrency module for advice on how to do this.
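
For example, a stress unittest could look roughly like this (the Queue type and its push/take methods are assumptions based on the description above, not the actual package API):

---
unittest
{
    import core.thread : Thread;

    enum N = 100_000;
    auto queue = Queue!int();

    // one producer thread, the test thread acts as the single consumer
    auto producer = new Thread( {
        foreach( i ; 0 .. N ) queue.push( i );
    } );
    producer.start();

    long sum = 0;
    foreach( i ; 0 .. N ) sum += queue.take();
    producer.join();

    // every message must be delivered exactly once
    assert( sum == ( cast( long ) N ) * ( N - 1 ) / 2 );
}
---

Running such a test repeatedly, with optimizations enabled and with many more writers and readers as in the benchmark above, is what tends to expose ordering bugs.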
Jan 08 2016
parent Jin <nin-jin ya.ru> writes:
On Friday, 8 January 2016 at 18:07:49 UTC, Nordlöw wrote:
 On Friday, 8 January 2016 at 16:58:59 UTC, Jin wrote:
 Idea: no mutex, no CAS, only one direction queues without any 
 locks.

 My prototype (https://github.com/nin-jin/go.d) is up to 10x 
 faster than std.concurrency.send/receive
Very interesting. D has builtin unittests. You should add stress unittests to assure that your logic is correct. You can start by searching for keyword `unittest` in the std.concurrency module for advice how to do this.
I just added unit tests. But how do I idiomatically implement benchmarks (compilation must be in release mode)? Currently I implement a main function in app.d and run it with "dub --build=release", but then nobody can import my module.
Jan 08 2016
prev sibling next sibling parent reply thedeemon <dlang thedeemon.com> writes:
On Friday, 8 January 2016 at 16:58:59 UTC, Jin wrote:
 Idea: no mutex, no CAS, only one direction queues without any 
 locks.

 My prototype (https://github.com/nin-jin/go.d) is up to 10x 
 faster than std.concurrency.send/receive
Sorry, that's not looking any good. You call it wait-free when in fact it's just the opposite: if a queue buffer is full on push it just waits in Thread.sleep, which is
1) not wait-free at all
2) very heavy (call to kernel, context switch)
And when the buffer is not full/empty it works as a simple single-threaded queue, which means it's fine for using with fibers inside one thread but will not work correctly in a multithreaded setting.

Your benchmarks show a time difference in your favor just because you compare very different things: your queue is benchmarked in a single thread with fibers while std.concurrency is measured with multiple threads communicating with each other. Doing many switches between threads is much slower than switching between fibers in one thread, hence the time difference. It's not that your queue is any good, it's just that you measure it wrong.
Jan 09 2016
parent reply Jin <nin-jin ya.ru> writes:
On Saturday, 9 January 2016 at 08:28:33 UTC, thedeemon wrote:
 Your benchmarks show time difference in your favor just because 
 you compare very different things: your queue is benchmarked in 
 single thread with fibers while std.concurrency is measured 
 with multiple threads communicating with each other. Doing many 
 switches between threads is much slower than switching between 
 fibers in one thread, hence the time difference. It's not that 
 your queue is any good, it's just you measure it wrong.
No, jin.go creates a new native thread on every call. And this is a problem :-) We cannot create a thousand threads without errors. std.concurrency uses a mutex to synchronize access to the message queue. The cost of synchronization is proportional to the number of threads.

On Saturday, 9 January 2016 at 08:28:33 UTC, thedeemon wrote:
 You call it wait-free when in fact it's just the opposite: if a 
 queue buffer is full on push  it just waits in Thread.sleep 
 which is
 1) not wait-free at all
 2) very heavy (call to kernel, context switch)
 And when buffer is not full/empty it works as a simple 
 single-threaded queue, which means it's fine for using with 
 fibers inside one thread but will not work correctly in 
 multithreaded setting.
Taking data from an empty queue and pushing data to a full queue is logically impossible for queues. We have 3 strategies for these cases:

1. Throwing a runtime exception, like std.concurrency.send does by default on a full message box.
2. Blocking the thread, like std.concurrency.receive on an empty message box.
3. Skipping the action.

jin-go uses the blocking strategy by default, and you can check queue.empty and queue.full if you want to implement another strategy.
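
A small sketch of that non-blocking usage, with the property names as described in the post (`queue`, `message` and `handle` stand in for the application's own names):

---
// producer side: skip (or buffer elsewhere) instead of blocking
if( !queue.full )
    queue.push( message );

// consumer side: poll instead of blocking
if( !queue.empty )
    handle( queue.take() );
---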
Jan 09 2016
parent reply thedeemon <dlang thedeemon.com> writes:
On Saturday, 9 January 2016 at 11:00:01 UTC, Jin wrote:

 No, jin.go creates new native thread on every call. And this is 
 problem :-) We can not create thousand of threads without 
 errors.
Ah, sorry, I misread the source. FiberScheduler got me distracted. Why is it there?
Jan 09 2016
parent Jin <nin-jin ya.ru> writes:
On Saturday, 9 January 2016 at 13:55:16 UTC, thedeemon wrote:
 On Saturday, 9 January 2016 at 11:00:01 UTC, Jin wrote:

 No, jin.go creates new native thread on every call. And this 
 is problem :-) We can not create thousand of threads without 
 errors.
Ah, sorry, I misread the source. FiberScheduler got me distracted. Why is it there?
This is an artefact of experiments with fibers :-)
Jan 09 2016
prev sibling next sibling parent reply Andy Smith <andyrsmith googlemail.com> writes:
On Friday, 8 January 2016 at 16:58:59 UTC, Jin wrote:
 Idea: no mutex, no CAS, only one direction queues without any 
 locks.

 My prototype (https://github.com/nin-jin/go.d) is up to 10x 
 faster than std.concurrency.send/receive

 ---
 writers =512
 readers =1
 std.concurency milliseconds=1313
 jin.go milliseconds=113
 ---

 Realization:

 Queue! - buffered static typed channel. One and only one thread 
 can push data to queue, and one and only one can take data from 
 queue. Thread blocks when he takes from empty queue and pushes 
 to fulfilled queue.

 Queues! - list of queues with round robin balancing

 go! - starts new thread, in future i want to implement 
 something like goroutine, but i do not understand how yet.

 I need some good advice, i am newbie in D :-)

 Problems:

 For blocking thread i use loop with Thread.sleep - this is bad 
 decision IMHO.

 On my system i can create only up to 1000 threads without 
 errors. Fibers from core.thread or Tasks from std.parallelism 
 potentially can resolve this problem, but how to integrate 
 their gracefully.

 API of jin-go can be better?

 Which dub modules can help me?
I'm a little worried that you have no volatile writes or fences around your code when you 'publish' an event using head/tail etc. It looks like it's working, but how are you ensuring no compiler/CPU reordering is occurring? Does x86_64 actually allow you to get away with this? I know its memory model is stricter than others... Cheers, A.
Jan 09 2016
next sibling parent reply Ola Fosheim Grøstad writes:
On Saturday, 9 January 2016 at 14:20:18 UTC, Andy Smith wrote:
 I'm a little worried you have no volatile writes or fences 
 around your code when you 'publish' an event using head/tail 
 etc. It looks like it's working but how are you ensuring no 
 compiler/CPU reordering is ocurring. Does x86_64 actually allow 
 you to get away with this? I know its memory model is stricter 
 than others...
But not on ARM, so he should use atomic acquire-release semantics on indices for push/pull. I suggest porting over spsc_queue from Boost.
Jan 09 2016
parent reply David Nadlinger <code klickverbot.at> writes:
On Saturday, 9 January 2016 at 15:16:43 UTC, Ola Fosheim Grøstad 
wrote:
 On Saturday, 9 January 2016 at 14:20:18 UTC, Andy Smith wrote:
 I'm a little worried you have no volatile writes or fences 
 around your code when you 'publish' an event using head/tail 
 etc. It looks like it's working but how are you ensuring no 
 compiler/CPU reordering is ocurring. Does x86_64 actually 
 allow you to get away with this? I know its memory model is 
 stricter than others...
But not on ARM, so he should use atomic acquire-release semantics on indices for push/pull.
Not only that. It's a problem on x86 as well because advanced optimizers like those in GDC or LDC will happily assume that the members are not written to by another thread (because there would be a race otherwise) and cache the loads or even eliminate some stores. Any guarantees the processor would offer are irrelevant if the compiler has already reordered your operations. It should be quite easy to see such effects. Just compile a simple test case on GDC or LDC with optimizations on.
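
A tiny illustration of the kind of transformation meant here (hypothetical code, not taken from jin.go):

---
// A flag written by another thread, deliberately __gshared rather
// than shared/atomic - this is exactly the racy pattern in question.
__gshared bool done;

void wait()
{
    // With optimizations on, the compiler may legally hoist the load
    // of `done` out of the loop (it sees no write to it in this
    // function and no synchronization), effectively turning this into
    //     if( !done ) for( ;; ) {}
    // an infinite spin, no matter what ordering the CPU guarantees.
    while( !done ) { }
}
---
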
 I suggest porting over spsc_queue from Boost.
That would certainly be a starting point, although a high-performance single-producer single-consumer queue is trivial to implement once you understand atomics. Then again, I couldn't convince Andrei that a well-defined memory model (in which it even makes sense to talk about atomics in the first place) is important for D yet. Right now, you basically have to hope that everything works as in C++ – which is not a bad bet on GDC and LDC, and the weaker optimizer in DMD hides a lot of potential issues there – and stay away from things like `consume` loads that would depend on language semantics. — David
Jan 09 2016
parent reply Ola Fosheim Grøstad writes:
On Saturday, 9 January 2016 at 17:42:41 UTC, David Nadlinger 
wrote:
 Not only that. It's a problem on x86 as well because advanced 
 optimizers like those in GDC or LDC will happily assume that 
 the members are not written to by another thread (because there 
 would be a race otherwise) and cache the loads or even 
 eliminate some stores. Any guarantees the processor would offer 
 are irrelevant if the compiler has already reordered your 
 operations. It should be quite easy to see such effects. Just 
 compile a simple test case on GDC or LDC with optimizations on.
Yes, well, he had a sleep() in there that might prevent the compiler from moving instructions etc, but even if so... it is a bad thing to rely on. Explicit atomics document what is going on (even when not technically needed), and should always be used where you want atomic operations, I think.
 That would certainly be a starting point, although a 
 high-performance single-producer single-consumer queue is 
 trivial to implement once you understand atomics.
Yes. But my experience from writing custom multi-single queues is that it can end up harder than it looks to get it working and efficient. Don't be surprised if it takes 5-10x more time than anticipated to get it 100% right... So why not start with something that is correct? You still have to spend quite a bit of time to verify that the code is identical... but that is much easier than to "formally" prove that your own implementation is correct. (Intuition is often wrong in this area...)
Jan 09 2016
parent David Nadlinger <code klickverbot.at> writes:
On Saturday, 9 January 2016 at 22:44:53 UTC, Ola Fosheim Grøstad 
wrote:
 Yes. But my experience from writing custom multi-single queues 
 is that it can end up harder than it looks to get it working 
 and efficient. […] (Intuition is often wrong in this area...)
I wholeheartedly agree with that statement. However, if one doesn't understand atomics well enough to correctly implement even a simple single-single queue, I'm not sure whether one would be able to correctly port an existing implementation either. Then again, the primitives might be similar enough between C++11 and D so that a literal translation is possible without understanding what is going on. — David
Jan 10 2016
prev sibling next sibling parent Jin <nin-jin ya.ru> writes:
On Saturday, 9 January 2016 at 14:20:18 UTC, Andy Smith wrote:
 I'm a little worried you have no volatile writes or fences 
 around your code when you 'publish' an event using head/tail 
 etc. It looks like it's working but how are you ensuring no 
 compiler/CPU reordering is ocurring. Does x86_64 actually allow 
 you to get away with this? I know its memory model is stricter 
 than others...
This works fine in my tests, but how do I enforce a memory barrier in D?
Jan 09 2016
prev sibling parent reply Jin <nin-jin ya.ru> writes:
On Saturday, 9 January 2016 at 14:20:18 UTC, Andy Smith wrote:
 I'm a little worried you have no volatile writes or fences 
 around your code when you 'publish' an event using head/tail 
 etc. It looks like it's working but how are you ensuring no 
 compiler/CPU reordering is ocurring. Does x86_64 actually allow 
 you to get away with this? I know its memory model is stricter 
 than others...
I just add atomic fence for push and take:

        this.messages[ this.tail ] = value;
        atomicFence;
        this.tail = ( this.tail + 1 ) % this.size;
Jan 09 2016
next sibling parent reply Ola Fosheim Grøstad writes:
On Saturday, 9 January 2016 at 15:51:51 UTC, Jin wrote:
 On Saturday, 9 January 2016 at 14:20:18 UTC, Andy Smith wrote:
 I'm a little worried you have no volatile writes or fences 
 around your code when you 'publish' an event using head/tail 
 etc. It looks like it's working but how are you ensuring no 
 compiler/CPU reordering is ocurring. Does x86_64 actually 
 allow you to get away with this? I know its memory model is 
 stricter than others...
I just add atomic fence for push and take:

        this.messages[ this.tail ] = value;
        atomicFence;
        this.tail = ( this.tail + 1 ) % this.size;
You need it in the tests. I haven't used atomics in D, but you have

   atomicLoad(MemoryOrder ms = MemoryOrder.seq, T)(ref const shared T val)
   atomicStore(MemoryOrder ms = MemoryOrder.seq, T, V1)(ref shared T val, V1 newval)

The compiler should then be able to ignore it if the CPU handles it well enough, assuming the D implementation works. Something along the lines of:

   immutable i = atomicLoad!(MemoryOrder.raw)(tail);
   immutable n = ( i + 1 ) % size;
   if( n == atomicLoad!(MemoryOrder.acq)(head) ) return QUEUE_FULL;
   buffer[i] = ...;
   atomicStore!(MemoryOrder.rel)(tail, n);

Or something like that.
Jan 09 2016
parent reply Jin <nin-jin ya.ru> writes:
On Saturday, 9 January 2016 at 16:05:34 UTC, Ola Fosheim Grøstad 
wrote:
 You need it in the tests.
If memory writes were reordered, then the current tests would fail. Memory reads can be in any order.

On Saturday, 9 January 2016 at 16:05:34 UTC, Ola Fosheim Grøstad wrote:
 I haven't used atomics in D, but you have

   atomicLoad(MemoryOrder ms = MemoryOrder.seq, T)(ref const 
 shared T val)
   atomicStore(MemoryOrder ms = MemoryOrder.seq, T, V1)(ref 
 shared T val, V1 newval)
I do not understand how to use them right. So I simply use atomicFence. Performance does not degrade.
Jan 09 2016
next sibling parent reply Ola Fosheim Grøstad writes:
On Saturday, 9 January 2016 at 16:22:12 UTC, Jin wrote:
 On Saturday, 9 January 2016 at 16:05:34 UTC, Ola Fosheim 
 Grøstad wrote:
 You need it in the tests.
If memory writes will reorder, then current tests will fail. Memory reades can be in any order.
They have to be atomic if you want your code to be portable.
Jan 09 2016
parent reply Jin <nin-jin ya.ru> writes:
On Saturday, 9 January 2016 at 16:29:07 UTC, Ola Fosheim Grøstad 
wrote:
 They have to be atomic if you want your code to be portable.
Do not want yet :-)
Jan 09 2016
parent reply Andy Smith <andyrsmith googlemail.com> writes:
On Saturday, 9 January 2016 at 17:24:47 UTC, Jin wrote:
 On Saturday, 9 January 2016 at 16:29:07 UTC, Ola Fosheim 
 Grøstad wrote:
 They have to be atomic if you want your code to be portable.
Do not want yet :-)
Okay, I've just refreshed my mental X86/64 memory model :-)

What you've done should be okay from a CPU-reordering perspective. In x86/64, loads are not reordered with other loads and stores are not reordered with other stores. So from a CPU perspective you should be okay on X86/64.

There's still a grey area for me around possible *compiler* re-orderings... there's nothing to stop the compiler reordering those two lines with respect to each other, because from an 'as-if-sequential' perspective those two operations are commutative. Unfortunately I'm not well versed enough in the internals of the three main compilers to give an opinion on whether this will be a problem or not :-(

If your version is working for you on your platform/compiler, I'd recommend being a bit conservative and wrapping your code in appropriate version blocks until you know the answer to these questions for the other platforms/compilers (ARM, GDC, LDC etc.), and putting an appropriate explanation of why that's the case in your README.md. I think that's just basic courtesy to other users: you don't let them walk over a minefield without fair warning :-) Cheers, A.
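
A small sketch of the kind of guard meant here (the set of validated targets and the message are illustrative):

---
// Only build the fence-free fast path on the targets where it has
// actually been validated; refuse to compile anywhere else.
version( X86_64 )
{
    // current implementation, relying on the stronger x86 store ordering
}
else
{
    static assert( 0,
        "jin.go has only been validated on x86_64; other targets "
        ~ "need explicit atomic loads/stores from core.atomic" );
}
---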
Jan 09 2016
parent reply David Nadlinger <code klickverbot.at> writes:
On Saturday, 9 January 2016 at 17:44:26 UTC, Andy Smith wrote:
 Unfortunately I'm not well versed enough on the internals of 
 the three main compiler versions to give an opinion on whether 
 this will be a problem or not :-(
It might work on DMD, but I know from experience (one-to-many queues, i.e. single-consumer-multi-producer or the other way round) that the LDC optimizer will ruthlessly break code written in that "I know that it would work in assembly, let's just hope the compiler doesn't touch it" style. — David
Jan 09 2016
parent Andy Smith <andyrsmith googlemail.com> writes:
On Saturday, 9 January 2016 at 17:49:46 UTC, David Nadlinger 
wrote:
 On Saturday, 9 January 2016 at 17:44:26 UTC, Andy Smith wrote:
 Unfortunately I'm not well versed enough on the internals of 
 the three main compiler versions to give an opinion on whether 
 this will be a problem or not :-(
It might work on DMD, but I know from experience (one-to-many queues, i.e. single-consumer-multi-producer or the other way round) that the LDC optimizer will ruthlessly break code written in that "I know that it would work in assembly, let's just hope the compiler doesn't touch it" style. — David
Ah Cheers David... I think our messages crossed :-) Cheers, A.
Jan 09 2016
prev sibling parent David Nadlinger <code klickverbot.at> writes:
On Saturday, 9 January 2016 at 16:22:12 UTC, Jin wrote:
 So i simple use atomicFence. Performance does not degrade.
Either you are not calling it in the way you think you are, then, or your code is otherwise very unoptimized. You can definitely measure the impact of a full mfence on otherwise tight lock-free code. — David
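
A rough sketch of how one could measure that difference (loop count and layout are illustrative):

---
import core.atomic : atomicFence;
import core.time : MonoTime;
import std.stdio : writeln;

void main()
{
    enum N = 10_000_000;
    int[ 64 ] slots;
    size_t index;

    immutable start = MonoTime.currTime;
    foreach( i ; 0 .. N )
    {
        slots[ index ] = i;
        atomicFence();                        // comment this line out to compare
        index = ( index + 1 ) % slots.length;
    }
    immutable elapsed = MonoTime.currTime - start;

    // print a slot so the stores are not optimized away entirely
    writeln( slots[ 0 ], " ", elapsed.total!"msecs", " ms" );
}
---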
Jan 09 2016
prev sibling parent Martin Nowak <code+news.digitalmars dawg.eu> writes:
On 01/09/2016 04:51 PM, Jin wrote:
 I just add atomic fence for push and take:
 
         this.messages[ this.tail ] = value;
         atomicFence;
         this.tail = ( this.tail + 1 ) % this.size;
Don't do this, memory fences are expensive. This is what you need for a spsc queue:

https://github.com/MartinNowak/lock-free/commit/233739262c14e00866a60f4a6a86c1b979ac968b
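
For comparison, here is a sketch of the same push/pop written with acquire/release ordering on the indices instead of a full fence. It is written in the spirit of the suggestions in this thread, not copied from the linked package; names and layout are illustrative:

---
import core.atomic : atomicLoad, atomicStore, MemoryOrder;

struct SpscQueue( T , size_t size )
{
    private T[ size ] messages;
    private shared size_t head; // advanced only by the consumer
    private shared size_t tail; // advanced only by the producer

    // producer: returns false instead of blocking when full
    bool push( T value )
    {
        immutable t = atomicLoad!( MemoryOrder.raw )( this.tail );
        immutable next = ( t + 1 ) % size;
        if( next == atomicLoad!( MemoryOrder.acq )( this.head ) )
            return false; // full
        this.messages[ t ] = value;
        // release: the consumer must see the slot before the new tail
        atomicStore!( MemoryOrder.rel )( this.tail , next );
        return true;
    }

    // consumer: returns false instead of blocking when empty
    bool pop( ref T value )
    {
        immutable h = atomicLoad!( MemoryOrder.raw )( this.head );
        if( h == atomicLoad!( MemoryOrder.acq )( this.tail ) )
            return false; // empty
        value = this.messages[ h ];
        atomicStore!( MemoryOrder.rel )( this.head , ( h + 1 ) % size );
        return true;
    }
}
---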
Jan 10 2016
prev sibling parent reply Martin Nowak <code+news.digitalmars dawg.eu> writes:
On 01/08/2016 05:58 PM, Jin wrote:
 Idea: no mutex, no CAS, only one direction queues without any locks.
 
 My prototype (https://github.com/nin-jin/go.d) is up to 10x faster than
 std.concurrency.send/receive
Yes, single-reader single-writer queues are the fastest way for inter-thread communication. You might have a look at my [lock-free](http://code.dlang.org/packages/lock-free) package for a correct implementation.
 Problems:
 
 For blocking thread i use loop with Thread.sleep - this is bad decision
 IMHO.
Have a look at this exponential backoff implementation for my GC spinlock PR:

https://github.com/D-Programming-Language/druntime/pull/1447/files#diff-fb5cbe06e1aaf83814ccf5ff08f05519R34

In general you need some sort of configurable or adaptive backoff or you'll waste too much time context switching.

-Martin
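
A rough sketch of such an adaptive backoff (the thresholds and cap are illustrative, not taken from the linked PR):

---
import core.thread : Thread;
import core.time : usecs;

// Spin briefly, then yield, then sleep for exponentially longer
// intervals, so a blocked reader neither burns a core nor pays a
// context switch on every single check.
void waitUntil( scope bool delegate() ready )
{
    size_t tries = 0;
    while( !ready() )
    {
        if( tries < 100 )
        {
            // busy spin: cheapest when the wait is very short
        }
        else if( tries < 200 )
        {
            Thread.yield();
        }
        else
        {
            // exponentially growing sleep, capped at about 1 ms
            immutable shift = tries - 200 < 10 ? tries - 200 : 10;
            Thread.sleep( ( 1 << shift ).usecs );
        }
        ++tries;
    }
}
---

A queue's take could then call something like `waitUntil( () => !queue.empty )` instead of sleeping unconditionally.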
Jan 10 2016
parent reply Jin <nin-jin ya.ru> writes:
On Sunday, 10 January 2016 at 21:25:34 UTC, Martin Nowak wrote:
 For blocking thread i use loop with Thread.sleep - this is bad 
 decision IMHO.
Have a look at this exponential backoff implementation for my GC spinlock PR. https://github.com/D-Programming-Language/druntime/pull/1447/files#diff-fb5cbe06e1aaf83814ccf5ff08f05519R34 In general you need some sort of configurable or adaptive backoff or you'll waste too much time context switching.
I am using a Waiter for this: https://github.com/nin-jin/go.d/blob/master/source/jin/go.d#L171
Jan 13 2016
parent reply Jin <nin-jin ya.ru> writes:
I just use this queues to implement Go like api for concurrency 
(coroutines+channels): 
http://forum.dlang.org/thread/lcfnfnhjzonkdkeaumds forum.dlang.org
Mar 27 2016
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 27-Mar-2016 21:23, Jin wrote:
 I just use this queues to implement Go like api for concurrency
 (coroutines+channels):
 http://forum.dlang.org/thread/lcfnfnhjzonkdkeaumds forum.dlang.org
If nothing changed implementation-wise this is just data-racy queues :) -- Dmitry Olshansky
Mar 28 2016
parent reply Jin <nin-jin ya.ru> writes:
On Monday, 28 March 2016 at 16:39:45 UTC, Dmitry Olshansky wrote:
 On 27-Mar-2016 21:23, Jin wrote:
 I just use this queues to implement Go like api for concurrency
 (coroutines+channels):
 http://forum.dlang.org/thread/lcfnfnhjzonkdkeaumds forum.dlang.org
If nothing changed implementation-wise this is just data-racy queues :)
Why?
Mar 28 2016
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 28-Mar-2016 20:03, Jin wrote:
 On Monday, 28 March 2016 at 16:39:45 UTC, Dmitry Olshansky wrote:
 On 27-Mar-2016 21:23, Jin wrote:
 I just use this queues to implement Go like api for concurrency
 (coroutines+channels):
 http://forum.dlang.org/thread/lcfnfnhjzonkdkeaumds forum.dlang.org
If nothing changed implementation-wise this is just data-racy queues :)
Why?
All I see is a ring buffer with hand-wavy atomicFence on one of mutating operations. popFront is not protected at all.

Also force yielding a thread is not a sane synchronization technique.

Over all - I suggest to not label this as "wait free" code as it's waaay far from what it takes to get that.

-- Dmitry Olshansky
Mar 28 2016
parent Jin <nin-jin ya.ru> writes:
On Monday, 28 March 2016 at 17:16:14 UTC, Dmitry Olshansky wrote:
 If nothing changed implementation-wise this is just data-racy 
 queues :)
Why?
All I see is a ring buffer with hand-wavy atomicFence on one of mutating operations. popFront is not protected at all.
popFront does not need protection. It is atomic for the provider.
 Also force yielding a thread is not a sane synchronization 
 technique.
Fibers are used here.
 Over all - I suggest to not label this as "wait free" code as 
 it's waaay far from what it takes to get that.
Each operation (clear, front, popFront, full, put) has a fixed number of steps if you check for clear before accessing front and check for full before put. If you do not check this - you will be blocked, of course. What do you expect? An exception? Ignoring the action?
Mar 28 2016