digitalmars.D - Oh, my GoD! Goroutines on D
- Jin (77/77) Mar 27 2016 DUB module: http://code.dlang.org/packages/jin-go
- deadalnix (3/5) Mar 27 2016 Note that this is also the case in go. Yes, contrary to what is
- Walter Bright (2/3) Mar 27 2016 Nice! Please write an article about this!
- Jin (2/5) Mar 28 2016 My english is too bad to write articles, sorry :-(
- Lass Safin (2/8) Mar 28 2016 Just use engrish, we won't care. Really.
- Russel Winder via Digitalmars-d (13/15) Mar 28 2016 Write the article content and then get someone who is a person good at
- Walter Bright (3/4) Mar 28 2016 Baloney, your english is very good. Besides, I'm sure there will be many...
- Jin (5/9) Mar 28 2016 I just wrote the article in Russian:
- Walter Bright (2/11) Mar 28 2016 Awesome! Who wants to help with the English?
- Walter Bright (15/29) Mar 28 2016 I'll start with the first paragraph (I don't know any Russian):
- sigod (3/13) Mar 29 2016 Create repository on GitHub. So, it will be easier for others to
- Jacob Carlborg (7/11) Mar 28 2016 It would be useful with a wiki page, or similar, that describes and
- Ali Çehreli (22/32) Mar 28 2016 And make sure to tell me about it so that my DConf presentation will be
- Jacob Carlborg (5/7) Mar 28 2016 I was hoping someone could give _me_ the links, that's why I wrote the
- Dejan Lekic (5/10) Mar 29 2016 +1
- Jin (2/6) Mar 29 2016 http://wiki.dlang.org/Go_to_D
- H. S. Teoh (9/15) Mar 29 2016 Since I know a bit of Russian, I took a shot at improving this
- mw (3/4) May 16 2020 Any performance comparison with Go? esp. in real-world scenarios?
- Russel Winder (12/18) May 17 2020 Seems to have been created four years ago and then left fallow. Perhaps ...
- Andre Pany (5/15) May 18 2020 ;)
- Seb (3/13) May 19 2020 FYI: channels are also part of vibe-core since a while:
- Russel Winder (12/15) May 19 2020 d
- Jin (3/18) May 25 2020 Yes. But it uses a mutex. My implementation is wait-free
- mw (24/29) Jun 14 2020 Just saw this, for 1-provider-1-consumer queue, I did some
- Bienlein (18/23) May 19 2020 Go can easily have some ten thousand green threads. I once did a
- Panke (2/8) May 19 2020 The continuation is implicit when using fibers, isn't it?
- Russel Winder (42/67) May 19 2020 It wouldn't surprise me if std.parallelism couldn't do something analogo...
- Jin (15/22) May 25 2020 I have updated the code. But it isn't ready to use currently
- Mathias LANG (58/76) May 25 2020 This is a problem that's of interest to me as well, and I've been
- Johannes Loher (4/8) May 25 2020 I believe Weka did that with their own fiber implementation in
- Russel Winder (44/49) May 26 2020 [=E2=80=A6]
- mw (11/31) Jun 14 2020 ...
- mw (9/16) Jun 14 2020 You can try it here:
- Jin (37/57) Jun 14 2020 My wheels are quite simple:
- mw (8/13) Jun 14 2020 ...
- mw (10/14) Jun 14 2020 Since you are using 1-provider-1-consumer queue, can you try this
- Jin (17/32) Jun 15 2020 I have added a new release and a PR to your repo.
- mw (14/20) Jun 15 2020 on my same machine, go.d:
- Petar Kirov [ZombineDev] (3/25) Jun 16 2020 There's a SPMC/MPSC queue implementation here:
- Jin (66/76) Jan 02 Hello everyone, I've done a little refactoring and optimization
- Guillaume Piolat (14/16) Jan 02 https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d#L67
- Jin (5/21) Jan 03 I didn't really understand what you were talking about. After
- Brad Roberts (21/42) Jan 03 Probably the most common atomicity error, is atomic check followed by
- claptrap (7/23) Jan 03 Its a single producer, single consumer queue, so only one thread
- claptrap (6/14) Jan 03 Have you considered not wrapping the offsets, and only modulus
- Sebastiaan Koppe (9/35) Jan 04 Since you are essentially testing the speed of your spsc ring
- Jin (6/14) Jan 06 I found a bug. I first checked if there was anything in the
- Jin (20/25) Jan 08 I looked into the core.atomic code and was very disappointed. I
- claptrap (5/14) Jan 08 Technically it is lock free but not wait free. No thread can
- Jin (6/10) Jan 08 For wait-free in general and cyclic buffers in particular, it is
- Richard (Rikki) Andrew Cattermole (5/31) Jan 08 Dmd will not inline functions with inline assembly, any function calls
- Jin (17/21) Jan 08 Visible reordering can occur due to the asynchronous nature of
- Richard (Rikki) Andrew Cattermole (5/30) Jan 08 Not macros, what you want is intrinsics, this is how core.atomics works
- Jin (4/8) Jan 13 Unfortunately, waiting for support from all the compilers would take
- Jin (12/39) Jun 14 2020 I have fixed all issues, and it's usable now. But I had to return
- mw (8/41) Jun 14 2020 ...
- Casey Sybrandy (6/9) Mar 30 2016 Have you considered using a Disruptor
- Casey Sybrandy (5/15) Mar 30 2016 Oh, and yes, I know that it would have to be rewritten in D
- Jin (2/11) Mar 30 2016 This is java bloatware. :-(
- Casey Sybrandy (5/6) Mar 30 2016 I've never used the library so I can't comment on that, but the
- Russel Winder via Digitalmars-d (26/33) Apr 01 2016
- Russel Winder via Digitalmars-d (19/32) Apr 01 2016
- Derek Fawcus (16/16) Jan 04 While having a CSP style mechanism, with cheap stackful
DUB module: http://code.dlang.org/packages/jin-go GIT repo: https://github.com/nin-jin/go.d The function "go" starts coroutines (vibe.d tasks) in a thread pool, like in the Go language:

unittest {
    import core.time;
    import jin.go;

    __gshared static string[] log;

    static void saying( string message ) {
        for( int i = 0 ; i < 3 ; ++i ) {
            sleep( 100.msecs );
            log ~= message;
        }
    }

    go!saying( "hello" );
    sleep( 50.msecs );
    saying( "world" );

    log.assertEq([ "hello" , "world" , "hello" , "world" , "hello" , "world" ]);
}

The class "Channel" is a wait-free one-consumer-one-provider channel. By default, a Channel blocks the thread on receive from an empty channel or on send to a full channel:

unittest {
    static void summing( Channel!int output ) {
        foreach( i ; ( output.size * 2 ).iota ) {
            output.next = 1;
        }
        output.close();
    }

    auto input = go!summing;
    while( !input.full ) yield;

    input.sum.assertEq( input.size * 2 );
}

You can avoid waiting if you do not want to:

unittest {
    import core.time;
    import jin.go;

    static auto after( Channel!bool channel , Duration dur ) {
        sleep( dur );
        if( !channel.closed ) channel.next = true;
    }

    static auto tick( Channel!bool channel , Duration dur ) {
        while( !channel.closed ) after( channel , dur );
    }

    auto ticks = go!tick( 101.msecs );
    auto booms = go!after( 501.msecs );

    string log;

    while( booms.clear ) {
        while( !ticks.clear ) {
            log ~= "tick";
            ticks.popFront;
        }
        log ~= ".";
        sleep( 51.msecs );
    }
    log ~= "BOOM!";

    log.assertEq( "..tick..tick..tick..tick..BOOM!" );
}

Channels are InputRange and OutputRange compatible. The structs "Inputs" and "Outputs" are round-robin facades over an array of channels. More examples are in the unit tests: https://github.com/nin-jin/go.d/blob/master/source/jin/go.d#L293 Current problems: 1. You can give a channel to more than two threads. I'm going to play with unique pointers to solve this problem. Any hints? 2. Sometimes you must close a channel to notify the partner to break a range cycle. Solving problem (1) could solve this too. 3. The API may be better. Advice?
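One possible direction for problem (1), sketched under the assumption that a move-only handle is acceptable: disable copying on the channel end, so the type system enforces a single owner. Queue and UniqueOutput here are hypothetical illustrations, not jin.go's actual types.

import std.algorithm.mutation : move;

// Hypothetical payload standing in for the channel's internal queue.
struct Queue(T) { T[] items; }

// A channel end that can be moved but never copied.
struct UniqueOutput(T) {
    private Queue!T* queue;

    @disable this(this); // no postblit: the handle cannot be copied

    this( Queue!T* q ) { queue = q; }

    void put( T value ) { queue.items ~= value; }
}

unittest {
    auto q = new Queue!int;
    auto a = UniqueOutput!int( q );
    auto b = move( a );   // ownership is transferred explicitly
    b.put( 42 );
    // auto c = b;        // compile error: copying is disabled
}

A "go" that accepted only such move-only ends would turn giving one channel to more than two threads into a compile-time error rather than a runtime hazard.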
Mar 27 2016
On Sunday, 27 March 2016 at 18:17:55 UTC, Jin wrote:1. You can give a channel to more than two threads. I'm going to play with unique pointers to solve this problem. Any hints?Note that this is also the case in Go. Yes, contrary to what is usually said, Go is dead unsafe when it comes to threads.
Mar 27 2016
On 3/27/2016 11:17 AM, Jin wrote:[...]Nice! Please write an article about this!
Mar 27 2016
On Sunday, 27 March 2016 at 20:39:57 UTC, Walter Bright wrote:On 3/27/2016 11:17 AM, Jin wrote:My english is too bad to write articles, sorry :-([...]Nice! Please write an article about this!
Mar 28 2016
On Monday, 28 March 2016 at 13:10:45 UTC, Jin wrote:On Sunday, 27 March 2016 at 20:39:57 UTC, Walter Bright wrote:Just use engrish, we won't care. Really.On 3/27/2016 11:17 AM, Jin wrote:My english is too bad to write articles, sorry :-([...]Nice! Please write an article about this!
Mar 28 2016
On Mon, 2016-03-28 at 13:10 +0000, Jin via Digitalmars-d wrote:My english is too bad to write articles, sorry :-(Write the article content and then get someone who is good at writing in English to ghostwrite the article from your content.
Mar 28 2016
On 3/28/2016 6:10 AM, Jin wrote:My english is too bad to write articles, sorry :-(Baloney, your english is very good. Besides, I'm sure there will be many volunteers here to help you touch it up.
Mar 28 2016
On Monday, 28 March 2016 at 19:29:55 UTC, Walter Bright wrote:On 3/28/2016 6:10 AM, Jin wrote:I just wrote the article in Russian: https://habrahabr.ru/post/280378/ Translation to English: https://translate.google.com/translate?hl=ru&sl=ru&tl=en&u=https%3A%2F%2Fhabrahabr.ru%2Fpost%2F280378%2FMy english is too bad to write articles, sorry :-(Baloney, your english is very good. Besides, I'm sure there will be many volunteers here to help you touch it up.
Mar 28 2016
On 3/28/2016 3:35 PM, Jin wrote:On Monday, 28 March 2016 at 19:29:55 UTC, Walter Bright wrote:Awesome! Who wants to help with the English?On 3/28/2016 6:10 AM, Jin wrote:I just wrote the article on russin: https://habrahabr.ru/post/280378/ Translation to english: https://translate.google.com/translate?hl=ru&sl=ru&tl=en&u=https%3A%2F%2Fhabrahabr.ru%2Fpost%2F280378%2FMy english is too bad to write articles, sorry :-(Baloney, your english is very good. Besides, I'm sure there will be many volunteers here to help you touch it up.
Mar 28 2016
On 3/28/2016 5:07 PM, Walter Bright wrote:On 3/28/2016 3:35 PM, Jin wrote:I'll start with the first paragraph (I don't know any Russian): Google: Multitasking - this is what is implemented in Go of the good, though not perfect. Nice syntax with a tart aftertaste, simple and powerful abstraction, bribe its elegance compared to other imperative languages. And taste the best, so do not want to have to slide to mediocrity. Therefore, if and switch to another language, it must be even more expressive and with no less sensible multitasking. English: Multitasking - Go's multitasking capabilities are good, though not perfect. It has a nice syntax with a sweet aftertaste, simple and powerful abstraction, elegant compared to other imperative languages. It exhibits good taste, so one does not wish to compromise into mediocrity. Therefore, to switch to another language, it must be even more expressive and with no less sensible multitasking.On Monday, 28 March 2016 at 19:29:55 UTC, Walter Bright wrote:Awesome! Who wants to help with the English?On 3/28/2016 6:10 AM, Jin wrote:I just wrote the article on russin: https://habrahabr.ru/post/280378/ Translation to english: https://translate.google.com/translate?hl=ru&sl=ru&tl=en&u=https%3A%2F%2Fhabrahabr.ru%2Fpost%2F280378%2FMy english is too bad to write articles, sorry :-(Baloney, your english is very good. Besides, I'm sure there will be many volunteers here to help you touch it up.
Mar 28 2016
On Monday, 28 March 2016 at 22:35:12 UTC, Jin wrote:On Monday, 28 March 2016 at 19:29:55 UTC, Walter Bright wrote:Create repository on GitHub. So, it will be easier for others to help with translation.On 3/28/2016 6:10 AM, Jin wrote:I just wrote the article on russin: https://habrahabr.ru/post/280378/ Translation to english: https://translate.google.com/translate?hl=ru&sl=ru&tl=en&u=https%3A%2F%2Fhabrahabr.ru%2Fpost%2F280378%2FMy english is too bad to write articles, sorry :-(Baloney, your english is very good. Besides, I'm sure there will be many volunteers here to help you touch it up.
Mar 29 2016
On 2016-03-27 20:17, Jin wrote:DUB module: http://code.dlang.org/packages/jin-go GIT repo: https://github.com/nin-jin/go.d Function "go" starts coroutines (vibe.d tasks) in thread pool like in Go language:It would be useful to have a wiki page, or something similar, that describes and compares the different ways of doing concurrency in D, both built-in and in third-party libraries like this one and vibe.d. Also compare against other languages like Go, Erlang, Scala and so on. -- /Jacob Carlborg
Mar 28 2016
On 03/28/2016 03:49 AM, Jacob Carlborg wrote:On 2016-03-27 20:17, Jin wrote:And make sure to tell me about it so that my DConf presentation will be more complete: :) http://dconf.org/2016/talks/cehreli.html That abstract is awfully short but I've posted a pull request to improve it a little bit: <quote> D provides support for multitasking in the form of language features and standard library modules. D makes it easy for your programs to perform multiple tasks at the same time. This kind of support is especially important in order to take advantage of the multiple CPU cores that are available on modern computing systems. Multitasking is one of the most difficult computing concepts to implement correctly. This talk will introduce different kinds of multitasking, as well as parallelism, a concept which is in fact unrelated to, but is often confused with, multitasking. The talk will conclude with fibers (aka co-routines), a powerful tool that is often overlooked despite its convenience. </quote> Seriously, I appreciate any documentation links that you can give to complete my "homework" before DConf. :) AliDUB module: http://code.dlang.org/packages/jin-go GIT repo: https://github.com/nin-jin/go.d Function "go" starts coroutines (vibe.d tasks) in thread pool like in Go language:It would be useful with a wiki page, or similar, that describes and compares different ways of doing concurrency in D, both built-in and third party libraries like this and vibe.d. Also compare against other languages like Go, Erlang, Scala and so on.
Mar 28 2016
On 2016-03-29 01:53, Ali Çehreli wrote:Seriously, I appreciate any documentation links that you can give to complete my "homework" before DConf. :)I was hoping someone could give _me_ the links, that's why I wrote the post ;) -- /Jacob Carlborg
Mar 28 2016
On Monday, 28 March 2016 at 10:49:28 UTC, Jacob Carlborg wrote:It would be useful to have a wiki page, or something similar, that describes and compares the different ways of doing concurrency in D, both built-in and in third-party libraries like this one and vibe.d. Also compare against other languages like Go, Erlang, Scala and so on.+1 Wiki is absolutely the best solution to this, I agree. Plus, we already have a wiki, so he should just go there and start writing. The community will correct grammar/syntax and typos.
Mar 29 2016
On Tuesday, 29 March 2016 at 12:30:24 UTC, Dejan Lekic wrote:+1 Wiki is absolutely the best solution to this, I agree. Plus, we already have a wiki so he should just go there and start writing. The community will incorrect grammar/syntax and typos.http://wiki.dlang.org/Go_to_D
Mar 29 2016
On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:On Tuesday, 29 March 2016 at 12:30:24 UTC, Dejan Lekic wrote:Since I know a bit of Russian, I took a shot at improving this article, and got partway through the "Channels" section. But now I need to get back to work... so hopefully somebody else can work on improving the English text. :-) --T+1 Wiki is absolutely the best solution to this, I agree. Plus, we already have a wiki so he should just go there and start writing. The community will incorrect grammar/syntax and typos.http://wiki.dlang.org/Go_to_D
Mar 29 2016
On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:http://wiki.dlang.org/Go_to_DAny performance comparison with Go? esp. in real-world scenarios? Can it easily handle hundreds of (go)routines?
May 16 2020
On Sat, 2020-05-16 at 20:06 +0000, mw via Digitalmars-d wrote:On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:Seems to have been created four years ago and then left fallow. Perhaps it should be resurrected and integrated into Phobos? Or left as a package in the Dub repository?http://wiki.dlang.org/Go_to_D Any performance comparison with Go? esp. in real-world scenarios? Can it easily handle hundreds of (go)routines?
May 17 2020
On Sunday, 17 May 2020 at 15:17:44 UTC, Russel Winder wrote:On Sat, 2020-05-16 at 20:06 +0000, mw via Digitalmars-d wrote:;) https://github.com/nin-jin/go.d/issues/2 Kind regards AndreOn Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:Seems to have been created four years ago and then left fallow. Perhaps it should be resurrected and integrated into Phobos? Or left as a package in the Dub repository?http://wiki.dlang.org/Go_to_DAny performance comparison with Go? esp. in real word scenario? Can it easily handle hundreds of (go)routines?
May 18 2020
On Sunday, 17 May 2020 at 15:17:44 UTC, Russel Winder wrote:On Sat, 2020-05-16 at 20:06 +0000, mw via Digitalmars-d wrote:FYI: channels have also been part of vibe-core for a while: https://github.com/vibe-d/vibe-core/blob/master/source/vibe/core/channel.dOn Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:Seems to have been created four years ago and then left fallow. Perhaps it should be resurrected and integrated into Phobos? Or left as a package in the Dub repository?http://wiki.dlang.org/Go_to_DAny performance comparison with Go? esp. in real-world scenarios? Can it easily handle hundreds of (go)routines?
May 19 2020
On Tue, 2020-05-19 at 09:15 +0000, Seb via Digitalmars-d wrote: […]FYI: channels have also been part of vibe-core for a while: https://github.com/vibe-d/vibe-core/blob/master/source/vibe/core/channel.d I will have to investigate. Sounds like vibe.d can be used as tasks with channels on a thread pool.
May 19 2020
On Tuesday, 19 May 2020 at 09:15:24 UTC, Seb wrote:On Sunday, 17 May 2020 at 15:17:44 UTC, Russel Winder wrote:On Sat, 2020-05-16 at 20:06 +0000, mw via Digitalmars-d wrote:FYI: channels have also been part of vibe-core for a while: https://github.com/vibe-d/vibe-core/blob/master/source/vibe/core/channel.dOn Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:Seems to have been created four years ago and then left fallow. Perhaps it should be resurrected and integrated into Phobos? Or left as a package in the Dub repository?http://wiki.dlang.org/Go_to_DAny performance comparison with Go? esp. in real-world scenarios? Can it easily handle hundreds of (go)routines?Yes. But it uses a mutex. My implementation is wait-free (https://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom). All threads can read and write easily and quickly without any locking. So every queue is 1-provider-1-consumer, but the Input and Output channels are round-robin lists of queues. You can find some diagrams here: https://github.com/nin-jin/go.d/blob/master/readme.drawio.svg
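For readers following along, here is a minimal sketch of the kind of wait-free single-producer single-consumer ring buffer described above. The names, fixed capacity, and memory orders are illustrative, not jin.go's actual API:

```d
import core.atomic;

struct Spsc(T, size_t N = 1024)
{
    private T[N] buffer;
    private shared size_t head; // advanced only by the consumer
    private shared size_t tail; // advanced only by the producer

    // Producer side: returns false when the buffer is full.
    bool tryPut(T value)
    {
        immutable t = atomicLoad!(MemoryOrder.raw)(tail);
        if (t - atomicLoad!(MemoryOrder.acq)(head) == N)
            return false;                           // full
        buffer[t % N] = value;                      // write the payload first...
        atomicStore!(MemoryOrder.rel)(tail, t + 1); // ...then publish it
        return true;
    }

    // Consumer side: returns false when the buffer is empty.
    bool tryGet(ref T value)
    {
        immutable h = atomicLoad!(MemoryOrder.raw)(head);
        if (atomicLoad!(MemoryOrder.acq)(tail) == h)
            return false;                           // empty
        value = buffer[h % N];                      // read the payload first...
        atomicStore!(MemoryOrder.rel)(head, h + 1); // ...then free the slot
        return true;
    }
}
```

Each offset has exactly one writer, so neither side ever waits on a lock; the acquire/release pairs order the payload accesses against the offset publication.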
May 25 2020
On Monday, 25 May 2020 at 16:41:06 UTC, Jin wrote:Just saw this: for a 1-provider-1-consumer queue, I did some experiments with https://code.dlang.org/packages/lock-free and the result is here: https://github.com/mingwugmail/liblfdsd/tree/master/comparison LDC is ~5x faster than DMD, I'm not sure why; and this package may need more stress testing:
------------------------------------------------------------------------
$ make ldcrun
...
received 1000000000 messages in 9845 msec sum=499999999500000000 speed=101574 msg/msec
$ make dmdrun
...
received 1000000000 messages in 53607 msec sum=499999999500000000 speed=18654 msg/msec
$ make javamp
10000000 messages received in 1151.0 ms, sum=49999995000000 speed: 0.1151 microsec/message, 8688.097306689835 messages/msec
$ make dmp
received 100000000 messages in 27574 msec sum=4999999950000000 speed=3626 msg/msec
------------------------------------------------------------------------
FYI: channels have also been part of vibe-core for a while: https://github.com/vibe-d/vibe-core/blob/master/source/vibe/core/channel.dYes. But it uses a mutex. My implementation is wait-free (https://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom). All threads can read and write easily and quickly without any locking. So every queue is 1-provider-1-consumer, but the Input and Output channels are round-robin lists of queues. You can find some diagrams here: https://github.com/nin-jin/go.d/blob/master/readme.drawio.svg
Jun 14 2020
On Saturday, 16 May 2020 at 20:06:47 UTC, mw wrote:On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:http://wiki.dlang.org/Go_to_DAny performance comparison with Go? esp. in real-world scenarios? Can it easily handle hundreds of (go)routines?Go can easily have some ten thousand green threads. I once did a test run to see how many: on a 4 GB machine, 80,000 green threads aka goroutines were created until out-of-memory occurred. Communicating sequential processes as in Go relies on being able to create a large number of threads. With a paradigm of threads doing blocking takes on channels, any application would otherwise quickly run out of threads. In D something similar could be done using fibers. Using fibers is also the approach chosen in Java for extending the JVM to have CSP-style concurrency as in Go, see https://www.youtube.com/watch?v=lIq-x_iI-kc Then you also need continuations. Let's say inside a function a blocking take is done on two channels in a row. The first channel has some input, the next one has not. In between comes a context switch. When switching back, the value taken from the first channel has to be put back into the context. This is why continuations are needed. Really nice work! Please keep it going :-)
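A minimal sketch of the fiber point using core.thread.Fiber: the "continuation" is simply the fiber's suspended stack, resumed in place (illustrative code, not jin.go's):

```d
import core.thread : Fiber;
import std.stdio : writeln;

void main()
{
    int mailbox;     // stand-in for a channel slot
    bool hasValue;

    auto consumer = new Fiber({
        while (!hasValue)
            Fiber.yield();        // "blocking take": suspend, keep the stack
        writeln("got ", mailbox); // resumes exactly here, locals intact
    });

    consumer.call(); // runs until the yield
    mailbox = 42;
    hasValue = true;
    consumer.call(); // continues after the yield and prints "got 42"
}
```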
May 19 2020
On Tuesday, 19 May 2020 at 08:42:14 UTC, Bienlein wrote:Then you also need continuations. Lets say inside a function a blocking take is done on two channels in a row. The first channel has some input, the next one has not. In between comes a context switch. When switching back the value taken from the first channel has to be put back into the context. This is why continuations are needed.The continuation is implicit when using fibers, isn't it?
May 19 2020
On Tue, 2020-05-19 at 08:42 +0000, Bienlein via Digitalmars-d wrote:On Saturday, 16 May 2020 at 20:06:47 UTC, mw wrote:It wouldn't surprise me if std.parallelism couldn't do something analogous. There are tasks that are executed (potentially with work stealing) by threads in a threadpool. The problem is that the std.parallelism API is dedicated to data parallelism rather than the process/channels concurrency/parallelism of CSP (the theoretical foundation for goroutines – sort of, but you have to read Rob Pike's articles to see why). I am fairly certain, but have not yet checked, that the idea of task is very similar in std.parallelism and jin.go – which is based on vibe.d's event loop as I understand it (but could be wrong). Vibe.d allows a threadpool, though I suspect most people use it single threaded. So Vibe.d and std.parallelism have a lot in common. I'll bet there is much that could be factored out and shared, but realistically this isn't going to happen.On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote: Go can easily have some ten thousand green threads. I once did a test run to see how many: on a 4 GB machine, 80,000 green threads aka goroutines were created until out-of-memory occurred.http://wiki.dlang.org/Go_to_D Any performance comparison with Go? esp. in real-world scenarios? Can it easily handle hundreds of (go)routines?Communicating sequential processes as in Go relies on being able to create a large number of threads. With a paradigm of threads doing blocking takes on channels, any application would otherwise quickly run out of threads. In D something similar could be done using fibers. Using fibers is also the approach chosen in Java for extending the JVM to have CSP-style concurrency as in Go, see https://www.youtube.com/watch?v=lIq-x_iI-kcI wonder if we should use "thread" for heavyweight (OS?) thread, "fibre" for fibre (including lightweight threads from before threads became threads), and task for the things that get put into the job queues of a threadpool. D already has a threadpool in std.parallelism; perhaps it needs extracting (as in Java) so it can be the basis of Vibe.d – or vice versa, obviously.Then you also need continuations. Let's say inside a function a blocking take is done on two channels in a row. The first channel has some input, the next one has not. In between comes a context switch. When switching back, the value taken from the first channel has to be put back into the context. This is why continuations are needed.Are you sure? Isn't the whole point of tasks (processes in CSP jargon) and channels that you don't need continuations? A CSP computation should not even understand the idea of a context switch. If it matters, has the concept of fibre perhaps intruded in a way that violates the abstraction?Really nice work! Please keep it going :-)
May 19 2020
On Saturday, 16 May 2020 at 20:06:47 UTC, mw wrote:On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:http://wiki.dlang.org/Go_to_DAny performance comparison with Go? esp. in real-world scenarios? Can it easily handle hundreds of (go)routines?I have updated the code. But it isn't ready to use currently because: 1. I rewrote the code to use std.parallelism instead of vibe.d, so it's difficult to integrate fibers with tasks. Now every task spinlocks while waiting on a channel, and the main thread doesn't do useful work. 2. Race condition. I'm going to closely review the algorithm. Currently it's twice as slow as Go. On my machine:
go run app.go --release
Workers Result Time
4 499500000 27.9226ms
dub --quiet --build=release
Workers Result Time
3 499500000 64 ms
It would be cool if someone helped me with it. There are docstrings, tests and diagrams. I'll explain more if someone joins.
May 25 2020
On Monday, 25 May 2020 at 16:26:31 UTC, Jin wrote:On Saturday, 16 May 2020 at 20:06:47 UTC, mw wrote:On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:http://wiki.dlang.org/Go_to_DAny performance comparison with Go? esp. in real-world scenarios? Can it easily handle hundreds of (go)routines?I have updated the code. But it isn't ready to use currently because: 1. I rewrote the code to use std.parallelism instead of vibe.d, so it's difficult to integrate fibers with tasks. Now every task spinlocks while waiting on a channel, and the main thread doesn't do useful work. 2. Race condition. I'm going to closely review the algorithm. [...] It would be cool if someone helped me with it. There are docstrings, tests and diagrams. I'll explain more if someone joins.This is a problem that's of interest to me as well, and I've been working on this for a few months (on and off). I had to eventually ditch `std.concurrency` because of some design decisions that made things hard to work with. `std.concurrency`'s MessageBox was originally designed to work only between threads. As such, it comes with all the locking you'd expect from a cross-thread message-passing data structure. Support for fibers was added as an afterthought. You can even see it in the documentation (https://dlang.org/phobos/std_concurrency.html), where "thread" is mentioned all over the place. The module doc kinda gets away with it because it calls fibers "logical threads", but that distinction is not always made. It also has some concepts that make a lot of sense for threads, but much less so for fibers (such as the "owner" concept, the owner being the task that `spawn`ed you). Finally, it forces messages to be `shared` or isolated (read: with only `immutable` indirections), which doesn't make sense when you're dealing only with fibers on the same thread. We found some ridiculous issues when trying to use it. We upstreamed some fixes (https://github.com/dlang/phobos/pull/7096, https://github.com/dlang/phobos/pull/6738) and put a bounty on one of the issues, which led to someone finding the bug in `std.concurrency` (https://github.com/Geod24/localrest/pull/5#issuecomment-523707490). After some playing around with it, we just gave up, forked the whole module, and started to change it to behave more like channels. There are some other issues I found while refactoring which I might upstream in the future, but it needs so much work that I might as well PR a whole new module. What we're trying to achieve is to move from a MessageBox approach, where there is a 1-to-1 relationship between a task (or logical thread) and a MessageBox, to a channel-like model, where there is an N-to-1 relationship (see Go's select). In order to achieve Go-like performance, we need a few things though: - Direct hand-off semantics for same-thread message passing: meaning that if Fiber A sends a message to Fiber B, and they are both in the same thread, there is an immediate context switch from A to B, without going through the scheduler (a sketch follows after this post); - Thread-level multiplexing of receive: with the current `std.concurrency`, calling `receive` yields the fiber and might block the thread. The scheduler simply iterates over all fibers in linear order, which means that if you have 3 fibers and they all `receive` one after the other, you could end up blocked on the *first* one receiving a message to wake the other ones up. - Smaller fibers: goroutines can have very, very small stacks. They don't stack overflow because they are managed (whenever you need to allocate more stack, there used to be a check for stack overflow, and stack "regions" were/are essentially a linked list and need not be contiguous in memory). On the other hand, we use simple regular fiber context switching, which is much more expensive. In that area, I think exploring the idea of a stackless-coroutine-based scheduler could be worthwhile. This Google doc has a lot of good information, if you're interested: https://docs.google.com/document/d/1yIAYmbvL3JxOKOjuCyon7JhW4cSv1wy5hC0ApeGMV9s/pub It's still a problem we're working on, as some issues are unique to D and we haven't found a good solution (e.g. requiring `shared` for same-thread Fiber communication is quite problematic). If we ever reach a satisfying solution I'll try upstreaming it.
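A minimal sketch of the direct hand-off idea for two fibers on one thread: the sender resumes the blocked receiver immediately instead of going back through the scheduler. Slot, send, and receive are hypothetical names, not std.concurrency's API:

```d
import core.thread : Fiber;

// One-slot mailbox shared by two fibers on the same thread.
struct Slot(T)
{
    T value;
    bool full;
    Fiber waiting; // receiver parked on this slot, if any
}

void send(T)(ref Slot!T slot, T value)
{
    slot.value = value;
    slot.full = true;
    if (slot.waiting !is null)
        slot.waiting.call(); // hand off: switch straight to the receiver
}

T receive(T)(ref Slot!T slot)
{
    while (!slot.full)
    {
        slot.waiting = Fiber.getThis();
        Fiber.yield(); // suspend until a sender hands off to us
    }
    slot.full = false;
    return slot.value;
}
```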
May 25 2020
On Tuesday, 26 May 2020 at 01:27:49 UTC, Mathias LANG wrote:- Direct hand-off semantic for same-thread message passing: Meaning that if Fiber A sends a message to Fiber B, and they are both in the same thread, there is an immediate context switch from A to B, without going through the scheduler;I believe Weka did that with their own fiber implementation in Mecca. I think I remember Shachar mentioning this during his talk at DConf (2018?)
May 25 2020
On Tue, 2020-05-26 at 01:27 +0000, Mathias LANG via Digitalmars-d wrote: […]This is a problem that's of interest to me as well, and I've been working on this for a few months (on and off). I had to eventually ditch `std.concurrency` because of some design decisions that made things hard to work with.[…] I am fairly sure std.parallelism is a better place to get threadpools, tasks, scheduling, work stealing, etc. However it is all packaged with a view to implementing SMP parallelism in D. I haven't been following, but many others including Vibe.d have implemented either fibres and yield, or tasks/threadpools and channels – the bit missing from std.parallelism since it isn't needed for SMP parallelism, but is if you take the tasks and threadpools out of that context. What has happened in Rust, and to a great extent in the JVM arena, is that there has been an implementation of fibres/yield, futures and async, and/or task and threadpool that has been centralised, and then everyone else has evolved to use it rather than having multiple implementations of all the ideas. In the JVM milieu there is still a lot of NIH replication, but then they have lots of money and resources. Strategically, if there were one set of Dub packages doing this low-level stuff that people worked on and that everyone else then used, that would be good. Then the question is whether to deprecate std.parallelism and rebuild it based on the new low-level code. Answer: yes. Perhaps the std.parallelism stuff could actually provide a basis for some of this low-level code, along with the vibe.core stuff and mayhap the Mecca stuff. My feeling is that the time for everyone implementing their own is long past; it is time for all to join in on a standard set of tools for D. This includes removing the Fibres stuff from std.concurrency. So yes, I am up for contributing.
May 26 2020
On Tuesday, 26 May 2020 at 01:27:49 UTC, Mathias LANG wrote:On Monday, 25 May 2020 at 16:26:31 UTC, Jin wrote:...On Saturday, 16 May 2020 at 20:06:47 UTC, mw wrote:On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:http://wiki.dlang.org/Go_to_DAny performance comparison with Go? esp. in real-world scenarios?...This is a problem that's of interest to me as well, and I've been working on this for a few months (on and off). I had to eventually ditch `std.concurrency` because of some design decisions that made things hard to work with. `std.concurrency`'s MessageBox was originally designed to work only between threads. As such, it comes with all the locking you'd expect from a cross-thread message-passing data structure....It's still a problem we're working on, as some issues are unique to D and we haven't found a good solution (e.g. requiring `shared` for same-thread Fiber communication is quite problematic). If we ever reach a satisfying solution I'll try upstreaming it.... Have you tried a lock-free queue? https://www.liblfds.org/mediawiki/index.php?title=r7.1.1:Queue_(unbounded,_many_producer,_many_consumer) Java uses the same algorithm for its ConcurrentLinkedQueue (liblfds being a C implementation). I tried some small examples with liblfds and got slightly better performance than Java. Maybe we don't want to reinvent the wheels, especially the well-tested ones.
Jun 14 2020
On Sunday, 14 June 2020 at 17:10:14 UTC, mw wrote:Have you tried a lock-free queue? https://www.liblfds.org/mediawiki/index.php?title=r7.1.1:Queue_(unbounded,_many_producer,_many_consumer) Java uses the same algorithm for its ConcurrentLinkedQueue (liblfds being a C implementation). I tried some small examples with liblfds and got slightly better performance than Java. Maybe we don't want to reinvent the wheels, especially the well-tested ones.You can try it here: https://github.com/mingwugmail/liblfdsd only https://www.liblfds.org/mediawiki/index.php?title=r7.1.1:Queue_(bounded,_many_producer,_many_consumer) for now. ``` received 100000000 messages in 4632 msec sum=4999999950000000 speed=21588 msg/msec ```
Jun 14 2020
On Sunday, 14 June 2020 at 19:49:46 UTC, mw wrote:On Sunday, 14 June 2020 at 17:10:14 UTC, mw wrote:Have you tried a lock-free queue? https://www.liblfds.org/mediawiki/index.php?title=r7.1.1:Queue_(unbounded,_many_producer,_many_consumer) Java uses the same algorithm for its ConcurrentLinkedQueue (liblfds being a C implementation). I tried some small examples with liblfds and got slightly better performance than Java. Maybe we don't want to reinvent the wheels, especially the well-tested ones.You can try it here: https://github.com/mingwugmail/liblfdsd only https://www.liblfds.org/mediawiki/index.php?title=r7.1.1:Queue_(bounded,_many_producer,_many_consumer) for now. ``` received 100000000 messages in 4632 msec sum=4999999950000000 speed=21588 msg/msec ```My wheels are quite simple: https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d I have rewritten your benchmark (https://github.com/mingwugmail/liblfdsd/blob/master/liblfds.dpp#L67) with go.d:
```
const int n = 100_000_000;

void threadProducer(Output!int queue)
{
    foreach (int i; 0..n)
    {
        queue.put(i);
    }
}

void main()
{
    Input!int queue;
    go!threadProducer(queue.pair);

    StopWatch sw;
    sw.start();
    long sum = 0;
    foreach (p; queue)
    {
        sum += p;
    }
    sw.stop();

    writefln("received %d messages in %d msec sum=%d speed=%d msg/msec",
            n, sw.peek.total!"msecs", sum, n / sw.peek.total!"msecs");

    assert(sum == (n * (n - 1) / 2));
}
```
The code is simpler. On my laptop it gives:
```
dub --quiet --build=release
received 100000000 messages in 10011 msec sum=4999999950000000 speed=9989 msg/msec
```
Jun 14 2020
On Sunday, 14 June 2020 at 22:57:25 UTC, Jin wrote:...https://github.com/mingwugmail/liblfdsd speed=21588 msg/msecMy wheels are quite simple: https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d...speed=9989 msg/msecCan you make a new go.d release to https://github.com/nin-jin/go.d/releases, or create a PR using your jin.go.queue to https://github.com/mingwugmail/liblfdsd/tree/master/comparison so that either you or I can run the comparison on the same machine?
Jun 14 2020
On Sunday, 14 June 2020 at 22:57:25 UTC, Jin wrote:My wheels are quite simple: https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d...received 100000000 messages in 10011 msec sum=4999999950000000 speed=9989 msg/msecSince you are using a 1-provider-1-consumer queue, can you try this package as a drop-in replacement in go.d: https://github.com/MartinNowak/lock-free/blob/master/src/lock_free/rwqueue.d (you may need to add align() to that implementation). According to my test: received 1000000000 messages in 9845 msec sum=499999999500000000 speed=101574 msg/msec You may get a ~10x boost.
Jun 14 2020
On Monday, 15 June 2020 at 01:55:27 UTC, mw wrote:On Sunday, 14 June 2020 at 22:57:25 UTC, Jin wrote:My wheels are quite simple: https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d...received 100000000 messages in 10011 msec sum=4999999950000000 speed=9989 msg/msecSince you are using a 1-provider-1-consumer queue, can you try this package as a drop-in replacement in go.d: https://github.com/MartinNowak/lock-free/blob/master/src/lock_free/rwqueue.d (you may need to add align() to that implementation). According to my test: received 1000000000 messages in 9845 msec sum=499999999500000000 speed=101574 msg/msec You may get a ~10x boost.I have added a new release and a PR to your repo. But I don't think it's a good idea to replace jin.go.queue with lock_free.rwqueue because: 1. My API is std.range compatible. 2. I use the same API for queues and channels. 3. My API supports finalization (by the provider, the consumer, or both). 4. Your queue is fixed-size, but my channels operate on a set of queues that can have different sizes. But I would steal some optimizations (see the sketch after this post): 1. Power-of-2 capacity and a bitop instead of division. It can add ~10% performance. 2. Messages should be at least a cache line in size to prevent false sharing. You are using atomicLoad!(MemoryOrder.acq) there: https://github.com/MartinNowak/lock-free/blob/master/src/lock_free/rwqueue.d#L41 Is this really required? The CPU can't reorder dependent statements.
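A minimal sketch of optimization (1): with a power-of-two capacity, the modulo in the slot calculation reduces to a single bitwise AND (illustrative code, not jin.go's):

```d
enum Capacity = 1024; // must be a power of two
static assert((Capacity & (Capacity - 1)) == 0);

size_t slot(size_t offset)
{
    return offset & (Capacity - 1); // same result as offset % Capacity, no division
}
```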
Jun 15 2020
On Monday, 15 June 2020 at 09:09:23 UTC, Jin wrote:I have added a new release and a PR to your repo.On the same machine, go.d: received 100000000 messages in 2906 msec sum=4999999950000000 speed=34411 msg/msec so it's ~2.7x faster than Java: https://github.com/mingwugmail/liblfdsd/tree/master/comparison And with your https://github.com/nin-jin/go.d, on my machine go.d is 2~4 times slower than Go: 9.7638ms (Go) vs. [19 ms ~ 40 ms] (go.d). Go's speed is consistent across multiple runs; for go.d there can be a 2x difference, maybe because the scheduler is unstable?But I don't think it's a good idea to replace jin.go.queue with lock_free.rwqueue because:I just want to do a comparison.You are using atomicLoad!(MemoryOrder.acq) there: https://github.com/MartinNowak/lock-free/blob/master/src/lock_free/rwqueue.d#L41 Is this really required? The CPU can't reorder dependent statements.It's not mine, but MartinNowak's. The implementation is based on https://www.codeproject.com/Articles/43510/Lock-Free-Single-Producer-Single-Consumer-Circular
Jun 15 2020
On Monday, 15 June 2020 at 22:04:49 UTC, mw wrote:On Monday, 15 June 2020 at 09:09:23 UTC, Jin wrote:There's a SPMC/MPSC queue implementation here: https://github.com/weka-io/mecca/blob/master/src/mecca/containers/otm_queue.d that may be also interesting for you guys to check.I have added a new release and a PR to your repo.on my same machine, go.d: received 100000000 messages in 2906 msec sum=4999999950000000 speed=34411 msg/msec so, it's ~2.7x faster than Java: https://github.com/mingwugmail/liblfdsd/tree/master/comparison And your https://github.com/nin-jin/go.d on my machine go.d is 2~4 times slower than Go. 9.7638ms (Go) v.s [19 ms ~ 40 ms] (go.d) Go's speed is consistent in multiple runs, for go.d it can be 2x difference, maybe because of the scheduler is unstable?But I don't think it's a good idea to replace jin.go.queue by lock_free.rwqueue because:I just want to do a comparison.You are using atomicLoad!(MemoryOrder.acq) at there: https://github.com/MartinNowak/lock-free/blob/master/src/lock_free/rwqueue.d#L41 Is this really required? CPU can't reorder dependent statements.It's not mine, but MartinNowak's. The implementation is based on https://www.codeproject.com/Articles/43510/Lock-Free-Single-Producer-Single-Consumer-Circular
Jun 16 2020
Hello everyone, I've done a little refactoring and optimization of [jin.go](https://github.com/nin-jin/go.d):

- I got rid of the vibe.d dependency because it's slow and big, and I haven't been able to get it working with version 2. When running only 1000 vibe fibers, not only did the application crash, but even the graphics driver crashed once, which required restarting the laptop.
- So far, I've settled on native threads with a small stack size (4 KB).
- I'm really looking forward to [photon's](https://github.com/nin-jin/go.d/issues/7) stabilization to get fiber support back. It would be really awesome to see it in the standard library.
- I had to abandon move semantics because I couldn't get them working with delegates. Currently, the number of references to the queue is controlled by the copy constructor.

Good news! After all the optimizations, the channels show impressive speed in the benchmark above for pumping messages between two threads.

```d
import std.datetime.stopwatch;
import std.range;
import std.stdio;
import jin.go;

const long n = 100_000_000;

auto threadProducer()
{
    return n.iota;
}

void main()
{
    auto queue = go!threadProducer;

    StopWatch sw;
    sw.start();
    long sum = 0;
    foreach (p; queue)
    {
        sum += p;
    }
    sw.stop();

    writefln("received %d messages in %d msec sum=%d speed=%d msg/msec",
            n, sw.peek.total!"msecs", sum, n / sw.peek.total!"msecs");

    assert(sum == (n * (n - 1) / 2));
}
```

```sh
received 100000000 messages in 718 msec sum=4999999950000000 speed=139275 msg/msec
```

I've almost caught up with Go in [my goroutines benchmark](https://github.com/nin-jin/go.d/blob/master/compare.cmd):

received 100000000 messages in 2906 msec sum=4999999950000000 speed=34411 msg/msec so it's ~2.7x faster than Java: https://github.com/mingwugmail/liblfdsd/tree/master/comparison And with your https://github.com/nin-jin/go.d, on my machine go.d is 2~4 times slower than Go: 9.7638ms (Go) vs. [19 ms ~ 40 ms] (go.d)

```sh
go run app.go --release
Workers Result Time
8 49995000000 109.7644ms
dub --quiet --build=release
Workers Result Time
0 49995000000 124 ms
```

Bad news. Sometimes I get incorrect results and I can't figure out why.

```sh
dub --quiet --build=release
Workers Result Time
0 49945005000 176 ms
```

I use atomic acquire and release operations; although they are not required on x86, I hope the compiler takes them into account and does not reorder instructions. But even with stricter memory barriers, I don't get a very stable result. If someone can tell me what could be wrong here, I would be very grateful.
Jan 02
On Thursday, 2 January 2025 at 19:59:26 UTC, Jin wrote:If someone can tell me what could be wrong here, I would be very grateful.https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d#L67 A concurrent `put` and `popFront` will do nothing to avoid races in `Queue`. Individual access to _offset is atomic, but its point of use in both put and popFront is not. Both functions look like this: 1. Atomic-read-offset 2. Anything can happen 3. Atomic-write-offset If one function has completed (1) while the other has completed (1)+(2)+(3), then you get a race. Chaining two atomic operations doesn't make the whole thing atomic; a rather classic mistake with mutual exclusion.
Jan 02
On Friday, 3 January 2025 at 00:17:27 UTC, Guillaume Piolat wrote:On Thursday, 2 January 2025 at 19:59:26 UTC, Jin wrote:https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d#L67 A concurrent `put` and `popFront` will do nothing to avoid races in `Queue`. Individual access to _offset is atomic, but its point of use in both put and popFront is not. Both functions look like this: 1. Atomic-read-offset 2. Anything can happen 3. Atomic-write-offset If one function has completed (1) while the other has completed (1)+(2)+(3), then you get a race. Chaining two atomic operations doesn't make the whole thing atomic; a rather classic mistake with mutual exclusion.I didn't really understand what you were talking about. After checking pending/available, we are guaranteed to have the opportunity to take the next step on the consumer/provider side. Therefore, we do our job and then increase our offset.
Jan 03
On 1/2/2025 4:17 PM, Guillaume Piolat via Digitalmars-d wrote:On Thursday, 2 January 2025 at 19:59:26 UTC, Jin wrote:https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d#L67 A concurrent `put` and `popFront` will do nothing to avoid races in `Queue`. Individual access to _offset is atomic, but its point of use in both put and popFront is not. Both functions look like this: 1. Atomic-read-offset 2. Anything can happen 3. Atomic-write-offset If one function has completed (1) while the other has completed (1)+(2)+(3), then you get a race. Chaining two atomic operations doesn't make the whole thing atomic; a rather classic mistake with mutual exclusion.Probably the most common atomicity error is an atomic check followed by non-atomic action based on that check. The problem that is overlooked in that scenario is that while the check itself is safe, by the time the action proceeds the condition can have changed. The only safe way is to combine the check and the action into a single atomic transaction. The next most common is probably the ABA issue: assuming that because you see A the second time, what you're seeing is still the first A, which would imply that B couldn't and didn't happen. The lesson here is that you're far better off NOT being clever and trying to avoid longer-lived locks unless you can demonstrate that it's particularly important and detrimental to the app's performance. It's super easy to get separate/tiny atomic operations wrong, and it's much harder to detect/debug that than to get it right at a slightly higher cost with simple multi-instruction locking. Not to mention that CPU and OS designers have invested energy in improving the performance of mutex- and spinlock-style code. There's a time and a place for cleverness, but not at the expense of correctness. My 2 cents, Brad
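A minimal sketch of the check-then-act hazard and the fused CAS version, using core.atomic on an illustrative counter rather than the queue code:

```d
import core.atomic : atomicLoad, atomicStore, cas;

shared int counter;

// Racy check-then-act: the condition can stop holding between the
// load and the store, so two threads can both pass the check.
void racyIncrementIfBelow(int limit)
{
    if (atomicLoad(counter) < limit)                   // atomic check...
        atomicStore(counter, atomicLoad(counter) + 1); // ...non-atomic act
}

// Check and act fused into one atomic transaction via a CAS retry loop.
bool incrementIfBelow(int limit)
{
    for (;;)
    {
        immutable old = atomicLoad(counter);
        if (old >= limit)
            return false;  // condition failed: give up
        if (cas(&counter, old, old + 1))
            return true;   // nobody interfered between check and act
        // lost the race: reload and retry
    }
}
```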
Jan 03
On Friday, 3 January 2025 at 00:17:27 UTC, Guillaume Piolat wrote:On Thursday, 2 January 2025 at 19:59:26 UTC, Jin wrote:https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d#L67 A concurrent `put` and `popFront` will do nothing to avoid races in `Queue`. ... Chaining two atomic operations doesn't make the whole thing atomic; a rather classic mistake with mutual exclusion.It's a single-producer, single-consumer queue, so only one thread pushes and one thread pulls. So the "2. Anything can happen" doesn't apply, since there's only one thread actually pushing to the queue. The consumer thread only reads producer.offset, so it can't interfere with pushes, and it only sees a pushed message once producer.offset is updated.
Jan 03
On Thursday, 2 January 2025 at 19:59:26 UTC, Jin wrote:Have you considered not wrapping the offsets, and only taking the modulus with the length when you index into the messages? It'll simplify the math, i.e.: messagesInQueue = producer.offset - consumer.offset available = (Length - messagesInQueue - 1)I use the atomic acquire and release operations, although they are not required on x86, but I hope the compiler takes them into account and does not reorder instructions. But even with stricter memory barriers, I don't get a very stable result. If someone can tell me what could be wrong here, I would be very grateful.received 100000000 messages in 2906 msec sum=4999999950000000
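A minimal sketch of that arithmetic with free-running offsets; unsigned wraparound keeps the subtraction correct even after the counters overflow, and the modulus is paid only when indexing:

```d
enum Length = 1024;

size_t messagesInQueue(size_t producerOffset, size_t consumerOffset)
{
    return producerOffset - consumerOffset; // valid even across wraparound
}

size_t available(size_t producerOffset, size_t consumerOffset)
{
    // Mirrors the formula above, which keeps one slot spare; with
    // free-running offsets "count == Length" could also mark full.
    return Length - messagesInQueue(producerOffset, consumerOffset) - 1;
}

size_t slotIndex(size_t offset)
{
    return offset % Length; // reduce modulo Length only when indexing
}
```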
Jan 03
On Thursday, 2 January 2025 at 19:59:26 UTC, Jin wrote:Since you are essentially testing the speed of your spsc ring queue, I would just benchmark the queue directly. Don't think there is much room for improvement unless you adopt disruptor design and switch to batch consumption. That said, I do see some redundant offset loads and stores in the code the optimiser might miss. Testing just the queue might also simplify and reduce the surface area to find the error.Hello everyone, I've done a little refactoring and optimization of [jin.go](https://github.com/nin-jin/go.d): - I got rid of the vibe.d dependency because it's slow, big, and I haven't been able to make friends with version 2. When running only 1000 vibe-fibers, not only did the application crash, but even the graphics system driver crashed once, which required restarting the laptop. - So far, I've settled on native streams with a small stack size (4 kb). - I'm really looking forward to [photon's](https://github.com/nin-jin/go.d/issues/7) stabilization to get fiber support back. It would be really awesome to see it in the standard library. - I had to abandon the move semantics because I couldn't make friends with the delegates. Currently, the number of references to the queue is controlled by the copy constructor. Good news! After all the optimizations, the channels show impressive speed in the above benchmark for pumping messages between two streams.received 100000000 messages in 2906 msec sum=4999999950000000 speed=34411 msg/msec so, it's ~2.7x faster than Java: https://github.com/mingwugmail/liblfdsd/tree/master/comparison And your https://github.com/nin-jin/go.d on my machine go.d is 2~4 times slower than Go. 9.7638ms (Go) v.s [19 ms ~ 40 ms] (go.d)
Jan 04
On Thursday, 2 January 2025 at 19:59:26 UTC, Jin wrote:Bad news. Sometimes I get incorrect results and I can't figure out why. I use the atomic acquire and release operations, although they are not required on x86, but I hope the compiler takes them into account and does not reorder instructions. But even with stricter memory barriers, I don't get a very stable result. If someone can tell me what could be wrong here, I would be very grateful.I found a bug. I first checked if there was anything in the queue, and if not, I checked if the queue was finalized. Sometimes the queue would fill up between these two steps and I would lose data. I moved the finalization check to the beginning and now everything is stable.
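A minimal sketch of the corrected ordering (illustrative fields, not jin.go's exact code): the closed flag is loaded before the emptiness test, so a fill that lands between the two loads is still seen as pending data:

```d
import core.atomic : atomicLoad, MemoryOrder;

// End-of-stream only when the queue was already closed *before* we
// observed it empty; checking "empty, then closed" can race with a
// final put that slips in between the two loads.
bool done(ref shared bool closed, ref shared size_t pending)
{
    immutable wasClosed = atomicLoad!(MemoryOrder.acq)(closed);     // first
    immutable isEmpty = atomicLoad!(MemoryOrder.acq)(pending) == 0; // second
    return wasClosed && isEmpty;
}
```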
Jan 06
On Monday, 6 January 2025 at 10:15:39 UTC, Jin wrote:I found a bug. I first checked if there was anything in the queue, and if not, I checked if the queue was finalized. Sometimes the queue would fill up between these two steps and I would lose data. I moved the finalization check to the beginning and now everything is stable.I looked into the core.atomic code and was very disappointed. I expected to see memory barriers (necessary for wait-free), but I saw a CAS-spinlock there (and this is lock-free): ```d asm pure nothrow @nogc @trusted { naked; push RBX; mov R8, RDX; mov RAX, [RDX]; mov RDX, 8[RDX]; mov RBX, [RCX]; mov RCX, 8[RCX]; L1: lock; cmpxchg16b [R8]; jne L1; pop RBX; ret; } ```
Jan 08
On Wednesday, 8 January 2025 at 10:10:55 UTC, Jin wrote:On Monday, 6 January 2025 at 10:15:39 UTC, Jin wrote:I found a bug. I first checked if there was anything in the queue, and if not, I checked if the queue was finalized. Sometimes the queue would fill up between these two steps and I would lose data. I moved the finalization check to the beginning and now everything is stable.I looked into the core.atomic code and was very disappointed. I expected to see memory barriers (necessary for wait-free), but I saw a CAS-spinlock there (and this is lock-free):Technically it is lock-free but not wait-free. No thread can actually hold it like a regular lock; the CAS either succeeds or fails in one go, so you are always guaranteed that at least one thread will progress.
Jan 08
On Wednesday, 8 January 2025 at 10:16:58 UTC, claptrap wrote:Technically it is lock-free but not wait-free. No thread can actually hold it like a regular lock; the CAS either succeeds or fails in one go, so you are always guaranteed that at least one thread will progress.For wait-free in general and cyclic buffers in particular, it is important that operations are ordered (first the buffer operations, then the offset updates). CAS does not do anything useful in this case, as it always succeeds (only one thread writes to the offset), but the other operations can be rearranged.
Jan 08
On 08/01/2025 11:10 PM, Jin wrote:On Monday, 6 January 2025 at 10:15:39 UTC, Jin wrote:I found a bug. I first checked if there was anything in the queue, and if not, I checked if the queue was finalized. Sometimes the queue would fill up between these two steps and I would lose data. I moved the finalization check to the beginning and now everything is stable.I looked into the core.atomic code and was very disappointed. I expected to see memory barriers (necessary for wait-free), but I saw a CAS-spinlock there (and this is lock-free): ```d asm pure nothrow @nogc @trusted { naked; push RBX; mov R8, RDX; mov RAX, [RDX]; mov RDX, 8[RDX]; mov RBX, [RCX]; mov RCX, 8[RCX]; L1: lock; cmpxchg16b [R8]; jne L1; pop RBX; ret; } ```Dmd will not inline functions with inline assembly, and any function call should prevent reordering on the CPU side. So any ordering concern you have shouldn't matter for dmd; it's ldc and gdc that you need to be worried about.
Jan 08
On Wednesday, 8 January 2025 at 11:37:48 UTC, Richard (Rikki) Andrew Cattermole wrote:Dmd will not inline functions with inline assembly, any function calls should prevent reordering cpu side. So any concern for ordering you have shouldn't matter for dmd, its ldc and gdc that you need to be worried about.Visible reordering can occur due to the asynchronous nature of inter-core communication, which is relevant for ARM and other architectures. So it looks like we need macros that will insert inline opcodes for memory barriers: ```d writeToBuffer; mixin(Store_Store); writeToOffset; ``` ```d readFromBuffer; mixin(Load_Store); writeToOffset; ```
Jan 08
On 09/01/2025 1:01 AM, Jin wrote:On Wednesday, 8 January 2025 at 11:37:48 UTC, Richard (Rikki) Andrew Cattermole wrote:Not macros, what you want is intrinsics, this is how core.atomics works for ldc/gdc. In this case ``atomicFence``. https://dlang.org/phobos/core_atomic.html#.atomicFenceDmd will not inline functions with inline assembly, any function calls should prevent reordering cpu side. So any concern for ordering you have shouldn't matter for dmd, its ldc and gdc that you need to be worried about.Visible reordering can occur due to the asynchronous nature of inter- core communication, which is relevant for ARM and other architectures. So it looks like we need macros that will insert inline opcodes for memory barriers: ```d writeToBuffer; mixin(Store_Store); writeToOffset; ``` ```d readFromBuffer; mixin(Load_Store); writeToOffset; ```
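For reference, a minimal sketch of the two barriers from the previous post expressed with that intrinsic, on an illustrative buffer rather than jin.go's code:

```d
import core.atomic : atomicFence, atomicLoad, atomicStore, MemoryOrder;

shared int[64] buffer;
shared size_t tail;

// Producer side of a ring: payload first, fence, then publish the offset.
void produce(int value)
{
    immutable t = atomicLoad!(MemoryOrder.raw)(tail);
    atomicStore!(MemoryOrder.raw)(buffer[t % 64], value); // write payload
    atomicFence(); // full barrier here; a release fence would also do
    atomicStore!(MemoryOrder.raw)(tail, t + 1);           // publish
}
```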
Jan 08
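For reference, a minimal sketch of how the producer step from the post above might look with ``atomicFence``; `publish`, `buffer`, and `tail` here are illustrative stand-ins for the writeToBuffer/writeToOffset placeholders, not real go.d names:

```d
import core.atomic : atomicFence, atomicLoad, atomicStore, MemoryOrder;

__gshared int[64] buffer;  // shared between threads (module-level
                           // variables are TLS by default in D)
shared size_t tail;        // the producer's write offset

void publish(int value)
{
    immutable t = atomicLoad!(MemoryOrder.raw)(tail); // only we write it
    buffer[t % buffer.length] = value;   // writeToBuffer
    atomicFence!(MemoryOrder.rel)();     // Store-Store barrier on ARM;
                                         // compiler-only on x86
    atomicStore!(MemoryOrder.raw)(tail, t + 1); // writeToOffset
}
```

In practice the fence can usually be folded into the offset store itself via `atomicStore!(MemoryOrder.rel)`, which emits the same barrier on architectures that need one.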
On Wednesday, 8 January 2025 at 12:10:19 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Not macros; what you want is intrinsics, which is how core.atomic works for LDC/GDC. In this case, ``atomicFence``.

Unfortunately, waiting for support in all the compilers would take too long. A library could be implemented right now.
Jan 13
On Monday, 25 May 2020 at 16:26:31 UTC, Jin wrote:
> On Saturday, 16 May 2020 at 20:06:47 UTC, mw wrote:
>> On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:
>>> http://wiki.dlang.org/Go_to_D
>> Any performance comparison with Go? esp. in real world scenarios? Can it easily handle hundreds of (go)routines?
> I have updated the code. But it isn't ready to use currently, because:
> 1. I rewrote the code to use std.parallelism instead of vibe.d, so it's difficult to integrate fibers with tasks. Now every task spin-waits on its channel, and the main thread does no useful work.
> 2. There is a race condition. I'm going to review the algorithm closely.
> Currently it's twice as slow as Go. On my machine:
> go run app.go --release
> Workers Result     Time
> 4       499500000  27.9226ms
> dub --quiet --build=release
> Workers Result     Time
> 3       499500000  64 ms
> It would be cool if someone could help me with it. There are docstrings, tests, and diagrams. I'll explain more if someone joins.

I have fixed all the issues, and it's usable now. But I had to bring back the vibe-core dependency, and now it has slowed down:

.\compare.cmd
go run app.go --release
Workers Result      Time
4       4999500000  25.9163ms
dub --quiet --build=release
Workers Result      Time
4       4999500000  116 ms

And I had to reduce the count of "threads" to 100, because vibe-core fails at 1000.

I have also created a thread on dlang/projects with an explanation of my vision of concurrency in D: https://github.com/dlang/projects/issues/65
Jun 14 2020
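The spin-waiting issue described in the post above is the usual reason channel libraries end up coupled to a fiber scheduler: a blocked receive should hand its thread to another task instead of burning CPU. A minimal sketch of that idea, assuming a non-blocking `pop` like the queue sketched earlier and execution inside a fiber:

```d
import core.thread : Fiber;

// Blocking receive built on a non-blocking queue: instead of
// spinning, park this fiber so the scheduler can run other tasks.
// Must be called from within a fiber.
T receive(T, Q)(ref Q queue)
{
    T value;
    while (!queue.pop(value))
        Fiber.yield(); // cooperative: let other tasks make progress
    return value;
}
```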
On Sunday, 14 June 2020 at 14:24:29 UTC, Jin wrote:
> I have fixed all the issues, and it's usable now. But I had to bring back the vibe-core dependency, and now it has slowed down. [...]

I haven't checked your implementation, or vibe's, but I rediscovered that D's message passing is ~4 times slower than Java's:

https://forum.dlang.org/thread/mailman.148.1328778563.20196.digitalmars-d@puremagic.com?page=4

Is this the same problem in GoD?
Jun 14 2020
On Sunday, 27 March 2016 at 18:17:55 UTC, Jin wrote:
> DUB module: http://code.dlang.org/packages/jin-go
> GIT repo: https://github.com/nin-jin/go.d
> [...]

Have you considered using a Disruptor (http://lmax-exchange.github.io/disruptor/) for the channels? Not sure how it compares to what you're using from Vibe.d, but it's not a hard data structure to implement and, IIRC, it allows for multiple producers and consumers.
Mar 30 2016
On Wednesday, 30 March 2016 at 14:28:50 UTC, Casey Sybrandy wrote:
> Have you considered using a Disruptor (http://lmax-exchange.github.io/disruptor/) for the channels? Not sure how it compares to what you're using from Vibe.d, but it's not a hard data structure to implement and, IIRC, it allows for multiple producers and consumers.

Oh, and yes, I know that it would have to be rewritten in D unless there's a C version somewhere. I actually did it once and it wasn't too bad. I don't think I have a copy anymore, but if I do find it, I can put it up somewhere.
Mar 30 2016
On Wednesday, 30 March 2016 at 15:22:26 UTC, Casey Sybrandy wrote:
> Oh, and yes, I know that it would have to be rewritten in D unless there's a C version somewhere. I actually did it once and it wasn't too bad. I don't think I have a copy anymore, but if I do find it, I can put it up somewhere.

This is Java bloatware. :-(
Mar 30 2016
On Wednesday, 30 March 2016 at 15:50:47 UTC, Jin wrote:
> This is Java bloatware. :-(

I've never used the library, so I can't comment on that, but the actual data structure/algorithm is really pretty simple. The core components are atomic counters and a static array. I think it would be a good data structure for channels.
Mar 30 2016
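A rough sketch of that idea: multiple producers claim slots in a static array with an atomic fetch-and-add on a sequence counter, then publish each slot once the write is done. This is only the claiming half of a Disruptor-style ring, under the assumption of a power-of-two size; consumer coordination and wrap-around handling are omitted:

```d
import core.atomic : atomicOp, atomicStore, MemoryOrder;

struct Ring(T, size_t N) // N must be a power of two
{
    static assert((N & (N - 1)) == 0);

    T[N] slots;               // the static array
    shared bool[N] published; // per-slot "ready" flags for consumers
    shared size_t next;       // next sequence number to claim

    void put(T value) // safe for multiple producers
    {
        // Claim a unique sequence number with an atomic counter...
        immutable seq = atomicOp!"+="(next, 1) - 1;
        slots[seq & (N - 1)] = value;
        // ...and only then make the slot visible to consumers.
        atomicStore!(MemoryOrder.rel)(published[seq & (N - 1)], true);
    }
}
```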
On Wed, 2016-03-30 at 17:01 +0000, Casey Sybrandy via Digitalmars-d wrote:
> I've never used the library, so I can't comment on that, but the actual data structure/algorithm is really pretty simple. The core components are atomic counters and a static array. I think it would be a good data structure for channels.

If I recollect correctly, the core data structure is a lock-free ring buffer, and the parallelism "trick" is the use of multicast with atomic indexes. This works fine for the problem of creating a trading framework, but I suspect the architecture is just too big for realizing channels.

In particular, the really important thing about channels over (thread|process)-safe queues is the ability to select. I have no idea how select is implemented on Windows, but the classic POSIX approach is to use file descriptors to represent the queues and the select or epoll system calls to get the kernel to realize the select. As to how JCSP does select on the JVM, I shall have to go and delve into the source code…

--
Russel.
Apr 01 2016
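Absent kernel support, a user-space select over fiber-backed channels can be approximated by polling each queue and yielding between rounds. A minimal sketch, assuming a non-blocking `empty` check on the channel type and execution inside a fiber:

```d
import core.thread : Fiber;

// Returns the index of the first channel with data available,
// yielding the current fiber while everything is empty.
size_t select(Q)(Q[] channels)
{
    for (;;)
    {
        foreach (i, ref ch; channels)
            if (!ch.empty)
                return i; // this channel is ready to read
        Fiber.yield();    // nothing ready; run other tasks
    }
}
```

A kernel-backed version, as described above, would instead give each queue a file descriptor (e.g. an eventfd on Linux) and hand the whole set to epoll.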
On Wed, 2016-03-30 at 15:50 +0000, Jin via Digitalmars-d wrote:
> On Wednesday, 30 March 2016 at 15:22:26 UTC, Casey Sybrandy wrote:
>> Oh, and yes, I know that it would have to be rewritten in D unless there's a C version somewhere. I actually did it once and it wasn't too bad. I don't think I have a copy anymore, but if I do find it, I can put it up somewhere.
> This is Java bloatware. :-(

I think that is probably just slander. Whilst there are some known problems with the Disruptor, blithely labelling it "Java bloatware" is most likely an uneducated, and probably ill-judged, comment.

--
Russel.
Apr 01 2016
While having a CSP-style mechanism, with cheap stackful tasks/procs/coroutines and channels, has merit, pursuing a performance test with 1000 (or more) trivial instances of such tasks seems misguided.

In the last couple of years, I used Go to design and implement a subsystem as a process in CSP style, with many goroutines. Even there I only had on the order of 20-40 goroutines present at once, plus around 4-8 additional goroutines for each client being served simultaneously. It may have been theoretically possible to cause that to spawn on the order of 64k goroutines if stressed the "correct" way in an absurd deployment, but that would be improbable; a maximum of around 500 would be more likely, and even there performance was gated on Internet RTTs. I'll have to re-read the code next week to get the actual number, but I expect it to be on the order of 30 or so.
Jan 04