digitalmars.D - Oh, my GoD! Goroutines on D
- Jin (77/77) Mar 27 2016 DUB module: http://code.dlang.org/packages/jin-go
- deadalnix (3/5) Mar 27 2016 Note that this is also the case in go. Yes, contrary to what is
- Walter Bright (2/3) Mar 27 2016 Nice! Please write an article about this!
- Jin (2/5) Mar 28 2016 My english is too bad to write articles, sorry :-(
- Lass Safin (2/8) Mar 28 2016 Just use engrish, we won't care. Really.
- Russel Winder via Digitalmars-d (13/15) Mar 28 2016 Write the article content and then get someone who is a person good at
- Walter Bright (3/4) Mar 28 2016 Baloney, your english is very good. Besides, I'm sure there will be many...
- Jin (5/9) Mar 28 2016 I just wrote the article in Russian:
- Walter Bright (2/11) Mar 28 2016 Awesome! Who wants to help with the English?
- Walter Bright (15/29) Mar 28 2016 I'll start with the first paragraph (I don't know any Russian):
- sigod (3/13) Mar 29 2016 Create repository on GitHub. So, it will be easier for others to
- Jacob Carlborg (7/11) Mar 28 2016 It would be useful with a wiki page, or similar, that describes and
- Ali Çehreli (22/32) Mar 28 2016 And make sure to tell me about it so that my DConf presentation will be
- Jacob Carlborg (5/7) Mar 28 2016 I was hoping someone could give _me_ the links, that's why I wrote the
- Dejan Lekic (5/10) Mar 29 2016 +1
- Jin (2/6) Mar 29 2016 http://wiki.dlang.org/Go_to_D
- H. S. Teoh (9/15) Mar 29 2016 Since I know a bit of Russian, I took a shot at improving this
- mw (3/4) May 16 2020 Any performance comparison with Go? esp. in real-world scenarios?
- Russel Winder (12/18) May 17 2020 Seems to have been created four years ago and then left fallow. Perhaps ...
- Andre Pany (5/15) May 18 2020 ;)
- Seb (3/13) May 19 2020 FYI: channels are also part of vibe-core since a while:
- Russel Winder (12/15) May 19 2020 d
- Jin (3/18) May 25 2020 Yes. But it uses a mutex. My implementation is wait-free
- mw (24/29) Jun 14 2020 Just saw this, for 1-provider-1-consumer queue, I did some
- Bienlein (18/23) May 19 2020 Go can easily have some ten thousand green threads. I once did a
- Panke (2/8) May 19 2020 The continuation is implicit when using fibers, isn't it?
- Russel Winder (42/67) May 19 2020 It wouldn't surprise me if std.parallelism couldn't do something analogo...
- Jin (15/22) May 25 2020 I have updated the code. But it isn't ready to use currently
- Mathias LANG (58/76) May 25 2020 This is a problem that's of interest to me as well, and I've been
- Johannes Loher (4/8) May 25 2020 I believe Weka did that with their own fiber implementation in
- Russel Winder (44/49) May 26 2020 [=E2=80=A6]
- mw (11/31) Jun 14 2020 ...
- mw (9/16) Jun 14 2020 You can try it here:
- Jin (37/57) Jun 14 2020 My wheels are quite simple:
- mw (8/13) Jun 14 2020 ...
- mw (10/14) Jun 14 2020 Since you are using 1-provider-1-consumer queue, can you try this
- Jin (17/32) Jun 15 2020 I have added a new release and a PR to your repo.
- mw (14/20) Jun 15 2020 on my same machine, go.d:
- Petar Kirov [ZombineDev] (3/25) Jun 16 2020 There's a SPMC/MPSC queue implementation here:
- Jin (66/76) Jan 02 Hello everyone, I've done a little refactoring and optimization
- Guillaume Piolat (14/16) Jan 02 https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d#L67
- Jin (5/21) Jan 03 I didn't really understand what you were talking about. After
- Brad Roberts (21/42) Jan 03 Probably the most common atomicity error, is atomic check followed by
- claptrap (7/23) Jan 03 Its a single producer, single consumer queue, so only one thread
- claptrap (6/14) Jan 03 Have you considered not wrapping the offsets, and only modulus
- Sebastiaan Koppe (9/35) Jan 04 Since you are essentially testing the speed of your spsc ring
- Jin (6/14) Jan 06 I found a bug. I first checked if there was anything in the
- Jin (20/25) Jan 08 I looked into the core.atomic code and was very disappointed. I
- claptrap (5/14) Jan 08 Technically it is lock free but not wait free. No thread can
- Jin (6/10) Jan 08 For wait-free in general and cyclic buffers in particular, it is
- Richard (Rikki) Andrew Cattermole (5/31) Jan 08 Dmd will not inline functions with inline assembly, any function calls
- Jin (17/21) Jan 08 Visible reordering can occur due to the asynchronous nature of
- Richard (Rikki) Andrew Cattermole (5/30) Jan 08 Not macros, what you want is intrinsics, this is how core.atomics works
- Jin (4/8) Jan 13 Unfortunately, waiting for support from all the compilers would take
- Jin (12/39) Jun 14 2020 I have fixed all issues, and it's usable now. But I had to return
- mw (8/41) Jun 14 2020 ...
- Casey Sybrandy (6/9) Mar 30 2016 Have you considered using a Disruptor
- Casey Sybrandy (5/15) Mar 30 2016 Oh, and yes, I know that it would have to be rewritten in D
- Jin (2/11) Mar 30 2016 This is java bloatware. :-(
- Casey Sybrandy (5/6) Mar 30 2016 I've never used the library so I can't comment on that, but the
- Russel Winder via Digitalmars-d (26/33) Apr 01 2016
- Russel Winder via Digitalmars-d (19/32) Apr 01 2016
- Derek Fawcus (16/16) Jan 04 While having a CSP style mechanism, with cheap stackful
DUB module: http://code.dlang.org/packages/jin-go GIT repo: https://github.com/nin-jin/go.d The function "go" starts coroutines (vibe.d tasks) in a thread pool, like in the Go language:

unittest {
    import core.time;
    import jin.go;

    __gshared static string[] log;

    static void saying( string message ) {
        for( int i = 0 ; i < 3 ; ++i ) {
            sleep( 100.msecs );
            log ~= message;
        }
    }

    go!saying( "hello" );
    sleep( 50.msecs );
    saying( "world" );

    log.assertEq([ "hello" , "world" , "hello" , "world" , "hello" , "world" ]);
}

The class "Channel" is a wait-free one-consumer-one-provider channel. By default, a Channel blocks the thread on receive from an empty channel or on send to a full channel:

unittest {
    static void summing( Channel!int output ) {
        foreach( i ; ( output.size * 2 ).iota ) {
            output.next = 1;
        }
        output.close();
    }

    auto input = go!summing;
    while( !input.full ) yield;

    input.sum.assertEq( input.size * 2 );
}

You can avoid waiting if you do not want to:

unittest {
    import core.time;
    import jin.go;

    static auto after( Channel!bool channel , Duration dur ) {
        sleep( dur );
        if( !channel.closed ) channel.next = true;
    }

    static auto tick( Channel!bool channel , Duration dur ) {
        while( !channel.closed ) after( channel , dur );
    }

    auto ticks = go!tick( 101.msecs );
    auto booms = go!after( 501.msecs );

    string log;

    while( booms.clear ) {
        while( !ticks.clear ) {
            log ~= "tick";
            ticks.popFront;
        }
        log ~= ".";
        sleep( 51.msecs );
    }
    log ~= "BOOM!";

    log.assertEq( "..tick..tick..tick..tick..BOOM!" );
}

Channels are InputRange and OutputRange compatible. The structs "Inputs" and "Outputs" are round-robin facades over an array of channels. More examples are in the unit tests: https://github.com/nin-jin/go.d/blob/master/source/jin/go.d#L293 Current problems: 1. You can give a channel to more than two threads. I'm going to play with unique pointers to solve this problem. Any hints? 2. Sometimes you must close a channel to notify the partner to break a range cycle. Solving problem (1) could solve this too. 3. The API may be better. Advice?
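One possible direction for problem (1), sketched under the assumption that a move-only handle is acceptable: disable copying on the channel end, so the type system enforces a single owner. Queue and UniqueOutput here are hypothetical illustrations, not jin.go's actual types.

import std.algorithm.mutation : move;

// Hypothetical payload standing in for the channel's internal queue.
struct Queue(T) { T[] items; }

// A channel end that can be moved but never copied.
struct UniqueOutput(T) {
    private Queue!T* queue;

    @disable this(this); // no postblit: the handle cannot be copied

    this( Queue!T* q ) { queue = q; }

    void put( T value ) { queue.items ~= value; }
}

unittest {
    auto q = new Queue!int;
    auto a = UniqueOutput!int( q );
    auto b = move( a );   // ownership is transferred explicitly
    b.put( 42 );
    // auto c = b;        // compile error: copying is disabled
}

A "go" that accepted only such move-only ends would turn giving one channel to more than two threads into a compile-time error rather than a runtime hazard.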
Mar 27 2016
On Sunday, 27 March 2016 at 18:17:55 UTC, Jin wrote:1. You can give a channel to more than two threads. I'm going to play with unique pointers to solve this problem. Any hints?Note that this is also the case in Go. Yes, contrary to what is usually said, Go is dead unsafe when it comes to threads.
Mar 27 2016
On 3/27/2016 11:17 AM, Jin wrote:[...]Nice! Please write an article about this!
Mar 27 2016
On Sunday, 27 March 2016 at 20:39:57 UTC, Walter Bright wrote:On 3/27/2016 11:17 AM, Jin wrote:My english is too bad to write articles, sorry :-([...]Nice! Please write an article about this!
Mar 28 2016
On Monday, 28 March 2016 at 13:10:45 UTC, Jin wrote:On Sunday, 27 March 2016 at 20:39:57 UTC, Walter Bright wrote:Just use engrish, we won't care. Really.On 3/27/2016 11:17 AM, Jin wrote:My english is too bad to write articles, sorry :-([...]Nice! Please write an article about this!
Mar 28 2016
On Mon, 2016-03-28 at 13:10 +0000, Jin via Digitalmars-d wrote:My english is too bad to write articles, sorry :-(Write the article content and then get someone who is good at writing in English to ghostwrite the article from your content.
Mar 28 2016
On 3/28/2016 6:10 AM, Jin wrote:My english is too bad to write articles, sorry :-(Baloney, your english is very good. Besides, I'm sure there will be many volunteers here to help you touch it up.
Mar 28 2016
On Monday, 28 March 2016 at 19:29:55 UTC, Walter Bright wrote:On 3/28/2016 6:10 AM, Jin wrote:I just wrote the article in Russian: https://habrahabr.ru/post/280378/ Translation to English: https://translate.google.com/translate?hl=ru&sl=ru&tl=en&u=https%3A%2F%2Fhabrahabr.ru%2Fpost%2F280378%2FMy english is too bad to write articles, sorry :-(Baloney, your english is very good. Besides, I'm sure there will be many volunteers here to help you touch it up.
Mar 28 2016
On 3/28/2016 3:35 PM, Jin wrote:On Monday, 28 March 2016 at 19:29:55 UTC, Walter Bright wrote:Awesome! Who wants to help with the English?On 3/28/2016 6:10 AM, Jin wrote:I just wrote the article on russin: https://habrahabr.ru/post/280378/ Translation to english: https://translate.google.com/translate?hl=ru&sl=ru&tl=en&u=https%3A%2F%2Fhabrahabr.ru%2Fpost%2F280378%2FMy english is too bad to write articles, sorry :-(Baloney, your english is very good. Besides, I'm sure there will be many volunteers here to help you touch it up.
Mar 28 2016
On 3/28/2016 5:07 PM, Walter Bright wrote:On 3/28/2016 3:35 PM, Jin wrote:I'll start with the first paragraph (I don't know any Russian): Google: Multitasking - this is what is implemented in Go of the good, though not perfect. Nice syntax with a tart aftertaste, simple and powerful abstraction, bribe its elegance compared to other imperative languages. And taste the best, so do not want to have to slide to mediocrity. Therefore, if and switch to another language, it must be even more expressive and with no less sensible multitasking. English: Multitasking - Go's multitasking capabilities are good, though not perfect. It has a nice syntax with a sweet aftertaste, simple and powerful abstraction, elegant compared to other imperative languages. It exhibits good taste, so one does not wish to compromise into mediocrity. Therefore, to switch to another language, it must be even more expressive and with no less sensible multitasking.On Monday, 28 March 2016 at 19:29:55 UTC, Walter Bright wrote:Awesome! Who wants to help with the English?On 3/28/2016 6:10 AM, Jin wrote:I just wrote the article on russin: https://habrahabr.ru/post/280378/ Translation to english: https://translate.google.com/translate?hl=ru&sl=ru&tl=en&u=https%3A%2F%2Fhabrahabr.ru%2Fpost%2F280378%2FMy english is too bad to write articles, sorry :-(Baloney, your english is very good. Besides, I'm sure there will be many volunteers here to help you touch it up.
Mar 28 2016
On Monday, 28 March 2016 at 22:35:12 UTC, Jin wrote:On Monday, 28 March 2016 at 19:29:55 UTC, Walter Bright wrote:Create repository on GitHub. So, it will be easier for others to help with translation.On 3/28/2016 6:10 AM, Jin wrote:I just wrote the article on russin: https://habrahabr.ru/post/280378/ Translation to english: https://translate.google.com/translate?hl=ru&sl=ru&tl=en&u=https%3A%2F%2Fhabrahabr.ru%2Fpost%2F280378%2FMy english is too bad to write articles, sorry :-(Baloney, your english is very good. Besides, I'm sure there will be many volunteers here to help you touch it up.
Mar 29 2016
On 2016-03-27 20:17, Jin wrote:DUB module: http://code.dlang.org/packages/jin-go GIT repo: https://github.com/nin-jin/go.d Function "go" starts coroutines (vibe.d tasks) in thread pool like in Go language:It would be useful to have a wiki page, or something similar, that describes and compares the different ways of doing concurrency in D, both built-in and in third-party libraries like this one and vibe.d. Also compare against other languages like Go, Erlang, Scala and so on. -- /Jacob Carlborg
Mar 28 2016
On 03/28/2016 03:49 AM, Jacob Carlborg wrote:On 2016-03-27 20:17, Jin wrote:And make sure to tell me about it so that my DConf presentation will be more complete: :) http://dconf.org/2016/talks/cehreli.html That abstract is awfully short but I've posted a pull request to improve it a little bit: <quote> D provides support for multitasking in the form of language features and standard library modules. D makes it easy for your programs to perform multiple tasks at the same time. This kind of support is especially important in order to take advantage of the multiple CPU cores that are available on modern computing systems. Multitasking is one of the most difficult computing concepts to implement correctly. This talk will introduce different kinds of multitasking, as well as parallelism, a concept which is in fact unrelated to, but is often confused with, multitasking. The talk will conclude with fibers (aka co-routines), a powerful tool that is often overlooked despite its convenience. </quote> Seriously, I appreciate any documentation links that you can give to complete my "homework" before DConf. :) AliDUB module: http://code.dlang.org/packages/jin-go GIT repo: https://github.com/nin-jin/go.d Function "go" starts coroutines (vibe.d tasks) in thread pool like in Go language:It would be useful with a wiki page, or similar, that describes and compares different ways of doing concurrency in D, both built-in and third party libraries like this and vibe.d. Also compare against other languages like Go, Erlang, Scala and so on.
Mar 28 2016
On 2016-03-29 01:53, Ali Çehreli wrote:Seriously, I appreciate any documentation links that you can give to complete my "homework" before DConf. :)I was hoping someone could give _me_ the links, that's why I wrote the post ;) -- /Jacob Carlborg
Mar 28 2016
On Monday, 28 March 2016 at 10:49:28 UTC, Jacob Carlborg wrote:It would be useful to have a wiki page, or something similar, that describes and compares the different ways of doing concurrency in D, both built-in and in third-party libraries like this one and vibe.d. Also compare against other languages like Go, Erlang, Scala and so on.+1 Wiki is absolutely the best solution to this, I agree. Plus, we already have a wiki, so he should just go there and start writing. The community will correct grammar/syntax and typos.
Mar 29 2016
On Tuesday, 29 March 2016 at 12:30:24 UTC, Dejan Lekic wrote:+1 Wiki is absolutely the best solution to this, I agree. Plus, we already have a wiki so he should just go there and start writing. The community will incorrect grammar/syntax and typos.http://wiki.dlang.org/Go_to_D
Mar 29 2016
On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:On Tuesday, 29 March 2016 at 12:30:24 UTC, Dejan Lekic wrote:Since I know a bit of Russian, I took a shot at improving this article, and got partway through the "Channels" section. But now I need to get back to work... so hopefully somebody else can work on improving the English text. :-) --T+1 Wiki is absolutely the best solution to this, I agree. Plus, we already have a wiki so he should just go there and start writing. The community will incorrect grammar/syntax and typos.http://wiki.dlang.org/Go_to_D
Mar 29 2016
On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:http://wiki.dlang.org/Go_to_DAny performance comparison with Go? esp. in real-world scenarios? Can it easily handle hundreds of (go)routines?
May 16 2020
On Sat, 2020-05-16 at 20:06 +0000, mw via Digitalmars-d wrote:On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:Seems to have been created four years ago and then left fallow. Perhaps it should be resurrected and integrated into Phobos? Or left as a package in the Dub repository?http://wiki.dlang.org/Go_to_D Any performance comparison with Go? esp. in real-world scenarios? Can it easily handle hundreds of (go)routines?
May 17 2020
On Sunday, 17 May 2020 at 15:17:44 UTC, Russel Winder wrote:On Sat, 2020-05-16 at 20:06 +0000, mw via Digitalmars-d wrote:;) https://github.com/nin-jin/go.d/issues/2 Kind regards AndreOn Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:Seems to have been created four years ago and then left fallow. Perhaps it should be resurrected and integrated into Phobos? Or left as a package in the Dub repository?http://wiki.dlang.org/Go_to_DAny performance comparison with Go? esp. in real word scenario? Can it easily handle hundreds of (go)routines?
May 18 2020
On Sunday, 17 May 2020 at 15:17:44 UTC, Russel Winder wrote:On Sat, 2020-05-16 at 20:06 +0000, mw via Digitalmars-d wrote:FYI: channels have also been part of vibe-core for a while: https://github.com/vibe-d/vibe-core/blob/master/source/vibe/core/channel.dOn Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:Seems to have been created four years ago and then left fallow. Perhaps it should be resurrected and integrated into Phobos? Or left as a package in the Dub repository?http://wiki.dlang.org/Go_to_DAny performance comparison with Go? esp. in real-world scenarios? Can it easily handle hundreds of (go)routines?
May 19 2020
On Tue, 2020-05-19 at 09:15 +0000, Seb via Digitalmars-d wrote: […]FYI: channels have also been part of vibe-core for a while: https://github.com/vibe-d/vibe-core/blob/master/source/vibe/core/channel.d I will have to investigate. Sounds like vibe.d can be used as tasks with channels on a thread pool.
May 19 2020
On Tuesday, 19 May 2020 at 09:15:24 UTC, Seb wrote:On Sunday, 17 May 2020 at 15:17:44 UTC, Russel Winder wrote:On Sat, 2020-05-16 at 20:06 +0000, mw via Digitalmars-d wrote:FYI: channels have also been part of vibe-core for a while: https://github.com/vibe-d/vibe-core/blob/master/source/vibe/core/channel.dOn Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:Seems to have been created four years ago and then left fallow. Perhaps it should be resurrected and integrated into Phobos? Or left as a package in the Dub repository?http://wiki.dlang.org/Go_to_DAny performance comparison with Go? esp. in real-world scenarios? Can it easily handle hundreds of (go)routines?Yes. But it uses a mutex. My implementation is wait-free (https://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom). All threads can read and write easily and quickly without any locking. So every queue is 1-provider-1-consumer, but the Input and Output channels are round-robin lists of queues. You can find some diagrams here: https://github.com/nin-jin/go.d/blob/master/readme.drawio.svg
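For readers following along, here is a minimal sketch of the kind of wait-free single-producer single-consumer ring buffer described above. The names, fixed capacity, and memory orders are illustrative, not jin.go's actual API:

```d
import core.atomic;

struct Spsc(T, size_t N = 1024)
{
    private T[N] buffer;
    private shared size_t head; // advanced only by the consumer
    private shared size_t tail; // advanced only by the producer

    // Producer side: returns false when the buffer is full.
    bool tryPut(T value)
    {
        immutable t = atomicLoad!(MemoryOrder.raw)(tail);
        if (t - atomicLoad!(MemoryOrder.acq)(head) == N)
            return false;                           // full
        buffer[t % N] = value;                      // write the payload first...
        atomicStore!(MemoryOrder.rel)(tail, t + 1); // ...then publish it
        return true;
    }

    // Consumer side: returns false when the buffer is empty.
    bool tryGet(ref T value)
    {
        immutable h = atomicLoad!(MemoryOrder.raw)(head);
        if (atomicLoad!(MemoryOrder.acq)(tail) == h)
            return false;                           // empty
        value = buffer[h % N];                      // read the payload first...
        atomicStore!(MemoryOrder.rel)(head, h + 1); // ...then free the slot
        return true;
    }
}
```

Each offset has exactly one writer, so neither side ever waits on a lock; the acquire/release pairs order the payload accesses against the offset publication.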
May 25 2020
On Monday, 25 May 2020 at 16:41:06 UTC, Jin wrote:Just saw this: for a 1-provider-1-consumer queue, I did some experiments with https://code.dlang.org/packages/lock-free and the result is here: https://github.com/mingwugmail/liblfdsd/tree/master/comparison LDC is ~5x faster than DMD, I'm not sure why; and this package may need more stress testing:
------------------------------------------------------------------------
$ make ldcrun
...
received 1000000000 messages in 9845 msec sum=499999999500000000 speed=101574 msg/msec
$ make dmdrun
...
received 1000000000 messages in 53607 msec sum=499999999500000000 speed=18654 msg/msec
$ make javamp
10000000 messages received in 1151.0 ms, sum=49999995000000 speed: 0.1151 microsec/message, 8688.097306689835 messages/msec
$ make dmp
received 100000000 messages in 27574 msec sum=4999999950000000 speed=3626 msg/msec
------------------------------------------------------------------------
FYI: channels have also been part of vibe-core for a while: https://github.com/vibe-d/vibe-core/blob/master/source/vibe/core/channel.dYes. But it uses a mutex. My implementation is wait-free (https://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom). All threads can read and write easily and quickly without any locking. So every queue is 1-provider-1-consumer, but the Input and Output channels are round-robin lists of queues. You can find some diagrams here: https://github.com/nin-jin/go.d/blob/master/readme.drawio.svg
Jun 14 2020
On Saturday, 16 May 2020 at 20:06:47 UTC, mw wrote:On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:http://wiki.dlang.org/Go_to_DAny performance comparison with Go? esp. in real-world scenarios? Can it easily handle hundreds of (go)routines?Go can easily have some ten thousand green threads. I once did a test run to see how many: on a 4 GB machine, 80,000 green threads aka goroutines were created until out-of-memory occurred. Communicating sequential processes as in Go relies on being able to create a large number of threads. With a paradigm of threads doing blocking takes on channels, any application would otherwise quickly run out of threads. In D something similar could be done using fibers. Using fibers is also the approach chosen in Java for extending the JVM to have CSP-style concurrency as in Go, see https://www.youtube.com/watch?v=lIq-x_iI-kc Then you also need continuations. Let's say inside a function a blocking take is done on two channels in a row. The first channel has some input, the next one has not. In between comes a context switch. When switching back, the value taken from the first channel has to be put back into the context. This is why continuations are needed. Really nice work! Please keep it going :-)
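A minimal sketch of the fiber point using core.thread.Fiber: the "continuation" is simply the fiber's suspended stack, resumed in place (illustrative code, not jin.go's):

```d
import core.thread : Fiber;
import std.stdio : writeln;

void main()
{
    int mailbox;     // stand-in for a channel slot
    bool hasValue;

    auto consumer = new Fiber({
        while (!hasValue)
            Fiber.yield();        // "blocking take": suspend, keep the stack
        writeln("got ", mailbox); // resumes exactly here, locals intact
    });

    consumer.call(); // runs until the yield
    mailbox = 42;
    hasValue = true;
    consumer.call(); // continues after the yield and prints "got 42"
}
```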
May 19 2020
On Tuesday, 19 May 2020 at 08:42:14 UTC, Bienlein wrote:Then you also need continuations. Lets say inside a function a blocking take is done on two channels in a row. The first channel has some input, the next one has not. In between comes a context switch. When switching back the value taken from the first channel has to be put back into the context. This is why continuations are needed.The continuation is implicit when using fibers, isn't it?
May 19 2020
On Tue, 2020-05-19 at 08:42 +0000, Bienlein via Digitalmars-d wrote:On Saturday, 16 May 2020 at 20:06:47 UTC, mw wrote:It wouldn't surprise me if std.parallelism couldn't do something analogous. There are tasks that are executed (potentially with work stealing) by threads in a threadpool. The problem is that the std.parallelism API is dedicated to data parallelism rather than the process/channels concurrency/parallelism of CSP (the theoretical foundation for goroutines – sort of, but you have to read Rob Pike's articles to see why). I am fairly certain, but have not yet checked, that the idea of task is very similar in std.parallelism and jin.go – which is based on vibe.d's event loop as I understand it (but could be wrong). Vibe.d allows a threadpool, though I suspect most people use it single threaded. So Vibe.d and std.parallelism have a lot in common. I'll bet there is much that could be factored out and shared, but realistically this isn't going to happen.On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote: Go can easily have some ten thousand green threads. I once did a test run to see how many: on a 4 GB machine, 80,000 green threads aka goroutines were created until out-of-memory occurred.http://wiki.dlang.org/Go_to_D Any performance comparison with Go? esp. in real-world scenarios? Can it easily handle hundreds of (go)routines?Communicating sequential processes as in Go relies on being able to create a large number of threads. With a paradigm of threads doing blocking takes on channels, any application would otherwise quickly run out of threads. In D something similar could be done using fibers. Using fibers is also the approach chosen in Java for extending the JVM to have CSP-style concurrency as in Go, see https://www.youtube.com/watch?v=lIq-x_iI-kcI wonder if we should use "thread" for heavyweight (OS?) thread, "fibre" for fibre (including lightweight threads from before threads became threads), and task for the things that get put into the job queues of a threadpool. D already has a threadpool in std.parallelism; perhaps it needs extracting (as in Java) so it can be the basis of Vibe.d – or vice versa, obviously.Then you also need continuations. Let's say inside a function a blocking take is done on two channels in a row. The first channel has some input, the next one has not. In between comes a context switch. When switching back, the value taken from the first channel has to be put back into the context. This is why continuations are needed.Are you sure? Isn't the whole point of tasks (processes in CSP jargon) and channels that you don't need continuations? A CSP computation should not even understand the idea of a context switch. If it matters, has the concept of fibre perhaps intruded in a way that violates the abstraction?Really nice work! Please keep it going :-)
May 19 2020
On Saturday, 16 May 2020 at 20:06:47 UTC, mw wrote:On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:http://wiki.dlang.org/Go_to_DAny performance comparison with Go? esp. in real-world scenarios? Can it easily handle hundreds of (go)routines?I have updated the code. But it isn't ready to use currently because: 1. I rewrote the code to use std.parallelism instead of vibe.d, so it's difficult to integrate fibers with tasks. Now every task spinlocks while waiting on a channel, and the main thread doesn't do useful work. 2. Race condition. I'm going to closely review the algorithm. Currently it's twice as slow as Go. On my machine:
go run app.go --release
Workers Result Time
4 499500000 27.9226ms
dub --quiet --build=release
Workers Result Time
3 499500000 64 ms
It would be cool if someone helped me with it. There are docstrings, tests and diagrams. I'll explain more if someone joins.
May 25 2020
On Monday, 25 May 2020 at 16:26:31 UTC, Jin wrote:On Saturday, 16 May 2020 at 20:06:47 UTC, mw wrote:On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:http://wiki.dlang.org/Go_to_DAny performance comparison with Go? esp. in real-world scenarios? Can it easily handle hundreds of (go)routines?I have updated the code. But it isn't ready to use currently because: 1. I rewrote the code to use std.parallelism instead of vibe.d, so it's difficult to integrate fibers with tasks. Now every task spinlocks while waiting on a channel, and the main thread doesn't do useful work. 2. Race condition. I'm going to closely review the algorithm. [...] It would be cool if someone helped me with it. There are docstrings, tests and diagrams. I'll explain more if someone joins.This is a problem that's of interest to me as well, and I've been working on this for a few months (on and off). I had to eventually ditch `std.concurrency` because of some design decisions that made things hard to work with. `std.concurrency`'s MessageBox was originally designed to work only between threads. As such, it comes with all the locking you'd expect from a cross-thread message-passing data structure. Support for fibers was added as an afterthought. You can even see it in the documentation (https://dlang.org/phobos/std_concurrency.html), where "thread" is mentioned all over the place. The module doc kinda gets away with it because it calls fibers "logical threads", but that distinction is not always made. It also has some concepts that make a lot of sense for threads, but much less so for fibers (such as the "owner" concept, the owner being the task that `spawn`ed you). Finally, it forces messages to be `shared` or isolated (read: with only `immutable` indirections), which doesn't make sense when you're dealing only with fibers on the same thread. We found some ridiculous issues when trying to use it. We upstreamed some fixes (https://github.com/dlang/phobos/pull/7096, https://github.com/dlang/phobos/pull/6738) and put a bounty on one of the issues, which led to someone finding the bug in `std.concurrency` (https://github.com/Geod24/localrest/pull/5#issuecomment-523707490). After some playing around with it, we just gave up, forked the whole module, and started to change it to behave more like channels. There are some other issues I found while refactoring which I might upstream in the future, but it needs so much work that I might as well PR a whole new module. What we're trying to achieve is to move from a MessageBox approach, where there is a 1-to-1 relationship between a task (or logical thread) and a MessageBox, to a channel-like model, where there is an N-to-1 relationship (see Go's select). In order to achieve Go-like performance, we need a few things though: - Direct hand-off semantics for same-thread message passing: meaning that if Fiber A sends a message to Fiber B, and they are both in the same thread, there is an immediate context switch from A to B, without going through the scheduler (a sketch follows after this post); - Thread-level multiplexing of receive: with the current `std.concurrency`, calling `receive` yields the fiber and might block the thread. The scheduler simply iterates over all fibers in linear order, which means that if you have 3 fibers and they all `receive` one after the other, you could end up blocked on the *first* one receiving a message to wake the other ones up. - Smaller fibers: goroutines can have very, very small stacks. They don't stack overflow because they are managed (whenever you need to allocate more stack, there used to be a check for stack overflow, and stack "regions" were/are essentially a linked list and need not be contiguous in memory). On the other hand, we use simple regular fiber context switching, which is much more expensive. In that area, I think exploring the idea of a stackless-coroutine-based scheduler could be worthwhile. This Google doc has a lot of good information, if you're interested: https://docs.google.com/document/d/1yIAYmbvL3JxOKOjuCyon7JhW4cSv1wy5hC0ApeGMV9s/pub It's still a problem we're working on, as some issues are unique to D and we haven't found a good solution (e.g. requiring `shared` for same-thread Fiber communication is quite problematic). If we ever reach a satisfying solution I'll try upstreaming it.
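A minimal sketch of the direct hand-off idea for two fibers on one thread: the sender resumes the blocked receiver immediately instead of going back through the scheduler. Slot, send, and receive are hypothetical names, not std.concurrency's API:

```d
import core.thread : Fiber;

// One-slot mailbox shared by two fibers on the same thread.
struct Slot(T)
{
    T value;
    bool full;
    Fiber waiting; // receiver parked on this slot, if any
}

void send(T)(ref Slot!T slot, T value)
{
    slot.value = value;
    slot.full = true;
    if (slot.waiting !is null)
        slot.waiting.call(); // hand off: switch straight to the receiver
}

T receive(T)(ref Slot!T slot)
{
    while (!slot.full)
    {
        slot.waiting = Fiber.getThis();
        Fiber.yield(); // suspend until a sender hands off to us
    }
    slot.full = false;
    return slot.value;
}
```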
May 25 2020
On Tuesday, 26 May 2020 at 01:27:49 UTC, Mathias LANG wrote:- Direct hand-off semantic for same-thread message passing: Meaning that if Fiber A sends a message to Fiber B, and they are both in the same thread, there is an immediate context switch from A to B, without going through the scheduler;I believe Weka did that with their own fiber implementation in Mecca. I think I remember Shachar mentioning this during his talk at DConf (2018?)
May 25 2020
On Tue, 2020-05-26 at 01:27 +0000, Mathias LANG via Digitalmars-d wrote: […]This is a problem that's of interest to me as well, and I've been working on this for a few months (on and off). I had to eventually ditch `std.concurrency` because of some design decisions that made things hard to work with.[…] I am fairly sure std.parallelism is a better place to get threadpools, tasks, scheduling, work stealing, etc. However it is all packaged with a view to implementing SMP parallelism in D. I haven't been following, but many others including Vibe.d have implemented either fibres and yield, or tasks/threadpools and channels – the bit missing from std.parallelism since it isn't needed for SMP parallelism, but is if you take the tasks and threadpools out of that context. What has happened in Rust, and to a great extent in the JVM arena, is that there has been an implementation of fibres/yield, futures and async, and/or task and threadpool that has been centralised, and then everyone else has evolved to use it rather than having multiple implementations of all the ideas. In the JVM milieu there is still a lot of NIH replication, but then they have lots of money and resources. Strategically, if there were one set of Dub packages doing this low-level stuff that people worked on and that everyone else then used, that would be good. Then the question is whether to deprecate std.parallelism and rebuild it based on the new low-level code. Answer: yes. Perhaps the std.parallelism stuff could actually provide a basis for some of this low-level code, along with the vibe.core stuff and mayhap the Mecca stuff. My feeling is that the time for everyone implementing their own is long past; it is time for all to join in on a standard set of tools for D. This includes removing the Fibres stuff from std.concurrency. So yes, I am up for contributing.
May 26 2020
On Tuesday, 26 May 2020 at 01:27:49 UTC, Mathias LANG wrote:On Monday, 25 May 2020 at 16:26:31 UTC, Jin wrote:...On Saturday, 16 May 2020 at 20:06:47 UTC, mw wrote:On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:http://wiki.dlang.org/Go_to_DAny performance comparison with Go? esp. in real-world scenarios?...This is a problem that's of interest to me as well, and I've been working on this for a few months (on and off). I had to eventually ditch `std.concurrency` because of some design decisions that made things hard to work with. `std.concurrency`'s MessageBox was originally designed to work only between threads. As such, it comes with all the locking you'd expect from a cross-thread message-passing data structure....It's still a problem we're working on, as some issues are unique to D and we haven't found a good solution (e.g. requiring `shared` for same-thread Fiber communication is quite problematic). If we ever reach a satisfying solution I'll try upstreaming it.... Have you tried a lock-free queue? https://www.liblfds.org/mediawiki/index.php?title=r7.1.1:Queue_(unbounded,_many_producer,_many_consumer) Java uses the same algorithm for its ConcurrentLinkedQueue (liblfds being a C implementation). I tried some small examples with liblfds and got slightly better performance than Java. Maybe we don't want to reinvent the wheels, especially the well-tested ones.
Jun 14 2020
On Sunday, 14 June 2020 at 17:10:14 UTC, mw wrote:Have you tried a lock-free queue? https://www.liblfds.org/mediawiki/index.php?title=r7.1.1:Queue_(unbounded,_many_producer,_many_consumer) Java uses the same algorithm for its ConcurrentLinkedQueue (liblfds being a C implementation). I tried some small examples with liblfds and got slightly better performance than Java. Maybe we don't want to reinvent the wheels, especially the well-tested ones.You can try it here: https://github.com/mingwugmail/liblfdsd only https://www.liblfds.org/mediawiki/index.php?title=r7.1.1:Queue_(bounded,_many_producer,_many_consumer) for now. ``` received 100000000 messages in 4632 msec sum=4999999950000000 speed=21588 msg/msec ```
Jun 14 2020
On Sunday, 14 June 2020 at 19:49:46 UTC, mw wrote:On Sunday, 14 June 2020 at 17:10:14 UTC, mw wrote:Have you tried a lock-free queue? https://www.liblfds.org/mediawiki/index.php?title=r7.1.1:Queue_(unbounded,_many_producer,_many_consumer) Java uses the same algorithm for its ConcurrentLinkedQueue (liblfds being a C implementation). I tried some small examples with liblfds and got slightly better performance than Java. Maybe we don't want to reinvent the wheels, especially the well-tested ones.You can try it here: https://github.com/mingwugmail/liblfdsd only https://www.liblfds.org/mediawiki/index.php?title=r7.1.1:Queue_(bounded,_many_producer,_many_consumer) for now. ``` received 100000000 messages in 4632 msec sum=4999999950000000 speed=21588 msg/msec ```My wheels are quite simple: https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d I have rewritten your benchmark (https://github.com/mingwugmail/liblfdsd/blob/master/liblfds.dpp#L67) with go.d:
```
const int n = 100_000_000;

void threadProducer(Output!int queue)
{
    foreach (int i; 0..n)
    {
        queue.put(i);
    }
}

void main()
{
    Input!int queue;
    go!threadProducer(queue.pair);

    StopWatch sw;
    sw.start();
    long sum = 0;
    foreach (p; queue)
    {
        sum += p;
    }
    sw.stop();

    writefln("received %d messages in %d msec sum=%d speed=%d msg/msec",
            n, sw.peek.total!"msecs", sum, n / sw.peek.total!"msecs");

    assert(sum == (n * (n - 1) / 2));
}
```
The code is simpler. On my laptop it gives:
```
dub --quiet --build=release
received 100000000 messages in 10011 msec sum=4999999950000000 speed=9989 msg/msec
```
Jun 14 2020
On Sunday, 14 June 2020 at 22:57:25 UTC, Jin wrote:...https://github.com/mingwugmail/liblfdsd speed=21588 msg/msecMy wheels are quite simple: https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d...speed=9989 msg/msecCan you make a new go.d release to https://github.com/nin-jin/go.d/releases, or create a PR using your jin.go.queue to https://github.com/mingwugmail/liblfdsd/tree/master/comparison so that either you or I can run the comparison on the same machine?
Jun 14 2020
On Sunday, 14 June 2020 at 22:57:25 UTC, Jin wrote:My wheels are quite simple: https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d...received 100000000 messages in 10011 msec sum=4999999950000000 speed=9989 msg/msecSince you are using a 1-provider-1-consumer queue, can you try this package as a drop-in replacement in go.d: https://github.com/MartinNowak/lock-free/blob/master/src/lock_free/rwqueue.d (you may need to add align() to that implementation). According to my test: received 1000000000 messages in 9845 msec sum=499999999500000000 speed=101574 msg/msec You may get a ~10x boost.
Jun 14 2020
On Monday, 15 June 2020 at 01:55:27 UTC, mw wrote:On Sunday, 14 June 2020 at 22:57:25 UTC, Jin wrote:My wheels are quite simple: https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d...received 100000000 messages in 10011 msec sum=4999999950000000 speed=9989 msg/msecSince you are using a 1-provider-1-consumer queue, can you try this package as a drop-in replacement in go.d: https://github.com/MartinNowak/lock-free/blob/master/src/lock_free/rwqueue.d (you may need to add align() to that implementation). According to my test: received 1000000000 messages in 9845 msec sum=499999999500000000 speed=101574 msg/msec You may get a ~10x boost.I have added a new release and a PR to your repo. But I don't think it's a good idea to replace jin.go.queue with lock_free.rwqueue because: 1. My API is std.range compatible. 2. I use the same API for queues and channels. 3. My API supports finalization (by the provider, the consumer, or both). 4. Your queue is fixed-size, but my channels operate on a set of queues that can have different sizes. But I would steal some optimizations (see the sketch after this post): 1. Power-of-2 capacity and a bitop instead of division. It can add ~10% performance. 2. Messages should be at least a cache line in size to prevent false sharing. You are using atomicLoad!(MemoryOrder.acq) there: https://github.com/MartinNowak/lock-free/blob/master/src/lock_free/rwqueue.d#L41 Is this really required? The CPU can't reorder dependent statements.
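A minimal sketch of optimization (1): with a power-of-two capacity, the modulo in the slot calculation reduces to a single bitwise AND (illustrative code, not jin.go's):

```d
enum Capacity = 1024; // must be a power of two
static assert((Capacity & (Capacity - 1)) == 0);

size_t slot(size_t offset)
{
    return offset & (Capacity - 1); // same result as offset % Capacity, no division
}
```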
Jun 15 2020
On Monday, 15 June 2020 at 09:09:23 UTC, Jin wrote:I have added a new release and a PR to your repo.On the same machine, go.d: received 100000000 messages in 2906 msec sum=4999999950000000 speed=34411 msg/msec so it's ~2.7x faster than Java: https://github.com/mingwugmail/liblfdsd/tree/master/comparison And with your https://github.com/nin-jin/go.d, on my machine go.d is 2~4 times slower than Go: 9.7638ms (Go) vs. [19 ms ~ 40 ms] (go.d). Go's speed is consistent across multiple runs; for go.d there can be a 2x difference, maybe because the scheduler is unstable?But I don't think it's a good idea to replace jin.go.queue with lock_free.rwqueue because:I just want to do a comparison.You are using atomicLoad!(MemoryOrder.acq) there: https://github.com/MartinNowak/lock-free/blob/master/src/lock_free/rwqueue.d#L41 Is this really required? The CPU can't reorder dependent statements.It's not mine, but MartinNowak's. The implementation is based on https://www.codeproject.com/Articles/43510/Lock-Free-Single-Producer-Single-Consumer-Circular
Jun 15 2020
On Monday, 15 June 2020 at 22:04:49 UTC, mw wrote:On Monday, 15 June 2020 at 09:09:23 UTC, Jin wrote:There's a SPMC/MPSC queue implementation here: https://github.com/weka-io/mecca/blob/master/src/mecca/containers/otm_queue.d that may be also interesting for you guys to check.I have added a new release and a PR to your repo.on my same machine, go.d: received 100000000 messages in 2906 msec sum=4999999950000000 speed=34411 msg/msec so, it's ~2.7x faster than Java: https://github.com/mingwugmail/liblfdsd/tree/master/comparison And your https://github.com/nin-jin/go.d on my machine go.d is 2~4 times slower than Go. 9.7638ms (Go) v.s [19 ms ~ 40 ms] (go.d) Go's speed is consistent in multiple runs, for go.d it can be 2x difference, maybe because of the scheduler is unstable?But I don't think it's a good idea to replace jin.go.queue by lock_free.rwqueue because:I just want to do a comparison.You are using atomicLoad!(MemoryOrder.acq) at there: https://github.com/MartinNowak/lock-free/blob/master/src/lock_free/rwqueue.d#L41 Is this really required? CPU can't reorder dependent statements.It's not mine, but MartinNowak's. The implementation is based on https://www.codeproject.com/Articles/43510/Lock-Free-Single-Producer-Single-Consumer-Circular
Jun 16 2020
Hello everyone, I've done a little refactoring and optimization of [jin.go](https://github.com/nin-jin/go.d):

- I got rid of the vibe.d dependency because it's slow and big, and I haven't been able to get it working with version 2. When running only 1000 vibe fibers, not only did the application crash, but even the graphics driver crashed once, which required restarting the laptop.
- So far, I've settled on native threads with a small stack size (4 KB).
- I'm really looking forward to [photon's](https://github.com/nin-jin/go.d/issues/7) stabilization to get fiber support back. It would be really awesome to see it in the standard library.
- I had to abandon move semantics because I couldn't get them working with delegates. Currently, the number of references to the queue is controlled by the copy constructor.

Good news! After all the optimizations, the channels show impressive speed in the benchmark above for pumping messages between two threads.

```d
import std.datetime.stopwatch;
import std.range;
import std.stdio;
import jin.go;

const long n = 100_000_000;

auto threadProducer()
{
    return n.iota;
}

void main()
{
    auto queue = go!threadProducer;

    StopWatch sw;
    sw.start();
    long sum = 0;
    foreach (p; queue)
    {
        sum += p;
    }
    sw.stop();

    writefln("received %d messages in %d msec sum=%d speed=%d msg/msec",
            n, sw.peek.total!"msecs", sum, n / sw.peek.total!"msecs");

    assert(sum == (n * (n - 1) / 2));
}
```

```sh
received 100000000 messages in 718 msec sum=4999999950000000 speed=139275 msg/msec
```

I've almost caught up with Go in [my goroutines benchmark](https://github.com/nin-jin/go.d/blob/master/compare.cmd):

received 100000000 messages in 2906 msec sum=4999999950000000 speed=34411 msg/msec so it's ~2.7x faster than Java: https://github.com/mingwugmail/liblfdsd/tree/master/comparison And with your https://github.com/nin-jin/go.d, on my machine go.d is 2~4 times slower than Go: 9.7638ms (Go) vs. [19 ms ~ 40 ms] (go.d)

```sh
go run app.go --release
Workers Result Time
8 49995000000 109.7644ms
dub --quiet --build=release
Workers Result Time
0 49995000000 124 ms
```

Bad news. Sometimes I get incorrect results and I can't figure out why.

```sh
dub --quiet --build=release
Workers Result Time
0 49945005000 176 ms
```

I use atomic acquire and release operations; although they are not required on x86, I hope the compiler takes them into account and does not reorder instructions. But even with stricter memory barriers, I don't get a very stable result. If someone can tell me what could be wrong here, I would be very grateful.
Jan 02
On Thursday, 2 January 2025 at 19:59:26 UTC, Jin wrote:If someone can tell me what could be wrong here, I would be very grateful.https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d#L67 A concurrent `put` and `popFront` will do nothing to avoid races in `Queue`. Individual access to _offset is atomic, but its point of use in both put and popFront is not. Both functions look like this: 1. Atomic-read-offset 2. Anything can happen 3. Atomic-write-offset If one function has completed (1) while the other has completed (1)+(2)+(3), then you get a race. Chaining two atomic operations doesn't make the whole thing atomic; a rather classic mistake with mutual exclusion.
Jan 02
On Friday, 3 January 2025 at 00:17:27 UTC, Guillaume Piolat wrote:On Thursday, 2 January 2025 at 19:59:26 UTC, Jin wrote:https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d#L67 A concurrent `put` and `popFront` will do nothing to avoid races in `Queue`. Individual access to _offset is atomic, but its point of use in both put and popFront is not. Both functions look like this: 1. Atomic-read-offset 2. Anything can happen 3. Atomic-write-offset If one function has completed (1) while the other has completed (1)+(2)+(3), then you get a race. Chaining two atomic operations doesn't make the whole thing atomic; a rather classic mistake with mutual exclusion.I didn't really understand what you were talking about. After checking pending/available, we are guaranteed to have the opportunity to take the next step on the consumer/provider side. Therefore, we do our job and then increase our offset.
Jan 03
On 1/2/2025 4:17 PM, Guillaume Piolat via Digitalmars-d wrote:On Thursday, 2 January 2025 at 19:59:26 UTC, Jin wrote:https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d#L67 A concurrent `put` and `popFront` will do nothing to avoid races in `Queue`. Individual access to _offset is atomic, but its point of use in both put and popFront is not. Both functions look like this: 1. Atomic-read-offset 2. Anything can happen 3. Atomic-write-offset If one function has completed (1) while the other has completed (1)+(2)+(3), then you get a race. Chaining two atomic operations doesn't make the whole thing atomic; a rather classic mistake with mutual exclusion.Probably the most common atomicity error is an atomic check followed by non-atomic action based on that check. The problem that is overlooked in that scenario is that while the check itself is safe, by the time the action proceeds the condition can have changed. The only safe way is to combine the check and the action into a single atomic transaction. The next most common is probably the ABA issue: assuming that because you see A the second time, what you're seeing is still the first A, which would imply that B couldn't and didn't happen. The lesson here is that you're far better off NOT being clever and trying to avoid longer-lived locks unless you can demonstrate that it's particularly important and detrimental to the app's performance. It's super easy to get separate/tiny atomic operations wrong, and it's much harder to detect/debug that than to get it right at a slightly higher cost with simple multi-instruction locking. Not to mention that CPU and OS designers have invested energy in improving the performance of mutex- and spinlock-style code. There's a time and a place for cleverness, but not at the expense of correctness. My 2 cents, Brad
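A minimal sketch of the check-then-act hazard and the fused CAS version, using core.atomic on an illustrative counter rather than the queue code:

```d
import core.atomic : atomicLoad, atomicStore, cas;

shared int counter;

// Racy check-then-act: the condition can stop holding between the
// load and the store, so two threads can both pass the check.
void racyIncrementIfBelow(int limit)
{
    if (atomicLoad(counter) < limit)                   // atomic check...
        atomicStore(counter, atomicLoad(counter) + 1); // ...non-atomic act
}

// Check and act fused into one atomic transaction via a CAS retry loop.
bool incrementIfBelow(int limit)
{
    for (;;)
    {
        immutable old = atomicLoad(counter);
        if (old >= limit)
            return false;  // condition failed: give up
        if (cas(&counter, old, old + 1))
            return true;   // nobody interfered between check and act
        // lost the race: reload and retry
    }
}
```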
Jan 03
On Friday, 3 January 2025 at 00:17:27 UTC, Guillaume Piolat wrote:On Thursday, 2 January 2025 at 19:59:26 UTC, Jin wrote:https://github.com/nin-jin/go.d/blob/master/source/jin/go/queue.d#L67 A concurrent `put` and `popFront` will do nothing to avoid races in `Queue`. ... Chaining two atomic operations doesn't make the whole thing atomic; a rather classic mistake with mutual exclusion.It's a single-producer, single-consumer queue, so only one thread pushes and one thread pulls. So the "2. Anything can happen" doesn't apply, since there's only one thread actually pushing to the queue. The consumer thread only reads producer.offset, so it can't interfere with pushes, and it only sees a pushed message once producer.offset is updated.
Jan 03
On Thursday, 2 January 2025 at 19:59:26 UTC, Jin wrote:Have you considered not wrapping the offsets, and only taking the modulus with the length when you index into the messages? It'll simplify the math, i.e.: messagesInQueue = producer.offset - consumer.offset available = (Length - messagesInQueue - 1)I use the atomic acquire and release operations, although they are not required on x86, but I hope the compiler takes them into account and does not reorder instructions. But even with stricter memory barriers, I don't get a very stable result. If someone can tell me what could be wrong here, I would be very grateful.received 100000000 messages in 2906 msec sum=4999999950000000
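A minimal sketch of that arithmetic with free-running offsets; unsigned wraparound keeps the subtraction correct even after the counters overflow, and the modulus is paid only when indexing:

```d
enum Length = 1024;

size_t messagesInQueue(size_t producerOffset, size_t consumerOffset)
{
    return producerOffset - consumerOffset; // valid even across wraparound
}

size_t available(size_t producerOffset, size_t consumerOffset)
{
    // Mirrors the formula above, which keeps one slot spare; with
    // free-running offsets "count == Length" could also mark full.
    return Length - messagesInQueue(producerOffset, consumerOffset) - 1;
}

size_t slotIndex(size_t offset)
{
    return offset % Length; // reduce modulo Length only when indexing
}
```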
Jan 03
On Thursday, 2 January 2025 at 19:59:26 UTC, Jin wrote:Since you are essentially testing the speed of your spsc ring queue, I would just benchmark the queue directly. Don't think there is much room for improvement unless you adopt disruptor design and switch to batch consumption. That said, I do see some redundant offset loads and stores in the code the optimiser might miss. Testing just the queue might also simplify and reduce the surface area to find the error.Hello everyone, I've done a little refactoring and optimization of [jin.go](https://github.com/nin-jin/go.d): - I got rid of the vibe.d dependency because it's slow, big, and I haven't been able to make friends with version 2. When running only 1000 vibe-fibers, not only did the application crash, but even the graphics system driver crashed once, which required restarting the laptop. - So far, I've settled on native streams with a small stack size (4 kb). - I'm really looking forward to [photon's](https://github.com/nin-jin/go.d/issues/7) stabilization to get fiber support back. It would be really awesome to see it in the standard library. - I had to abandon the move semantics because I couldn't make friends with the delegates. Currently, the number of references to the queue is controlled by the copy constructor. Good news! After all the optimizations, the channels show impressive speed in the above benchmark for pumping messages between two streams.received 100000000 messages in 2906 msec sum=4999999950000000 speed=34411 msg/msec so, it's ~2.7x faster than Java: https://github.com/mingwugmail/liblfdsd/tree/master/comparison And your https://github.com/nin-jin/go.d on my machine go.d is 2~4 times slower than Go. 9.7638ms (Go) v.s [19 ms ~ 40 ms] (go.d)
Jan 04
On Thursday, 2 January 2025 at 19:59:26 UTC, Jin wrote:Bad news. Sometimes I get incorrect results and I can't figure out why. I use the atomic acquire and release operations, although they are not required on x86, but I hope the compiler takes them into account and does not reorder instructions. But even with stricter memory barriers, I don't get a very stable result. If someone can tell me what could be wrong here, I would be very grateful.I found a bug. I first checked if there was anything in the queue, and if not, I checked if the queue was finalized. Sometimes the queue would fill up between these two steps and I would lose data. I moved the finalization check to the beginning and now everything is stable.
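A minimal sketch of the corrected ordering (illustrative fields, not jin.go's exact code): the closed flag is loaded before the emptiness test, so a fill that lands between the two loads is still seen as pending data:

```d
import core.atomic : atomicLoad, MemoryOrder;

// End-of-stream only when the queue was already closed *before* we
// observed it empty; checking "empty, then closed" can race with a
// final put that slips in between the two loads.
bool done(ref shared bool closed, ref shared size_t pending)
{
    immutable wasClosed = atomicLoad!(MemoryOrder.acq)(closed);     // first
    immutable isEmpty = atomicLoad!(MemoryOrder.acq)(pending) == 0; // second
    return wasClosed && isEmpty;
}
```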
Jan 06
On Monday, 6 January 2025 at 10:15:39 UTC, Jin wrote:I found a bug. I first checked if there was anything in the queue, and if not, I checked if the queue was finalized. Sometimes the queue would fill up between these two steps and I would lose data. I moved the finalization check to the beginning and now everything is stable.I looked into the core.atomic code and was very disappointed. I expected to see memory barriers (necessary for wait-free), but I saw a CAS-spinlock there (and this is lock-free): ```d asm pure nothrow @nogc @trusted { naked; push RBX; mov R8, RDX; mov RAX, [RDX]; mov RDX, 8[RDX]; mov RBX, [RCX]; mov RCX, 8[RCX]; L1: lock; cmpxchg16b [R8]; jne L1; pop RBX; ret; } ```
Jan 08
On Wednesday, 8 January 2025 at 10:10:55 UTC, Jin wrote:On Monday, 6 January 2025 at 10:15:39 UTC, Jin wrote:I found a bug. I first checked if there was anything in the queue, and if not, I checked if the queue was finalized. Sometimes the queue would fill up between these two steps and I would lose data. I moved the finalization check to the beginning and now everything is stable.I looked into the core.atomic code and was very disappointed. I expected to see memory barriers (necessary for wait-free), but I saw a CAS-spinlock there (and this is lock-free):Technically it is lock-free but not wait-free. No thread can actually hold it like a regular lock; the CAS either succeeds or fails in one go, so you are always guaranteed that at least one thread will progress.
Jan 08
On Wednesday, 8 January 2025 at 10:16:58 UTC, claptrap wrote:Technically it is lock-free but not wait-free. No thread can actually hold it like a regular lock; the CAS either succeeds or fails in one go, so you are always guaranteed that at least one thread will progress.For wait-free in general and cyclic buffers in particular, it is important that operations are ordered (first the buffer operations, then the offset updates). CAS does not do anything useful in this case, as it always succeeds (only one thread writes to the offset), but the other operations can be rearranged.
Jan 08
On 08/01/2025 11:10 PM, Jin wrote:On Monday, 6 January 2025 at 10:15:39 UTC, Jin wrote:I found a bug. I first checked if there was anything in the queue, and if not, I checked if the queue was finalized. Sometimes the queue would fill up between these two steps and I would lose data. I moved the finalization check to the beginning and now everything is stable.I looked into the core.atomic code and was very disappointed. I expected to see memory barriers (necessary for wait-free), but I saw a CAS-spinlock there (and this is lock-free): ```d asm pure nothrow @nogc @trusted { naked; push RBX; mov R8, RDX; mov RAX, [RDX]; mov RDX, 8[RDX]; mov RBX, [RCX]; mov RCX, 8[RCX]; L1: lock; cmpxchg16b [R8]; jne L1; pop RBX; ret; } ```Dmd will not inline functions with inline assembly, and any function call should prevent reordering on the CPU side. So any ordering concern you have shouldn't matter for dmd; it's ldc and gdc that you need to be worried about.
Jan 08
On Wednesday, 8 January 2025 at 11:37:48 UTC, Richard (Rikki) Andrew Cattermole wrote:Dmd will not inline functions with inline assembly, any function calls should prevent reordering cpu side. So any concern for ordering you have shouldn't matter for dmd, its ldc and gdc that you need to be worried about.Visible reordering can occur due to the asynchronous nature of inter-core communication, which is relevant for ARM and other architectures. So it looks like we need macros that will insert inline opcodes for memory barriers: ```d writeToBuffer; mixin(Store_Store); writeToOffset; ``` ```d readFromBuffer; mixin(Load_Store); writeToOffset; ```
Jan 08
On 09/01/2025 1:01 AM, Jin wrote:On Wednesday, 8 January 2025 at 11:37:48 UTC, Richard (Rikki) Andrew Cattermole wrote:Not macros, what you want is intrinsics, this is how core.atomics works for ldc/gdc. In this case ``atomicFence``. https://dlang.org/phobos/core_atomic.html#.atomicFenceDmd will not inline functions with inline assembly, any function calls should prevent reordering cpu side. So any concern for ordering you have shouldn't matter for dmd, its ldc and gdc that you need to be worried about.Visible reordering can occur due to the asynchronous nature of inter- core communication, which is relevant for ARM and other architectures. So it looks like we need macros that will insert inline opcodes for memory barriers: ```d writeToBuffer; mixin(Store_Store); writeToOffset; ``` ```d readFromBuffer; mixin(Load_Store); writeToOffset; ```
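For reference, a minimal sketch of the two barriers from the previous post expressed with that intrinsic, on an illustrative buffer rather than jin.go's code:

```d
import core.atomic : atomicFence, atomicLoad, atomicStore, MemoryOrder;

shared int[64] buffer;
shared size_t tail;

// Producer side of a ring: payload first, fence, then publish the offset.
void produce(int value)
{
    immutable t = atomicLoad!(MemoryOrder.raw)(tail);
    atomicStore!(MemoryOrder.raw)(buffer[t % 64], value); // write payload
    atomicFence(); // full barrier here; a release fence would also do
    atomicStore!(MemoryOrder.raw)(tail, t + 1);           // publish
}
```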
Jan 08
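For reference, a minimal sketch of how the producer step from the post above might look with ``atomicFence``; `publish`, `buffer`, and `tail` here are illustrative stand-ins for the writeToBuffer/writeToOffset placeholders, not real go.d names:

```d
import core.atomic : atomicFence, atomicLoad, atomicStore, MemoryOrder;

__gshared int[64] buffer;  // shared between threads (module-level
                           // variables are TLS by default in D)
shared size_t tail;        // the producer's write offset

void publish(int value)
{
    immutable t = atomicLoad!(MemoryOrder.raw)(tail); // only we write it
    buffer[t % buffer.length] = value;   // writeToBuffer
    atomicFence!(MemoryOrder.rel)();     // Store-Store barrier on ARM;
                                         // compiler-only on x86
    atomicStore!(MemoryOrder.raw)(tail, t + 1); // writeToOffset
}
```

In practice the fence can usually be folded into the offset store itself via `atomicStore!(MemoryOrder.rel)`, which emits the same barrier on architectures that need one.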
On Wednesday, 8 January 2025 at 12:10:19 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Not macros; what you want is intrinsics, which is how core.atomic works for LDC/GDC. In this case, ``atomicFence``.

Unfortunately, waiting for support in all the compilers would take too long. A library could be implemented right now.
Jan 13
On Monday, 25 May 2020 at 16:26:31 UTC, Jin wrote:
> On Saturday, 16 May 2020 at 20:06:47 UTC, mw wrote:
>> On Tuesday, 29 March 2016 at 17:10:02 UTC, Jin wrote:
>>> http://wiki.dlang.org/Go_to_D
>> Any performance comparison with Go? esp. in real world scenarios? Can it easily handle hundreds of (go)routines?
> I have updated the code. But it isn't ready to use currently, because:
> 1. I rewrote the code to use std.parallelism instead of vibe.d, so it's difficult to integrate fibers with tasks. Now every task spin-waits on its channel, and the main thread does no useful work.
> 2. There is a race condition. I'm going to review the algorithm closely.
> Currently it's twice as slow as Go. On my machine:
> go run app.go --release
> Workers Result     Time
> 4       499500000  27.9226ms
> dub --quiet --build=release
> Workers Result     Time
> 3       499500000  64 ms
> It would be cool if someone could help me with it. There are docstrings, tests, and diagrams. I'll explain more if someone joins.

I have fixed all the issues, and it's usable now. But I had to bring back the vibe-core dependency, and now it has slowed down:

.\compare.cmd
go run app.go --release
Workers Result      Time
4       4999500000  25.9163ms
dub --quiet --build=release
Workers Result      Time
4       4999500000  116 ms

And I had to reduce the count of "threads" to 100, because vibe-core fails at 1000.

I have also created a thread on dlang/projects with an explanation of my vision of concurrency in D: https://github.com/dlang/projects/issues/65
Jun 14 2020
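The spin-waiting issue described in the post above is the usual reason channel libraries end up coupled to a fiber scheduler: a blocked receive should hand its thread to another task instead of burning CPU. A minimal sketch of that idea, assuming a non-blocking `pop` like the queue sketched earlier and execution inside a fiber:

```d
import core.thread : Fiber;

// Blocking receive built on a non-blocking queue: instead of
// spinning, park this fiber so the scheduler can run other tasks.
// Must be called from within a fiber.
T receive(T, Q)(ref Q queue)
{
    T value;
    while (!queue.pop(value))
        Fiber.yield(); // cooperative: let other tasks make progress
    return value;
}
```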
On Sunday, 14 June 2020 at 14:24:29 UTC, Jin wrote:
> I have fixed all the issues, and it's usable now. But I had to bring back the vibe-core dependency, and now it has slowed down. [...]

I haven't checked your implementation, or vibe's, but I rediscovered that D's message passing is ~4 times slower than Java's:

https://forum.dlang.org/thread/mailman.148.1328778563.20196.digitalmars-d@puremagic.com?page=4

Is this the same problem in GoD?
Jun 14 2020
On Sunday, 27 March 2016 at 18:17:55 UTC, Jin wrote:
> DUB module: http://code.dlang.org/packages/jin-go
> GIT repo: https://github.com/nin-jin/go.d
> [...]

Have you considered using a Disruptor (http://lmax-exchange.github.io/disruptor/) for the channels? Not sure how it compares to what you're using from Vibe.d, but it's not a hard data structure to implement and, IIRC, it allows for multiple producers and consumers.
Mar 30 2016
On Wednesday, 30 March 2016 at 14:28:50 UTC, Casey Sybrandy wrote:
> Have you considered using a Disruptor (http://lmax-exchange.github.io/disruptor/) for the channels? Not sure how it compares to what you're using from Vibe.d, but it's not a hard data structure to implement and, IIRC, it allows for multiple producers and consumers.

Oh, and yes, I know that it would have to be rewritten in D unless there's a C version somewhere. I actually did it once and it wasn't too bad. I don't think I have a copy anymore, but if I do find it, I can put it up somewhere.
Mar 30 2016
On Wednesday, 30 March 2016 at 15:22:26 UTC, Casey Sybrandy wrote:
> Oh, and yes, I know that it would have to be rewritten in D unless there's a C version somewhere. I actually did it once and it wasn't too bad. I don't think I have a copy anymore, but if I do find it, I can put it up somewhere.

This is Java bloatware. :-(
Mar 30 2016
On Wednesday, 30 March 2016 at 15:50:47 UTC, Jin wrote:
> This is Java bloatware. :-(

I've never used the library, so I can't comment on that, but the actual data structure/algorithm is really pretty simple. The core components are atomic counters and a static array. I think it would be a good data structure for channels.
Mar 30 2016
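A rough sketch of that idea: multiple producers claim slots in a static array with an atomic fetch-and-add on a sequence counter, then publish each slot once the write is done. This is only the claiming half of a Disruptor-style ring, under the assumption of a power-of-two size; consumer coordination and wrap-around handling are omitted:

```d
import core.atomic : atomicOp, atomicStore, MemoryOrder;

struct Ring(T, size_t N) // N must be a power of two
{
    static assert((N & (N - 1)) == 0);

    T[N] slots;               // the static array
    shared bool[N] published; // per-slot "ready" flags for consumers
    shared size_t next;       // next sequence number to claim

    void put(T value) // safe for multiple producers
    {
        // Claim a unique sequence number with an atomic counter...
        immutable seq = atomicOp!"+="(next, 1) - 1;
        slots[seq & (N - 1)] = value;
        // ...and only then make the slot visible to consumers.
        atomicStore!(MemoryOrder.rel)(published[seq & (N - 1)], true);
    }
}
```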
On Wed, 2016-03-30 at 17:01 +0000, Casey Sybrandy via Digitalmars-d wrote:
> I've never used the library, so I can't comment on that, but the actual data structure/algorithm is really pretty simple. The core components are atomic counters and a static array. I think it would be a good data structure for channels.

If I recollect correctly, the core data structure is a lock-free ring buffer, and the parallelism "trick" is the use of multicast with atomic indexes. This works fine for the problem of creating a trading framework, but I suspect the architecture is just too big for realizing channels.

In particular, the really important thing about channels over (thread|process)-safe queues is the ability to select. I have no idea how select is implemented on Windows, but the classic POSIX approach is to use file descriptors to represent the queues and the select or epoll system calls to get the kernel to realize the select. As to how JCSP does select on the JVM, I shall have to go and delve into the source code…

--
Russel.
Apr 01 2016
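Absent kernel support, a user-space select over fiber-backed channels can be approximated by polling each queue and yielding between rounds. A minimal sketch, assuming a non-blocking `empty` check on the channel type and execution inside a fiber:

```d
import core.thread : Fiber;

// Returns the index of the first channel with data available,
// yielding the current fiber while everything is empty.
size_t select(Q)(Q[] channels)
{
    for (;;)
    {
        foreach (i, ref ch; channels)
            if (!ch.empty)
                return i; // this channel is ready to read
        Fiber.yield();    // nothing ready; run other tasks
    }
}
```

A kernel-backed version, as described above, would instead give each queue a file descriptor (e.g. an eventfd on Linux) and hand the whole set to epoll.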
On Wed, 2016-03-30 at 15:50 +0000, Jin via Digitalmars-d wrote:
> On Wednesday, 30 March 2016 at 15:22:26 UTC, Casey Sybrandy wrote:
>> Oh, and yes, I know that it would have to be rewritten in D unless there's a C version somewhere. I actually did it once and it wasn't too bad. I don't think I have a copy anymore, but if I do find it, I can put it up somewhere.
> This is Java bloatware. :-(

I think that is probably just slander. Whilst there are some known problems with the Disruptor, blithely labelling it "Java bloatware" is most likely an uneducated, and probably ill-judged, comment.

--
Russel.
Apr 01 2016
While having a CSP-style mechanism, with cheap stackful tasks/procs/coroutines and channels, has merit, pursuing a performance test with 1000 (or more) trivial instances of such tasks seems misguided.

In the last couple of years, I used Go to design and implement a subsystem as a process in CSP style, with many goroutines. Even there I only had on the order of 20-40 goroutines present at once, plus around 4-8 additional goroutines for each client being served simultaneously. It may have been theoretically possible to cause that to spawn on the order of 64k goroutines if stressed the "correct" way in an absurd deployment, but that would be improbable; a maximum of around 500 would be more likely, and even there performance was gated on Internet RTTs. I'll have to re-read the code next week to get the actual number, but I expect it to be on the order of 30 or so.
Jan 04