
digitalmars.D - Go's march to low-latency GC

reply Enamex <enamex+d outlook.com> writes:
https://news.ycombinator.com/item?id=12042198

^ reposting a link in the right place.
Jul 07 2016
next sibling parent reply ikod <geller.garry gmail.com> writes:
On Thursday, 7 July 2016 at 22:36:29 UTC, Enamex wrote:
 https://news.ycombinator.com/item?id=12042198

 ^ reposting a link in the right place.
 While a program using 10,000 OS threads might perform poorly, 
 that number of goroutines is nothing unusual. One difference is 
 that a goroutine starts with a very small stack — only 
 2kB — which grows as needed, contrasted with the large 
 fixed-size stacks that are common elsewhere. Go’s function call 
 preamble makes sure there’s enough stack space for the next 
 call, and if not will move the goroutine’s stack to a larger 
 memory area — rewriting pointers as needed — before allowing 
 the call to continue.
Correct me if I'm wrong, but in D fibers allocate stack statically, so we have to preallocate large stacks. If yes - can we allocate stack frames on demand from some non-GC area?
Jul 07 2016
parent reply Martin Nowak <code+news.digitalmars dawg.eu> writes:
On 07/08/2016 07:45 AM, ikod wrote:
 Correct me if I'm wrong, but in D fibers allocate stack statically, so
 we have to preallocate large stacks.
 
 If yes - can we allocate stack frames on demand from some non-GC area?
Fiber stacks are just mapped virtual memory pages that the kernel only backs with physical memory when they're actually used. So they already are allocated on demand.
Jul 08 2016
next sibling parent reply ikod <garry.geller gmail.com> writes:
On Friday, 8 July 2016 at 20:35:05 UTC, Martin Nowak wrote:
 On 07/08/2016 07:45 AM, ikod wrote:
 Correct me if I'm wrong, but in D fibers allocate stack 
 statically, so we have to preallocate large stacks.
 
 If yes - can we allocate stack frames on demand from some 
 non-GC area?
Fiber stacks are just mapped virtual memory pages that the kernel only backs with physical memory when they're actually used. So they already are allocated on demand.
But the size of the fiber's stack is fixed? When we call the Fiber constructor, the second parameter is the stack size. If I guess wrong and ask for too small a stack, the program may crash. If I ask for too large a stack, I probably waste resources. So it would be nice if the programmer were not forced to make any wrong decisions about a fiber's stack size. Or maybe I'm wrong, and I shouldn't care about the stack size when I create a new fiber?
Jul 08 2016
parent reply Dicebot <public dicebot.lv> writes:
On 07/09/2016 02:48 AM, ikod wrote:
 If I made a wrong guess and
 ask for too small stack then programm may crash. If I ask for too large
 stack then I probably waste resources.
Nope, this is exactly the point. You can demand a crazy 10 MB of stack for each fiber, and only the actually used part will be allocated by the kernel.
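Concretely, the reservation is just the second argument of `core.thread.Fiber`'s constructor. A minimal sketch (the 10 MB figure is only an illustration of the point above): the constructor reserves address space, and the kernel commits physical pages only as the fiber's code actually touches them.

```d
import core.thread : Fiber;

void worker()
{
    int[4096] scratch;      // touches a few stack pages, committing them
    scratch[0] = 42;
    Fiber.yield();          // suspend; control returns to the caller
    assert(scratch[0] == 42);
}

void main()
{
    // Ask for a "crazy" 10 MB stack. This reserves virtual address
    // space only; physical memory is backed lazily, on first touch.
    auto f = new Fiber(&worker, 10 * 1024 * 1024);

    f.call();                            // runs until the yield
    assert(f.state == Fiber.State.HOLD);
    f.call();                            // resumes and finishes
    assert(f.state == Fiber.State.TERM);
}
```

Spawning thousands of such fibers would be untenable if each reservation were eagerly committed; lazy backing is what makes a generous size safe (on 64-bit, at least).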
Jul 09 2016
next sibling parent ikod <geller.garry gmail.com> writes:
On Saturday, 9 July 2016 at 13:48:41 UTC, Dicebot wrote:
 On 07/09/2016 02:48 AM, ikod wrote:
 If I made a wrong guess and
 ask for too small stack then programm may crash. If I ask for 
 too large
 stack then I probably waste resources.
Nope, this is exactly the point. You can demand crazy 10 MB of stack for each fiber and only the actually used part will be allocated by kernel.
Thanks, nice to know.
Jul 09 2016
prev sibling parent reply Sergey Podobry <sergey.podobry gmail.com> writes:
On Saturday, 9 July 2016 at 13:48:41 UTC, Dicebot wrote:
 On 07/09/2016 02:48 AM, ikod wrote:
 If I made a wrong guess and
 ask for too small stack then programm may crash. If I ask for 
 too large
 stack then I probably waste resources.
Nope, this is exactly the point. You can demand crazy 10 MB of stack for each fiber and only the actually used part will be allocated by kernel.
Remember that virtual address space is limited on 32-bit platforms. Spawning 2000 threads with a 1 MB stack each reserves ~2 GB, the entire user-mode address space on a typical 32-bit OS, so you'll get an allocation failure even if the real memory usage is low.
Jul 10 2016
parent reply Dicebot <public dicebot.lv> writes:
On Sunday, 10 July 2016 at 19:49:11 UTC, Sergey Podobry wrote:
 On Saturday, 9 July 2016 at 13:48:41 UTC, Dicebot wrote:
 Nope, this is exactly the point. You can demand crazy 10 MB of 
 stack for each fiber and only the actually used part will be 
 allocated by kernel.
Remember that virtual address space is limited on 32-bit platforms. Thus spawning 2000 threads 1 MB stack each will occupy all available VA space and you'll get an allocation failure (even if the real memory usage is low).
Sorry, but someone who tries to run highly concurrent server software with thousands of fibers on a 32-bit platform is quite unwise, and there is no point in taking such a use case into account. 32-bit has its own niche with different kinds of concerns.
Jul 11 2016
parent reply Sergey Podobry <sergey.podobry gmail.com> writes:
On Monday, 11 July 2016 at 11:23:26 UTC, Dicebot wrote:
 On Sunday, 10 July 2016 at 19:49:11 UTC, Sergey Podobry wrote:
 Remember that virtual address space is limited on 32-bit 
 platforms. Thus spawning 2000 threads 1 MB stack each will 
 occupy all available VA space and you'll get an allocation 
 failure (even if the real memory usage is low).
Sorry, but someone who tries to run highly concurrent server software with thousands of fibers on 32-bit platform is quite unwise and there is no point in taking such use case into account. 32-bit has its own niche with different kinds of concerns.
Agreed. I don't know why the golang guys bother with it.
Jul 11 2016
next sibling parent reply Russel Winder via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Mon, 2016-07-11 at 12:21 +0000, Sergey Podobry via Digitalmars-d
wrote:
 On Monday, 11 July 2016 at 11:23:26 UTC, Dicebot wrote:
 On Sunday, 10 July 2016 at 19:49:11 UTC, Sergey Podobry wrote:
 Remember that virtual address space is limited on 32-bit
 platforms. Thus spawning 2000 threads 1 MB stack each will
 occupy all available VA space and you'll get an allocation
 failure (even if the real memory usage is low).
 Sorry, but someone who tries to run highly concurrent server software with thousands of fibers on a 32-bit platform is quite unwise and there is no point in taking such a use case into account. 32-bit has its own niche with different kinds of concerns.
 Agreed. I don't know why golang guys bother about it.
Maybe because they are developing a language for the 1980s? ;-)

-- 
Russel.
Jul 11 2016
next sibling parent Ola Fosheim Grøstad writes:
On Monday, 11 July 2016 at 13:05:09 UTC, Russel Winder wrote:
 Maybe because they are developing a language for the 1980s?

 ;-)
It is quite common for web services to run with less than 1 GB. 64-bit would be very wasteful.
Jul 11 2016
prev sibling parent deadalnix <deadalnix gmail.com> writes:
On Monday, 11 July 2016 at 13:05:09 UTC, Russel Winder wrote:
 Agreed. I don't know why golang guys bother about it.
Because they have nothing else to propose than a massive goroutine orgy, so they kind of have to make it work.
 Maybe because they are developing a language for the 1980s?

 ;-)
It's not like they are using the Plan9 toolchain... Oh wait...
Jul 11 2016
prev sibling parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Monday, 11 July 2016 at 12:21:04 UTC, Sergey Podobry wrote:
 On Monday, 11 July 2016 at 11:23:26 UTC, Dicebot wrote:
 On Sunday, 10 July 2016 at 19:49:11 UTC, Sergey Podobry wrote:
 Remember that virtual address space is limited on 32-bit 
 platforms. Thus spawning 2000 threads 1 MB stack each will 
 occupy all available VA space and you'll get an allocation 
 failure (even if the real memory usage is low).
Sorry, but someone who tries to run highly concurrent server software with thousands of fibers on 32-bit platform is quite unwise and there is no point in taking such use case into account. 32-bit has its own niche with different kinds of concerns.
Agreed. I don't know why golang guys bother about it.
Because of attitudes like the one shown in that thread https://forum.dlang.org/post/ilbmfvywzktilhskpeoh forum.dlang.org from people who do not really understand why 32-bit systems are really problematic even if the apps don't use more than 2 GiB of memory. Here's Linus Torvalds' classic rant about 64-bit https://cl4ssic4l.wordpress.com/2011/05/24/linus-torvalds-about-pae/ (it's more about PAE, but the reasons why 64-bit is a good thing in general are the same: address space!)
Jul 11 2016
next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Monday, 11 July 2016 at 13:13:02 UTC, Patrick Schluter wrote:
 Because of attitudes like shown in that thread
 https://forum.dlang.org/post/ilbmfvywzktilhskpeoh forum.dlang.org
 people who do not really understand why 32 bit systems are a 
 really problematic even if the apps don't use more than 2 GiB 
 of memory.

 Here's Linus Torvalds classic rant about 64 bit
 https://cl4ssic4l.wordpress.com/2011/05/24/linus-torvalds-about-pae/  (it's
more about PAE but the reasons why 64 bits is a good thing in general are the
same: address space!)
Why can't you use both 32-bit and 64-bit pointers when compiling for x86_64? My guess would be that using 64-bit registers precludes the use of 32-bit registers.
Jul 11 2016
parent reply Ola Fosheim Grøstad writes:
On Monday, 11 July 2016 at 17:14:17 UTC, jmh530 wrote:
 On Monday, 11 July 2016 at 13:13:02 UTC, Patrick Schluter wrote:
 Because of attitudes like shown in that thread
 https://forum.dlang.org/post/ilbmfvywzktilhskpeoh forum.dlang.org
 people who do not really understand why 32 bit systems are a 
 really problematic even if the apps don't use more than 2 GiB 
 of memory.

 Here's Linus Torvalds classic rant about 64 bit
 https://cl4ssic4l.wordpress.com/2011/05/24/linus-torvalds-about-pae/  (it's
more about PAE but the reasons why 64 bits is a good thing in general are the
same: address space!)
Why can't you use both 32bit and 64bit pointers when compiling for x86_64? My guess would be that using 64bit registers precludes the use of 32bit registers.
You can, but OSes usually give you randomized memory layout as a security measure.
Jul 11 2016
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Monday, 11 July 2016 at 17:23:49 UTC, Ola Fosheim Grøstad 
wrote:
 You can, but OSes usually give you randomized memory layout as 
 a security measure.
What if the memory allocation scheme were something like this: randomly pick memory locations below some threshold from the 32-bit segment, and above the threshold pick from elsewhere?
Jul 12 2016
next sibling parent Ola Fosheim Grøstad writes:
On Tuesday, 12 July 2016 at 13:28:33 UTC, jmh530 wrote:
 On Monday, 11 July 2016 at 17:23:49 UTC, Ola Fosheim Grøstad 
 wrote:
 You can, but OSes usually give you randomized memory layout as 
 a security measure.
What if the memory allocation scheme were something like: randomly pick memory locations below some threshold from the 32bit segment and then above the threshold pick from elsewhere?
One possible technique is to use contiguous "unmapped" memory areas that cover your worst-case number of elements with a specific base, and just use indexes instead of absolute addressing. That way you can often get away with 16-bit typed addressing (assuming at most 65535 objects of a given type plus a null index). The base address may then be injected into the code segments (during linking, or by using self-modifying code if the OS allows it). Or you could use TLS + indexing, or whatever the OS supports.

Using global 64-bit pointers is just for generality and to keep the language implementation simple. It is not strictly hardware-related if you have an MMU, nor directly related to machine language as such. For a statically typed language you could probably get away with 16 or 32 bits for typed pointers most of the time, if the OS and language don't make it difficult (like the conservative D GC scan does).
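A rough sketch of the index-instead-of-pointer idea (all names here are hypothetical, not an existing API): keep objects of one type in a preallocated pool and hand out 16-bit indices instead of full 64-bit pointers, reserving index 0 as null.

```d
// Hypothetical sketch: 2-byte typed "pointers" as indices into a
// per-type pool, instead of 8-byte absolute addresses.
struct Handle(T)
{
    ushort idx;                        // 0 is reserved as the null index
    bool isNull() const { return idx == 0; }
}

struct Pool(T, ushort capacity = 1024)
{
    private T[capacity + 1] slots;     // slot 0 stays unused (null)
    private ushort next = 1;           // next free slot

    Handle!T alloc(T value)
    {
        assert(next <= capacity, "pool exhausted");
        slots[next] = value;
        return Handle!T(next++);
    }

    ref T deref(Handle!T h)
    {
        assert(!h.isNull);
        return slots[h.idx];
    }
}

void main()
{
    Pool!long pool;
    auto h = pool.alloc(42);
    assert(pool.deref(h) == 42);
    // The handle is a quarter the size of a native pointer on 64-bit:
    assert(Handle!long.sizeof == 2);
}
```

The trade-off is a fixed capacity per pool, and, as noted above, a conservative GC that scans for pointer-looking words would not recognize these indices as references.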
Jul 12 2016
prev sibling parent deadalnix <deadalnix gmail.com> writes:
On Tuesday, 12 July 2016 at 13:28:33 UTC, jmh530 wrote:
 On Monday, 11 July 2016 at 17:23:49 UTC, Ola Fosheim Grøstad 
 wrote:
 You can, but OSes usually give you randomized memory layout as 
 a security measure.
What if the memory allocation scheme were something like: randomly pick memory locations below some threshold from the 32bit segment and then above the threshold pick from elsewhere?
There is an mmap flag for this on Linux.
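Presumably this refers to `MAP_32BIT`, which on Linux/x86-64 asks `mmap` to place the mapping in the low 2 GB of the address space. A hedged sketch: druntime does not appear to expose that constant, so it is defined by hand below (0x40 in the kernel's x86 headers, an assumption worth double-checking); Linux-only.

```d
version (linux)
{
    import core.sys.posix.sys.mman
        : mmap, munmap, MAP_PRIVATE, MAP_ANON, MAP_FAILED,
          PROT_READ, PROT_WRITE;

    // Not exposed by druntime; value taken from Linux's x86 uapi headers.
    enum MAP_32BIT = 0x40;

    void main()
    {
        enum len = 4096;
        void* p = mmap(null, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANON | MAP_32BIT, -1, 0);
        assert(p != MAP_FAILED);
        // The mapping lands below the 4 GB mark (in fact below 2 GB).
        assert(cast(size_t) p < (1UL << 32));
        munmap(p, len);
    }
}
else
void main() {}
```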
Jul 12 2016
prev sibling parent Kagamin <spam here.lot> writes:
On Monday, 11 July 2016 at 13:13:02 UTC, Patrick Schluter wrote:
 (it's more about PAE but the reasons why 64 bits is a good 
 thing in general are the same: address space!)
And what's with address space?
Jul 12 2016
prev sibling parent Chris Wright <dhasenan gmail.com> writes:
On Fri, 08 Jul 2016 22:35:05 +0200, Martin Nowak wrote:

 On 07/08/2016 07:45 AM, ikod wrote:
 Correct me if I'm wrong, but in D fibers allocate stack statically, so
 we have to preallocate large stacks.
 
 If yes - can we allocate stack frames on demand from some non-GC area?
Fiber stacks are just mapped virtual memory pages that the kernel only backs with physical memory when they're actually used. So they already are allocated on demand.
The downside is that it's difficult to release that memory. On the other hand, Go had a lot of problems with its implementation in part because it released memory. At some point you start telling users: if you want a fiber that does a huge recursion, dispose of it when you're done. It's cheap enough to create another fiber later.
Jul 09 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/7/16 6:36 PM, Enamex wrote:
 https://news.ycombinator.com/item?id=12042198

 ^ reposting a link in the right place.
A very nice article and success story. We've had similar stories with several products at Facebook. There is of course the opposite view - an orders-of-magnitude improvement means there was quite a lot of waste just before that. I wish we could amass the experts able to make similar things happen for us. Andrei
Jul 09 2016
next sibling parent bob belcher <claudiu.garba gmail.com> writes:
On Saturday, 9 July 2016 at 17:41:59 UTC, Andrei Alexandrescu 
wrote:
 On 7/7/16 6:36 PM, Enamex wrote:
 https://news.ycombinator.com/item?id=12042198

 ^ reposting a link in the right place.
A very nice article and success story. We've had similar stories with several products at Facebook. There is of course the opposite view - an orders-of-magnitude improvement means there was quite a lot of waste just before that. I wish we could amass the experts able to make similar things happen for us. Andrei
A kickstarter to improve the GC :)
Jul 09 2016
prev sibling next sibling parent reply Martin Nowak <code dawg.eu> writes:
On Saturday, 9 July 2016 at 17:41:59 UTC, Andrei Alexandrescu 
wrote:
 On 7/7/16 6:36 PM, Enamex wrote:
 https://news.ycombinator.com/item?id=12042198

 ^ reposting a link in the right place.
A very nice article and success story. We've had similar stories with several products at Facebook. There is of course the opposite view - an orders-of-magnitude improvement means there was quite a lot of waste just before that.
Exactly; how someone can run a big site with 2-second GC pauses is beyond me.
 I wish we could amass the experts able to make similar things 
 happen for us.
We sort of have an agreement that we don't want to pay 5% for write barriers, so the common algorithmic GC improvements aren't available to us. There is still connectivity-based GC [¹], which is an interesting idea, but AFAIK it hasn't been widely tried. Maybe someone has an idea for optional write barriers, i.e. zero cost if you don't use them. Or we agree that it's worth having different, incompatible binaries.

[¹]: https://www.cs.purdue.edu/homes/hosking/690M/cbgc.pdf

In any case, now that we have made the GC pluggable, we should port the forking GC. It has almost no latency, at the price of higher peak memory usage and reduced throughput, the same trade-offs you have with any concurrent mark phase. Moving the sweeping to background GC threads is something we should be doing anyhow.

Overall I think we should focus more on good deterministic MM alternatives rather than investing years of engineering into our GC, or hoping for silver bullets.
Jul 09 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 07/09/2016 03:42 PM, Martin Nowak wrote:
 We sort of have an agreement that we don't want to pay 5% for write
 barriers, so the common algorithmic GC improvements aren't available for
 us.
Yah, I was thinking in a more general sense. Plenty of improvements of all kinds are within reach. -- Andrei
Jul 09 2016
parent Martin Nowak <code dawg.eu> writes:
On Saturday, 9 July 2016 at 23:12:10 UTC, Andrei Alexandrescu 
wrote:
 Yah, I was thinking in a more general sense. Plenty of 
 improvements of all kinds are within reach. -- Andrei
Yes, but hardly anything that would allow us to do partial collections. And without that you always have to scan the full live heap; that can't scale to bigger heaps, as there is no way to scan a GB-sized heap fast. So either we make it easy to get by with a small GC heap, i.e. more deterministic MM, or we spend a lot of time making some partial-collection algorithm work. Ideally we do both, but the former is a simpler goal. The connectivity-based GC would be a realistic goal as well, only somewhat more complex than the precise GC. But it's unclear how well it would work for typical applications.
Jul 10 2016
prev sibling next sibling parent reply Dejan Lekic <dejan.lekic gmail.com> writes:
On Saturday, 9 July 2016 at 17:41:59 UTC, Andrei Alexandrescu 
wrote:
 I wish we could amass the experts able to make similar things 
 happen for us.
I humbly believe it is not just about amassing experts, but also about making it easy to do experiments. Phobos/druntime should provide a set of APIs for literally everything so people can do their own implementations of ANY standard library module(s). I wish D offered module interfaces the same way Modula-3 did...

To work on a new GC in D one needs to remove the old one and replace it with a new implementation, while with competition it is more or less a matter of implementing a few interfaces and instructing the compiler to use the new GC...
Jul 09 2016
parent reply ZombineDev <petar.p.kirov gmail.com> writes:
On Saturday, 9 July 2016 at 21:25:34 UTC, Dejan Lekic wrote:
 On Saturday, 9 July 2016 at 17:41:59 UTC, Andrei Alexandrescu 
 wrote:
 I wish we could amass the experts able to make similar things 
 happen for us.
I humbly believe it is not just about amassing experts, but also making it easy to do experiments. Phobos/druntime should provide set of APIs for literally everything so people can do their own implementations of ANY standard library module(s). I wish D offered module interfaces the same way Modula-3 did... To work on new GC in D one needs to remove the old one, and replace it with his/her new implementation, while with competition it is more/less implementation of few interfaces, and instructing compiler to use the new GC...
https://github.com/dlang/druntime/blob/master/src/gc/gcinterface.d https://github.com/dlang/druntime/blob/master/src/gc/impl/manual/gc.d What else do you need to start working on a new GC implementation?
Jul 09 2016
parent Dejan Lekic <dejan.lekic gmail.com> writes:
On Saturday, 9 July 2016 at 23:14:38 UTC, ZombineDev wrote:
 https://github.com/dlang/druntime/blob/master/src/gc/gcinterface.d
 https://github.com/dlang/druntime/blob/master/src/gc/impl/manual/gc.d

 What else do you need to start working on a new GC 
 implementation?
That is actually the only case I know of where an interface was provided to be implemented by third parties... My reply was about Phobos in general. To repeat: Phobos should provide the API (interfaces) *and* reference implementations of those.
Sep 26 2016
prev sibling parent reply Istvan Dobos <stvn.dobos gmail.com> writes:
On Saturday, 9 July 2016 at 17:41:59 UTC, Andrei Alexandrescu 
wrote:
 On 7/7/16 6:36 PM, Enamex wrote:
 https://news.ycombinator.com/item?id=12042198

 ^ reposting a link in the right place.
A very nice article and success story. We've had similar stories with several products at Facebook. There is of course the opposite view - an orders-of-magnitude improvement means there was quite a lot of waste just before that. I wish we could amass the experts able to make similar things happen for us. Andrei
Hello Andrei,

May only be slightly related, but when you talked about D vs Go vs Rust in that Quora answer (here: https://www.quora.com/Which-language-has-the-brightest-future-in-replacement-of-C-between-D-Go-and-Rust-And-Why/answer/Andrei-Alexandrescu), I was thinking: okay, so D's GC seems to have turned out not that great. But how about the idea of transplanting Rust's ownership system instead of trying to make the GC better?

Disclaimer: I know very little about D's possibly similar mechanisms.

Thanks,
Istvan
Jul 14 2016
parent reply thedeemon <dlang thedeemon.com> writes:
On Thursday, 14 July 2016 at 10:58:47 UTC, Istvan Dobos wrote:
  I was thinking, okay, so D's GC seems to turned out not that 
 great. But how about the idea of transplanting Rust's ownership 
 system instead of trying to make the GC better?
This requires drastically changing 99% of the language, and it brings not just the benefits but also all the pain that comes with this ownership system. Productivity goes down, the learning curve goes up. And it would be a very different language in the end, so you might as well just use Rust instead of trying to make D another Rust.
Jul 16 2016
parent The D dude <thedlangdude gmail.com> writes:
On Saturday, 16 July 2016 at 11:02:00 UTC, thedeemon wrote:
 On Thursday, 14 July 2016 at 10:58:47 UTC, Istvan Dobos wrote:
  I was thinking, okay, so D's GC seems to turned out not that 
 great. But how about the idea of transplanting Rust's 
 ownership system instead of trying to make the GC better?
This requires drastically changing 99% of the language and it's bringing not just the benefits but also all the pain coming with this ownership system. Productivity goes down, learning curve goes up. And it will be a very different language in the end, so you might want to just use Rust instead of trying to make D another Rust.
Yes, that's the case for Rust, but no one has proven yet that an ownership system needs to be such a pain. In fact, someone recently proposed an idea for a readable ownership system: http://forum.dlang.org/post/ensdiijttlpcwuhdfpuu forum.dlang.org and I believe it's quite possible to improve over Rust while still having a productive language. In fact, the simple `scope` statements are a first and excellent step on this journey ;-)
Jul 16 2016