digitalmars.D - Removing The Global GC Lock: Largest Plausible Number of Threads?
- dsimcha (11/11) May 11 2011 I'm thinking about ways to remove the global lock from the garbage
- Jonathan M Davis (5/16) May 11 2011 I don't think that you can legally create that many threads on a typical...
- Daniel Gibson (13/31) May 11 2011 On Windows you get a DWORD (32bit int AFAIK) from GetCurrentThreadId(),
- Daniel Gibson (8/42) May 11 2011 Also:
- Walter Bright (4/13) May 11 2011 I suggest one of two schemes:
- Jonathan M Davis (18/34) May 11 2011 Okay. I was wrong. Apparently, the limit on Linux is set by what's in
- Andrei Alexandrescu (4/15) May 11 2011 http://stackoverflow.com/questions/344203/maximum-number-of-threads-per-...
- Kagamin (7/18) May 12 2011 version(Slim) {
- JimBob (5/8) May 13 2011 If you use lock free free lists you dont need one for every thread, you ...
- Kagamin (3/6) May 13 2011 http://www.liblfds.org/
- dsimcha (4/10) May 13 2011 Surprisingly, lock-free free lists wouldn't solve the problem, since the...
- JimBob (6/18) May 13 2011 If the flag needs to be atomicaly updated and thread local free lists so...
- dsimcha (14/33) May 13 2011 There's clearly been some misunderstanding here because I haven't explai...
- liblfds-admin (2/8) Dec 30 2015 Release 7.0.0 of liblfds has just been published.
- extrawurst (4/14) Jan 03 2016 Still no osx support ? or is it just not tested ?
- libfds-admin (9/12) Jan 03 2016 Well, no build files are provided for out-of-the-box use, because
I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local. Every scheme I can come up with that doesn't require a radical overhaul of the current implementation requires every thread having a unique ID. I want to do this as simply and efficiently as possible, preferably using dense integers. Is it reasonable to assume that no program will ever need more than 2 ^^ 16 thread (about 65,000) simultaneously so that I can store these indices as ushorts? If a program creates a lot of short-lived threads, the indices will be recycled, so having a huge number of threads non-simultaneously is not a problem.
May 11 2011
On 2011-05-11 21:14, dsimcha wrote:I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local. Every scheme I can come up with that doesn't require a radical overhaul of the current implementation requires every thread having a unique ID. I want to do this as simply and efficiently as possible, preferably using dense integers. Is it reasonable to assume that no program will ever need more than 2 ^^ 16 thread (about 65,000) simultaneously so that I can store these indices as ushorts? If a program creates a lot of short-lived threads, the indices will be recycled, so having a huge number of threads non-simultaneously is not a problem.I don't think that you can legally create that many threads on a typical OS. I'd have to check, but as I recall, the typical limit is much lower than that - still in the tens of thousands, I think, but not that high. - Jonathan M Davis
May 11 2011
Am 12.05.2011 06:33, schrieb Jonathan M Davis:On 2011-05-11 21:14, dsimcha wrote:On Windows you get a DWORD (32bit int AFAIK) from GetCurrentThreadId(), so that seems to be a technical limit there. On Linux there's pthread_t which seems to be a 32bit uint on x86 and 64bit ulong on amd64. However, see: http://stackoverflow.com/questions/344203/maximum-number-of-threads-per-process-in-linux It seems like it's limited by the available stack space (especially on 32bit systems) Similar on Windows: http://blogs.msdn.com/b/oldnewthing/archive/2005/07/29/444912.aspx Cheers, - DanielI'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local. Every scheme I can come up with that doesn't require a radical overhaul of the current implementation requires every thread having a unique ID. I want to do this as simply and efficiently as possible, preferably using dense integers. Is it reasonable to assume that no program will ever need more than 2 ^^ 16 thread (about 65,000) simultaneously so that I can store these indices as ushorts? If a program creates a lot of short-lived threads, the indices will be recycled, so having a huge number of threads non-simultaneously is not a problem.I don't think that you can legally create that many threads on a typical OS. I'd have to check, but as I recall, the typical limit is much lower than that - still in the tens of thousands, I think, but not that high. - Jonathan M Davis
May 11 2011
Am 12.05.2011 06:47, schrieb Daniel Gibson:Am 12.05.2011 06:33, schrieb Jonathan M Davis:Also: http://www.sgi.com/products/servers/altix/uv/configs.html You can actually buy systems with 2048 Cores (AMD64 architecture even) and, thanks to Hyperthreading, 4096 threads (that may run at the same time) that run Linux. I think it isn't *that* unrealistic to have more than ushort.max Threads on such a beast.On 2011-05-11 21:14, dsimcha wrote:On Windows you get a DWORD (32bit int AFAIK) from GetCurrentThreadId(), so that seems to be a technical limit there. On Linux there's pthread_t which seems to be a 32bit uint on x86 and 64bit ulong on amd64. However, see: http://stackoverflow.com/questions/344203/maximum-number-of-threads-per-process-in-linux It seems like it's limited by the available stack space (especially on 32bit systems) Similar on Windows: http://blogs.msdn.com/b/oldnewthing/archive/2005/07/29/444912.aspx Cheers, - DanielI'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local. Every scheme I can come up with that doesn't require a radical overhaul of the current implementation requires every thread having a unique ID. I want to do this as simply and efficiently as possible, preferably using dense integers. Is it reasonable to assume that no program will ever need more than 2 ^^ 16 thread (about 65,000) simultaneously so that I can store these indices as ushorts? If a program creates a lot of short-lived threads, the indices will be recycled, so having a huge number of threads non-simultaneously is not a problem.I don't think that you can legally create that many threads on a typical OS. I'd have to check, but as I recall, the typical limit is much lower than that - still in the tens of thousands, I think, but not that high. - Jonathan M Davis
May 11 2011
On 5/11/2011 9:14 PM, dsimcha wrote:I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local. Every scheme I can come up with that doesn't require a radical overhaul of the current implementation requires every thread having a unique ID. I want to do this as simply and efficiently as possible, preferably using dense integers. Is it reasonable to assume that no program will ever need more than 2 ^^ 16 thread (about 65,000) simultaneously so that I can store these indices as ushorts? If a program creates a lot of short-lived threads, the indices will be recycled, so having a huge number of threads non-simultaneously is not a problem.I suggest one of two schemes: 1. Use thread local storage to store the free list for each thread. 2. Use a global free list, and use CAS to pull allocations off of it.
May 11 2011
On 2011-05-11 21:33, Jonathan M Davis wrote:On 2011-05-11 21:14, dsimcha wrote:Okay. I was wrong. Apparently, the limit on Linux is set by what's in /proc/sys/kernel/threads-max. On my system (64-bit Arch Linux), that's 256,245 - though that's system-wide, so not even the most thread-happy program can really get all that near that limit. Still, that's far above 65,000. I can't find as clear an answer for Windows though. It sounds like it doesn't technically have a limit, but you'll run out of memory eventually. I'm not sure if either Linux or Windows can actually get away with having as many as 65,000 threads at a time. And even if you can, it might be completely reasonable to say that druntime doesn't support more than that. That's a _lot_ of threads, and I find it hard to believe that any semi-reasonable D program would have that many. Something like erlang probably would, but I don't believe that they use full-on OS threads for their threads but rather do something more like fibers. So, technically, I'm not sure that you can assume that 65,000 threads is enough, but it certainly sounds to me like it would be reasonable to enforce that limit on D programs, even if the OS doesn't. - Jonathan M DavisI'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local. Every scheme I can come up with that doesn't require a radical overhaul of the current implementation requires every thread having a unique ID. I want to do this as simply and efficiently as possible, preferably using dense integers. Is it reasonable to assume that no program will ever need more than 2 ^^ 16 thread (about 65,000) simultaneously so that I can store these indices as ushorts? If a program creates a lot of short-lived threads, the indices will be recycled, so having a huge number of threads non-simultaneously is not a problem.I don't think that you can legally create that many threads on a typical OS. I'd have to check, but as I recall, the typical limit is much lower than that - still in the tens of thousands, I think, but not that high.
May 11 2011
On 5/11/11 11:14 PM, dsimcha wrote:I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local. Every scheme I can come up with that doesn't require a radical overhaul of the current implementation requires every thread having a unique ID. I want to do this as simply and efficiently as possible, preferably using dense integers. Is it reasonable to assume that no program will ever need more than 2 ^^ 16 thread (about 65,000) simultaneously so that I can store these indices as ushorts? If a program creates a lot of short-lived threads, the indices will be recycled, so having a huge number of threads non-simultaneously is not a problem.http://stackoverflow.com/questions/344203/maximum-number-of-threads-per-process-in-linux On my work machine: 1032845 Andrei
May 11 2011
dsimcha Wrote:I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local. Every scheme I can come up with that doesn't require a radical overhaul of the current implementation requires every thread having a unique ID. I want to do this as simply and efficiently as possible, preferably using dense integers. Is it reasonable to assume that no program will ever need more than 2 ^^ 16 thread (about 65,000) simultaneously so that I can store these indices as ushorts? If a program creates a lot of short-lived threads, the indices will be recycled, so having a huge number of threads non-simultaneously is not a problem.version(Slim) { alias ushort threadid_t; } else { alias uint threadid_t; } ?
May 12 2011
"dsimcha" <dsimcha yahoo.com> wrote in message news:iqfn3l$p2p$1 digitalmars.com...I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local.If you use lock free free lists you dont need one for every thread, you only need enough to reduce contention on the free lists, which in my experience is about 1x to 2x the number of cores you have.
May 13 2011
dsimcha Wrote:I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local.http://www.liblfds.org/ ?
May 13 2011
On 5/13/2011 6:12 AM, Kagamin wrote:dsimcha Wrote:Surprisingly, lock-free free lists wouldn't solve the problem, since the GC flag bookkeeping would still require locking unless it is completely rewritten.I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local.http://www.liblfds.org/ ?
May 13 2011
"dsimcha" <dsimcha yahoo.com> wrote in message news:iqj8ba$18sc$1 digitalmars.com...On 5/13/2011 6:12 AM, Kagamin wrote:If the flag needs to be atomicaly updated and thread local free lists solve that problem yet global lock free free lists dont, the implication is that the flag is only ever accessed from the one thread. And that implies that memory cant be allocated in one thread and freed in another?dsimcha Wrote:Surprisingly, lock-free free lists wouldn't solve the problem, since the GC flag bookkeeping would still require locking unless it is completely rewritten.I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local.http://www.liblfds.org/ ?
May 13 2011
== Quote from JimBob (jim bob.com)'s article"dsimcha" <dsimcha yahoo.com> wrote in message news:iqj8ba$18sc$1 digitalmars.com...There's clearly been some misunderstanding here because I haven't explained in detail what my idea was. First, the GC would still be locked when sweeping or manually freeing. Second, every page would be owned by a single thread. The way the GCBits are laid out, this means the flags issue would become a non-issue because every integer that makes up the bit array only tracks flags for one page. Secondly, I'm concerned about two things if I get too fancy with lock-free/atomic stuff: 1. It might plausibly end up being substantially slower than the current version in single-threaded programs or programs where there's not much contention. 2. I don't want to do anything too clever. I learned the hard way by implementing std.parallelism that: non-trivial manual memory management + clever concurrency in the same code == nightmarish to debugOn 5/13/2011 6:12 AM, Kagamin wrote:If the flag needs to be atomicaly updated and thread local free lists solve that problem yet global lock free free lists dont, the implication is that the flag is only ever accessed from the one thread. And that implies that memory cant be allocated in one thread and freed in another?dsimcha Wrote:Surprisingly, lock-free free lists wouldn't solve the problem, since the GC flag bookkeeping would still require locking unless it is completely rewritten.I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local.http://www.liblfds.org/ ?
May 13 2011
On Friday, 13 May 2011 at 10:12:24 UTC, Kagamin wrote:dsimcha Wrote:Release 7.0.0 of liblfds has just been published.I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local.http://www.liblfds.org/ ?
Dec 30 2015
On Wednesday, 30 December 2015 at 23:55:25 UTC, liblfds-admin wrote:On Friday, 13 May 2011 at 10:12:24 UTC, Kagamin wrote:Still no osx support ? or is it just not tested ? -- Stephandsimcha Wrote:Release 7.0.0 of liblfds has just been published.I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local.http://www.liblfds.org/ ?
Jan 03 2016
On Sunday, 3 January 2016 at 18:32:23 UTC, extrawurst wrote:On Wednesday, 30 December 2015 at 23:55:25 UTC, liblfds-adminWell, no build files are provided for out-of-the-box use, because I do not have an OSX platform to make them on. The library itself will work fine on OSX of course. It shouldn't be onerous to make the build files for it - there's nothing special that has to be done. Just load them all up into a project and compile. I'd very much like to support OSX and it's on my mind, but it's not reasonable to carry a Mac around the world with me for that.Release 7.0.0 of liblfds has just been published.Still no osx support ? or is it just not tested ?
Jan 03 2016