digitalmars.D - Thin Lock Implementation
- Bartosz Milewski (5/5) Aug 17 2008 I posted some implementation details of Thin Lock in my blog,
- Frank Benoit (4/11) Aug 18 2008 This sounds very interesting.
- Bartosz Milewski (6/6) Aug 18 2008 I'm in the process of implementing it. I've been already rewriting parts...
- Aarti_pl (12/19) Aug 19 2008 Well, I am definitely no concurrency expert, but one questions bothers
- Lars Ivar Igesund (10/33) Aug 19 2008 2046 isn't a particularly small number, in fact most OS'es aren't set up...
- Jb (10/15) Aug 19 2008 If you're on a windows box you can see how many threads are currently up...
- Steven Schveighoffer (6/22) Aug 19 2008 On a 32-bit system, the amount of addressable memory space and the stack...
- Sascha Katzner (6/11) Aug 19 2008 Not exactly right, since the stack is organized in 4kb pages and these
- Steven Schveighoffer (7/16) Aug 19 2008 Hm... I read the limit above online
- Jb (11/27) Aug 19 2008 Read the whole page ;-)
- Steven Schveighoffer (5/34) Aug 19 2008 Read my whole post :)
- Jb (3/6) Aug 19 2008 Ok, sorry, my bad. ;-)
- Jb (6/15) Aug 19 2008 The allocation granularity is actualy 64k chunks on most (at least all t...
- Walter Bright (4/14) Aug 19 2008 The memory isn't committed until it is used, but the address space is.
- BCS (3/21) Aug 19 2008 Must the stack be contiguous? Could you, on a stack overflow, heap alloc...
- Walter Bright (2/7) Aug 19 2008 No.
- Walter Bright (5/14) Aug 19 2008 I suppose I should explain a bit more. Stack overflows are detected by
- dsimcha (5/5) Aug 20 2008 Thin locks sounds pretty cool, but I do have one question. It seems lik...
- Bartosz Milewski (5/10) Aug 20 2008 The paper describing thin locks (my post has a link to it) claims that
- Brad Roberts (8/21) Aug 20 2008 I'll agree that inbetween is relatively rare, but just because there's
- Bartosz Milewski (5/10) Aug 20 2008 A thin lock is not a spin lock. It's an optimization on top of the OS
- Leandro Lucarella (11/22) Aug 22 2008 I would love to hear you point on optimizing an OS lock that is already
- Robert Jacques (6/28) Aug 22 2008 Assuming the OS lock involves a kernel call then yes, there is a major
- Leandro Lucarella (9/32) Aug 22 2008 It doesn't (at least in Linux).
- Bartosz Milewski (6/9) Aug 24 2008 I think I'm going to write a separate blog post on this topic, since
- BCS (7/26) Aug 20 2008 That's what the extra stack frame I mentioned is for. I'll grant you can...
- Sean Kelly (5/33) Aug 20 2008 This is basically how Fibers work, though the stack transition isn't
- Bartosz Milewski (7/15) Aug 19 2008 This is the number I've found in the D runtime and I didn't question it
- Lars Ivar Igesund (7/19) Aug 19 2008 I believe the same restriction doesn't exist in Tango's runtime.
- Sean Kelly (12/28) Aug 19 2008 I had the same concern, however, the compact thread id is clearly necess...
- Jb (15/19) Aug 19 2008 One thing that occurs to me is this probably wont work with threads the ...
- Jb (8/10) Aug 19 2008 Another thought I had is that when reading it is this.
- Steven Schveighoffer (18/22) Aug 19 2008 This looks a lot more promising than the original concept that Walter li...
- Steven Schveighoffer (7/30) Aug 19 2008 Reading further messages in your blog, I see that you expect sharing cas...
- Bartosz Milewski (6/11) Aug 19 2008 No, it's definitely not ok, but unless there is a reference-counting
- Steven Schveighoffer (19/30) Aug 19 2008 Yes, it's similar to casting something to invariant when you are not sur...
- Sean Kelly (24/57) Aug 19 2008 I think the implication that casting an object to unshared actually
- Bartosz Milewski (10/17) Aug 19 2008 I would like to make the nulling of the original reference part of the
- Sean Kelly (11/21) Aug 19 2008 Yeah, that's the nasty thing about atomics--they provide yet another rea...
- Bartosz Milewski (2/5) Aug 19 2008 Exactly!
I posted some implementation details of Thin Lock in my blog, http://bartoszmilewski.wordpress.com/ . I would appreciate comments. And by the way, I put it on reddit too: http://www.reddit.com/comments/6wmq4/thin_lock_implementation/ . Bartosz
Aug 17 2008
Bartosz Milewski wrote:
> I posted some implementation details of Thin Lock in my blog, http://bartoszmilewski.wordpress.com/ . I would appreciate comments. And by the way, I put it on reddit too: http://www.reddit.com/comments/6wmq4/thin_lock_implementation/ . Bartosz

This sounds very interesting. In what state is this? Discussion, or are you already in the process of implementing this for D?
Aug 18 2008
I'm in the process of implementing it. I've already been rewriting parts of the runtime to make monitors more modular. Now I'm diving into the low-level implementation. Since, at the moment, dmd only supports x86, that's the platform I'm targeting. But I'm also abstracting the processor-dependent parts into a separate library.
Aug 18 2008
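For readers unfamiliar with the technique, the uncontended fast path of a thin lock boils down to a single compare-and-swap on a lock word. A minimal sketch in C11 follows; the layout and names are made up for illustration, and Bartosz's D implementation will certainly differ (it also carries a recursion count and a fat-lock pointer, which this sketch omits):

```c
#include <stdatomic.h>
#include <assert.h>

/* Hypothetical thin-lock word: 0 = unlocked; otherwise the owner's
 * compact thread id.  Real thin locks also pack a recursion count
 * and a "fat lock" flag; this models only the uncontended path. */
typedef struct { _Atomic unsigned word; } ThinLock;

/* Try to acquire with a single compare-and-swap.  On failure a real
 * implementation would inflate the lock to an OS mutex (FatLock). */
int thin_lock_try_acquire(ThinLock *l, unsigned thread_id) {
    unsigned expected = 0;
    return atomic_compare_exchange_strong(&l->word, &expected, thread_id);
}

void thin_lock_release(ThinLock *l, unsigned thread_id) {
    assert(atomic_load(&l->word) == thread_id); /* only the owner may release */
    atomic_store(&l->word, 0);
}
```

The point of the scheme is that the common case costs one atomic instruction and no kernel call; only on contention does the implementation fall back to the OS lock.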
Bartosz Milewski wrote:
> I posted some implementation details of Thin Lock in my blog, http://bartoszmilewski.wordpress.com/ . I would appreciate comments. And by the way, I put it on reddit too: http://www.reddit.com/comments/6wmq4/thin_lock_implementation/ . Bartosz

Well, I am definitely no concurrency expert, but one question bothers me a bit. From your article it seems that there will be a limit of 2046 threads per program (11 bits, one-based). Isn't that too small a number, taking into account the multi-core processors which will emerge (able to run many more threads than today), and applications like web servers (which serve a lot of concurrent connections)?

BR
Marcin Kuszczak (aarti_pl)
Aug 19 2008
Aarti_pl wrote:
> Bartosz Milewski wrote:
>> I posted some implementation details of Thin Lock in my blog, http://bartoszmilewski.wordpress.com/ . I would appreciate comments. And by the way, I put it on reddit too: http://www.reddit.com/comments/6wmq4/thin_lock_implementation/ . Bartosz
> Well, I am definitely no concurrency expert, but one question bothers me a bit. From your article it seems that there will be a limit of 2046 threads per program (11 bits, one-based). Isn't that too small a number, taking into account the multi-core processors which will emerge, and applications like web servers (which serve a lot of concurrent connections)?

2046 isn't a particularly small number; in fact most OSes aren't set up to allow more than 500-1000 threads, depending on available memory. However, I agree that it should not restrict _more_ than the OS where a larger number is available.

--
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
Aug 19 2008
"Lars Ivar Igesund" <larsivar igesund.net> wrote in message news:g8e509$2872$1 digitalmars.com...
> 2046 isn't a particularly small number; in fact most OSes aren't set up to allow more than 500-1000 threads, depending on available memory. However, I agree that it should not restrict _more_ than the OS where a larger number is available.

If you're on a Windows box you can see how many threads are currently up and running. My single-core Athlon, which doesn't actually have much going on, is clocking up 370 threads. I imagine that newer, busier PCs are running 2 or 3 times as many.

That said, this is system-wide. I don't know whether the OS partitions threads so that any particular process can only see its own threads and a few OS threads; I don't know how it works.
Aug 19 2008
"Aarti_pl" wrote
> Bartosz Milewski wrote:
>> I posted some implementation details of Thin Lock in my blog, http://bartoszmilewski.wordpress.com/ . I would appreciate comments. And by the way, I put it on reddit too: http://www.reddit.com/comments/6wmq4/thin_lock_implementation/ . Bartosz
> Well, I am definitely no concurrency expert, but one question bothers me a bit. From your article it seems that there will be a limit of 2046 threads per program (11 bits, one-based). Isn't that too small a number, taking into account the multi-core processors which will emerge, and applications like web servers (which serve a lot of concurrent connections)?

On a 32-bit system, the amount of addressable memory and the stack size are the factors that limit the number of threads. For example, if the addressable space is 2GB and each thread has a 1MB stack, that's 2000 threads max (this is the typical situation for Windows).

-Steve
Aug 19 2008
Steven Schveighoffer wrote:
> On a 32-bit system, the amount of addressable memory and the stack size are the factors that limit the number of threads. For example, if the addressable space is 2GB and each thread has a 1MB stack, that's 2000 threads max (this is the typical situation for Windows).

Not exactly right: since the stack is organized in 4kb pages, and these pages are not initialized until they are first used, a stack usually requires a lot less than 1mb.

LLAP,
Sascha
Aug 19 2008
"Sascha Katzner" wrote
> Steven Schveighoffer wrote:
>> On a 32-bit system, the amount of addressable memory and the stack size are the factors that limit the number of threads. For example, if the addressable space is 2GB and each thread has a 1MB stack, that's 2000 threads max (this is the typical situation for Windows).
> Not exactly right: since the stack is organized in 4kb pages, and these pages are not initialized until they are first used, a stack usually requires a lot less than 1mb.

Hm... I read the limit above online (http://blogs.msdn.com/oldnewthing/archive/2005/07/29/444912.aspx). Maybe the pages aren't initialized, but doesn't the system have to reserve them, because it has to guarantee a 1MB stack for each? Or maybe that guy is completely wrong, or the concept has been fixed :)

-Steve
Aug 19 2008
"Steven Schveighoffer" <schveiguy yahoo.com> wrote in message news:g8emvj$1cu5$1 digitalmars.com...
> Hm... I read the limit above online (http://blogs.msdn.com/oldnewthing/archive/2005/07/29/444912.aspx). Maybe the pages aren't initialized, but doesn't the system have to reserve them, because it has to guarantee a 1MB stack for each? Or maybe that guy is completely wrong, or the concept has been fixed :)

Read the whole page ;-)

The 1MB is just the default stack size used if you pass a value of zero to the CreateThread function, and most linkers/libraries do exactly that. You can override it by specifying a stack size, and as he says, he could then create 13000 rather than 2000 threads.

But the actual minimum size is 64k, as that is the granularity of the Windows allocator. You can check by calling GetSystemInfo and examining the allocation granularity, but I've never seen it return anything other than 64k.
Aug 19 2008
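The arithmetic both posts rely on is simply address space divided by per-thread stack reservation. A quick sanity check with the figures quoted above (2GB user address space, 1MB default vs. 64k minimum reservation); note these are theoretical ceilings, since in practice the address space also holds code, heap, and other mappings, which is why Raymond Chen's experiment tops out near 13000 rather than 32768:

```c
#include <assert.h>

/* Theoretical max threads = usable address space / stack reserved per
 * thread.  Figures from the discussion: 2 GB user address space on
 * 32-bit Windows, 1 MB default stack reservation, 64 KB minimum
 * (the allocation granularity). */
unsigned max_threads(unsigned long long addr_space,
                     unsigned long long stack_reserve) {
    return (unsigned)(addr_space / stack_reserve);
}
```

With a 1MB reservation this gives roughly the "2000 threads max" figure cited by Steven; shrinking the reservation to 64k raises the ceiling by a factor of 16.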
"Jb" wrote
> Read the whole page ;-)
> The 1MB is just the default stack size used if you pass a value of zero to the CreateThread function, and most linkers/libraries do exactly that. You can override it by specifying a stack size, and as he says, he could then create 13000 rather than 2000 threads.
> But the actual minimum size is 64k, as that is the granularity of the Windows allocator.

Read my whole post :) "this is the typical situation for Windows", i.e. the default.

-Steve
Aug 19 2008
"Steven Schveighoffer" <schveiguy yahoo.com> wrote in message news:g8eqos$1jn5$1 digitalmars.com...
> Read my whole post :) "this is the typical situation for Windows", i.e. the default.

Ok, sorry, my bad. ;-)
Aug 19 2008
"Sascha Katzner" <sorry.no spam.invalid> wrote in message news:g8emle$1bad$1 digitalmars.com...
> Not exactly right: since the stack is organized in 4kb pages, and these pages are not initialized until they are first used, a stack usually requires a lot less than 1mb.

The allocation granularity is actually 64k chunks on most (at least all that I've ever used) Windows OSes. The 4k granularity is for the protection status, which I've always thought was a bit odd.
Aug 19 2008
Sascha Katzner wrote:
> Not exactly right: since the stack is organized in 4kb pages, and these pages are not initialized until they are first used, a stack usually requires a lot less than 1mb.

The memory isn't committed until it is used, but the address space is. You cannot move the stack if it exceeds its preallocated address space allotment.
Aug 19 2008
Reply to Walter,
> The memory isn't committed until it is used, but the address space is. You cannot move the stack if it exceeds its preallocated address space allotment.

Must the stack be contiguous? Could you, on a stack overflow, heap-allocate a new chunk and put in a frame that gets back to the last segment?
Aug 19 2008
BCS wrote:
> Must the stack be contiguous? Could you, on a stack overflow, heap-allocate a new chunk and put in a frame that gets back to the last segment?

No.
Aug 19 2008
Walter Bright wrote:
> BCS wrote:
>> Must the stack be contiguous? Could you, on a stack overflow, heap-allocate a new chunk and put in a frame that gets back to the last segment?
> No.

I suppose I should explain a bit more. Stack overflows are detected by the hardware, which means every PUSH could overflow the stack. It isn't practical to then try and continue the stack elsewhere, because the corresponding POPs would not know what was going on.
Aug 19 2008
Thin locks sound pretty cool, but I do have one question. It seems like automatically inflating the lock the first time contention is detected is a fairly inefficient strategy, since the lock may be contested, but only very infrequently. Would it be possible to keep track of the frequency of contention and, if the lock is contested only, say, 1% of the time, just leave it as a spinlock?
Aug 20 2008
dsimcha wrote:
> It seems like automatically inflating the lock the first time contention is detected is a fairly inefficient strategy, since the lock may be contested, but only very infrequently. Would it be possible to keep track of the frequency of contention and, if the lock is contested only, say, 1% of the time, just leave it as a spinlock?

The paper describing thin locks (my post has a link to it) claims that objects essentially come in two classes: the ones that are never contested, and the ones that are constantly contested. In-between situations are relatively rare.
Aug 20 2008
On Wed, 20 Aug 2008, Bartosz Milewski wrote:
> The paper describing thin locks (my post has a link to it) claims that objects essentially come in two classes: the ones that are never contested, and the ones that are constantly contested. In-between situations are relatively rare.

I'll agree that in-between is relatively rare, but just because there's contention doesn't require the switch to heavy locks. In a lot of the code I have written, one spin (assuming the spin uses rep; nop;) is sufficient for the 'blocked' thread to complete its acquire. The use of heavyweight locks in that scenario is a net loss.

Later,
Brad
Aug 20 2008
Brad Roberts wrote:
> I'll agree that in-between is relatively rare, but just because there's contention doesn't require the switch to heavy locks. In a lot of the code I have written, one spin (assuming the spin uses rep; nop;) is sufficient for the 'blocked' thread to complete its acquire. The use of heavyweight locks in that scenario is a net loss.

A thin lock is not a spin lock. It's an optimization on top of the OS lock, specifically for the no-contention case. If there is contention, the OS lock, which is part of FatLock, is used. It presumably does its own optimizations, so calling it "heavyweight" may be unfair.
Aug 20 2008
Bartosz Milewski, on August 20 at 15:14, wrote to me:
> A thin lock is not a spin lock. It's an optimization on top of the OS lock, specifically for the no-contention case. If there is contention, the OS lock, which is part of FatLock, is used. It presumably does its own optimizations, so calling it "heavyweight" may be unfair.

I would love to hear your point on optimizing an OS lock that is already optimized. Is that really necessary? Do you have numbers that back you up?

Thank you.

--
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
aFR [afr my.farts.cause.nuclear.reaction.org] has quit IRC (Ping timeout)
Aug 22 2008
On Fri, 22 Aug 2008 14:08:28 -0400, Leandro Lucarella <llucax gmail.com> wrote:
> I would love to hear your point on optimizing an OS lock that is already optimized. Is that really necessary? Do you have numbers that back you up?

Assuming the OS lock involves a kernel call, then yes, there is a major performance difference. Kernel calls are the reason that GC programs beat programs using malloc, and minimizing them is the secret behind all the various 'fast' malloc replacements.
Aug 22 2008
Robert Jacques, on August 22 at 22:43, wrote to me:
> Assuming the OS lock involves a kernel call

It doesn't (at least in Linux).

--
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
If you think the soul can't be seen, the soul can indeed be seen in the eyes
Aug 22 2008
Leandro Lucarella wrote:
> I would love to hear your point on optimizing an OS lock that is already optimized.

I think I'm going to write a separate blog post on this topic, since it's causing a lot of controversy. The short answer is that OS locks are optimized for different scenarios than thin locks.

> Is that really necessary? Do you have numbers that back you up?

The original thin lock paper has the numbers. Granted, they are for Java, but I don't see why they shouldn't be in the same ballpark for D as well.
Aug 24 2008
Reply to Walter,
> I suppose I should explain a bit more. Stack overflows are detected by the hardware, which means every PUSH could overflow the stack. It isn't practical to then try and continue the stack elsewhere, because the corresponding POPs would not know what was going on.

That's what the extra stack frame I mentioned is for. I'll grant you can't split a single frame across the sections, but if you could detect an impending stack overflow before the function runs (by push/popping the largest block that the function can use [calloc aside]), you could then switch to the new stack at that point. When the function returns, it ends up in the extra frame, which knows how to return to the right location.
Aug 20 2008
BCS wrote:
> That's what the extra stack frame I mentioned is for. I'll grant you can't split a single frame across the sections, but if you could detect an impending stack overflow before the function runs, you could then switch to the new stack at that point. When the function returns, it ends up in the extra frame, which knows how to return to the right location.

This is basically how Fibers work, though the stack transition isn't invisible. Each Fiber has its own stack, and execution is transferred to some location in this stack when the Fiber is called.

Sean
Aug 20 2008
Aarti_pl wrote:
> From your article it seems that there will be a limit of 2046 threads per program (11 bits, one-based). Isn't that too small a number, taking into account the multi-core processors which will emerge, and applications like web servers (which serve a lot of concurrent connections)?

This is the number I found in the D runtime, and I didn't question it (yet!). The thin lock has a lot of bits to spare--I made the recursion count ridiculously huge. The count can easily be cut down to 8 bits, and all the rest could go towards the thread index if necessary. On a 64-bit machine, the thread index could spill into the extra 32 bits. So we are well prepared for the future.
Aug 19 2008
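As a rough illustration of the bit budget being discussed, here is a hypothetical packing of a 32-bit lock word with an 11-bit thread index (the 2046-thread figure from the thread) and an 8-bit recursion count. The actual layout in Bartosz's implementation may well differ; this just shows how the fields trade off:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit thin-lock word: low 11 bits hold the compact
 * thread index, the next 8 bits a recursion count.  The remaining
 * bits would hold flags (fat-lock bit, etc.) in a real design. */
enum { THREAD_BITS = 11, COUNT_BITS = 8 };

uint32_t pack(uint32_t thread_idx, uint32_t count) {
    return (count << THREAD_BITS) | thread_idx;
}
uint32_t thread_of(uint32_t word) {
    return word & ((1u << THREAD_BITS) - 1);
}
uint32_t count_of(uint32_t word) {
    return (word >> THREAD_BITS) & ((1u << COUNT_BITS) - 1);
}
```

Widening the thread index simply means moving the boundary between the two fields, which is the trade-off Bartosz describes; on a 64-bit word the index could grow without shrinking the count at all.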
Bartosz Milewski wrote:
> Aarti_pl wrote:
>> From your article it seems that there will be a limit of 2046 threads per program (11 bits, one-based). Isn't that too small a number?
> This is the number I found in the D runtime, and I didn't question it (yet!).

I believe the same restriction doesn't exist in Tango's runtime.

--
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
Aug 19 2008
== Quote from Aarti_pl (aarti interia.pl)'s article
> Bartosz Milewski wrote:
>> I posted some implementation details of Thin Lock in my blog, http://bartoszmilewski.wordpress.com/ . I would appreciate comments.
> Well, I am definitely no concurrency expert, but one question bothers me a bit. From your article it seems that there will be a limit of 2046 threads per program (11 bits, one-based). Isn't that too small a number, taking into account the multi-core processors which will emerge, and applications like web servers (which serve a lot of concurrent connections)?

I had the same concern; however, the compact thread id is clearly necessary for the algorithm to operate properly, so what can you do? Perhaps transfer some bits from the recursion counter to the thread id to increase that number.

For what it's worth, Tango doesn't use a static thread list, because I think the basic approach is flawed. As you say, it creates a fixed upper bound on the number of threads the app may contain, creates visible gaps in the global thread list (visible via getAll), and causes contention for a common cache line. However, I think it's reasonable to create some sort of thread id anyway, if only to allow thin locks to operate. I've been kicking around a few ways to graft this functionality onto Tango without incurring any undue overhead.

Sean
Aug 19 2008
"Bartosz Milewski" <bartosz relisoft.com> wrote in message news:g8aeg7$h2h$1 digitalmars.com...
> I posted some implementation details of Thin Lock in my blog, http://bartoszmilewski.wordpress.com/ . I would appreciate comments. And by the way, I put it on reddit too: http://www.reddit.com/comments/6wmq4/thin_lock_implementation/ .

One thing that occurs to me is that this probably won't work with threads the D runtime doesn't know about--for example, interrupt callbacks. For real-time audio you typically hand a callback function pointer to the soundcard driver, and it is then called from the soundcard's interrupt thread. But the D runtime thread pool doesn't know about this thread.

What about OS callbacks? Do they always come in on a thread that D knows about? I'm not sure, tbh.

And if you want real-time operation--pro audio, live video, etc.--then these callback threads have to remain invisible to the GC, so it can't pause them, but you'd still need them visible in the global thread index, or else your thin-lock mechanism won't work. Then again, maybe you shouldn't use the built-in thin locks in such circumstances anyway.
Aug 19 2008
"Bartosz Milewski" <bartosz relisoft.com> wrote in message news:g8aeg7$h2h$1 digitalmars.com...
> I posted some implementation details of Thin Lock in my blog, http://bartoszmilewski.wordpress.com/ . I would appreciate comments.

Another thought I had while reading it is this. For debugging purposes, the compiler could do the following: when an object is created as never-shared, rather than setting the thin lock to zero, you could instead set it to the current thread id. The compiler could then automatically add checks to the DBC code to make sure that the object is only ever accessed from the thread that created it.
Aug 19 2008
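Jb's debugging idea above could be sketched roughly like this in C; all names are hypothetical, and in the real scheme the check would be compiler-generated rather than hand-written:

```c
#include <assert.h>

/* Sketch of the debugging idea: stamp a never-shared object with its
 * creator's thread id at construction, then assert on every access
 * that the current thread is the creator. */
typedef struct {
    unsigned owner_tid;   /* thread id recorded at construction */
    int payload;
} Obj;

void obj_init(Obj *o, unsigned creator_tid) {
    o->owner_tid = creator_tid;
    o->payload = 0;
}

/* In a debug build, the compiler would insert this before each access. */
void check_owner(const Obj *o, unsigned current_tid) {
    assert(o->owner_tid == current_tid && "object accessed from foreign thread");
}
```

The appeal of the idea is that the check reuses the thin-lock word that already exists in every object header, so the debug build costs one comparison per access and no extra storage.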
"Bartosz Milewski" wrote
> I posted some implementation details of Thin Lock in my blog, http://bartoszmilewski.wordpress.com/ . I would appreciate comments. And by the way, I put it on reddit too: http://www.reddit.com/comments/6wmq4/thin_lock_implementation/ .

This looks a lot more promising than the original concept that Walter linked to :)

There is mention of casting away shared, which throws an exception if the object isn't shared. Is this the locking mechanism for both reading and writing data? I.e. you have to cast away sharing to use the data, then cast it back? How do you wait for an object to be available so you can unshare it? What happens if you forget to cast it back?

What happens to subcomponents? I.e. shared is supposed to be transitive, right? So if you cast away shared on the top-level object, does that go and try unsharing all the objects referenced by it (i.e. setting all the currently-shared bits to 0)? If not, then how do you know that the subobjects are shared in the type system?

It looks like a lot of this is going to be runtime checks--or is there going to be another type modifier for 'created shared, but currently unshared', i.e. 'locked' or something?

Cool stuff, I think you are on the right track.

-Steve
Aug 19 2008
"Steven Schveighoffer" wrote
> "Bartosz Milewski" wrote
>> I posted some implementation details of Thin Lock in my blog, http://bartoszmilewski.wordpress.com/ . I would appreciate comments.
> This looks a lot more promising than the original concept that Walter linked to :)

Reading further messages on your blog, I see that you expect sharing casts to be recursive (to preserve the transitive nature of it). What if two shared objects have references to the same shared object? If you 'unshare' one parent, the other object is still shared, but its subcomponent is not. Is that considered ok?

-Steve
Aug 19 2008
Steven Schveighoffer wrote:
> Reading further messages in your blog, I see that you expect sharing casts
> to be recursive (to preserve the transitive nature of it). What if two
> shared objects have references to the same shared object? If you 'unshare'
> one parent, the other object now is shared, but its subcomponent is not
> shared. Is that considered ok?

No, it's definitely not ok, but unless there is a reference-counting mechanism in place, we can't do anything about it during casting. However, the sub-objects will be marked as exclusively owned by the casting thread, so any attempt from another thread to lock them will result in an exception. This is true to the spirit of casting in general.
Aug 19 2008
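One way the ownership check Bartosz describes could work in the object header, sketched in C++ (all names, including SharingException, are hypothetical — the actual D design was not yet fixed at this point):

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical error for a stale alias touching an exclusively owned object.
struct SharingException : std::runtime_error {
    SharingException() : std::runtime_error("object is exclusively owned") {}
};

// Hypothetical per-object header: after a cast to unshared, the header
// records the owning thread; any other thread that tries to lock the
// object through a leftover alias gets an exception instead of a lock.
struct ObjectHeader {
    std::uint32_t ownerTid = 0;   // 0 = shared, otherwise the exclusive owner

    void castToUnshared(std::uint32_t tid) { ownerTid = tid; }
    void castToShared() { ownerTid = 0; }

    void lock(std::uint32_t tid) {
        if (ownerTid != 0 && ownerTid != tid)
            throw SharingException{};     // alias held by another thread
        // ... normal thin-lock acquisition would go here ...
    }
};
```

This is why multiple aliases don't corrupt data under the scheme: the second parent's reference still compiles, but using it from another thread fails loudly at runtime.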
"Bartosz Milewski" wroteSteven Schveighoffer wrote:Yes, it's similar to casting something to invariant when you are not sure that another reference doesn't exist. However, in this case, the casting is considered to be an accepted practice (and probably more likely than casting something back and forth from invariant). Normally, casting has big warning signs all over it, I see this as a commonly used feature (or should it be?). I noticed that you have 'locked' and 'unshared' concepts as separate, the former using contention and the latter which is a try-only version. Is there any enforcement for not using either of these? For example, if thread A tries to access (without casting) a shared object when thread B has casted it to unshared, does it succeed? The thing I'm trying to figure out in all this is, I can see the potential with the implementation you are coming up with for some really cool locking features, yet the 'shared/unshared' plan also has to do with lock-free programming (with fences and stuff, I'm not very familiar with these things). It looks to me like you can do one or the other, but not both? How do you know which mode you are using a shared variable in? -SteveReading further messages in your blog, I see that you expect sharing casts to be recursive (to preserve the transitive nature of it). What if two shared objects have references to the same shared object? If you 'unshare' one parent, the other object now is shared, but its subcomponent is not shared. Is that considered ok?No, it's definitely not ok, but unless there is a reference-counting mechanism in place, we can't do anything about it during casting. However, the sub-objects will be marked as exclusively owned by the casting thread, so any attempt from another thread to lock them will result in an exception. This is true to the spirit of casting in general.
Aug 19 2008
Steven Schveighoffer wrote:
> "Bartosz Milewski" wrote
>> [...] the sub-objects will be marked as exclusively owned by the casting
>> thread, so any attempt from another thread to lock them will result in an
>> exception. This is true to the spirit of casting in general.
> Yes, it's similar to casting something to invariant when you are not sure
> that another reference doesn't exist. [...]
> It looks to me like you can do one or the other, but not both? How do you
> know which mode you are using a shared variable in?

I think the implication is that casting an object to unshared actually mutates the "shared" bit within the thin lock bitset. Something like this:

    shared MyClass c;

    void execThreadA()
    {
        MyClass x = makeLocal( c );
        // 'shared' bit in instance of x is now 0
    }

    T makeLocal(T)( inout T val )
    {
        // Assume that the state of MyClass is appropriately visible to
        // the calling thread because this is a shared object
        shared T tmp = val;
        val = null;
        return cast(unshared) tmp;
    }

If the "val = null" assignment is forgotten in makeLocal then an attempt to synchronize on c will throw a SharingException (or whatever it's called).

The benefit of being able to cast away the shared label is clearly for performance... if you know that you're the sole owner of a shared object then you don't need to lock for otherwise synchronized accesses, much like assertUnique for const/invariant data.

Sean
Aug 19 2008
Sean Kelly wrote:
> If the "val = null" assignment is forgotten in makeLocal then an attempt
> to synchronize on c will throw a SharingException (or whatever it's
> called).

I would like to make the nulling of the original reference part of the cast. In other words, makeLocal would take its argument by non-const reference and null it. That still doesn't solve the problem of multiple aliases, hence the need for SharingException.

> The benefit of being able to cast away the shared label is clearly for
> performance... if you know that you're the sole owner of a shared object
> then you don't need to lock for otherwise synchronized accesses, much like
> assertUnique for const/invariant data.

There is one complication though--the type system. A function that takes a non-shared argument can't be called with a shared object. But it's true that a sharing cast is an optimization. The safe way to do it is to make a thread-local deep copy of the object and pass that to a library function. (Object cloning is a hot topic in D.)
Aug 19 2008
== Quote from Bartosz Milewski (bartosz relisoft.com)'s article
> Sean Kelly wrote:
>> The benefit of being able to cast away the shared label is clearly for
>> performance... [...]
> There is one complication though--the type system. A function that takes a
> non-shared argument can't be called with a shared object. But it's true
> that a sharing cast is an optimization. The safe way to do it is to make a
> thread-local deep copy of the object and pass that to a library function.
> (Object cloning is a hot topic in D.)

Yeah, that's the nasty thing about atomics--they provide yet another reason to duplicate function bodies. Sometimes I wonder if we won't eventually end up writing only template code simply out of necessity.

I've had the same thought about cloning, though. Once 'shared' is in place we'll need to perform deep copies to transfer local data between threads, and I would expect such passing to be far more common than the use of shared data. So we'll need a reliable and efficient means of cloning. But then we'll need the same thing for STM, so I guess we'll be killing two birds with one stone :-)

Sean
Aug 19 2008
Sean Kelly wrote:
> So we'll need a reliable and efficient means of cloning. But then we'll
> need the same thing for STM, so I guess we'll be killing two birds with
> one stone :-)

Exactly!
Aug 19 2008
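The deep-copy alternative Bartosz and Sean settle on above — clone the graph, hand the copy to the other thread, and no sharing cast or lock is ever needed — can be sketched in C++ like this (the Node type is a hypothetical stand-in for any cloneable object graph):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node in an object graph. clone() copies the node and
// recursively copies every sub-object, so the receiving thread ends up
// owning a graph that is fully disjoint from the sender's.
struct Node {
    std::string name;
    std::vector<std::unique_ptr<Node>> children;

    std::unique_ptr<Node> clone() const {
        auto copy = std::make_unique<Node>();
        copy->name = name;
        copy->children.reserve(children.size());
        for (const auto& c : children)
            copy->children.push_back(c->clone());  // deep copy, not alias
        return copy;
    }
};
```

Because the two graphs share no references, mutating the copy on one thread cannot race with the original on another — which is exactly why cloning sidesteps the multiple-alias problem that SharingException only detects at runtime.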