
digitalmars.D - Thin lock vs. Futex

reply Bartosz Milewski <bartosz relisoft.com> writes:
As promised, I posted a blog entry comparing the thin lock to the futex. Please 
vote for it on reddit: http://www.reddit.com/comments/6z4sv/ .
Sep 02 2008
parent reply "Jb" <jb nowhere.com> writes:
"Bartosz Milewski" <bartosz relisoft.com> wrote in message 
news:g9jvui$nbs$1 digitalmars.com...
 As promised, I posted a blog entry comparing the thin lock to the futex. Please 
 vote for it on reddit: http://www.reddit.com/comments/6z4sv/ .
Early on you say this:

"Unlock tests if the futex variable is equal to one (we are the owner, and nobody is waiting). If true, it sets it to zero. This is the fast-track common-case execution that doesn't make any futex calls whatsoever."

But the futex code you list later in the blog does this:

    // we own the lock, so it's either 1 or 2
    if (atomic_decrement(&_word) != 1)

So does the futex actually need or use an atomic operation for unlocking?

If it does, then that makes it a great deal slower than a thin lock, as a thin lock typically doesn't require an atomic operation for the unlock. As I'm sure you're aware, CAS and the like cost around 120..160 cycles on most CPUs, so having an algorithm that cuts their usage in half is a big advantage.

So I'd be touting that as a good reason to use thin locks. :-)
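
For reference, a minimal sketch of the unlock path in question, in C++ with std::atomic. It is not the blog's code: the struct name, the wake_calls counter (standing in for a real FUTEX_WAKE call) and mark_contended() are hypothetical, and it assumes atomic_decrement returns the value held before the decrement. It only models the state word (0 = free, 1 = locked with no waiters, 2 = locked with waiters), so it shows where the atomic operation sits rather than how waiting works.

```cpp
#include <atomic>

// Hypothetical model of the lock word described in the post:
// 0 = unlocked, 1 = locked with no waiters, 2 = locked with waiters.
struct FutexWordModel {
    std::atomic<int> word{0};
    int wake_calls = 0;  // stand-in counter; a real lock would issue FUTEX_WAKE here

    // Uncontended acquire: a single CAS from 0 to 1.
    bool try_lock() {
        int expected = 0;
        return word.compare_exchange_strong(expected, 1,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed);
    }

    // Simulates a second thread announcing that it is about to wait.
    void mark_contended() { word.store(2, std::memory_order_relaxed); }

    // Unlock in the style of the quoted snippet: one atomic decrement on every call.
    // fetch_sub returns the value held before the decrement (an assumption
    // about what atomic_decrement returns in the blog's code).
    void unlock() {
        if (word.fetch_sub(1, std::memory_order_release) != 1) {
            // Previous value was 2, so somebody may be waiting.
            word.store(0, std::memory_order_release);
            ++wake_calls;
        }
    }
};
```

In this model an uncontended lock/unlock pair never touches wake_calls, which matches the "no futex calls" claim, but it still executes two atomic read-modify-write operations: the CAS in try_lock() and the fetch_sub in unlock().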
Sep 02 2008
parent Sean Kelly <sean invisibleduck.org> writes:
Jb wrote:
 "Bartosz Milewski" <bartosz relisoft.com> wrote in message 
 news:g9jvui$nbs$1 digitalmars.com...
 As promised, I posted a blog comparing the thin lock to futex. Please vote 
 it on reddit: http://www.reddit.com/comments/6z4sv/ .
 Early on you say this:

 "Unlock tests if the futex variable is equal to one (we are the owner, and nobody is waiting). If true, it sets it to zero. This is the fast-track common-case execution that doesn't make any futex calls whatsoever."

 But the futex code you list later in the blog does this:

     // we own the lock, so it's either 1 or 2
     if (atomic_decrement(&_word) != 1)

 So does the futex actually need or use an atomic operation for unlocking?

That implementation does. However, I think it could be replaced by an unordered atomic load followed by a set with "release" semantics on most modern architectures. On (non-AMD) x86, this should mean no explicit synchronization for the unlock() routine.

 If it does, then that makes it a great deal slower than a thin lock, as that 
 typically doesn't require an atomic for the unlock. As I'm sure you're aware, 
 CAS and the like cost around 120..160 cycles on most CPUs, so having an 
 algorithm that cuts their usage in half is a big advantage.

Definitely.

Sean
Sep 02 2008