digitalmars.D.learn - bool Associative Array Synchronized
- Etienne (9/9) Mar 20 2014 I'd like to "cowboy it" on an AA that looks like this:
- Steven Schveighoffer (8/16) Mar 20 2014 No, it's not safe. With memory reordering, there is no "safe" unless you...
- Etienne (3/22) Mar 20 2014 Right, I was assuming it was always ordered, but modern processor
- Chris Williams (9/11) Mar 20 2014 Even without rearranging the order of your code, your bit exists
- H. S. Teoh (22/34) Mar 20 2014 Furthermore, the CPU does not access bits directly; it processes them as
- Etienne (6/38) Mar 20 2014 Hmm my misunderstanding comes from not taking into account everything I
- John Colvin (12/23) Mar 20 2014 On x86 only?
- Etienne (2/5) Mar 20 2014 Heh. Will do!
I'd like to "cowboy it" on an AA that looks like this:

__gshared bool[string] m_mutex;

I think it'll be much faster for my code because this AA could need to be checked and switched possibly millions of times per second, and I wouldn't want to slow it down with a mutex protecting it. I'm thinking the bool would be synchronized at the hardware level anyway, since it's just a bit being flipped or returned. Am I right to assume this, and (maybe an assembly guru can answer) is it safe to use?

Thanks
Mar 20 2014
On Thu, 20 Mar 2014 13:43:58 -0400, Etienne <etcimon gmail.com> wrote:

> I'd like to "cowboy it" on an AA that looks like this:
>
> __gshared bool[string] m_mutex;
>
> I think it'll be much faster for my code because this AA could need to be checked and switched possibly millions of times per second and I wouldn't want to slow it down with a mutex protecting it. I'm thinking the bool would be synchronized at the hardware level anyway, since it's just a bit being flipped or returned. Am I right to assume this and (maybe an assembly guru can answer) it safe to use?

No, it's not safe. With memory reordering, there is no "safe" unless you use mutexes or low-level atomics. Long story short, the compiler, the processor, or the memory cache can effectively reorder operations, making one thread see things happen in a different order than they are written/executed on another thread. There are no guarantees.

-Steve
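[A minimal sketch of the low-level-atomics route mentioned above, using D's core.atomic; the flag and function names are illustrative, not from the thread:]

```d
import core.atomic;

shared bool flag;

void setFlag()
{
    // atomicStore defaults to sequentially-consistent ordering, so the
    // store cannot be reordered past surrounding operations by the
    // compiler or the CPU.
    atomicStore(flag, true);
}

bool readFlag()
{
    // atomicLoad pairs with atomicStore to give the ordering guarantee
    // that a plain __gshared read or write does not have.
    return atomicLoad(flag);
}
```

Note this only covers a single flag; it does nothing to make the AA itself (lookup, insertion, removal) safe to share between threads.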
Mar 20 2014
On 2014-03-20 1:52 PM, Steven Schveighoffer wrote:

> No, it's not safe. With memory reordering, there is no "safe" unless you use mutexes or low-level atomics. Long story short, the compiler, the processor, or the memory cache can effectively reorder operations, making one thread see things happen in a different order than they are written/executed on another thread. There are no guarantees.

Right, I was assuming it was always ordered, but modern processor pipelines are different, I guess.
Mar 20 2014
On Thursday, 20 March 2014 at 18:06:18 UTC, Etienne wrote:

> Right, I was assuming it was always ordered, but modern processor pipelines are different I guess.

Even without rearranging the order of your code, your bit exists in RAM but all the logic takes place in a CPU register, meaning that any time you operate on the bit, there are at least two copies. When the CPU switches threads (which could be at any point), it empties the CPU registers into RAM. If the thread it's switching to then tries to interact with the bit, it's going to create a third copy. Now whichever of the two threads is slower to write back to the original location is going to smash the other's.
Mar 20 2014
On Thu, Mar 20, 2014 at 06:39:10PM +0000, Chris Williams wrote:

> Even without rearranging the order of your code, your bit exists in RAM but all the logic takes place in a CPU register, meaning that any time you operate with the bit, there's at least two copies. When the CPU switches threads (which could be at any point), it empties the CPU into RAM. If the thread it's switching to then tries to interact with the bit, it's going to create a third copy. Now whichever of the two threads is slower to write back to the original location is going to smash the other's.

Furthermore, the CPU does not access bits directly; it processes them as (at least) bytes. To set/clear a bit in memory location X, the CPU has to first read X into a register (at least 8 bits long), update the bit, and write it back into memory. If two threads are simultaneously operating on different bits that reside in the same byte, whichever CPU runs last will smash whatever the first CPU wrote.

Let's say you start with 00000000b in location X, and CPU1 wants to set the first bit and CPU2 wants to set the second bit. Both CPUs initially read 00000000b into their registers, then the first CPU sets the first bit, so it becomes 00000001b (in register), and the second CPU sets the second bit, so it becomes 00000010b (in register). Now CPU1 writes 00000001b back to memory, followed by CPU2 writing 00000010b back to memory. Now what CPU1 did has been smashed by CPU2's write.

Now, the current AA implementation doesn't actually pack bits like this, so this particular problem doesn't actually happen, but other similar problems will occur if you add/remove keys from the AA -- two CPUs will try to update internal AA pointers simultaneously, and end up trashing it.

T

-- 
What doesn't kill me makes me stranger.
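[The read-modify-write sequence described above can be sketched in D. The helper names are illustrative, and core.atomic's atomicOp is one way to make the whole sequence a single indivisible step:]

```d
import core.atomic;

shared ubyte flags;

// Unsafe: the read, the modify, and the write-back are three separate
// steps. Two threads can both read the same old value, set different
// bits in their own registers, and the later write-back then erases
// the earlier thread's bit.
void setBitRacy(ubyte mask)
{
    ubyte v = *cast(ubyte*)&flags; // read X into a register (no synchronization)
    v |= mask;                     // update the bit (in register only)
    *cast(ubyte*)&flags = v;       // write back -- may smash another thread's bit
}

// Safe: atomicOp performs the whole read-modify-write as one atomic
// operation, so concurrent bit-sets cannot lose each other's updates.
void setBitAtomic(ubyte mask)
{
    atomicOp!"|="(flags, mask);
}
```

With the atomic version, two threads setting different bits always end up with both bits set, regardless of interleaving.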
Mar 20 2014
On 2014-03-20 2:47 PM, H. S. Teoh wrote:

> Furthermore, the CPU does not access bits directly; it processes them as (at least) bytes. [...] Now, the current AA implementation doesn't actually pack bits like this, so this particular problem doesn't actually happen, but other similar problems will occur if you add/remove keys from the AA -- two CPU's will try to update internal AA pointers simultaneously, and end up trashing it.

Hmm, my misunderstanding comes from not taking into account everything I knew about CPU caches. As a side note, I think I heard somewhere that abstractions don't flatten out the learning curve at all; it's like a bigger gun, but you still need to know all the basics to avoid shooting yourself with it. So true.

Well, thanks for the explanation there :)
Mar 20 2014
On Thursday, 20 March 2014 at 17:43:58 UTC, Etienne wrote:

> I'd like to "cowboy it" on an AA that looks like this:
>
> __gshared bool[string] m_mutex;
>
> I think it'll be much faster for my code because this AA could need to be checked and switched possibly millions of times per second and I wouldn't want to slow it down with a mutex protecting it. I'm thinking the bool would be synchronized at the hardware level anyway, since it's just a bit being flipped or returned. Am I right to assume this and (maybe an assembly guru can answer) it safe to use? Thanks

On x86 only?

If all you do is read/write from/to each location, then you are guaranteed atomicity. Adding/removing members? Forget it.

Also, atomicity is not a strong enough guarantee for implementing a mutex, which is what I assume you are trying to do. You need ordering guarantees as well.

Watch both of these:

http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2
http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-2-of-2

and then look at core.atomic, and then give up and use a lock :p
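[A minimal sketch of the lock-based route the thread ends up recommending, using core.sync.mutex; the wrapper functions and names are illustrative, not from the thread:]

```d
import core.sync.mutex;

__gshared bool[string] m_flags;
__gshared Mutex m_lock;

shared static this()
{
    m_lock = new Mutex;
}

// Every access to the AA -- lookups included -- takes the lock, so
// concurrent key insertion/removal cannot corrupt the AA's internals.
bool check(string key)
{
    synchronized (m_lock)
    {
        if (auto p = key in m_flags)
            return *p;
        return false;
    }
}

void set(string key, bool value)
{
    synchronized (m_lock)
    {
        m_flags[key] = value;
    }
}
```

An uncontended lock acquisition is cheap (roughly one atomic operation on the fast path), so "millions of times per second" is not automatically out of reach; it's worth measuring before cutting corners.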
Mar 20 2014
On 2014-03-20 1:58 PM, John Colvin wrote:

> Also, atomicity is not a strong enough guarantee for implementing a mutex, which is what I assume you are trying to do. You need ordering guarantees as well.

Heh. Will do!
Mar 20 2014