digitalmars.D - A Lock-Free Hash Table (google video)
- Knud Soerensen (2/2) Apr 01 2007 I would like to share this interesting video
- Dejan Lekic (1/1) Apr 01 2007 Mr. Soerensen, thanks - a really nice video.
- Manfred Nowak (8/11) Apr 01 2007 The author admits, that he has some problems when resizing.
- Craig Black (3/3) Apr 01 2007 Good stuff. BTW, does anyone know when is D going to start using lock f...
- David B. Held (5/7) Apr 01 2007 When STM is implemented. ;) And I hope I'm not giving away too much by
- Brad Roberts (23/33) Apr 01 2007 STM (software transactional memory) isn't required for lock-free simple
- David B. Held (11/16) Apr 01 2007 Well, the point is that even if nobody rewrites any containers to use
- Sean Kelly (3/11) Apr 01 2007 Will this require the user to expose a .clone() method for classes
- David B. Held (12/24) Apr 01 2007 I don't want to overpromise and underdeliver, so I will start out with
- Craig Black (9/17) Apr 02 2007 This is very good. It seems that there are a number of things in the wo...
- Dan (6/6) Apr 02 2007 It seems to me that this is the only optimal form for hashtables - a mas...
- David B. Held (3/10) Apr 02 2007 Nothing more exciting than bug fixes, but that's always good, right? ;)
- Lionello Lunesu (8/11) Apr 02 2007 Very interesting, thanks for that. Maybe now we can convince Walter to
- Craig Black (5/15) Apr 02 2007 Yes. I've done benchmarks and power of two is better. I wonder, does T...
- Dan (14/17) Apr 02 2007 oh my! I thought this was obvious; as was the powers of two, and using ...
- Sean Kelly (4/26) Apr 02 2007 Nope. It's all plain old D code at the moment IIRC.
- Dan (6/11) Apr 02 2007 Rehashing, you'd probe until you've found the right location *then* move...
- Sean Kelly (4/13) Apr 02 2007 I have no idea. I thought you were asking if there were inline asm
- Mark (3/3) May 15 2007 For what it's worth, I'm pretty sure the atomicity of 128-bit SSE stores...
- Sean Kelly (6/8) Apr 02 2007 It's on my "to do" list, actually. So far, however, I have been
- Sean Kelly (5/17) Apr 02 2007 No. I've considered creating a GC with per-thread memory pools and
I would like to share this interesting video http://video.google.com/videoplay?docid=2139967204534450862
Apr 01 2007
Mr. Soerensen, thanks - a really nice video.
Apr 01 2007
Knud Soerensen wrote:
> I would like to share this interesting video
> http://video.google.com/videoplay?docid=2139967204534450862

The author admits that he has some problems when resizing. He solved them by "stalling". With 64MB hashes the stall time is short, though. http://blogs.azulsystems.com/cliff/ But I wonder how much "stalling" will be done on resizing to some billions of hash positions. -manfred
Apr 01 2007
Good stuff. BTW, does anyone know when D is going to start using lock-free algorithms with its built-in/templated containers? -Craig
Apr 01 2007
Craig Black wrote:
> Good stuff. BTW, does anyone know when D is going to start using lock-free
> algorithms with its built-in/templated containers?

When STM is implemented. ;) And I hope I'm not giving away too much by saying that some reasonably smart folks are working on it even as we speak...

Dave
Apr 01 2007
David B. Held wrote:
> Craig Black wrote:
>> Good stuff. BTW, does anyone know when D is going to start using lock-free
>> algorithms with its built-in/templated containers?
>
> When STM is implemented. ;) And I hope I'm not giving away too much by
> saying that some reasonably smart folks are working on it even as we
> speak...
>
> Dave

STM (software transactional memory) isn't required for lock-free simple data structures (not all structures can be done lock-free, and I'm not an expert in that area). It can be used, but it's overkill.

A better answer, however, is that thread safety isn't something you apply to every single layer of data structures. You need to really consider the appropriate level of granularity. Even lock-free routines have a cost that you don't necessarily want to pay all the time. Given that, the lowest level is generally _not_ the right place to be applying synchronization logic (be it with atomic operations or locks). And to be clear, I've asked this before and been convinced of the above. You could ask the same thing of the STL for C++, and the same answer would apply.

NOW, an even better answer might be to have the safety be a pluggable policy decision, and then people would be able to make the choice for themselves. That'd allow the best of all worlds. The problem is that writing code that flexibly is quite a bit of work, and someone would need to volunteer to do it, since I'm just about 100% sure that the code in question here isn't going to change otherwise.

Your question is a good one, and the answer is a lot more complex than it seems at first blush.

Later,
Brad
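As a rough illustration of that pluggable-policy idea, here is a sketch in D templates. The names (NoSync, MutexSync, HashMap) are made up for this example and are not an existing Phobos or Tango API; a real locking policy would wrap an OS mutex or an Object monitor.

// Sketch only: the container takes its synchronization strategy as a
// template parameter, so single-threaded users pay nothing for locking.
struct NoSync
{
    void lock() {}
    void unlock() {}
}

struct MutexSync
{
    // Placeholder: a real version would acquire/release a mutex here.
    void lock() {}
    void unlock() {}
}

class HashMap(K, V, SyncPolicy = NoSync)
{
    private SyncPolicy sync;
    private V[K] data;               // builtin AA as the backing store, for brevity

    void opIndexAssign(V value, K key)
    {
        sync.lock();
        scope (exit) sync.unlock();
        data[key] = value;
    }

    V opIndex(K key)
    {
        sync.lock();
        scope (exit) sync.unlock();
        return data[key];
    }
}

// Usage: auto safeMap = new HashMap!(char[], int, MutexSync);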
Apr 01 2007
Brad Roberts wrote:
> [...] STM (software transactional memory) isn't required for lock-free
> simple data structures (not all structures can be done lock-free, and I'm
> not an expert in that area). It can be used, but it's overkill. [...]

Well, the point is that even if nobody rewrites any containers to use lock-free algorithms, you can get correct lock-free behavior at whatever level of granularity you feel is appropriate by writing the relevant code in an STM atomic {} block. Granted, there are cases where a lock-free container makes sense and may provide, for instance, better concurrency in the high-contention case. But much of the attraction of STM is that for the simple case, no code actually has to be rewritten, which is great for people who need to build big data structures but don't want to become experts in designing lock-free algorithms.

Dave
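To give a feel for the programming model being described, here is a purely hypothetical sketch. D has no atomic statement; the syntax below is invented for illustration only.

// Hypothetical syntax -- "atomic" is not a D keyword. The idea is that the
// block behaves as if it ran under one global lock, while the STM runtime
// detects conflicting access from other threads and retries the transaction.
void transfer(int[char[]] balances, char[] from, char[] to, int amount)
{
    atomic
    {
        // Either both updates commit together, or the whole block is rolled
        // back and re-executed on conflict.
        balances[from] -= amount;
        balances[to]   += amount;
    }
}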
Apr 01 2007
David B. Held wrote:
> Craig Black wrote:
>> Good stuff. BTW, does anyone know when D is going to start using lock-free
>> algorithms with its built-in/templated containers?
>
> When STM is implemented. ;) And I hope I'm not giving away too much by
> saying that some reasonably smart folks are working on it even as we
> speak...

Will this require the user to expose a .clone() method for classes involved in transactions?
Apr 01 2007
Sean Kelly wrote:
> David B. Held wrote:
>> [...] When STM is implemented. ;) And I hope I'm not giving away too much
>> by saying that some reasonably smart folks are working on it even as we
>> speak...
>
> Will this require the user to expose a .clone() method for classes
> involved in transactions?

I don't want to overpromise and underdeliver, so I will start out with the disclaimer that nothing I say about D features is guaranteed to be implemented. ;) That being said, word-based STM is the most generic form and requires no special work on the part of any user whatsoever. The first form of STM provided may or may not be word-based. ;) However, Walter and his "STM advisors" are aware of the other forms of STM, including object-based, and there is a possibility of having user-specifiable STM implementations. I think everyone realizes that concurrency is the hot new buzzword, and you can rest assured that D will take it very, very seriously.

Dave
Apr 01 2007
"David B. Held" <dheld codelogicconsulting.com> wrote in message news:eupvmj$1au5$1 digitalmars.com...Craig Black wrote:This is very good. It seems that there are a number of things in the works for the D compiler. The const stuff, Walter has mentioned run-time reflection, and now this STM as you now mention. Any of these things would be very cool IMO. Although Walter sometimes mentions stuff currently in development, he does not say much about it. Do you have any insight on what is coming next? -CraigGood stuff. BTW, does anyone know when is D going to start using lock free algorithms with their built-in/templated containers?When STM is implemented. ;) And I hope I'm not giving away too much by saying that some reasonably smart folks are working on it even as we speak... Dave
Apr 02 2007
It seems to me that this is the only optimal form for hashtables - a massively scalable one. Why?

For cases where n < 8, a linear search is typically faster than any other search algorithm. The reason is less overhead.

For cases where n < 1024, a level-order binary search array is typically faster than any other search algorithm I've learned about, including a hashtable. The reason is again less overhead.

For cases where n > 1024, I currently think that the O(log(n)) cost of a level-order binary search array actually exceeds the cost of managing a hashtable.

For cases where n > 1048576, the data set is getting quite large. Assuming, through heuristics, that the number of gets and puts is also increasing, it becomes very practical to parallelize the hashtable.

My understanding is that we tend to organize data either very small or very large, rarely between 1024 and 1048576. So implementing a hashtable, to me, suggests it ought to always be parallelizable, especially considering the low overhead involved if it's done right.
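For reference, a minimal sketch of searching such a level-order layout: the array stores a binary search tree with node i's children at 2i+1 and 2i+2, which keeps the hot early levels of the tree packed together in cache. The function name is mine, not an existing library routine.

// Level-order ("Eytzinger") binary search over an array laid out as a
// complete tree. Returns the slot index of key, or -1 if absent.
int levelOrderSearch(int[] a, int key)
{
    size_t i = 0;
    while (i < a.length)
    {
        if (a[i] == key)
            return cast(int) i;
        i = 2 * i + (key < a[i] ? 1 : 2);   // left child at 2i+1, right at 2i+2
    }
    return -1;
}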
Apr 02 2007
Craig Black wrote:
> This is very good. It seems that there are a number of things in the works
> for the D compiler: the const stuff, the run-time reflection Walter has
> mentioned, and now this STM you mention. [...] Do you have any insight on
> what is coming next?

Nothing more exciting than bug fixes, but that's always good, right? ;)

Dave
Apr 02 2007
Knud Soerensen wrote:
> I would like to share this interesting video
> http://video.google.com/videoplay?docid=2139967204534450862

Very interesting, thanks for that. Maybe now we can convince Walter to include the patch I made about a year ago to make D's AA power-of-2/AND instead of prime/MOD?

I'm wondering if this would be useful for D. What about memory allocations? If one thread is allocating a key/value, will that allocation block other threads? Any difference between Phobos/Tango?

L.
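For anyone who hasn't seen the patch, the difference is only in how the bucket index is derived from the hash. The function names below are illustrative, not the actual AA internals.

// Prime-sized table: index by remainder, which costs an integer division
// on every lookup.
size_t bucketIndexPrime(size_t hash, size_t nbuckets)
{
    return hash % nbuckets;
}

// Power-of-two table: index by masking, a single AND. This is why the
// power-of-2/AND variant benchmarks faster, but it requires a hash function
// that spreads entropy into the low bits.
size_t bucketIndexPow2(size_t hash, size_t nbuckets)
{
    assert((nbuckets & (nbuckets - 1)) == 0);   // nbuckets must be 2^k
    return hash & (nbuckets - 1);
}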
Apr 02 2007
Yes. I've done benchmarks, and power of two is better. I wonder, does Tango use a power-of-two algorithm for its AAs?

-Craig

"Lionello Lunesu" <lio lunesu.remove.com> wrote in message news:euqekc$24nu$1 digitalmars.com...
> Knud Soerensen wrote:
>> I would like to share this interesting video
>> http://video.google.com/videoplay?docid=2139967204534450862
>
> Very interesting, thanks for that. Maybe now we can convince Walter to
> include the patch I made about a year ago to make D's AA power-of-2/AND
> instead of prime/MOD?
>
> I'm wondering if this would be useful for D. What about memory allocations?
> If one thread is allocating a key/value, will that allocation block other
> threads? Any difference between Phobos/Tango?
>
> L.
Apr 02 2007
Craig Black Wrote:
> Very interesting, thanks for that. Maybe now we can convince Walter to
> include the patch I made about a year ago to make D's AA power-of-2/AND
> instead of prime/MOD?

Oh my! I thought this was obvious, as were the powers of two and using atomic asm operators to complete whole transactions or not at all. I assumed this was already well known and implemented.

Actually, another way to implement the atomic transaction is to use any SSE/SSE2 move instruction to move a key/value pair into place with a:

struct KeyVal
{
    char* key;
    union
    {
        long   l_value;
        void*  ptr_value;
        double d_value;
    }
}
static assert(KeyVal.sizeof == 16);

into a ^2 sized array. It's either there or it's not, so it's concurrent if you follow the rest of the principles, and it requires absolutely no preparation for a CAS or locking. I'm going to hope that moving a 16-byte struct uses SSE2...
Apr 02 2007
Dan wrote:
> Craig Black Wrote:
>> Very interesting, thanks for that. Maybe now we can convince Walter to
>> include the patch I made about a year ago to make D's AA power-of-2/AND
>> instead of prime/MOD?
>
> Oh my! I thought this was obvious, as were the powers of two and using
> atomic asm operators to complete whole transactions or not at all. I
> assumed this was already well known and implemented.
>
> [... KeyVal struct example snipped ...]
>
> It's either there or it's not, so it's concurrent if you follow the rest
> of the principles, and it requires absolutely no preparation for a CAS or
> locking.

Yup. It's rehashing that gets a bit messy.

> I'm going to hope that moving a 16-byte struct uses SSE2...

Nope. It's all plain old D code at the moment IIRC.

Sean
Apr 02 2007
Sean Kelly Wrote:
> Yup. It's rehashing that gets a bit messy.

Rehashing, you'd probe until you've found the right location, *then* move the 16-byte struct. He was suggesting using stride-1 probing for better cache coherency - but it's awful at cascading into linear searches for an empty slot.

>> I'm going to hope that moving a 16-byte struct uses SSE2...
>
> Nope. It's all plain old D code at the moment IIRC.

Plain old D doesn't use SSE or SSE2??? Oh my... Hasn't SSE been out since 1997? I guess with 64-bit GPRs, SSE1 is practically irrelevant now, but SSE2 and SSE3 could definitely improve performance of certain things... like copying arrays of 16-byte Value structs in DMDScript. I assumed that was what Walter meant when he commented that using 16-byte Value structs was critical for performance.

I'll be sure to use little asm { } blocks to get certain things done, then. The wonderful thing about D is that Walter provides a default mechanism for prototyping that works well. If you want something that works superbly, you can cut under it without much hassle. A superb language, that D.
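To make the probe-then-move idea concrete, here is a rough sketch. atomicStore16() is an assumed helper, not a real Phobos or Tango call, and whether a 128-bit SSE store is actually atomic is exactly the question raised later in this thread, so treat it as illustrative only.

// Illustrative only: stride-1 probing into a power-of-two table, publishing a
// whole 16-byte KeyVal with a single assumed-atomic store.
struct KeyVal
{
    char* key;
    union
    {
        long   l_value;
        void*  ptr_value;
        double d_value;
    }
}

void atomicStore16(KeyVal* slot, KeyVal kv);   // assumed primitive, declared but not defined here

// table.length must be a power of two; an empty slot has key == null.
bool put(KeyVal[] table, KeyVal kv, size_t hash)
{
    size_t mask = table.length - 1;
    for (size_t probes = 0, i = hash & mask; probes < table.length; ++probes, i = (i + 1) & mask)
    {
        // Note: two threads can both see the same empty slot here; a real
        // lock-free table would claim the slot with a CAS before publishing.
        if (table[i].key is null || table[i].key is kv.key)
        {
            atomicStore16(&table[i], kv);      // the 16-byte pair appears all at once
            return true;
        }
    }
    return false;   // table full; a real implementation would rehash into a larger table
}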
Apr 02 2007
Dan wrote:
> Sean Kelly Wrote:
>> Nope. It's all plain old D code at the moment IIRC.
>
> Plain old D doesn't use SSE or SSE2??? Oh my... Hasn't SSE been out since
> 1997?

I have no idea. I thought you were asking if there were inline asm blocks. I suppose the codegen may use SSE2 instructions where appropriate.

Sean
Apr 02 2007
For what it's worth, I'm pretty sure the atomicity of 128-bit SSE stores is implementation defined. They might be atomic on newer architectures, but older designs implement them as two discrete 64-bit operations.
May 15 2007
Craig Black wrote:
> Yes. I've done benchmarks, and power of two is better. I wonder, does Tango
> use a power-of-two algorithm for its AAs?

It's on my "to do" list, actually. So far, however, I have been avoiding making unnecessary changes to the compiler runtime in case Walter decides to take over management of it. But there are a few things I want to resolve before Tango goes 1.0 either way.

Sean
Apr 02 2007
Lionello Lunesu wrote:
> Knud Soerensen wrote:
>> I would like to share this interesting video
>> http://video.google.com/videoplay?docid=2139967204534450862
>
> Very interesting, thanks for that. Maybe now we can convince Walter to
> include the patch I made about a year ago to make D's AA power-of-2/AND
> instead of prime/MOD?
>
> I'm wondering if this would be useful for D. What about memory allocations?
> If one thread is allocating a key/value, will that allocation block other
> threads?

Yes.

> Any difference between Phobos/Tango?

No. I've considered creating a GC with per-thread memory pools and such, but haven't had the time.

Sean
Apr 02 2007