digitalmars.D.learn - Memory Efficient HashSet
- Nordlöw (6/6) Mar 08 2016 Has anybody put together a memory-efficient D-implementation of a
- Ola Fosheim Grøstad (3/5) Mar 08 2016 It appears to be very slow?
- Nordlöw (23/29) Mar 09 2016 My knowledge database engine I'm building cannot afford the
- Martin Tschierschke (9/20) Mar 10 2016 May be the discussion associated with this post, can give help???
- Nordlöw (5/6) Mar 10 2016 I need something close to the memory overhead of a sorted array
- Nordlöw (2/4) Mar 10 2016 See also: http://smerity.com/articles/2015/google_sparsehash.html
- Ola Fosheim Grøstad (2/4) Mar 10 2016 Sounds like a cool project! You could also look into using a trie.
- thedeemon (17/18) Mar 10 2016 Here's mine:
- Edwin van Leeuwen (20/26) Mar 08 2016 There is an implementation in:
- Steven Schveighoffer (6/11) Mar 10 2016 D needs a way to use allocators for hash nodes.
Has anybody put together a memory-efficient D implementation of a HashSet? Something like the sparse_hash_set<> contained in https://github.com/sparsehash/sparsehash, but in D.
Mar 08 2016
On Tuesday, 8 March 2016 at 08:12:04 UTC, Nordlöw wrote:
> Something like the sparse_hash_set<> contained in https://github.com/sparsehash/sparsehash

It appears to be very slow? What do you need it for?
Mar 08 2016
On Tuesday, 8 March 2016 at 12:25:36 UTC, Ola Fosheim Grøstad wrote:
> On Tuesday, 8 March 2016 at 08:12:04 UTC, Nordlöw wrote:
>> Something like the sparse_hash_set<> contained in https://github.com/sparsehash/sparsehash
>
> It appears to be very slow? What do you need it for?

The knowledge database engine I'm building cannot afford the memory overhead of D's builtin associative arrays. For instance, the following program

void main(string[] args)
{
    alias T = uint;
    bool[T] x;              // emulates a HashSet, with significant memory overhead
    import std.range : iota;
    const n = 10_000_000;
    foreach (const i; iota(0, n))
    {
        x[i] = true;        // indicate that it's stored
    }
    import std.stdio : writeln, readln;
    writeln("Press return: ");
    readln;
}

consumes 842 MiB on my Ubuntu. The consumption with Google's sparsehash unordered_set is about 50 MiB.
Mar 09 2016
On Wednesday, 9 March 2016 at 22:31:50 UTC, Nordlöw wrote:
> [...]
> consumes 842 MiB on my Ubuntu. The consumption with Google's sparsehash unordered_set is about 50 MiB.

Maybe the discussion associated with this post can help:

https://forum.dlang.org/post/mailman.217.1457590242.26339.digitalmars-d-bugs@puremagic.com

I tried your little program and played around. I tried x.rehash, but this resulted in even more memory consumption. 80 bytes for a 4-byte key and one bit of storage seems a lot. With smaller n it comes down to approx. 60 bytes. How many data sets will you have to deal with in the end?
Mar 10 2016
On Thursday, 10 March 2016 at 10:15:10 UTC, Martin Tschierschke wrote:
> How many data sets will you have to deal with in the end?

I need something close to the memory overhead of a sorted array of values. That is why I'm considering converting SparseHash's sparse_hash_set to D.
Mar 10 2016
On Wednesday, 9 March 2016 at 22:31:50 UTC, Nordlöw wrote:
> The consumption with Google's sparsehash unordered_set is about 50 MiB.

See also: http://smerity.com/articles/2015/google_sparsehash.html
Mar 10 2016
On Wednesday, 9 March 2016 at 22:31:50 UTC, Nordlöw wrote:
> The knowledge database engine I'm building cannot afford the memory overhead of D's builtin associative arrays.

Sounds like a cool project! You could also look into using a trie.
Mar 10 2016
On Wednesday, 9 March 2016 at 22:31:50 UTC, Nordlöw wrote:
> consumes 842 MiB on my Ubuntu.

Here's mine: https://bitbucket.org/infognition/robinhood/src (you just need one file, rbhash.d, to use in your apps).

The following test takes ~130 MB and can take less with some tweaks in the settings:

import std.stdio, rbhash;

alias Set(T) = RHHash!(T, void);

void main()
{
    auto set = new Set!uint;
    foreach (i; 0 .. 10_000_000)
        set.add(i);
    writeln(set.contains(444));
    readln;
}

It's a GC-free Robin Hood hash table implementation with special treatment for void as value type, i.e. sets.
Mar 10 2016
On Tuesday, 8 March 2016 at 08:12:04 UTC, Nordlöw wrote:
> Has anybody put together a memory-efficient D implementation of a HashSet? Something like the sparse_hash_set<> contained in https://github.com/sparsehash/sparsehash, but in D.

There is an implementation in: code.dlang.org/packages/emsi_containers

But to be honest I got stuck trying to use it (its copy constructor is disabled), so I used this very minimal wrapper around an associative array:

private struct HashSet(E)
{
    // TODO switch to proper implementation (not using AA)
    bool put(E el)
    {
        if (el in set)
            return false;
        set[el] = set.length;
        return true;
    }

    size_t[E] set;
}

(I only needed put, since I used it to check whether we had already encountered a value in a lazy/non-sorted implementation of uniq.)
Mar 08 2016
On 3/8/16 3:12 AM, Nordlöw wrote:
> Has anybody put together a memory-efficient D implementation of a HashSet? Something like the sparse_hash_set<> contained in https://github.com/sparsehash/sparsehash, but in D.

D needs a way to use allocators for hash nodes.

In Dcollections, I was able to optimize both performance and memory usage by using a specialized allocator that allocates pages and then divides up the pages.

-Steve
Mar 10 2016