digitalmars.D.learn - Memory Efficient HashSet
- Nordlöw (6/6) Mar 08 2016 Has anybody put together a memory-efficient D-implementation of a
- Ola Fosheim Grøstad (3/5) Mar 08 2016 It appears to be very slow?
- Nordlöw (23/29) Mar 09 2016 My knowledge database engine I'm building cannot afford the
- Martin Tschierschke (9/20) Mar 10 2016 May be the discussion associated with this post, can give help???
- Nordlöw (5/6) Mar 10 2016 I need something close to the memory overhead of a sorted array
- Nordlöw (2/4) Mar 10 2016 See also: http://smerity.com/articles/2015/google_sparsehash.html
- Ola Fosheim Grøstad (2/4) Mar 10 2016 Sounds like a cool project! You could also look into using a trie.
- thedeemon (17/18) Mar 10 2016 Here's mine:
- Edwin van Leeuwen (20/26) Mar 08 2016 There is an implementation in:
- Steven Schveighoffer (6/11) Mar 10 2016 D needs a way to use allocators for hash nodes.
Has anybody put together a memory-efficient D implementation of a HashSet? Something like the sparse_hash_set<> contained in https://github.com/sparsehash/sparsehash, but in D.
Mar 08 2016
On Tuesday, 8 March 2016 at 08:12:04 UTC, Nordlöw wrote:
> Something like the sparse_hash_set<> contained in https://github.com/sparsehash/sparsehash

It appears to be very slow? What do you need it for?
Mar 08 2016
On Tuesday, 8 March 2016 at 12:25:36 UTC, Ola Fosheim Grøstad wrote:
> On Tuesday, 8 March 2016 at 08:12:04 UTC, Nordlöw wrote:
>> Something like the sparse_hash_set<> contained in https://github.com/sparsehash/sparsehash
>
> It appears to be very slow? What do you need it for?

The knowledge database engine I'm building cannot afford the memory overhead of D's builtin associative arrays. For instance, the following program

void main(string[] args)
{
    alias T = uint;
    bool[T] x;              // emulates a HashSet, with significant memory overhead
    import std.range : iota;
    const n = 10_000_000;
    foreach (const i; iota(0, n))
    {
        x[i] = true;        // indicate that it's stored
    }
    import std.stdio : writeln, readln;
    writeln("Press return: ");
    readln;
}

consumes 842 MiB on my Ubuntu. The consumption with Google's sparsehash unordered_set is about 50 MiB.
Mar 09 2016
On Wednesday, 9 March 2016 at 22:31:50 UTC, Nordlöw wrote:
> [...]
> consumes 842 MiB on my Ubuntu. The consumption with Google's sparsehash unordered_set is about 50 MiB.

Maybe the discussion associated with this post can help:

https://forum.dlang.org/post/mailman.217.1457590242.26339.digitalmars-d-bugs@puremagic.com

I tried your little program and played around. I tried x.rehash, but this resulted in even more memory consumption. 80 bytes for a 4-byte key and one bit of storage seems a lot. With smaller n it comes down to approx. 60 bytes. How many data sets will you have to deal with in the end?
Mar 10 2016
On Thursday, 10 March 2016 at 10:15:10 UTC, Martin Tschierschke wrote:
> How many data sets will you have to deal with in the end?

I need something close to the memory overhead of a sorted array of values. That is why I'm considering converting SparseHash's sparse_hash_set to D.
Mar 10 2016
On Wednesday, 9 March 2016 at 22:31:50 UTC, Nordlöw wrote:
> The consumption with Google's sparsehash unordered_set is about 50 MiB.

See also: http://smerity.com/articles/2015/google_sparsehash.html
Mar 10 2016
On Wednesday, 9 March 2016 at 22:31:50 UTC, Nordlöw wrote:
> The knowledge database engine I'm building cannot afford the memory overhead of D's builtin associative arrays.

Sounds like a cool project! You could also look into using a trie.
Mar 10 2016
On Wednesday, 9 March 2016 at 22:31:50 UTC, Nordlöw wrote:
> consumes 842 MiB on my Ubuntu.

Here's mine: https://bitbucket.org/infognition/robinhood/src (you just need one file, rbhash.d, to use in your apps).

The following test takes ~130 MB and can take less with some tweaks in the settings:

import std.stdio, rbhash;

alias Set(T) = RHHash!(T, void);

void main()
{
    auto set = new Set!uint;
    foreach (i; 0 .. 10_000_000)
        set.add(i);
    writeln(set.contains(444));
    readln;
}

It's a GC-free Robin Hood hash table implementation with special treatment for void as value type, i.e. sets.
Mar 10 2016
On Tuesday, 8 March 2016 at 08:12:04 UTC, Nordlöw wrote:
> Has anybody put together a memory-efficient D implementation of a HashSet? Something like the sparse_hash_set<> contained in https://github.com/sparsehash/sparsehash, but in D.

There is an implementation in: code.dlang.org/packages/emsi_containers

But to be honest I got stuck trying to use it (its copy constructor is disabled), so I used this very minimal wrapper around an associative array:

private struct HashSet(E)
{
    // TODO switch to proper implementation (not using AA)
    bool put(E el)
    {
        if (el in set)
            return false;
        set[el] = set.length;
        return true;
    }

    size_t[E] set;
}

(I only needed put, since I used it to check whether we had already encountered a value in a lazy/non-sorted implementation of uniq.)
Mar 08 2016
On 3/8/16 3:12 AM, Nordlöw wrote:
> Has anybody put together a memory-efficient D implementation of a HashSet? Something like the sparse_hash_set<> contained in https://github.com/sparsehash/sparsehash, but in D.

D needs a way to use allocators for hash nodes.

In Dcollections, I was able to optimize both performance and memory usage by using a specialized allocator that allocates pages and then divides up the pages.

-Steve
Mar 10 2016