www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Question about AA's prime_list.

In internal/aaA.d there's a list of primes, used as the lengths of the 
hash-table:

static uint[] prime_list = [
     97UL,         389UL,
     1543UL,       6151UL,
     24593UL,      98317UL,
     393241UL,     1572869UL,
     6291469UL,    25165843UL,
     100663319UL,  402653189UL,
     1610612741UL, 4294967291UL
];

The hash-table is resized when the number of nodes is larger than 4 
times the table's length:

[279]	if (nodes > aa.a.b.length * 4)

The problem, however, is that the primes in the prime_list are too close 
together (a/b < 4.0) and many get skipped.

hash[97] can contain 388 items (1)
hash[389] can contain 1556 items (-13)
hash[1543] can contain 6172 items (-21)
hash[6151] can contain 24604 items (-11)
hash[24593] can contain 98372 items (-55)
hash[98317] can contain 393268 items (-27)
hash[393241] can contain 1572964 items (-95)
hash[1572869] can contain 6291476 items (-7)
hash[6291469] can contain 25165876 items (-33)
hash[25165843] can contain 100663372 items (-53)
hash[100663319] can contain 402653276 items (-87)
hash[402653189] can contain 1610612756 items (-15)

A table with length 97 can contain 388 items. When one item is added, 
the table will be resized to 389 (OK). A table with length 389 can 
contain 1556 items, but when one's added, it will not be resized to 1543 
(the next prime in the list) but to 6172... The hash table will therefor 
be resized like this: 97 389 6151 98317 1572869 25165843 402653189...

Is this intentional? It seems to be good for performance (less hash 
collisions) but it's bad for memory usage.

A list of primes that don't have the aforementioned problem:

static uint[] prime_list = [
     97UL,         389UL,
     1559UL,       6241UL,
     24967UL,      99871UL,
     399491UL,     1597969UL,
     6391877UL,    25567523UL,
     102270127UL,  409080533UL,
     1636322141UL, 4294967291UL
];


L.
Oct 03 2006