digitalmars.D - Ternary Search Trees
- bearophile (6/6) Apr 13 2009 Does someone has some need for Ternary Search Trees into Phobos (for D1....
- Robert Fraser (6/14) Apr 13 2009 I'd love to see them! Got a Tango version? ;-P
- Robert Fraser (4/12) Apr 15 2009 Hey, could you please post your implementation (assuming it's
- Andrei Alexandrescu (3/21) Apr 15 2009 BTW, anyone got a KD-tree implementation? I could use one.
- bearophile (4/5) Apr 15 2009 I think at the moment I have none, but a 2D/3D one deserves to be in the...
- Daniel Keep (6/29) Apr 16 2009 A random idea occurs: it might be interesting to put up a page somewhere
- bearophile (35/38) Apr 16 2009 It's just a translation to D of this Py code:
- Fawzi Mohamed (8/22) Apr 16 2009 struct TST(T){ ... }
- Jason House (2/55) Apr 16 2009 Why use a struct for TST? it requires you to use pointers everywhere and...
- Robert Fraser (4/60) Apr 16 2009 I assume it saves some space (classes have a monitor and vtbl pointer
- bearophile (129/131) Apr 17 2009 Thank you, now I understand. I am slowly learning more.
- bearophile (31/31) Apr 17 2009 To shorten the code further I have replaced the Block struct with a type...
- bearophile (39/39) Apr 17 2009 OK, now the code works. Besude the TST struct I have created this helper...
- Robert Fraser (3/56) Apr 17 2009 Cool, thanks for posting this... Why is the default block size so
- bearophile (6/7) Apr 17 2009 If the block is too much small you waste too much time allocating small ...
- Robert Fraser (2/55) Apr 16 2009 Awesome; thanks
- bearophile (66/66) Apr 18 2009 This is the final version I am added to the dlibs:
- Robert Fraser (31/39) Apr 24 2009 I implemented a version of the TST you posted using Tango + D1... Here
- Robert Fraser (16/67) Apr 24 2009 Results are similar on a much larger word list:
- bearophile (10/19) Apr 24 2009 Nice.
- Robert Fraser (11/26) Apr 24 2009 Attached, if you don't mind NO comments.
- bearophile (24/26) Apr 24 2009 This is your code, I can see you have not used the union trick to reduce...
- Robert Fraser (14/44) Apr 24 2009 How does the union trick work exactly?
Does someone have a need for Ternary Search Trees in Phobos (for D1, and eventually later for D2 too)? TSTs allow you to find keys, key prefixes, or even keys with holes. Keys are arrays of T, where T is the template type. They can be designed to store the keys alone, or as an associative data structure. With some benchmarks I have seen that a simple TST implementation is about as fast as the built-in AAs of D (but much slower than Python dicts).

Bye,
bearophile
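For a concrete picture of the kind of interface described above, here is a purely hypothetical usage sketch in D (the type and method names are illustrative only, not taken from bearophile's actual code):

    // A set-like TST of char[] keys.
    TST!(char) words;
    words.add("cat");
    words.add("cap");
    words.add("car");

    assert("cat" in words);                    // exact key lookup
    assert(!("ca" in words));                  // "ca" is only a prefix, not a stored key
    auto byPrefix = words.prefixMatch("ca");   // hypothetical: all keys starting with "ca"
    auto withHole = words.patternMatch("c.t"); // hypothetical: '.' matches any single char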
Apr 13 2009
bearophile wrote:
> Does someone have a need for Ternary Search Trees in Phobos (for D1, and eventually later for D2 too)? TSTs allow you to find keys, key prefixes, or even keys with holes. Keys are arrays of T, where T is the template type. They can be designed to store the keys alone, or as an associative data structure. With some benchmarks I have seen that a simple TST implementation is about as fast as the built-in AAs of D (but much slower than Python dicts).
> Bye, bearophile

I'd love to see them! Got a Tango version? ;-P

Also, D's AAs aren't a great benchmark, since they are outperformed by even a basic hashtable implementation (compare them to tango.util.collection.HashMap). But TSTs support many more operations, so the fact that they're slower than BSTs/hashtables is to be expected.
Apr 13 2009
bearophile wrote:
> Does someone have a need for Ternary Search Trees in Phobos (for D1, and eventually later for D2 too)? [...]
> Bye, bearophile

Hey, could you please post your implementation (assuming it's open-source?) I'd love to use them, but can't be bothered to implement it. Thanks!
Apr 15 2009
Robert Fraser wrote:
> bearophile wrote:
>> Does someone have a need for Ternary Search Trees in Phobos (for D1, and eventually later for D2 too)? [...]
>> Bye, bearophile
> Hey, could you please post your implementation (assuming it's open-source?) I'd love to use them, but can't be bothered to implement it. Thanks!

BTW, anyone got a KD-tree implementation? I could use one.

Andrei
Apr 15 2009
Andrei Alexandrescu:
> BTW, anyone got a KD-tree implementation? I could use one.

I think at the moment I have none, but a 2D/3D one deserves to be in the std lib, because it allows you to reduce the complexity of a lot of geometry-based code.

Bye,
bearophile
Apr 15 2009
Andrei Alexandrescu wrote:
> Robert Fraser wrote:
>> bearophile wrote:
>>> Does someone have a need for Ternary Search Trees in Phobos? [...]
>> Hey, could you please post your implementation (assuming it's open-source?) [...]
> BTW, anyone got a KD-tree implementation? I could use one.
>
> Andrei

A random idea occurs: it might be interesting to put up a page somewhere of "things we'd like in the standard library." Provide details on a rough API, functionality and license. Might get some people bored on a weekend dropping in code for the standard library. :)

  -- Daniel
Apr 16 2009
Robert Fraser:
> Hey, could you please post your implementation (assuming it's open-source?) I'd love to use them, but can't be bothered to implement it. Thanks!

It's just a translation to D of this Py code:
Info: http://jeethurao.com/blog/?p=146
Code: http://bitbucket.org/woadwarrior/trie/

Where necessary you can also follow the ideas from the original article here (but using an enum is MUCH better, and it's also safer in C, to avoid aliasing):
http://www.ddj.com/windows/184410528

My implementation is about 57 times faster than the Python version and much faster than the Psyco version too. You can also take a look here:
http://sourceforge.net/projects/ternary/

You can implement it with a _single_ template struct, filled with all the methods you want:

struct TST(T) {
    TST* left, right;
    union {
        TST* mid;
        int index;
    }
    T item;
    // methods here
}

Note that inside there you don't need TST!(T)*; TST* is probably enough, but I don't know why yet. You can start with only the few basic methods, test them, and later when/if you want you can add more methods, taking ideas from here:
http://abc.se/~re/code/tst/tst_docs/index.html

I'd prefer not to show my code because it's not nice: it has no comments, tests, etc. But it's quite close to that Py code.

A problem with that template struct is that even if you add an opIn_r() method, as I have done, you have to write (note the star):

    if (!(k in *ds))

I don't know if in D2 for structs you can use the syntax you use with classes:

    if (!(k in ds))

Even better syntax is:

    if (k !in ds)

In Python you write something better still:

    if k not in ds:

Bye,
bearophile
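As a rough starting point for those "few basic methods", here is a minimal, untested sketch of an add() and an opIn_r() (D1-style). The hasItem/terminal flags and the non-empty-key assumption are illustrative simplifications, not bearophile's actual code, and it deliberately uses a plain mid pointer instead of the union trick:

struct TST(T) {
    TST* left, right, mid;
    T item;
    bool terminal;  // a stored key ends at this node
    bool hasItem;   // false only for a freshly created (or empty root) node

    /// Insert a non-empty key, creating child nodes as needed.
    void add(T[] key) {
        if (!hasItem) {
            item = key[0];
            hasItem = true;
        }
        if (key[0] < item) {
            if (left is null) left = new TST;
            left.add(key);
        } else if (key[0] > item) {
            if (right is null) right = new TST;
            right.add(key);
        } else if (key.length == 1) {
            terminal = true;
        } else {
            if (mid is null) mid = new TST;
            mid.add(key[1 .. $]);
        }
    }

    /// Exact-match lookup, usable as: if (key in tst) ...
    bool opIn_r(T[] key) {
        if (!hasItem)
            return false;
        if (key[0] < item)
            return left !is null && (key in *left);
        if (key[0] > item)
            return right !is null && (key in *right);
        if (key.length == 1)
            return terminal;
        return mid !is null && (key[1 .. $] in *mid);
    }
}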
Apr 16 2009
On 2009-04-16 18:19:36 +0200, bearophile <bearophileHUGS lycos.com> said:
> [...]
> struct TST(T) {
>     TST* left, right;
>     union {
>         TST* mid;
>         int index;
>     }
>     T item;
>     // methods here
> }
> Note that inside there you don't need TST!(T)*; TST* is probably enough, but I don't know why yet.

struct TST(T){ ... }

is syntactic sugar for

template TST(T){ struct TST{ ... } }

so TST in the structure refers by default to the struct itself.

Fawzi
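To illustrate Fawzi's point with a small self-contained example (generic names, not the thread's code): the two forms below are equivalent, which is why the bare name already denotes the current instantiation.

    struct S1(T) {
        S1* left;       // same type as S1!(T)*
        S1!(T)* right;  // spelled out in full; identical to the line above
        T item;
    }

    // ...is sugar for the eponymous template:
    template S2(T) {
        struct S2 {
            S2* left;
            T item;
        }
    }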
Apr 16 2009
bearophile Wrote:
> Robert Fraser:
>> Hey, could you please post your implementation (assuming it's open-source?) I'd love to use them, but can't be bothered to implement it. Thanks!
> It's just a translation to D of this Py code: [...]
> You can implement it with a _single_ template struct, filled with all the methods you want:
> struct TST(T) {
>     TST* left, right;
>     union {
>         TST* mid;
>         int index;
>     }
>     T item;
>     // methods here
> }
> [...]
> A problem with that template struct is that even if you add an opIn_r() method, as I have done, you have to write (note the star):
>     if (!(k in *ds))
> [...]
> Bye, bearophile

Why use a struct for TST? It requires you to use pointers everywhere, and passing by value hardly makes any sense. A class (optionally with final methods for speed) would work a lot better. It even fixes your issue with opIn.
Apr 16 2009
Jason House wrote:
> bearophile Wrote:
>> [...]
> Why use a struct for TST? It requires you to use pointers everywhere, and passing by value hardly makes any sense. A class (optionally with final methods for speed) would work a lot better. It even fixes your issue with opIn.

I assume it saves some space (classes have a monitor and vtbl pointer which are un-needed here). For maximum efficiency, rather than mallocing individual nodes, you'd want to be using some sort of array.
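A back-of-the-envelope sketch of the difference being weighed here (32-bit layouts, illustrative field names; bearophile posts measured figures for his real nodes in the next message):

    // Struct node: three pointers + one char, padded to about 16 bytes on 32-bit.
    struct SNode {
        SNode* left, right, mid;
        char item;
    }

    // Class node: same payload plus the hidden vtbl pointer and monitor field
    // that every D class instance carries, roughly 24 bytes before the
    // allocator rounds the block size up.
    class CNode {
        CNode left, right, mid;
        char item;
    }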
Apr 16 2009
Fawzi:
> so TST in the structure refers by default to the struct itself.

Thank you, now I understand. I am slowly learning more.

-------------------------

Jason House:
> Why use a struct for TST?

To give you a good answer I have done some tests using a file "english_words.txt" of about 2.1 MB that contains about 220_000 different words.

The following tiny program loads that file and splits it. It requires 6.2 MB of RAM and about 0.09 seconds to run (once the file cache is warm; this is thanks to the last patch to the GC, which speeds this up a lot):

import std.stdio: writefln;
import std.string: split;
import std.file: read;

void main() {
    auto words = split(cast(string)read("english_words.txt"));
}

--------

The following program tests the TST (positive lookups only, no negative lookups yet); this is for the struct-based TST!(char). For the class-based TST version you can just replace *root with root:

void main() {
    auto words = split(cast(string)read("english_words.txt"));
    auto root = new TST!(char);
    root.bulk_add(words);
    for (int i; i < 100; i++)
        foreach (word; words)
            if (!(word in *root))
                throw new Exception("\"" ~ word ~ "\" not found");
}

The struct-based version needs 3.95 seconds and 24.35 MB of RAM to run. The class-based one (final methods) needs 4.03 seconds and 24.56 MB of RAM (DMD v1.042 on WinXP, 2 GHz CPU). Both versions allocate 461_495 TST nodes. [Each TST struct node needs 16 bytes. Each TST object probably needs 24 bytes, which I think the GC allocates as 32. But the actual memory used denies this twofold difference, I don't know why.] I guess the speed difference comes mostly from the higher memory overhead of classes (because the more memory there is, the more the CPU caches have to work).

--------

With a tiny change, adding "scope" so just the root of the tree gets allocated on the stack, the running time of the class-based version seems to drop from 4.03 to 3.99 seconds. I don't know why moving the root to the stack makes so much difference. It's not a small difference, because on a 2 GHz CPU ~0.04 seconds mean something like 80 million clock ticks. The root reference is traversed each time, so the removal of just this level of indirection may explain the time difference. A mammalian brain like mine isn't well equipped to understand something that performs billions of operations every second.

I have taken a look at the ASM generated by the class-based version with and without scope, but I have not understood the asm well enough:

void main() {
    auto words = split(cast(string)read("english_words.txt"));
    scope auto root = new TST!(char); // scope ****
    root.bulk_add(words);
    for (int i; i < 100; i++)
        foreach (word; words)
            if (!(word in root))
                throw new Exception("\"" ~ word ~ "\" not found");
}

--------

This, with a struct-based TST, needs 3.96 seconds (0.48 s just building the tree):

void main() {
    auto words = split(cast(string)read("english_words.txt"));
    TST!(char) root;
    root.bulk_add(words);
    for (int i; i < 100; i++)
        foreach (word; words)
            if (!(word in root))
                throw new Exception("\"" ~ word ~ "\" not found");
}

--------

The following, with a struct-based TST, needs 3.71 s and 13.68 MB (0.32 s just building, 0.31 s with no GC) (0.26 s just building when the memory is allocated with calloc; total 3.52 s, 13.37 MB):

TST[] nodes;
int nnode;

void main() {
    nodes = new TST!(char)[461_495];
    auto words = split(cast(string)read("english_words.txt"));
    auto root = nodes[nnode++];
    root.bulk_add(words);
    for (int i; i < 100; i++)
        foreach (word; words)
            if (!(word in root))
                throw new Exception("\"" ~ word ~ "\" not found");
}

Idem, but with struct fields aligned to 1 byte: 3.89 s, 12.72 MB (0.48 s just building) (12.46 MB allocating the memory with calloc).

--------

Without align, using std.gc.malloc + hasNoPointers + std.c.memset: 3.53 s total.

void main() {
    disable();
    TST.nodes = cast(TST*)gcmalloc(461_495 * TST.sizeof);
    hasNoPointers(TST.nodes);
    memset(TST.nodes, 0, 461_495 * TST.sizeof);
    auto words = split(cast(string)read("english_words.txt"));
    auto root = TST.nodes[TST.nnode++];
    root.bulk_add(words);
    for (int i; i < 100; i++)
        foreach (word; words)
            if (!(word in root))
                throw new Exception("\"" ~ word ~ "\" not found");
}

Of course you can add static methods to the struct to create the array, to create a new node by moving an index forward, and to free the whole array, plus static fields for the array and the number of nodes used so far.

--------

Once tree nodes are items of an array, you can use indexes to address node children, so if memory gets very tight you can even use 24-bit integers as indexes:
http://www.fantascienza.net/leonardo/so/dlibs/uint24.html
They are very slow, so in most situations you want to use ints:

struct TST(T, TyIndex=int) {...

Then this uses less memory, as low as 11 bytes/node, but TST must also be aligned to 1 byte:

TST!(char, Uint24) root;

--------

The last examples work only if you know the number of nodes at compile time. If you need a more dynamic data structure, the code gets more tricky/hairy. Probably the best thing you can do is to keep an array of blocks of nodes, each block not too big (e.g. 64 KB), and allocate such blocks on demand. Something like this (code untested, it surely contains bugs; this comes after several rounds of code simplification):

struct TST(T) {
    // *** TST instance attributes here ***

    struct Block {
        // Block is not too big nor too small
        const int SIZE = ((1 << 16) / TST.sizeof) > 1 ? ((1 << 16) / TST.sizeof) : 1;
        TST[TST.Block.SIZE] nodes;
    }

    // usedLastBlock keeps the number of nodes used from the last block of blocks;
    // there's no need to keep the whole number of blocks, because its usage is
    // slower and all the blocks but the last one are full anyway.
    static int usedLastBlock;
    static Block*[] blocks;

    static TST* newNode() {
        if (!TST.blocks.length || TST.usedLastBlock >= TST.Block.SIZE) {
            // add new block
            TST.blocks.length = blocks.length + 1;
            // this whole new block gets scanned by the GC even if you need only 1 node
            TST.blocks[$-1] = new Block;
            TST.usedLastBlock = 0;
        }
        return &(TST.blocks[$-1].nodes[TST.usedLastBlock++]);
    }

    /// deallocate blocks of nodes
    static void free() {
        foreach (block_ptr; TST.blocks)
            delete block_ptr;
        TST.blocks[] = null;
        TST.blocks.length = 0;
    }

    // *** all TST methods here ***
}

This last code doesn't work yet; I have so far failed to debug it. The compiler returns:

struct test.TST!(char).TST no size yet for forward reference

I'll try to fix it when I have the time. If you know how to fix it, or what the problem is, please tell me.

Bye,
bearophile
Apr 17 2009
To shorten the code further I have replaced the Block struct with a typedef'd fixed-size array:

struct TST(T) {
    // *** TST instance attributes here ***

    const int BLOCKSIZE = ((1 << 16) / TST.sizeof) > 1 ? ((1 << 16) / TST.sizeof) : 1;

    // usedLastBlock keeps the number of nodes used from the last block of blocks;
    // there's no need to keep the whole number of blocks, because its usage is
    // slower and all the blocks but the last one are full anyway.
    static int usedLastBlock;
    typedef TST[BLOCKSIZE] Block;
    static Block*[] blocks;

    static TST* newNode() {
        if (!TST.blocks.length || TST.usedLastBlock >= Block.length) {
            // add new block
            TST.blocks.length = blocks.length + 1;
            // this whole new block gets scanned by the GC even if you need only 1 node
            TST.blocks[$-1] = new Block; // this doesn't work, you can't new a static array
            TST.usedLastBlock = 0;
        }
        return &(*(TST.blocks[$-1])[TST.usedLastBlock++]);
    }

    /// deallocate blocks of nodes
    static void free() {
        foreach (block_ptr; TST.blocks)
            delete block_ptr;
        TST.blocks.length = 0;
    }

But besides the same bug as before:

struct test.TST!(char).TST no size yet for forward reference

it's even more buggy, because now you can't even new a Block. Oh, well.

Bye,
bearophile
Apr 17 2009
OK, now the code works. Besides the TST struct I have created this helper struct that keeps a pool of tree nodes:

struct StructPool(TyStruct, int MAX_BLOCK_SIZE_BYTES=1 << 14) {
    static assert(MAX_BLOCK_SIZE_BYTES >= 1);

    struct Block {
        // Block is not too big nor too small
        const int MAXB = MAX_BLOCK_SIZE_BYTES / TyStruct.sizeof;
        const int SIZE = MAXB >= 1 ? MAXB : 1;
        TyStruct[StructPool.Block.SIZE] structs;
    }

    static TyStruct* next_free_struct_ptr, last_struct_ptr;
    static Block*[] blocks;

    static TyStruct* newStruct() {
        if (StructPool.next_free_struct_ptr >= StructPool.last_struct_ptr) {
            // then add new block
            // this whole new block gets scanned by the GC even if you need only 1 node
            StructPool.blocks.length = StructPool.blocks.length + 1;
            StructPool.blocks[$-1] = new Block;
            StructPool.next_free_struct_ptr = StructPool.blocks[$-1].structs.ptr;
            StructPool.last_struct_ptr = StructPool.next_free_struct_ptr + Block.SIZE;
        }
        return StructPool.next_free_struct_ptr++;
    }

    /// deallocate blocks of structs
    static void free() {
        foreach (block_ptr; StructPool.blocks)
            delete block_ptr;
        StructPool.blocks[] = null;
        StructPool.blocks.length = 0;
        StructPool.next_free_struct_ptr = null;
        StructPool.last_struct_ptr = null;
    }
}

alias StructPool!(TST!(char)) TSTpool;

(I have used two static pointers; a single index to the last filled-up struct would also be enough.)

That newStruct() is fast, about as fast as class allocations in Java :-) It's not nice code yet, with that external alias and so on, but it seems to work and it almost halves the total allocation time of a big ternary search tree (but despite the improved cache coherence, positive lookups into such a TST are about as fast as in a normally allocated TST, maybe because the D GC packs the nodes closely in memory anyway). So I think in certain situations it can be useful.

Now I'm trying to make the code nicer and less bug-prone, for example by turning that StructPool struct into a template mixin that I can use to add such pool capability to other structs that need to allocate many small structs and deallocate them all at the same time. But so far I have had the same kinds of errors I have shown in the two past posts.

Bye,
bearophile
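A hypothetical way the tree code could use this pool (illustrative only, not bearophile's actual TST code): every "new TST" in the insertion path is replaced by a call into the pool, and the whole tree is released in one shot.

    alias StructPool!(TST!(char)) TSTpool;

    void example() {
        auto root = TSTpool.newStruct();
        root.item = 'c';
        root.mid = TSTpool.newStruct();   // instead of: root.mid = new TST!(char);
        root.mid.item = 'a';
        // ... build the rest of the tree the same way ...
        TSTpool.free();                   // release every node at once
    }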
Apr 17 2009
bearophile wrote:
> OK, now the code works. Besides the TST struct I have created this helper struct that keeps a pool of tree nodes:
> [...]
> Now I'm trying to make the code nicer and less bug-prone, for example by turning that StructPool struct into a template mixin [...]
> Bye, bearophile

Cool, thanks for posting this... Why is the default block size so freaking huge (16kb)?
Apr 17 2009
Robert Fraser:
> Why is the default block size so freaking huge (16kb)?

If the block is too small you waste too much time allocating small blocks of memory (about as much time as you waste allocating the single tree nodes). If the block is too big you waste empty nodes in the last block, and over a certain size you don't gain much anyway. And too-big chunks don't fit in the data part of the L1 cache anyway (it's 32 KB on many CPUs). From experiments I have seen that having a block bigger than 64 KB usually gains you very little, and having blocks smaller than 4-8 KB is a waste of time. 16-32 KB is the fastest, and on a 2 GB RAM PC it doesn't waste much RAM. If you want to save some RAM you can reduce that value at compile time.

Bye,
bearophile
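As a concrete illustration of that tradeoff, using the 16-byte node size measured earlier in the thread:

    // Default MAX_BLOCK_SIZE_BYTES = 1 << 14 = 16384 bytes, TST node = 16 bytes:
    //   16384 / 16 = 1024 nodes per block
    // so the pool performs one GC allocation per ~1024 nodes instead of one per
    // node, while wasting at most one partially filled 16 KB block at the end.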
Apr 17 2009
bearophile wrote:
> Robert Fraser:
>> Hey, could you please post your implementation (assuming it's open-source?) I'd love to use them, but can't be bothered to implement it. Thanks!
> It's just a translation to D of this Py code:
> Info: http://jeethurao.com/blog/?p=146
> Code: http://bitbucket.org/woadwarrior/trie/
> [...]
> Bye, bearophile

Awesome; thanks
Apr 16 2009
This is the final version I have added to the dlibs:

/********************************************
To create a memory pool of a native item, like structs. Not thread-safe.
It allows to create new items (it performs block-allocations for speed),
and then free them all at once (and then you can create some more).
*/
struct MemoryPool(T, int MAX_BLOCK_BYTES=1 << 14) {
    static assert(!is(T == class), "MemoryPool is designed for native data, not classes.");
    static assert(MAX_BLOCK_BYTES >= 1, "MemoryPool: MAX_BLOCK_BYTES must be >= 1 bytes.");

    struct Block {
        static assert(MAX_BLOCK_BYTES >= T.sizeof,
                      "MemoryPool: MAX_BLOCK_BYTES must be bigger than a T.");
        static if ((T.sizeof * 5) > MAX_BLOCK_BYTES)
            pragma(msg, "MemoryPool: Block is very small, so probably a MemoryPool isn't useful.");
        T[(MAX_BLOCK_BYTES / T.sizeof)] items;
    }

    static Block*[] blocks;
    static T* nextFree, lastFree;
    // a single int that keeps the number of items used
    // in the last block is enough, but it may lead to a
    // bit slower code.

    /// Return pointer to a new item. Allocates memory if necessary.
    static T* newItem() {
        if (nextFree >= lastFree) {
            // then add new block.
            // this whole new block gets scanned by the GC,
            // even if you need only 1 item
            blocks.length = blocks.length + 1;
            blocks[$-1] = new Block;
            nextFree = blocks[$-1].items.ptr;
            lastFree = nextFree + Block.items.length;
        }
        return nextFree++;
    }

    /// Deallocate all items at once. You can then create new ones.
    static void freeAll() {
        foreach (block_ptr; blocks)
            delete block_ptr;
        blocks[] = null;
        blocks.length = 0;
        nextFree = null;
        lastFree = null;
    }

    // This is like free() but doesn't deallocate memory, just cleans it up so
    // when you create new items there's no new memory allocation, and it's
    // faster. But to implement this other static members seem necessary.
    //static void cleanMemory() {
    //}
} // end of MemoryPool!()


unittest { // tests of MemoryPool!()
    struct BinaryTreeNode {
        BinaryTreeNode* left, right;
        int data;

        string toString() {
            string nrepr(BinaryTreeNode* n) {
                return n is null ? "nil" : str(*n);
            }
            return "<" ~ str(data) ~ ", " ~ nrepr(left) ~ ", " ~ nrepr(right) ~ ">";
        }
    }

    MemoryPool!(BinaryTreeNode) BinaryTreeNodePool;

    auto t = BinaryTreeNodePool.newItem();
    t.data = 100;
    t.left = BinaryTreeNodePool.newItem();
    t.left.data = 7;
    t.right = BinaryTreeNodePool.newItem();
    t.right.data = 150;
    assert(str(*t) == "<100, <7, nil, nil>, <150, nil, nil>>");
} // end tests of MemoryPool!()
Apr 18 2009
bearophile wrote:
> Does someone have a need for Ternary Search Trees in Phobos (for D1, and eventually later for D2 too)? TSTs allow you to find keys, key prefixes, or even keys with holes. [...] With some benchmarks I have seen that a simple TST implementation is about as fast as the built-in AAs of D (but much slower than Python dicts).
> Bye, bearophile

I implemented a version of the TST you posted using Tango + D1... Here are my results:

Test        Part        Mean     Median   Max
----        ----        ----     ------   ---
TST         Insertion   0.0428   0.0338   0.0886
TST         Iteration   0.0022   0.0022   0.0024
TST         Lookup      0.0225   0.0223   0.0237
HashMap     Insertion   0.0621   0.0421   0.2205
HashMap     Iteration   0.0035   0.0034   0.0036
HashMap     Lookup      0.0169   0.0168   0.0184
TreeMap     Insertion   0.1045   0.1041   0.1058
TreeMap     Iteration   0.0041   0.0041   0.0044
TreeMap     Lookup      0.0895   0.0892   0.0917
AssocArray  Insertion   0.0262   0.0262   0.0268
AssocArray  Iteration   0.0015   0.0015   0.0016
AssocArray  Lookup      0.0130   0.0129   0.0132

(TreeMap and HashMap are in tango.util.container, AssocArray is the built-in D associative array; the testing code is attached. Compiled with -O -release -inline using DSSS 0.78 and DMD 1.043 on Windows XP x86 SP3.)

Indeed, TSTs seem to beat red-black BSTs (the implementation Tango's TreeMap uses) quite handily, and have faster insert times than hash maps, though slower lookup. The big winner here, though, appears to be D's built-in associative arrays. I thought they were supposed to be very slow, but Tango's implementation, at least, looks pretty good (without any re-hashing).

For order-sensitive collections, I'm definitely using TSTs; I'm sold on the concept. Not only are they faster, allowing prefix search could be very useful. However, the times here are very low (the word file is only 1.1MB, and my system is fairly fast, though this is memory-dependent, not CPU-dependent)... I'll try with a larger word list & see what results I get.
Apr 24 2009
Robert Fraser wrote:
> I implemented a version of the TST you posted using Tango + D1... Here are my results:
> [...]
> However, the times here are very low (the word file is only 1.1MB, and my system is fairly fast, though this is memory-dependent, not CPU-dependent)... I'll try with a larger word list & see what results I get.

Results are similar on a much larger word list:

Test        Part        Mean     Median   Max
----        ----        ----     ------   ---
TST         Insertion   0.1906   0.1630   0.2495
TST         Iteration   0.0032   0.0031   0.0035
TST         Lookup      0.1650   0.1651   0.1662
HashMap     Insertion   0.1637   0.1504   0.2679
HashMap     Iteration   0.0029   0.0028   0.0030
HashMap     Lookup      0.1212   0.1210   0.1241
TreeMap     Insertion   0.6133   0.6120   0.6276
TreeMap     Iteration   0.0035   0.0035   0.0035
TreeMap     Lookup      0.6310   0.6292   0.6446
AssocArray  Insertion   0.1131   0.1129   0.1154
AssocArray  Iteration   0.0014   0.0014   0.0016
AssocArray  Lookup      0.0939   0.0928   0.1045
Apr 24 2009
Robert Fraser:
> I implemented a version of the TST you posted using Tango + D1... Here are my results:

Nice. Is your TST a map? (not a set). Are you willing to show me your TST code? Your code is surely better than mine, and there may be a few little things that can be tuned. And are you also willing to add it to Phobos? (I'll port it to Phobos1, it's probably easy to do.)

> The big winner here, though, appears to be D's built-in associative arrays. I thought they were supposed to be very slow, but Tango's implementation, at least, looks pretty good (without any re-hashing).

Try comparing the performance of the built-in D AAs with Psyco+Python dicts, or with a good C++ STL hash set/hash map. D AAs are probably slow also because Phobos is statically compiled; they aren't a template.

> However, the times here are very low (the word file is only 1.1MB, and my system is fairly fast, though this is memory-dependent, not CPU-dependent)... I'll try with a larger word list & see what results I get.

With modern CPUs lots of tests become too fast :-)

Bye,
bearophile
Apr 24 2009
bearophile wrote:
> Robert Fraser:
>> I implemented a version of the TST you posted using Tango + D1... Here are my results: [...]
> Nice. Is your TST a map? (not a set).

Yes, it's a map (for my use case).

> Are you willing to show me your TST code? Your code is surely better than mine, and there may be a few little things that can be tuned. And are you also willing to add it to Phobos? (I'll port it to Phobos1, it's probably easy to do.)

Attached, if you don't mind NO comments. I really want to port this:
http://dasnar.sdf-eu.org/res/ctst-README.html
... The automatic balancing might make it a lot better if elements are inserted in order. It might be very easy by replacing the malloc/free calls with gc_malloc/gc_free calls.

>> The big winner here, though, appears to be D's built-in associative arrays. I thought they were supposed to be very slow, but Tango's implementation, at least, looks pretty good (without any re-hashing).
> Try comparing the performance of the built-in D AAs with Psyco+Python dicts, or with a good C++ STL hash set/hash map. D AAs are probably slow also because Phobos is statically compiled; they aren't a template.

I think Phobos1's AAs were pretty bad, but have you tried Tango's/druntime's? They're quite optimized (they're implemented as a hash with trees used for collisions).
Apr 24 2009
Robert Fraser:
> Attached, if you don't mind NO comments.

This is your code; I can see you have not used the union trick that I suggested to reduce the memory used by each node:

private static struct TreeNode {
    TreeNode* left;
    TreeNode* right;
    TreeNode* mid;
    K split;
    K[] key;
    V value;
}

And it seems all nodes contain a value, not just the leaves.

> the compiler wasn't unrolling the tail call

I think DMD still isn't able to unroll tail calls.

Regarding the memory.d: in my implementation I have used

    T[(MAX_BLOCK_BYTES / T.sizeof)] items;

while you have used

    private void[BLOCK_SIZE] data = void;
    memset(blk.data.ptr, 0, BLOCK_SIZE);

Without the "= void" the compiler sets that array (of uint? ubyte?) to zero with a memset anyway, so I don't understand what you have gained. And allocating an array of nodes gives the GC a chance to know what's inside the block of memory, so it's more compatible with better future GCs (it may be better even now). It also allows safer initialization of the struct fields; for example, on some CPUs null can be != 0.

Later I'll try to adapt your code to Phobos1.

Bye,
bearophile
Apr 24 2009
Thanks for reading through it!

bearophile wrote:
> This is your code; I can see you have not used the union trick that I suggested to reduce the memory used by each node: [...]

How does the union trick work exactly?

> And it seems all nodes contain a value, not just the leaves.

I'm assuming that V.sizeof == (void*).sizeof. There are actually 12 bytes that are unused on all but the nodes containing values, since the key is also cached in there. This could be eliminated by keeping track of the current stack in the opApply methods, but that slows down iteration.

Non-leaf nodes can also have values. For example, if you have "var" = 5 and "var2" = 10, "r" will have a value and a mid.

>> the compiler wasn't unrolling the tail call
> I think DMD still isn't able to unroll tail calls. Regarding the memory.d: in my implementation I have used T[(MAX_BLOCK_BYTES / T.sizeof)] items; while you have used private void[BLOCK_SIZE] data = void; memset(blk.data.ptr, 0, BLOCK_SIZE); Without the "= void" the compiler sets that array (of uint? ubyte?) to zero with a memset anyway, so I don't understand what you have gained.

void[SIZE]; fails with an obscure error message... you need to initialize it to void.

> And allocating an array of nodes gives the GC a chance to know what's inside the block of memory, so it's more compatible with better future GCs (it may be better even now). It also allows safer initialization of the struct fields; for example, on some CPUs null can be != 0.

I agree, but I wanted a generic construct that could also be used for allocating class instances.

> Later I'll try to adapt your code to Phobos1.

Thanks!
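For reference, a sketch of the union trick bearophile was referring to (illustrative only, not code from either poster). One common arrangement, as in the classic Bentley-Sedgewick/DDJ implementation linked earlier in the thread, terminates every key with a sentinel split character: a node whose split char is the sentinel never needs its mid pointer, so those same bytes can hold an index into an external array of values (or of the stored keys), and no separate per-node value field is required.

    struct Node {
        Node* left, right;
        union {
            Node* mid;  // valid while split != 0 (ordinary internal node)
            int index;  // valid when split == 0: which entry of an external
                        // values[] array belongs to the key ending here
        }
        char split;     // 0 acts as the end-of-key sentinel
    }

    // Lookup walks the tree over key ~ "\0"; reaching a node with split == 0
    // means the key is stored, and node.index locates its value.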
Apr 24 2009