www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - std.container.rbtree as Interval Tree?

reply James Blachly <james.blachly gmail.com> writes:
I tried to implement an interval tree backed by 
std.container.rbtree today and fell flat.

A standard way to make an interval tree is to make an augmented 
tree; I supposed since rbtree was a generic container and because 
I could define opCmp, this should be a cinch. I ran into two 
problems.

First (minor problem), RedBlackTree considers a return value of 0 
from opCmp equivalent to "equals", which is discordant with the 
guidance given for opCmp.[1] This is a  minor pedantic point, 
since you cannot store un-orderable elements in the tree anyway.

More importantly, though, I cannot figure out how to implement an 
interval query function on the tree. Typically, the tree would 
have a "key" that is the left position of the interval and the 
augmented part of the tree would be that a second value -- a 
right, or end, position. My Elem == key type is a struct 
encapsulating both of these (start, end; plus some metadata).

For my Interval element type, I overloaded opCmp to take an 
integer, but unfortunately RedBlackTree's upperBound() and 
lowerBound() functions are defined in terms of "Elem" which is 
aliased to the contained element type, rather than a generic type.

Q1: Is there a simple or elegant way to do this without 
re-implementing RedBlackTree? I apologize if it is obvious and I 
am missing it. I suppose it may work if I rewrite Interval's 
opCmp to not consider the upper bound, and by creating a dummy 
interval to pass to upperBound and lowerBound, but that seems 
inelegant compared to passing an integer.

Q2: Would replacing "Elem" with a generic type "T" in the 
function signatures for upperBound, lowerBound, and various 
related fns like _firstGreater / _firstGreaterEqual solve this 
problem?

James

[1] https://dlang.org/spec/operatoroverloading.html#eqcmp ("For 
example ... x and y are disjoint sets, then neither x < y nor y < 
x holds, but that does not imply that x == y. Thus, it is 
insufficient to determine equality purely based on opCmp alone. ")
Feb 04 2019
next sibling parent reply RazvanN <razvan.nitu1305 gmail.com> writes:
On Monday, 4 February 2019 at 22:54:01 UTC, James Blachly wrote:
 I tried to implement an interval tree backed by 
 std.container.rbtree today and fell flat.

 [...]
You can use alias this [1] in your interval element type: struct IntervalElem { size_t start, end; /* ... other declarations */ alias start this; } [1] https://dlang.org/spec/class.html#AliasThis
 Q2: Would replacing "Elem" with a generic type "T" in the 
 function signatures for upperBound, lowerBound, and various 
 related fns like _firstGreater / _firstGreaterEqual solve this 
 problem?

 [...]
Elem is already a generic type. I don't know how you can make it more generic without adding other template parameters to the class declaration (which means reimplementing RBTree from std.container);
 James

 [1] https://dlang.org/spec/operatoroverloading.html#eqcmp ("For 
 example ... x and y are disjoint sets, then neither x < y nor y 
 < x holds, but that does not imply that x == y. Thus, it is 
 insufficient to determine equality purely based on opCmp alone. 
 ")
Feb 05 2019
parent James Blachly <james.blachly gmail.com> writes:
On Tuesday, 5 February 2019 at 10:10:44 UTC, RazvanN wrote:
 On Monday, 4 February 2019 at 22:54:01 UTC, James Blachly wrote:
 I tried to implement an interval tree backed by 
 std.container.rbtree today and fell flat.

 [...]
You can use alias this [1] in your interval element type: struct IntervalElem { size_t start, end; /* ... other declarations */ alias start this; } [1] https://dlang.org/spec/class.html#AliasThis
Thanks -- I always seem to forget about `alias this` ! However, I don't think that this helps me to call functions expecting type Elem with an integer. At least, it failed in my test -- could this be because Elem itself is already an alias?
 Q2: Would replacing "Elem" with a generic type "T" in the 
 function signatures for upperBound, lowerBound, and various 
 related fns like _firstGreater / _firstGreaterEqual solve this 
 problem?

 [...]
Elem is already a generic type. I don't know how you can make it more generic without adding other template parameters to the class declaration (which means reimplementing RBTree from std.container);
Agree, (although I think I would only need to revise only perhaps 25% of the module in this case) but I definitely wanted to avoid this if possible.
Feb 05 2019
prev sibling parent reply Eduard Staniloiu <edi33416 gmail.com> writes:
On Monday, 4 February 2019 at 22:54:01 UTC, James Blachly wrote:
 I tried to implement an interval tree backed by 
 std.container.rbtree today and fell flat.

 A standard way to make an interval tree is to make an augmented 
 tree; I supposed since rbtree was a generic container and 
 because I could define opCmp, this should be a cinch. I ran 
 into two problems.

 First (minor problem), RedBlackTree considers a return value of 
 0 from opCmp equivalent to "equals", which is discordant with 
 the guidance given for opCmp.[1] This is a  minor pedantic 
 point, since you cannot store un-orderable elements in the tree 
 anyway.

 More importantly, though, I cannot figure out how to implement 
 an interval query function on the tree. Typically, the tree 
 would have a "key" that is the left position of the interval 
 and the augmented part of the tree would be that a second value 
 -- a right, or end, position. My Elem == key type is a struct 
 encapsulating both of these (start, end; plus some metadata).

 For my Interval element type, I overloaded opCmp to take an 
 integer, but unfortunately RedBlackTree's upperBound() and 
 lowerBound() functions are defined in terms of "Elem" which is 
 aliased to the contained element type, rather than a generic 
 type.

 Q1: Is there a simple or elegant way to do this without 
 re-implementing RedBlackTree? I apologize if it is obvious and 
 I am missing it. I suppose it may work if I rewrite Interval's 
 opCmp to not consider the upper bound, and by creating a dummy 
 interval to pass to upperBound and lowerBound, but that seems 
 inelegant compared to passing an integer.

 Q2: Would replacing "Elem" with a generic type "T" in the 
 function signatures for upperBound, lowerBound, and various 
 related fns like _firstGreater / _firstGreaterEqual solve this 
 problem?

 James

 [1] https://dlang.org/spec/operatoroverloading.html#eqcmp ("For 
 example ... x and y are disjoint sets, then neither x < y nor y 
 < x holds, but that does not imply that x == y. Thus, it is 
 insufficient to determine equality purely based on opCmp alone. 
 ")
I think you are making a slight confusion. Your `Interval` struct and the `Elem` type that `lowerBound` takes, are the same type. You can define your RBTree and Interval as follows ``` struct Interval { int start; int end; } alias IntervalTree = RedBlackTree!(Interval, (i1, i2) => i1.start < i2.start); ``` Please see this runable example: https://run.dlang.io/is/4cPTik The in-order traversal will be the same as the wikipedia example here: https://en.wikipedia.org/wiki/Interval_tree Hope this helps. Cheers, Edi
Feb 05 2019
parent reply James Blachly <james.blachly gmail.com> writes:
On Tuesday, 5 February 2019 at 16:24:03 UTC, Eduard Staniloiu 
wrote:
 I think you are making a slight confusion. Your `Interval` 
 struct and the `Elem` type that `lowerBound` takes, are the 
 same type.

 You can define your RBTree and Interval as follows
 ```
 struct Interval
 {
     int start;
     int end;
 }

 alias IntervalTree = RedBlackTree!(Interval, (i1, i2) => 
 i1.start < i2.start);
 ```

 Please see this runable example: https://run.dlang.io/is/4cPTik

 The in-order traversal will be the same as the wikipedia 
 example here: https://en.wikipedia.org/wiki/Interval_tree

 Hope this helps.

 Cheers,
 Edi
Edi, thanks for your quick reply! I do understand that Elem is aliased to my Interval type. Your suggested rewrite of the less fn is precisely what I was groaning about (although not explicitly) in terms of rewriting opCmp to be a simple `this.start < other.start`. The reason that this is undesirable is that distinct intervals with same starting coordinates are then considered equal and not added, unless RBTree tepmlate is instantiated with allowDuplicates. However, even when allowing (pseudo)duplicates, this means the distinct intervals with same start but different end coordinates are not deterministically placed/sorted within the tree, because they are not sortable with the simple `this.start < other.start` less function. Anyway, in the end, I will assume that in my problem domain there are no degenerate intervals with same start coordinate and use the simpler less to accomplish goal. Hopefully the upperBound and lowerBound functions are lazy...
Feb 05 2019
parent Eduard Staniloiu <edi33416 gmail.com> writes:
On Tuesday, 5 February 2019 at 19:12:43 UTC, James Blachly wrote:
 However, even when allowing (pseudo)duplicates, this means the 
 distinct intervals with same start but different end 
 coordinates are not deterministically placed/sorted within the 
 tree, because they are not sortable with the simple `this.start 
 < other.start` less function.
I have updated the example with an `opCmp` implementation. https://run.dlang.io/is/ailnsy I believe this is what you are looking for. Cheers, Edi
Feb 07 2019