digitalmars.D.learn - Template-style polymorphism in table structure

reply data pulverizer <data.pulverizer gmail.com> writes:
I am trying to build a data table object with unrestricted column 
types. The approach I am taking is to build a generic BaseVector 
interface and then a subtype GenericVector(T) which inherits from 
BaseVector. I then build a Table class whose data member is a 
BaseVector array representing the columns in the table.

My main question is how to return GenericVector!(T) from the 
getCol() method in the Table class instead of BaseVector.

Perhaps my Table implementation somehow needs to be linked to 
GenericVector(T), or maybe I have written a BaseTable instead and 
I need to do something like a GenericTable(T...). However, my 
previous approach created a tuple-typed data object, and once it 
was created, the type structure (column type configuration) could 
not be changed, so columns could not be added or removed.


------------------------------------------------
import std.stdio : writeln, write, writefln;
import std.format : format;

interface BaseVector{
     BaseVector get(size_t);
}

class GenericVector(T) : BaseVector{
     T[] data;
     alias data this;
     GenericVector get(size_t i){
         return new GenericVector!(T)(data[i]);
     }
     this(T[] arr){
         this.data = arr;
     }
     this(T elem){
         this.data ~= elem;
     }
     void append(T[] arr){
         this.data ~= arr;
     }

     override string toString() const {
         return format("%s", data);
     }
}

class Table{
private:
     BaseVector[] data;
public:
     // How to return GenericVector!(T) here instead of BaseVector
     BaseVector getCol(size_t i){
         return data[i];
     }
     this(BaseVector[] x ...){
         foreach(col; x)
             this.data ~= col;
     }
     this(BaseVector[] x){
         this.data ~= x;
     }
     this(Table x, BaseVector[] y ...){
         this.data = x.data;
         foreach(col; y){
             this.data ~= col;
         }
     }
     void append(BaseVector[] x ...){
         foreach(col; x)
             this.data ~= col; // append each column once
     }
}


void main(){
     auto index = new GenericVector!(int)([1, 2, 3, 4, 5]);
     auto numbers = new GenericVector!(double)([1.1, 2.2, 3.3,
         4.4, 5.5]);
     auto names = new GenericVector!(string)(["one", "two",
         "three", "four", "five"]);
     Table df = new Table(index, numbers, names);
     // I'd like this to be GenericVector!(T)
     writeln(typeid(df.getCol(0)));
}
Sep 04 2016
next sibling parent reply Lodovico Giaretta <lodovico giaretart.net> writes:
On Sunday, 4 September 2016 at 09:55:53 UTC, data pulverizer 
wrote:
 [...]
Your code is not very D style and, based on your needs, there may be better ways to achieve your goal, but without knowing your use case, it's difficult to give correct advice.

Talking about that writeln statement, your code is not working because of a known compiler bug [1]. If you change your interface BaseVector to an abstract class and add the needed override annotation to GenericVector, then typeid returns the expected result.

[1] https://issues.dlang.org/show_bug.cgi?id=13833
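
A minimal sketch of the change described above, assuming the vector classes from the original post trimmed down to the parts that matter here. The single constructor and the `main` function are illustrative choices, not part of the original code:

```d
import std.stdio : writeln;

// An abstract class takes the place of the original interface.
abstract class BaseVector
{
    abstract BaseVector get(size_t i);
}

class GenericVector(T) : BaseVector
{
    T[] data;

    this(T[] arr) { data = arr; }

    // `override` is required when overriding a class method. The
    // covariant return type is allowed because GenericVector!T
    // derives from BaseVector.
    override GenericVector get(size_t i)
    {
        return new GenericVector([data[i]]);
    }
}

void main()
{
    BaseVector col = new GenericVector!int([1, 2, 3]);

    // For a class reference, typeid reports the dynamic type of the
    // object, so this prints the qualified name of GenericVector!int.
    writeln(typeid(col));
    assert(typeid(col) == typeid(GenericVector!int));
}
```

The static type of `col` is still BaseVector; only the runtime type information changes.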
Sep 04 2016
parent data pulverizer <data.pulverizer gmail.com> writes:
On Sunday, 4 September 2016 at 14:02:03 UTC, Lodovico Giaretta 
wrote:
 Your code is not very D style
... Well, I guess I could have contracted the multiple constructors in GenericVector(T) and DataFrame?
Sep 04 2016
prev sibling next sibling parent ZombineDev <petar.p.kirov gmail.com> writes:
On Sunday, 4 September 2016 at 09:55:53 UTC, data pulverizer 
wrote:
 [...]
Since BaseVector is a polymorphic type, you can't know in advance (at compile time) the type of the object at a particular index. The only way to get a typed result is to specify the type that you expect, by providing a type parameter to the function. The cast operator will perform a dynamic cast at runtime, which returns an object of the requested type, or null if the object is of some other type:

----------------------------
// Member function of Table.
GenericVector!ExpectedType getTypedCol(ExpectedType)(size_t i){
    // Casting to Object first makes typeid report the dynamic class
    // even though data[i] is typed as an interface.
    assert (cast(GenericVector!ExpectedType)data[i],
        format("The vector at col %s is not of type %s, but %s",
            i, ExpectedType.stringof, typeid(cast(Object)data[i])));
    return cast(GenericVector!ExpectedType)data[i];
}

void main(){
    auto index = new GenericVector!(int)([1, 2, 3, 4, 5]);
    auto numbers = new GenericVector!(double)([1.1, 2.2, 3.3, 4.4, 5.5]);
    auto names = new GenericVector!(string)(["one", "two", "three", "four", "five"]);
    Table df = new Table(index, numbers, names);

    // A dynamic cast yields null when the column has another type.
    if (cast(GenericVector!string)df.getCol(0))
        writeln(df.getTypedCol!string(0).data);
    else if (cast(GenericVector!int)df.getCol(0))
        writeln(df.getTypedCol!int(0).data);
    // and so on...
}
----------------------------

Another way to approach the problem is to keep your data in an Algebraic (https://dpaste.dzfl.pl/7a4e9bf408d1):

----------------------------
import std.meta : AliasSeq;
import std.variant : Algebraic, visit;
import std.stdio : writefln;

alias AllowedTypes = AliasSeq!(int[], double[], string[]);
alias Vector = Algebraic!AllowedTypes;
alias Table = Vector[];

void main()
{
    Vector indexes = [1, 2, 3, 4, 5];
    Vector numbers = [1.1, 2.2, 3.3, 4.4, 5.5];
    Vector names = ["one", "two", "three", "four", "five"];

    Table table = [indexes, numbers, names];

    foreach (idx, col; table)
        col.visit!(
            (int[] indexColumn) =>
                writefln("An index column at %s. Contents: %s", idx, indexColumn),
            (double[] numberColumn) =>
                writefln("A number column at %s. Contents: %s", idx, numberColumn),
            (string[] namesColumn) =>
                writefln("A string column at %s. Contents: %s", idx, namesColumn)
        );
}
----------------------------

Application output:

An index column at 0. Contents: [1, 2, 3, 4, 5]
A number column at 1. Contents: [1.1, 2.2, 3.3, 4.4, 5.5]
A string column at 2. Contents: ["one", "two", "three", "four", "five"]
Sep 04 2016
prev sibling parent reply data pulverizer <data.pulverizer gmail.com> writes:
On Sunday, 4 September 2016 at 09:55:53 UTC, data pulverizer 
wrote:
 My main question is how to return GenericVector!(T) from the 
 getCol() method in the Table class instead of BaseVector.
I think I just solved my own query: change the BaseVector interface to a class and override get in the GenericVector(T) class:

----------------------------
class BaseVector{
     BaseVector get(size_t){
         return new BaseVector;
     }
}

class GenericVector(T) : BaseVector{
     T[] data;
     alias data this;
     override GenericVector get(size_t i){
         return new GenericVector!(T)(data[i]);
     }
     this(T[] arr){
         this.data = arr;
     }
     this(T elem){
         this.data ~= elem;
     }
     void append(T[] arr){
         this.data ~= arr;
     }

     override string toString() const {
         return format("%s", data);
     }
}

class Table{
     // ... as before
}

void main(){
     auto index = new GenericVector!(int)([1, 2, 3, 4, 5]);
     auto numbers = new GenericVector!(double)([1.1, 2.2, 3.3, 4.4, 5.5]);
     auto names = new GenericVector!(string)(["one", "two", "three", "four", "five"]);
     Table df = new Table(index, numbers, names);
     // now prints table.GenericVector!int.GenericVector
     writeln(typeid(df.getCol(0)));
}
Sep 04 2016
parent reply data pulverizer <data.pulverizer gmail.com> writes:
On Sunday, 4 September 2016 at 14:07:54 UTC, data pulverizer 
wrote:
Lodovico Giaretta, thanks, I just saw your update!
Sep 04 2016
parent reply data pulverizer <data.pulverizer gmail.com> writes:
On Sunday, 4 September 2016 at 14:20:24 UTC, data pulverizer 
wrote:
 On Sunday, 4 September 2016 at 14:07:54 UTC, data pulverizer 
 wrote:
  Lodovico Giaretta Thanks I just saw your update!
Lodovico Giaretta, BTW what do you mean when you say that my code is not very D style? Please expand on this.
Sep 04 2016
parent reply Lodovico Giaretta <lodovico giaretart.net> writes:
On Sunday, 4 September 2016 at 14:24:12 UTC, data pulverizer 
wrote:
 On Sunday, 4 September 2016 at 14:20:24 UTC, data pulverizer 
 wrote:
 On Sunday, 4 September 2016 at 14:07:54 UTC, data pulverizer 
 wrote:
  Lodovico Giaretta Thanks I just saw your update!
  Lodovico Giaretta BTW what do you mean that my code is not very D style? Please expand on this ...
You can have fewer constructors. In fact, a typesafe variadic ctor also works for the single-element case and for the array case. But you already recognized that.

Instead of reinventing the wheel for your GenericVector!T, you could use an `alias this` to directly inherit all operations on the underlying array, without having to reimplement them (like your append method).

Your getCol(i) could become getCol!T(i) and return an instance of GenericVector!T directly, after checking that the required column has in fact that type:

----------------------------
GenericVector!T getCol(T)(size_t i)
{
    if(typeid(cols[i]) == typeid(GenericVector!T))
        return cast(GenericVector!T)cols[i];
    else
        assert(0); // or throw exception
}
----------------------------

Another solution: if you don't need to dynamically change the type of the columns, you can have the addColumn function create a new type. I show you with Tuples because it's easier:

----------------------------
import std.typecons : Tuple;

Tuple!(T,U) append(U, T...)(Tuple!T tup, U col)
{
    return Tuple!(T,U)(tup.expand, col);
}

Tuple!int t1;
Tuple!(int, float) t2 = t1.append(2.0f);
Tuple!(int, float, char) t3 = t2.append('c');
----------------------------
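
To illustrate the first two points, here is a minimal sketch, assuming a stripped-down GenericVector with no base class. It shows one typesafe variadic constructor replacing the separate element and array constructors, and `alias this` standing in for the hand-written append method. The variable names in `main` are made up for the example:

```d
class GenericVector(T)
{
    T[] data;
    alias data this;

    // One typesafe variadic constructor accepts a single element,
    // several elements, or an existing array.
    this(T[] arr...)
    {
        // Copy, because variadic arguments may live on the caller's stack.
        data = arr.dup;
    }
}

void main()
{
    auto one  = new GenericVector!int(42);
    auto many = new GenericVector!int(1, 2, 3);
    auto arr  = new GenericVector!int([4, 5, 6]);

    // `alias data this` forwards array operations to `data`,
    // so no separate append method is needed.
    many ~= 4;

    assert(one.length == 1);
    assert(many.data == [1, 2, 3, 4]);
    assert(arr[0] == 4);
}
```

The `.dup` costs a copy for the array case. Whether that trade-off is acceptable depends on how large the columns are.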
Sep 04 2016
next sibling parent data pulverizer <data.pulverizer gmail.com> writes:
On Sunday, 4 September 2016 at 14:49:30 UTC, Lodovico Giaretta 
wrote:
 [...]
Thank you for the very useful suggestions, I shall take these forward. On the suggestion of creating Tuple-like tables: I already tried that but found, as you said, that once the table is created, adding or removing columns essentially creates a different data type, which needs a new variable name each time.

I am building a table type that I hope will be used for data manipulation in data science and statistics applications, so I require a data structure that allows adding and removing columns of various types, and that can cope with any type that hasn't been planned for, which is why I selected this polymorphic template approach. It is more flexible than other data structures I have seen in dynamic programming languages, such as R's data frame and Python's pandas. Even Scala's Spark dataframes rely on wrapping everything in Any, and the user still has to write a special data structure for each new type. The only thing that is similar to this approach is Julia's DataFrame, but Julia, though a very good programming language, has limitations.

I feel as if I am constantly scratching the surface of what D can do, but I have recently managed to get more time on my hands, and it looks as if that will continue into the future, which will mean more focus on D, improving my generic programming skills and hopefully creating some useful artifacts. Perhaps I need to read Andrei's Modern C++ Design book for a better way to think about generics.
Sep 04 2016
prev sibling parent reply data pulverizer <data.pulverizer gmail.com> writes:
On Sunday, 4 September 2016 at 14:49:30 UTC, Lodovico Giaretta 
wrote:
 Your getCol(i) could become getCol!T(i) and return an instance 
 of GenericVector!T directly, after checking that the required 
 column has in fact that type:

 GenericVector!T getCol(T)(size_t i)
 {
     if(typeid(cols[i]) == typeid(GenericVector!T))
         return cast(GenericVector!T)cols[i];
     else
         assert(0); // or throw exception
 }
I just realized that typeid only gives runtime class information and not a compile-time type, so the object will still need to be cast as you mentioned above. However, your function above will not infer T, so the user will have to provide it. I wonder if there is a way to dispatch to the right type by a dynamic cast; otherwise I fear that ZombineDev may be correct and the types will have to be limited, which I definitely want to avoid!
Sep 04 2016
next sibling parent data pulverizer <data.pulverizer gmail.com> writes:
On Monday, 5 September 2016 at 06:45:07 UTC, data pulverizer 
wrote:
 [...]
Just found this on dynamic dispatching (https://wiki.dlang.org/Dispatching_an_object_based_on_its_dynamic_type), but even if you took this approach, you'd still have to register all the types you would be using at the start of the script for all your methods. It's either that or an explicitly limited type list, as ZombineDev suggests.
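
To make the registration requirement concrete, here is a rough sketch of what such a dispatcher could look like. It is not the code from the wiki page: the names `KnownTypes`, `dispatch` and `describe` are made up for this example, and the vector classes are reduced to the minimum:

```d
import std.meta : AliasSeq;
import std.stdio : writefln;

class BaseVector {}

class GenericVector(T) : BaseVector
{
    T[] data;
    this(T[] arr) { data = arr; }
}

// The "registration": every element type the dispatcher knows about.
alias KnownTypes = AliasSeq!(int, double, string);

// Tries a dynamic cast for each registered type and calls `fun` with
// the typed column. Returns false if the type is not registered.
bool dispatch(alias fun)(BaseVector col)
{
    foreach (T; KnownTypes) // unrolled at compile time
    {
        if (auto typed = cast(GenericVector!T) col)
        {
            fun(typed);
            return true;
        }
    }
    return false;
}

// Instantiated once per registered type, with the static type known.
void describe(V)(V col)
{
    writefln("%s column: %s", typeof(col.data[0]).stringof, col.data);
}

void main()
{
    BaseVector[] cols;
    cols ~= new GenericVector!int([1, 2, 3]);
    cols ~= new GenericVector!string(["one", "two"]);
    cols ~= new GenericVector!char(['x']); // char is not registered

    assert(dispatch!describe(cols[0]));
    assert(dispatch!describe(cols[1]));
    assert(!dispatch!describe(cols[2]));
}
```

A column whose element type is missing from `KnownTypes` can still be stored in the table, but the dispatcher cannot hand it to typed code.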
Sep 05 2016
prev sibling next sibling parent Lodovico Giaretta <lodovico giaretart.net> writes:
On Monday, 5 September 2016 at 06:45:07 UTC, data pulverizer 
wrote:
 [...]
ZombineDev is definitely correct, in that the static type is one thing and the dynamic type is another. The type of a variable and the return type of a method are based on the static type, computed at compile time. The "true" dynamic type is only available at runtime.

That's why I was showing you the use of tuples. If your code does not have branches that assign different column types, having the types statically determined as template parameters is the best choice. If you really need dynamic types, then there's no alternative: the user must explicitly cast things to the correct dynamic type (my version of getCol is just a nice wrapper to do that).

In fact, idiomatic D code tries to avoid dynamic types when possible, preferring templates.
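
The distinction can be seen in a few lines. This is a small sketch that assumes BaseVector is a class (as in the fixed version earlier in the thread) and reduces the vector types to the minimum:

```d
class BaseVector {}

class GenericVector(T) : BaseVector
{
    T[] data;
    this(T[] arr) { data = arr; }
}

void main()
{
    BaseVector col = new GenericVector!double([1.1, 2.2]);

    // Static type: fixed by the declaration, known to the compiler.
    static assert(is(typeof(col) == BaseVector));

    // Dynamic type: a property of the object, known only at run time.
    assert(typeid(col) == typeid(GenericVector!double));

    // A dynamic cast succeeds only for the matching type,
    // otherwise it yields null.
    assert(cast(GenericVector!double) col !is null);
    assert(cast(GenericVector!int) col is null);
}
```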
Sep 05 2016
prev sibling parent Kagamin <spam here.lot> writes:
On Monday, 5 September 2016 at 06:45:07 UTC, data pulverizer 
wrote:
 I just realized that typeid only gives the class and not the 
 actual type, so the object will still need to be cast as you 
 mentioned above, however your above function will not infer T, 
 so the user will have to provide it. I wonder if there is a way 
 to dispatch the right type by a dynamic cast or I fear that 
 ZombineDev may be correct and the types will have to be 
 limited, which I definitely want to avoid!
If you know at compile time that column 0 is of type int, you don't have freedom at run time to add non-int column 0.
Sep 09 2016