digitalmars.D.learn - Need Advice: Union or Variant?
- jwatson-CO-edu (33/33) Nov 17 2022 I have an implementation of the "[Little
- H. S. Teoh (19/43) Nov 17 2022 In this case, since you're already keeping track of what type of data is
- jwatson-CO-edu (9/30) Nov 17 2022 Thank you! This seems nice except there are a few fields that
- H. S. Teoh (23/42) Nov 17 2022 [...]
- jwatson-CO-edu (3/21) Nov 18 2022 Thank you, something similar to what you suggested reduced the
- jwatson-CO-edu (32/34) Nov 18 2022 Oh, based on another forum post I added constructors in addition
- Petar Kirov [ZombineDev] (18/51) Nov 17 2022 In general, I recommend
- jwatson-CO-edu (10/41) Nov 17 2022 Thank you! This is intriguing.
I have an implementation of the "[Little Scheme](https://mitpress.mit.edu/9780262560993/the-little-schemer/)" educational programming language written in D, [here](https://github.com/jwatson-CO-edu/SPARROW). It has many problems, but the one I want to solve first is the size of the "atoms" (units of data). `Atom` is a struct that has fields for every possible type of data that the language supports. This means that a bool `Atom` unnecessarily takes up space in memory with fields for number, string, structure, etc.

Here is the [definition](https://github.com/jwatson-CO-edu/SPARROW/blob/main/lil_schemer.d#L55):

```d
enum F_Type{
    CONS, // Cons pair
    STRN, // String/Symbol
    NMBR, // Number
    EROR, // Error object
    BOOL, // Boolean value
    FUNC, // Function
}

struct Atom{
    F_Type  kind; // ---------------- What kind of atom this is
    Atom*   car; // ----------------- Left `Atom` Pointer
    Atom*   cdr; // ----------------- Right `Atom` Pointer
    double  num; // ----------------- Number value
    string  str; // ----------------- String value, D-string underlies
    bool    bul; // ----------------- Boolean value
    F_Error err = F_Error.NOVALUE; // Error code
}
```

Question: **Where do I begin my consolidation of space within `Atom`? Do I use unions or variants?**
Nov 17 2022
On Thu, Nov 17, 2022 at 08:54:46PM +0000, jwatson-CO-edu via Digitalmars-d-learn wrote:
[...]
> Question: **Where do I begin my consolidation of space within `Atom`? Do I use unions or variants?**

In this case, since you're already keeping track of what type of data is being stored in an Atom, use a union:

```d
struct Atom {
    F_Type kind;
    union { // anonymous union
        Atom* car; // ----------------- Left `Atom` Pointer
        Atom* cdr; // ----------------- Right `Atom` Pointer
        double num; // ----------------- Number value
        string str; // ----------------- String value, D-string underlies
        bool bul; // ----------------- Boolean value
        F_Error err = F_Error.NOVALUE; // Error code
    }
}
```

Use Variant if you don't want to keep track of the type yourself.

T
Nov 17 2022
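The last suggestion in the reply above, using `Variant` when you don't want to track the type yourself, can be sketched with `std.variant`. In this minimal example `describe` is a hypothetical helper (not from SPARROW): the `Variant` itself records the runtime type, and `peek` is used to ask what it currently holds. The trade-off is that type checks move to run time and are performed by `Variant` rather than by a `kind` field you control.

```d
import std.stdio : writeln;
import std.variant : Variant;

// Hypothetical helper: Variant remembers the runtime type of its value,
// so there is no separate `kind` field to keep in sync.
string describe(Variant v)
{
    // peek!T returns a pointer to the value if it holds exactly a T, else null.
    if (v.peek!(double)() !is null) return "number";
    if (v.peek!(string)() !is null) return "string";
    if (v.peek!(bool)() !is null)   return "bool";
    return "other";
}

void main()
{
    Variant v = 3.5;
    writeln(describe(v));      // number
    v = "car";                 // assigning a new value also changes the stored type
    writeln(describe(v));      // string
    writeln(v.get!string);     // car (get!T throws a VariantException on a type mismatch)
}
```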
On Thursday, 17 November 2022 at 21:05:43 UTC, H. S. Teoh wrote:
> In this case, since you're already keeping track of what type of data is being stored in an Atom, use a union: [...]
> Use Variant if you don't want to keep track of the type yourself.

Thank you! This seems nice, except that a few fields need to coexist. I need {`car`, `cdr`} -or- {`num`} -or- {`str`} -or- {`bul`}. `err` will be outside the union as well, because I have decided that any type can have an error code attached. In other words, an error code (something other than NaN) can be returned, instead of reserving certain numbers to represent errors. Imagine if there were a NaN for every datatype.
Nov 17 2022
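The objection above, that `car` and `cdr` must coexist, comes down to layout: every direct member of a union starts at the same offset, while an anonymous struct inside the union lays its members out one after another. A minimal check using simplified stand-in structs; `FlatUnion` and `NestedUnion` are hypothetical names, and `void*` stands in for `Atom*`:

```d
// Every direct member of the union starts at the same offset,
// so `car` and `cdr` would overwrite each other.
struct FlatUnion
{
    int kind;
    union
    {
        void* car;
        void* cdr;
        double num;
    }
}

// An anonymous struct inside the union places `car` and `cdr` side by side,
// while `num` still shares storage with the pair.
struct NestedUnion
{
    int kind;
    union
    {
        struct { void* car; void* cdr; }
        double num;
    }
}

static assert(FlatUnion.car.offsetof == FlatUnion.cdr.offsetof);
static assert(NestedUnion.cdr.offsetof == NestedUnion.car.offsetof + NestedUnion.car.sizeof);
static assert(NestedUnion.num.offsetof == NestedUnion.car.offsetof);

void main()
{
    int x;
    FlatUnion f;
    f.car = &x;
    assert(f.cdr == cast(void*) &x); // writing car also changed cdr
}
```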
On Thu, Nov 17, 2022 at 10:16:04PM +0000, jwatson-CO-edu via Digitalmars-d-learn wrote:
> On Thursday, 17 November 2022 at 21:05:43 UTC, H. S. Teoh wrote:
[...]
> Thank you! This seems nice except there are a few fields that need to coexist. I need {`car`, `cdr`} -or- {`num`} -or- {`str`} -or- {`bul`}. [...]

Just create a nested anonymous struct, like this:

```d
struct Atom {
    F_Type kind;
    union { // anonymous union
        struct {
            Atom* car; // ----------------- Left `Atom` Pointer
            Atom* cdr; // ----------------- Right `Atom` Pointer
        }
        struct {
            double num; // ----------------- Number value
            string str; // ----------------- String value, D-string underlies
        }
        bool bul; // ----------------- Boolean value
    }
    F_Error err = F_Error.NOVALUE; // Error code
}
```

T
Nov 17 2022
On Thursday, 17 November 2022 at 22:49:37 UTC, H. S. Teoh wrote:
> Just create a nested anonymous struct, like this: [...]

Thank you, something similar to what you suggested reduced the atom size from 72 bytes to 40.
Nov 18 2022
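One way to verify a size reduction like the one reported above is to compare `.sizeof` of the two layouts. The sketch below assumes an `int`-based `F_Error` enum with only a `NOVALUE` member, since the thread does not show its real definition; the exact byte counts depend on the target platform and on the real field types, so only the relative comparison is meaningful here.

```d
import std.stdio : writeln;

enum F_Type { CONS, STRN, NMBR, EROR, BOOL, FUNC }
enum F_Error { NOVALUE } // assumed stand-in; the real definition is not shown

// Original layout: every payload field is always present.
struct AtomFlat
{
    F_Type kind;
    AtomFlat* car;
    AtomFlat* cdr;
    double num;
    string str;
    bool bul;
    F_Error err = F_Error.NOVALUE;
}

// Union layout: only one payload group occupies the shared storage.
struct AtomUnion
{
    F_Type kind;
    union
    {
        double num;
        string str;
        bool bul;
        struct { AtomUnion* car; AtomUnion* cdr; }
    }
    F_Error err = F_Error.NOVALUE;
}

void main()
{
    writeln("flat:  ", AtomFlat.sizeof, " bytes");
    writeln("union: ", AtomUnion.sizeof, " bytes");
}
```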
On Saturday, 19 November 2022 at 03:38:26 UTC, jwatson-CO-edu wrote:
> Thank you, something similar to what you suggested reduced the atom size from 72 bytes to 40.

Oh, based on another forum post I added constructors, in addition to reducing the atom size by 44%.

```d
struct Atom{
    F_Type kind; // What kind of atom this is
    union{
        double num; // NMBR: Number value
        string str; // STRN: String value, D-string
        bool   bul; // BOOL: Boolean value
        struct{ // ---- CONS: pair
            Atom* car; // Left `Atom` Pointer
            Atom* cdr; // Right `Atom` Pointer
        }
        struct{ // ---- EROR: Code + Message
            F_Error err; // Error code
            string  msg; // Detailed desc
        }
    }

    // https://forum.dlang.org/post/omsbr8$7do$1@digitalmars.com
    this( double n ){ kind = F_Type.NMBR; num = n; } // make number
    this( string s ){ kind = F_Type.STRN; str = s; } // make string
    this( bool b ){ kind = F_Type.BOOL; bul = b; } // make bool
    this( Atom* a, Atom* d ){ kind = F_Type.CONS; car = a; cdr = d; } // make cons
    this( F_Error e, string m ){ kind = F_Type.EROR; err = e; msg = m; } // make error
}
```
Nov 18 2022
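With the constructors above setting `kind`, code that reads an `Atom` can dispatch on `kind` before touching the union. A minimal sketch, assuming a hypothetical `F_Error` enum (the `BAD_ARG` member is made up for illustration) and a hypothetical printer `show`: `final switch` makes the compiler reject the function if an `F_Type` member has no case, which helps keep each union read paired with the constructor that wrote it. `FUNC` has no fields in this layout, so it only gets a placeholder.

```d
import std.conv : to;
import std.stdio : writeln;

enum F_Type { CONS, STRN, NMBR, EROR, BOOL, FUNC }
enum F_Error { NOVALUE, BAD_ARG } // hypothetical stand-in for the real enum

struct Atom
{
    F_Type kind;
    union
    {
        double num;
        string str;
        bool bul;
        struct { Atom* car; Atom* cdr; }
        struct { F_Error err; string msg; }
    }
    this(double n)         { kind = F_Type.NMBR; num = n; }
    this(string s)         { kind = F_Type.STRN; str = s; }
    this(bool b)           { kind = F_Type.BOOL; bul = b; }
    this(Atom* a, Atom* d) { kind = F_Type.CONS; car = a; cdr = d; }
    this(F_Error e, string m) { kind = F_Type.EROR; err = e; msg = m; }
}

// Hypothetical printer: each case reads only the union members
// that the matching constructor wrote.
string show(const Atom* a)
{
    final switch (a.kind)
    {
        case F_Type.NMBR: return a.num.to!string;
        case F_Type.STRN: return a.str;
        case F_Type.BOOL: return a.bul ? "#t" : "#f";
        case F_Type.CONS: return "(" ~ show(a.car) ~ " . " ~ show(a.cdr) ~ ")";
        case F_Type.EROR: return "error " ~ a.err.to!string ~ ": " ~ a.msg;
        case F_Type.FUNC: return "<function>"; // no payload in this layout
    }
}

void main()
{
    auto pair = Atom(new Atom(1.5), new Atom("x"));
    writeln(show(&pair)); // (1.5 . x)
}
```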
On Thursday, 17 November 2022 at 20:54:46 UTC, jwatson-CO-edu wrote:
> I have an implementation of the "[Little Scheme](https://mitpress.mit.edu/9780262560993/the-little-schemer/)" educational programming language written in D, [...]
> Question: **Where do I begin my consolidation of space within `Atom`? Do I use unions or variants?**

In general, I recommend [`std.sumtype`](https://dlang.org/phobos/std_sumtype), as it is one of the best D libraries for this purpose. It is implemented as a struct containing two fields: the `kind` and a `union` of all the possible types.

That said, one difficulty you are likely to face is refactoring your code to use the [`match`](https://dlang.org/phobos/std_sumtype#.match) and [`tryMatch`](https://dlang.org/phobos/std_sumtype#.tryMatch) functions, since `std.sumtype.SumType` does not expose the underlying kind field.
Other notable alternatives are:

* [`mir-core`](https://code.dlang.org/packages/mir-core)'s `mir.algebraic`: http://mir-core.libmir.org/mir_algebraic.html
* [`taggedalgebraic`](https://code.dlang.org/packages/taggedalgebraic): https://vibed.org/api/taggedalgebraic.taggedalgebraic/
Nov 17 2022
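Since `SumType` keeps its tag private, one way to bridge existing `kind`-based code during a refactor is to rebuild an `F_Type`-style tag with `match`, and to use `tryMatch` where only one type is expected. A minimal sketch; `Value`, `kindOf`, and `asNumber` are hypothetical names, and the type list is reduced to three atom kinds:

```d
import std.exception : assertThrown;
import std.sumtype : SumType, match, tryMatch, MatchException;

enum F_Type { CONS, STRN, NMBR, EROR, BOOL, FUNC } // copied from the original post

alias Value = SumType!(bool, double, string); // hypothetical, reduced set of atom types

// match must handle every member type; here it rebuilds an F_Type-style tag.
// `bool` is listed first as a precaution, because match uses the first
// handler that can accept each member type.
F_Type kindOf(Value v)
{
    return v.match!(
        (bool _)   => F_Type.BOOL,
        (double _) => F_Type.NMBR,
        (string _) => F_Type.STRN
    );
}

// tryMatch may leave types unhandled; it throws MatchException at run time
// when the stored value has no matching handler.
double asNumber(Value v)
{
    return v.tryMatch!((double n) => n);
}

void main()
{
    assert(kindOf(Value(2.5)) == F_Type.NMBR);
    assertThrown!MatchException(asNumber(Value("x")));
}
```

`match` is checked at compile time for exhaustiveness, while `tryMatch` trades that check for a run-time exception.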
On Thursday, 17 November 2022 at 21:19:56 UTC, Petar Kirov [ZombineDev] wrote:
> On Thursday, 17 November 2022 at 20:54:46 UTC, jwatson-CO-edu wrote:
> [...] Do I use unions or variants?**
>
> In general, I recommend [`std.sumtype`](https://dlang.org/phobos/std_sumtype), as it is one of the best D libraries for this purpose. [...]

Thank you! This is intriguing.

The different flavors of `Atom` I need will have either {`car`, `cdr`} -or- {`num`} -or- {`str`} -or- {`bul`}. Does SumType allow me to store the multiple fields {`car`, `cdr`} in one of the types, while the other types have only one field?

Since this is a dynamically-typed language, I need the atoms to both be interchangeable and to serve different purposes at the same time.
Nov 17 2022
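To the question above: each `SumType` member is a single type, so a two-field case such as {`car`, `cdr`} can be expressed by making that member a struct or `Tuple`. Because a cons cell points back at atoms, the sketch below uses `std.sumtype`'s `This` placeholder, which is replaced by the `SumType` itself, following the recursive pattern shown in the `std.sumtype` documentation. `Atom`, `Cons`, and `show` here are hypothetical and reduced to four atom kinds; they are not the SPARROW definitions.

```d
import std.format : format;
import std.stdio : writeln;
import std.sumtype : SumType, This, match;
import std.typecons : Tuple;

// Hypothetical reduced atom: the CONS member is one Tuple holding two fields.
alias Atom = SumType!(
    bool,                               // BOOL
    double,                             // NMBR
    string,                             // STRN
    Tuple!(This*, "car", This*, "cdr")  // CONS: `This*` becomes `Atom*`
);

// The CONS member after substitution: Tuple!(Atom*, "car", Atom*, "cdr").
alias Cons = Atom.Types[3];

// `bool` comes first as a precaution, since match uses the first handler
// that can accept each member type.
string show(Atom a)
{
    return a.match!(
        (bool b)   => b ? "#t" : "#f",
        (double n) => format("%g", n),
        (string s) => s,
        (Cons c)   => "(" ~ show(*c.car) ~ " . " ~ show(*c.cdr) ~ ")"
    );
}

void main()
{
    auto pair = Atom(Cons(new Atom(1.5), new Atom("x")));
    writeln(show(pair)); // (1.5 . x)
}
```

Every flavor is still a value of the single type `Atom`, so atoms stay interchangeable; which flavor a given atom is gets decided by `match`.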