digitalmars.D.learn - Need Advice: Union or Variant?

jwatson-CO-edu <real.name colorado.edu> writes:

I have an implementation of the "[Little 
Scheme](https://mitpress.mit.edu/9780262560993/the-little-schemer/)"
educational programming language written in D,
[here](https://github.com/jwatson-CO-edu/SPARROW).

It has many problems, but the one I want to solve first is the 
size of the "atoms" (units of data).

`Atom` is a struct that has fields for every possible type of 
data that the language supports. This means that a bool `Atom` 
unnecessarily takes up space in memory with fields for number, 
string, structure, etc.

Here is the 
[definition](https://github.com/jwatson-CO-edu/SPARROW/blob/main/lil_schemer.d#L55):

```d
enum F_Type{
     CONS, // Cons pair
     STRN, // String/Symbol
     NMBR, // Number
     EROR, // Error object
     BOOL, // Boolean value
     FUNC, // Function
}

struct Atom{
     F_Type  kind; // ---------------- What kind of atom this is
     Atom*   car; // ----------------- Left  `Atom` Pointer
     Atom*   cdr; // ----------------- Right `Atom` Pointer
     double  num; // ----------------- Number value
     string  str; // ----------------- String value, D-string underlies
     bool    bul; // ----------------- Boolean value
     F_Error err = F_Error.NOVALUE; // Error code
}

```
Question:
**Where do I begin my consolidation of space within `Atom`?  Do I 
use unions or variants?**

Nov 17 2022

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Thu, Nov 17, 2022 at 08:54:46PM +0000, jwatson-CO-edu via
Digitalmars-d-learn wrote:
[...]
 ```d
 enum F_Type{
     CONS, // Cons pair
     STRN, // String/Symbol
     NMBR, // Number
     EROR, // Error object
     BOOL, // Boolean value
     FUNC, // Function
 }
 
 struct Atom{
     F_Type  kind; // ---------------- What kind of atom this is
     Atom*   car; // ----------------- Left  `Atom` Pointer
     Atom*   cdr; // ----------------- Right `Atom` Pointer
     double  num; // ----------------- Number value
     string  str; // ----------------- String value, D-string underlies
     bool    bul; // ----------------- Boolean value
     F_Error err = F_Error.NOVALUE; // Error code
 }
 
 ```
 Question:
 **Where do I begin my consolidation of space within `Atom`?  Do I use
 unions or variants?**

In this case, since you're already keeping track of what type of data is
being stored in an Atom, use a union:

	struct Atom {
		F_Type kind;
		union {		// anonymous union
			Atom*   car; // ----------------- Left  `Atom` Pointer
			Atom*   cdr; // ----------------- Right `Atom` Pointer
			double  num; // ----------------- Number value
			string  str; // ----------------- String value, D-string underlies
			bool    bul; // ----------------- Boolean value
			F_Error err = F_Error.NOVALUE; // Error code
		}
	}

Use Variant if you don't want to keep track of the type yourself.
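
For comparison, a minimal sketch of the `Variant` route, assuming Phobos' `std.variant`. Here the library records the stored type, so no separate `kind` field is needed; the values used are only illustrative.

```d
import std.variant : Variant;

void main()
{
    Variant v = 3.5;                       // currently holds a double
    assert(v.type == typeid(double));
    assert(v.get!double == 3.5);

    v = "sparrow";                         // rebinding changes the stored type
    assert(v.peek!double is null);         // peek returns null on a type mismatch
    assert(v.get!string == "sparrow");

    v = true;                              // and again, now a bool
    assert(v.type == typeid(bool));
}
```

The trade-off is that every read goes through a runtime type check, whereas the union leaves that bookkeeping to your own `kind` tag.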


T

-- 
An elephant: A mouse built to government specifications. -- Robert Heinlein

Nov 17 2022

jwatson-CO-edu <real.name colorado.edu> writes:

On Thursday, 17 November 2022 at 21:05:43 UTC, H. S. Teoh wrote:
 Question:
 **Where do I begin my consolidation of space within `Atom`?  
 Do I use
 unions or variants?**

 In this case, since you're already keeping track of what type 
 of data is being stored in an Atom, use a union:

 	struct Atom {
 		F_Type kind;
 		union {		// anonymous union
 			Atom*   car; // ----------------- Left  `Atom` Pointer
 			Atom*   cdr; // ----------------- Right `Atom` Pointer
 			double  num; // ----------------- Number value
 			string  str; // ----------------- String value, D-string underlies
 			bool    bul; // ----------------- Boolean value
 			F_Error err = F_Error.NOVALUE; // Error code
 		}
 	}

 Use Variant if you don't want to keep track of the type 
 yourself.
 T

Thank you!  This seems nice except there are a few fields that 
need to coexist.
I need {`car`, `cdr`} -or- {`num`} -or- {`str`} -or- {`bul`}.
`err` will be outside the union as well because I have decided 
that any type can have an error code attached.  As in an error 
number (other than NaN) can be returned instead of reserving 
certain numbers to represent errors.  Imagine if there was NaN 
for every datatype.

Nov 17 2022

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Thu, Nov 17, 2022 at 10:16:04PM +0000, jwatson-CO-edu via
Digitalmars-d-learn wrote:
 On Thursday, 17 November 2022 at 21:05:43 UTC, H. S. Teoh wrote:

[...]
 	struct Atom {
 		F_Type kind;
 		union {		// anonymous union
 			Atom*   car; // ----------------- Left  `Atom` Pointer
 			Atom*   cdr; // ----------------- Right `Atom` Pointer
 			double  num; // ----------------- Number value
 			string  str; // ----------------- String value, D-string underlies
 			bool    bul; // ----------------- Boolean value
 			F_Error err = F_Error.NOVALUE; // Error code
 		}
 	}


[...]
 Thank you!  This seems nice except there are a few fields that need to
 coexist.
 I need {`car`, `cdr`} -or- {`num`} -or- {`str`} -or- {`bul`}.
 `err` will be outside the union as well because I have decided that
 any type can have an error code attached.  As in an error number
 (other than NaN) can be returned instead of reserving certain numbers
 to represent errors.  Imagine if there was NaN for every datatype.

[...]

Just create a nested anonymous struct, like this:

	struct Atom {
		F_Type kind;
		union {		// anonymous union
			struct {
				Atom*   car; // ----------------- Left  `Atom` Pointer
				Atom*   cdr; // ----------------- Right `Atom` Pointer
			}
			struct {
				double  num; // ----------------- Number value
				string  str; // ----------------- String value, D-string underlies
			}
			bool    bul; // ----------------- Boolean value
		}
		F_Error err = F_Error.NOVALUE; // Error code
	}


T

-- 
Meat: euphemism for dead animal. -- Flora

Nov 17 2022

jwatson-CO-edu <real.name colorado.edu> writes:

On Thursday, 17 November 2022 at 22:49:37 UTC, H. S. Teoh wrote:
 Just create a nested anonymous struct, like this:

  	struct Atom {
  		F_Type kind;
  		union {		// anonymous union
 			struct {
 				Atom*   car; // ----------------- Left  `Atom` Pointer
 				Atom*   cdr; // ----------------- Right `Atom` Pointer
 			}
 			struct {
 				double  num; // ----------------- Number value
 				string  str; // ----------------- String value, D-string underlies
 			}
  			bool    bul; // ----------------- Boolean value
  		}
  		F_Error err = F_Error.NOVALUE; // Error code
  	}


 T

Thank you, something similar to what you suggested reduced the 
atom size from 72 bytes to 40.
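
A minimal sketch for checking the saving with `.sizeof`, comparing the flat layout from the first post against the union layout suggested above. The names `FlatAtom` and `UnionAtom` are made up for the comparison, and the `F_Error` members are placeholders because the thread never shows that enum. The exact byte counts depend on the platform and on the real struct in the repository, so they may not match the 72 and 40 quoted here.

```d
import std.stdio : writeln;

enum F_Type { CONS, STRN, NMBR, EROR, BOOL, FUNC }

// Assumption: placeholder members; the real F_Error is not shown in the thread.
enum F_Error { OKAY, NOVALUE }

// Flat layout: every field occupies its own storage.
struct FlatAtom
{
    F_Type    kind;
    FlatAtom* car;
    FlatAtom* cdr;
    double    num;
    string    str;
    bool      bul;
    F_Error   err = F_Error.NOVALUE;
}

// Union layout: the alternatives share storage; `err` stays outside.
struct UnionAtom
{
    F_Type kind;
    union
    {
        struct { UnionAtom* car; UnionAtom* cdr; }
        struct { double num; string str; }
        bool bul;
    }
    F_Error err = F_Error.NOVALUE;
}

// Checked at compile time: the union layout is strictly smaller.
static assert(UnionAtom.sizeof < FlatAtom.sizeof);

void main()
{
    writeln("flat:  ", FlatAtom.sizeof, " bytes");
    writeln("union: ", UnionAtom.sizeof, " bytes");
}
```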

Nov 18 2022

jwatson-CO-edu <real.name colorado.edu> writes:

On Saturday, 19 November 2022 at 03:38:26 UTC, jwatson-CO-edu 
wrote:
 Thank you, something similar to what you suggested reduced the 
 atom size from 72 bytes to 40.

Oh, based on another forum post I added constructors in addition 
to reducing the atom size by 44%.

```d
struct Atom{
     F_Type  kind; // What kind of atom this is
     union{
         double  num; // NMBR: Number value
         string  str; // STRN: String value, D-string
         bool    bul; // BOOL: Boolean value
         struct{ // ---- CONS: pair
             Atom* car; // Left  `Atom` Pointer
             Atom* cdr; // Right `Atom` Pointer
         }
         struct{ // ---- EROR: Code + Message
             F_Error err; // Error code
             string  msg; // Detailed desc
         }
     }
     // https://forum.dlang.org/post/omsbr8$7do$1 digitalmars.com
     this( double n ){ kind = F_Type.NMBR; num = n; } // make number
     this( string s ){ kind = F_Type.STRN; str = s; } // make string
     this( bool   b ){ kind = F_Type.BOOL; bul = b; } // make bool
     this( Atom* a, Atom* d ){ kind = F_Type.CONS; car = a; cdr = d; } // make cons
     this( F_Error e, string m ){ kind = F_Type.EROR; err = e; msg = m; } // make error
}
}
```
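
A minimal usage sketch for the struct above. The constructors set `kind`, and a `final switch` on `kind` then decides which union member is safe to read. The `describe` helper and the `F_Error` members are assumptions made for this example, not part of the original code.

```d
import std.conv : to;
import std.stdio : writeln;

enum F_Type { CONS, STRN, NMBR, EROR, BOOL, FUNC }

// Assumption: placeholder members; the real F_Error is not shown in the thread.
enum F_Error { OKAY, NOVALUE }

struct Atom
{
    F_Type kind;
    union
    {
        double num;
        string str;
        bool   bul;
        struct { Atom* car; Atom* cdr; }
        struct { F_Error err; string msg; }
    }
    this(double n) { kind = F_Type.NMBR; num = n; }
    this(string s) { kind = F_Type.STRN; str = s; }
    this(bool b)   { kind = F_Type.BOOL; bul = b; }
    this(Atom* a, Atom* d)     { kind = F_Type.CONS; car = a; cdr = d; }
    this(F_Error e, string m)  { kind = F_Type.EROR; err = e; msg = m; }
}

// Hypothetical helper: read only the member that matches the tag.
string describe(const(Atom)* a)
{
    if (a is null) return "()";
    final switch (a.kind) // the compiler rejects the switch if a kind is missing
    {
        case F_Type.NMBR: return a.num.to!string;
        case F_Type.STRN: return a.str;
        case F_Type.BOOL: return a.bul ? "#t" : "#f";
        case F_Type.CONS: return "(" ~ describe(a.car) ~ " . " ~ describe(a.cdr) ~ ")";
        case F_Type.EROR: return "error: " ~ a.msg;
        case F_Type.FUNC: return "<function>";
    }
}

void main()
{
    auto a    = Atom("a");
    auto n    = Atom(2.5);
    auto pair = Atom(&a, &n);
    writeln(describe(&pair));

    auto e = Atom(F_Error.NOVALUE, "nothing here");
    writeln(describe(&e));
}
```

Reading a member that does not match `kind` reinterprets whatever bytes are stored there, so keeping all reads behind the tag check is what makes the union usable.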

Nov 18 2022

Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:

On Thursday, 17 November 2022 at 20:54:46 UTC, jwatson-CO-edu 
wrote:
 I have an implementation of the "[Little 
 Scheme](https://mitpress.mit.edu/9780262560993/the-little-schemer/)"
educational programming language written in D,
[here](https://github.com/jwatson-CO-edu/SPARROW).

 It has many problems, but the one I want to solve first is the 
 size of the "atoms" (units of data).

 `Atom` is a struct that has fields for every possible type of 
 data that the language supports. This means that a bool `Atom` 
 unnecessarily takes up space in memory with fields for number, 
 string, structure, etc.

 Here is the 
 [definition](https://github.com/jwatson-CO-edu/SPARROW/blob/main/lil_schemer.d#L55):

 ```d
 enum F_Type{
     CONS, // Cons pair
     STRN, // String/Symbol
     NMBR, // Number
     EROR, // Error object
     BOOL, // Boolean value
     FUNC, // Function
 }

 struct Atom{
     F_Type  kind; // ---------------- What kind of atom this is
     Atom*   car; // ----------------- Left  `Atom` Pointer
     Atom*   cdr; // ----------------- Right `Atom` Pointer
     double  num; // ----------------- Number value
     string  str; // ----------------- String value, D-string underlies
     bool    bul; // ----------------- Boolean value
     F_Error err = F_Error.NOVALUE; // Error code
 }

 ```
 Question:
 **Where do I begin my consolidation of space within `Atom`?  Do 
 I use unions or variants?**

In general, I recommend 
[`std.sumtype`](https://dlang.org/phobos/std_sumtype), as it is 
one of the best D libraries for this purpose. It is implemented 
as a struct containing two fields: the `kind` and a `union` of 
all the possible types.
That said, one difficulty you are likely to face is with 
refactoring your code to use the 
[`match`](https://dlang.org/phobos/std_sumtype#.match) and 
[`tryMatch`](https://dlang.org/phobos/std_sumtype#.tryMatch) 
functions, as `std.sumtype.SumType` does not expose the 
underlying kind field.

Other notable alternatives are:
* [`mir-core`](https://code.dlang.org/packages/mir-core)'s 
`mir.algebraic`: http://mir-core.libmir.org/mir_algebraic.html
* 
[`taggedalgebraic`](https://code.dlang.org/packages/taggedalgebraic):
https://vibed.org/api/taggedalgebraic.taggedalgebraic/
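
A minimal sketch of the `std.sumtype` approach, modeled on the recursive-type example in the Phobos documentation. One member of the sum type can itself be a tuple (or struct) with several fields, which is one way a cons pair could keep both `car` and `cdr`. The `Atom` alias, the `Cons` alias, and the `describe` helper are assumptions for this example.

```d
import std.conv : to;
import std.stdio : writeln;
import std.sumtype : SumType, This, match;
import std.typecons : Tuple;

// `This*` stands for "pointer to the SumType being defined".
alias Atom = SumType!(
    double,                             // number
    string,                             // string/symbol
    bool,                               // boolean
    Tuple!(This*, "car", This*, "cdr")  // cons pair: two fields in one member
);
alias Cons = Atom.Types[3];

// Hypothetical helper. The bool handler is listed before the double handler
// because a bool would otherwise implicitly convert and match (double n).
// Assumes both pointers of a cons pair are non-null.
string describe(Atom a)
{
    return a.match!(
        (bool b)   => b ? "#t" : "#f",
        (double n) => n.to!string,
        (string s) => s,
        (Cons c)   => "(" ~ describe(*c.car) ~ " . " ~ describe(*c.cdr) ~ ")"
    );
}

void main()
{
    auto pair = Atom(Cons(new Atom("a"), new Atom(true)));
    writeln(describe(pair));
}
```

With this approach `match` replaces the manual `kind` checks, and the compiler reports an error if a member type has no handler.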

Nov 17 2022

jwatson-CO-edu <real.name colorado.edu> writes:

On Thursday, 17 November 2022 at 21:19:56 UTC, Petar Kirov 
[ZombineDev] wrote:
 On Thursday, 17 November 2022 at 20:54:46 UTC, jwatson-CO-edu 
 wrote:
 I have an implementation of the "[Little 
 Scheme](https://mitpress.mit.edu/9780262560993/the-little-schemer/)"
educational programming language written in D,
[here](https://github.com/jwatson-CO-edu/SPARROW).

 It has many problems, but the one I want to solve first is the 
 size of the "atoms" (units of data).

 `Atom` is a struct that has fields for every possible type of 
 data that the language supports. This means that a bool `Atom` 
 unnecessarily takes up space in memory with fields for number, 
 string, structure, etc.

[...]
 Do I use unions or variants?**

 In general, I recommend 
 [`std.sumtype`](https://dlang.org/phobos/std_sumtype), as it is 
 one of the best D libraries for this purpose. It is implemented 
 as a struct containing two fields: the `kind` and a `union` of 
 all the possible types.
 That said, one difficulty you are likely to face is with 
 refactoring your code to use the 
 [`match`](https://dlang.org/phobos/std_sumtype#.match) and 
 [`tryMatch`](https://dlang.org/phobos/std_sumtype#.tryMatch) 
 functions, as `std.sumtype.SumType` does not expose the 
 underlying kind field.

 Other notable alternatives are:
 * [`mir-core`](https://code.dlang.org/packages/mir-core)'s 
 `mir.algebraic`: http://mir-core.libmir.org/mir_algebraic.html
 * 
 [`taggedalgebraic`](https://code.dlang.org/packages/taggedalgebraic):
https://vibed.org/api/taggedalgebraic.taggedalgebraic/

Thank you!  This is intriguing.
The different flavors of `Atom` I need will have either {`car`, 
`cdr`} -or- {`num`} -or- {`str`} -or- {`bul`}.  Does SumType 
allow me to store the multiple fields {`car`, `cdr`} in one of 
the types, while the other types have only one field?

Since this is a dynamically-typed language, I need the atoms to 
both be interchangeable and to serve different purposes at the 
same time.

Nov 17 2022
