
digitalmars.D - Interesting Research Paper on Constructors in OO Languages

reply "Meta" <jared771 gmail.com> writes:
I saw an interesting post on Hacker News about constructors in OO 
languages. Apparently they are a real stumbling block for some 
programmers, which was quite a surprise to me. I think this might 
be relevant to a discussion about named parameters and whether we 
should ditch constructors for another kind of construct.

Link to the newsgroup post, the link to the paper is near the top:
http://erlang.org/pipermail/erlang-questions/2012-March/065519.html
Jul 15 2013
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jul 15, 2013 at 09:06:38PM +0200, Meta wrote:
 I saw an interesting post on Hacker News about constructors in OO
 languages. Apparently they are a real stumbling block for some
 programmers, which was quite a surprise to me. I think this might be
 relevant to a discussion about named parameters and whether we
 should ditch constructors for another kind of construct.
 
 Link to the newsgroup post, the link to the paper is near the top:
 http://erlang.org/pipermail/erlang-questions/2012-March/065519.html
Thanks for the link; this touches on one of my pet peeves about OO libraries: constructors. I consider myself to be a "systematic" programmer (according to the definition in the paper); I can work equally well with ctors with arguments vs. create-set-call objects. But I find that mandatory ctors with arguments are a pain to work with, *both* to write and to use.

On the usability side, there's the mental workload of having to remember which order the arguments appear in (or look it up in the IDE, or whatever -- the point is that I can't just type the ctor call straight from my head). Then there's the problem of needing to create objects required by the ctor before you can call the ctor. In some cases, this can be inconvenient -- I always have to remember to setup and create other objects before I can create this one, because its ctor requires said objects as arguments. Then there's the lack of flexibility: no matter what you do, it seems that anything that requires more than a single ctor argument inevitably becomes either (1) too complex, requiring too many arguments, and therefore very difficult to use, or (2) too simplistic, and therefore unable to do some things that I may want to do (e.g. some fields are default-initialized with no way to specify the initial values of the fields, 'cos otherwise the ctor would have too many arguments). No matter what you do, it seems almost impossible to come up with an ideal ctor except in trivial cases where it requires only 1 argument or is a default ctor.

On the writability side, one of my pet peeves is base class ctors that require multiple arguments. Every level of inheritance inevitably adds more arguments each time, and by the time you're 5-6 levels down the class hierarchy, your ctor calls just have an unmanageable number of parameters. Not to mention the violation of DRY by requiring much redundant typing just to pass arguments from the inherited class' ctor up the class hierarchy. Tons of bugs to be had everywhere, given the amount of repeated typing needed.

In the simplest cases, of course, these aren't big issues, but this kind of ctor design is clearly not scalable.

OTOH, the create-set-call pattern isn't a panacea either. One of the biggest problems with this pattern is that you can't guarantee your objects are in a consistent state at all times. This is very bad, because all your methods will have to check if some value has been set yet, before it uses it. This adds a lot of complexity that could've been avoided had everything been set at ctor-time. This also makes class invariants needlessly complex. Moreover, I've seen many classes in this category exhibit undefined behaviour if you call a value-setting method after you start using the object. Too many classes falsely assume that you will always call set methods and then "use" methods in that order. If you call a set method after calling a "use" method, you're quite likely to run into bugs in the class, e.g. part of the object's state doesn't reflect the new value you set, because the "use" methods were written with the assumption that when they were called the first time, the values you set earlier won't change thereafter.

I've always found Perl's approach a more balanced way to tackle this problem (even though Perl's OO system as a whole suffers from other, shall we say, idiosyncrasies).
In Perl, objects start out as arbitrary key-value pairs, and nothing differentiates them from a regular AA until you call the 'bless' built-in function on them, at which point they become "officially" a member of some particular class. This neatly sidesteps the whole ctor mess: you can initialize the initial AA with whatever values you want, in whatever order you want. When you finally "kicked it into shape", as the cited paper puts it, you "promote" that set of key-value pairs into an "official" member of the class, and thereafter, you can't simply modify fields anymore except through class methods. This means you now have the possibility of enforcing invariants on the object without crippling the flexibility of constructing it. (Well, OK, in Perl, this last bit isn't necessarily true, but in an ideal implementation of this initialize-bless-use approach, the object's fields would become non-public after being blessed and can only be updated by "official" object methods.)

In the spirit of this approach, I've written some C++ code in the past that looked something like this:

	class BaseClass {
	public:
		// Encapsulate ctor arguments
		struct Args {
			int baseparm1, baseparm2;
		};
		BaseClass(Args args) {
			// initialize object based on fields in
			// BaseClass::Args.
		}
	};

	class MyClass : public BaseClass {
	public:
		// Encapsulate ctor arguments
		struct Args : BaseClass::Args {
			int parm1, parm2;
		};

		MyClass(Args args) : BaseClass(args) {
			// initialize object based on fields in args
		}
	};

Basically, the Args structs let the user set up whatever values they want to, in whatever order they wish, then they are "blessed" into real class instances by the ctor. Encapsulating ctor arguments in these structs alleviates the problem of proliferating ctor arguments as the class hierarchy grows: each derived class simply hands off the Args struct (which is itself in a hierarchy that parallels that of the classes) to the base class ctor. All ctors in the class hierarchy need only a single (polymorphic) argument.

This approach also localizes the changes required when you modify base class arguments -- in the old way of having multiple ctor arguments, adding or changing arguments to the base class ctor requires you to update every single derived class ctor accordingly -- very bad. But here, adding a new field to BaseClass::Args requires zero changes to all derived classes, which is a Good Thing(tm).

In some cases, if the class is relatively simple, the private members of the class can simply be themselves an instance of the Args struct, so the ctor could be nothing more than just:

	MyClass(Args args) : BaseClass(args), myArgs(args) {}

which gets rid of that silly baroque dance of naming ctor arguments as _a, _b, _c, then writing in the ctor body a=_a, b=_b, c=_c (which can be rather error prone if you mistype a _ somewhere or forget to assign one of the members). Since the private copy of Args is not accessible from outside, class methods can use the values freely without having to worry about inconsistent states -- the ctor can check class invariants before creating the class object, ensuring that the internal copy of Args is in a consistent state.

The Args structs themselves, of course, can have ctors that set up sane default values for each field, so that lazy users can simply call:

	MyClass *obj = new MyClass(MyClass::Args());

and get a working, consistent class object with default settings.
This way of setting default values also lets the user only change fields that they don't want to use default values for, rather than be constricted by the order of ctor default arguments: if you're unlucky enough to need a non-default value in a later parameter, you're forced to repeat the default values for everything that comes before it.

In D, this approach isn't quite as nice, because D structs don't have inheritance, so you can't simply pass Args from derived class to base class. You'd have to explicitly do something like:

	class BaseClass {
	public:
		struct Args { ... }
		this(Args args) { ... }
	}

	class MyClass : BaseClass {
	public:
		struct Args {
			BaseClass.Args base;	// <-- explicit inclusion of BaseClass.Args
			...
		}
		this(Args args) {
			super(args.base);	// <-- more verbose than just super(args);
			...
		}
	}

Initializing the args also isn't as nice, since user code will have to know exactly which fields are in .base and which aren't. You can't just write, like in C++:

	// C++
	MyClass::Args args;
	args.basefield1 = 123;
	args.field2 = 321;

you'd have to write, in D:

	// D
	MyClass.Args args;
	args.base.basefield1 = 123;
	args.field2 = 321;

which isn't as nice in terms of encapsulation, since ideally user code should not need to care about the exact boundaries between base class and derived class.

I haven't really thought about how this might be made nicer in D, though.

T

-- 
I am Ohm of Borg. Resistance is voltage over current.
Jul 15 2013
next sibling parent reply "Meta" <jared771 gmail.com> writes:
On Monday, 15 July 2013 at 22:29:14 UTC, H. S. Teoh wrote:
 I consider myself to be a "systematic" programmer (according to 
 the
 definition in the paper); I can work equally well with ctors 
 with
 arguments vs. create-set-call objects. But I find that 
 mandatory ctors
 with arguments are a pain to work with, *both* to write and to 
 use.
I also find constructors with multiple arguments a pain to use. They get difficult to maintain as your project grows. One of my pet projects has a very shallow class hierarchy, but the constructors of each object down the tree have many arguments, with descendants adding on even more. It gets to be a real headache when you have more than 3 constructors per class to deal with base class overloads, multiple arguments, etc.
 On the usability side, there's the mental workload of having to 
 remember
 which order the arguments appear in (or look it up in the IDE, 
 or
 whatever -- the point is that I can't just type the ctor call 
 straight
 from my head). Then there's the problem of needing to create 
 objects
 required by the ctor before you can call the ctor. In some 
 cases, this
 can be inconvenient -- I always have to remember to setup and 
 create
 other objects before I can create this one, because its ctor 
 requires
 said objects as arguments. Then there's the lack of 
 flexibility: no
 matter what you do, it seems that anything that requires more 
 than a
 single ctor argument inevitably becomes either (1) too complex,
 requiring too many arguments, and therefore very difficult to 
 use, or
 (2) too simplistic, and therefore unable to do some things that 
 I may
 want to do (e.g. some fields are default-initialized with no 
 way to
 specify the initial values of the fields, 'cos otherwise the 
 ctor would
 have too many arguments). No matter what you do, it seems almost
 impossible to come up with an ideal ctor except in trivial 
 cases where
 it requires only 1 argument or is a default ctor.
Having to create other objects to pass to a constructor is particularly painful. You'd better pray that they have trivial constructors, or else things can get hairy really fast. Multiple nested constructors can also create a large amount of code bloat. Once the constructor grows large enough, I generally put each argument on its own line to ensure that it's clear what I'm calling it with. This has the unfortunate side effect of making the call span multiple lines. In my opinion, a constructor requiring more than 10 lines is an unsightly abomination.
 On the writability side, one of my pet peeves is base class 
 ctors that
 require multiple arguments. Every level of inheritance 
 inevitably adds
 more arguments each time, and by the time you're 5-6 levels 
 down the
 class hierarchy, your ctor calls just have an unmanageable 
 number of
 parameters. Not to mention the violation of DRY by requiring 
 much
 redundant typing just to pass arguments from the inherited 
 class' ctor
 up the class hierarchy. Tons of bugs to be had everywhere, 
 given the
 amount of repeated typing needed.

 In the simplest cases, of course, these aren't big issues, but 
 this kind
 of ctor design is clearly not scalable.

 OTOH, the create-set-call pattern isn't a panacea either. One of 
 the
 biggest problems with this pattern is that you can't guarantee 
 your
 objects are in a consistent state at all times. This is very 
 bad,
 because all your methods will have to check if some value has 
 been set
 yet, before it uses it. This adds a lot of complexity that 
 could've been
 avoided had everything been set at ctor-time. This also makes 
 class
 invariants needlessly complex. Moreover, I've seen many classes 
 in this
 category exhibit undefined behaviour if you call a 
 value-setting method
 after you start using the object. Too many classes falsely 
 assume that
 you will always call set methods and then "use" methods in that 
 order.
 If you call a set method after calling a "use" method, you're 
 quite
 likely to run into bugs in the class, e.g. part of the object's 
 state
 doesn't reflect the new value you set, because the "use" 
 methods were
 written with the assumption that when they were called the 
 first time,
 the values you set earlier won't change thereafter.
I've found that a good way to keep constructors manageable is to use the builder pattern. Create a builder object that has its fields set by the programmer, which is then passed to the 'real' object for construction. You can provide default arguments, optional arguments, etc. Combine this with a fluid interface and I think it looks a lot better. Of course, this has the disadvantage of requiring a *lot* of boilerplate, but I think this could be okay in D, as a builder class is exactly the kind of thing that can be automatically generated.
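Something like this, roughly (untested sketch; Person and its fields are just illustrative names):

	class Person
	{
	    private string name;
	    private int age;

	    // the "real" ctor takes the fully set-up builder
	    this(PersonBuilder b)
	    {
	        name = b.name;
	        age  = b.age;
	    }
	}

	class PersonBuilder
	{
	    // default values live in the builder
	    string name = "unknown";
	    int age = 1;

	    // fluent setters: each one returns the builder for chaining
	    PersonBuilder setName(string n) { name = n; return this; }
	    PersonBuilder setAge(int a)     { age = a;  return this; }

	    Person build() { return new Person(this); }
	}

	void main()
	{
	    // only override the fields you care about, in any order
	    auto p = new PersonBuilder().setAge(30).setName("test").build();
	}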
 I've always found Perl's approach a more balanced way to tackle 
 this
 problem (even though Perl's OO system as a whole suffers from 
 other,
 shall we say, idiosyncrasies). In Perl, objects start out as 
 arbitrary
 key-value pairs, and nothing differentiates them from a regular 
 AA until
 you call the 'bless' built-in function on them, at which point 
 they
 become "officially" a member of some particular class. This 
 neatly
 sidesteps the whole ctor mess: you can initialize the initial 
 AA with
 whatever values you want, in whatever order you want. When you 
 finally
 "kicked it into shape", as the cited paper puts it, you 
 "promote" that
 set of key-value pairs into an "official" member of the class, 
 and
 thereafter, you can't simply modify fields anymore except 
 through class
 methods. This means you now have the possibility of enforcing 
 invariants
 on the object without crippling the flexibility of constructing 
 it.
 (Well, OK, in Perl, this last bit isn't necessarily true, but 
 in an
 ideal implementation of this initialize-bless-use approach, the 
 object's
 fields would become non-public after being blessed and can only 
 be
 updated by "official" object methods.)

 In the spirit of this approach, I've written some C++ code in 
 the past
 that looked something like this:

 	class BaseClass {
 	public:
 		// Encapsulate ctor arguments
 		struct Args {
 			int baseparm1, baseparm2;
 		};
 		BaseClass(Args args) {
 			// initialize object based on fields in
 			// BaseClass::Args.
 		}
 	};

 	class MyClass : public BaseClass {
 	public:
 		// Encapsulate ctor arguments
 		struct Args : BaseClass::Args {
 			int parm1, parm2;
 		};

 		MyClass(Args args) : BaseClass(args) {
 			// initialize object based on fields in args
 		}
 	};

 Basically, the Args structs let the user set up whatever values 
 they
 want to, in whatever order they wish, then they are "blessed" 
 into real
 class instances by the ctor. Encapsulating ctor arguments in 
 these
 structs alleviates the problem of proliferating ctor arguments 
 as the
 class hierarchy grows: each derived class simply hands off the 
 Args
 struct (which is itself in a hierarchy that parallels that of 
 the
 classes) to the base class ctor. All ctors in the class 
 hierarchy needs
 only a single (polymorphic) argument.

 This approach also localizes the changes required when you 
 modify base
 class arguments -- in the old way of having multiple ctor 
 arguments,
 adding or changing arguments to the base class ctor requires 
 you to
 update every single derived class ctor accordingly -- very bad. 
 But
 here, adding a new field to BaseClass::Args requires zero 
 changes to all
 derived classes, which is a Good Thing(tm).

 In some cases, if the class is relatively simple, the private 
 members of
 the class can simply be themselves an instance of the Args 
 struct, so
 the ctor could be nothing more than just:

 	MyClass(Args args) : BaseClass(args), myArgs(args) {}

 which gets rid of that silly baroque dance of naming ctor 
 arguments as
 _a, _b, _c, then writing in the ctor body a=_a, b=_b, c=_c 
 (which can be
 rather error prone if you mistype a _ somewhere or forget to 
 assign one
 of the members). Since the private copy of Args is not 
 accessible from
 outside, class methods can use the values freely without having 
 to worry
 about inconsistent states -- the ctor can check class 
 invariants before
 creating the class object, ensuring that the internal copy of 
 Args is in
 a consistent state.

 The Args structs themselves, of course, can have ctors that 
 setup sane
 default values for each field, so that lazy users can simply 
 call:

 	MyClass *obj = new MyClass(MyClass::Args());

 and get a working, consistent class object with default 
 settings. This
 way of setting default values also lets the user only change 
 fields that
 they don't want to use default values for, rather than be 
 constricted by
 the order of ctor default arguments: if you're unlucky enough 
 to need a
 non-default value in a later parameter, you're forced to repeat 
 the
 default values for everything that comes before it.

 In D, this approach isn't quite as nice, because D structs 
 don't have
 inheritance, so you can't simply pass Args from derived class 
 to base
 class. You'd have to explicitly do something like:

 	class BaseClass {
 	public:
 		struct Args { ...  }
 		this(Args args) { ... }
 	}

 	class MyClass : BaseClass {
 	public:
 		struct Args {
 			BaseClass.Args base;	// <-- explicit inclusion of 
 BaseClass.Args
 			...
 		}
 		this(Args args) {
 			super(args.base);	// <-- more verbose than just super(args);
 			...
 		}
 	}

 Initializing the args also isn't as nice, since user code will 
 have to
 know exactly which fields are in .base and which aren't. You 
 can't just
 write, like in C++:

 	// C++
 	MyClass::Args args;
 	args.basefield1 = 123;
 	args.field2 = 321;

 you'd have to write, in D:

 	// D
 	MyClass.Args args;
 	args.base.basefield1 = 123;
 	args.field2 = 321;

 which isn't as nice in terms of encapsulation, since ideally 
 user code
 should not need to care about the exact boundaries between base 
 class and
 derived class.

 I haven't really thought about how this might be made nicer in 
 D,
 though.


 T
See above, this is basically the builder pattern. It's a neat trick, giving your args objects a class hierarchy of their own. I think that one drawback of that, however, is that now you have to maintain *two* class hierarchies. Have you found this to be a problem in practice? As an aside, you could probably simulate the inheritance of the args objects in D either with alias this or even opDispatch. Still, this means that you need to nest the structs within each-other, and this could get silly after 2-3 "generations" of args objects.
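For instance, a rough (untested) sketch of the opDispatch route, with made-up field names, might look like:

	struct BaseArgs
	{
	    int baseparm1 = 1;
	    int baseparm2 = 2;
	}

	struct MyArgs
	{
	    BaseArgs base;
	    int parm1 = 3;

	    // forward any unknown member access to the embedded base struct
	    auto ref opDispatch(string name)()
	    {
	        return __traits(getMember, base, name);
	    }
	}

	void main()
	{
	    MyArgs args;
	    args.baseparm1 = 10;   // resolved through opDispatch
	    args.parm1 = 20;
	    assert(args.base.baseparm1 == 10);
	}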
Jul 15 2013
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 16, 2013 at 03:54:27AM +0200, Meta wrote:
 On Monday, 15 July 2013 at 22:29:14 UTC, H. S. Teoh wrote:
I consider myself to be a "systematic" programmer (according to the
definition in the paper); I can work equally well with ctors with
arguments vs. create-set-call objects. But I find that mandatory
ctors with arguments are a pain to work with, *both* to write and to
use.
I also find constructors with multiple arguments a pain to use. They get difficult to maintain as your project grows. One of my pet projects has a very shallow class hierarchy, but the constructors of each object down the tree have many arguments, with descendants adding on even more. It gets to be a real headache when you have more than 3 constructors per class to deal with base class overloads, multiple arguments, etc.
Yeah, when every level of the hierarchy introduces 2-3 new overloads of the ctor, you get an exponential explosion of derived class ctors if you want to account for all possibilities. Most of the time, you just end up oversimplifying 'cos anything else is simply unmanageable. [...]
 Having to create other objects to pass to a constructor is
 particularly painful. You'd better pray that they have trivial
 constructors, or else things can get hairy really fast. Multiple
 nested constructors can also create a large amount of code bloat.
 Once the constructor grows large enough, I generally put each
 argument on its own line to ensure that it's clear what I'm calling
 it with. This has the unfortunate side effect of making the call
 span multiple lines. In my opinion, a constructor requiring more
 than 10 lines is an unsightly abomination.
I usually bail out way before then. :) A 10-line ctor call is just unpalatable. [...]
 I've found that a good way to keep constructors manageable is to use
 the builder pattern. Create a builder object that has its fields set
 by the programmer, which is then passed to the 'real' object for
 construction. You can provide default arguments, optional arguments,
 etc. Combine this with a fluid interface and I think it looks a lot
 better. Of course, this has the disadvantage of requiring a *lot* of
 boilerplate, but I think this could be okay in D, as a builder class
 is exactly the kind of thing that can be automatically generated.
In my C++ version of this, you could even just reuse the builder object directly, since it's just a struct containing ctor arguments. But yeah, there's some amount of boilerplate necessary. [...]
In the spirit of this approach, I've written some C++ code in the
past that looked something like this:

	class BaseClass {
	public:
		// Encapsulate ctor arguments
		struct Args {
			int baseparm1, baseparm2;
		};
		BaseClass(Args args) {
			// initialize object based on fields in
			// BaseClass::Args.
		}
	};

	class MyClass : public BaseClass {
	public:
		// Encapsulate ctor arguments
		struct Args : BaseClass::Args {
			int parm1, parm2;
		};

		MyClass(Args args) : BaseClass(args) {
			// initialize object based on fields in args
		}
	};
[...]
 See above, this is basically the builder pattern. It's a neat trick,
 giving your args objects a class hierarchy of their own. I think that
 one drawback of that, however, is that now you have to maintain *two*
 class hierarchies. Have you found this to be a problem in practice?
Well, there *is* a certain amount of boilerplate, to be sure, so it isn't a perfect solution. But nesting the structs inside the class they correspond with helps to prevent mismatches between the two hierarchies. It also allows reusing the name "Args" so that you don't have to invent a whole new set of names just for these builders. Minimizing these differences makes it less likely to make a mistake and inherit Args from the wrong base class, for example.

In fact, now that I think of this, in D this could actually work out even better, since you could just write:

	class MyClass : BaseClass {
	public:
		class Args : typeof(super).Args {
			int parm1 = 1;
			int parm2 = 2;
		}
		this(Args args) {
			super(args);
			...
		}
	}

The compile-time introspection allows you to just write "class Args : typeof(super).Args" consistently for all such builders, so you never have to worry about inventing new names or mismatches in the two hierarchies. The "typeof(super).Args" will automatically pick up the correct base class Args to inherit from, even if you shuffle the classes around the hierarchy. Furthermore, since the declaration is exactly identical across the board (except for the actual fields), you could just factor this into a mixin and thereby minimize the boilerplate.

The only major disadvantage in the D version is that you can't use structs; you have to allocate the Args objects on the GC heap, so you may end up generating lots of GC garbage. If only D structs had inheritance, this would've been a much cleaner solution.
 As an aside, you could probably simulate the inheritance of the args
 objects in D either with alias this or even opDispatch. Still, this
 means that you need to nest the structs within each-other, and this
 could get silly after 2-3 "generations" of args objects.
Hmm. This is a good idea! And with a mixin, this may not turn out so bad after all. Maybe start with something like this:

	class BaseClass {
	public:
		struct Args {
			int baseparm1 = 1;
			int baseparm2 = 2;
			...
		}
	}

	class MyClass : BaseClass {
	public:
		struct Args {
			typeof(super).Args base;
			alias base this;
			int parm1 = 1;
			int parm2 = 2;
			...
		}
		this(Args args) {
			super(args);	// works 'cos of alias this
		}
	}

	void main() {
		MyClass.Args args;
		args.baseparm1 = 2;	// works 'cos of alias this
		args.parm1 = 3;
		auto obj = new MyClass(args);
	}

Using alias this, we have the nice effect that user code no longer needs to refer to the .base member of the structs, and indeed, doesn't need to know about it. So this is effectively like struct inheritance... heh, cool. Just discovered a new trick in D: struct inheritance using alias this. :)

The boilerplate can be put into a mixin, say something like this:

	mixin template BuilderArgs(string fields) {
		struct Args {
			typeof(super).Args base;
			alias base this;
			mixin(fields);
		}
	}

	class MyClass : BaseClass {
	public:
		// Hmm, doesn't look too bad!
		mixin BuilderArgs!(q{
			int parm1 = 1;
			int parm2 = 2;
		});
		this(Args args) {
			super(args);
			...
		}
	}

	class AnotherClass : BaseClass {
	public:
		// N.B. Looks exactly the same as MyClass.Args except
		// for the fields! The template automatically picks up
		// the right base class Args to "inherit" from.
		mixin BuilderArgs!(q{
			string anotherparm1 = "abc";
			string anotherparm2 = "def";
		});
		this(Args args) {
			super(args);
			...
		}
	}

Not bad at all! Though, I haven't actually tested any of this code, so I've no idea if it will actually work yet. But it certainly looks promising! I'll give it a spin tomorrow morning (way past my bedtime now).

T

-- 
Meat: euphemism for dead animal. -- Flora
Jul 16 2013
parent reply "Dicebot" <public dicebot.lv> writes:
On Tuesday, 16 July 2013 at 08:19:10 UTC, H. S. Teoh wrote:
 Just discovered a new trick in D: struct inheritance using alias
 this. :)
Wasn't this stated in TDPL as one of the primary design rationales behind "alias this"? :)
Jul 16 2013
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 16, 2013 at 11:18:31AM +0200, Dicebot wrote:
 On Tuesday, 16 July 2013 at 08:19:10 UTC, H. S. Teoh wrote:
Just discovered a new trick in D: struct inheritance using alias
this. :)
Wasn't this stated in TDPL as one of the primary design rationales behind "alias this"? :)
Haha, you're right. I read it before but apparently the only thing that stuck in my mind is that alias this is to allow a type to masquerade as another type. But looking at the relevant sections again, Andrei did describe it as "subtyping", both w.r.t. classes and structs. Touché. :) T -- Mediocrity has been pushed to extremes.
Jul 16 2013
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 16, 2013 at 01:17:30AM -0700, H. S. Teoh wrote:
[...]
 	mixin template BuilderArgs(string fields) {
 		struct Args {
 			typeof(super).Args base;
 			alias base this;
 			mixin(fields);
 		}
 	};
 
 	class MyClass : BaseClass {
 	public:
 		// Hmm, doesn't look too bad!
 		mixin BuilderArgs!(q{
 			int parm1 = 1;
 			int parm2 = 2;
 		});
 		this(Args args) {
 			super(args);
 			...
 		}
 	}
 
 	class AnotherClass : BaseClass {
 	public:
 		// N.B. Looks exactly the same like MyClass.args except
 		// for the fields! The template automatically picks up
 		// the right base class Args to "inherit" from.
 		mixin BuilderArgs!(q{
 			string anotherparm1 = "abc";
 			string anotherparm2 = "def";
 		});
 		this(Args args) {
 			super(args);
 			...
 		}
 	}
 
 Not bad at all!  Though, I haven't actually tested any of this code, so
 I've no idea if it will actually work yet. But it certainly looks
 promising! I'll give it a spin tomorrow morning (way past my bedtime
 now).
[...]

Yep, confirmed that this code actually works! Here's the actual test code that I wrote:

	import std.stdio;

	mixin template CtorArgs(string fields) {
		struct Args {
			static if (!is(typeof(super) == Object)) {
				typeof(super).Args base;
				alias base this;
			}
			mixin(fields);
		}
	}

	class Base {
	private:
		int sum;
	public:
		mixin CtorArgs!(q{
			int basefield1 = 1;
			int basefield2 = 2;
		});
		this(Args args) {
			sum = args.basefield1 + args.basefield2;
		}
		int getResult() { return sum; }
	}

	class Derived : Base {
		int derivedSum;
	public:
		mixin CtorArgs!(q{
			int parm1 = 3;
			int parm2 = 4;
		});
		this(Args args) {
			super(args);
			derivedSum = args.parm1 + args.parm2;
		}
		override int getResult() { return super.getResult() + derivedSum; }
	}

	class AnotherDerived : Base {
	private:
		int anotherSum;
	public:
		mixin CtorArgs!(q{
			int another1 = 5;
			int another2 = 6;
		});
		this(Args args) {
			super(args);
			anotherSum = args.another1 + args.another2;
		}
		override int getResult() { return super.getResult() + anotherSum; }
	}

	// Test usage in a deeper hierarchy
	class VeryDerived : AnotherDerived {
		int divisor;
	public:
		mixin CtorArgs!(q{
			int divisor = 5;
		});
		this(Args args) {
			super(args);
			this.divisor = args.divisor;
		}
		override int getResult() { return super.getResult() / divisor; }
	}

	void main() {
		Derived.Args args1;
		args1.basefield1 = 10;
		args1.parm1 = 20;
		auto obj1 = new Derived(args1);
		assert(obj1.getResult() == 10 + 2 + 20 + 4);

		AnotherDerived.Args args2;
		args2.basefield2 = 20;
		args2.another1 = 30;
		auto obj2 = new AnotherDerived(args2);
		assert(obj2.getResult() == 1 + 20 + 30 + 6);

		VeryDerived.Args args3;
		args3.divisor = 7;
		auto obj3 = new VeryDerived(args3);
		assert(obj3.getResult() == 2);
	}

Note the nice thing about this: you can construct the ctor arguments (har har) in any order you like, and it Just Works. Referencing ctor parameters of base class ctors is just as easy; no need for ugliness like "args.base.base.base.baseparm1" thanks to alias this. The ctors themselves just hand Args over to the base class: alias this makes the struct inheritance pretty transparent. The mixin line itself is identical across the board, thanks to the static if in the mixin template, so you can actually re-root the class hierarchy or otherwise move classes around the hierarchy without having to re-wire any of the Args handling, and things will Just Work.

Wow. So not only does this technique work in D, it works much *better* than my original C++ code! I think I shall add this to my personal D library. :) (Unless people think this is Phobos material.)

T

-- 
People who are more than casually interested in computers should have at least some idea of what the underlying hardware is like. Otherwise the programs they write will be pretty weird. -- D. Knuth
Jul 16 2013
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-07-16 00:27, H. S. Teoh wrote:

 In the spirit of this approach, I've written some C++ code in the past
 that looked something like this:

 	class BaseClass {
 	public:
 		// Encapsulate ctor arguments
 		struct Args {
 			int baseparm1, baseparm2;
 		};
 		BaseClass(Args args) {
 			// initialize object based on fields in
 			// BaseClass::Args.
 		}
 	};

 	class MyClass : public BaseClass {
 	public:
 		// Encapsulate ctor arguments
 		struct Args : BaseClass::Args {
 			int parm1, parm2;
 		};

 		MyClass(Args args) : BaseClass(args) {
 			// initialize object based on fields in args
 		}
 	};

 Basically, the Args structs let the user set up whatever values they
 want to, in whatever order they wish, then they are "blessed" into real
 class instances by the ctor. Encapsulating ctor arguments in these
 structs alleviates the problem of proliferating ctor arguments as the
 class hierarchy grows: each derived class simply hands off the Args
 struct (which is itself in a hierarchy that parallels that of the
 classes) to the base class ctor. All ctors in the class hierarchy needs
 only a single (polymorphic) argument.
That's actually quite clever.
 In D, this approach isn't quite as nice, because D structs don't have
 inheritance, so you can't simply pass Args from derived class to base
 class. You'd have to explicitly do something like:

 	class BaseClass {
 	public:
 		struct Args { ...  }
 		this(Args args) { ... }
 	}

 	class MyClass : BaseClass {
 	public:
 		struct Args {
 			BaseClass.Args base;	// <-- explicit inclusion of BaseClass.Args
 			...
 		}
 		this(Args args) {
 			super(args.base);	// <-- more verbose than just super(args);
 			...
 		}
 	}

 Initializing the args also isn't as nice, since user code will have to
 know exactly which fields are in .base and which aren't. You can't just
 write, like in C++:

 	// C++
 	MyClass::Args args;
 	args.basefield1 = 123;
 	args.field2 = 321;

 you'd have to write, in D:

 	// D
 	MyClass.Args args;
 	args.base.basefield1 = 123;
 	args.field2 = 321;

 which isn't as nice in terms of encapsulation, since ideally user code
 should not need to care about the exact boundaries between base class and 
 derived class.

 I haven't really thought about how this might be made nicer in D,
 though.
On the other hand D supports the following syntax:

	MyClass.Args args = { field1: 1, field2: 2 };

Unfortunately that syntax doesn't work for function calls.

-- 
/Jacob Carlborg
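For example (illustrative struct and field names), the static initializer form is accepted in a declaration, but there is no equivalent named-field literal you can pass directly as a call argument:

	struct Args { int basefield1; int field2; }

	void foo(Args a) {}

	void main()
	{
	    Args args = { basefield1: 123, field2: 321 };  // OK in a declaration
	    foo(args);
	    // foo({ basefield1: 123, field2: 321 });      // no such call syntax
	}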
Jul 16 2013
prev sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
My policy is to require the bare minimum to construct a valid 
object, in order to avoid initialization hell.

Not knowing what/when to initialize things is really painful as 
well. It also introduces sequential coupling, and wrongly 
initialized objects tend to explode far away from their 
construction point.

What goes in this category? Any state that can't have any 
default value that makes sense, as well as any state that is 
expensive to initialize.
Jul 16 2013
parent =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 07/16/2013 01:30 AM, deadalnix wrote:
 My policy is to require the bare minimum to construct a valid object, in
 order to avoid initialization hell.
+0.33
 Not knowing what/when to initialize things is really painful as well. It
 also introduces sequential coupling, and wrongly initialized objects tend
 to explode far away from their construction point.
+0.33
 What goes in this category? Any state that can't have any default value
 that makes sense, as well as any state that is expensive to initialize.
+0.33 And to complete: +0.01 :p Ali
Jul 17 2013
prev sibling parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Mon, 15 Jul 2013 20:06:38 +0100, Meta <jared771 gmail.com> wrote:

 I saw an interesting post on Hacker News about constructors in OO  
 languages. Apparently they are a real stumbling block for some  
 programmers, which was quite a surprise to me. I think this might be  
 relevant to a discussion about named parameters and whether we should  
 ditch constructors for another kind of construct.

 Link to the newsgroup post, the link to the paper is near the top:
 http://erlang.org/pipermail/erlang-questions/2012-March/065519.html
First thought; constructors with positional arguments aren't any different to methods or functions with positional arguments WRT remembering the arguments. The difficulties with one are the same as with another - you need to remember them, or look them up, or get help from intellisense.

I think the point about constructed objects being in valid states is the important one. If the object requires N arguments which cannot be sensibly defaulted, then IMO they /have/ to be specified at construction, and should not be delayed as in the create-set-call style mentioned. Granted, a create-set-call style object could throw detailed/useful messages when used before initialisation, but that's a runtime check so IMO not a great solution to the issue.

Also, I find compelling the issue that a create-set-call style object with N required set calls could be initialised in N! ways, and that each of these different orderings has effectively the same semantic meaning.. so it becomes a lot harder to see what is really happening. Add to that, that someone could interleave the initialisation of another object into the first and .. well .. shudder.

So, given the desire to have objects constructed in a valid state, and given the restriction that this may require N arguments which cannot be defaulted, how do you alleviate the problem of having to remember the parameters required and the ordering of those parameters?

Named parameters only help up to a point. Like ordered parameters you need to remember which parameters are required; all that has changed is that instead of remembering their order you have to remember their names. So, IMO this doesn't really solve the problem at all.

A lot can be done with sufficiently clever intellisense in either case (ordered/named parameters), but is there anything which can be done without it, using just a text editor and compiler?

Or, perhaps another way to ask a similar Q is.. can the compiler statically verify that a create-set-call style object has been initialised, or rather that an attempt has at least been made to initialise all the required parts.

We have class invariants.. these define the things which must be initialised to reach a valid state. If we had compiler recognisable properties as well, then we could have an initialise construct like..

class Foo
{
  string name;
  int age;

  invariant
  {
    assert(name != null);
    assert(age > 0);
  }

  property string Name...
  property int Age...
}

void main()
{
  Foo f = new Foo() {
    Name = "test",    // calls property Name setter
    Age = 12          // calls property Age setter
  };
}

The compiler could statically verify that the variables tested in the invariant (name, age) were set (by setter properties) inside the initialise construct {} following the new Foo().

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
Jul 16 2013
next sibling parent reply "Craig Dillabaugh" <cdillaba cg.scs.careton.ca> writes:
On Tuesday, 16 July 2013 at 09:47:35 UTC, Regan Heath wrote:

clip

 We have class invariants.. these define the things which must 
 be initialised to reach a valid state.  If we had compiler 
 recognisable properties as well, then we could have an 
 initialise construct like..

 class Foo
 {
   string name;
   int age;

   invariant
   {
     assert(name != null);
     assert(age > 0);
   }

   property string Name...
   property int Age...
 }

 void main()
 {
   Foo f = new Foo() {
     Name = "test",    // calls property Name setter
     Age = 12          // calls property Age setter
   };
 }

 The compiler could statically verify that the variables tested 
 in the invariant (name, age) were set (by setter properies) 
 inside the initialise construct {} following the new Foo().

 R
How do you envision this working where Name or Age must be set to a value not known at compile time?
Jul 16 2013
next sibling parent "Dicebot" <public dicebot.lv> writes:
On Tuesday, 16 July 2013 at 13:35:00 UTC, Craig Dillabaugh wrote:
 How do you envision this working where Name or Age must be set 
 to
 a value not known at compile time?
Contracts are run-time entities (omitted in release AFAIR).
Jul 16 2013
prev sibling next sibling parent reply "Wyatt" <wyatt.epp gmail.com> writes:
On Tuesday, 16 July 2013 at 13:35:00 UTC, Craig Dillabaugh wrote:
 How do you envision this working where Name or Age must be set 
 to
 a value not known at compile time?
I'm not sure if it's practical or covers all the bases, but it sounds like you would need to keep track of member initialisation during compilation, and abort if the code attempts to use the object or one of its members as an AssignExpression without initialising the whole thing. Setting aside the fact that there's compiler work mentioned at all, have I missed some nuance of this pattern? I guess there's the situation where you conditionally may or may not assign, or pass it around and accrete mutations, so it might be best to only do it for some properly-annotated (how?) subset of the whole? Not sure. -Wyatt
Jul 16 2013
next sibling parent "Craig Dillabaugh" <cdillaba cg.scs.careton.ca> writes:
On Tuesday, 16 July 2013 at 16:07:30 UTC, Wyatt wrote:
 On Tuesday, 16 July 2013 at 13:35:00 UTC, Craig Dillabaugh 
 wrote:
 How do you envision this working where Name or Age must be set 
 to
 a value not known at compile time?
I'm not sure if it's practical or covers all the bases, but it sounds like you would need to keep track of member initialisation during compilation, and abort if the code attempts to use the object or one of its members as an AssignExpression without initialising the whole thing. Setting aside the fact that there's compiler work mentioned at all, have I missed some nuance of this pattern? I guess there's the situation where you conditionally may or may not assign, or pass it around and accrete mutations, so it might be best to only do it for some properly-annotated (how?) subset of the whole? Not sure. -Wyatt
Jul 16 2013
prev sibling parent "Craig Dillabaugh" <cdillaba cg.scs.careton.ca> writes:
On Tuesday, 16 July 2013 at 16:07:30 UTC, Wyatt wrote:
 On Tuesday, 16 July 2013 at 13:35:00 UTC, Craig Dillabaugh 
 wrote:
 How do you envision this working where Name or Age must be set 
 to
 a value not known at compile time?
I'm not sure if it's practical or covers all the bases, but it sounds like you would need to keep track of member initialisation during compilation, and abort if the code attempts to use the object or one of its members as an AssignExpression without initialising the whole thing. Setting aside the fact that there's compiler work mentioned at all, have I missed some nuance of this pattern? I guess there's the situation where you conditionally may or may not assign, or pass it around and accrete mutations, so it might be best to only do it for some properly-annotated (how?) subset of the whole? Not sure. -Wyatt
Sorry for the empty post (previous). In general, I think the proposed idea is quite nice, and as Dicebot pointed out, my initial concern was misguided because the invariant is evaluated at runtime, not compile time (and Dicebot, I checked the docs, and you are correct about it getting stripped for release builds).
Jul 16 2013
prev sibling parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Tue, 16 Jul 2013 14:34:59 +0100, Craig Dillabaugh  
<cdillaba cg.scs.careton.ca> wrote:

 On Tuesday, 16 July 2013 at 09:47:35 UTC, Regan Heath wrote:

 clip

 We have class invariants.. these define the things which must be  
 initialised to reach a valid state.  If we had compiler recognisable  
 properties as well, then we could have an initialise construct like..

 class Foo
 {
   string name;
   int age;

   invariant
   {
     assert(name != null);
     assert(age > 0);
   }

   property string Name...
   property int Age...
 }

 void main()
 {
   Foo f = new Foo() {
     Name = "test",    // calls property Name setter
     Age = 12          // calls property Age setter
   };
 }

 The compiler could statically verify that the variables tested in the  
 invariant (name, age) were set (by setter properies) inside the  
 initialise construct {} following the new Foo().

 R
How do you envision this working where Name or Age must be set to a value not known at compile time?
The idea isn't to run the invariant itself at compile time - as you say, a runtime only value may be used. In fact, in the example above the compiler would have to hold off running the invariant until the closing } of that initialise statement or it may fail.

The idea was to /use/ the code in the invariant to determine which member fields should be set during the initialisation statement and then statically verify that a call was made to some member function to set them. The actual values set aren't important, just that some attempt has been made to set them. That's about the limit of what I think you could do statically, in the general case.

In some specific cases we could extend this to say that if all the values set were evaluable at compile time, then we could actually run the invariant using CTFE, perhaps.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
Jul 16 2013
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jul 16, 2013 at 06:17:48PM +0100, Regan Heath wrote:
 On Tue, 16 Jul 2013 14:34:59 +0100, Craig Dillabaugh
 <cdillaba cg.scs.careton.ca> wrote:
 
On Tuesday, 16 July 2013 at 09:47:35 UTC, Regan Heath wrote:

clip

We have class invariants.. these define the things which must be
initialised to reach a valid state.  If we had compiler
recognisable properties as well, then we could have an
initialise construct like..

class Foo
{
  string name;
  int age;

  invariant
  {
    assert(name != null);
    assert(age > 0);
  }

  property string Name...
  property int Age...
}

void main()
{
  Foo f = new Foo() {
    Name = "test",    // calls property Name setter
    Age = 12          // calls property Age setter
  };
}
Maybe I'm missing something obvious, but isn't this essentially the same thing as having named ctor parameters? [...]
 The idea was to /use/ the code in the invariant to determine which
 member fields should be set during the initialisation statement and
 then statically verify that a call was made to some member function
 to set them.  The actual values set aren't important, just that some
 attempt has been made to set them.  That's about the limit of what I
 think you could do statically, in the general case.
[...]

This seems to be the same thing as using named parameters: assuming the compiler actually supported such a thing, it would be able to tell at compile-time whether all required named parameters have been specified, and abort if not. There would be no need for any invariant-based guessing of what fields are required and what aren't, and no need for adding any property feature to the language -- the function signature of the ctor itself indicates what is required, and the compiler can check this at compile-time. (Of course, actual verification of the ctor parameters can only happen at runtime -- which is OK.)

This still doesn't address the issue of ctor argument proliferation, though: if each level of the class hierarchy adds 1-2 additional parameters, you still need to write tons of boilerplate in your derived classes to percolate those additional parameters up the inheritance tree. If a base class ctor requires parameters parmA, parmB, parmC, then any derived class ctor must declare at least parmA, parmB, parmC in their function signature (or provide default values for them), and you must still write super(parmA, parmB, parmC) in order to percolate these parameters to the base class. If the derived class requires additional parameters, say parmD, then that's added on top of all of the base class ctor arguments. And any further derived class will now have to declare at least parmA, parmB, parmC, parmD, and then tack on any additional parameters they may need. This is not scalable -- deeply derived classes will have ctors with ridiculous numbers of arguments.

Now imagine if at some point you need to change some base class ctor parameters. Now instead of making a single change to the base class, you have to update every single derived class to make the same change to every ctor, so that the new version of the parameter (or new parameter) is properly percolated up the inheritance tree. This defeats the goal in OOP of restricting the scope of changes to only localized changes. This is especially bad when you need to add an *optional* parameter to the base class: you have to do all that work of updating every single derived class, yet most of the code that uses those derived classes doesn't even care about this new parameter! That's a lot of work for almost no benefit. (And you can't get away without doing it either, since a user of a derived class may at some point want to customize that optional base class parameter, so *all* derived class ctors must also declare it as an optional parameter.)

I think my approach of using builder structs with a parallel inheritance tree is still better: adding/removing/changing parameters to a base class's builder struct automatically propagates to all derived classes with no further code change. With the help of mixin templates, the amount of boilerplate is greatly reduced. And thanks to the use of typeof(super), you can even shuffle classes around your class hierarchy without needing to change anything more than the base class name in the class declaration -- the mixin automatically picks up the right base class builder struct to inherit from, thus guaranteeing that the parallel hierarchy is consistent at all times.

The only weakness I can see is that mandatory arguments with no reasonable default values can't be easily handled.
In the simple cases, you can expand the mixin to allow you to specify builder struct ctors that have required arguments; but then this suffers from the same scalability problems that we were trying to solve in the first place, since all derived classes' builder structs will now require mandatory arguments to be propagated through their ctors.

But I think this shouldn't be a big problem in practice: we can use Nullable fields in the builder struct and have the class ctor verify that all mandatory arguments are present, and throw an error if any arguments are not set properly.

T

-- 
ASCII stupid question, getty stupid ANSI.
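A minimal sketch of that Nullable-based check (untested; Widget and its fields are made-up names):

	import std.exception : enforce;
	import std.typecons : Nullable;

	class Widget
	{
	    struct Args
	    {
	        Nullable!string name;   // mandatory: no sensible default
	        int width = 100;        // optional: has a default
	    }

	    private string name;
	    private int width;

	    this(Args args)
	    {
	        // verify mandatory arguments at construction time
	        enforce(!args.name.isNull, "Widget.Args.name must be set");
	        name  = args.name.get;
	        width = args.width;
	    }
	}

	void main()
	{
	    Widget.Args args;
	    args.name = "button";   // Nullable!string accepts direct assignment
	    auto w = new Widget(args);
	}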
Jul 16 2013
parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Tue, 16 Jul 2013 23:01:57 +0100, H. S. Teoh <hsteoh quickfur.ath.cx>  
wrote:
 On Tue, Jul 16, 2013 at 06:17:48PM +0100, Regan Heath wrote:
 On Tue, 16 Jul 2013 14:34:59 +0100, Craig Dillabaugh
 <cdillaba cg.scs.careton.ca> wrote:

On Tuesday, 16 July 2013 at 09:47:35 UTC, Regan Heath wrote:

clip

We have class invariants.. these define the things which must be
initialised to reach a valid state.  If we had compiler
recognisable properties as well, then we could have an
initialise construct like..

class Foo
{
  string name;
  int age;

  invariant
  {
    assert(name != null);
    assert(age > 0);
  }

  property string Name...
  property int Age...
}

void main()
{
  Foo f = new Foo() {
    Name = "test",    // calls property Name setter
    Age = 12          // calls property Age setter
  };
}
Maybe I'm missing something obvious, but isn't this essentially the same thing as having named ctor parameters?
Yes, if we're comparing this to ctors with named parameters. I wasn't doing that however, I was asking this Q: "Or, perhaps another way to ask a similar Q is.. can the compiler statically verify that a create-set-call style object has been initialised, or rather that an attempt has at least been made to initialise all the required parts." Emphasis on "create-set-call" :)

The weakness to create-set-call style is the desire for a valid object as soon as an attempt can be made to use it. Which implies the need for some sort of enforcement of initialisation and, as I mentioned in my first post, the issue of preventing this initialisation being spread out, or intermingled with others, and thus making the semantics of it harder to see.

My idea here attempted to solve those issues with create-set-call only.
 [...]
 The idea was to /use/ the code in the invariant to determine which
 member fields should be set during the initialisation statement and
 then statically verify that a call was made to some member function
 to set them.  The actual values set aren't important, just that some
 attempt has been made to set them.  That's about the limit of what I
 think you could do statically, in the general case.
[...] This still doesn't address the issue of ctor argument proliferation, though
It wasn't supposed to :) create-set-call ctors have no arguments.
 if each level of the class hierarchy adds 1-2 additional
 parameters, you still need to write tons of boilerplate in your derived
 classes to percolate those additional parameters up the inheritance
 tree.
In the create-set-call style additional required 'arguments' would appear as setter member functions whose underlying data member is verified in the invariant and would therefore be enforced by the syntax I detailed.
 Now imagine if at some point you need to change some base class ctor
 parameters. Now instead of making a single change to the base class, you
 have to update every single derived class to make the same change to
 every ctor, so that the new version of the parameter (or new parameter)
 is properly percolated up the inheritance tree.
This is one reason why create-set-call might be desirable, no ctor arguments, no problem.

So, to take my idea a little further - WRT class inheritance. The compiler, for a derived class, would need to inspect the invariants of all classes involved (these are and-ed already), inspect the constructors of the derived classes (for calls to initialise members), and the initialisation block I described, and verify statically that an attempt was made to initialise all the members which appear in all the invariants.
 I think my approach of using builder structs with a parallel inheritance
 tree is still better
It may be, it certainly looked quite neat but I haven't had a detailed look at it TBH. I think you've misunderstood my idea however, or rather, the issues it was intended to solve :) Perhaps my idea is too limiting for you? I could certainly understand that point of view.

I think another interesting idea is using the builder pattern with create-set-call objects. For example, a builder template class could inspect the object for UDA's indicating a data member which is required during initialisation. It would contain a bool[] to flag each member as not/initialised and expose a setMember() method which would call the underlying object setMember() and return a reference to itself. At some point, these setMember() methods would want to return another template class which contained just a build() member. I'm not sure how/if this is possible in D.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
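A rough, untested sketch of such a builder template (all names are made up, and it checks the required members at runtime in build() rather than returning a second type that only has build()):

	import std.exception : enforce;
	import std.traits : hasUDA;

	struct required {}   // UDA marking members that must be set before build()

	struct Builder(T)
	{
	    private T payload;
	    private bool[string] initialised;

	    // fluent setter: assigns the named member and records that it was set
	    ref Builder set(string member, V)(V value)
	    {
	        __traits(getMember, payload, member) = value;
	        initialised[member] = true;
	        return this;
	    }

	    T build()
	    {
	        // complain about any @required member that was never set
	        foreach (member; __traits(allMembers, T))
	            static if (hasUDA!(__traits(getMember, T, member), required))
	                enforce(member in initialised,
	                        "required member '" ~ member ~ "' was not set");
	        return payload;
	    }
	}

	struct Foo
	{
	    @required string name;
	    int age;   // optional, .init is fine
	}

	void main()
	{
	    Builder!Foo b;
	    b.set!"name"("test").set!"age"(12);
	    auto f = b.build();
	    assert(f.name == "test" && f.age == 12);
	}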
Jul 17 2013
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jul 17, 2013 at 11:00:38AM +0100, Regan Heath wrote:
 On Tue, 16 Jul 2013 23:01:57 +0100, H. S. Teoh
 <hsteoh quickfur.ath.cx> wrote:
On Tue, Jul 16, 2013 at 06:17:48PM +0100, Regan Heath wrote:
[...]
class Foo
{
  string name;
  int age;

  invariant
  {
    assert(name != null);
    assert(age > 0);
  }

  property string Name...
  property int Age...
}

void main()
{
  Foo f = new Foo() {
    Name = "test",    // calls property Name setter
    Age = 12          // calls property Age setter
  };
}
Maybe I'm missing something obvious, but isn't this essentially the same thing as having named ctor parameters?
Yes, if we're comparing this to ctors with named parameters. I wasn't doing that however, I was asking this Q: "Or, perhaps another way to ask a similar Q is.. can the compiler statically verify that a create-set-call style object has been initialised, or rather that an attempt has at least been made to initialise all the required parts." Emphasis on "create-set-call" :)

The weakness to create-set-call style is the desire for a valid object as soon as an attempt can be made to use it. Which implies the need for some sort of enforcement of initialisation and, as I mentioned in my first post, the issue of preventing this initialisation being spread out, or intermingled with others, and thus making the semantics of it harder to see.
Ah, I see. So basically, you need some kind of enforcement of a two-state object, pre-initialization and post-initialization. Basically, the ctor is empty, so you allocate the object first, then set some values into it, then it "officially" becomes a full-fledged instance of the class. To prevent problems with consistency, a sharp transition between setting values and using the object is enforced. Am I right?

I guess my point was that if we boil this down to the essentials, it's basically the same idea as a builder pattern, just implemented slightly differently. In the builder pattern, a separate object (or struct, or whatever) is used to encapsulate the state of the object that we'd like it to be in, which we then pass to the ctor to create the object in that state. The idea is the same, though: set up a bunch of values representing the desired initial state of the object, then, to borrow Perl's terminology, "bless" it into a full-fledged class instance.
 My idea here attempted to solve those issues with create-set-call only.
Fair enough. I guess my approach was from the angle of trying to address the problem within the confines of the current language. So, same idea, different implementation. :)

[...]
The idea was to /use/ the code in the invariant to determine which
member fields should be set during the initialisation statement and
then statically verify that a call was made to some member function
to set them.  The actual values set aren't important, just that some
attempt has been made to set them.  That's about the limit of what I
think you could do statically, in the general case.
[...] This still doesn't address the issue of ctor argument proliferation, though
It wasn't supposed to :) create-set-call ctors have no arguments.
True. But if the ctor call requires a code block that initializes mandatory initial values, then isn't it essentially the same thing as ctors that have arguments? If the class hierarchy is deep, and base classes have mandatory fields to be set, then you still have the same problem, just in a different manifestation.
if each level of the class hierarchy adds 1-2 additional parameters,
you still need to write tons of boilerplate in your derived classes
to percolate those additional parameters up the inheritance tree.
In the create-set-call style additional required 'arguments' would appear as setter member functions whose underlying data member is verified in the invariant and would therefore be enforced by the syntax I detailed.
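
For example, a minimal sketch in the same style as the Foo class quoted above (the Bar class and its School property are hypothetical):

    // The derived class adds a new required 'argument' simply by adding a
    // setter whose underlying member is checked in Bar's invariant, which
    // is and-ed with Foo's.
    class Bar : Foo
    {
      string school;

      invariant
      {
        assert(school !is null);
      }

      @property string School() { return school; }
      @property string School(string s) { return school = s; }
    }
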
What happens when base classes also have required setter member functions that you must call?
Now imagine if at some point you need to change some base class ctor
parameters. Now instead of making a single change to the base class,
you have to update every single derived class to make the same change
to every ctor, so that the new version of the parameter (or new
parameter) is properly percolated up the inheritance tree.
This is one reason why create-set-call might be desirable, no ctor arguments, no problem.
Right.
 So, to take my idea a little further - WRT class inheritance.  The
 compiler, for a derived class, would need to inspect the invariants
 of all classes involved (these are and-ed already), inspect the
 constructors of the derived classes (for calls to initialise
 members), and the initialisation block I described and verify
 statically that an attempt was made to initialise all the members
 which appear in all the invariants.
I see. So basically the user still has to set up all required values before you can use the object, the advantage being that you don't have to manually percolate these values up the inheritance tree in the ctors. It seems to be essentially the same thing as my approach, just implemented differently. :)

In my approach, ctor arguments are encapsulated inside a struct, currently called Args by convention. So if you have, say, a class hierarchy where class B inherits from class A, and A.this() has 5 parameters and B.this() adds another 5 parameters, then B.Args would have 10 fields. To create an instance of B, the user would do this:

    B.Args args;
    args.field1 = 10;
    args.field2 = 20;
    ...
    auto obj = new B(args);

So in a sense, this isn't that much different from your approach, in that the user sets a bunch of values desired for the initial state of the object, then gets a full-fledged object out of it at the end.

In my case, all ctors in the class hierarchy would take a single struct argument encapsulating all ctor arguments for that class (including arguments to its respective base class ctors, etc.). So ctors would look like this:

    class B : A {
        struct Args { ... }
        this(Args args) {
            super(...);
            ... // set up object based on values in args
        }
    }

The trick here, then, is that call to super(...). The naïve way of doing this is to (manually) include base class ctor arguments as part of B.Args, then in B's ctor, we collect those arguments together in A.Args, and hand that over to A's ctor.

But we can do better. Since A.Args is already defined, there's no need to duplicate all those fields in B.Args; we can simply do this:

    class B : A {
        struct Args {
            A.Args baseClassArgs;
            ... // fields specific to B
        }
        this(Args args) {
            super(args.baseClassArgs);
            ...
        }
    }

This is ugly, though, 'cos now user code has to know about B.Args.baseClassArgs:

    B.Args args;
    args.baseClassArgs.baseClassParm1 = 123;
    args.derivedClassParm1 = 234;
    ...
    auto obj = new B(args);

So the next step is to use alias this to make .baseClassArgs transparent to user code:

    class B : A {
        struct Args {
            A.Args baseClassArgs;
            alias baseClassArgs this;   // <--- N.B.
            ... // fields specific to B
        }
        this(Args args) {
            // Nice side-effect of alias this: we can pass
            // args to super without needing to explicitly
            // name .baseClassArgs.
            super(args);
            ...
        }
    }

    // Now user code doesn't need to know about .baseClassArgs:
    B.Args args;
    args.baseClassParm1 = 123;
    args.derivedClassParm1 = 234;
    ...
    auto obj = new B(args);

This is starting to look pretty good. Now the next step is, having to type A.Args baseClassArgs each time is a lot of boilerplate, and could be error-prone. For example, if we accidentally wrote C.Args instead of A.Args:

    class B : A {
        struct Args {
            C.Args baseClassArgs;   // <--- oops!
            alias baseClassArgs this;
            ...
        }
        ...
    }

So the next step is to make the type of baseClassArgs automatically inferred, so that no matter how we move B around in the class hierarchy, it will always be correct:

    class B : A {
        struct Args {
            typeof(super).Args baseClassArgs; // ah, much better!
            alias baseClassArgs this;
            ...
        }
        this(Args args) {
            super(args);
            ...
        }
    }

This is good, because now, the declaration of B.Args is independent of whatever base class B has. Similarly, thanks to the alias this introduced earlier, the call to super(...) is always written super(args), without any explicit reference to the specific base class. DRY is good.

Of course, this is still a lot of boilerplate: you have to keep typing out the first 3 lines of the declaration of Args, in every derived class.
But now that we've made this declaration independent of an explicit base class name, we can factor it into a mixin:

    mixin template CtorArgs(string fields) {
        struct Args {
            typeof(super).Args baseClassArgs;
            alias baseClassArgs this;
            mixin(fields);
        }
    }

    class B : A {
        mixin CtorArgs!(q{
            int derivedParm1;
            int derivedParm2;
            ...
        });
        this(Args args) {
            super(args);
            ...
        }
    }

Now we can simply use CtorArgs!(...) in each derived class to automatically declare the Args struct correctly. The boilerplate is now minimal. Things continue to work even if we move B around in the class hierarchy. Say we want to derive B from C instead of A; then we'd simply write:

    class B : C {   // <-- this is the only line that's different!
        mixin CtorArgs!(q{
            int derivedParm1;
            int derivedParm2;
            ...
        });
        this(Args args) {
            super(args);
            ...
        }
    }

Finally, we add a little detail to our mixin so that we can use it for the root of the class hierarchy as well. Right now, we still have to explicitly declare A.Args (assuming A is the root of our hierarchy), which is bad, because you may accidentally call it something that doesn't match what CtorArgs expects. We'd like to be able to consistently use CtorArgs even in the root base class, so that if we ever need to re-root the hierarchy, things will continue to Just Work. So we revise CtorArgs thus:

    mixin template CtorArgs(string fields) {
        struct Args {
            static if (!is(typeof(super)==Object)) {
                typeof(super).Args baseClassArgs;
                alias baseClassArgs this;
            }
            mixin(fields);
        }
    }

Basically, the static if just omits the whole baseClassArgs and alias this deal ('cos the root of the hierarchy has no superclass that also has an Args struct). So now we can write:

    class A {
        mixin CtorArgs!(q{ /* ctor fields here */ });
        ...
    }

And if we ever re-root the hierarchy, we can simply write:

    class A : B {   // <--- this is the only line that changes
        mixin CtorArgs!(q{ /* ctor fields here */ });
        ...
    }
I think my approach of using builder structs with a parallel
inheritance tree is still better
It may be, it certainly looked quite neat but I haven't had a detailed look at it TBH. I think you've misunderstood my idea however, or rather, the issues it was intended to solve :) Perhaps my idea is too limiting for you? I could certainly understand that point of view.
Well, I think our approaches are essentially the same thing, just implemented differently. :)

One thing about your implementation that I found limiting was that you *have* to declare all required fields on-the-spot before the compiler will let your 'new' call pass, so if you have to create 5 similar instances of the class, you have to copy-n-paste most of the set-method calls:

    auto obj1 = new C() {
        name = "test1",
        age = 12,
        school = "D Burg High School"
    };

    auto obj2 = new C() {
        name = "test2",
        age = 12,
        school = "D Burg High School"
    };

    auto obj3 = new C() {
        name = "test3",
        age = 12,
        school = "D Burg High School"
    };

    auto obj4 = new C() {
        name = "test4",
        age = 12,
        school = "D Burg High School"
    };

    auto obj5 = new C() {
        name = "test5",
        age = 12,
        school = "D Burg High School"
    };

Whereas using my approach, you can simply reuse the Args struct several times:

    C.Args args;
    args.name = "test1";
    args.age = 12;
    args.school = "D Burg High School";
    auto obj1 = new C(args);

    args.name = "test2";
    auto obj2 = new C(args);

    args.name = "test3";
    auto obj3 = new C(args);

    ... // etc.

You can also have different functions set up different parts of C.Args:

    C createObject(C.Args args) {
        // N.B. only need to set a subset of fields
        args.school = "D Burg High School";
        return new C(args);
    }

    void main() {
        C.Args args;
        args.name = "test1";
        args.age = 12;      // partially set up Args
        auto obj = createObject(args);  // createObject fills out rest of the fields
        ...

        args.name = "test2";    // modify a few parameters
        auto obj2 = createObject(args); // createObject doesn't need to know about this change
    }

This is nice if there are a lot of parameters and you don't want to collect the setting up of all of them in one place.
 I think another interesting idea is using the builder pattern with
 create-set-call objects.
 
 For example, a builder template class could inspect the object for
 UDA's indicating a data member which is required during
 initialisation.  It would contain a bool[] to flag each member as
 not/initialised and expose a setMember() method which would call the
 underlying object setMember() and return a reference to itself.
 
 At some point, these setMember() method would want to return another
 template class which contained just a build() member.  I'm not sure
 how/if this is possible in D.
[...]

Hmm, this is an interesting idea indeed. I think it may be possible to implement in the current language. It would solve the problem of mandatory fields, which is currently a main weakness of my approach (the user can neglect to set up a field in Args, and there's no way to enforce that those fields *must* be set -- you could provide sane defaults in the declaration of Args, but if some fields have no sane default value, then you're out of luck).

One approach is to use Nullable for mandatory fields (or equivalently, use bool[] as you suggest), then the ctors will throw an exception if a required field hasn't been set yet. Which isn't a bad solution, since ctors in theory *should* vet their input values before creating an instance of the class anyway. But it does require some amount of boilerplate.

Maybe we can make use of UDAs to indicate which fields are mandatory, then have a template (or mixin template) use compile-time reflection to generate the code that verifies that these fields have indeed been set. Maybe something like:

    struct RequiredAttr {}

    // Warning: have not tried to compile this yet
    mixin template checkCtorArgs(alias args) {
        alias Args = typeof(args);
        foreach (field; __traits(allMembers, Args)) {
            // (Ugh, __traits syntax is so ugly)
            static if (is(__traits(getAttributes,
                    __traits(getMember, args, field)[0]) == RequiredAttr))
            {
                if (__traits(getMember, args, field) is null)
                    throw new Exception("...");
            }
        }
    }

    class B : A {
        mixin CtorArgs!(q{
            int myfield1;                        // this one is optional
            @RequiredAttr Nullable!int myfield2; // this one is mandatory
        });
        this(Args args) {
            mixin checkCtorArgs!(args); // throws if any mandatory fields aren't set
            ...
        }
    }

Just a rough idea, haven't actually tried to compile this code yet. On second thoughts, maybe we could just check for an instantiation of Nullable instead of using a UDA, since if you forget to use a nullable value (like int instead of Nullable!int), this code wouldn't work. Or maybe enhance the CtorArgs template to automatically substitute Nullable!T when it sees a field of type T that's marked with @RequiredAttr. Or maybe your bool[] idea is better, since it avoids the dependency on Nullable.

In any case, this is an interesting direction to look into.

T
--
Тише едешь, дальше будешь.
Jul 17 2013
next sibling parent reply "w0rp" <devw0rp gmail.com> writes:
I always just avoided confusion by limiting myself to a maximum
of 5 arguments for any function or constructor, maybe with a soft
limit of 3. Preferring composition over inheritance helps too.
Jul 17 2013
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jul 17, 2013 at 11:19:21PM +0200, w0rp wrote:
 I always just avoided confusion by limiting myself to a maximum
 of 5 arguments for any function or constructor, maybe with a soft
 limit of 3. Preferring composition over inheritance helps too.
My original motivation for trying to tackle this problem was when I was experimenting with maze generation algorithms. I had a base class representing all maze generators, and various derived classes representing specific algorithms. Some of these algorithms have quite a large number of configurable parameters, and the algorithms themselves have different flavors, so some classes that already have many parameters would have derived classes that introduce a few more. Encapsulating all of these parameters inside structs was the only sane way I could think of to manage the large sets of parameters involved.

Also, I agree that 3-5 parameters per function/ctor is about the max for a clean interface -- any more than that and it's a sign that you aren't organizing your code properly. But in the case of ctors, it's not so much the 3-5 parameters required for the class itself that's the problem, but the fact that these parameters *accumulate* in all derived classes. If you have a 4-level class hierarchy and each level adds 5 more parameters, that's 20 parameters in total, which is clearly unmanageable.

T
--
Designer clothes: how to cover less by paying more.
Jul 17 2013
parent reply "eles" <eles eles.com> writes:
On Wednesday, 17 July 2013 at 21:42:16 UTC, H. S. Teoh wrote:
 On Wed, Jul 17, 2013 at 11:19:21PM +0200, w0rp wrote:
This is how it is done in the Ecere SDK and the eC language:

"However, constructors particularly do not play a role as important as in C++, for example. Neither constructors nor destructors can take in any parameters, and only a single one of each can be defined within a class."

"Instead, members can be directly assigned a value through the instantiation syntax initializers (either through the data members, or the properties which we will describe in the next chapter)."

"They cannot be specified a return type either. A constructor should never fail, but returning false (they have an implicit bool return type) will result in the object instantiated being null."
Jul 17 2013
parent "eles" <eles eles.com> writes:
On Wednesday, 17 July 2013 at 21:59:14 UTC, eles wrote:
 On Wednesday, 17 July 2013 at 21:42:16 UTC, H. S. Teoh wrote:
 On Wed, Jul 17, 2013 at 11:19:21PM +0200, w0rp wrote:
This is how it is done in Ecere SDK and the eC language:
Example:

    import "ecere"

    class Form1 : Window
    {
       text = "Form1";
       background = activeBorder;
       borderStyle = sizable;
       hasMaximize = true;
       hasMinimize = true;
       hasClose = true;
       clientSize = { 400, 300 };
    }

    Form1 form1 {};

Basically, you assign the needed fields first, then call a unique constructor on that skeleton.
Jul 17 2013
prev sibling parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Wed, 17 Jul 2013 18:58:53 +0100, H. S. Teoh <hsteoh quickfur.ath.cx>  
wrote:
 On Wed, Jul 17, 2013 at 11:00:38AM +0100, Regan Heath wrote:
 Emphasis on "create-set-call" :)  The weakness of the create-set-call
 style is the desire for a valid object as soon as an attempt can be
 made to use it.  Which implies the need for some sort of enforcement
 of initialisation and as I mentioned in my first post the issue of
 preventing this initialisation being spread out, or intermingled with
 others and thus making the semantics of it harder to see.
Ah, I see. So basically, you need some kind of enforcement of a two-state object, pre-initialization and post-initialization. Basically, the ctor is empty, so you allocate the object first, then set some values into it, then it "officially" becomes a full-fledged instance of the class. To prevent problems with consistency, a sharp transition between setting values and using the object is enforced. Am I right?
Yes, that's basically it.
 I guess my point was that if we boil this down to the essentials, it's
 basically the same idea as a builder pattern, just implemented slightly
 differently. In the builder pattern, a separate object (or struct, or
 whatever) is used to encapsulate the state of the object that we'd like
 it to be in, which we then pass to the ctor to create the object in that
 state. The idea is the same, though: set up a bunch of values
 representing the desired initial state of the object, then, to borrow
 Perl's terminology, "bless" it into a full-fledged class instance.
It achieves the same ends, but does it differently. My idea requires compiler support (which makes it unlikely to happen) and doesn't require separate objects (which I think is a big plus).
 So, to take my idea a little further - WRT class inheritance.  The
 compiler, for a derived class, would need to inspect the invariants
 of all classes involved (these are and-ed already), inspect the
 constructors of the derived classes (for calls to initialise
 members), and the initialisation block I described and verify
 statically that an attempt was made to initialise all the members
 which appear in all the invariants.
I see. So basically the user still has to set up all required values before you can use the object, the advantage being that you don't have to manually percolate these values up the inheritance tree in the ctors.
Exactly.
 It seems to be essentially the same thing as my approach, just
 implemented differently. :)[...]
Thanks for the description of your idea.

As I understand it, in your approach all the mandatory parameters for all classes in the hierarchy are /always/ passed to the final child constructor. In my idea a constructor in the hierarchy could choose to set some of the mandatory members of its parents, and the compiler would detect that and would not require the initialisation block to contain those members.

Also, in your approach there isn't currently any enforcement that the user sets all the mandatory parameters of Args, and this is kinda the main issue my idea solves.
 One thing about your implementation that I found limiting was that you
 *have* to declare all required fields on-the-spot before the compiler
 will let your 'new' call pass, so if you have to create 5 similar
 instances of the class, you have to copy-n-paste most of the set-method
 calls:

 	auto obj1 = new C() {
 		name = "test1",
 		age = 12,
 		school = "D Burg High School"
  	};

 [...]

 Whereas using my approach, you can simply reuse the Args struct several
 times:

 	C.Args args;
 	args.name = "test1";
 	args.age = 12;
 	args.school = "D Burg High School";
 	auto obj1 = new C(args);

 	args.name = "test2";
 	auto obj2 = new C(args);

 	args.name = "test3";
 	auto obj3 = new C(args);

 	... // etc.
Or.. you use a mixin, or better still you add a copy-constructor or .dup method to your class to duplicate it :)
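
For instance, a minimal sketch of the .dup route (the class layout and names here are mine, not taken from the thread): duplicate a prototype object, then change only the fields that differ between instances.

    class C
    {
        string name;
        int age;
        string school;

        // Shallow copy of this instance.
        C dup()
        {
            auto c = new C;
            c.name   = name;
            c.age    = age;
            c.school = school;
            return c;
        }
    }

    void main()
    {
        auto proto = new C;
        proto.age    = 12;
        proto.school = "D Burg High School";

        auto obj1 = proto.dup(); obj1.name = "test1";
        auto obj2 = proto.dup(); obj2.name = "test2";
        auto obj3 = proto.dup(); obj3.name = "test3";
    }
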
 You can also have different functions setup different parts of C.Args:

 	C createObject(C.Args args) {
 		// N.B. only need to set a subset of fields
 		args.school = "D Burg High School";
 		return new C(args);
 	}

 	void main() {
 		C.Args args;
 		args.name = "test1";
 		args.age = 12;		// partially setup Args
 		auto obj = createObject(args); // createObject fills out rest of the  
 fields.
 		...

 		args.name = "test2";	// modify a few parameters
 		auto obj2 = createObject(args); // createObject doesn't need to know  
 about this change
 	}

 This is nice if there are a lot of parameters and you don't want to
 collect the setting up of all of them in one place.
In my case you can call different functions in the initialisation block, e.g.

    void defineObject(C c)
    {
      c.school = "...";
    }

    C c = new C() {
      defineObject()
    }

:)
 I think another interesting idea is using the builder pattern with
 create-set-call objects.

 For example, a builder template class could inspect the object for
 UDA's indicating a data member which is required during
 initialisation.  It would contain a bool[] to flag each member as
 not/initialised and expose a setMember() method which would call the
 underlying object setMember() and return a reference to itself.

 At some point, these setMember() method would want to return another
 template class which contained just a build() member.  I'm not sure
 how/if this is possible in D.
[...] Hmm, this is an interesting idea indeed. I think it may be possible to implement in the current language.
The issue I think is the step where you want to mutate the return type from the type with setX members to the type with build().
 Maybe we can make use of UDAs to indicate which fields are mandatory
That was what I was thinking.
 [...]
 Just a rough idea, haven't actually tried to compile this code yet.
Worth a go, it doesn't require compiler support like my idea so it's far more likely you'll get something at the end of it.. I can just sit on my hands and/or try to promote my idea.

I still prefer my idea :P. I think it's cleaner and simpler; this is in part because it requires compiler support and that hides the gory details, but also because create-set-call is a simpler style in itself. Provided the weaknesses of create-set-call can be addressed I might be tempted to use that style.

R
--
Using Opera's revolutionary email client: http://www.opera.com/mail/
Jul 18 2013
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jul 18, 2013 at 10:13:58AM +0100, Regan Heath wrote:
 On Wed, 17 Jul 2013 18:58:53 +0100, H. S. Teoh
 <hsteoh quickfur.ath.cx> wrote:
[...]
I guess my point was that if we boil this down to the essentials,
it's basically the same idea as a builder pattern, just implemented
slightly differently. In the builder pattern, a separate object (or
struct, or whatever) is used to encapsulate the state of the object
that we'd like it to be in, which we then pass to the ctor to create
the object in that state. The idea is the same, though: set up a
bunch of values representing the desired initial state of the object,
then, to borrow Perl's terminology, "bless" it into a full-fledged
class instance.
It achieves the same ends, but does it differently. My idea requires compiler support (which makes it unlikely to happen) and doesn't require separate objects (which I think is a big plus).
Why would requiring separate objects be a problem? [...]
 Thanks for the description of your idea.
 
 As I understand it, in your approach all the mandatory parameters
 for all classes in the hierarchy are /always/ passed to the final
 child constructor.  In my idea a constructor in the hierarchy could
 chose to set some of the mandatory members of it's parents, and the
 compiler would detect that and would not require the initialisation
 block to contain those members.
In my case, the derived class ctor could manually set some of the fields in Args before handing to the superclass. Of course, it's not as ideal, since if user code already sets said fields, then they get silently overridden.
 Also, in your approach there isn't currently any enforcement that
 the user sets all the mandatory parameters of Args, and this is
 kinda the main issue my idea solves.
True. One workaround is to use Nullable and check that in the ctor. But I suppose it's not as great as a compile-time check.
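
A minimal sketch of that workaround (the field names are illustrative, not from the thread): mandatory Args fields are Nullable, and the ctor throws if any of them were left unset.

    import std.exception : enforce;
    import std.typecons : Nullable;

    class C
    {
        struct Args
        {
            Nullable!string name;   // mandatory
            Nullable!int age;       // mandatory
            string school;          // optional, the default is fine
        }

        string name;
        int age;
        string school;

        this(Args args)
        {
            // Vet the input before blessing it into a full-fledged object.
            enforce(!args.name.isNull, "Args.name was not set");
            enforce(!args.age.isNull, "Args.age was not set");
            name   = args.name.get;
            age    = args.age.get;
            school = args.school;
        }
    }

    void main()
    {
        C.Args args;
        args.name = "test1";    // Nullable!T accepts plain assignment
        args.age  = 12;
        auto obj = new C(args); // would throw if name or age were missing
    }
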
One thing about your implementation that I found limiting was that
you *have* to declare all required fields on-the-spot before the
compiler will let your 'new' call pass, so if you have to create 5
similar instances of the class, you have to copy-n-paste most of the
set-method calls:

	auto obj1 = new C() {
		name = "test1",
		age = 12,
		school = "D Burg High School"
	};

[...]

Whereas using my approach, you can simply reuse the Args struct
several times:

	C.Args args;
	args.name = "test1";
	args.age = 12;
	args.school = "D Burg High School";
	auto obj1 = new C(args);

	args.name = "test2";
	auto obj2 = new C(args);

	args.name = "test3";
	auto obj3 = new C(args);

	... // etc.
Or.. you use a mixin, or better still you add a copy-constructor or .dup method to your class to duplicate it :)
But then you end up with the problem of needing to call set methods after the .dup, which may complicate things if the set methods need to do non-trivial initialization of internal structures (caches or internal representations, etc.). Whereas if you hadn't needed to .dup, you could have gotten by without writing any set methods for your class, but now you have to. [...]
 In my case you can call different functions in the initialisation
 block, e.g.
 
 void defineObject(C c)
 {
   c.school = "...);
 }
 
 C c = new C() {
   defineObject()
 }
 
 :)
So the compiler has to recursively traverse function calls in the initialization block in order to check that all required fields are set? That could entail some implementation issues, if said function calls can be arbitrarily complex. (If you have complex control logic in said functions, the compiler can't in general determine whether or not some paths will/will not be taken that may contain assignment statements to the object's fields, since that would be equivalent to the halting problem. Worse, the compiler would have to track aliases of the object being set, in order to know which assignment statements are setting fields in the object, and which are just computations on the side.)

Furthermore, what if defineObject tries to do something with C other than setting up fields? The object would be in an illegal state since it hasn't been fully constructed yet.
I think another interesting idea is using the builder pattern with
create-set-call objects.

For example, a builder template class could inspect the object for
UDA's indicating a data member which is required during
initialisation.  It would contain a bool[] to flag each member as
not/initialised and expose a setMember() method which would call the
underlying object setMember() and return a reference to itself.

At some point, these setMember() method would want to return another
template class which contained just a build() member.  I'm not sure
how/if this is possible in D.
[...] Hmm, this is an interesting idea indeed. I think it may be possible to implement in the current language.
The issue I think is the step where you want to mutate the return type from the type with setX members to the type with build().
I'm not sure I understand that sentence. Could you rephrase it?
Maybe we can make use of UDAs to indicate which fields are mandatory
That was what I was thinking.
[...]
Just a rough idea, haven't actually tried to compile this code yet.
Worth a go, it doesn't require compiler support like my idea so it's far more likely you'll get something at the end of it.. I can just sit on my hands and/or try to promote my idea. I still prefer my idea :P. I think it's cleaner and simpler, this is in part because it requires compiler support and that hides the gory details, but also because create-set-call is a simpler style in itself. Provided the weaknesses of create-set-call can be addressed I might be tempted to use that style.
[...]

One thing I like about your idea is that you can reuse the same chunk of memory that the eventual object is going to sit in. With my approach, the ctors still have to copy the struct fields into the object fields, so there is some overhead there. (Having said that though, that overhead shouldn't be anything worse than the ctor-with-arguments calls it replaces; you're basically just abstracting away the ctor parameters on the stack into a struct. In machine code it's pretty much equivalent.)

Requiring compiler support, though, as you said, makes your idea less likely to actually happen. I still see it as essentially equivalent to my approach; the syntax is different and the usage pattern differs, but at the end of the day, it amounts to the same thing: basically your objects have two phases, a post-creation, pre-usage stage where you set things up, and a post-setup stage where you actually start using it.

Anyway, now that I'm thinking about this problem again, I'd like to take a step back and consider if any other good approaches may exist to tackle this issue. I'm thinking of the general case where the initialization of an object may be arbitrarily complex, such that neither a struct of ctor arguments nor an initialization block may be sufficient.

The problem with the struct approach is, what if you need a complex setup process, say constructing a graph with complex interconnections between nodes? In order to express such a thing, you have to essentially already create the object before you can pass the struct to the ctor, which kinda defeats the purpose. Similarly, your approach of an initialization block suffers from the limitation that the initialization is confined to that block, and you can't allow arbitrary code in that block (otherwise you could end up using an object that hasn't been fully constructed yet -- like the defineObject problem I pointed out above).

Keeping in mind the create-set-call pattern and Perl's approach of "blessing" an object into a full-fledged class instance, I wonder if a more radical approach might be to have the language acknowledge that objects have two phases, a preinitialized state, and a fully-initialized state. These two would have distinct types *in the type system*, such that you cannot, for example, call post-init methods on a pre-initialization object, and you can't call an init method on a post-initialization object. The ctor would be the unique transition point which takes a preinitialized object, verifies compliance with class invariants, and returns a post-initialization object. In pseudo-code, this might look something like this:

    class MyClass {
        public:
        preinit void setName(string name);
        preinit void setAge(int age);

        this() {
            if (!validateFields())
                throw new Exception(...);
        }

        // The following are "normal" methods that cannot be
        // called in a preinit state.
        void computeStatistics();
        void dotDotDotMagic();
    }

    void main() {
        auto obj = new MyClass();
        assert(typeof(obj) == MyClass.preinit);
        /* MyClass.preinit is a special type indicating that the
         * object isn't fully initialized yet */

        // Compile error: cannot call non-preinit method on
        // preinit object.
        //obj.computeStatistics();

        obj.setName(...);   // OK
        obj.setAge(...);    // OK

        // Transition object to full-fledged state
        obj.this();         // not sure about this syntax yet
        assert(typeof(obj) == MyClass);
        /* Now obj is a full-fledged member of the class */

        // Compile error: can't call preinit method on
        // non-preinit object
        //obj.setName(...);

        obj.computeStatistics();    // OK
    }

MyClass.preinit would be a separate type in the type system, so that you can pass it around without any risk that someone will try to perform illegal operations on it before it's fully initialized:

    void doSetup(MyClass.preinit obj) {
        obj.setName(...);           // OK
        //obj.computeStatistics();  // compile error
    }

    void main() {
        auto obj = new MyClass();
        doSetup(obj);       // OK
        obj.this();         // "promote" to full-fledged object

        // Illegal: can't implicitly convert MyClass into
        // MyClass.preinit.
        //doSetup(obj);

        obj.computeStatistics();    // OK
    }

Maybe "obj.this()" is not a good syntax, perhaps "obj.promote()"? In any case, this is a rather radical idea which requires language support; I'm not sure how practical it is. :)

T
--
"Uhh, I'm still not here." -- KD, while "away" on ICQ.
Jul 18 2013
parent "Regan Heath" <regan netmail.co.nz> writes:
On Thu, 18 Jul 2013 19:00:44 +0100, H. S. Teoh <hsteoh quickfur.ath.cx>  
wrote:

 On Thu, Jul 18, 2013 at 10:13:58AM +0100, Regan Heath wrote:
 On Wed, 17 Jul 2013 18:58:53 +0100, H. S. Teoh
 <hsteoh quickfur.ath.cx> wrote:
[...]
I guess my point was that if we boil this down to the essentials,
it's basically the same idea as a builder pattern, just implemented
slightly differently. In the builder pattern, a separate object (or
struct, or whatever) is used to encapsulate the state of the object
that we'd like it to be in, which we then pass to the ctor to create
the object in that state. The idea is the same, though: set up a
bunch of values representing the desired initial state of the object,
then, to borrow Perl's terminology, "bless" it into a full-fledged
class instance.
It achieves the same ends, but does it differently. My idea requires compiler support (which makes it unlikely to happen) and doesn't require separate objects (which I think is a big plus).
Why would requiring separate objects be a problem?
It's not a problem, it's just better not to, if at all possible. K.I.S.S. :)
 In my case, the derived class ctor could manually set some of the fields
 in Args before handing to the superclass. Of course, it's not as ideal,
 since if user code already sets said fields, then they get silently
 overridden.
That's the problem I was imagining.
 Also, in your approach there isn't currently any enforcement that
 the user sets all the mandatory parameters of Args, and this is
 kinda the main issue my idea solves.
True. One workaround is to use Nullable and check that in the ctor. But I suppose it's not as great as a compile-time check.
Yeah, I was angling for a static/compile time check, if at all possible.
Whereas using my approach, you can simply reuse the Args struct
several times:

	C.Args args;
	args.name = "test1";
	args.age = 12;
	args.school = "D Burg High School";
	auto obj1 = new C(args);

	args.name = "test2";
	auto obj2 = new C(args);

	args.name = "test3";
	auto obj3 = new C(args);

	... // etc.
Or.. you use a mixin, or better still you add a copy-constructor or .dup method to your class to duplicate it :)
But then you end up with the problem of needing to call set methods after the .dup
Which is no different to setting args.name beforehand - the same number of assignments. In the example above it's N+1 assignments: N for the args or dup'ed members, and 1 more for 'name' before or after the construction.
 which may complicate things if the set methods need to
 do non-trivial initialization of internal structures (caches or internal
 representations, etc.).
Ahh, yes, and in this case you'd want to use the idea below, where you call a method to set the common parts and manually set the differences.
 Whereas if you hadn't needed to .dup, you could
 have gotten by without writing any set methods for your class, but now
 you have to.
create-set-call <- 'set' is kinda an integral part of the whole thing :P
 [...]
 In my case you can call different functions in the initialisation
 block, e.g.

 void defineObject(C c)
 {
   c.school = "...";
 }

 C c = new C() {
   defineObject()
 }

 :)
So the compiler has to recursively traverse function calls in the initialization block in order to check that all required fields are set?
Yes. This was an off the cuff idea, but it /is/ a natural extension of the idea for the compiler to traverse the setters called inside the initialisation block, and ctors in the hierarchy, etc.
 That could entail some implementation issues, if said function
 calls can be arbitrarily complex. (If you have complex control logic in
 said functions, the compiler can't in general determine whether or not
 some paths will/will not be taken that may contain assignment statements to the
 object's fields, since that would be equivalent to the halting problem.
All true. The compiler has a couple of options to (re)solve these issues:

1. It could simply baulk at the complexity and error.
2. It could take the safe route and assume those member assignments it cannot verify are uninitialised, forcing manual init.

In fact, erroring at complexity might make for better code in many ways. You would have to perform your complex initialisation beforehand, store the result in a variable, and then construct/initblock your object. It does limit your choice of style, but create-set-call already does that .. and I'm not immediately against style limitations assuming they actually result in better code.
 Worse, the compiler would have to track aliases of the object being set,
 in order to know which assignment statements are setting fields in the
 object, and which are just computations on the side.)
No, aliasing would simply be ignored. In fact, calling a setter on another object in an initblock should probably be an error. Part of the whole "don't mix initialisation" goal I started with. It does require strict properties.
 Furthermore, what if defineObject tries to do something with C other
 than setting up fields? The object would be in an illegal state since it
 hasn't been fully constructed yet.
That's an error. This is why in my initial post I stated that we'd need explicit/well defined properties. All you would be allowed to call in an initialisation block, on the object being initialised, are setter properties.. and possibly methods or free functions which only call setter properties.
I think another interesting idea is using the builder pattern with
create-set-call objects.

For example, a builder template class could inspect the object for
UDA's indicating a data member which is required during
initialisation.  It would contain a bool[] to flag each member as
not/initialised and expose a setMember() method which would call the
underlying object setMember() and return a reference to itself.

At some point, these setMember() method would want to return another
template class which contained just a build() member.  I'm not sure
how/if this is possible in D.
[...] Hmm, this is an interesting idea indeed. I think it may be possible to implement in the current language.
The issue I think is the step where you want to mutate the return type from the type with setX members to the type with build().
I'm not sure I understand that sentence. Could you rephrase it?
I am imagining using a template to create a type which wraps the original object. The created type would expose setter properties for all the mandatory members, and nothing else. The user would call these setters in UFCS/chain style; however, only after setting all the mandatory properties do we want to expose an additional member called build() which returns the constructed/initialised object. So, an example:

    class Foo {...}

    auto f = Builder!(Foo)().setName("Regan").setAge(33).build();

The type of the object returned from Builder!(Foo) is our first created type, which exposes setName() and setAge(); however the type returned from setAge (or whichever member assignment is done last) is the second created type, which either has all the set.. members plus build() or only build(). The build() method returns a Foo. So, the type of 'f' above is Foo.

The goal here is to make build() statically available when Foo is completely initialised and not before. Of course we could simplify all this by making it available immediately and throwing if some members are uninitialised - but that is a runtime check and I was angling for a compile time one.

If you wanted to enforce a specific init ordering you could even produce a separate type containing only the next member to init, and from each setter return the next type in sequence - like a type state machine :p The template bloat however..
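
A rough sketch of what that could look like, written out by hand for a fixed Foo (all names here are mine, and a real version would presumably generate this via reflection rather than by hand): each setter returns a builder type whose template flags record which mandatory members have been set, and build() only exists once every flag is true.

    class Foo
    {
        string name;
        int age;
    }

    struct FooBuilder(bool hasName = false, bool hasAge = false)
    {
        private string name_;
        private int age_;

        auto setName(string n) { return FooBuilder!(true, hasAge)(n, age_); }
        auto setAge(int a)     { return FooBuilder!(hasName, true)(name_, a); }

        // Compiled in only when both flags are true, so calling build()
        // before setting name and age is a compile-time error.
        static if (hasName && hasAge)
        {
            Foo build()
            {
                auto f = new Foo;
                f.name = name_;
                f.age  = age_;
                return f;
            }
        }
    }

    void main()
    {
        auto f = FooBuilder!()().setName("Regan").setAge(33).build();
        assert(f.name == "Regan" && f.age == 33);
        // FooBuilder!()().setName("Regan").build(); // error: no build() yet
    }

The strict-ordering variant would just return a different single-setter type from each step; either way, every combination of flags is a separate template instantiation, which is where the bloat worry comes from.
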
 The problem with the struct approach is, what if you need a complex
 setup process, say constructing a graph with complex interconnections
 between nodes? In order to express such a thing, you have to essentially
 already create the object before you can pass the struct to the ctor,
 which kinda defeats the purpose. Similarly, your approach of an
 initialization block suffers from the limitation that the initialization
 is confined to that block, and you can't allow arbitrary code in that
 block (otherwise you could end up using an object that hasn't been fully
 constructed yet -- like the defineObject problem I pointed out above).
Yes, neither idea works for all possible use-cases. Yours is naturally broader and less limiting because I was starting from a limited create-set-call style and imposing further limitation on how it can be used.
 Keeping in mind the create-set-call pattern and Perl's approach of
 "blessing" an object into a full-fledged class instance, I wonder if a
 more radical approach might be to have the language acknowledge that
 objects have two phases, a preinitialized state, and a fully-initialized
 state. These two would have distinct types *in the type system*, such
 that you cannot, for example, call post-init methods on a
 pre-initialization object, and you can't call an init method on a
 post-initialization object.
That is essentially the same idea as the builder template solution I talk about above :)
 The ctor would be the unique transition
 point which takes a preinitialized object, verifies compliance with
 class invariants, and returns a post-initialization object.
AKA build() above :)

R
--
Using Opera's revolutionary email client: http://www.opera.com/mail/
Jul 19 2013
prev sibling parent reply "Jérôme M. Berger" <jeberger free.fr> writes:
Regan Heath wrote:
 Or, perhaps another way to ask a similar W is.. can the compiler
 statically verify that a create-set-call style object has been
 initialised, or rather that an attempt has at least been made to
 initialise all the required parts.

Here's a way to do it in Scala:
http://blog.rafaelferreira.net/2008/07/type-safe-builder-pattern-in-scala.html

Basically, the builder object is a generic that has a boolean parameter for each mandatory parameter. Setting a parameter casts the builder object to the same generic with the corresponding boolean set to true. And the "build" method is only available when the type system recognizes that all the booleans are true.

Note however that this will not work if you try to mutate the builder instance. IOW, this will work (assuming you only need to specify foo and bar):
 auto instance = builder().withFoo (1).withBar ("abc").build();
but this won't work:
 auto b = builder();
 b.withFoo (1);
 b.withBar ("abc");
 auto instance = b.build();
Something similar should be doable in D (although I'm a bit afraid of the template bloat it might create…)

Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Jul 16 2013
parent "Regan Heath" <regan netmail.co.nz> writes:
On Tue, 16 Jul 2013 18:54:06 +0100, Jérôme M. Berger <jeberger free.fr>
wrote:

 Regan Heath wrote:
 Or, perhaps another way to ask a similar W is.. can the compiler
 statically verify that a create-set-call style object has been
 initialised, or rather that an attempt has at least been made to
 initialise all the required parts.
 Here's a way to do it in Scala:
 http://blog.rafaelferreira.net/2008/07/type-safe-builder-pattern-in-scala.html

I saw the builder pattern mentioned in the original thread..
 	Basically, the builder object is a generic that has a boolean
 parameter for each mandatory parameter. Setting a parameter casts
 the builder object to the same generic with the corresponding
 boolean set to true. And the "build" method is only available when
 the type system recognizes that all the booleans are true.
But I hadn't realised it could enforce things statically, this is a cool idea.
 	Note however that this will not work if you try to mutate the
 builder instance. IOW, this will work (assuming you only need to
 specify foo and bar):

 auto instance = builder().withFoo (1).withBar ("abc").build();
This looks like good D style, to me, in keeping with the UFCS chains etc.
 but this won't work:

 auto b = builder();
 b.withFoo (1);
 b.withBar ("abc");
 auto instance = b.build();
But, you could create a separate variable for each 'with' call, couldn't you - very inefficient, but possible. I don't think this syntax/style is a requirement, and I prefer the chain style above it.
 	Something similar should be doable in D (although I'm a bit afraid
 of the template bloat it might create…)
Indeed. The issue I have with the builder is the requirement for more classes/templates/etc in addition to the original objects. D could likely define them in the standard library, but as you say there would be template bloat.

R
--
Using Opera's revolutionary email client: http://www.opera.com/mail/
Jul 17 2013