digitalmars.D - Whole source-tree statefull preprocessing, notion of a whole program


Boris-Barboris <ismailsiege gmail.com> writes:

Hello! This is a bit of a long one, I guess, but I'd like to have some 
discussion of the topic. I'll start with a concrete use case:

For the sake of entertainment, I tried to write a generic 
configuration management class. I was inspired by the oslo_config 
Python package that I have to deal with at work. I started with:



class Config
{
     this(string filename) { ... }
     void save() abstract;
     void load() abstract;
     // Some implementation, json for example
}

abstract class ConfigGroup(string group_name_par)
{
     static immutable group_name = group_name_par;
     protected Config CONF;
     this(Config config) { CONF = config; }

     mixin template ConfigField(T, string opt_name)
     {
          // getter: reads the option from the JSON tree
          mixin("@property " ~ T.stringof ~ " " ~ opt_name ~
                "() { return CONF.root[\"" ~ group_name ~ "\"][\"" ~ opt_name ~
                "\"]." ~ json_type(T.stringof) ~ "; }");
          // setter: writes the option into the JSON tree
          mixin("@property " ~ T.stringof ~ " " ~ opt_name ~
                "(" ~ T.stringof ~ " value) { return CONF.root[\"" ~ group_name ~
                "\"][\"" ~ opt_name ~ "\"]." ~ json_type(T.stringof) ~ "(value); }");
     }
}



private class TestConfigGroup: ConfigGroup!("testGroup")
{
	this(Config config) { super(config); }

	mixin ConfigField!(string, "some_string_option");
	mixin ConfigField!(double, "some_double_option");
}


... aand I stopped. And here are the blockers I saw:

1). I had to save template parameter group_name_par into 
group_name. Looks like template mixins don't support closures. A 
minor inconvenience, I would say, and it's not what I would like 
to talk about.
2). After preprocessing I wish to have a fully-typed, safe and fast 
Config class that contains all the groups I defined for it in 
its body, and not as references. I don't want a pointer lookup 
at runtime to get some field. This is actually quite a 
problem for D:
     2.1). Looks like mixin is the only instrument to extend a class 
body. An obvious solution would be a loop that mixes in some 
definitions from some compile-time known array, maybe even a string 
array. And the prettiest way of all would be for such an array to 
contain module names and class names of all ConfigGroup 
derivatives defined in the whole program (absolute madman). Said 
array could be appended to at compile time by every derivative of 
ConfigGroup.
     2.2) The sweet dream of 2.1 is met with the absence of tools to 
create and manipulate state during preprocessing. For example:

     immutable string[] primordial = [];  // maybe some special qualifier
                                          // instead of immutable would be
                                          // better; even better if it
                                          // shifts to immutable at run-time
     premixin template (string toAdd) { primordial ~= toAdd; } // for example
     mixin template Populate
     {
         foreach (s; primordial)
             mixin("int " ~ s ~ ";");    // create some int field
     }
     class Populated { mixin Populate; }
     // another module
     premixin("field1")  // evaluated in preprocessor in
     premixin("field2")  // order of definition
     Populated p = new Populated;
     p.field1 = 3;

         By "premixin" I mean that all such operations are 
performed in a special preprocessor stage that is completed 
before the mixins we already have start to do their jobs.

     2.3) There is strong C ancestry in D. The part regarding 
compilation being performed on translation units (.d source 
files) is, in my opinion, quite devastating. I don't know about 
you guys, but in 2017 I compile programs. I don't care about 
individual object files and linker shenanigans; for me it's the 
whole program that matters, and object files are just the way C 
does its thing. You definitely must respect it while interfacing 
with it, but that's about it. Correct me if I'm wrong, but 
departure from C's compile process (CLI is not the cause here I 

wonderful concept - a class can be extended only voluntarily 
(like in D, where we need to willingly write a mixin to change the 
definition), and source code changes stay localized: when project 
functionality is extended, the old code base can sometimes remain 
completely untouched (this is huge for very big projects IMO). I 
will not deny, however, that readability of such code suffers. As 
a counter-argument, relationships between a portion of the class 
and other code are usually local, in the sense that this class part's 
fields are used by source code in this folder and basically 
nowhere else.
         But what's done is done, I understand. However, I 
believe there is still hope for the preprocessor: it can be 
generalized to the whole source tree without throwing the old toolchain 
out of the window, in a way that would allow the "primordial" 
string array from the example above to be the same for all 
translation units after preprocessing is done.
     2.4) The original configuration management example would also 
require the ability to import definitions cyclically. Module A, 
containing the ConfigGroupConcrete instantiation, imports module B 
where Config is defined, which will require B to import A in order 
to access the ConfigGroupConcrete definition. Yet another stone in 
C's garden, yes. You could, for example, pass the whole 
ConfigGroupConcrete body as a string and mix it in there, but then 
you would need to build such a string automatically, and at this 
point you're better off with some kind of templating language. 
And templating languages make things even less readable IMO, while 
simply being crutches that replace language preprocessors which 
don't follow industry needs. I do believe such a case is out of 
reach until preprocessing is done on the whole program as a unit.


To conclude, I'll summarize my questions:
1). Is there a compiled language that is capable of the 
above-mentioned tricks, without resorting to external templating 
meta-languages?
2). How deep does the rabbit hole go in terms of complexity of the 
preprocessor modifications required? And for DMD in general?
3). How are cyclic module imports handled currently in D?
4). Is there hope that it's possible to do in, say, a year? I 
don't mind trying to implement it myself, but I don't want to 
invest time in a thing that is so conceptually out of plane that 
it will simply be too destructive for the current compiler environment.

Apr 08 2017

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Saturday, 8 April 2017 at 10:11:11 UTC, Boris-Barboris wrote:
 1). I had to save template parameter group_name_par into 
 group_name. Looks like template mixin doesn't support closures. 
 A minor inconvenience, I would say, and it's not what I would 
 like to talk about.

Template mixins' scope is the expansion scope, not the 
declaration scope. (This is useful in some situations, but 
recently I have been mostly avoiding template mixins.)
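
A minimal sketch of that lookup rule, using hypothetical names (Greeter, name) that are not from the original code. The mixin body refers to a symbol it never declares, and each expansion site supplies its own:

```d
// Hypothetical example: `Greeter` and `name` are illustrative only.
mixin template Greeter()
{
    // `name` is not declared in the template; it is resolved in the
    // scope where the mixin is expanded.
    string greet() { return "hello, " ~ name; }
}

struct A
{
    string name = "A";
    mixin Greeter;   // greet() sees A.name
}

struct B
{
    string name = "B";
    mixin Greeter;   // greet() sees B.name
}

void main()
{
    assert(A().greet() == "hello, A");
    assert(B().greet() == "hello, B");
}
```

This is why the original ConfigGroup had to copy group_name_par into a member: the mixed-in code only sees what is visible at the expansion site.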

 2). After preprocessing I wish to have fully-typed, safe and 
 fast Config class, that contains all the groups I defined for 
 it in its body, and not as references. I don't want pointer 
 lookup during runtime to get some field.

Looks like your current implementation does not go in that 
direction, seeing as it uses properties for field access.

For such tasks, I would suggest to split the representations 
(native data, DOM tree, JSON strings etc.) from the 
transformations (serialization and parsing/emitting JSON). E.g. 
your example could be represented as:

struct Config
{
     struct TestGroup
     {
         string some_string_option;
         double some_double_option;
     }
     TestGroup testGroup;
}

Then, separate code for serializing/deserializing this to/from a 
DOM or directly to/from JSON.
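
A rough sketch of what such separate conversion code could look like, assuming std.json as the DOM and supporting only string and double fields. The helper names toDom and fromDom are hypothetical, not taken from any existing project:

```d
import std.json : JSONValue;
import std.traits : FieldNameTuple;

struct Config
{
    struct TestGroup
    {
        string some_string_option;
        double some_double_option;
    }
    TestGroup testGroup;
}

// Hypothetical helper: walks the struct's fields at compile time and
// builds a JSON object; non-struct values are stored directly.
JSONValue toDom(T)(T value)
{
    static if (is(T == struct))
    {
        JSONValue[string] obj;
        foreach (name; FieldNameTuple!T)
            obj[name] = toDom(__traits(getMember, value, name));
        return JSONValue(obj);
    }
    else
        return JSONValue(value);
}

// Hypothetical helper: the reverse direction, limited to the field
// types used in this example.
T fromDom(T)(JSONValue json)
{
    static if (is(T == struct))
    {
        T result;
        foreach (name; FieldNameTuple!T)
            __traits(getMember, result, name) =
                fromDom!(typeof(__traits(getMember, result, name)))(json[name]);
        return result;
    }
    else static if (is(T == string))
        return json.str;
    else static if (is(T == double))
        return json.floating;
    else
        static assert(0, "unsupported field type " ~ T.stringof);
}

void main()
{
    Config c;
    c.testGroup.some_string_option = "abc";
    c.testGroup.some_double_option = 2.5;

    // Round trip through the DOM; field access itself stays a plain member read.
    assert(fromDom!Config(toDom(c)) == c);
}
```

The Config struct knows nothing about JSON here; the field names are discovered by introspection, so adding an option only means adding a field.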

Individual components' configuration can be delegated to their 
components; their modules could contain public struct definitions 
that you can add to the global Config struct, which describes the 
configuration of the entire application. I've used this pattern 
successfully in some projects, incl. Digger: 
https://github.com/CyberShadow/Digger/blob/master/config.d#L31-L36

     2.2) Sweet dream of 2.1 is met with absence of tools to 
 create and manipulate state during preprocessing. For example:

I understand that you seem to be looking for a way to change 
types (definitions in general) inside modules you import. This is 
problematic from several aspects, such as other modules depending 
on that module may find that the definitions "change under their 
feet". In D, once a type is declared and its final curly brace is 
closed, you will know that its definition will remain the same 
from anywhere in the program.

D's answer to partial classes is UFCS, however this does not 
allow "adding" fields, only methods.
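
For illustration, a small sketch with a hypothetical Config struct and a free function endpoint: the function is declared outside the type, yet callers use member syntax.

```d
import std.conv : to;

// Hypothetical type; the field names are illustrative only.
struct Config
{
    string host = "localhost";
    ushort port = 8080;
}

// A free function declared outside the struct, possibly in another module.
string endpoint(Config c)
{
    return c.host ~ ":" ~ c.port.to!string;
}

void main()
{
    Config c;
    // UFCS rewrites c.endpoint() to endpoint(c); no new state is added to Config.
    assert(c.endpoint() == "localhost:8080");
}
```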

     2.4) Original configuration management example would also 
 require the ability to import definitions cyclically. Module A 
 containing ConfigGroupConcrete instantiation imports module B 
 where Config is defined, which will require B to import A in 
 order to access ConfigGroupConcrete definition.

I don't really understand what you mean here, but D does allow 
cyclic module imports. It is only forbidden when more than one 
module inside any cycle has static constructors, because then it 
is not possible to determine the correct initialization order.

 To conclude, I'll summarize my questions:
 1). Is there a compiled language that is capable of the 
 above-mentioned tricks, without resorting to external templating 
 meta-languages?

I don't know of any. For D, I suggest trying different approaches 
/ paradigms.

 2). How deep the rabbit hole goes in terms of complexity of 
 preprocessor modifications required? And for DMD in general?

So far, I had not heard of any D project that requires 
preprocessing of D code. I think D's metaprogramming has enough 
solutions to choose from for the vast majority of conceivable 
situations where other languages would call for a preprocessor.

 4). Is there hope that it's possible to do in, say, a year? I 
 don't mind trying to implement it myself, but I don't want to 
 invest time in thing that is so conceptually out of plane that 
 will simply be too destructive for current compiler environment.

I suggest that you examine how established D projects deal with 
similar situations.

Apr 08 2017

Boris-Barboris <ismailsiege gmail.com> writes:

On Saturday, 8 April 2017 at 13:09:59 UTC, Vladimir Panteleev 
wrote:
 On Saturday, 8 April 2017 at 10:11:11 UTC, Boris-Barboris wrote:

 2). After preprocessing I wish to have fully-typed, safe and 
 fast Config class, that contains all the groups I defined for 
 it in its body, and not as references. I don't want pointer 
 lookup during runtime to get some field.

 Looks like your current implementation does not go in that 
 direction, seeing as it uses properties for field access.

Am I mistaken in assuming that such a simple getter property will 
be optimized to direct field access? Anyway, that's a minor detail.

 For such tasks, I would suggest to split the representations 
 (native data, DOM tree, JSON strings etc.) from the 
 transformations (serialization and parsing/emitting JSON). E.g. 
 your example could be represented as:

 struct Config
 {
     struct TestGroup
     {
         string some_string_option;
         double some_double_option;
     }
     TestGroup testGroup;
 }

 Then, separate code for serializing/deserializing this to/from 
 a DOM or directly to/from JSON.

 Individual components' configuration can be delegated to their 
 components; their modules could contain public struct 
 definitions that you can add to the global Config struct, which 
 describes the configuration of the entire application. I've 
 used this pattern successfully in some projects, incl. Digger: 
 https://github.com/CyberShadow/Digger/blob/master/config.d#L31-L36

Ok, that's nice, but it still requires manual inclusion of such a 
field into the global config struct. Some "compile-time callback" 
system would still scale better in my opinion.

     2.2) Sweet dream of 2.1 is met with absence of tools to 
 create and manipulate state during preprocessing. For example:

 I understand that you seem to be looking for a way to change 
 types (definitions in general) inside modules you import. This 
 is problematic from several aspects, such as other modules 
 depending on that module may find that the definitions "change 
 under their feet".

That is expected, since a class that allows itself to be modified at 
compile time always does so explicitly via a mixin. Most of the 
time such manipulation is used to extend functionality (add a 
field, plugin, or method) without removing or modifying existing 
ones. And if the names conflict, we get a nice compile-time error 
anyway.

 In D, once a type is declared and its final curly brace is 
 closed, you will know that its definition will remain the same 
 from anywhere in the program.

That's kinda my point - the definition needs to stay the same because 
it's built by the compiler as many times as there are translation 
units, because evil old C grandpa.

 D's answer to partial classes is UFCS, however this does not 
 allow "adding" fields, only methods.

Adding fields, or, generally, objects or collections of objects, 
is the main use case. Adding methods is, in my experience, a rare 
scenario.

     2.4) Original configuration management example would also 
 require the ability to import definitions cyclically. Module A 
 containing ConfigGroupConcrete instantiation imports module B 
 where Config is defined, which will require B to import A in 
 order to access ConfigGroupConcrete definition.

 I don't really understand what you mean here, but D does allow 
 cyclic module imports. It is only forbidden when more than one 
 module inside any cycle has static constructors, because then 
 it is not possible to determine the correct initialization 
 order.

Exactly what I wanted to know, thank you.

 To conclude, I'll summarize my questions:
 1). Is there a compiled language that is capable of the 
 above-mentioned tricks, without resorting to external templating 
 meta-languages?

 I don't know of any. For D, I suggest trying different 
 approaches / paradigms.

 2). How deep the rabbit hole goes in terms of complexity of 
 preprocessor modifications required? And for DMD in general?

 So far, I had not heard of any D project that requires 
 preprocessing of D code. I think D's metaprogramming has enough 
 solutions to choose from for the vast majority of conceivable 
 situations where other languages would call for a preprocessor.

 4). Is there hope that it's possible to do in, say, a year? I 
 don't mind trying to implement it myself, but I don't want to 
 invest time in thing that is so conceptually out of plane that 
 will simply be too destructive for current compiler 
 environment.

 I suggest that you examine how established D projects deal with 
 similar situations.

Thank you.

Apr 08 2017

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Saturday, 8 April 2017 at 14:20:49 UTC, Boris-Barboris wrote:
 Looks like your current implementation does not go in that 
 direction, seeing as it uses properties for field access.

 Am i mistaken in assumption that such simple getter property 
 will be optimized to direct field access? Anyways, that's minor 
 detail.

I don't know the type of CONF.root, but from the usage syntax in 
your example, it looks like an associative array. Associative 
array lookup will be slower than simply accessing a variable.

 Individual components' configuration can be delegated to their 
 components; their modules could contain public struct 
 definitions that you can add to the global Config struct, 
 which describes the configuration of the entire application. 
 I've used this pattern successfully in some projects, incl. 
 Digger: 
 https://github.com/CyberShadow/Digger/blob/master/config.d#L31-L36

 Ok, that's nice, but it still requires manual inclusion of such 
 field into global config struct.

Yes; in my opinion, that's desirable because it is 
aligned with the unidirectional flow of information from 
higher-level components to lower-level ones, and does not impose 
a particular configuration framework onto the lower-level 
components (they only need to declare their configuration in 
terms of a POD type).

 Some "compile-time callback" system still would scale better in 
 my opinion.

A similar effect can be achieved by allowing components to 
register themselves in a static constructor (not at compile-time, 
but at program start-up).
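
A minimal sketch of such start-up registration, assuming a hypothetical module-level registry named registeredGroups. Both constructors are shown in one file for brevity; in a real program each static this() would live in its own component's module:

```d
// Hypothetical registry; module-level variables are thread-local by default.
string[] registeredGroups;

// Would normally sit in the "network" component's module.
static this()
{
    registeredGroups ~= "network";
}

// Would normally sit in the "logging" component's module.
static this()
{
    registeredGroups ~= "logging";
}

void main()
{
    // Static constructors have already run by the time main starts;
    // within one module they run in lexical order.
    assert(registeredGroups == ["network", "logging"]);
}
```

The list is only known at run time, so it can drive loading and saving but cannot add fields to a type.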

 I understand that you seem to be looking for a way to change 
 types (definitions in general) inside modules you import. This 
 is problematic from several aspects, such as other modules 
 depending on that module may find that the definitions "change 
 under their feet".

 As expected since class that allows itself to be modified in 
 compile-time, always does so explicitly via mixin. Most of the 
 times such manipulation is used to extend functionality (add 
 field, plugin, method) without removing or modifying existing 
 ones. And if the names conflict, we get nice compile-time error 
 anyways.

Then you have problems such as the instance size of a class 
changing depending on whether the code that requires the instance 
size is seen by the compiler before the code that modifies the 
instance size. I think it would cause complicated design problems 
that limit the scalability of the language. Even without such 
features, DMD had to go through a number of bugs to iron out the 
correct semantics of evaluating types (e.g. with 
"typeof(this).sizeof" inside a struct declaration, or recursive 
struct template instantiations).

 In D, once a type is declared and its final curly brace is 
 closed, you will know that its definition will remain the same 
 from anywhere in the program.

 That's kinda my point - definition needs to stay the same 
 because it's built by compiler as many times as there are 
 translation units, because evil old C grandpa.

I think this is not about technical limitations, but intentional 
design choices. Allowing types to be modified post-declaration 
invalidates many contracts and assumptions that code may have, 
and makes it harder to reason about the program as a whole. 
Compare with e.g. INTERCAL's COMEFROM instruction.

 D's answer to partial classes is UFCS, however this does not 
 allow "adding" fields, only methods.

 Adding fields, or, generally, objects \ collections of objects, 
 is the main use case. Adding methods in my experience is rare 
 scenario.

UFCS is widely used in D for component programming:

http://www.drdobbs.com/architecture-and-design/component-programming-in-d/240008321

Apr 08 2017

Boris-Barboris <ismailsiege gmail.com> writes:

On Saturday, 8 April 2017 at 17:57:11 UTC, Vladimir Panteleev 
wrote:

 Yes; in my opinion, I think that's desirable because it is 
 aligned with the unidirectional flow of information from 
 higher-level components to lower-level ones, and does not 
 impose a particular configuration framework onto the 
 lower-level components (they only need to declare their 
 configuration in terms of a POD type).

...
 A similar effect can be achieved by allowing components to 
 register themselves in a static constructor (not at 
 compile-time, but at program start-up).

   That is definitely possible and, I would say, trivial, and this 
is the most popular way. However, any run-time registration 
implies run-time collections to iterate over, with obvious 
performance drawbacks (minor ones in this case). We are not using 
the information we have a priori, at compile time, and make the CPU 
pay for it instead (either because we are too lazy (too busy) to 
update the sources of higher-level components (while making them a 
mess), or just because our language lacks expressiveness, which is 
my point).
   I side with another set of virtues. Source code consists of 
files. Files contain related data, concepts, functionality, 
whatever. Relations between those entities do not have to be 
unidirectional. What direction can you impose on the concept "For 
every configurable entity, be that a package, module, or class, I 
need to have fields in the global configuration singleton"?
   IMO a program has good architecture when, during extensive 
development, it takes an arbitrary group of programmers little 
time and effort to make useful changes. It is achieved by 
extensive problem field research, use of abstraction to fight 
complexity, yada yada... Part of it is to make sure that you 
can extend functionality easily. Adding a new subclass in one file 
and registering it in two others is not hard. But there is no 
fundamental reason for it not to be easier: just add the subclass and 
slap some fancy attribute on it, or add some preprocessor-related 
field or function in its body.
   One-directional flow is a consequence, not a principle. We are 
used to it because languages were made this way. When a 
high-level concept or idea willingly implies feedback from its 
users, there is little reason to forbid it. Especially when it 
actually improves development iteration times, lowers the risk of 
merge conflicts etc.

   Look at this mess:
https://github.com/Boris-Barboris/AtmosphereAutopilot/blob/master/AtmosphereAutopilot/GUI/AutoGui.cs#L190

time ago, that defines an attribute to mark class fields with in 
order to draw them in a pretty little debug window. Why do I have 
to do this? I've got all the information right in the source. All 
classes that will be drawn using this GUI module are there, in 
text, accessible to the build system. Why can't I just write clean 
code that doesn't involve double or triple associative array 
dispatch on a runtime-reflected list of subclasses and 
attribute-marked fields? The answer is simple - the language lacks 
expressiveness. I provide drawing functionality in my module. It 
is generic, it is virtuous, it is concentrated in one file, it 
speeds up development. However, it needs to see its client, the 
beneficiary, in order to draw it. And it just doesn't. Because 

throwing away information you already have and making CPU 
reconstruct it again during runtime, over and over.
   Yes, all things I describe can be done efficiently by writing a 
lot of boilerplate code or using some text-templating magic. I 
just don't see why languages can't have that functionality 
built-in.

 Then you have problems such as the instance size of a class 
 changing depending on whether the code that requires the 
 instance size is seen by the compiler before the code that 
 modifies the instance size. I think it would cause complicated 
 design problems that limit the scalability of the language. 
 Even without such features, DMD had to go through a number of 
 bugs to iron out the correct semantics of evaluating types 
 (e.g. with "typeof(this).sizeof" inside a struct declaration, 
 or recursive struct template instantiations).

I agree. I think such mechanisms must be applied very early, 
hence "premixin".

 I think this is not about technical limitations, but 
 intentional design choices. Allowing types to be modified 
 post-declaration invalidates many contracts and assumptions 
 that code may have, and makes it harder to reason about the 
 program as a whole. Compare with e.g. INTERCAL's COMEFROM 
 instruction.

   I still don't see the problem. The declaration will contain 
constructs that indicate that it will be changed, like, for 
example, a "premixin" that iterates over an array and adds fields. I 
have no doubt a human can read this all right.
   Indeed, the question of ordering is important.
   Well, we have goto, and it's not like the sky fell on us. 
COMEFROM breaks the logical flow of time. I only want two-stage 
preprocessing, where in the first stage I can create and manipulate 
some simple state visible to the preprocessor, even if it consists 
of only immutable base types and arrays of those, and then 
use that state as immutable variables in the later stages we 
already have. We can already populate a class with fields from an 
immutable string array. All I want is the ability to populate 
this array using preprocessor directives from across the whole 
compiled program, before all the other complex stuff starts. I think 
that would be beautiful.
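
The part that already works, populating a class from a compile-time string array, could be sketched as below. The names fieldNames and declareInts are hypothetical. The array has to be fully known in one place, which is exactly the limitation being discussed:

```d
// Hypothetical compile-time list of field names.
enum string[] fieldNames = ["field1", "field2"];

// Ordinary function, evaluated at compile time (CTFE) when used inside
// mixin(); it returns the field declarations as source text.
string declareInts(const string[] names)
{
    string code;
    foreach (n; names)
        code ~= "int " ~ n ~ ";\n";
    return code;
}

class Populated
{
    // Expands to: int field1; int field2;
    mixin(declareInts(fieldNames));
}

void main()
{
    auto p = new Populated;
    p.field1 = 3;
    assert(p.field1 == 3);
    assert(p.field2 == 0);   // int fields default to 0
}
```

The fields are real members, so access involves no lookup at run time. What is missing is a way for other modules to append to fieldNames before Populated is compiled.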

 UFCS is widely used in D for component programming:

 http://www.drdobbs.com/architecture-and-design/component-programming-in-d/240008321

I'm not stating the opposite, just sharing what I encountered at 
work or while programming for fun - I mostly needed to add 
fields. People may feel otherwise, but I don't see this concept 
harming them in any way.

Apr 08 2017
