
digitalmars.D - Whole source-tree stateful preprocessing, notion of a whole program

Boris-Barboris <ismailsiege gmail.com> writes:
Hello! It's a bit of a long one, I guess, but I'd like to have some
discussion of the topic. I'll start with a concrete use case:

For the sake of entertainment, I tried to write a generic
configuration management class. I was inspired by the oslo_config
Python package that I have to deal with at work. I started with:



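import std.json : JSONValue;

abstract class Config
{
    string filename;
    JSONValue root;  // in-memory configuration tree

    this(string filename) { this.filename = filename; }
    abstract void save();
    abstract void load();
    // Some implementation, json for example
}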

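// Assumed implementation of the json_type helper (its body is not shown
// in the post): maps a D type name to the std.json JSONValue property
// that reads or writes a value of that type. Evaluated at compile time.
string json_type(string type_name)
{
    switch (type_name)
    {
        case "string": return "str";
        case "double": return "floating";
        case "long":   return "integer";
        default: assert(0, "unsupported option type: " ~ type_name);
    }
}

abstract class ConfigGroup(string group_name_par)
{
    static immutable group_name = group_name_par;
    protected Config CONF;
    this(Config config) { CONF = config; }

    // Generates a getter and a setter property for the option stored
    // at CONF.root[group_name][opt_name].
    mixin template ConfigField(T, string opt_name)
    {
        mixin("@property " ~ T.stringof ~ " " ~ opt_name ~
              "() { return CONF.root[\"" ~ group_name ~ "\"][\"" ~
              opt_name ~ "\"]." ~ json_type(T.stringof) ~ "; }");
        mixin("@property " ~ T.stringof ~ " " ~ opt_name ~
              "(" ~ T.stringof ~ " value) { return CONF.root[\"" ~
              group_name ~ "\"][\"" ~ opt_name ~ "\"]." ~
              json_type(T.stringof) ~ "(value); }");
    }
}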



private class TestConfigGroup: ConfigGroup!("testGroup")
{
	this(Config config) { super(config); }

	mixin ConfigField!(string, "some_string_option");
	mixin ConfigField!(double, "some_double_option");
}


... and I stopped. Here are the blockers I saw:

1). I had to save the template parameter group_name_par into
group_name. It looks like template mixins don't support closures.
A minor inconvenience, I would say, and it's not what I would like
to talk about.
2). After preprocessing I wish to have a fully typed, safe and fast
Config class that contains all the groups I defined for it in
its body, not as references. I don't want a pointer lookup at
run time to get some field. This is actually quite a problem
for D:
     2.1) It looks like mixin is the only instrument for extending a
class body. The obvious solution would be a loop that mixes in
definitions from a compile-time known array, maybe even a string
array. And the prettiest way of all: such an array would contain
the module and class names of all ConfigGroup derivatives defined
in the whole program (absolute madman). Every derivative of
ConfigGroup could append itself to that array at compile time.
     2.2) The sweet dream of 2.1 runs into the absence of tools to
create and manipulate state during preprocessing. For example:

     immutable string[] primordial = [];  // maybe some special qualifier
                                          // instead of immutable would be
                                          // better; even better if it
                                          // became immutable at run time
     premixin template (string toAdd) { primordial ~= toAdd; } // for example
     mixin template Populate()
     {
         static foreach (s; primordial)
             mixin("int " ~ s ~ ";");    // create some int field
     }
     class Populated { mixin Populate; }
     // another module
     premixin("field1");  // evaluated by the preprocessor in
     premixin("field2");  // order of definition
     Populated p = new Populated;
     p.field1 = 3;

         By "premixin" I mean that all such operations are
performed in a special preprocessing stage that completes
before the mixins we already have start doing their jobs.

     2.3) There is a strong C ancestry in D. The part of it where
compilation is performed on translation units (.d source
files) is, in my opinion, quite devastating. I don't know about
you guys, but in 2017 I compile programs. I don't care about
individual object files and linker shenanigans; for me it's the
whole program that matters, and object files are just the way C
does its thing. You definitely must respect that while interfacing
with C, but that's about it. Correct me if I'm wrong, but
departure from C's compile process (CLI is not the cause here I

wonderful concept - a class can only be extended voluntarily
(as in D, where we must deliberately write a mixin to change a
definition), and source code changes stay localized: when project
functionality is extended, the old code base can sometimes remain
completely untouched (this is huge for very big projects IMO). I
will not deny, however, that the readability of such code suffers.
As a counter-argument, the relationships between a portion of the
class and other code are usually local, in the sense that this
class part's fields are used by source code in this folder and
basically nowhere else.
         But what's done is done, I understand. However, I
believe the preprocessor still has hope, and it can be
generalized to the whole source tree without throwing the old
toolchain out of the window, in a way that would allow the
"primordial" string array from the example above to be the same
for all translation units after preprocessing is done.
     2.4) The original configuration management example would also
require the ability to import definitions cyclically. Module A
containing the ConfigGroupConcrete instantiation imports module B
where Config is defined, which will require B to import A in order
to access the ConfigGroupConcrete definition. Yet another stone in
C's garden, yes. You could, for example, pass the whole
ConfigGroupConcrete body as a string and mix it in there, but then
you would need to build such a string automatically, and at that
point you're better off with some kind of templating language.
And templating languages make things even less readable IMO, while
simply being crutches that replace language preprocessors that
don't follow industry needs. I do believe such a case is out of
reach until preprocessing is done on the whole program at once.
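The loop half of 2.1 is already expressible in D: given a compile-time string array, `static foreach` plus a string mixin adds one field per entry, much like the `Populate` template in the 2.2 example. A minimal sketch, assuming a compiler with `static foreach` (DMD 2.076 or later, i.e. newer than this thread). `fieldNames` is a hypothetical, hand-maintained stand-in for the program-wide "primordial" array; filling that array automatically from every module is exactly the part D does not offer.

```d
import std.stdio : writeln;

// Hypothetical, hand-maintained list of field names. The post asks for this
// array to be filled in automatically from every module; D cannot do that.
enum string[] fieldNames = ["field1", "field2"];

class Populated
{
    // Expands at compile time to: int field1; int field2;
    static foreach (name; fieldNames)
        mixin("int " ~ name ~ ";");
}

void main()
{
    auto p = new Populated;
    p.field1 = 3;
    writeln(p.field1); // 3
}
```

Adding a field here still means editing `fieldNames` by hand; the class body itself never changes.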


To conclude, I'll summarize my questions:
1). Is there a compiled language that is capable of the
aforementioned tricks without resorting to external templating
meta-languages?
2). How deep does the rabbit hole go in terms of the complexity of
the required preprocessor modifications? And for DMD in general?
3). How are cyclic module imports currently handled in D?
4). Is there hope that it's possible to do in, say, a year? I
don't mind trying to implement it myself, but I don't want to
invest time in something so conceptually out of place that it
will simply be too destructive for the current compiler environment.
Apr 08 2017
Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Saturday, 8 April 2017 at 10:11:11 UTC, Boris-Barboris wrote:
> 1). I had to save the template parameter group_name_par into
> group_name. It looks like template mixins don't support closures.
> A minor inconvenience, I would say, and it's not what I would
> like to talk about.
Template mixins' scope is the expansion scope, not the declaration scope. (This is useful in some situations, but recently I have been mostly avoiding template mixins.)
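A minimal sketch of that scoping rule, with hypothetical names: the identifier `label` in the mixin body below is not declared in the template at all and is resolved separately in each struct that mixes it in. Presumably this is also why `group_name_par`, a template parameter of `ConfigGroup`, was not visible inside `ConfigField` once it was mixed into `TestConfigGroup`, while the inherited static member `group_name` was.

```d
import std.stdio : writeln;

// `label` is not declared here: it is looked up in whatever scope
// the template is mixed into.
mixin template Describe()
{
    string describe() { return "I am " ~ label; }
}

struct Apple
{
    string label = "an apple";
    mixin Describe;
}

struct Banana
{
    string label = "a banana";
    mixin Describe;
}

void main()
{
    writeln(Apple().describe());  // I am an apple
    writeln(Banana().describe()); // I am a banana
}
```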
> 2). After preprocessing I wish to have a fully typed, safe and
> fast Config class that contains all the groups I defined for
> it in its body, not as references. I don't want a pointer
> lookup at run time to get some field.
Looks like your current implementation does not go in that direction, seeing as it uses properties for field access.

For such tasks, I would suggest splitting the representations (native data, DOM tree, JSON strings etc.) from the transformations (serialization and parsing/emitting JSON). E.g. your example could be represented as:

struct Config
{
    struct TestGroup
    {
        string some_string_option;
        double some_double_option;
    }
    TestGroup testGroup;
}

Then, separate code for serializing/deserializing this to/from a DOM or directly to/from JSON.

Individual components' configuration can be delegated to their components; their modules could contain public struct definitions that you can add to the global Config struct, which describes the configuration of the entire application. I've used this pattern successfully in some projects, incl. Digger: https://github.com/CyberShadow/Digger/blob/master/config.d#L31-L36
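A minimal sketch of that split, assuming only Phobos' std.json: the `Config` struct stays plain data, and one generic pair of functions converts any such struct to and from a `JSONValue` tree using compile-time reflection over `tupleof`. `serialize` and `deserialize` are hypothetical names, not an existing library API, and only `string` and `double` leaves are handled.

```d
import std.json : JSONValue, parseJSON;
import std.stdio : writeln;

struct Config
{
    struct TestGroup
    {
        string some_string_option;
        double some_double_option;
    }
    TestGroup testGroup;
}

// Hypothetical generic serializer: each struct becomes a JSON object keyed
// by field name; every other value becomes a JSON scalar.
JSONValue serialize(T)(T value)
{
    static if (is(T == struct))
    {
        JSONValue obj = parseJSON("{}");
        foreach (i, field; value.tupleof)
            obj[__traits(identifier, T.tupleof[i])] = serialize(field);
        return obj;
    }
    else
        return JSONValue(value);
}

// Hypothetical inverse: reads every field back by name.
T deserialize(T)(JSONValue json)
{
    T result;
    static if (is(T == struct))
    {
        foreach (i, ref field; result.tupleof)
            field = deserialize!(typeof(field))(json[__traits(identifier, T.tupleof[i])]);
    }
    else static if (is(T == string))
        result = json.str;
    else static if (is(T == double))
        result = json.floating;
    else
        static assert(0, "unsupported option type: " ~ T.stringof);
    return result;
}

void main()
{
    Config cfg;
    cfg.testGroup.some_string_option = "hello";
    cfg.testGroup.some_double_option = 2.5;

    JSONValue tree = serialize(cfg);
    writeln(tree["testGroup"]["some_string_option"].str); // hello

    assert(deserialize!Config(tree) == cfg); // round trip preserves values
}
```

Adding a new group then only means adding a field to `Config`; the serialization code does not change.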
> 2.2) The sweet dream of 2.1 runs into the absence of tools to
> create and manipulate state during preprocessing. For example:
I understand that you seem to be looking for a way to change types (definitions in general) inside modules you import. This is problematic in several respects; for example, other modules depending on that module may find that the definitions "change under their feet". In D, once a type is declared and its final curly brace is closed, you will know that its definition will remain the same from anywhere in the program. D's answer to partial classes is UFCS; however, this does not allow "adding" fields, only methods.
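A minimal sketch of that limitation, with hypothetical names: a module that does not own `Endpoint` can still give it a method-like `url` through UFCS, but it has no way to give it a new field.

```d
import std.format : format;
import std.stdio : writeln;

// Imagine this struct lives in a module we cannot (or prefer not to) edit.
struct Endpoint
{
    string host;
    ushort port;
}

// A free function defined elsewhere; UFCS lets callers write `e.url`.
string url(Endpoint e)
{
    return format("http://%s:%s", e.host, e.port);
}

void main()
{
    auto e = Endpoint("localhost", 8080);
    writeln(e.url); // http://localhost:8080
    // e.timeout = 5; // would not compile: UFCS cannot add a field
}
```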
> 2.4) The original configuration management example would also
> require the ability to import definitions cyclically. Module A
> containing the ConfigGroupConcrete instantiation imports module B
> where Config is defined, which will require B to import A in
> order to access the ConfigGroupConcrete definition.
I don't really understand what you mean here, but D does allow cyclic module imports. It is only forbidden when more than one module inside any cycle has static constructors, because then it is not possible to determine the correct initialization order.
> To conclude, I'll summarize my questions:
> 1). Is there a compiled language that is capable of the
> aforementioned tricks without resorting to external templating
> meta-languages?
I don't know of any. For D, I suggest trying different approaches / paradigms.
> 2). How deep does the rabbit hole go in terms of the complexity of
> the required preprocessor modifications? And for DMD in general?
So far, I have not heard of any D project that requires preprocessing of D code. I think D's metaprogramming has enough solutions to choose from for the vast majority of conceivable situations where other languages would call for a preprocessor.
> 4). Is there hope that it's possible to do in, say, a year? I
> don't mind trying to implement it myself, but I don't want to
> invest time in something so conceptually out of place that it
> will simply be too destructive for the current compiler environment.
I suggest that you examine how established D projects deal with similar situations.
Apr 08 2017
Boris-Barboris <ismailsiege gmail.com> writes:
On Saturday, 8 April 2017 at 13:09:59 UTC, Vladimir Panteleev wrote:
> On Saturday, 8 April 2017 at 10:11:11 UTC, Boris-Barboris wrote:
>> 2). After preprocessing I wish to have a fully typed, safe and
>> fast Config class that contains all the groups I defined for
>> it in its body, not as references. I don't want a pointer
>> lookup at run time to get some field.
> Looks like your current implementation does not go in that direction, seeing as it uses properties for field access.
Am I mistaken in assuming that such a simple getter property will be optimized to direct field access? Anyway, that's a minor detail.
> For such tasks, I would suggest splitting the representations
> (native data, DOM tree, JSON strings etc.) from the
> transformations (serialization and parsing/emitting JSON). E.g.
> your example could be represented as:
>
> struct Config
> {
>     struct TestGroup
>     {
>         string some_string_option;
>         double some_double_option;
>     }
>     TestGroup testGroup;
> }
>
> Then, separate code for serializing/deserializing this to/from
> a DOM or directly to/from JSON.
>
> Individual components' configuration can be delegated to their
> components; their modules could contain public struct
> definitions that you can add to the global Config struct, which
> describes the configuration of the entire application. I've
> used this pattern successfully in some projects, incl. Digger:
> https://github.com/CyberShadow/Digger/blob/master/config.d#L31-L36
OK, that's nice, but it still requires manually including such a field in the global config struct. Some "compile-time callback" system would still scale better, in my opinion.
>> 2.2) The sweet dream of 2.1 runs into the absence of tools to
>> create and manipulate state during preprocessing. For example:
> I understand that you seem to be looking for a way to change types (definitions in general) inside modules you import. This is problematic in several respects; for example, other modules depending on that module may find that the definitions "change under their feet".
As expected, since a class that allows itself to be modified at compile time always does so explicitly via a mixin. Most of the time such manipulation is used to extend functionality (add a field, plugin, or method) without removing or modifying existing ones. And if the names conflict, we get a nice compile-time error anyway.
> In D, once a type is declared and its final curly brace is
> closed, you will know that its definition will remain the same
> from anywhere in the program.
That's kind of my point: the definition needs to stay the same because it's built by the compiler as many times as there are translation units, because of evil old grandpa C.
> D's answer to partial classes is UFCS; however, this does not
> allow "adding" fields, only methods.
Adding fields, or, more generally, objects / collections of objects, is the main use case. In my experience, adding methods is a rare scenario.
>> 2.4) The original configuration management example would also
>> require the ability to import definitions cyclically. Module A
>> containing the ConfigGroupConcrete instantiation imports module B
>> where Config is defined, which will require B to import A in
>> order to access the ConfigGroupConcrete definition.
> I don't really understand what you mean here, but D does allow cyclic module imports. It is only forbidden when more than one module inside any cycle has static constructors, because then it is not possible to determine the correct initialization order.
Exactly what I wanted to know, thank you.
>> To conclude, I'll summarize my questions:
>> 1). Is there a compiled language that is capable of the
>> aforementioned tricks without resorting to external templating
>> meta-languages?
> I don't know of any. For D, I suggest trying different approaches / paradigms.
>> 2). How deep does the rabbit hole go in terms of the complexity of
>> the required preprocessor modifications? And for DMD in general?
> So far, I have not heard of any D project that requires preprocessing of D code. I think D's metaprogramming has enough solutions to choose from for the vast majority of conceivable situations where other languages would call for a preprocessor.
>> 4). Is there hope that it's possible to do in, say, a year? I
>> don't mind trying to implement it myself, but I don't want to
>> invest time in something so conceptually out of place that it
>> will simply be too destructive for the current compiler
>> environment.
> I suggest that you examine how established D projects deal with similar situations.
Thank you.
Apr 08 2017
Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Saturday, 8 April 2017 at 14:20:49 UTC, Boris-Barboris wrote:
>> Looks like your current implementation does not go in that
>> direction, seeing as it uses properties for field access.
> Am I mistaken in assuming that such a simple getter property will be optimized to direct field access? Anyway, that's a minor detail.
I don't know the type of CONF.root, but from the usage syntax in your example, it looks like an associative array. Associative array lookup will be slower than simply accessing a variable.
>> Individual components' configuration can be delegated to their
>> components; their modules could contain public struct
>> definitions that you can add to the global Config struct,
>> which describes the configuration of the entire application.
>> I've used this pattern successfully in some projects, incl.
>> Digger:
>> https://github.com/CyberShadow/Digger/blob/master/config.d#L31-L36
> OK, that's nice, but it still requires manually including such a field in the global config struct.
Yes; I think that's desirable because it is aligned with the unidirectional flow of information from higher-level components to lower-level ones, and does not impose a particular configuration framework onto the lower-level components (they only need to declare their configuration in terms of a POD type).
> Some "compile-time callback" system would still scale better, in
> my opinion.
A similar effect can be achieved by allowing components to register themselves in a static constructor (not at compile-time, but at program start-up).
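A minimal sketch of start-up registration, with hypothetical names (not a library API): each component appends itself to a module-level list from a static constructor, so the list is complete before `main` runs. It is shown in one module for brevity; in a real program each component and its constructor would sit in its own module, subject to the rule mentioned earlier about static constructors inside import cycles. Plain `static this()` runs once per thread; `shared static this()` would run once per process.

```d
import std.stdio : writeln;

// Hypothetical run-time registry. Module-level variables are
// thread-local in D, as are plain `static this()` constructors.
string[] registeredGroups;

// In a real program each component would live in its own module,
// next to its own static constructor.
struct NetworkConfig { string host = "localhost"; ushort port = 8080; }
static this() { registeredGroups ~= NetworkConfig.stringof; }

struct StorageConfig { string path = "/var/lib/app"; }
static this() { registeredGroups ~= StorageConfig.stringof; }

void main()
{
    // Static constructors within a module run in lexical order,
    // and all of them have run by the time main starts.
    writeln(registeredGroups); // ["NetworkConfig", "StorageConfig"]
}
```

The cost the next reply objects to is visible here: `registeredGroups` is a run-time collection that code has to iterate, even though its contents were known at compile time.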
>> I understand that you seem to be looking for a way to change
>> types (definitions in general) inside modules you import. This
>> is problematic in several respects; for example, other modules
>> depending on that module may find that the definitions "change
>> under their feet".
> As expected, since a class that allows itself to be modified at compile time always does so explicitly via a mixin. Most of the time such manipulation is used to extend functionality (add a field, plugin, or method) without removing or modifying existing ones. And if the names conflict, we get a nice compile-time error anyway.
Then you have problems such as the instance size of a class changing depending on whether the code that requires the instance size is seen by the compiler before the code that modifies the instance size. I think it would cause complicated design problems that limit the scalability of the language. Even without such features, DMD had to go through a number of bugs to iron out the correct semantics of evaluating types (e.g. with "typeof(this).sizeof" inside a struct declaration, or recursive struct template instantiations).
>> In D, once a type is declared and its final curly brace is
>> closed, you will know that its definition will remain the same
>> from anywhere in the program.
> That's kind of my point: the definition needs to stay the same because it's built by the compiler as many times as there are translation units, because of evil old grandpa C.
I think this is not about technical limitations, but intentional design choices. Allowing types to be modified post-declaration invalidates many contracts and assumptions that code may have, and makes it harder to reason about the program as a whole. Compare with e.g. INTERCAL's COMEFROM instruction.
>> D's answer to partial classes is UFCS; however, this does not
>> allow "adding" fields, only methods.
> Adding fields, or, more generally, objects / collections of objects, is the main use case. In my experience, adding methods is a rare scenario.
UFCS is widely used in D for component programming: http://www.drdobbs.com/architecture-and-design/component-programming-in-d/240008321
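A minimal sketch of the style the linked article describes (the function name is hypothetical): each stage is an independent range component from Phobos, and UFCS turns nested calls into a left-to-right pipeline.

```d
import std.algorithm : filter, map, sum;
import std.range : iota;
import std.stdio : writeln;

// Sums the squares of the odd numbers in [1, upTo).
// Without UFCS the same pipeline would read inside-out:
// sum(map!(n => n * n)(filter!(n => n % 2 == 1)(iota(1, upTo))))
auto sumOfOddSquares(int upTo)
{
    return iota(1, upTo)
        .filter!(n => n % 2 == 1)
        .map!(n => n * n)
        .sum;
}

void main()
{
    writeln(sumOfOddSquares(6)); // 1 + 9 + 25 = 35
}
```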
Apr 08 2017
Boris-Barboris <ismailsiege gmail.com> writes:
On Saturday, 8 April 2017 at 17:57:11 UTC, Vladimir Panteleev wrote:

> Yes; I think that's desirable because it is
> aligned with the unidirectional flow of information from
> higher-level components to lower-level ones, and does not
> impose a particular configuration framework onto the
> lower-level components (they only need to declare their
> configuration in terms of a POD type).
> ...
> A similar effect can be achieved by allowing components to
> register themselves in a static constructor (not at
> compile-time, but at program start-up).
That is definitely possible and, I would say, trivial, and it is the most popular way. However, any run-time registration implies run-time collections to iterate over, with obvious performance drawbacks (minor ones in this case). We are not using the information we have a priori, at compile time, and make the CPU pay for it instead (either because we are too lazy (too busy) to update the sources of higher-level components (while making them a mess), or simply because our language lacks expressiveness, which is my point).

I side with another set of virtues. Source code consists of files. Files contain related data, concepts, functionality, whatever. Relations between those entities need by no means be unidirectional. What direction can you impose on the concept "For every configurable entity, be that a package, module, or class, I need to have fields in the global configuration singleton"?

IMO a program has a good architecture when, during extensive development, it takes an arbitrary group of programmers little time and effort to make useful changes. That is achieved by extensive research of the problem domain, use of abstraction to fight complexity, yada yada... Part of it is making sure that you can extend functionality easily. Adding a new subclass in one file and registering it in two others is not hard. But there is no fundamental reason for it not to be easier: just add the subclass and slap some fancy attribute on it, or add some preprocessor-related field or function in its body. One-directional flow is a consequence, not a principle. We are used to it because languages were made this way. When a high-level concept or idea willingly implies feedback from its users, there is little reason to forbid it, especially when it actually improves development iteration times, lowers the risk of merge conflicts, etc.
Look at this mess: https://github.com/Boris-Barboris/AtmosphereAutopilot/blob/master/AtmosphereAutopilot/GUI/AutoGui.cs#L190 time ago, that defines an attribute to mark class fields with, in order to draw them in a pretty little debug window. Why do I have to do this? I've got all the information right in the source. All classes that will be drawn using this GUI module are there, in text, accessible to the build system. Why can't I just write clean code that doesn't involve double or triple associative-array dispatch on a runtime-reflected list of subclasses and attribute-marked fields? The answer is simple: the language lacks expressiveness. I provide drawing functionality in my module. It is generic, it is virtuous, it is concentrated in one file, it speeds up development. However, it needs to see its client, its beneficiary, in order to draw it. And it just doesn't, because we throw away information we already have and make the CPU reconstruct it at run time, over and over. Yes, everything I describe can be done efficiently by writing a lot of boilerplate code or using some text-templating magic. I just don't see why languages can't have that functionality built in.
> Then you have problems such as the instance size of a class
> changing depending on whether the code that requires the
> instance size is seen by the compiler before the code that
> modifies the instance size. I think it would cause complicated
> design problems that limit the scalability of the language.
> Even without such features, DMD had to go through a number of
> bugs to iron out the correct semantics of evaluating types
> (e.g. with "typeof(this).sizeof" inside a struct declaration,
> or recursive struct template instantiations).
I agree. I think such mechanisms must be applied very early, hence "premixin".
> I think this is not about technical limitations, but
> intentional design choices. Allowing types to be modified
> post-declaration invalidates many contracts and assumptions
> that code may have, and makes it harder to reason about the
> program as a whole. Compare with e.g. INTERCAL's COMEFROM
> instruction.
I still don't see the problem. The declaration will contain constructs that indicate it will be changed, for example a "premixin" that iterates over an array and adds fields. I have no doubt a human can read this all right. Indeed, the question of ordering is important. Well, we have goto, and it's not like the sky fell on us. COMEFROM breaks the logical flow of time. I only want two-stage preprocessing: in the first stage I can create and manipulate some simple state visible to the preprocessor, even if it consists only of immutable base types and arrays of those; in the later stages we already have, I can then use that state as immutable variables. We can already populate a class with fields from an immutable string array. All I want is the ability to populate that array using preprocessor directives from across the whole compiled program, before all the other complex stuff starts. I think that would be beautiful.
> UFCS is widely used in D for component programming:
>
> http://www.drdobbs.com/architecture-and-design/component-programming-in-d/240008321
I'm not stating the opposite, just sharing what I encountered at work or while programming for fun: I mostly needed to add fields. People may feel otherwise, but I don't see this concept harming them in any way.
Apr 08 2017