digitalmars.D - xdc: A hypothetical D cross-compiler and AST manipulation tool.

Chad Joan (503/503) Jul 17 2013 I'd like to present my vision for a new D compiler. I call it

Chad Joan (2/2) Jul 17 2013 Crud, I miscalculated the line wrap on the web reader a lot.
Chad Joan (446/446) Jul 17 2013 For the web forum interface users, I have re-wrapped the text in

Kagamin (5/5) Jul 18 2013 llvm should be platform-independent enough, so you have ldc. But

Chad Joan (74/79) Jul 19 2013 I think a C backend would get us farther than an LLVM backend.

Tove (5/9) Jul 19 2013 I love the idea behind xdc, but I would go with C99 instead, even

Chad Joan (54/64) Jul 19 2013 I am, in fact, totally willing to make --emit=C99 do cool things

Kagamin (2/5) Jul 23 2013 A standard _Align attribute? You need it, right?

Chad Joan (34/39) Jul 23 2013 I didn't know that was standard in C99.

Kagamin (2/2) Jul 25 2013 It's probably C11. It allows only enlarging the alignment,

Joakim (8/14) Jul 19 2013 A small correction, the Android NDK added clang/llvm support last

Chad Joan (4/18) Jul 19 2013 Cool, thanks for the info.

Kai Nacke (9/10) Nov 11 2013 Hi,

Chad Joan (9/19) Nov 11 2013 Is there any built-in support for using this C++ backend in LDC

David Nadlinger (8/26) Nov 12 2013 Careful: The "cpp" LLVM backend actually creates C++ code that

cal (4/4) Jul 19 2013 On Thursday, 18 July 2013 at 03:26:10 UTC, Chad Joan wrote:

Chad Joan (8/12) Jul 19 2013 The latter.

deadalnix (2/15) Jul 19 2013 I'm not sure how you'll handle all compile time features.

Chad Joan (390/407) Jul 21 2013 To be honest, I hadn't yet written down what these would look

Tofu Ninja (5/12) Jul 25 2013 If you started a kick starter, I would put some money up, the

Chad Joan (6/10) Jul 25 2013 Cool, thanks!

Etienne (4/566) Nov 08 2013 Many vendors would have their processors supported in D if we had

Chad Joan (24/29) Nov 11 2013 No, not yet I'm afraid. At least not for xdc.

Kelly (23/53) Nov 11 2013 Hey Chad,

Chad Joan (5/16) Nov 12 2013 Hi Kelly,

nazriel (12/17) Nov 09 2013 I think C backend is a good idea.

Andrei Alexandrescu (4/10) Nov 09 2013 I think C is not a good back-end language. Other backend generators

Daniel Murphy (9/20) Nov 09 2013 That is true in general, but D actually maps quite well onto C.

deadalnix (2/6) Nov 09 2013 Out of curiosity, how do you handle exceptions ?

Daniel Murphy (4/11) Nov 09 2013 I didn't. This was focussed on a subset suitable for microcontrollers. ...

Andrei Alexandrescu (3/15) Nov 09 2013 That doesn't quite rhyme with C being a good backend language :o).

Daniel Murphy (6/24) Nov 10 2013 I guess it's not for the full language, but if you can't use gdc or llvm...

Chad Joan (71/96) Nov 11 2013 My ideal is to have exceptions in C anyways. I don't understand

Walter Bright (4/11) Nov 10 2013 Exceptions is one big problem. Another is COMDATs - C compilers don't em...

Chad Joan (24/40) Nov 11 2013 This seems like it matters when linking D code to D code. Other

Walter Bright (2/4) Nov 11 2013 That's not really an option if you intend to use C as a back end.

Jacob Carlborg (4/8) Nov 11 2013 What about the EDG C++ compiler, doesn't that output C code?

Iain Buclaw (5/31) Nov 10 2013 Especially gdc. Cross-platform support needs all the love it can get. ;-...

Chad Joan (20/33) Nov 11 2013 What would you suggest as an alternative for targeting disparate

Andrei Alexandrescu (6/36) Nov 11 2013 Fine with me. I have no stake in this. I don't see how you reach the

Chad Joan (15/52) Nov 12 2013 I call it "awesome" because you seem to have objections to the

Dejan Lekic (6/6) Nov 11 2013 I will definitely back up this project on kickstarter, if the

"Chad Joan" <chadjoan gmail.com> writes:

I'd like to present my vision for a new D compiler.  I call it
xdc, a loose abbreviation for "Cross D Compiler" (if confused,
see
http://english.stackexchange.com/questions/37394/why-do-some-words-have-x-as-a-substitute).
It could also mean other fun things like "Crossbone D Compiler"
(I imagine a logo with some crossbones having a metal D atop
where the skull normally goes), "eXperimental D Compiler", or
something sounding like "ecstasy" ;)

We usually think of language features as being rewritten into 
simpler features.  The simple features eventually get rewritten 
into machine instructions.  Compilers are, fundamentally, 
responsible for performing "lowering" operations.

It makes sense to me, then, to make a compiler whose internals 
/look/ like a bunch of these rewrites and lowering operations.  
There should be a bunch of input patterns matched to the desired 
results.  This has the happy side-effect of giving us a pleasant 
way to do AST manipulations from within D code.

I've also had a long-standing desire to see D on many more 
platforms.  It should make an appearance on console platforms and 
on smartphones.  I've tried doing this with a retargetable 
compiler like GCC before, and the work was surprisingly large.  
Even if the compiler already emits code for the target system's 
CPU, there are still a large number of details involving calling 
conventions, executable format, and any number of CPU/OS specific 
tweaks to object file output.  It makes a lot more sense to me to 
just output C/C++ code and feed that to a native toolchain.  That 
would skip a lot of the platform-specific nonsense that creates a 
barrier to entry for people who, say, just want to write a simple 
game for Android/iPhone/PS(3|4)/etc in D, and don't want to 
become compiler experts first.  Ideally, some day, this compiler 
would also emit code or bytecode for Javascript, AS3/Flash, Java, 
and any other popular targets that the market desires.  This can 
probably be done with DMD, but I'd like to make the process more 
approachable, and make backend authoring as easy as possible.  It 
should be possible (and easy) to tell the compiler exactly what 
lowerings should be applied before the AST hits the backend.

xdc should bring all of that cross-platform targeting together 
with a compiler infrastructure that can blow everything else away 
(I hope!).

xdc is my dream for a D compiler that gives us our first step (of 
few) towards having what haXe has already (http://haxe.org/) : a 
compiler that can land code just about anywhere.

What follows is a collection of my thoughts written down as notes.

== Ideal Outcomes ==

.- D to C/C++ compiler that can easily reach target platforms that
.    are currently either unsupported or poorly supported by
.    current D compilers.
.   - Useful for game developers wishing to write D code on the
.       next big console platform.
.   - Useful for embedded systems developers wishing to write
.       D code on obscure or potentially proprietary
.       microcontrollers.

.- Other backends (ex: llvm, Java bytecode, AS3/Flash bytecode,
.    etc) possible in the future.  Community desires considered
.    when selecting new backend targets.

.- Interpreter backend: a notable backend that would be
.    implemented as a necessity for making CTFE work.  A dedicated
.    interpreter backend would hopefully be much faster and easier
.    on memory than DMD's souped-up constant folder.  (Even though
.    DMD has become considerably better at this in the past year
.    or two.)

.- Abstract Syntax Tree (AST) manipulation toolchain, possibly
.    usable in CTFE.  It would work like this:
.    (1) Pass some D code or Domain Specific Language (DSL) of
.          your choice (as text) into xdc at compile-time.
.    (2) xdc returns an AST.
.    (3) Use xdc's pattern-matching and substitution DSL to
.          manipulate the AST.
.    (4) xdc consumes the AST and emits modified D code.
.    (5) mixin(...) the result.
.   - If xdc is the compiler used to execute itself in CTFE, then
.       it might be possible to optimize this by having it expose
.       itself as a set of intrinsics.
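
A rough approximation of steps (1) through (5) is already possible with D's existing CTFE and string mixins, minus the AST stage.  The sketch below is only an illustration: makeFields and its "name:type" mini-DSL are hypothetical, and plain string handling stands in for the AST building and rewriting that xdc would do in steps (2) to (4).

```d
import std.array : split;
import std.string : strip;

// Hypothetical stand-in for steps (1)-(4): takes a tiny DSL ("name:type"
// pairs separated by commas) and returns D source text.  xdc would build and
// rewrite an AST here; this sketch only concatenates strings.
string makeFields(string dsl)
{
    string code;
    foreach (entry; dsl.split(","))
    {
        auto parts = entry.strip().split(":");
        code ~= parts[1].strip() ~ " " ~ parts[0].strip() ~ ";\n";
    }
    return code;
}

struct Point
{
    // Step (5): the generated declarations are mixed in at compile time.
    mixin(makeFields("x:int, y:int, label:string"));
}

void main()
{
    Point p;
    p.x = 3;
    p.label = "origin";
    static assert(is(typeof(Point.y) == int)); // field exists at compile time
    assert(p.x == 3 && p.label == "origin");
}
```

The difference xdc would make is in the middle: instead of gluing strings together, the code between parsing and emitting would operate on a real AST.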

.- Reference counting available by default on all platforms.
.   - Gets you into the action with minimal effort and little or
.       no compiler hacking. (More complete GC tends to require
.       platform specific ASM and/or operating system API
.       support).

.- Full garbage collection available if supported.
.   - Ex: The C backends would default to ref-counting until the
.       ASM and OS-level code is written to support full GC.
.   - Ex: A Java backend would probably use the JVM's own garbage
.       collector by default.

.- Threading model determined by compiler configuration or
.    available platform hints.
.   - Ex: The user may have a posix-threads implementation
.       available, but know few other details about the target
.       system.  It should be possible for xdc to use pthreads to
.       emulate the TLS and synchronization mechanisms needed to
.       make D tick.  (Or at least emulate as many features as
.       possible.)
.   - Ex: Possible "no threading" target for people who don't need
.       threading but DO need other D features to be available NOW
.       on an alien platform.  Errors when the source code passed
.       into xdc assumes that threading features are present.

.- D compiler that is easy to hack on.
.   - "Looks like the problem it solves."
.       (To quote Walter's DConf2013 keynote.)
.   - Made of a bunch of patterns that describe
.       code rewrites/lowerings.
.   - Few or no null value checks necessary.
.      - null checks don't look like pattern matching or lowering.
.   - Few or no convoluted if-while-if-for-etc nests.
.      - These also don't look like pattern matching or lowering.
.   - It should be largely made of "pattern handlers" (see below).
.   - Each pattern handler will have one part that closely
.       resembles the AST fragment for the D code that it
.       recognizes, and another part that resembles the lowered
.       form that it outputs.
.   - Dependency analysis that prevents your AST manipulation from
.       happening either too early or too late.
.   - Because the code that actually does lowerings is generated
.       from a DSL, it is possible to make it automate a lot of
.       tedious tasks, like updating the symbol table when nodes
.       are added or removed from the AST.
.   - This makes it easier to test experimental features.

.- A step-by-step view of what the compiler is doing to your code.
.   - Since the semantic analysis of xdc would be composed of
.      "pattern handlers" (see below), then each time one of them
.      completes the compiler could output the result of calling
.      .toString() (or .toDCode() or whatever) on the entire AST.
.   - This could be attached to an ncurses interface that would be
.      activated by passing a flag to the compiler, which would
.      then proceed to show the AST at every stage of compilation.
.      Press ENTER to see the next step, etc.
.   - This could also be exposed as API functionality that IDEs
.      could use to show developers how the compiler sees their
.      code.

.- D code analysis engine that might be usable to automatically
.    translate D1 code into D2 code, or maybe D2 into D3 in the
.    far future.

== Architectural Overview ==

.- xdc will largely consist of "pattern handlers" that recognize
.    patterns in its AST and replace them with AST fragments that
.    contain successively fewer high-level features (lowering).
.   - These pattern handlers would feature a DSL that should make
.       the whole task fairly easy.
.   - The DSL proposed would be similar to regular expressions in
.       semantics but different in syntax.
.      - It will have operators for choice, repetition, optional
.          matches, capturing, and so on.
.      - The DSL must support nested structures well.
.      - The DSL must support vertical layout of patterns well.
.      - Because of the vertical patterns, most operators will
.          either be prefix or will be written in block style:
.          some_block_header { block_stmt1; block_stmt2; etc; }
.      - Actions like entering and leaving nodes are given their
.          own special syntax.  The machine will treat them like
.          tokens that can be matched the same as any AST node.
.          Notably, node-entry and node-exit do not require
.          introducing non-regular elements to the DSL.
.          node-entry and node-exit may be subsumed into
.          Deterministic Finite Automata (DFAs).
.   - An example pattern handler might look like this:

const lowerWhileStatement =
{
	// Apologies in advance if this isn't actually valid D code:
	//   This is a design sketch and I currently don't have a way to compile it.
	//
	// The Pattern template, PatternMatch template, and PatternHandler class
	//   have not yet been written.  This is an example of how I might expect
	//   them to be used.
	//

	auto consumes = ["while_statement"];
	auto produces = ["if_statement", "goto", "label"];

	// An enum, so that it can be passed to PatternMatch! as a template argument.
	enum recognizer = Pattern!
		"WhileStatement has
		{
			// Capture the conditional expression (call it \"expr\") and
			//   capture the loop body (call it \"statement\").
			.expression $expr;
			.statement  $statement has
			{
				// Capture any continue/break statements.
				any_amount_of {
					any_amount_of .; // Same as .* in regexes.
					one_of
					{
						ContinueStatement $continues;
						BreakStatement    $breaks;
					}
				}
				any_amount_of .;
			}
		}";

	auto action = (PatternMatch!(recognizer) m)
	{
		// syntaxNode is presumably the matched WhileStatement node; how the
		//   action gets hold of it is not settled in this sketch.
		m.captures.add("uniqLoopAgain", getUniqueLabel(syntaxNode.enclosingScope));
		m.captures.add("uniqExitLoop", getUniqueLabel(syntaxNode.enclosingScope));

		// The recognizer defines m.getCapture!"continues" with:
		//   "ContinueStatement $continues;"
		// That line appears in a repetition context ("any_amount_of") and is
		//   therefore typed as an array.
		foreach( ref node; m.getCapture!"continues" )
			node.replaceWith( m, "GotoStatement has $uniqLoopAgain" );

		// Ditto for m.getCapture!"breaks" and "BreakStatement $breaks;".
		foreach( ref node; m.getCapture!"breaks" )
			node.replaceWith( m, "GotoStatement has $uniqExitLoop" );
	};

	auto synthesizer = Pattern!
		"Label has $uniqLoopAgain
		IfStatement has
		{
			OpNegate has $expr
			GotoStatement has $uniqExitLoop
		}
		$statement
		GotoStatement has $uniqLoopAgain
		Label has $uniqExitLoop
		";

	return new PatternHandler(produces, consumes, recognizer, action, synthesizer);
};

(Also available at: http://pastebin.com/0mBQxhLs )
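
Since the handler above depends on templates that do not exist yet, here is a small hand-written version of the same while-to-goto lowering that compiles with a current D compiler.  It is only a sketch of the transformation itself: the Node class, show, rewriteJumps and lowerWhile are all made up for this illustration and are not part of any xdc API, and the pattern DSL is replaced by ordinary D code.

```d
import std.conv : to;
import std.stdio : writeln;

// Minimal AST node for this sketch only.
class Node
{
    string kind;      // e.g. "While", "If", "Goto", "Label", "Break", "Expr"
    string text;      // label name or expression source
    Node[] children;

    this(string kind, string text = "", Node[] children = null)
    {
        this.kind = kind;
        this.text = text;
        this.children = children;
    }
}

// Renders a node as kind(text){child child ...}.
string show(const Node n)
{
    string s = n.kind;
    if (n.text.length) s ~= "(" ~ n.text ~ ")";
    if (n.children.length)
    {
        s ~= "{";
        foreach (i, c; n.children)
            s ~= (i ? " " : "") ~ show(c);
        s ~= "}";
    }
    return s;
}

// Replaces the continue/break statements that belong to this loop with gotos.
// Nested While nodes are skipped because their jumps refer to the inner loop.
void rewriteJumps(Node n, string again, string done)
{
    foreach (ref child; n.children)
    {
        if (child.kind == "Continue")   child = new Node("Goto", again);
        else if (child.kind == "Break") child = new Node("Goto", done);
        else if (child.kind != "While") rewriteJumps(child, again, done);
    }
}

// Lowers While(cond, block) into label / if-not-goto / block / goto / label.
// The loop body is assumed to be a block node, not a bare statement.
Node[] lowerWhile(Node loop, ref int labelCounter)
{
    assert(loop.kind == "While" && loop.children.length == 2);
    auto cond = loop.children[0];
    auto bodyNode = loop.children[1];
    string again = "L" ~ to!string(labelCounter++);
    string done = "L" ~ to!string(labelCounter++);
    rewriteJumps(bodyNode, again, done);
    return [
        new Node("Label", again),
        new Node("If", "", [new Node("Not", "", [cond]), new Node("Goto", done)]),
        bodyNode,
        new Node("Goto", again),
        new Node("Label", done),
    ];
}

void main()
{
    auto loop = new Node("While", "", [
        new Node("Expr", "i < 10"),
        new Node("Block", "", [
            new Node("Expr", "i++"),
            new Node("If", "", [new Node("Expr", "i == 5"), new Node("Break")]),
        ]),
    ]);
    int counter = 0;
    foreach (stmt; lowerWhile(loop, counter))
        writeln(show(stmt));
}
```

Compared with the pattern handler, the recognizer and synthesizer are implicit here (the assert and the returned array), which is exactly the kind of hand-written plumbing the DSL is meant to remove.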

.- Dispatch to pattern handlers is performed by the execution of a
.    DFA/Packrat hybrid instead of the traditional OOP inheritance
.    with method calls.
.   - Each pattern handler's recognizer gets treated like a regex
.       or Parsing Expression Grammar (PEG) fragment.
.   - All of the recognizers in the same semantic pass are pasted
.       together in an ordered-choice expression.  The ordering is
.       determined by dependency analysis.
.   - A recognizer's pattern handler is invoked when the
.       recognizer's AST expression is matched.
.   - Once any substitutions are completed, then the machine
.       executing the pattern engine will set its cursor to the
.       beginning of the newly substituted AST nodes and continue
.       running.
.   - Executing potentially hundreds of pattern handlers in a
.       single ordered-choice expression would be obnoxious for a
.       packrat parser (packrat machine?).  Thankfully,
.       ordered-choice is possible in regular grammars, so it can
.       be lowered into regex operations and the whole thing
.       turned into a DFA.
.   - If pattern recognizers end up needing recursive elements,
.       then they will probably not appear at the very beginning
.       of the pattern.  Patterns with enough regular elements at
.       the start will be able to merge those regular elements
.       into the DFA with the rest of the pattern recognizers, and
.       it all becomes very fast table lookups in small tables.
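
To make the dispatch rule concrete, here is a naive sketch of one pass over a flat list of node kinds.  It shows only the two behaviours described above: the first matching handler in the ordered choice wins, and after a substitution the cursor stays at the start of the new nodes.  The Handler struct, matchesAt and runPass are hypothetical names, there is no tree structure, and a linear scan stands in for the DFA that would make this fast.

```d
import std.stdio : writeln;

// Hypothetical handler: matches a fixed run of node kinds and replaces it.
struct Handler
{
    string name;
    string[] pattern;
    string[] replacement;
}

// True if `pattern` occurs in `nodes` starting at index `at`.
bool matchesAt(string[] nodes, size_t at, string[] pattern)
{
    if (at + pattern.length > nodes.length) return false;
    return nodes[at .. at + pattern.length] == pattern;
}

// Ordered choice: at each cursor position the first matching handler wins.
// After a substitution the cursor is left where it was, so the newly
// substituted nodes are examined again before moving on.
// Assumes non-empty patterns and a handler set that eventually stops matching.
string[] runPass(string[] nodes, Handler[] handlers)
{
    size_t cursor = 0;
    while (cursor < nodes.length)
    {
        bool rewrote = false;
        foreach (h; handlers)
        {
            assert(h.pattern.length > 0, "empty patterns would never advance");
            if (matchesAt(nodes, cursor, h.pattern))
            {
                nodes = nodes[0 .. cursor] ~ h.replacement
                      ~ nodes[cursor + h.pattern.length .. $];
                rewrote = true;
                break;
            }
        }
        if (!rewrote) cursor++;
    }
    return nodes;
}

void main()
{
    auto handlers = [
        Handler("lowerForeach", ["foreach"], ["for"]),
        Handler("lowerFor",     ["for"],     ["init", "while"]),
        Handler("lowerWhile",   ["while"],   ["label", "if", "goto", "label"]),
    ];
    writeln(runPass(["decl", "foreach", "return"], handlers));
}
```

Here the foreach is rewritten three times in a row at the same cursor position before the scan moves on, which is the "set its cursor to the beginning of the newly substituted AST nodes" behaviour in miniature.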

.- This compiler would involve the creation of a parser-generator
.    API that allows code to programmatically create grammars, and
.    to do so without a bunch of clumsy string formatting and
.    string concatenation.
.   - These grammars could be written such that things like AST
.       nodes are seen as terminals.  This expands possibilities
.       and allows all of the pattern handlers to be coalesced
.       into a grammar that operates on ASTs and fires off
.       semantic actions whenever one of the recognizer patterns
.       gets tickled by the right AST fragment.
.   - Using strings as terminals is still cool; and necessary for
.       xdc's text/D-code parser.
.   - A simple parser-generator API example:

---------------------------------------
string makeParser()
{
	auto builder = new ParserBuilder!char;
	builder.pushSequence();
		builder.literal('x');
		builder.pushMaybe();
			builder.literal('y');
		builder.pop();
	builder.pop();
	return builder.toDCode("callMe");
}

const foo = makeParser();

pragma(msg, foo);
---------------------------------------
Current output:
http://pastebin.com/V3E0Ubbc
---------------------------------------
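
For readers without access to the pastebin, the following is a much-reduced, hypothetical cousin of that builder.  MiniBuilder is not the ParserBuilder from the example: it accepts the same sequence of push/literal/pop calls but emits a regex string instead of D parser code, which is enough to show how a grammar can be assembled through calls rather than string formatting.

```d
import std.ascii : isAlphaNum;
import std.regex : matchFirst, regex;

// Hypothetical builder for illustration only.
class MiniBuilder
{
    private string[] frames;   // text collected for each open group
    private string[] suffixes; // operator appended when each group is popped

    this() { frames = [""]; }

    void literal(char c)
    {
        assert(isAlphaNum(c), "this sketch only handles letters and digits");
        frames[$ - 1] ~= c;
    }

    void pushSequence() { open(""); }
    void pushMaybe()    { open("?"); }

    // Closes the innermost group and appends it to its parent.
    void pop()
    {
        assert(frames.length > 1, "pop without matching push");
        string group = "(?:" ~ frames[$ - 1] ~ ")" ~ suffixes[$ - 1];
        frames = frames[0 .. $ - 1];
        suffixes = suffixes[0 .. $ - 1];
        frames[$ - 1] ~= group;
    }

    string toRegex()
    {
        assert(frames.length == 1, "unbalanced push/pop");
        return "^" ~ frames[0] ~ "$";
    }

    private void open(string suffix)
    {
        frames ~= "";
        suffixes ~= suffix;
    }
}

string makePattern()
{
    auto builder = new MiniBuilder;
    builder.pushSequence();
        builder.literal('x');
        builder.pushMaybe();
            builder.literal('y');
        builder.pop();
    builder.pop();
    return builder.toRegex();
}

void main()
{
    auto pattern = makePattern();
    assert(!matchFirst("x", regex(pattern)).empty);   // "x" is accepted
    assert(!matchFirst("xy", regex(pattern)).empty);  // optional "y" is accepted
    assert(matchFirst("xyy", regex(pattern)).empty);  // a second "y" is not
}
```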

.   - Humans would probably never directly write grammars using
.       this API; it is intended for use by code that needs to
.       write grammars.  xdc would be such code: it's given a
.       bunch of pattern handlers and needs to turn them into a
.       grammar.
.   - This API could also make it easier to write the parser
.       generators that humans /would/ use. For example, it could
.       be used as an experimental backend for a regular
.       expression engine that can handle limited recursion.
.   - The packrats usually generated from PEGs are nice and all,
.       but I'd really like to generate DFAs whenever possible,
.       because those seem to be regarded as being /very fast/.
.   - DFAs can't handle the recursive elements of PEGs, but they
.       should be able to handle everything non-recursive that
.       precedes or follows the recursive elements.
.   - The parser-generator API would be responsible for
.       aggressively converting PEG-like elements into regex/DFA
.       elements whenever possible.
.   - Regular expressions can be embedded in PEGs as long as you
.       tell them how much text to match.  You have to give them
.       concrete success/failure conditions that can be determined
.       without help from the rest of the PEG: things like "match
.       as many characters as possible" or "match as few
.       characters as possible".  Without that, the regex's
.       backtracking (DFA'd or otherwise) won't mesh with the PEG.
.       Give it a concrete win/fail condition, however, and the
.       embedded regex becomes just another PEG building block
.       that chews through some source material and yields a
.       yes/no result.  Such regular expressions allow DFAs to be
.       introduced into a recursive descent or packrat parser.
.   - Many PEG elements can be converted into these well-behaved
.       regular expressions.
.      - PEG repetition is just regular expression repetition with
.          a wrapper around it that says "match as many characters
.          as possible".
.      - PEG ordered choice can be lowered into regular expression
.          unordered choice, which can then be converted into
.          DFAs:
.          I suspect that this is true:
.            (uv/xy)c == (uv|(^(uv)&xy))c
.          (or, by De Morgan's law:
.            (uv/xy)c == (uv|(^(uv|^(xy))))c )
.          & is intersection.
.          ^ is negation.
.          Each letter (u,v,x,y,c) can be a subexpression
.            (non-recursive).
.      - PEG label matching can be inlined up to the point where
.          recursion occurs, thus allowing more elements to be
.          considered for DFA conversion.
.      - etc.
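
One case worth keeping as a regression test for any such lowering is a choice whose first alternative is a prefix of the second.  The sketch below uses three tiny hand-rolled PEG combinators (lit, choice and seq are names invented here, not an existing library) and compares them with std.regex on the same input.  The PEG commits to the first alternative that matches, while a backtracking regex alternation does not, so whatever definition of ^ is chosen, the lowered form has to reproduce the PEG result.

```d
import std.algorithm.searching : startsWith;
import std.regex : matchFirst, regex;

// A PEG-style matcher: returns the number of characters consumed at the
// start of `input`, or -1 on failure.
alias Matcher = long delegate(string input);

Matcher lit(string s)
{
    return (string input) => input.startsWith(s) ? cast(long) s.length : -1L;
}

// Ordered choice: commit to the first alternative that matches.
Matcher choice(Matcher a, Matcher b)
{
    return (string input) {
        long n = a(input);
        return n >= 0 ? n : b(input);
    };
}

// Sequence: run `a`, then run `b` on whatever `a` left over.
Matcher seq(Matcher a, Matcher b)
{
    return (string input) {
        long n = a(input);
        if (n < 0) return -1L;
        long m = b(input[cast(size_t) n .. $]);
        return m < 0 ? -1L : n + m;
    };
}

void main()
{
    // PEG: (a / ab) c
    auto peg = seq(choice(lit("a"), lit("ab")), lit("c"));
    assert(peg("ac") == 2);
    assert(peg("abc") == -1); // "a" wins the choice, then "c" fails on 'b'

    // Regex: (a|ab)c backtracks into the alternation and accepts "abc".
    assert(!matchFirst("abc", regex("^(a|ab)c")).empty);
}
```

With uv = "a" and xy = "ab", the lowered expression therefore has to reject "abc" to stay faithful to the PEG.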

.- The parser would be defined using a PEG (most likely using
.    Pegged specifically).
.   - Although Pegged is an awesome achievement, I suspect its
.       output could be improved considerably.  The templated code
.       it generates is slow to compile and ALWAYS allocates parse
.       tree nodes at every symbol.
.   - I want to experiment with making Pegged (or a branch of it)
.       emit DFA/Packrat parser hybrids.  This could be done by
.       making a version of Pegged that uses the aforementioned
.       parser-generator API to create its parsers.
.   - Design principle:  avoid memory allocations like the plague.
.       The output should be a well-pruned AST, and not just a
.       parse tree that causes a bunch of allocations and needs
.       massaging to become useful.
.   - I really like Pegged and would contribute this stuff
.       upward, if accepted.

.- When hacking on xdc, you don't need to be aware of WHEN your
.    code gets executed in semantic analysis.  The dependency
.    analysis will guarantee that it always gets performed both
.    (a) when it's needed, and (b) when it has what it needs.
.   - This is what the "consumes" and "produces" variables are all
.       about in the above example.

.- Successfully lowering a D AST into the target backend's input
.    will almost certainly require multiple passes.  xdc's
.    dependency analyzer would automatically minimize the number
.    of passes by looking for patterns that are "siblings" in the
.    dependency graph (eg. neither depends on the other) and
.    bunching as many such patterns as possible into each pass.
.   - It really shouldn't generate very many more than the number
.       of passes that DMD has coded into it.  Ideally: no more
.       than DMD, if not fewer.
.   - I'd like to make the dependency analyzer output a graph that
.       can be used to track which patterns cause which passes to
.       exist, and show which patterns are in which passes.
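
The pass-bunching idea can be sketched as a layered topological sort over the handlers' consumes/produces lists.  Everything below is an assumption made for illustration: the Handler struct, the rule that a handler must wait for any handler that can still produce something it consumes, and the handler names other than the while lowering.

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;

// Hypothetical handler description: just a name and its feature lists.
struct Handler
{
    string name;
    string[] consumes;
    string[] produces;
}

// `a` must run after `b` if `b` can produce something that `a` consumes.
bool dependsOn(Handler a, Handler b)
{
    foreach (feature; a.consumes)
        if (b.produces.canFind(feature)) return true;
    return false;
}

// Layered topological sort: each pass holds every handler whose dependencies
// were all scheduled in earlier passes, so "siblings" share a pass.
string[][] planPasses(Handler[] handlers)
{
    string[][] passes;
    auto scheduled = new bool[handlers.length];
    size_t remaining = handlers.length;

    while (remaining > 0)
    {
        size_t[] ready;
        foreach (i, h; handlers)
        {
            if (scheduled[i]) continue;
            bool blocked = false;
            foreach (j, other; handlers)
                if (i != j && !scheduled[j] && dependsOn(h, other))
                    blocked = true;
            if (!blocked) ready ~= i;
        }
        if (ready.length == 0)
            throw new Exception("cyclic dependency between pattern handlers");

        string[] names;
        foreach (i; ready)
        {
            scheduled[i] = true;
            names ~= handlers[i].name;
        }
        passes ~= names;
        remaining -= ready.length;
    }
    return passes;
}

void main()
{
    auto handlers = [
        Handler("lowerWhile",      ["while_statement"],   ["if_statement", "goto", "label"]),
        Handler("lowerFor",        ["for_statement"],     ["while_statement"]),
        Handler("lowerForeach",    ["foreach_statement"], ["for_statement"]),
        Handler("lowerScopeGuard", ["scope_guard"],       ["try_finally"]),
    ];
    foreach (i, pass; planPasses(handlers))
        writeln("pass ", i + 1, ": ", pass);
}
```

The returned list of passes doubles as the raw data for the graph mentioned above: each handler name appears in exactly one pass.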

.- Planned availability of backends.
.   - My first pick for a backend would be an ANSI C89 target.  I
.       feel that this would give it the most reach.
.   - The interpreter backend is along for the ride, as mentioned.
.   - Because the semantic analysis is composed of distinct and
.       loosely-coupled patterns, it is possible for xdc to
.       generate an analysis chain with the minimum number of
.       lowerings needed for a given backend.
.      - The interpreter backend would benefit from having the
.          most lowerings.  By requiring a lot of lowering, the
.          interpreter would only need to support a small number
.          of constructs:
.         - if statements
.         - gotos
.         - function calls
.         - arithmetic expression evaluation
.         - builtin types (byte, short, int, long, float, double,
.             etc)
.         - pointers
.         - Even structs are unnecessary: they can be seen as
.             typed dereferencing of untyped pointers.
.      - The C backend would benefit from slightly less lowering
.         than the interpreter backend.  It is useful for
.         debugging if you can mostly-sorta read the resulting C
.         code, and your C compiler will appreciate the extra
.         optimization opportunities.
.         - Looping constructs like while and for are welcome
.             here.
.         - structs would be more readable.

.          different set of lowerings in later passes.
.         - Pointers are no longer considered "low".
.         - Classes should be kept as long as possible;
.             I'm pretty sure the bytecode (at least for Java)
.             has opcodes dedicated to classes.  Removing them
.             may cause pessimisation.
.      - The backend writer should not have to worry about
.          rewriting the semantic analysis to suit their needs.
.          They just define some features and say which ones they
.          need available in the AST, and xdc's
.          semantic-analysis-generator will handle the rest.
.   - Notably, a backend should just be more lowerings, with the
.       result being text or binary code instead of AST nodes.
.      - Backends are essentially defined by the set of
.          AST/language features that they consume and any special
.          lowerings needed to convert generic AST/language
.          features into backend-specific AST/language features.
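
As a rough illustration of "backends are defined by the features they consume", the sketch below picks the lowerings a backend needs from the list of features it declares it can accept.  The Handler struct, selectLowerings and the feature names are hypothetical, and the rule is deliberately simplistic: a lowering is kept whenever it consumes something the backend does not accept.

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;

// Hypothetical handler description, reduced to what this sketch needs.
struct Handler
{
    string name;
    string[] consumes;
    string[] produces;
}

// Keep a lowering only if it consumes a feature the backend cannot accept.
string[] selectLowerings(Handler[] handlers, string[] backendAccepts)
{
    string[] chosen;
    foreach (h; handlers)
        foreach (feature; h.consumes)
            if (!backendAccepts.canFind(feature))
            {
                chosen ~= h.name;
                break; // one unsupported feature is enough
            }
    return chosen;
}

void main()
{
    auto handlers = [
        Handler("lowerForeach", ["foreach_statement"], ["for_statement"]),
        Handler("lowerFor",     ["for_statement"],     ["while_statement"]),
        Handler("lowerWhile",   ["while_statement"],   ["if_statement", "goto", "label"]),
    ];

    // The interpreter only understands the lowest-level constructs.
    auto interpreter = ["if_statement", "goto", "label"];
    // The C backend is happy to keep loops.
    auto cBackend = interpreter ~ ["while_statement", "for_statement"];

    writeln("interpreter: ", selectLowerings(handlers, interpreter));
    writeln("C backend:   ", selectLowerings(handlers, cBackend));
}
```

With these lists the interpreter gets all three lowerings while the C backend keeps only lowerForeach, matching the "less lowering for C" point above.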


== Closing Thoughts ==

I am realizing that there are multiple reasons that compel me to 
write this document:
- To share my ideas with others, on the off-chance that someone 
else might see this vision too and be better equipped to deliver.
- To suggest capabilities that any community-endorsed compiler 
tool (ex: compiler-as-a-ctfe-library) should have.
- To see if I might be able to get the help I need to make it a 
reality.

I just can't decide which reasons are more important.  But there 
is a common thread: I want this vision to become reality and do 
really cool things while filling a bunch of missing links in D's 
ecosystem.

I have to ask:

Would you pay for this?
If so, then I might be able to do a kickstarter at some point.
I am not independently wealthy or retired (or both?) like Walter, 
nor am I able to survive on zero hours of sleep each night like 
Andrei, and this would be a big project.  I think it would need 
full-time attention or it would never become useful in a 
reasonable timeframe.

Also, assuming you understand the design, are there any gaping 
holes in this?
This is my first attempt to share these ideas with a larger 
group, and thus an opportunity to anticipate troubles.

...

Well, I'm anxious to see how well the venerable D community 
receives this bundle of ideas.  Be chatty.  I'll try to keep up.

Thank you for reading.

Jul 17 2013

"Chad Joan" <chadjoan gmail.com> writes:

Crud, I miscalculated the line wrap on the web reader a lot.  
Sorry about that.

Jul 17 2013

"Chad Joan" <chadjoan gmail.com> writes:

For the web forum interface users, I have re-wrapped the text in 
my outline.  Hopefully this will look better!  If not, please try 
this pastebin version: http://pastebin.com/Twc9ZUnQ

I'd like to present my vision for a new D compiler.  I call it
xdc, a loose abbreviation for "Cross D Compiler" (if confused,
see
http://english.stackexchange.com/questions/37394/why-do-some-words-have-x-as-a-substitute).
It could also mean other fun things like "Crossbone D Compiler"
(I imagine a logo with some crossbones having a metal D atop
where the skull normally goes), "eXperimental D Compiler", or
something sounding like "ecstasy" ;)

We usually think of language features as being rewritten into 
simpler features.  The simple features eventually get rewritten 
into machine instructions.  Compilers are, fundamentally, 
responsible for performing "lowering" operations.

It makes sense to me, then, to make a compiler whose internals 
/look/ like a bunch of these rewrites and lowering operations.  
There should be a bunch of input patterns matched to the desired 
results.  This has the happy side-effect of giving us a pleasant 
way to do AST manipulations from within D code.

I've also had a long-standing desire to see D on many more 
platforms.  It should make an appearance on console platforms and 
on smartphones.  I've tried doing this with a retargetable 
compiler like GCC before, and the work was surprisingly large.  
Even if the compiler already emits code for the target system's 
CPU, there are still a large number of details involving calling 
conventions, executable format, and any number of CPU/OS specific 
tweaks to object file output.  It makes a lot more sense to me to 
just output C/C++ code and feed that to a native toolchain.  That 
would skip a lot of the platform-specific nonsense that creates a 
barrier to entry for people who, say, just want to write a simple 
game for Android/iPhone/PS(3|4)/etc in D, and don't want to 
become compiler experts first.  Ideally, some day, this compiler 
would also emit code or bytecode for Javascript, AS3/Flash, Java, 
and any other popular targets that the market desires.  This can 
probably be done with DMD, but I'd like to make the process more 
approachable, and make backend authoring as easy as possible.  It 
should be possible (and easy) to tell the compiler exactly what 
lowerings should be applied before the AST hits the backend.

xdc should bring all of that cross-platform targeting together 
with a compiler infrastructure that can blow everything else away 
(I hope!).

xdc is my dream for a D compiler that gives us our first step (of 
few) towards having what haXe has already (http://haxe.org/) : a 
compiler that can land code just about anywhere.

What follows is a collection of my thoughts written down as notes.

== Ideal Outcomes ==

.- D to C/C++ compiler that can easily reach target platforms that
.    are currently either unsupported or poorly supported by
.    current D compilers.
.   - Useful for game developers wishing to write D code on the
.       next big console platform.
.   - Useful for embedded systems developers wishing to write
.       D code on obscure or potentially proprietary
.       microcontrollers.

.- Other backends (ex: llvm, Java bytecode, AS3/Flash bytecode,
.    etc) possible in the future.  Community desires considered
.    when selecting new backend targets.

.- Interpreter backend: a notable backend that would be
.    implemented as a necessity for making CTFE work.  A dedicated
.    interpreter backend would hopefully be much faster and easier
.    on memory than DMD's souped-up constant folder.  (Even though
.    DMD has become considerably better at this in the past year
.    or two.)

.- Abstract Syntax Tree (AST) manipulation toolchain, possibly
.    usable in CTFE.  It would work like this:
.    (1) Pass some D code or Domain Specific Language (DSL) of
.          your choice (as text) into xdc at compile-time.
.    (2) xdc returns an AST.
.    (3) Use xdc's pattern-matching and substitution DSL to
.          manipulate the AST.
.    (4) xdc consumes the AST and emits modified D code.
.    (5) mixin(...) the result.
.   - If xdc is the compiler used to execute itself in CTFE, then
.       it might be possible to optimize this by having it expose
.       itself as a set of intrinsics.

.- Reference counting available by default on all platforms.
.   - Gets you into the action with minimal effort and little or
.       no compiler hacking. (More complete GC tends to require
.       platform specific ASM and/or operating system API
.       support).

.- Full garbage collection available if supported.
.   - Ex: The C backends would default to ref-counting until the
.       ASM and OS-level code is written to support full GC.
.   - Ex: A Java backend would probably use the JVM's own garbage
.       collector by default.

.- Threading model determined by compiler configuration or
.    available platform hints.
.   - Ex: The user may have a posix-threads implementation
.       available, but know few other details about the target
.       system.  It should be possible for xdc to use pthreads to
.       emulate the TLS and synchronization mechanisms needed to
.       make D tick.  (Or at least emulate as many features as
.       possible.)
.   - Ex: Possible "no threading" target for people who don't need
.       threading but DO need other D features to be available NOW
.       on an alien platform.  Errors when the source code passed
.       into xdc assumes that threading features are present.

.- D compiler that is easy to hack on.
.   - "Looks like the problem it solves."
.       (To quote Walter's DConf2013 keynote.)
.   - Made of a bunch of patterns that describe
.       code rewrites/lowerings.
.   - Few or no null value checks necessary.
.      - null checks don't look like pattern matching or lowering.
.   - Few or no convoluted if-while-if-for-etc nests.
.      - These also don't look like pattern matching or lowering.
.   - It should be largely made of "pattern handlers" (see below).
.   - Each pattern handler will have one part that closely
.       resembles the AST fragment for the D code that it
.       recognizes, and another part that resembles the lowered
.       form that it outputs.
.   - Dependency analysis that prevents your AST manipulation from
.       happening either too early or too late.
.   - Because the code that actually does lowerings is generated
.       from a DSL, it is possible to make it automate a lot of
.       tedious tasks, like updating the symbol table when nodes
.       are added or removed from the AST.
.   - This makes it easier to test experimental features.

.- A step-by-step view of what the compiler is doing to your code.
.   - Since the semantic analysis of xdc would be composed of
.      "pattern handlers" (see below), then each time one of them
.      completes the compiler could output the result of calling
.      .toString() (or .toDCode() or whatever) on the entire AST.
.   - This could be attached to an ncurses interface that would be
.      activated by passing a flag to the compiler, which would
.      then proceed to show the AST at every stage of compilation.
.      Press ENTER to see the next step, etc.
.   - This could also be exposed as API functionality that IDEs
.      could use to show developers how the compiler sees their
.      code.

.- D code analysis engine that might be usable to automatically
.    translate D1 code into D2 code, or maybe D2 into D3 in the
.    far future.
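
The step-by-step view described above could be prototyped long before any ncurses front end exists. Below is a minimal sketch, assuming each rewrite step is simply a function from text to text. The names `traceRewrites`, `inferType` and `mangleType` are invented for illustration; a real xdc would presumably snapshot its AST rather than a string.

```d
import std.array : replace;

// A rewrite step: takes the program text, returns the rewritten text.
alias Rewrite = string function(string);

// Applies the steps in order and keeps a snapshot after each one,
// so a front end can page through them (ENTER for the next, etc).
string[] traceRewrites(string source, Rewrite[] steps)
{
    string[] snapshots = [source];
    foreach (step; steps)
    {
        source = step(source);
        snapshots ~= source;
    }
    return snapshots;
}

// Two stand-in "handlers" (hypothetical, text-based).
string inferType(string s)
{
    return s.replace("auto foo", "Templ!T foo");
}

string mangleType(string s)
{
    return s.replace("Templ!T foo", "_D5TemplMangleMangle foo");
}

unittest
{
    auto shots = traceRewrites("auto foo = Templ!T();",
                               [&inferType, &mangleType]);
    assert(shots.length == 3); // the input plus one snapshot per step
}
```

An IDE could consume the same `snapshots` array through an API instead of a terminal UI.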

== Architectural Overview ==

.- xdc will largely consist of "pattern handlers" that recognize
.    patterns in its AST and replace them with AST fragments that
.    contain successively fewer high-level features (lowering).
.   - These pattern handlers would feature a DSL that should make
.       the whole task fairly easy.
.   - The DSL proposed would be similar to regular expressions in
.       semantics but different in syntax.
.      - It will have operators for choice, repetition, optional
.          matches, capturing, and so on.
.      - The DSL must support nested structures well.
.      - The DSL must support vertical layout of patterns well.
.      - Because of the vertical patterns, most operators will
.          either be prefix or will be written in block style:
.          some_block_header { block_stmt1; block_stmt2; etc; }
.      - Actions like entering and leaving nodes are given their
.          own special syntax.  The machine will treat them like
.          tokens that can be matched the same as any AST node.
.          Notably, node-entry and node-exit do not require
.          introducing non-regular elements to the DSL.
.          node-entry and node-exit may be subsumed into
.          Deterministic Finite Automata (DFAs).
.   - An example pattern handler might look like this:

const lowerWhileStatement =
{
	// Apologies in advance if this isn't actually valid D code:
	//   This is a design sketch and I currently don't have a way to
	//   compile it.
	//
	// The Pattern template, PatternMatch template, and PatternHandler
	//   class have not yet been written.  This is an example of how I
	//   might expect them to be used.
	//

	auto consumes = "while_statement";
	auto produces = ["if_statement", "goto", "label"];

	auto recognizer = Pattern!
		"WhileStatement has
		{
			// Capture the conditional expression (call it \"expr\") and
			//   capture the loop body (call it \"statement\").
			.expression $expr;
			.statement  $statement has
			{
				// Capture any continue/break statements.
				any_amount_of {
					any_amount_of .; // Same as .* in regexes.
					one_of
					{
						ContinueStatement $continues;
						BreakStatement    $breaks;
					}
				}
				any_amount_of .;
			}
		}";

	auto action = (PatternMatch!(recognizer) m)
	{
		m.captures.add("uniqLoopAgain",
			getUniqueLabel(syntaxNode.enclosingScope));
		m.captures.add("uniqExitLoop",
			getUniqueLabel(syntaxNode.enclosingScope));

		// The recognizer defines m.getCapture!"continues" with:
		//   "ContinueStatement $continues;"
		// That line appears in a repetition context ("any_amount_of")
		//   and is therefore typed as an array.
		foreach( ref node; m.getCapture!"continues" )
			node.replaceWith( m, "GotoStatement has $uniqLoopAgain" );

		// Ditto for m.getCapture!"breaks" and "BreakStatement $breaks;".
		foreach( ref node; m.getCapture!"breaks" )
			node.replaceWith( m, "GotoStatement has $uniqExitLoop" );
	};

	auto synthesizer = Pattern!
		"Label has $uniqLoopAgain
		IfStatement has
		{
			OpNegate has $expr
			GotoStatement has $uniqExitLoop
		}
		$statement
		GotoStatement has $uniqLoopAgain
		Label has $uniqExitLoop
		";

	return new PatternHandler(produces, consumes, recognizer,
		action, synthesizer);
};

(Also available at: http://pastebin.com/0mBQxhLs )
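
For anyone who wants something that compiles today, here is a much smaller sketch of the same while-lowering. It does not use the proposed DSL at all: `Kind`, `Stmt`, `lower` and the label names are invented for this illustration, and the "AST" is a toy class tree that is flattened straight to lines of text.

```d
import std.conv : to;

enum Kind { raw, whileLoop, breakStmt, continueStmt }

// Toy statement node (hypothetical; not xdc's AST).
class Stmt
{
    Kind kind;
    string text;   // source text for raw, condition for whileLoop
    Stmt[] body_;  // loop body for whileLoop

    this(Kind kind, string text = "", Stmt[] body_ = null)
    {
        this.kind = kind;
        this.text = text;
        this.body_ = body_;
    }
}

// Rewrites while/break/continue into labels, ifs and gotos.
// againLabel/exitLabel name the labels of the innermost enclosing loop.
string[] lower(Stmt[] stmts, ref int counter,
               string againLabel = null, string exitLabel = null)
{
    string[] result;
    foreach (s; stmts)
    {
        final switch (s.kind)
        {
        case Kind.raw:
            result ~= s.text;
            break;
        case Kind.continueStmt:
            result ~= "goto " ~ againLabel ~ ";";
            break;
        case Kind.breakStmt:
            result ~= "goto " ~ exitLabel ~ ";";
            break;
        case Kind.whileLoop:
            // Unique labels per loop, like getUniqueLabel in the sketch above.
            string id = to!string(counter++);
            string loopAgain = "loopAgain" ~ id;
            string exitLoop = "exitLoop" ~ id;
            result ~= loopAgain ~ ":";
            result ~= "if (!(" ~ s.text ~ ")) goto " ~ exitLoop ~ ";";
            result ~= lower(s.body_, counter, loopAgain, exitLoop);
            result ~= "goto " ~ loopAgain ~ ";";
            result ~= exitLoop ~ ":";
            break;
        }
    }
    return result;
}

unittest
{
    int counter = 0;
    auto prog = [new Stmt(Kind.whileLoop, "i < 10",
                     [new Stmt(Kind.raw, "i++;"), new Stmt(Kind.breakStmt)])];
    auto lines = lower(prog, counter);
    assert(lines[0] == "loopAgain0:");
}
```

The recursion carries the enclosing loop's labels downward, which is the job the `$continues`/`$breaks` captures do in the pattern-based version.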

.- Dispatch to pattern handlers is performed by the execution of a
.    DFA/Packrat hybrid instead of the traditional OOP inheritance
.    with method calls.
.   - Each pattern handler's recognizer gets treated like a regex
.       or Parsing Expression Grammar (PEG) fragment.
.   - All of the recognizers in the same semantic pass are pasted
.       together in an ordered-choice expression.  The ordering is
.       determined by dependency analysis.
.   - A recognizer's pattern handler is invoked when the
.       recognizer's AST expression is matched.
.   - Once any substitutions are completed, then the machine
.       executing the pattern engine will set its cursor to the
.       beginning of the newly substituted AST nodes and continue
.       running.
.   - Executing potentially hundreds of pattern handlers in a
.       single ordered-choice expression would be obnoxious for a
.       packrat parser (packrat machine?).  Thankfully, ordered-
.       choice is possible in regular grammars, so it can be
.       lowered into regex operations and the whole thing turned
.       into a DFA.
.   - If pattern recognizers end up needing recursive elements,
.       then they will probably not appear at the very beginning
.       of the pattern.  Patterns with enough regular elements at
.       the start will be able to merge those regular elements
.       into the DFA with the rest of the pattern recognizers, and
.       it all becomes very fast table lookups in small tables.
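
The "set the cursor to the beginning of the substitution" behavior can be sketched without any DFA machinery. The following assumes a flat list of node names instead of a tree, and a linear scan over the rules instead of a compiled automaton; `Rule` and `rewrite` are invented names.

```d
// One rewrite rule: a node name and what replaces it (hypothetical).
struct Rule
{
    string match;
    string[] replacement;
}

// Scans the node list. At each position the rules are tried in order
// (ordered choice). After a substitution the cursor is NOT advanced,
// so the freshly inserted nodes are examined next.
// Assumes the rules do not rewrite in a cycle.
string[] rewrite(string[] nodes, Rule[] rules)
{
    size_t cursor = 0;
    while (cursor < nodes.length)
    {
        bool matched = false;
        foreach (r; rules)
        {
            if (nodes[cursor] == r.match)
            {
                nodes = nodes[0 .. cursor] ~ r.replacement
                      ~ nodes[cursor + 1 .. $];
                matched = true;
                break;
            }
        }
        if (!matched)
            ++cursor;
    }
    return nodes;
}

unittest
{
    auto rules = [Rule("foreach", ["while"]),
                  Rule("while", ["label", "if", "body", "goto", "label"])];
    auto result = rewrite(["decl", "foreach", "return"], rules);
    assert(result.length == 7); // foreach -> while -> lowered form, one scan
}
```

Because the cursor stays put, a chain such as foreach to while to if/goto is fully lowered in a single scan rather than needing one pass per link.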

.- This compiler would involve the creation of a parser-generator
.    API that allows code to programmatically create grammars, and
.    to do so without a bunch of clumsy string formatting and
.    string concatenation.
.   - These grammars could be written such that things like AST
.       nodes are seen as terminals.  This expands possibilities
.       and allows all of the pattern handlers to be coalesced
.       into a grammar that operates on ASTs and fires off
.       semantic actions whenever one of the recognizer patterns
.       gets tickled by the right AST fragment.
.   - Using strings as terminals is still cool; and necessary for
.       xdc's text/D-code parser.
.   - A simple parser-generator API example:

---------------------------------------
string makeParser()
{
	auto builder = new ParserBuilder!char;
	builder.pushSequence();
		builder.literal('x');
		builder.pushMaybe();
			builder.literal('y');
		builder.pop();
	builder.pop();
	return builder.toDCode("callMe");
}

const foo = makeParser();

pragma(msg, foo);
---------------------------------------
Current output:
http://pastebin.com/V3E0Ubbc
---------------------------------------

.   - Humans would probably never directly write grammars using
.       this API; it is intended for use by code that needs to
.       write grammars.  xdc would be such code: it's given a
.       bunch of pattern handlers and needs to turn them into a
.       grammar.
.   - This API could also make it easier to write the parser
.       generators that humans /would/ use. For example, it could
.       be used as an experimental backend for a regular
.       expression engine that can handle limited recursion.
.   - The packrats usually generated from PEGs are nice and all,
.       but I'd really like to generate DFAs whenever possible,
.       because those seem to be regarded as being /very fast/.
.   - DFAs can't handle the recursive elements of PEGs, but they
.       should be able to handle everything non-recursive that
.       precedes or follows the recursive elements.
.   - The parser-generator API would be responsible for
.       aggressively converting PEG-like elements into regex/DFA
.       elements whenever possible.
.   - Regular expressions can be embedded in PEGs as long as you
.       tell them how much text to match.  You have to give them
.       concrete success/failure conditions that can be determined
.       without help from the rest of the PEG: things like "match
.       as many characters as possible" or "match as few
.       characters as possible".  Without that, the regex's
.       backtracking (DFA'd or otherwise) won't mesh with the PEG.
.       Give it a concrete win/fail condition, however, and the
.       embedded regex becomes just another PEG building block
.       that chews through some source material and yields a
.       yes/no result.  Such regular expressions allow DFAs to be
.       introduced into a recursive descent or packrat parser.
.   - Many PEG elements can be converted into these well-behaved
.       regular expressions.
.      - PEG repetition is just regular expression repetition with
.          a wrapper around it that says "match as many characters
.          as possible".
.      - PEG ordered choice can be lowered into regular expression
.          unordered choice, which can then be converted into
.          DFAs:
.          I suspect that this is true:
.             (uv/xy)c == (uv|(^(uv)&xy))c
.          or, by De Morgan's law:
.             (uv/xy)c == (uv|(^(uv|^(xy))))c
.          & is intersection.
.          ^ is negation.
.          Each letter (u,v,x,y,c) can be a subexpression
.            (non-recursive).
.      - PEG label matching can be inlined up to the point where
.          recursion occurs, thus allowing more elements to be
.          considered for DFA conversion.
.      - etc.
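
To make the difference between PEG ordered choice and regex unordered choice concrete, here is a small sketch that compares the two on `(first/second)tail`. `pegMatch` and `regexMatch` are invented helpers, restricted to literal strings. One case worth checking the suspected identity against is when the first alternative is a prefix of the second: the PEG commits to the first alternative and fails, while the backtracking regex falls through to the second.

```d
import std.algorithm.searching : startsWith;
import std.regex : matchFirst, regex;

// PEG-style (first / second) tail over the whole input:
// the first alternative that matches is committed to, with no retry.
bool pegMatch(string input, string first, string second, string tail)
{
    string rest;
    if (input.startsWith(first))
        rest = input[first.length .. $];
    else if (input.startsWith(second))
        rest = input[second.length .. $];
    else
        return false;
    return rest == tail;
}

// Regex-style (first | second) tail over the whole input.
// Literals only in this sketch, so no escaping is attempted.
bool regexMatch(string input, string first, string second, string tail)
{
    auto re = regex("^(" ~ first ~ "|" ~ second ~ ")" ~ tail ~ "$");
    return !matchFirst(input, re).empty;
}

unittest
{
    // Both agree when the alternatives do not overlap at the start.
    assert(pegMatch("xyc", "uv", "xy", "c"));
    assert(regexMatch("xyc", "uv", "xy", "c"));

    // They disagree when "a" is a prefix of "ab".
    assert(!pegMatch("abc", "a", "ab", "c"));
    assert(regexMatch("abc", "a", "ab", "c"));
}
```

Any lowering from ordered choice to unordered choice has to reproduce the first behavior, which is what the negation term in the formula above is meant to achieve.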

.- The parser would be defined using a PEG (most likely using
.    Pegged specifically).
.   - Although Pegged is an awesome achievement, I suspect its
.       output could be improved considerably.  The templated code
.       it generates is slow to compile and ALWAYS allocates parse
.       tree nodes at every symbol.
.   - I want to experiment with making Pegged (or a branch of it)
.       emit DFA/Packrat parser hybrids.  This could be done by
.       making a version of Pegged that uses the aforementioned
.       parser-generator API to create its parsers.
.   - Design principle:  avoid memory allocations like the plague.
.       The output should be a well-pruned AST, and not just a
.       parse tree that causes a bunch of allocations and needs
.       massaging to become useful.
.   - I really like Pegged and would contribute this stuff upward,
.       if accepted.

.- When hacking on xdc, you don't need to be aware of WHEN your
.    code gets executed in semantic analysis.  The dependency
.    analysis will guarantee that it always gets performed both
.    (a) when it's needed, and (b) when it has what it needs.
.   - This is what the "consumes" and "produces" variables are all
.       about in the above example.

.- Successfully lowering a D AST into the target backend's input
.    will almost certainly require multiple passes.  xdc's
.    dependency analyzer would automatically minimize the number
.    of passes by looking for patterns that are "siblings" in the
.    dependency graph (eg. neither depends on the other) and
.    bunching as many such patterns as possible into each pass.
.   - It really shouldn't generate very many more than the number
.       of passes that DMD has coded into it.  Ideally: no more
.       than DMD, if not fewer.
.   - I'd like to make the dependency analyzer output a graph that
.       can be used to track which patterns cause which passes to
.       exist, and show which patterns are in which passes.
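
The pass-bunching idea can be sketched as a layered topological sort. This assumes one particular reading of "consumes" and "produces": handler A must run in a later pass than handler B if B produces something A consumes. `Handler`, `dependsOn` and `planPasses` are invented names, not existing xdc code.

```d
import std.exception : enforce;

struct Handler
{
    string name;
    string[] consumes;
    string[] produces;
}

// True if b produces a feature that a consumes.
bool dependsOn(const Handler a, const Handler b)
{
    foreach (need; a.consumes)
        foreach (made; b.produces)
            if (need == made)
                return true;
    return false;
}

// Puts each handler in the earliest pass that comes after all the
// handlers it depends on. "Siblings" in the dependency graph share a pass.
string[][] planPasses(Handler[] handlers)
{
    auto pass = new size_t[](handlers.length);
    bool changed = true;
    while (changed)
    {
        changed = false;
        foreach (i, a; handlers)
            foreach (j, b; handlers)
                if (i != j && dependsOn(a, b) && pass[i] <= pass[j])
                {
                    pass[i] = pass[j] + 1;
                    // A DAG of n handlers never needs more than n passes.
                    enforce(pass[i] < handlers.length,
                            "cyclic dependency between handlers");
                    changed = true;
                }
    }

    size_t passCount = 0;
    foreach (p; pass)
        if (p + 1 > passCount)
            passCount = p + 1;

    auto result = new string[][](passCount);
    foreach (i, h; handlers)
        result[pass[i]] ~= h.name;
    return result;
}

unittest
{
    auto plan = planPasses([
        Handler("lowerForeach", ["foreach_statement"], ["while_statement"]),
        Handler("lowerWhile", ["while_statement"], ["if_statement", "goto", "label"]),
        Handler("lowerScopeExit", ["scope_exit"], ["try_finally"])]);
    assert(plan.length == 2);
}
```

The returned array doubles as the "which patterns are in which passes" report: index is the pass number, contents are the handlers in it.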

.- Planned availability of backends.
.   - My first pick for a backend would be an ANSI C89 target.
.       I feel that this would give it the most reach.
.   - The interpreter backend is along for the ride, as mentioned.
.   - Because the semantic analysis is composed of distinct and
.       loosely-coupled patterns, it is possible for xdc to
.       generate an analysis chain with the minimum number of
.       lowerings needed for a given backend.
.      - The interpreter backend would benefit from having the
.          most lowerings.  By requiring a lot of lowering, the
.          interpreter would only need to support a small number
.          of constructs:
.         - if statements
.         - gotos
.         - function calls
.         - arithmetic expression evaluation
.         - builtin types (byte, short, int, long, float, etc)
.         - pointers
.         - Even structs are unnecessary: they can be seen as
.             typed dereferencing of untyped pointers.
.      - The C backend would benefit from slightly less lowering
.         than the interpreter backend.  It is useful for
.         debugging if you can mostly-sorta read the resulting
.         C code, and your C compiler will appreciate the extra
.         optimization opportunities.
.         - Looping constructs like while and for are welcome
.             here.
.         - structs would be more readable.

.      - A bytecode backend (ex: Java) would probably want a
.          different set of lowerings in later passes.
.         - Pointers are no longer considered "low".
.         - Classes should be kept as long as possible;
.             I'm pretty sure the bytecode (at least for Java)
.             has opcodes dedicated to classes.  Removing them
.             may cause pessimisation.
.      - The backend writer should not have to worry about
.          rewriting the semantic analysis to suit their needs.
.          They just define some features and say which ones they
.          need available in the AST, and xdc's semantic-analysis-
.          generator will handle the rest.
.   - Notably, a backend should just be more lowerings, with the
.       result being text or binary code instead of AST nodes.
.      - Backends are essentially defined by the set of
.          AST/language features that they consume and any special
.          lowerings needed to convert generic AST/language
.          features into backend-specific AST/language features.
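
As a rough illustration of "a backend is defined by the features it consumes", the sketch below picks which lowerings a backend needs from the set of features it accepts. `Lowering` and `neededLowerings` are invented names, and the selection is deliberately naive: it does not check that what a lowering produces is itself acceptable to the backend.

```d
import std.algorithm.searching : canFind;

// A lowering removes one feature from the AST (hypothetical model).
struct Lowering
{
    string name;
    string consumes;
}

// A lowering is needed only if the backend cannot take the feature as-is.
string[] neededLowerings(Lowering[] all, string[] backendAccepts)
{
    string[] needed;
    foreach (l; all)
        if (!backendAccepts.canFind(l.consumes))
            needed ~= l.name;
    return needed;
}

unittest
{
    auto all = [Lowering("lowerWhile", "while_statement"),
                Lowering("lowerStruct", "struct"),
                Lowering("lowerScopeExit", "scope_exit")];

    // The interpreter wants almost everything lowered.
    auto interp = neededLowerings(all, ["if_statement", "goto", "label"]);
    assert(interp.length == 3);

    // The C backend keeps loops and structs for readability.
    auto c = neededLowerings(all, ["if_statement", "goto", "label",
                                   "while_statement", "struct"]);
    assert(c.length == 1);
}
```

A backend author would then only supply the accepted-feature list plus any backend-specific lowerings.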


== Closing Thoughts ==

I am realizing that there are multiple reasons that compel me to 
write this document:
- To share my ideas with others, on the off-chance that someone 
else might see this vision too and be better equipped to deliver.
- To suggest capabilities that any community-endorsed compiler 
tool (ex: compiler-as-a-ctfe-library) should have.
- To see if I might be able to get the help I need to make it a 
reality.

I just can't decide which reasons are more important.  But there 
is a common thread: I want this vision to become reality and do 
really cool things while filling a bunch of missing links in D's 
ecosystem.

I have to ask:

Would you pay for this?
If so, then I might be able to do a kickstarter at some point.
I am not independently wealthy or retired (or both?) like Walter, 
nor am I able to survive on zero hours of sleep each night like 
Andrei, and this would be a big project.  I think it would need 
full-time attention or it would never become useful in a 
reasonable timeframe.

Also, assuming you understand the design, are there any gaping 
holes in this?
This is my first attempt to share these ideas with a larger 
group, and thus an opportunity to anticipate troubles.

...

Well, I'm anxious to see how well the venerable D community 
receives this bundle of ideas.  Be chatty.  I'll try to keep up.

Thank you for reading.

Jul 17 2013

"Kagamin" <spam here.lot> writes:

llvm should be platform-independent enough, so you have ldc. But 
the problem with D is that it has more features than C/C++, so if 
you output C code you can only use C features, or you have to 
implement druntime+phobos for the target platform, which doesn't 
come for free.

Jul 18 2013

"Chad Joan" <chadjoan gmail.com> writes:

On Thursday, 18 July 2013 at 10:26:24 UTC, Kagamin wrote:
 llvm should be platform-independent enough, so you have ldc. 
 But the problem with D is that it has more features than C/C++, 
 so if you output C code you can only use C features, or you 
 have to implement druntime+phobos for the target platform, 
 which doesn't come for free.

I think a C backend would get us farther than an LLVM backend.  
Imagine targeting one of the ubiquitous ARM targets like Android, 
and its gazillion variants.  LLVM has an ARM target I'm pretty 
sure, but the Android community uses GCC as their compiler.  This 
puts LLVM/LDC on Android into a "may or may not work" category 
(unless they've done it already while I wasn't looking).  
Emitting C/C++ code and feeding it to the Android NDK though: 
that will work for sure, or at least land incredibly close.  
Rinse and repeat for PIC, Cypress, iPhone, XBox (any of them), 
Wii, Ouya, PS3, PS4, 3DS, and whatever else might come out a year 
from now.  I suspect that LLVM will be missing support for a 
bunch of those, and will lag significantly on new platforms 
unless it just happens to exist in their "ecosystem".

I think that LLVM might have a maintained C backend again.  It 
had been unmaintained/unsupported for a while.  If someone thinks 
that using LLVM's C backend makes more sense, then they should do 
it with LDC.  I'd be really happy about it, because I want that 
functionality.

I am not interested in using LLVM as my /first/ backend in xdc 
for a combination of the above reasons and their implications:
.- LLVM emitting native code for xdc will require someone
.    knowledgeable about LLVM to set things up and potentially
.    spend time on it if the target platform requires tweaking.
.    Getting handed a file full of C code requires only knowledge
.    of how to use the C compiler on the target system.
.- LLVM emitting C code should probably be done by ldc instead.
.- LLVM emitting C code would also add unnecessary layers:
.    D->AST->LLVM_IR->C vs D->AST->C.
.    This extra layer can lose information.

You are right that druntime+phobos don't come for free.  Doing 
complete ports of those to platform X is outside the scope of 
this project.  I do intend to salvage whatever parts of them I 
can in a portable manner.  Moreover, I intend xdc to have the 
capability to toggle druntime/phobos features on and off easily 
based on existing support for the intended platform.  The 
C-Windows target would probably have more druntime/phobos 
features enabled than C-Android target.  I'd like to make sure 
that anyone can use what's available, and if not, still be 
allowed to access the platform.  I don't think it makes sense 
that I'd be completely unable to write iPhone code just because 
there isn't GC/threading code available for it in the runtime 
yet; I should be able to do some non-trivial things without 
those.  Note that a reference-counting implementation would be 
portable everywhere that doesn't have a native GC, so you'd never 
have anything less than that (unless you intentionally ditched 
it, ex: memory limited microcontrollers where things are 
statically allocated anyways).

To be more concrete, I intend to make it possible to invoke xdc 
with "hints", like so:
xdc --emit=C89 --hostc=gcc --os=windows --threads=pthreads main.d











Alternatively, it could be invoked like this:
xdc --emit=C89 main.d





Even with a conservative target like C89-only, there are still an 
incredibly large number of extremely useful D features (OOP, 
templates, scope(exit), CTFE, mixins, ranges, op-overloading, 
etc) that DO come for free.  These can be lowered into C code 
that does the same thing but looks uglier.  Such lowered code 
would take much longer for a human to write and become tedious to 
maintain, but a compiler can do it tirelessly.

Jul 19 2013

"Tove" <tove fransson.se> writes:

On Friday, 19 July 2013 at 13:38:12 UTC, Chad Joan wrote:
 Even with a conservative target like C89-only, there are still 
 an incredibly large number of extremely useful D features (OOP, 
 templates, scope(exit), CTFE, mixins, ranges, op-overloading, 
 etc) that DO come for free.

I love the idea behind xdc, but I would go with C99 instead, even 
MS as the last vendor(?), with VS2013 now finally committed to 
supporting C99, variable length arrays alone would make it worth 
it.

Jul 19 2013

"Chad Joan" <chadjoan gmail.com> writes:

On Friday, 19 July 2013 at 15:39:36 UTC, Tove wrote:
 On Friday, 19 July 2013 at 13:38:12 UTC, Chad Joan wrote:
 Even with a conservative target like C89-only, there are still 
 an incredibly large number of extremely useful D features 
 (OOP, templates, scope(exit), CTFE, mixins, ranges, 
 op-overloading, etc) that DO come for free.

 I love the idea behind xdc, but I would go with C99 instead, 
 even MS as the last vendor(?), with VS2013 now finally 
 committed to supporting C99, variable length arrays alone would 
 make it worth it.

I am, in fact, totally willing to make --emit=C99 do cool things 
once I discover what those cool things are.  Otherwise it will 
probably emit code that is both C89 and C99 compliant.

I feel that most of the D features that can be implemented with a 
C99 compiler can be implemented with C89 as well, and C89 might 
give more reach into esoteric targets like microcontrollers or 
legacy systems.

Maybe I should ask this: what D features are you afraid of losing 
due to lowering into C89?

...

I hate to sour whatever cheerful expectations you might have, 
but, AFAIK, D doesn't have VLAs.  Consequently, I don't think xdc 
would need them.  I hear that VLA support is becoming optional in 
C11 as well, which doesn't bode well for its potential existence 
in the future, see 
http://en.wikipedia.org/wiki/Variable-length_array

"Programming languages that support VLAs include [...] C99 (and 
subsequently in C11 relegated to a conditional feature which 
implementations aren't required to support; ..."

It is important to note that even if D /did/ have VLAs, then it 
would be possible to lower them into other constructs:

void foo(int n)
{
	char[n] vla;
	...
}

-= can be emulated by =-

void foo(int n)
{
	char[] vla = (cast(char*)core.stdc.stdlib.malloc(n))[0..n];
	scope(exit) core.stdc.stdlib.free(vla.ptr);
	...
}

This would be important for emitting to any language without 
VLAs.  It could be used if someone wanted to add VLAs to xdc as 
an (off-by-default) experimental feature.  And of course it 
should use "new T[n]" with core.memory.GC.free if the array 
contains pointers, or stdlib is unavailable (ex: Java).  I would 
also expect --emit=C99 to avoid the heap allocation.  alloca 
might be usable with C89 in some cases, but is very dangerous (in 
my experience).
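
A compilable version of the malloc-based emulation, as a sketch: `fillAndSum` is an invented function that stands in for "some function with a VLA in it", and it adds the null check that the fragment above leaves out.

```d
import core.stdc.stdlib : free, malloc;

// Emulates `char[n] vla;` with malloc plus scope(exit).
// Returns the sum of the buffer contents, or -1 if allocation failed.
int fillAndSum(size_t n, char value)
{
    auto p = cast(char*) malloc(n);
    if (p is null)
        return -1;
    scope(exit) free(p); // runs on every exit path, like leaving the VLA's scope

    char[] vla = p[0 .. n];
    vla[] = value;

    int sum = 0;
    foreach (c; vla)
        sum += c;
    return sum;
}

unittest
{
    assert(fillAndSum(4, 'A') == 4 * 'A');
}
```

The `scope(exit)` keeps the release next to the allocation, so early returns and exceptions added later do not leak the buffer.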

----

I had a friend of mine who is an expert C programmer poke me 
about this C89 vs C99 thing as well.  It's strange to me because 
I thought that emitting C89 was actually a strong selling point: 
you're getting D which is so much more than C99, AND it will 
reach all of the old C89 compilers.

If there are still things that you (community inclusive) are 
afraid of missing, then I am pretty willing to do C99 instead and 
skip C89.  I will need to know what they are though, or it won't 
make a difference anyway (I can't implement what I don't know 
about!).

HTH.

Jul 19 2013

"Kagamin" <spam here.lot> writes:

On Saturday, 20 July 2013 at 04:18:41 UTC, Chad Joan wrote:
 If there are still things that you (community inclusive) are 
 afraid of missing, then I am pretty willing to do C99 instead 
 and skip C89.

A standard _Align attribute? You need it, right?

Jul 23 2013

"Chad Joan" <chadjoan gmail.com> writes:

On Tuesday, 23 July 2013 at 15:24:39 UTC, Kagamin wrote:
 On Saturday, 20 July 2013 at 04:18:41 UTC, Chad Joan wrote:
 If there are still things that you (community inclusive) are 
 afraid of missing, then I am pretty willing to do C99 instead 
 and skip C89.

 A standard _Align attribute? You need it, right?

I didn't know that was standard in C99.

I'm looking through ISO/IEC 9899:1999 (n1256) and not finding it. 
  That'd be cool to know about; any chance you can point it out?

At any rate, I'm actually not sure if you mean member alignment 
or memory alignment, but I'm pretty sure both are doable using 
char pointer arithmetic and casting.

Hmmm, member alignment would be annoying, but still doable:

// D
struct Foo
{
align(1):
	ubyte a;
	ushort b;
	ubyte c;
}

int main()
{
	Foo f;
	f.a = 1;
	f.b = 2;
	f.c = 3;
	return 1;
}

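/* C89 */
#include <string.h>

int main()
{
	unsigned char f[4];
	unsigned short b = 2;         /* assumes a 16-bit unsigned short */
	f[0] = 1;                     /* f.a at offset 0 */
	memcpy(f + 1, &b, sizeof b);  /* f.b at offset 1, no unaligned store */
	f[3] = 3;                     /* f.c at offset 3 */
	return 1;
}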

Caveat: untested code written in a couple minutes.
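
On the D side, the layout that the byte-buffer version has to reproduce can be pinned down with static asserts. This is a sketch; `rawBytes` is an invented helper that shows the struct as the four bytes the lowered C code would manipulate.

```d
struct Foo
{
align(1):
    ubyte a;
    ushort b;
    ubyte c;
}

// The layout that the hand-written byte buffer assumes.
static assert(Foo.a.offsetof == 0);
static assert(Foo.b.offsetof == 1);
static assert(Foo.c.offsetof == 3);
static assert(Foo.sizeof == 4);

// Views a Foo as raw bytes, the way the lowered C code would see it.
ubyte[Foo.sizeof] rawBytes(Foo f)
{
    return *cast(ubyte[Foo.sizeof]*) &f;
}

unittest
{
    Foo f;
    f.a = 1;
    f.b = 2;
    f.c = 3;
    auto bytes = rawBytes(f);
    assert(bytes[0] == 1 && bytes[3] == 3);
    // bytes[1] and bytes[2] hold b; their order depends on endianness.
}
```

A C89 backend could compute these offsets at compile time and emit them as the constants used in the pointer arithmetic.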

Jul 23 2013

"Kagamin" <spam here.lot> writes:

It's probably C11. It allows only enlarging the alignment, 
because it's not cross-platform the other way.

Jul 25 2013

"Joakim" <joakim airpost.net> writes:

On Friday, 19 July 2013 at 13:38:12 UTC, Chad Joan wrote:
 Imagine targeting one of the ubiquitous ARM targets like 
 Android, and its gazillion variants.  LLVM has an ARM target 
 I'm pretty sure, but the Android community uses GCC as their 
 compiler.  This puts LLVM/LDC on Android into a "may or may not 
 work" category (unless they've done it already while I wasn't 
 looking).

A small correction, the Android NDK added clang/llvm support last 
November in revision 8c:

http://developer.android.com/tools/sdk/ndk/index.html

gcc is still the default, but clang is an alternative option.  I 
don't think ldc has been ported to use whatever llvm libraries 
the NDK is providing, but there is some official support for llvm 
on Android now.

Jul 19 2013

"Chad Joan" <chadjoan gmail.com> writes:

On Friday, 19 July 2013 at 16:42:32 UTC, Joakim wrote:
 On Friday, 19 July 2013 at 13:38:12 UTC, Chad Joan wrote:
 Imagine targeting one of the ubiquitous ARM targets like 
 Android, and its gazillion variants.  LLVM has an ARM target 
 I'm pretty sure, but the Android community uses GCC as their 
 compiler.  This puts LLVM/LDC on Android into a "may or may 
 not work" category (unless they've done it already while I 
 wasn't looking).

 A small correction, the Android NDK added clang/llvm support 
 last November in revision 8c:

 http://developer.android.com/tools/sdk/ndk/index.html

 gcc is still the default, but clang is an alternative option.  
 I don't think ldc has been ported to use whatever llvm 
 libraries the NDK is providing, but there is some official 
 support for llvm on Android now.

Cool, thanks for the info.

It's been a while since I've looked at the Android NDK.  It's 
just not fun to me without D being there ;)

Jul 19 2013

"Kai Nacke" <kai redstar.de> writes:

On Friday, 19 July 2013 at 13:38:12 UTC, Chad Joan wrote:
 I think a C backend would get us farther than an LLVM backend.

Hi,

LLVM has a C++ backend in the git tree. The old C backend is 
still maintained outside the git tree (search the dev mailing 
list for a URL).

So if you like C-output, you can start with LDC today. For sure, 
you have to port druntime to this new environment...

Regards,
Kai

Nov 11 2013

"Chad Joan" <chadjoan gmail.com> writes:

On Monday, 11 November 2013 at 08:11:06 UTC, Kai Nacke wrote:
 On Friday, 19 July 2013 at 13:38:12 UTC, Chad Joan wrote:
 I think a C backend would get us farther than an LLVM backend.

 Hi,

 LLVM has a C++ backend in the git tree. The old C backend is 
 still maintained outside the git tree (search the dev mailing 
 list for a URL).

 So if you like C-output, you can start with LDC today. For 
 sure, you have to port druntime to this new environment...

 Regards,
 Kai

Is there any built-in support for using this C++ backend in LDC 
right now?  Something like "ldc --target=c++ main.d -o main.cpp"?

This seems very promising.

I would still want to write xdc for other reasons, but I have 
more immediate use for a D compiler that can output C/C++ code.  
If it works, I would probably down-prioritize C/C++ output in xdc 
and instead retarget on something more needed (like Java bytecode 
or Javascript).

Nov 11 2013

"David Nadlinger" <code klickverbot.at> writes:

On Tuesday, 12 November 2013 at 01:32:16 UTC, Chad Joan wrote:
 On Monday, 11 November 2013 at 08:11:06 UTC, Kai Nacke wrote:
 On Friday, 19 July 2013 at 13:38:12 UTC, Chad Joan wrote:
 I think a C backend would get us farther than an LLVM backend.

 Hi,

 LLVM has a C++ backend in the git tree. The old C backend is 
 still maintained outside the git tree (search the dev mailing 
 list for a URL).

 So if you like C-output, you can start with LDC today. For 
 sure, you have to port druntime to this new environment...

 Regards,
 Kai

 Is there any built-in support for using this C++ backend in LDC 
 right now?  Something like "ldc --target=c++ main.d -o 
 main.cpp"?

Careful: The "cpp" LLVM backend actually creates C++ code that 
constructs the corresponding LLVM IR and is mostly useful for 
developers working on LLVM-based compilers.

But as Kai mentioned, there is also a backend that emits equivalent 
C. Last time I checked, it was still being worked on, even though 
it isn't in the official LLVM source tree.

David

Nov 12 2013

"cal" <callumenator gmail.com> writes:

On Thursday, 18 July 2013 at 03:26:10 UTC, Chad Joan wrote:
[...]

Is the input to xdc a semantically-analyzed D AST, or does 
semantic analysis occur during pattern-matching/lowering?

Jul 19 2013

"Chad Joan" <chadjoan gmail.com> writes:

On Saturday, 20 July 2013 at 04:38:20 UTC, cal wrote:
 On Thursday, 18 July 2013 at 03:26:10 UTC, Chad Joan wrote:
 [...]

 Is the input to xdc a semantically-analyzed D AST, or does 
 semantic analysis occur during pattern-matching/lowering?

The latter.

xdc would accept D code as text input (.d files) and parse it to 
produce its own AST.  Semantic analysis is then done by matching 
patterns in the AST and doing substitutions until all that's left 
are the AST nodes the backend wants.  The backend then matches 
patterns and emits the desired output (instead of substituting 
AST nodes).

Jul 19 2013

"deadalnix" <deadalnix gmail.com> writes:

On Saturday, 20 July 2013 at 04:44:38 UTC, Chad Joan wrote:
 On Saturday, 20 July 2013 at 04:38:20 UTC, cal wrote:
 On Thursday, 18 July 2013 at 03:26:10 UTC, Chad Joan wrote:
 [...]

 Is the input to xdc a semantically-analyzed D AST, or does 
 semantic analysis occur during pattern-matching/lowering?

 The latter.

 xdc would accept D code as text input (.d files) and parse it 
 to produce its own AST.  Semantic analysis is then done by 
 matching patterns in the AST and doing substitutions until all 
 that's left are the AST nodes the backend wants.  The backend 
 then matches patterns and emits the desired output (instead of 
 substituting AST nodes).

I'm not sure how you'll handle all compile time features.

Jul 19 2013

"Chad Joan" <chadjoan gmail.com> writes:

On Saturday, 20 July 2013 at 06:28:03 UTC, deadalnix wrote:
 On Saturday, 20 July 2013 at 04:44:38 UTC, Chad Joan wrote:
 On Saturday, 20 July 2013 at 04:38:20 UTC, cal wrote:
 On Thursday, 18 July 2013 at 03:26:10 UTC, Chad Joan wrote:
 [...]

 Is the input to xdc a semantically-analyzed D AST, or does 
 semantic analysis occur during pattern-matching/lowering?

 The latter.

 xdc would accept D code as text input (.d files) and parse it 
 to produce its own AST.  Semantic analysis is then done by 
 matching patterns in the AST and doing substitutions until all 
 that's left are the AST nodes the backend wants.  The backend 
 then matches patterns and emits the desired output (instead of 
 substituting AST nodes).

 I'm not sure how you'll handle all compile time features.

To be honest, I hadn't yet written down what these would look 
like.  However, I did have an idea of what I wanted them to look 
like.

So, I will try to write down my thoughts on the subject.  Here, 
have a wall of text ;)

For Templates:

Note that the machine running the pattern-match-and-replace is 
really only constrained to a DFA/Packrat when it is recognizing.  
Once a pattern is recognized, D code may be invoked to do a 
(hopefully minimal) amount of computation.  Also, when a pattern 
is substituted, the cursor may return to the beginning of the 
substitution.

The set-the-cursor-at-beginning-of-substitution thing is something 
I'm not entirely sure of yet, but it seems like a good way to 
avoid an explosion in the number of passes:
<cursor>auto foo = Templ!T();
<cursor>Templ!T foo = Templ!T();
<cursor>_D5TemplMangleMangle foo = Templ!T();
_D5TemplMangleMangle foo = <cursor>Templ!T();
_D5TemplMangleMangle foo = <cursor>Templ!T.__ctor();
_D5TemplMangleMangle foo = 
<cursor>_D5TemplMangleMangle6__ctorMangle();
...
The real deal wouldn't omit so many steps, but hopefully this 
conveys the usefulness.
Now, that /may/ be useful in template instantiation.  It is 
mostly for convenience though.  Still, it is noteworthy in that 
it is not in the conventional realm of 
DFAs/Packrats/Formal-Language-Theory: those things usually do not 
talk about what happens when the input is modified.  In other 
words, the use of DFAs/packrats in semantic analysis does not 
limit its computational power.
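
A toy version of that cursor behaviour can be sketched on plain strings instead of AST nodes. Everything here (the Rule struct, the rewrite function, the rule list) is made up for illustration and is not part of any existing xdc code; it only shows the idea of leaving the cursor at the start of a substitution so that follow-up rules fire in the same pass. It assumes the rules eventually stop matching, otherwise the loop would not terminate.

```d
import std.algorithm.searching : startsWith;
import std.stdio : writeln;

/// Hypothetical rewrite rule: replace `from` with `to` at the cursor.
struct Rule { string from; string to; }

/// Scans left to right.  After a substitution the cursor is NOT advanced,
/// so the freshly substituted text is immediately re-examined.
string rewrite(string text, Rule[] rules)
{
    size_t cursor = 0;
    while (cursor < text.length)
    {
        bool substituted = false;
        foreach (rule; rules)
        {
            if (text[cursor .. $].startsWith(rule.from))
            {
                text = text[0 .. cursor] ~ rule.to
                     ~ text[cursor + rule.from.length .. $];
                substituted = true;
                break; // stay put: re-scan the substitution
            }
        }
        if (!substituted)
            ++cursor; // nothing matched here, move on
    }
    return text;
}

void main()
{
    auto rules = [
        Rule("auto foo", "Templ!T foo"),
        Rule("Templ!T foo", "_D5TemplMangleMangle foo"),
    ];
    writeln(rewrite("auto foo = Templ!T();", rules));
}
```

Both rules fire before the cursor ever leaves position zero, which is the "fewer passes" effect described above.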

The more important thing for templates though is this: that part 
where the D code may be invoked after pattern recognition.  And 
it's important because it allows you to do more 
recognize-and-substitute without losing your place.  It allows a 
kind of recursion.  See the insides of the example 
"lowerValueTemplateInstantiation" handler later in this post.

To be more concrete, I present a step-by-step followed by some 
examples of what I think the code might look like.

=================================================================

So imagine you have a template to be instantiated:
template Fib(uint i)
{
     static if ( i <= 1 )
         const Fib = i;
     else
         const Fib = Fib!(i-1) + Fib!(i-2);
}

Somewhere else, this appears:
     writefln("Fib == %s", Fib!4);

Suppose the parameter "i" is 4.  Then we really want to end up 
substituting this template with the following line:
   const _D4main11__T3FibVi4Z3Fibxk = 3;
and it should also emit these, as a side effect:
   const _D4main11__T3FibVi0Z3Fibxk = 0;
   const _D4main11__T3FibVi1Z3Fibxk = 1;
   const _D4main11__T3FibVi2Z3Fibxk = 1;
   const _D4main11__T3FibVi3Z3Fibxk = 2;
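
As a sanity check on those values, the same template can be fed to an existing D compiler. This is only a sketch: it uses enum instead of const for the eponymous member (an assumption about what a current compiler prefers), and the static asserts simply restate the five constants listed above.

```d
template Fib(uint i)
{
    static if (i <= 1)
        enum uint Fib = i;
    else
        enum uint Fib = Fib!(i - 1) + Fib!(i - 2);
}

// The values the lowering is expected to arrive at.
static assert(Fib!0 == 0);
static assert(Fib!1 == 1);
static assert(Fib!2 == 1);
static assert(Fib!3 == 2);
static assert(Fib!4 == 3);

void main()
{
    import std.stdio : writefln;
    writefln("Fib == %s", Fib!4);
}
```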

To save time and not bore you, I will cheat and set the cursor
at the template instantiation and skip all the writefln stuff.
Our lowering proceeds like so:

// We begin!

writefln("Fib == %s", <cursor>Fib!4 );

// "lowerValueTemplateInstantiation" catches the Fib!4.
// Make new xdc context; jump to the template declaration.
// Initial state:

template Fib(uint i)
{
     static if ( i <= 1 )
         const Fib = i;
     else
         const Fib = Fib!(i-1) + Fib!(i-2);
}

// Substitute i using "substituteTemplateParams"

template Fib(uint i)
{
     static if ( 4 <= 1 )
         const Fib = 4;
     else
         const Fib = Fib!(4-1) + Fib!(4-2);
}

// Invoke "lowerTemplateDecl"

<cursor>template Fib(uint i)
{
     static if ( 4 <= 1 )
         const Fib = 4;
     else
         const Fib = Fib!(4-1) + Fib!(4-2);
}

/****************
But wait, "lowerTemplateDecl" doesn't meet its constraint yet. 
consumes = "!static_if, !static_foreach" is not satisfied.
****************/
// Invoke "lowerStaticIf" (sorry, not written yet)

template Fib(uint i)
{
     <cursor>static if ( 4 <= 1 )
         const Fib = 4;
     else
         const Fib = Fib!(4-1) + Fib!(4-2);
}

...

template Fib(uint i)
{
     static if ( <cursor>4 <= 1 )
         const Fib = 4;
     else
         const Fib = Fib!(4-1) + Fib!(4-2);
}

// "lowerStaticIf" needs a literal here, not an expression.
// Invoke "constantFold" (sorry, not written yet. Uses CTFE.)

template Fib(uint i)
{
     <cursor>static if ( false )
         const Fib = 4;
     else
         const Fib = Fib!(4-1) + Fib!(4-2);
}

// "lowerStaticIf" may now proceed and finish.

template Fib(uint i)
{
     const Fib = Fib!(4-1) + Fib!(4-2);
}

// "lowerTemplateDecl"'s !static_if constraint is now satisfied
// It mangles the constant identifier and moves it to the root.

const _D4main11__T3FibVi4Z3Fibxk = Fib!(4-1) + Fib!(4-2);

/****************
The newly substituted declaration gets subjected to further 
reductions, as that's what happens after a substitution.  Another 
pattern, let's call it "lowerConstDecl", notices the constant 
declaration sitting there with an /expression/ (oh dear) instead 
of a literal.  It invokes "constantFold".
****************/
// This starts the whole process over again.  Repeatedly.

const _D4main11__T3FibVi0Z3Fibxk = 0;
const _D4main11__T3FibVi1Z3Fibxk = 1;

const _D4main11__T3FibVi2Z3Fibxk =
     _D4main11__T3FibVi1Z3Fibxk + _D4main11__T3FibVi0Z3Fibxk;

const _D4main11__T3FibVi3Z3Fibxk =
     _D4main11__T3FibVi2Z3Fibxk + _D4main11__T3FibVi1Z3Fibxk;

const _D4main11__T3FibVi4Z3Fibxk =
     _D4main11__T3FibVi3Z3Fibxk + _D4main11__T3FibVi2Z3Fibxk;

// Constant folding continues.

const _D4main11__T3FibVi0Z3Fibxk = 0;
const _D4main11__T3FibVi1Z3Fibxk = 1;
const _D4main11__T3FibVi2Z3Fibxk = 1;
const _D4main11__T3FibVi3Z3Fibxk = 2;
const _D4main11__T3FibVi4Z3Fibxk = 3;

/****************
There is nothing left to do here, so we return to the previous 
context.
****************/

writefln("Fib == %s", <cursor>Fib!4 );

// The instantiation figures out the mangling.

writefln("Fib == %s", <cursor>_D4main11__T3FibVi4Z3Fibxk );

// Done.  (for now)

writefln("Fib == %s", _D4main11__T3FibVi4Z3Fibxk );
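
The "emit constants as a side effect" part of the walkthrough can be imitated at run time. The sketch below assumes that instantiations are remembered by mangled name so each one is lowered only once; the walkthrough shows each constant only once but does not say how that is achieved. The names emitted, mangleFib and instantiateFib are invented for this example, and the mangling helper only handles single-digit arguments.

```d
import std.format : format;
import std.stdio : writefln;

/// Hypothetical stand-in for the constants already moved to the AST root.
uint[string] emitted;

/// Mangled name in the style used above (single-digit arguments only).
string mangleFib(uint i)
{
    return format("_D4main11__T3FibVi%sZ3Fibxk", i);
}

/// "Instantiates" Fib!i: reuses an emitted constant when one exists,
/// otherwise folds the body and records the new constant.
uint instantiateFib(uint i)
{
    immutable name = mangleFib(i);
    if (auto cached = name in emitted)
        return *cached;
    immutable value = (i <= 1)
        ? i
        : instantiateFib(i - 1) + instantiateFib(i - 2);
    emitted[name] = value;
    return value;
}

void main()
{
    import std.algorithm.sorting : sort;
    instantiateFib(4);
    foreach (name; emitted.keys.sort())
        writefln("const %s = %s;", name, emitted[name]);
}
```

Asking for Fib!4 leaves five entries in the table, one per constant listed in the final state above.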

=================================================================
== The more central code in all of this might look like so:

const lowerValueTemplateInstantiation =
{
     auto consumes = "value_template_instantiation";
     auto produces = "";

     auto recognizer = Pattern!
         // Ex: main.Fib!(4)
         "ValueTemplateInstantiation $valueTemplateInstatiation has
         {
             // Ex: main.Fib
             IdentifierPath $path;

             // Ex: (4)
             // We ask for LiteralExpr here to coerce the engine
             //   into doing constant folding on whatever
             //   expressions were in the argument tuple.
             ArgsTuple has
                 any_amount_of LiteralExpr $args;
         }";

     auto action = (PatternMatch!(recognizer) m)
     {
         auto templateDeclNode =
             m.xdcContext.symbolLookup(m.getCapture!"path");

         // This is where the recursion happens:

         // First, we create a context that we can scope the
         //   template parameters into.
         auto context = m.xdcContext.push();
         scope(exit) m.xdcContext.pop();

         // Last, we use the new context to jump to another
         //   location in the AST and tell it to conquer some
         //   template declarations for us.
         context.declare("args", m.getCapture!"args");
         auto tmpNode =
             context.invoke!substituteTemplateParams(
                 templateDeclNode);
         context.invoke!lowerTemplateDecl(tmpNode);

         // Mangling will require its own recursive joy ride.
         // It will probably be simpler though, because it doesn't
         //   require any substitutions.
         // It helps that we've already ensured that all of the
         //   arguments to the template instantiation have been
         //   reduced to literals by this point, which will be
         //   possible to mangle.  (Attempting to mangle an
         //   arbitrary expression would probably be a throwable
         //   offense, or better yet, forces xdc to not compile.)
         VarExpr e = new VarExpr(
             m.getCapture!"valueTemplateInstatiation".mangle );

         m.captures.add("mangledSymbolReference", e);
     };

     auto synthesizer = Pattern!"$mangledSymbolReference";

     return new PatternHandler(
         produces, consumes, recognizer, action, synthesizer);
};

/* This does NOT get registered with the rest of the global
match/replace patterns.  It should only be invoked by template
instantiation handlers, not the xdc engine itself.
In particular, it needs the "args" context to be defined.
*/
const substituteTemplateParams =
{
     // It gets manually invoked, so no need to mention depends.
     auto consumes = "";
     auto produces = "";

     auto recognizer = Pattern!
         // Ex: main.Fib( uint i ) { ... }
         "TemplateDecl $template has
         {
             .parameterList $params;

             any_amount_of
             {
                 any_amount_of .;
                 VarExpr $var;
             }
             any_amount_of .;
         }";

     auto action = (PatternMatch!(recognizer) m)
     {
         // Cop out: This might seem like stuff that would be
         //   suited to the pattern-DSL, but I am not yet sure
         //   that I want to even attempt to teach it how to
         //   generate lookup tables to accomplish this kind of
         //   substitute-from-backreference type of work.
         //   The syntax for that might be nasty anyways.
         //   For now, I'll implement it with this D code.

         size_t[string] nameLookup;  // parameter name -> index
         auto params = m.getCapture!"params";
         auto args = m.xdcContext.get!"args";

         foreach( i, ref param; params )
         {
             // Populate the lookup table.
             nameLookup[param.identifier] = i;

             // ... aaaand ...
             // Turn this copy of the template into an extremely
             //   specialized one where all of the parameters
             //   already have default values (or types).
             // This will probably be needed for mangling later.
             param.initializer = args[i].deepCopy();
         }

         // Substitute parameter names appearing in the template
         //   with the corresponding literal from the instantiating
         //   code.
         foreach( ref varExpr; m.getCapture!"var" )
         {
             // Skip variables that are not template parameters.
             if ( auto i = varExpr.identifier in nameLookup )
                 varExpr.replaceWith( m, args[*i].deepCopy() );
         }
     };

     // The necessary substitutions were too complicated for the
     //   pattern language.  Thus, they have already been handled
     //   in the action phase.  We leave the original template
     //   untouched.
     auto synthesizer = Pattern!"$template";

     return new PatternHandler(
         produces, consumes, recognizer, action, synthesizer);
};

/* This does NOT get registered with the rest of the global
match/replace patterns.  It should only be invoked by template
instantiation handlers, not the xdc engine itself.
In particular, it needs the template parameters to have already
been substituted.
*/
const lowerTemplateDecl =
{
     // Rejecting static-if statements and static-foreach
     //   will force the invoking context to lower those into
     //   declarations before proceeding with this match attempt.
     auto consumes = "!static_if, !static_foreach";
     auto produces = "";

     auto recognizer = Pattern!
         "TemplateDecl $template has
         {
             any_amount_of { DeclStatements $decls; };
         }";

     auto action = (PatternMatch!(recognizer) m)
     {
         AstRootNode root = m.xdcContext.getRoot();
         foreach( ref decl; m.getCapture!"decls" )
         {
             decl.identifier = decl.mangle;
             decl.moveTo(root);
         }
     };

     auto synthesizer = Pattern!"";

     return new PatternHandler(
         produces, consumes, recognizer, action, synthesizer);
};

// This pattern handler goes in the global ("all_semantic") set
//   of pattern handlers and will cause all template declarations
//   to disappear once all of the instantiations have been
//   completed.  This is the end of the line for all templates!
const cleanupTemplateDecls =
{
     auto consumes = "template_decl";
     auto produces = "";

     auto recognizer = Pattern!
         "Root $root has
         {
             any_amount_of
             {
                 any_amount_of not TemplateInstatiation;
                 TemplateDecl $templates;
             }
             any_amount_of not TemplateInstatiation;
         }";

     auto action = (PatternMatch!(recognizer) m)
     {
         // "template" is a D keyword, so the loop variable is "tmpl".
         foreach( ref tmpl; m.getCapture!"templates" )
             tmpl.removeFromTree();
     };

     auto synthesizer = Pattern!"$root";

     return new PatternHandler(
         produces, consumes, recognizer, action, synthesizer);
};


Of course I'm leaving out some things like template parameter 
specialization and template constraints.  I imagine that things 
like this (ex: overloading) will probably require some D code to 
handle.  This will likely be natural, since these kinds of things 
are usually described as a sequence of logical rules or some kind 
of filter.

=================================================================

As for CTFE... I'll have to write that down later.
As it is, it already took me a while to write down all of my 
thoughts on templates.

The original post already dropped a lot of hints though.  It 
pretty much involves lowering the to-be-executed code down to 
something that the interpreter backend can handle, and then 
invoking the interpreter on it.

=================================================================
Other notes and rambling:

I am actually going to go after CTFE very early on in xdc's 
development, specifically because it will be useful for constant 
folding, template instantiation, and (indirectly) maybe even 
strings.  The process might look like this:
- Implement simple builtin types and expressions (char, int,
     float, int*, 3+4, *(foo + 4), etc.).  No arrays, no strings.
- Implement CTFE using a very simple interpreter.
     This gives us an invokable constant folder.
- Implement structs.
- Implement templates.  Templates requiring strings will throw
     exceptions and fail to compile at this point.
     (remember: no strings!)
- Implement operator overloading.
- Implement arrays as a struct-template:
     struct Array(T) { T* ptr; private size_t len; ... }
     This code will only be visible to the compiler, and will be
     used in lowerings whenever needed.  This gets us strings.
- Implement string literals.  (This might get interesting, and
     may even depend on platform, but it should ultimately do
     something similar to calling
     Array!char.__ctor(char *data, size_t len).)
- Templates that use strings will now work.
- Implement reference counting so the whole thing doesn't leak
     memory like a sieve.
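
To illustrate the second step ("a very simple interpreter gives us an invokable constant folder"), here is a minimal sketch. The Expr, Literal and Binary classes are invented for this example and say nothing about how xdc's AST would really be laid out; the only point is that folding is just "evaluate, then wrap the result in a literal".

```d
import std.stdio : writeln;

/// Hypothetical expression nodes; not xdc's real AST.
abstract class Expr
{
    abstract long eval();
}

final class Literal : Expr
{
    long value;
    this(long value) { this.value = value; }
    override long eval() { return value; }
}

final class Binary : Expr
{
    string op;
    Expr lhs, rhs;

    this(string op, Expr lhs, Expr rhs)
    {
        this.op = op;
        this.lhs = lhs;
        this.rhs = rhs;
    }

    override long eval()
    {
        immutable a = lhs.eval();
        immutable b = rhs.eval();
        switch (op)
        {
            case "+":  return a + b;
            case "-":  return a - b;
            case "<=": return a <= b ? 1 : 0;
            default:   throw new Exception("unsupported operator: " ~ op);
        }
    }
}

/// Constant folding: interpret the expression, keep only the result.
Literal constantFold(Expr e)
{
    return new Literal(e.eval());
}

void main()
{
    // The condition from the Fib walkthrough: 4 <= 1
    auto cond = new Binary("<=", new Literal(4), new Literal(1));
    writeln(constantFold(cond).value); // 0, i.e. false

    // One of the builtin expressions from the list: 3+4
    auto sum = new Binary("+", new Literal(3), new Literal(4));
    writeln(constantFold(sum).value);  // 7
}
```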

It'd look very different from D's actual history.  This is 
because I consider features like templates and CTFE to be very 
"low": we can rewrite a lot of other language features into them.

For situations where things like operator overloading can't 
accomplish what the original builtins could do, there will 
probably be some necessary compiler magic.  I would apply it as 
conservatively as possible.

This might also get dynamically reconfigured depending on 
platform.  Some platforms might already have builtin string types 
that can be efficiently coerced into behaving like D strings.  In 
those cases you would want to avoid lowering strings into a 
ptr+length struct, and let the backend grab them first.  It may 
even be efficient/helpful to have the interpreter backend behave 
like such a platform and just operate on strings directly without 
first lowering them into a struct.
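
For reference, the ptr+length struct mentioned above might look roughly like this in today's D. It is a sketch only: the constructor signature follows the Array!char.__ctor(char *data, size_t len) idea from the list, while the length and opIndex members are assumptions added so the example has something to show.

```d
import std.stdio : writeln;

/// Hypothetical compiler-internal array type: a pointer plus a length.
struct Array(T)
{
    T* ptr;
    private size_t len;

    this(T* data, size_t len)
    {
        this.ptr = data;
        this.len = len;
    }

    size_t length() const { return len; }

    ref T opIndex(size_t i)
    {
        assert(i < len, "index out of bounds");
        return ptr[i];
    }
}

void main()
{
    // Roughly what a lowered string literal could turn into.
    char[5] data = "hello";
    auto s = Array!char(data.ptr, data.length);
    writeln(s.length, " ", s[1]);
}
```

A backend with its own string type would skip this lowering and keep the literal intact, as described above.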

...

Hope that helps.

Jul 21 2013

"Tofu Ninja" <emmons0 purdue.edu> writes:

On Thursday, 18 July 2013 at 01:21:44 UTC, Chad Joan wrote:
 Would you pay for this?
 If so, then I might be able to do a kickstarter at some point.
 I am not independently wealthy or retired (or both?) like 
 Walter, nor am I able to survive on zero hours of sleep each 
 night like Andrei, and this would be a big project.  I think it 
 would need full-time attention or it would never become useful 
 in a reasonable timeframe.

If you started a kick starter, I would put some money up, the 
problem with it is I am not sure you could get enough 
contributions for something like this unless the whole D 
community got behind it.

Jul 25 2013

"Chad Joan" <chadjoan gmail.com> writes:

On Thursday, 25 July 2013 at 20:28:31 UTC, Tofu Ninja wrote:
 If you started a kick starter, I would put some money up, the 
 problem with it is I am not sure you could get enough 
 contributions for something like this unless the whole D 
 community got behind it.

Cool, thanks!

I'm willing to throw up a kickstarter and see how well supported 
it is at that point.  It'll just have to wait until I finish any 
commitments at my job.  Even if it doesn't get enough support, 
it'll be no harm trying.

Jul 25 2013

"Etienne" <etcimon gmail.com> writes:

Many vendors would have their processors supported in D if we had
a D to C compiler. I feel like it would be simpler than going for
native code directly. Did this idea follow through?

On Thursday, 18 July 2013 at 01:21:44 UTC, Chad Joan wrote:
I'd like to present my vision for a new D compiler. I call it
xdc, a loose abbreviation for "Cross D Compiler" (if confused,
see
http://english.stackexchange.com/questions/37394/why-do-some-words-have-x-as-a-substitute).
It could also mean other fun things like "Crossbone D
Compiler" (I imagine a logo with some crossbones having a metal
D atop where the skull normally goes), "eXperimental D
Compiler", or something sounding like "ecstasy" ;)

We usually think of language features as being rewritten into
simpler features. The simple features eventually get rewritten
into machine instructions. Compilers are, fundamentally,
responsible for performing "lowering" operations.

It makes sense to me, then, to make a compiler whose internals
/look/ like a bunch of these rewrites and lowering operations.
There should be a bunch of input patterns matched to the
desired results. This has the happy side-effect of giving us a
pleasant way to do AST manipulations from within D code.

I've also had a long-standing desire to see D on many more
platforms. It should make an appearance on console platforms
and on smartphones. I've tried doing this with a retargetable
compiler like GCC before, and the work was surprisingly large.
Even if the compiler already emits code for the target system's
CPU, there are still a large number of details involving
calling conventions, executable format, and any number of
CPU/OS specific tweaks to object file output. It makes a lot
more sense to me to just output C/C++ code and feed that to a
native toolchain. That would skip a lot of the
platform-specific nonsense that creates a barrier to entry for
people who, say, just want to write a simple game for
Android/iPhone/PS(3|4)/etc in D, and don't want to become
compiler experts first. Ideally, some day, this compiler would
also emit code or bytecode for Javascript, AS3/Flash, Java, and
any other popular targets that the market desires. This can
probably be done with DMD, but I'd like to make the process
more approachable, and make backend authoring as easy as
possible. It should be possible (and easy) to tell the
compiler exactly what lowerings should be applied before the
AST hits the backend.

xdc should bring all of that cross-platform targeting together
with a compiler infrastructure that can blow everything else
away (I hope!).

xdc is my dream for a D compiler that gives us our first step
(of few) towards having what haXe has already
(http://haxe.org/) : a compiler that can land code just about
anywhere.

What follows is a collection of my thoughts written down as
notes.

== Ideal Outcomes ==

.- D to C/C++ compiler that can easily reach target platforms
that are
. currently either unsupported or poorly supported by
current D
. compilers.
. - Useful for game developers wishing to write D code on the
. next big console platform.
. - Useful for embedded systems developers wishing to write D
code
. on obscure or potentially proprietary microcontrollers.

.- Other backends (ex: llvm, Java bytecode, AS3/Flash bytecode,
etc)
. possible in the future. Community desires considered when
. selecting new backend targets.

.- Interpreter backend: a notable backend that would be
implemented as
. a necessity for making CTFE work. A dedicated interpreter
. backend would hopefully be much faster and easier on
memory than
. DMD's souped-up constant folder. (Even though DMD has
become
. considerably better at this in the past year or two.)

.- Abstract Syntax Tree (AST) manipulation toolchain, possibly
usable
. in CTFE. It would work like this:
. (1) Pass some D code or Domain Specific Language (DSL) of
your
. choice (as text) into xdc at compile-time.
. (2) xdc returns an AST.
. (3) Use xdc's pattern-matching and substitution DSL to
. manipulate the AST.
. (4) xdc consumes the AST and emits modified D code.
. (5) mixin(...) the result.
. - If xdc is the compiler used to execute itself in CTFE,
then
. it might be possible to optimize this by having it
expose
. itself as a set of intrinsics.

.- Reference counting available by default on all platforms.
. - Gets you into the action with minimal effort and little
or no
. compiler hacking. (More complete GC tends to require
platform
. specific ASM and/or operating system API support).

.- Full garbage collection available if supported.
. - Ex: The C backends would default to ref-counting until
the ASM
. and OS-level code is written to support full GC.
. - Ex: A Java backend would probably use the Java JVM by
default.

.- Threading model determined by compiler configuration or
available
. platform hints.
. - Ex: The user may have a posix-threads implementation
available,
. but know little other details about the target system.
It
. should be possible for xdc to use pthreads to emulate
the
. TLS and synchronization mechanisms needed to make D
tick.
. (Or at least emulate as many features as possible.)
. - Ex: Possible "no threading" target for people who don't
need
. threading but DO need other D features to be available
NOW
. on an alien platform. Errors when the source code
passed
. into xdc assumes that threading features are present.

.- D compiler that is easy to hack on.
. - "Looks like the problem it solves."
. (To quote Walter's DConf2013 keynote.)
. - Made of a bunch of patterns that describe
. code rewrites/lowerings.
. - Few or no null value checks necessary.
. - null checks don't look like pattern matching or
lowering.
. - Few or no convoluted if-while-if-for-etc nests.
. - These also don't look like pattern matching or
lowering.
. - It should be largely made of "pattern handlers" (see
below).
. - Each pattern handler will have one part that closely
resembles
. the AST fragment for the D code that it recognizes, and
. another part that resembles the lowered form that it
outputs.
. - Dependency analysis that prevents your AST manipulation
from
. happening either too early or too late.
. - Because the code that actually does lowerings is
generated from
. a DSL, it is possible to make it automate a lot of
tedious
. tasks, like updating the symbol table when nodes are
added or
. removed from the AST.
. - This makes it easier to test experimental features.

.- A step-by-step view of what the compiler is doing to your
code.
. - Since the semantic analysis of xdc would be composed of
. "pattern handlers" (see below), then each time one of
them
. completes the compiler could output the result of calling
. .toString() (or .toDCode() or whatever) on the entire
AST.
. - This could be attached to an ncurses interface that would
be
. activated by passing a flag to the compiler, which would
then
. proceed to show the AST at every stage of compilation.
. Press ENTER to see the next step, etc.
. - This could also be exposed as API functionality that IDEs
could
. use to show developers how the compiler sees their code.

.- D code analysis engine that might be usable to automatically
. translate D1 code into D2 code, or maybe D2 into D3 in the
far
. future.

== Architectural Overview ==

.- xdc will largely consist of "pattern handlers" that recognize
. patterns in its AST and replace them with AST fragments
that
. contain successively fewer high-level features (lowering).
. - These pattern handlers would feature a DSL that should
make
. the whole task fairly easy.
. - The DSL proposed would be similar to regular expressions
in
. semantics but different in syntax.
. - It will have operators for choice, repetition, optional
. matches, capturing, and so on.
. - The DSL must support nested structures well.
. - The DSL must support vertical layout of patterns well.
. - Because of the vertical patterns, most operators will
either
. be prefix or will be written in block style:
. some_block_header { block_stmt1; block_stmt2; etc; }
. - Actions like entering and leaving nodes are given
their own
. special syntax. The machine will treat them like
tokens
. that can be matched the same as any AST node.
Notably,
. node-entry and node-exit do not require introducing
. non-regular elements to the DSL. node-entry and
node-exit
. may be subsumed into Deterministic Finite Automatons
(DFAs).
. - An example pattern handler might look like this:

const lowerWhileStatement =
{
     // Apologies in advance if this isn't actually valid D code:
     // This is a design sketch and I currently don't have a way to
     // compile it.
     //
     // The Pattern template, PatternMatch template, and PatternHandler
     // class have not yet been written.  This is an example of how I
     // might expect them to be used.
     //

     auto consumes = "while_statement";
     auto produces = "if_statement, goto, label";

     auto recognizer = Pattern!
         "WhileStatement has
         {
             // Capture the conditional expression (call it \"expr\") and
             // capture the loop body (call it \"statement\").
             .expression $expr;
             .statement $statement has
             {
                 // Capture any continue/break statements.
                 any_amount_of {
                     any_amount_of .; // Same as .* in regexes.
                     one_of
                     {
                         ContinueStatement $continues;
                         BreakStatement $breaks;
                     }
                 }
                 any_amount_of .;
             }
         }";

     auto action = (PatternMatch!(recognizer) m)
     {
         m.captures.add("uniqLoopAgain",
             getUniqueLabel(syntaxNode.enclosingScope));
         m.captures.add("uniqExitLoop",
             getUniqueLabel(syntaxNode.enclosingScope));

         // The "recognizer" pattern defines m.getCapture!"continues" with:
         //   "ContinueStatement $continues;"
         // That line appears in a repetition context ("any_amount_of")
         // and is therefore typed as an array.
         foreach( ref node; m.getCapture!"continues" )
             node.replaceWith( m, "GotoStatement has $uniqLoopAgain" );

         // Ditto for m.getCapture!"breaks" and "BreakStatement $breaks;".
         foreach( ref node; m.getCapture!"breaks" )
             node.replaceWith( m, "GotoStatement has $uniqExitLoop" );
     };

     auto synthesizer = Pattern!
         "Label has $uniqLoopAgain
         IfStatement has
         {
             OpNegate has $expr
             GotoStatement has $uniqExitLoop
         }
         $statement
         GotoStatement has $uniqLoopAgain
         Label has $uniqExitLoop
         ";

     return new PatternHandler(produces, consumes, recognizer,
         action, synthesizer);
};

(Also available at: http://pastebin.com/0mBQxhLs )
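
For intuition, this is roughly what the synthesizer's output corresponds to when written back out as ordinary D source. It is a hand-written illustration, not xdc output: sumWithWhile and sumLowered are invented names, and loopAgain / exitLoop play the role of the $uniqLoopAgain and $uniqExitLoop labels.

```d
import std.stdio : writeln;

/// Original form: a while loop containing both continue and break.
int sumWithWhile(int n)
{
    int i = 0, total = 0;
    while (i < n)
    {
        if (i == 3) { ++i; continue; }
        if (i == 6) break;
        total += i;
        ++i;
    }
    return total;
}

/// Lowered form: label, negated test, body, jump back, exit label.
int sumLowered(int n)
{
    int i = 0, total = 0;
loopAgain:
    if (!(i < n)) goto exitLoop;
    {
        if (i == 3) { ++i; goto loopAgain; } // was: continue
        if (i == 6) goto exitLoop;           // was: break
        total += i;
        ++i;
    }
    goto loopAgain;
exitLoop:
    return total;
}

void main()
{
    writeln(sumWithWhile(10), " ", sumLowered(10));
}
```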

.- Dispatch to pattern handlers is performed by the execution
of a
. DFA/Packrat hybrid instead of the traditional OOP
inheritance
. with method calls.
. - Each pattern handler's recognizer gets treated like a
regex
. or Parsing Expression Grammar (PEG) fragment.
. - All of the recognizers in the same semantic pass are
pasted
. together in an ordered-choice expression. The ordering
is
. determined by dependency analysis.
. - A recognizer's pattern handler is invoked when the
recognizer's
. AST expression is matched.
. - Once any substitutions are completed, then the machine
executing
. the pattern engine will set its cursor to the beginning
of
. the newly substituted AST nodes and continue running.
. - Executing potentially hundreds of pattern handlers in a
single
. ordered-choice expression would be obnoxious for a
packrat
. parser (packrat machine?). Thankfully, ordered-choice
is
. possible in regular grammars, so it can be lowered into
regex
. operations and the whole thing turned into a DFA.
. - If pattern recognizers end up needing recursive elements,
. then they will probably not appear at the very
beginning of
. the pattern. Patterns with enough regular elements at
the
. start will be able to merge those regular elements into
the
. DFA with the rest of the pattern recognizers, and it all
. becomes very fast table lookups in small tables.

.- This compiler would involve the creation of a
parser-generator
. API that allows code to programmatically create grammars,
and
. to do so without a bunch of clumsy string formatting and
string
. concatenation.
. - These grammars could be written such that things like AST
nodes
. are seen as terminals. This expands possibilities and
allows
. all of the pattern handlers to be coalesced into a
grammar
. that operates on ASTs and fires off semantic actions
whenever
. one of the recognizer patterns gets tickled by the
right AST
. fragment.
. - Using strings as terminals is still cool; and necessary
for
. xdc's text/D-code parser.
. - A simple parser-generator API example:

---------------------------------------
string makeParser()
{
auto builder = new ParserBuilder!char;
builder.pushSequence();
builder.literal('x');
builder.pushMaybe();
builder.literal('y');
builder.pop();
builder.pop();
return builder.toDCode("callMe");
}

const foo = makeParser();

pragma(msg, foo);
---------------------------------------
Current output:
http://pastebin.com/V3E0Ubbc
---------------------------------------

. - Humans would probably never directly write grammars using
this
. API; it is intended for use by code that needs to write
. grammars. xdc would be such code: it's given a bunch of
. pattern handlers and needs to turn them into a grammar.
. - This API could also make it easier to write the parser
. generators that humans /would/ use. For example, it
could be
. used as an experimental backend for a regular expression
. engine that can handle limited recursion.
. - The packrats usually generated from PEGs are nice and
all, but
. I'd really like to generate DFAs whenever possible,
because
. those seem to be regarded as being /very fast/.
. - DFAs can't handle the recursive elements of PEGs, but they
. should be able to handle everything non-recursive that
. precedes or follows the recursive elements.
. - The parser-generator API would be responsible for
aggressively
. converting PEG-like elements into regex/DFA elements
whenever
. possible.
. - Regular expressions can be embedded in PEGs as long as
you tell
. them how much text to match. You have to give them
concrete
. success/failure conditions that can be determined
without
. help from the rest of the PEG: things like "match as
many
. characters as possible" or "match as few characters as
. possible". Without that, the regex's backtracking
(DFA'd
. or otherwise) won't mesh with the PEG. Give it a
concrete
. win/fail condition, however, and the embedded regex
becomes
. just another PEG building block that chews through some
. source material and yields a yes/no result. Such
regular
. expressions allow DFAs to be introduced into a recursive
. descent or packrat parser.
. - Many PEG elements can be converted into these well-behaved
. regular expressions.
. - PEG repetition is just regular expression repetition
with
. a wrapper around it that says "match as many
characters
. as possible".
. - PEG ordered choice can be lowered into regular
expression
. unordered choice, which can then be converted into
DFAs:
. I suspect that this is true: (uv/xy)c ==
(uv|(^(uv)&xy))c
. (or, by De Morgan's law: (uv/xy)c ==
(uv|(^(uv|^(xy))))c )
. & is intersection.
. ^ is negation.
. Each letter (u,v,x,y,c) can be a subexpression
. (non-recursive).
. - PEG label matching can be inlined up to the point where
. recursion occurs, thus allowing more elements to be
. considered for DFA conversion.
. - etc.

.- The parser would be defined using a PEG (most likely using Pegged
. specifically).
. - Although Pegged is an awesome achievement, I suspect its output
. could be improved considerably. The templated code it
. generates is slow to compile and ALWAYS allocates parse
. tree nodes at every symbol.
. - I want to experiment with making Pegged (or a branch of it) emit
. DFA/Packrat parser hybrids. This could be done by making a
. version of Pegged that uses the aforementioned
. parser-generator API to create its parsers.
. - Design principle: avoid memory allocations like the plague.
. The output should be a well-pruned AST, and not just a parse
. tree that causes a bunch of allocations and needs massaging to
. become useful.
. - I really like Pegged and would contribute this stuff upward, if
. accepted.

.- When hacking on xdc, you don't need to be aware of WHEN your code
. gets executed in semantic analysis. The dependency analysis
. will guarantee that it always gets performed both
. (a) when it's needed, and (b) when it has what it needs.
. - This is what the "consumes" and "produces" variables are all
. about in the above example.

.- Successfully lowering a D AST into the target backend's input will
. almost certainly require multiple passes. xdc's dependency
. analyzer would automatically minimize the number of passes by
. looking for patterns that are "siblings" in the dependency graph
. (eg. neither depends on the other) and bunching as many such
. patterns as possible into each pass.
. - It really shouldn't generate very many more than the number of
. passes that DMD has coded into it. Ideally: no more than DMD,
. if not fewer.
. - I'd like to make the dependency analyzer output a graph that
. can be used to track which patterns cause which passes to
. exist, and show which patterns are in which passes.
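
The pass-bunching idea above could look something like the following C
sketch. Only the words "consumes" and "produces" come from the text;
the Pattern struct, the bitmask encoding and schedule_passes are
assumptions made for illustration. Each pattern is placed in the
earliest pass in which everything it consumes has already been
produced, so patterns that do not depend on each other share a pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pattern record: bitmasks of the AST features a
   pattern needs ("consumes") and creates ("produces"). */
typedef struct {
    unsigned consumes;
    unsigned produces;
} Pattern;

/* Assigns each pattern the earliest possible pass and stores it in
   pass_of[i].  Returns the number of passes, or -1 if the patterns
   form a dependency cycle. */
static int schedule_passes(const Pattern *p, int n, int *pass_of)
{
    int i, j, changed, rounds = 0, passes = 0;

    assert(n >= 0 && (n == 0 || (p != NULL && pass_of != NULL)));

    for (i = 0; i < n; i++)
        pass_of[i] = 0;

    do {
        changed = 0;
        for (i = 0; i < n; i++) {
            for (j = 0; j < n; j++) {
                /* i depends on j if j produces something i consumes. */
                if (i != j && (p[j].produces & p[i].consumes)
                           && pass_of[i] <= pass_of[j]) {
                    pass_of[i] = pass_of[j] + 1;
                    changed = 1;
                }
            }
        }
        /* An acyclic graph settles within n rounds; more means a cycle. */
        if (changed && ++rounds > n)
            return -1;
    } while (changed);

    for (i = 0; i < n; i++)
        if (pass_of[i] + 1 > passes)
            passes = pass_of[i] + 1;
    return passes;
}
```

The pass_of array doubles as the "which patterns are in which passes"
report; a real analyzer would presumably also record the edges so the
graph itself can be dumped.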

.- Planned availability of backends.
. - My first pick for a backend would be an ANSI C89 target. I feel
. that this would give it the most reach.
. - The interpreter backend is along for the ride, as mentioned.
. - Because the semantic analysis is composed of distinct and
. loosely-coupled patterns, it is possible for xdc to generate
. an analysis chain with the minimum number of lowerings needed
. for a given backend.
. - The interpreter backend would benefit from having the most
. lowerings. By requiring a lot of lowering, the interpreter
. would only need to support a small number of constructs:
. - if statements
. - gotos
. - function calls
. - arithmetic expression evaluation
. - builtin types (byte, short, int, long, float, double, etc)
. - pointers
. - Even structs are unnecessary: they can be seen as
. typed dereferencing of untyped pointers.
. - The C backend would benefit from slightly less lowering than
. the interpreter backend. It is useful for debugging if
. you can mostly-sorta read the resulting C code, and your
. C compiler will appreciate the extra optimization
. opportunities.
. - Looping constructs like while and for are welcome here.
. - structs would be more readable.
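
To make the "structs are typed dereferencing of untyped pointers"
remark concrete, here is a small C sketch showing the same field read
in both forms. Item, load_float and the two accessor functions are
hypothetical names chosen for this example, not anything xdc defines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What the C backend might keep: a readable struct. */
typedef struct {
    int   id;
    float weight;
} Item;

static float item_weight_struct(const Item *it)
{
    return it->weight;
}

/* What a more heavily lowered backend could see instead: an untyped
   pointer, a byte offset, and a typed load.  memcpy performs the load
   so the sketch does not rely on alignment or aliasing assumptions. */
static float load_float(const void *base, size_t offset)
{
    float value;
    assert(base != NULL);
    memcpy(&value, (const unsigned char *)base + offset, sizeof value);
    return value;
}

static float item_weight_lowered(const void *it)
{
    return load_float(it, offsetof(Item, weight));
}
```

Both accessors return the same value for the same object; the first is
what you would want to read while debugging emitted C, the second is
all an interpreter with only pointers and builtin types would need.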

entirely
. different set of lowerings in later passes.
. - Pointers are no longer considered "low".
. - Classes should be kept as long as possible;
. I'm pretty sure the bytecode (at least for Java)
. has opcodes dedicated to classes. Removing them
. may cause pessimisation.
. - The backend writer should not have to worry about rewriting
. the semantic analysis to suit their needs. They just define
. some features and say which ones they need available in the
. AST, and xdc's semantic-analysis-generator will handle the
. rest.
. - Notably, a backend should just be more lowerings, with the
. result being text or binary code instead of AST nodes.
. - Backends are essentially defined by the set of AST/language
. features that they consume and any special lowerings needed
. to convert generic AST/language features into
. backend-specific AST/language features.
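
One way to picture "a backend is defined by the features it consumes"
is the C sketch below. Everything in it is an assumption for the sake
of illustration: the feature flags, the three-entry lowering table and
plan_lowerings are invented here, since the text does not specify
xdc's actual feature list. Given the features present in the AST and
the features a backend accepts, it selects only the lowerings that are
actually needed.

```c
#include <assert.h>

/* Hypothetical feature flags. */
enum {
    F_IF      = 1u << 0,
    F_GOTO    = 1u << 1,
    F_WHILE   = 1u << 2,
    F_FOREACH = 1u << 3,
    F_STRUCT  = 1u << 4,
    F_PTR     = 1u << 5
};

/* A lowering removes one feature and may introduce others. */
typedef struct {
    const char *name;
    unsigned    removes;
    unsigned    introduces;
} Lowering;

static const Lowering lowerings[] = {
    { "foreach -> while",      F_FOREACH, F_WHILE       },
    { "while -> if + goto",    F_WHILE,   F_IF | F_GOTO },
    { "struct -> raw pointer", F_STRUCT,  F_PTR         },
};
enum { N_LOWERINGS = sizeof lowerings / sizeof lowerings[0] };

/* Picks only the lowerings needed to shrink `present` to a subset of
   `accepted`.  Returns a bitmask of chosen lowering indices, or ~0u
   if some feature cannot be lowered away with this table. */
static unsigned plan_lowerings(unsigned present, unsigned accepted)
{
    unsigned chosen = 0;
    int i, progress = 1;

    assert(N_LOWERINGS <= 31);   /* indices must fit in the result mask */

    while ((present & ~accepted) && progress) {
        progress = 0;
        for (i = 0; i < N_LOWERINGS; i++) {
            if (present & lowerings[i].removes & ~accepted) {
                present = (present & ~lowerings[i].removes)
                        | lowerings[i].introduces;
                chosen |= 1u << i;
                progress = 1;
            }
        }
    }
    return (present & ~accepted) ? ~0u : chosen;
}
```

For a program using foreach, structs and if, a C-like backend that
accepts while and struct needs only the first lowering, while an
interpreter that accepts just if, goto and pointers needs all three.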

== Closing Thoughts ==

I am realizing that there are multiple reasons that compel me
to write this document:
- To share my ideas with others, on the off-chance that someone
else might see this vision too and be better equipped to
deliver.
- To suggest capabilities that any community-endorsed compiler
tool (ex: compiler-as-a-ctfe-library) should have.
- To see if I might be able to get the help I need to make it a
reality.

I just can't decide which reasons are more important. But
there is a common thread: I want this vision to become reality
and do really cool things while filling a bunch of missing
links in D's ecosystem.

I have to ask:

Would you pay for this?
If so, then I might be able to do a kickstarter at some point.
I am not independently wealthy or retired (or both?) like
Walter, nor am I able to survive on zero hours of sleep each
night like Andrei, and this would be a big project. I think it
would need full-time attention or it would never become useful
in a reasonable timeframe.

Also, assuming you understand the design, are there any gaping
holes in this?
This is my first attempt to share these ideas with a larger
group, and thus an opportunity to anticipate troubles.

...

Well, I'm anxious to see how well the venerable D community
receives this bundle of ideas. Be chatty. I'll try to keep up.

Thank you for reading.

Nov 08 2013

"Chad Joan" <chadjoan gmail.com> writes:

On Saturday, 9 November 2013 at 04:46:14 UTC, Etienne wrote:
 Many vendors would have their processors supported in D if we had
 a D to C compiler. I feel like it would be simpler than going for
 native code directly. Did this idea follow-through?

No, not yet I'm afraid.  At least not for xdc.

Here's the lowdown:
The future of xdc will be determined by whether or not I can save 
up enough money to reliably support myself between the time I 
would leave my current job and the time I would be compensated by 
means of crowdsourcing.  In the middle there I would need to 
create some kind of working demo, make a good pitch, talk to a 
bunch of writers and programmer communities, etc etc, all while 
burning precious savings.  If, before any of that, I get 
recruited by another company with a non-terrible (and possibly 
/good/) codebase (like Sociomantic or Facebook), then we would be 
able to consider the whole idea effectively cancelled before it 
can start.  As great for /me/ as it would be to write D code for 
a job, I just don't see it being a boon to xdc: companies usually 
hire folks to work on the company's stuff, not the employee's 
stuff.  But, if I end up sticking with my current job, then at 
some point I may just go out on my own and make things happen.  
Time will tell.

That said, if all you want is a C/C++ backend, then Kai's recent 
post on this thread brings up a possibility that seems 
unexplored, as of yet:
http://forum.dlang.org/post/psqajaggngbuctqfrrnc@forum.dlang.org
Maybe that'll get you there in more certain terms.

Nov 11 2013

"Kelly" <wilsonk cpsc.ucalgary.ca> writes:

Hey Chad,

It looks like you have put a lot of thought and effort into
this from your posts. Nice work.

I am one of the developers of Amber and we do have a C backend,
as nazriel pointed out earlier in the thread. It supports
exceptions (sjlj and seh depending on the flag and platform).
We support clang, gcc, dmc, tcc and msvc...though I have really
only been testing gcc and clang lately.

I can't say we have perfect coverage of exceptions, and
templates are a little behind with the C backend when compared
to the llvm backend also, but we actually pass more tests in
our testsuite with the CBE than LLVMBE.

Amber is an offshoot of D1, with some small parts of D2 where
it made sense, so it may not be very close to what you are
looking for, but it might be worth checking out. It compiles
best on linux with dmd and ldc 1.074 and needs Tango to compile
(Tango is also our main standard lib for Amber...though we can
only compile about 25-30% of Tango with the Amber compiler right
now).

Good luck with xdc, whichever way you go with it.

Thanks,
Kelly


On Tuesday, 12 November 2013 at 03:51:21 UTC, Chad Joan wrote:
 On Saturday, 9 November 2013 at 04:46:14 UTC, Etienne wrote:
 Many vendors would have their processors supported in D if we had
 a D to C compiler. I feel like it would be simpler than going for
 native code directly. Did this idea follow-through?

 No, not yet I'm afraid.  At least not for xdc.

 Here's the lowdown:
 The future of xdc will be determined by whether or not I can 
 save up enough money to reliably support myself between the 
 time I would leave my current job and the time I would be 
 compensated by means of crowdsourcing.  In the middle there I 
 would need to create some kind of working demo, make a good 
 pitch, talk to a bunch of writers and programmer communities, 
 etc etc, all while burning precious savings.  If, before any of 
 that, I get recruited by another company with a non-terrible 
 (and possibly /good/) codebase (like Sociomantic or Facebook), 
 then we would be able to consider the whole idea effectively 
 cancelled before it can start.  As great for /me/ as it would 
 be to write D code for a job, I just don't see it being a boon 
 to xdc: companies usually hire folks to work on the company's 
 stuff, not the employee's stuff.  But, if I end up sticking 
 with my current job, then at some point I may just go out on my 
 own and make things happen.  Time will tell.

 That said, if all you want is a C/C++ backend, then Kai's 
 recent post on this thread brings up a possibility that seems 
 unexplored, as of yet:
 http://forum.dlang.org/post/psqajaggngbuctqfrrnc@forum.dlang.org
 Maybe that'll get you there in more certain terms.

Nov 11 2013

"Chad Joan" <chadjoan gmail.com> writes:

On Tuesday, 12 November 2013 at 06:40:25 UTC, Kelly wrote:
 ...

 Amber is an offshoot of D1, with some small parts of D2 where
 it made sense, so it may not be very close to what you are
 looking for, but it might be worth checking out. It compiles
 best on linux with dmd and ldc 1.074 and needs Tango to compile
 (Tango is also our main standard lib for Amber...though we can
 only compile about 25-30% of Tango with the Amber compiler right
 now).

 Good luck with xdc, whichever way you go with it.

 Thanks,
 Kelly

Hi Kelly,

Thank you for the words of encouragement!

I didn't know about Amber.  I'll have to check it out.

Thanks!

Nov 12 2013

"nazriel" <spam dzfl.pl> writes:

On Thursday, 18 July 2013 at 01:21:44 UTC, Chad Joan wrote:
 I'd like to present my vision for a new D compiler.  I call it 
 xdc, a loose abbreviation for "Cross D Compiler" (if confused, 
 see
...
 Thank you for reading.

I think C backend is a good idea.

AFAIK, Amber [1] people do something like that.

They simultaneously wrote support for four backends [2]:
- LLVM
- C
- JSON
- so called NullBackend

I think it worked out quite well.


[1] https://bitbucket.org/larsivi/amber/src
[2] 
https://bitbucket.org/larsivi/amber/src/0cbdb35b8eec458b75572ac457baa9e47d3e76cd/amber?at=default

Nov 09 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 11/9/13 9:14 AM, nazriel wrote:
 On Thursday, 18 July 2013 at 01:21:44 UTC, Chad Joan wrote:
 I'd like to present my vision for a new D compiler.  I call it xdc, a
 loose abbreviation for "Cross D Compiler" (if confused, see
 ...
 Thank you for reading.

 I think C backend is a good idea.

I think C is not a good back-end language. Other backend generators 
usually have a white paper explaining why... http://www.cminusminus.org/

Andrei

Nov 09 2013

"Daniel Murphy" <yebblies nospamgmail.com> writes:

"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:l5madp$1p24$1 digitalmars.com...
 On 11/9/13 9:14 AM, nazriel wrote:
 On Thursday, 18 July 2013 at 01:21:44 UTC, Chad Joan wrote:
 I'd like to present my vision for a new D compiler.  I call it xdc, a
 loose abbreviation for "Cross D Compiler" (if confused, see
 ...
 Thank you for reading.

 I think C backend is a good idea.

 I think C is not a good back-end language. Other backend generators 
 usually have a white paper explaining why... http://www.cminusminus.org/

 Andrei

That is true in general, but D actually maps quite well onto C.

I did some work on creating a C backend a while back, and it worked quite 
well.

However - most of the work is in creating a runtime that will work correctly 
on the target platform.  If your desired target is anything that llvm or gcc 
supports, I would recommend using ldc/gdc instead of doing it all from 
scratch.

Nov 09 2013

"deadalnix" <deadalnix gmail.com> writes:

On Sunday, 10 November 2013 at 04:54:18 UTC, Daniel Murphy wrote:
 That is true in general, but D actually maps quite well onto C.

 I did some work on creating a C backend a while back, and it
 worked quite well.

Out of curiosity, how do you handle exceptions ?

Nov 09 2013

"Daniel Murphy" <yebblies nospamgmail.com> writes:

"deadalnix" <deadalnix gmail.com> wrote in message 
news:juoauplfttovsmbrafzh forum.dlang.org...
 On Sunday, 10 November 2013 at 04:54:18 UTC, Daniel Murphy wrote:
 That is true in general, but D actually maps quite well onto C.

 I did some work on creating a C backend a while back, and it worked quite
 well.

 Out of curiosity, how do you handle exceptions ?

I didn't.  This was focussed on a subset suitable for microcontrollers.  I 
would probably emit C++ instead if exceptions were required.

Nov 09 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 11/9/13 9:37 PM, Daniel Murphy wrote:
 "deadalnix" <deadalnix gmail.com> wrote in message
 news:juoauplfttovsmbrafzh forum.dlang.org...
 On Sunday, 10 November 2013 at 04:54:18 UTC, Daniel Murphy wrote:
 That is true in general, but D actually maps quite well onto C.

 I did some work on creating a C backend a while back, and it worked quite
 well.

 Out of curiosity, how do you handle exceptions ?

 I didn't.  This was focussed on a subset suitable for microcontrollers.  I
 would probably emit C++ instead if exceptions were required.

That doesn't quite rhyme with C being a good backend language :o).

Andrei

Nov 09 2013

"Daniel Murphy" <yebblies nospamgmail.com> writes:

"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:l5n7iq$2op2$1 digitalmars.com...
 On 11/9/13 9:37 PM, Daniel Murphy wrote:
 "deadalnix" <deadalnix gmail.com> wrote in message
 news:juoauplfttovsmbrafzh forum.dlang.org...
 On Sunday, 10 November 2013 at 04:54:18 UTC, Daniel Murphy wrote:
 That is true in general, but D actually maps quite well onto C.

 I did some work on creating a C backend a while back, and it worked
 quite well.

 Out of curiosity, how do you handle exceptions ?

 I didn't.  This was focussed on a subset suitable for microcontrollers.
 I would probably emit C++ instead if exceptions were required.

 That doesn't quite rhyme with C being a good backend language :o).

 Andrei

I guess it's not for the full language, but if you can't use gdc or llvm, 
chances are your platform is too constrained to use exceptions.  I don't 
mean C is capable of representing everything, but it can handle a large and 
useful subset.

Nov 10 2013

"Chad Joan" <chadjoan gmail.com> writes:

On Sunday, 10 November 2013 at 12:24:59 UTC, Daniel Murphy wrote:
 "Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in 
 message
 news:l5n7iq$2op2$1 digitalmars.com...
 On 11/9/13 9:37 PM, Daniel Murphy wrote:
 "deadalnix" <deadalnix gmail.com> wrote in message
 news:juoauplfttovsmbrafzh forum.dlang.org...
 Out of curiosity, how do you handle exceptions ?

 I didn't.  This was focussed on a subset suitable for
 microcontrollers.  I would probably emit C++ instead if
 exceptions were required.

 That doesn't quite rhyme with C being a good backend language :o).

 Andrei

 I guess it's not for the full language, but if you can't use gdc
 or llvm, chances are your platform is too constrained to use
 exceptions.  I don't mean C is capable of representing
 everything, but it can handle a large and useful subset.

My ideal is to have exceptions in C anyways.  I don't understand 
why people are so afraid of this.  It's doable in very portable 
ways, and D's nothrow attribute gives a good hint to the compiler 
that can be used to avoid performance drains in inappropriate 
places.

I think that setjmp/longjmp comes to mind most of the time 
because it is what people would normally use in C if they have to 
write the C code by hand.  This is one approach that I would have 
a compiler optionally emit, controllable by a command-line flag 
(--c-exceptions=sjlj or somesuch).
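
For reference, the setjmp/longjmp style mentioned above might be
emitted roughly like this. This is a minimal sketch under my own
assumptions: TryFrame, try_top, pending_msg and throw_msg are
hypothetical runtime pieces, and a plain message string stands in for
a real Exception object.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical runtime state: a stack of active try frames. */
typedef struct TryFrame {
    jmp_buf          env;
    struct TryFrame *prev;
} TryFrame;

static TryFrame   *try_top = NULL;
static const char *pending_msg = NULL;   /* stand-in for an Exception */

static void throw_msg(const char *msg)
{
    assert(try_top != NULL);   /* uncaught exceptions are not handled here */
    pending_msg = msg;
    longjmp(try_top->env, 1);
}

static float baz(int a)
{
    if (a < 0)
        throw_msg("negative input");
    return (float)a * 0.5f;
}

/* D: try return baz(a); catch (Exception e) return 0.0; */
static float bar(int a)
{
    TryFrame frame;
    volatile float result = 0.0f;   /* volatile: written after setjmp */

    frame.prev = try_top;           /* push this try block */
    try_top = &frame;
    if (setjmp(frame.env) == 0)
        result = baz(a);            /* try body */
    else
        result = 0.0f;              /* catch body */
    try_top = frame.prev;           /* pop the frame on both paths */
    return result;
}
```

The cost of this scheme is that every try block pays for a setjmp
even when nothing is thrown, which is part of why the alternative
described next is worth trying first.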

There is a different approach that I'd want to try first: alter 
the calling convention and always pass an exception object as the 
first argument (but only if the called function can throw).

Given this example:

-----------------------------------------------------------------

float baz(int a);

void foo()
{
     int a = 42;
     // do stuff
     float b = baz(a);
     // do other stuff
}

float bar()
{
     try
         return baz(9);
     catch( Exception e )
         return 0.0;
}

--------------------------------

The D->C compiler would emit code like so:

-----------------------------------------------------------------

float baz(Exception *exception, int a);

void foo(Exception *exception)
{
     int a = 42;
     // do stuff
     float b = baz(exception, a);
     if ( exception->thrown )
         return;
     // do other stuff
}

float bar(Exception *exception)
{
     float result = baz(exception, 9);
     if ( exception->thrown )
         goto ExceptionHandler1;
     return result;

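ExceptionHandler1:
     {
         // A label must be followed by a statement, so the
         // declaration lives inside a block.
         Exception *e = exception;
         (void)e;
         exception->thrown = 0;  // caught here: don't propagate
         return 0.0;
     }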
}

--------------------------------
(Name mangling omitted for the sake of sanity.)

This is not something that you'd want to do by hand when writing 
C-code, though that doesn't stop people from trying to poorly 
approximate it using integer return values ;)

It would integrate nicely with scope, because the compiler would 
know where to put all of the goto statements and labels.

It's also made of pointers, ifs, goto's, and labels: stuff that 
any usable C compiler should have.  Super portable.
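
Here is a compilable sketch of how scope(exit) could fall out of this
calling convention. It follows the shape of the emitted code above,
but the Exception layout (a thrown flag plus a message), the
cleanup_count counter and the ScopeExit1 label are assumptions added
so the example can be run and checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical exception slot, passed as the hidden first argument. */
typedef struct {
    int         thrown;
    const char *msg;
} Exception;

static int cleanup_count = 0;   /* lets the example observe scope(exit) */

static float baz(Exception *exception, int a)
{
    assert(exception != NULL);
    if (a < 0) {
        exception->thrown = 1;
        exception->msg = "negative input";
        return 0.0f;
    }
    return (float)a * 0.5f;
}

/* D: float foo(int a) { scope(exit) cleanup(); return baz(a) + 1; } */
static float foo(Exception *exception, int a)
{
    float result = 0.0f;
    float b = baz(exception, a);
    if (exception->thrown)
        goto ScopeExit1;        /* propagate, but run cleanup first */
    result = b + 1.0f;

ScopeExit1:
    cleanup_count++;            /* the scope(exit) body */
    return result;
}

/* D: float bar(int a) nothrow
      { try return foo(a); catch (Exception e) return 0.0; } */
static float bar(int a)
{
    Exception exception = { 0, NULL };
    float result = foo(&exception, a);
    if (exception.thrown) {
        exception.thrown = 0;   /* caught: clear before continuing */
        return 0.0f;
    }
    return result;
}
```

Because bar cannot throw, it takes no hidden argument and owns the
Exception slot itself. The cleanup runs exactly once per call to foo
on both the normal and the exceptional path.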

The drawback: This would, of course, not link nicely with code 
generated by other D compilers.  I don't mind this at all though, 
because if you're using this then it's probably because there 
aren't any other D compilers supporting your platform anyways.

I've already written a bunch of C code that emulates exception 
handling + scope statements using setjmp/longjmp, and I really 
wish that a compiler could write better optimized C code /for/ me.

Nov 11 2013

Walter Bright <newshound2 digitalmars.com> writes:

On 11/9/2013 9:27 PM, deadalnix wrote:
 On Sunday, 10 November 2013 at 04:54:18 UTC, Daniel Murphy wrote:
 That is true in general, but D actually maps quite well onto C.

 I did some work on creating a C backend a while back, and it worked quite
 well.

 Out of curiosity, how do you handle exceptions ?

Exceptions is one big problem. Another is COMDATs - C compilers don't emit
them. COMDATs are needed to support templates (they remove duplicate
instances).

And TLS.

Nov 10 2013

"Chad Joan" <chadjoan gmail.com> writes:

On Sunday, 10 November 2013 at 08:21:37 UTC, Walter Bright wrote:
 On 11/9/2013 9:27 PM, deadalnix wrote:
 On Sunday, 10 November 2013 at 04:54:18 UTC, Daniel Murphy wrote:
 That is true in general, but D actually maps quite well onto C.

 I did some work on creating a C backend a while back, and it
 worked quite well.

 Out of curiosity, how do you handle exceptions ?

 Exceptions is one big problem. Another is COMDATs - C compilers 
 don't emit them. COMDATs are needed to support templates (they 
 remove duplicate instances).

 And TLS.

This seems like it matters when linking D code to D code.  Other 
languages wouldn't care about D's templates.  I imagine that in 
most cases it would be possible to just compile the D code 
together.

This whole mess can be done away with by removing the "linking" 
step in compilation, which is what I'd recommend for a compiler 
that is designed to output things that aren't object files.

The compiler should be able to dedup templates internally when 
doing AST manipulation.  I actually /expect/ this.
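
A minimal sketch of what "dedup internally" could mean, assuming the
whole program is compiled together: keep one table of template
instances and hand back the existing entry when the same instantiation
is requested again. The table, intern_instance and the "max!int" style
keys are hypothetical; a real compiler would key on the template and
its argument list rather than on a string.

```c
#include <assert.h>
#include <string.h>

enum { MAX_INSTANCES = 64 };

/* Hypothetical instance table keyed by a printable instance name. */
static const char *instances[MAX_INSTANCES];
static int         instance_count = 0;

/* Returns the index of the single instance for `key`, creating it on
   first use, so each instantiation is emitted only once.  `key` must
   outlive the table (string literals in this sketch). */
static int intern_instance(const char *key)
{
    int i;
    for (i = 0; i < instance_count; i++)
        if (strcmp(instances[i], key) == 0)
            return i;                 /* already instantiated */
    assert(instance_count < MAX_INSTANCES);
    instances[instance_count] = key;
    return instance_count++;
}
```

This only works when every instantiation passes through the same
table, which is exactly the "no separate linking step" assumption
being made here.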

The only reasons to output object files, that I can think of 
right now, are as follows:
- Obfuscation is desired in the output.
- Incremental compiling.

To meet those needs, the following approaches could be used:
- Obfuscation: A compiler without a linkable output format could 
support an "obfuscation" target that would output obfuscated D 
code for later compiling in a 3rd party's hands.
- Incremental Compiling: This is usually done to help with 
terrible build times.  A compiler without a linkable output 
format could offer a "do as much as you can" target that outputs 
D code that is lowered as far as it can possibly be lowered 
without being fed more information.  At that point, D might be 
nearly as fast as the linker, at least in human terms.

Nov 11 2013

Walter Bright <newshound2 digitalmars.com> writes:

On 11/11/2013 6:34 PM, Chad Joan wrote:
 This whole mess can be done away with by removing the "linking" step in
 compilation,

That's not really an option if you intend to use C as a back end.

Nov 11 2013

Jacob Carlborg <doob me.com> writes:

On 2013-11-10 09:20, Walter Bright wrote:

 Exceptions is one big problem. Another is COMDATs - C compilers don't
 emit them. COMDATs are needed to support templates (they remove
 duplicate instances).

 And TLS.

What about the EDG C++ compiler, doesn't that output C code?

-- 
/Jacob Carlborg

Nov 11 2013

Iain Buclaw <ibuclaw ubuntu.com> writes:

On 10 November 2013 04:54, Daniel Murphy <yebblies nospamgmail.com> wrote:

 "Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message
 news:l5madp$1p24$1 digitalmars.com...
 On 11/9/13 9:14 AM, nazriel wrote:
 On Thursday, 18 July 2013 at 01:21:44 UTC, Chad Joan wrote:
 I'd like to present my vision for a new D compiler.  I call it xdc, a
 loose abbreviation for "Cross D Compiler" (if confused, see
 ...
 Thank you for reading.

 I think C backend is a good idea.

 I think C is not a good back-end language. Other backend generators
 usually have a white paper explaining why... http://www.cminusminus.org/

 Andrei

 That is true in general, but D actually maps quite well onto C.

 I did some work on creating a C backend a while back, and it worked quite
 well.

 However - most of the work is in creating a runtime that will work
 correctly on the target platform.  If your desired target is
 anything that llvm or gcc supports, I would recommend using
 ldc/gdc instead of doing it all from scratch.

Especially gdc. Cross-platform support needs all the love it can get. ;-)

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';

Nov 10 2013

"Chad Joan" <chadjoan gmail.com> writes:

On Saturday, 9 November 2013 at 21:45:30 UTC, Andrei Alexandrescu 
wrote:
 On 11/9/13 9:14 AM, nazriel wrote:
 On Thursday, 18 July 2013 at 01:21:44 UTC, Chad Joan wrote:
 I'd like to present my vision for a new D compiler.  I call it
 xdc, a loose abbreviation for "Cross D Compiler" (if confused, see
 ...
 Thank you for reading.

 I think C backend is a good idea.

 I think C is not a good back-end language. Other backend 
 generators usually have a white paper explaining why... 
 http://www.cminusminus.org/

 Andrei

What would you suggest as an alternative for targeting disparate 
hardware like microcontrollers (ALL of them), newly released game 
consoles, and legacy platforms that could use D for migration 
tools (like OpenVMS on IA64)?

Oh, and I want instantaneous release times.  I need to be able to 
stick the compiler on a machine it has NEVER seen and say, "Use 
POSIX libraries to fulfill Phobos' deps.  Use reference counting. 
  DO WORK!".  Or maybe I would say, "Ditch Phobos, we in da 
sticks.  Use reference counting.  GOGOGO!"  And I want to be 
running my D program 5 minutes later.

Let me initially dismiss these:
LLVM: not /everywhere/ yet, and missing on many of the targets I 
mentioned.
C--: also not everywhere; this is the first I've heard of it.
Java/Javascript/.NET: Actually also good backends, but 
different ecosystems.

Thus, I suggest that C is an AWESOME backend (with C++ for 
exceptions, but ONLY if it's available).  Destroy :)

Nov 11 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 11/11/13 5:53 PM, Chad Joan wrote:
 On Saturday, 9 November 2013 at 21:45:30 UTC, Andrei Alexandrescu wrote:
 On 11/9/13 9:14 AM, nazriel wrote:
 On Thursday, 18 July 2013 at 01:21:44 UTC, Chad Joan wrote:
 I'd like to present my vision for a new D compiler.  I call it xdc, a
 loose abbreviation for "Cross D Compiler" (if confused, see
 ...
 Thank you for reading.

 I think C backend is a good idea.

 I think C is not a good back-end language. Other backend generators
 usually have a white paper explaining why... http://www.cminusminus.org/

 Andrei

 What would you suggest as an alternative for targeting disparate
 hardware like microcontrollers (ALL of them), newly released game
 consoles, and legacy platforms that could use D for migration tools
 (like OpenVMS on IA64)?

 Oh, and I want instantaneous release times.  I need to be able to stick
 the compiler on a machine it has NEVER seen and say, "Use POSIX
 libraries to fulfill Phobos' deps.  Use reference counting.  DO WORK!".
 Or maybe I would say, "Ditch Phobos, we in da sticks.  Use reference
 counting.  GOGOGO!"  And I want to be running my D program 5 minutes later.

 Let me initially dismiss these:
 LLVM: not /everywhere/ yet, and missing on many of the targets I mentioned.
 C--: also not everywhere; this is the first I've heard of it.
 Java/Javascript/.NET: Actually also good backends, but a different
 ecosystems.

 Thus, I suggest that C is an AWESOME backend (with C++ for exceptions,
 but ONLY if it's available).  Destroy :)

Fine with me. I have no stake in this. I don't see how you reach the 
conclusion that C is "awesome" given it makes exceptions tenuous to 
implement. It does have the advantage of being universally available. If 
that's everything you need, sure.

Andrei

Nov 11 2013

"Chad Joan" <chadjoan gmail.com> writes:

On Tuesday, 12 November 2013 at 05:49:23 UTC, Andrei Alexandrescu 
wrote:
 On 11/11/13 5:53 PM, Chad Joan wrote:
 What would you suggest as an alternative for targeting disparate
 hardware like microcontrollers (ALL of them), newly released game
 consoles, and legacy platforms that could use D for migration
 tools (like OpenVMS on IA64)?

 Oh, and I want instantaneous release times.  I need to be able to
 stick the compiler on a machine it has NEVER seen and say, "Use
 POSIX libraries to fulfill Phobos' deps.  Use reference counting.
 DO WORK!".  Or maybe I would say, "Ditch Phobos, we in da sticks.
 Use reference counting.  GOGOGO!"  And I want to be running my D
 program 5 minutes later.

 Let me initially dismiss these:
 LLVM: not /everywhere/ yet, and missing on many of the targets I
 mentioned.
 C--: also not everywhere; this is the first I've heard of it.
 Java/Javascript/.NET: Actually also good backends, but a different
 ecosystems.

 Thus, I suggest that C is an AWESOME backend (with C++ for
 exceptions, but ONLY if it's available).  Destroy :)

 Fine with me. I have no stake in this. I don't see how you 
 reach the conclusion that C is "awesome" given it makes 
 exceptions tenuous to implement. It does have the advantage of 
 being universally available. If that's everything you need, 
 sure.

 Andrei

I call it "awesome" because you seem to have objections to the 
whole notion, and your objections are usually very interesting.  
So I'm just pulling your chain in the hopes that you bestow 
insights on me :)

Honestly, I look forward to being able to implement exception 
handling in C!  It sounds like a fun couple of coding sessions 
waiting to happen.  I already did it with C macros, so giving me 
an entire code generator to work with might make it /too/ easy.  
And it scratches an itch that current compilers can't (well, 
maybe LDC is catching up).

Perhaps this is just the difference between choosing a good IR 
(which C is not) and choosing a good compilation target (where C 
is needed).

Nov 12 2013

"Dejan Lekic" <dejan.lekic gmail.com> writes:

I will definitely back this project on Kickstarter if the 
mentioned Java backend is going to be somewhere among the top 
priorities. Being able to target the JVM is extremely important 
to me.

Before you do the Kickstarter, please make a list of the features 
that you plan to have in XDC at release, and say when you plan 
the release to happen.

Nov 11 2013
