digitalmars.D - Writing Compilers in D

Kevin A <Kevin_member pathlink.com> writes:

Hello!  I am an experienced compiler/interpreter writer and I have been
considering using D instead of C/C++ as the implementation language.  I have a
few questions that I am seeking answers to before I begin writing it:

- Has anyone else here written a compiler in D?
- Is D well-suited to compiler writing? And if so, what features of D are
particularly good for this?
- Is there a visual debugger available for D?  If not, is there any debugger?
How good is the debugger?
- Is there a good IDE for D?

Your help will be greatly appreciated.

Sincerely,
Kevin A

Aug 12 2004

Deja Augustine <deja scratch-ware.net> writes:

Kevin A wrote:
 Hello!  I am an experienced compiler/interpreter writer and I have been
 considering using D instead of C/C++ as the implementation language.  I have a
 few questions that I am seeking answers to before I begin writing it:
 
 - Has anyone else here written a compiler in D?

I've written part of one.  I wrote a D preprocessor that parsed in the D 
code and did some rudimentary semantic analysis.  I was originally going 
to use that as the base for D.NET until I discovered that the front-end 
source was available.

 - Is D well-suited to compiler writing? And if so, what features of D are
 particularly good for this?

Its string handling is definitely nicer than C++'s, as are the dynamic 
arrays.  Most of that can be done in C++ via the STL, however.

 - Is there a visual debugger available for D?  If not, is there any debugger?
 How good is the debugger?

As I understand it, you can use a variety of "3rd party" debuggers. 
I've never tried, though.  Contracts make it pretty easy to code without 
needing a separate debugger.
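
A minimal sketch of what contracts look like in practice, assuming a 
current D2 compiler. The helper `skipBlanks` is hypothetical and only 
serves as an illustration: the `in` block checks the caller, the `out` 
block checks the result, and a violated assertion reports the failing 
line at run time.

```d
import std.stdio;

/// Hypothetical helper: returns the index of the first character at or
/// after `pos` that is neither a space nor a tab.
size_t skipBlanks(const(char)[] src, size_t pos)
in
{
    // Precondition: the caller must start inside the buffer.
    assert(pos <= src.length, "start position lies outside the source");
}
out (result)
{
    // Postcondition: we never move backwards or run off the end.
    assert(result >= pos && result <= src.length);
}
do
{
    size_t i = pos;
    while (i < src.length && (src[i] == ' ' || src[i] == '\t'))
        ++i;
    return i;
}

void main()
{
    writeln(skipBlanks("   x = 1", 0)); // prints 3
}
```

Contracts are compiled out in release builds, so they cost nothing in 
the shipped binary.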

 - Is there a good IDE for D?

Check out the links page on the D site.

-Deja

Aug 12 2004

Stephan Wienczny <Stephan Wienczny.de> writes:

Kevin A wrote:
 Hello!  I am an experienced compiler/interpreter writer and I have been
 considering using D instead of C/C++ as the implementation language.  I have a
 few questions that I am seeking answers to before I begin writing it:
 
 - Has anyone else here written a compiler in D?
 - Is D well-suited to compiler writing? And if so, what features of D are
 particularly good for this?
 - Is there a visual debugger available for D?  If not, is there any debugger?
 How good is the debugger?
 - Is there a good IDE for D?
 
 Your help will be greatly appreciated.
 
 Sincerely,
 Kevin A
 
 

I'm trying to do such a thing. I can tell you my experience (so far).
D makes implementing things a little easier than C/C++.
You have the class definition next to its implementation, so there is no 
redundancy when writing a new function. Then you have some advanced D 
features, like dynamic arrays with slicing, which make parsers/lexers 
awfully fast.

IMHO D is more readable than C/C++ and can be a lot faster...

It should be possible to use a visual debugger. There is none in the 
package, but you should find one on the net.
There have been some efforts to write a special IDE for D (dide, leds), 
and there are config files for others (Eclipse, MS Visual Studio).

  Stephan

Aug 12 2004

"Sampsa Lehtonen" <snlehton cc.hut.fi> writes:

On Thu, 12 Aug 2004 21:04:00 +0200, Stephan Wienczny <Stephan Wienczny.de>  
wrote:

 I'm trying to do such a thing. I can tell you my experience (so far).
 D makes implementing things a little easier than C/C++.
 You have the class definition next to its implementation, so there is no 
 redundancy when writing a new function. Then you have some advanced D 
 features, like dynamic arrays with slicing, which make parsers/lexers 
 awfully fast.

 IMHO D is more readable than C/C++ and can be a lot faster...

 It should be possible to use a visual debugger. There is none in the 
 package, but you should find one on the net.
 There have been some efforts to write a special IDE for D (dide, leds), 
 and there are config files for others (Eclipse, MS Visual Studio).

I've considered making a compiler too, perhaps for D. I've made one for  
MiniJava which is a subset of Java. It produced native code (MIPS). Now I  
would like to try my skills on something more involved. C/C++ syntax seems  
too complex, and Java is a bit too abstract (it isn't meant for native  
code though such compilers exist).

Do you guys have any suggestions on which tools to use? I've been thinking 
about writing the compiler in Java, as it is the easiest and fastest to 
code in (using refactoring tools). I don't care about compilation times at 
the moment; getting the compiler running and producing code is enough of a 
task in itself. I've used JavaCC and Antlr too, but are there better 
alternatives?

For an industrial compiler I'd choose C++ as the development language and 
the x86 instruction set as output, but writing CISC compilers is so much 
harder than writing RISC compilers, so maybe I'll go with MIPS here too.

My primary goal is to get my hands on different optimization techniques 
and to get familiar with complex flow and data analyses.

Btw, if anyone has pointers to some nice documents about OBJ-file  
structure and such, I'd be interested.

-texmex/sampsa lehtonen

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/m2/

Aug 16 2004

Ilya Minkov <minkov cs.tum.edu> writes:

Sampsa Lehtonen wrote:
 I've considered making a compiler too, perhaps for D. I've made one for 
 MiniJava which is a subset of Java. It produced native code (MIPS). Now 
 I would like to try my skills on something more involved. C/C++ syntax 
 seems too complex, and Java is a bit too abstract (it isn't meant for 
 native code though such compilers exist).

Hum. D would be a large undertaking, though not as large as C++. C++ is 
more of a problem semantically than syntactically, I think.

 Do you guys have any suggestions on which tools to use? I've been thinking 
 about writing the compiler in Java, as it is the easiest and fastest to 
 code in (using refactoring tools). I don't care about compilation times at 
 the moment; getting the compiler running and producing code is enough of a 
 task in itself. I've used JavaCC and Antlr too, but are there better 
 alternatives?

Refactoring tools? What do you mean?

I'm under the impression that ANTLR is the best, but my favourite is COCO/R. 


versions generate non-reentrant parsers.

I would not recommend writing a compiler in Java. So far, I have had a 
lot of fun writing my first real lexer, and I came to like the D array 
semantics and slicing, which are probably unique. That is, they can be 
easily emulated in C++, but as far as I know they are not native in any 
other language. The difference from, say, Python, is that slices still 
refer to the original array where the data is stored. For example, one 
would load a text into a large buffer or memory-mapped file, and have a 
lexeme contain a slice into it, instead of a position and length. String 
semantics without overhead. Plus one would insert asserts here and there 
to make sure that such a slice keeps pointing into the loaded text.
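
A minimal sketch of that idea, assuming a current D2 compiler. The 
names `Lexeme`, `isSliceOf` and `offsetIn` are made up for illustration 
and are not taken from any real lexer. The lexeme stores only a slice, 
the assert checks that the slice still points into the loaded text, and 
the position falls out of pointer arithmetic instead of being stored.

```d
import std.stdio;

/// Hypothetical lexeme type: it stores only a slice of the source buffer.
struct Lexeme
{
    const(char)[] text;
}

/// True if `part` lies entirely inside the memory of `whole`.
bool isSliceOf(const(char)[] part, const(char)[] whole)
{
    return part.ptr >= whole.ptr
        && part.ptr + part.length <= whole.ptr + whole.length;
}

/// Offset of `part` from the start of `whole`; `part` must be a slice of it.
size_t offsetIn(const(char)[] part, const(char)[] whole)
{
    assert(isSliceOf(part, whole), "slice does not point into the buffer");
    return cast(size_t)(part.ptr - whole.ptr);
}

void main()
{
    // Stands in for a file that was loaded or memory-mapped in one piece.
    const(char)[] source = "int answer = 42;";

    // Slicing creates a view onto the same memory, not a copy.
    auto lex = Lexeme(source[4 .. 10]);

    writeln(lex.text);                   // prints answer
    writeln(offsetIn(lex.text, source)); // prints 4
}
```

Because a slice is just a pointer and a length, creating a lexeme this 
way allocates nothing.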

 For an industrial compiler I'd choose C++ as the development language and 
 the x86 instruction set as output, but writing CISC compilers is so much 
 harder than writing RISC compilers, so maybe I'll go with MIPS here too.

Non-processor targets might also be interesting... e.g. ANDF, the 
Architecture Neutral Distribution Format. There are converters from 
ANDF to target code for different architectures.

 My primary goal is to get my hands on different optimization techniques  
 and to get familiar with complex flow- and data-analyses.

I chatted with MadMan/TAP (aka MadenMann) just yesterday. Perhaps he 
would also be interested... He wanted to write a custom compiler for 
the Sega Mega Drive.

 Btw, if anyone has pointers to some nice documents about OBJ-file  
 structure and such, I'd be interested.

These differ between toolchains. For Digital Mars, Watcom and Borland, 
look for OMF. Microsoft and some others use COFF. Other operating 
systems use either some variant of COFF or some variant of ELF, or even 
something completely different. The format of object files need not 
necessarily correspond to the OS executable format, although I guess it 
makes the linker's life easier.
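
As a rough illustration of how different these formats are, here is a 
sketch that guesses the format from the first bytes of a file, assuming 
a current D2 compiler. The function `guessFormat` is hypothetical, and 
the check is only a heuristic: it relies on ELF files starting with the 
bytes 0x7F 'E' 'L' 'F', OMF modules starting with a THEADR record (type 
0x80), and COFF objects starting with a little-endian machine field 
(0x014C for i386). Other machine types and formats are not covered.

```d
import std.stdio;

enum ObjFormat { unknown, omf, coff, elf }

/// Guesses the object file format from the leading bytes of a file.
ObjFormat guessFormat(const(ubyte)[] head)
{
    // ELF: four-byte magic number.
    if (head.length >= 4 && head[0] == 0x7F && head[1] == 0x45
        && head[2] == 0x4C && head[3] == 0x46)
        return ObjFormat.elf;

    // OMF: the first record is a THEADR record, type byte 0x80.
    if (head.length >= 1 && head[0] == 0x80)
        return ObjFormat.omf;

    // COFF (i386 only in this sketch): machine field 0x014C, little-endian.
    if (head.length >= 2 && head[0] == 0x4C && head[1] == 0x01)
        return ObjFormat.coff;

    return ObjFormat.unknown;
}

void main()
{
    ubyte[] elfHead = [0x7F, 0x45, 0x4C, 0x46];
    writeln(guessFormat(elfHead)); // prints elf
}
```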

-eye/PaC

Aug 16 2004

"Sampsa Lehtonen" <snlehton cc.hut.fi> writes:

On Mon, 16 Aug 2004 18:34:14 +0200, Ilya Minkov <minkov cs.tum.edu> wrote:

 Hum. D would be a large undertaking, though not as large as C++. C++ is 
 more of a problem semantically than syntactically, I think.

Well, I plan to implement just a subset of D first, leaving exceptions, 
templates and mixins for later, just to get the primitive things running. 
Probably the most gratifying thing in compiler construction is the moment 
when you actually get something compiled and it runs!

 Do you guys have any suggestions which tools to use? I've been

 Refactoring tools? What do you mean?

By tools I meant parser generators and such; refactoring tools are a 
whole different story, though they are related to code parsing. Refactoring 
means transformations on code that do not change the meaning of the code. 
Pretty fun stuff, really. There are plenty of different refactoring types, 
ranging from simple variable renaming to super-class extraction etc. 
Basically they are tools that aid programming by automating the tedious 
primitive tasks programmers do daily. Check out more at 
http://www.refactoring.com/
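
To make that concrete, here is a small made-up before/after pair in D 
(assuming a current D2 compiler). `totalBefore` is the original code; 
`totalAfter` is what it might look like after a "rename variable" and 
an "extract function" refactoring. All names are illustrative. The 
point is that both versions compute exactly the same thing.

```d
import std.stdio;

// Before: terse names, computation written inline.
int totalBefore(int[] p)
{
    int t = 0;
    foreach (x; p)
        t += x * x;
    return t;
}

// After "extract function": the squaring gets its own name.
int square(int value)
{
    return value * value;
}

// After "rename variable": p -> values, t -> sum, x -> value.
int totalAfter(int[] values)
{
    int sum = 0;
    foreach (value; values)
        sum += square(value);
    return sum;
}

void main()
{
    int[] data = [1, 2, 3];
    writeln(totalBefore(data)); // prints 14
    writeln(totalAfter(data));  // prints 14
}
```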

BTW, a nice thing about D is that D programs can be parsed easily, as there 
is no preprocessor. This makes refactoring possible, unlike in C++ where 
the preprocessor can really f*ck things up. Currently I'm waiting for a D 
IDE that would include refactoring tools ;)

 I'm under the impression that ANTLR is the best, but my favourite is COCO/R. 


 versions generate non-reentrant parsers.

COCO/R, hmm, haven't heard of it. I'll check it out. But probably I'll go  
with JavaCC (or Antlr, as it generates parsers in C++ too).

 I would not recommend writing a compiler in Java. So far, I have had a 
 lot of fun writing my first real lexer, and I came to like the D array 
 semantics and slicing, which are probably unique. That is, they can be 
 easily emulated in C++, but as far as I know they are not native in any 
 other language. The difference from, say, Python, is that slices still 
 refer to the original array where the data is stored. For example, one 
 would load a text into a large buffer or memory-mapped file, and have a 
 lexeme contain a slice into it, instead of a position and length. String 
 semantics without overhead. Plus one would insert asserts here and there 
 to make sure that such a slice keeps pointing into the loaded text.

Umm, I don't quite follow you. After the lexer has tokenized a token and 
the parser has accepted it, the actual text becomes unnecessary (unless it 
is an identifier). So why would I want to load the whole file into memory 
and have tokens pointing into that big piece of text?...
With a lexer for an IDE, where parsing needs to be done constantly and in 
varying places, it is a different story, I guess...

 For an industrial compiler I'd choose C++ as the development language and 
 the x86 instruction set as output, but writing CISC compilers is so much 
 harder than writing RISC compilers, so maybe I'll go with MIPS here too.

 Non-processor targets might also be interesting... e.g. ANDF, the 
 Architecture Neutral Distribution Format. There are converters from 
 ANDF to target code for different architectures.

Yeah, but that is a bit too much rocket science for me :) Getting the 
compiler to produce a proper executable is hard enough; I don't want to 
hinder the development with unnecessarily complex target platforms :)

 My primary goal is to get my hands on different optimization techniques 
 and to get familiar with complex flow and data analyses.

 I chatted with MadMan/TAP (aka MadenMann) just yesterday. Perhaps he 
 would also be interested... He wanted to write a custom compiler for 
 the Sega Mega Drive.

I was thinking of making a compiler for ARM's RISC processor for GBA, but  
that project never really took off. It would have been an interesting  
project though, because the device has its restrictions and the  
instruction set is so simple.

 Btw, if anyone has pointers to some nice documents about OBJ-file 
 structure and such, I'd be interested.

 These differ between toolchains. For Digital Mars, Watcom and Borland, 
 look for OMF. Microsoft and some others use COFF. Other operating 
 systems use either some variant of COFF or some variant of ELF, or even 
 something completely different. The format of object files need not 
 necessarily correspond to the OS executable format, although I guess it 
 makes the linker's life easier.

Hmm, so different compilers need different kinds of OBJ files? So I can't 
use Watcom objs/libs with VC++... oh well.

Thanks for the info!

-texmex/sampsa lehtonen
-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/m2/

Aug 16 2004

Ilya Minkov <minkov cs.tum.edu> writes:

Sampsa Lehtonen wrote:

 Well, I plan to implement just a subset of D first, leaving exceptions, 
 templates and mixins for later, just to get the primitive things running. 
 Probably the most gratifying thing in compiler construction is the moment 
 when you actually get something compiled and it runs!

Okeydokey.

 By tools I meant parser generators and such; refactoring tools are a 
 whole different story, though they are related to code parsing. Refactoring 
 means transformations on code that do not change the meaning of the code. 
 Pretty fun stuff, really. There are plenty of different refactoring types, 
 ranging from simple variable renaming to super-class extraction etc. 
 Basically they are tools that aid programming by automating the tedious 
 primitive tasks programmers do daily. Check out more at 
 http://www.refactoring.com/

Gotta look at them.

 BTW, a nice thing about D is that D programs can be parsed easily, as there 
 is no preprocessor. This makes refactoring possible, unlike in C++ where 
 the preprocessor can really f*ck things up. Currently I'm waiting for a D 
 IDE that would include refactoring tools ;)

Yes. But I guess someone would have to write refactoring tools before 
someone else could integrate them into an editor.

 COCO/R, hmm, haven't heard of it. I'll check it out. But probably I'll 
 go with JavaCC (or Antlr, as it generates parsers in C++ too).

Yup. ANTLR is really worth it, but I just find COCO/R nice. The whole 
program is tiny, and the generated compilers are small, complete, fast 
and readable. It has a few special features like comment and pragma 
processing, extendable lookup both in parser and lexer, and context 
dependency can be used. It should even be able to parse C++, I think, 
though not many people have been writing grammars for it.

 Umm, I don't quite follow you. After the lexer has tokenized a token and 
 the parser has accepted it, the actual text becomes unnecessary (unless it 
 is an identifier). So why would I want to load the whole file into memory 
 and have tokens pointing into that big piece of text?...
 With a lexer for an IDE, where parsing needs to be done constantly and in 
 varying places, it is a different story, I guess...

I have found that a lexeme needs to carry little more than its text and a 
pointer to the lexer. The type of the lexeme is given by its class in the 
hierarchy. Concrete subtypes may contain further information or methods. 
So far I have the following types of lexeme defined:

* Identifier;
* Numerical (matching both integer and floating-point);
* Crude.

I only need to read the first symbol to guess the type of the lexeme: a 
letter or underscore makes it an identifier, a digit makes it numeric, 
and everything else is "crude" and is matched using a large switch of 
switches, which covers operators and everything else unwieldy. I treat 
language keywords as identifiers in the lexer and only check them in the 
parser later. The lexeme is parsed in the constructor of the 
corresponding type, so there is no stepping back: if there is a 
mismatch, it is a fatal error.
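
A rough sketch of how such a design could look, assuming a current D2 
compiler. This is a reconstruction for illustration, not the actual 
code: the class names follow the list above, but `nextLexeme` and all 
of the matching rules are simplified assumptions. In particular the 
"crude" case takes a single character where a real lexer would use the 
nested switches, the numeric rule accepts any run of digits and dots, 
and the pointer back to the lexer is left out.

```d
import std.ascii : isAlpha, isAlphaNum, isDigit;
import std.stdio;

/// Base class: a lexeme is little more than a slice of the source.
abstract class Lexeme
{
    const(char)[] text;
}

final class Identifier : Lexeme
{
    /// Parses in the constructor; the first character was already checked.
    this(const(char)[] rest)
    {
        size_t n = 1;
        while (n < rest.length && (isAlphaNum(rest[n]) || rest[n] == '_'))
            ++n;
        text = rest[0 .. n];
    }
}

final class Numerical : Lexeme
{
    /// Simplified: accepts digits and dots, so "42" and "3.14" both match.
    this(const(char)[] rest)
    {
        size_t n = 1;
        while (n < rest.length && (isDigit(rest[n]) || rest[n] == '.'))
            ++n;
        text = rest[0 .. n];
    }
}

final class Crude : Lexeme
{
    /// Simplified: one character only, no multi-character operators.
    this(const(char)[] rest)
    {
        text = rest[0 .. 1];
    }
}

/// Picks the lexeme class by looking at the first character only.
Lexeme nextLexeme(const(char)[] rest)
{
    assert(rest.length > 0, "nothing left to lex");
    immutable c = rest[0];
    if (isAlpha(c) || c == '_')
        return new Identifier(rest);
    if (isDigit(c))
        return new Numerical(rest);
    return new Crude(rest);
}

void main()
{
    const(char)[] source = "count_1 = 42 + x;";
    size_t pos = 0;
    while (pos < source.length)
    {
        if (source[pos] == ' ')
        {
            ++pos; // skip blanks between lexemes
            continue;
        }
        auto lx = nextLexeme(source[pos .. $]);
        writeln(lx.text);
        pos += lx.text.length;
    }
}
```

Every `text` field is a slice of `source`, so the only allocation per 
lexeme is the class instance itself.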

So far I discriminate Crudes and keywords by text in the parser; this is 
very fast. No copies of data are made, and in fact there are usually only 
a few comparisons each time, because the first characters carry the most 
information.
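
One possible way to do this check in the parser is a string `switch`, 
sketched below for a current D2 compiler. The function `isKeyword` and 
the four keywords in it are only an illustrative subset, not the real 
keyword table.

```d
import std.stdio;

/// Decides by text alone whether an identifier lexeme is a keyword.
/// The slice is compared in place; nothing is copied.
bool isKeyword(const(char)[] text)
{
    switch (text)
    {
        case "if", "else", "while", "return":
            return true;
        default:
            return false;
    }
}

void main()
{
    writeln(isKeyword("while")); // prints true
    writeln(isKeyword("whale")); // prints false
}
```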

Note also that the lexemes don't have to store their position in the file 
for error reporting and such. In a function that gets the file and line, 
I assert that the lexeme string is within the lexer's storage, and then I 
slice the lexer's storage from the beginning to the start of the lexeme. 
Then I only need to count the line ends in there to figure out the 
line. :) Or I can even keep a table with the offsets of all line ends 
and simply scan through it.
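
Both variants can be sketched as follows, assuming a current D2 
compiler. The function names `lineOf`, `lineEnds` and `lineFromTable` 
are hypothetical. Lines are counted from 1, and only '\n' is treated as 
a line end.

```d
import std.stdio;

/// Line number of `lexeme`, which must be a slice of `storage`.
size_t lineOf(const(char)[] lexeme, const(char)[] storage)
{
    assert(lexeme.ptr >= storage.ptr
        && lexeme.ptr + lexeme.length <= storage.ptr + storage.length,
        "lexeme does not point into the lexer's storage");

    // Slice from the beginning of the storage to the start of the lexeme.
    immutable offset = cast(size_t)(lexeme.ptr - storage.ptr);
    size_t line = 1;
    foreach (char c; storage[0 .. offset])
        if (c == '\n')
            ++line;
    return line;
}

/// Variant: record the offsets of all line ends once.
size_t[] lineEnds(const(char)[] storage)
{
    size_t[] ends;
    foreach (i, char c; storage)
        if (c == '\n')
            ends ~= i;
    return ends;
}

/// Scans the table for the line that contains `offset`.
size_t lineFromTable(size_t offset, const(size_t)[] ends)
{
    size_t line = 1;
    foreach (end; ends)
    {
        if (end >= offset)
            break;
        ++line;
    }
    return line;
}

void main()
{
    const(char)[] storage = "a = 1;\nb = 2;\nc = 3;\n";
    auto lexeme = storage[14 .. 15]; // the "c" on the third line

    writeln(lineOf(lexeme, storage));             // prints 3
    writeln(lineFromTable(14, lineEnds(storage))); // prints 3
}
```

Since the table is sorted, the linear scan could be replaced by a 
binary search if files get large.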

Like, it all is nothing that couldn't be done some other way, but it 
just works so nicely!

 Yeah, but that is a bit too much rocket science for me :) Getting the 
 compiler to produce a proper executable is hard enough; I don't want to 
 hinder the development with unnecessarily complex target platforms :)

I thought it might be a bit simpler. But a real simple CPU is perhaps 
better suited.

 I was thinking of making a compiler for ARM's RISC processor for GBA, 
 but that project never really took off. It would have been an 
 interesting project though, because the device has its restrictions and 
 the instruction set is so simple.

Ever heard of the "Gamepark-32", a Korean game handheld? It is very 
popular with crazy developers. There are almost no games for it other 
than in Korean, and the handheld itself has to be imported, but it's 
cheap (around 120 EUR IIRC), has the GCC devtools, is accessed by USB, 
and runs programs from SmartMedia cards. :) It is based upon an ARM9 (as 
opposed to the ARM7 in the GBA) clocked at 66, 100, 133 or even 
(unwarranted) 166 MHz; the frequency can be changed programmatically. It 
is a pure framebuffer device: it doesn't have any sprite acceleration 
like the GBA, but it has a much better display (320x240 hicolor), and 
one may have fun figuring out some cool software tricks to make it reach 
some notable performance.

The specs and some links can be found here, for example:

http://darkfader.net/gp32/

-eye

Aug 16 2004
