digitalmars.D - [GSOC Draft Proposal] ANTLR and Java based D parser for IDE usage

Luca Boasso (134/134) Mar 27 2011 Sorry for my late draft proposal, I'm currently moving so I didn't

Bernard Helyer (4/4) Mar 28 2011 You've hit on every point I thought important for the ANTLR project, and...

Luca Boasso (2/6) Mar 28 2011

Bruno Medeiros (56/107) Mar 29 2011 Like Andrei said, and as is already mentioned in this proposal, I think

Luca Boasso (191/334) Mar 29 2011 Hello,

Bruno Medeiros (30/66) Apr 04 2011 In line with my previous comments on the proposal, I have some comments

Luca Boasso (52/135) Apr 04 2011 Thank you for your comments.

Luca Boasso (39/391) Mar 30 2011 Hello,

Andrei Alexandrescu (3/9) Mar 30 2011 Have a safe trip!

Luca Boasso (8/20) Apr 01 2011 Thanks Andrei, I'm back in my home country :)

Luca Boasso <luke.boasso gmail.com> writes:

Sorry for my late draft proposal, I'm currently moving so I didn't
have enough time this days.
I would be glad to have your opinion.

Thank you

<DRAFT PROPOSAL>

Rationale
---------

There are different IDEs for the D programming language. The purpose of thi=
s
project proposal is to write a parser for the D programming language (v1 an=
d v2)
that is tailored for IDEs needs. The new parser will be designed to be modu=
lar
and abstracted from any particular IDE implementation detail, so that it ca=
n be
used in different IDEs or with tools that need an abstract syntax tree of t=
he
source code for further analysis.
Particular care will be taken to integrate the new parser with the DDT
Eclipse-based IDE so that this project will be useful in the short-term.
The DDT project needs a new parser up-to-date with the latest D syntax, wit=
h
a better error recovery and improved performance.
Thanks to this integration it will be possible to understand the appropriat=
e
interface for the parser so that in the long-term the same code could be us=
ed in
different projects.

I will use the ANTLR parser generator for this project. This parser generat=
or
has been proven to be a valuable tool with advanced features for tree
construction and  tree manipulations that cuts development time [1]. The LL=
(*)
parsing algorithm  on which ANTLR is based upon allows efficient parsing of
complex grammar with good error handling and unrestricted grammar actions [=
2].

The use of a parser generator allows the creations of parsers in different
programming languages. This project will focus on the creation of a Java pa=
rser.
Since ANTLR support many target languages [3], it will not be so difficult =
to
generate a parser in the original implementation language of the IDE.
Eg. Generate a C++ parser for the D language that will be used in the IDE
written in C++.

Furthermore, updates of the D grammar are reflected in a more convenient wa=
y
through modifications of the ANTLR grammar of D, than through a modificatio=
n of
a hand-written parser.
In particular, one of the problems faced by DDT developers was to keep thei=
r
parser up-to-date with the reference one (DMD parser) [4].
It is time-consuming and error-prone to manually port the DMD parser writte=
n in
C++ to another language, instead most of the modification will be handled b=
y
ANTLR.

In addition, easy modification of the D language syntax encourages
experimentation for the benefit of the language's evolution.


Finally in the process of writing a new parser eventual misunderstanding or
inconsistency of the D language reference and documentations will be addres=
sed.
A good set of test will be created to guarantee the compatibility of the ne=
w
parser with the official language definition and the DMD parser.

Timeline
--------
This is a tentative timeline to be further discussed with the help of the
community. I'm committed to dedicate substantially to this project
knowing that I also have to pass some exams. I estimate that I could spend
initially approximately 25h/week. After the exam session I will work full-t=
ime
on this project.

* April 25 =96 May 23: Community Bonding Period
  Since I am new in the D community I will spend some time learning how to
  contribute while following the guideline of the community and the DDT pro=
ject.
  I will start reviewing the language reference asking for clarifications w=
hen
  needed. Once I have got an overall understanding I will write the product=
ion
  rules of the D grammar (v1 and v2) in the ANTLR grammar notation (similar=
 to
  EBNF).

* May 23 =96 July 11: Developing phase I
  The correctness of the parser is of paramount importance. I will create
  many tests to exercise the parser (at this point just a
"recognizer") obtained
  as output from ANTLR.
  Once I am confident with the parser conforms to the language reference an=
d
  recognizes the same language as the parser in DMD, I will enhance it with
  AST construction rules.
  At this point, I need to discuss with the DDT team the type of AST that h=
as to
  be built for IDEs purposes, and confirm which annotations are most useful
  (eg. source ranges).

* July 11 =96 August 15: Developing phase II
  In this phase I will create unit tests to verify the correctness of the
  generated trees and I will focus on the integration of the parser with th=
e DDT
  project.
  In the remaining time I will provide good error recovery to the parser an=
d I
  will improve the overall performance.

* August 15 - August 22: Final phase
  I will use this last week to polish the code and improve the documentatio=
n.
  As a final task, I will think about how support for incremental parsing c=
an be
  added in the future.

About me
---------

I am Luca Boasso, I am a computer engineering student of Polytechnic Univer=
sity
of Turin [5]. I won a scholarship for a double-degree program and obtained =
the
first master in France under Telecom ParisTech [6] in 2010.
After having an internship in Panasonic R&D in Cupertino [7], I am currentl=
y
finishing my last semester in Italy to receive my second master.
I have been always had a passion for programming language design and
implementations. I study and use in my spare time several programming langu=
ages.
I have become a recent supporter of the D programming language and I am ama=
zed
by its expressiveness, power and net improvement over C++.
In my master thesis, I designed a domain specific language for SAP lab Fran=
ce
and I wrote several parsers by hand and also with the help of ANTLR.

Contact Information
-------------------
Skype: luca_378
Email: luke.boasso gmail.com

Mar 27 2011

Bernard Helyer <b.helyer gmail.com> writes:

You've hit on every point I thought important for the ANTLR project, and 
it seems to me you understand the underlying influences of the need of 
it. I'm not a mentor, but I wish you all the best in this project -- a 
complete ANTLR grammar, with generated code in use, would be a great boon.

Mar 28 2011

Luca Boasso <luke.boasso gmail.com> writes:

Thank you very much, I appreciate a lot your comments

On 3/28/11, Bernard Helyer <b.helyer gmail.com> wrote:
 You've hit on every point I thought important for the ANTLR project, and
 it seems to me you understand the underlying influences of the need of
 it. I'm not a mentor, but I wish you all the best in this project -- a
 complete ANTLR grammar, with generated code in use, would be a great boon.

Mar 28 2011

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 28/03/2011 01:52, Luca Boasso wrote:
Sorry for my late draft proposal, I'm currently moving so I didn't
have enough time this days.
I would be glad to have your opinion.

Thank you

Rationale
---------

There are different IDEs for the D programming language. The purpose of this
project proposal is to write a parser for the D programming language (v1 and
v2)
that is tailored for IDEs needs. The new parser will be designed to be modular
and abstracted from any particular IDE implementation detail, so that it can be
used in different IDEs or with tools that need an abstract syntax tree of the
source code for further analysis.
Particular care will be taken to integrate the new parser with the DDT
Eclipse-based IDE so that this project will be useful in the short-term.

The DDT project needs a new parser up-to-date with the latest D syntax, with
a better error recovery and improved performance.
Thanks to this integration it will be possible to understand the appropriate
interface for the parser so that in the long-term the same code could be used
in
different projects.

I will use the ANTLR parser generator for this project. This parser generator
has been proven to be a valuable tool with advanced features for tree
construction and tree manipulations that cuts development time [1]. The LL(*)
parsing algorithm on which ANTLR is based upon allows efficient parsing of
complex grammar with good error handling and unrestricted grammar actions [2].

The use of a parser generator allows the creations of parsers in different
programming languages. This project will focus on the creation of a Java
parser.
Since ANTLR support many target languages [3], it will not be so difficult to
generate a parser in the original implementation language of the IDE.
Eg. Generate a C++ parser for the D language that will be used in the IDE
written in C++.

Furthermore, updates of the D grammar are reflected in a more convenient way
through modifications of the ANTLR grammar of D, than through a modification of
a hand-written parser.
In particular, one of the problems faced by DDT developers was to keep their
parser up-to-date with the reference one (DMD parser) [4].
It is time-consuming and error-prone to manually port the DMD parser written in
C++ to another language, instead most of the modification will be handled by
ANTLR.

In addition, easy modification of the D language syntax encourages
experimentation for the benefit of the language's evolution.

Finally in the process of writing a new parser eventual misunderstanding or
inconsistency of the D language reference and documentations will be addressed.
A good set of test will be created to guarantee the compatibility of the new
parser with the official language definition and the DMD parser.

Like Andrei said, and as is already mentioned in this proposal, I think
the focus of this parser project should to integrate with DDT, so that
we can have something directly useful at the conclusion of the project.
And also to validate that the parser is worthwhile for IDE usage.
Fortunately this is not contrary to the other goals of making the
grammar reusable for other ANTLR-based parsers coded in another
language, or to make the D parser reusable in other Java-based projects.
The DDT AST classes (and the basic semantic engine) are already isolated
in their own bundle/module, conceptually independent of any Eclipse code
(there a few minor coded dependencies that are trivially removable).

The proposal text looks good to me, but one missing thing that I think
is key to consider is error recovery. The current parser (Descent/DMD)
is already fairly good at this, (although it could be improved in some
regards). The new ANTLR parser would not need to be as good as DMD, but
it should have good recovery at least in same basic IDE usage cases. So
for example:

/* block structure stuff: */
void func() {
blah(
}
// the parser should still recover successfully and parse the rest of //
the file after func

Recovery inside statements, and some other use cases are also very
important, but this can be discussed in more detail later, my point now
is just that the consideration of the syntax recovery should be present
to the proposal. (just mention it, no need to write much about it)

Some other comments relating to implementation and design details:

Once I have got an overall understanding I will write the production
rules of the D grammar (v1 and v2) in the ANTLR grammar notation (similar to
EBNF).

Hum, I am inclined to think that having two separate grammars for each
version of D is not the best approach. For starters, even for D2 there
are not one, but many version of the language, even with regards to just
parsing D2. True, we may choose to not support those previous versions,
and focus only on D2 as of TDPL, but it is still important to be mindful
of this. Also because there might be additions to the syntax of D2.
And here IDE development differs somewhat from a compiler. In a compiler
you would just change the parser code to the latest version of the
language. And so the latest compiler only supports the latest version of
the language. However, in the IDE you ideally want the latest version of
the IDE to support *all* previous versions of the language. Or at least
all versions that users might still want to code in.
So it is better perhaps to have just one grammar that is a superset of
D1 and D2, (and then afterward have some "syntax" validator on the
AST/tokens to make sure it is valid to a given language version)

On 28/03/2011 01:52, Luca Boasso wrote:
At this point, I need to discuss with the DDT team the type of AST

that has to
be built for IDEs purposes, and confirm which annotations are most

useful
(eg. source ranges).

As for the AST that should be generated, you can already see how it
should (mostly) be, by looking here:
http://code.google.com/a/eclipselabs.org/p/ddt/source/browse/#hg%2Forg.dsource.ddt.dtool%2Fsrc%2Fdtool%2Fast
That AST is generally what the parser should generate, although minor
adjustments and changes might be necessary or desirable, yes.
There are also some parser tests there, but they are very few and limited.

I also have some comments for the timeline but I'll leave that for
another post.

--
Bruno Medeiros - Software Engineer

Mar 29 2011

Luca Boasso <luke.boasso gmail.com> writes:

Hello,

thank you very much for your useful comments.

I have updated the proposal version in the www.google-melange.com.
I post here the changes and the updated version.

Changes:
- There are different IDEs for the D programming language. The purpose of t=
his
project proposal is to write a parser for a superset of the D programming
language (including v1 and v2) that is tailored for IDEs needs.

-Specifically, a good error recovery strategy is an essential feature in an=
 IDE.
The parser should be able to restore itself to a state where processing of =
the
input can continue with reasonable hopes that further processing will provi=
de
meaningful diagnostic information.

-Once I have got an overall understanding I will write the production rules=
 of
a superset of the D grammar in the ANTLR grammar notation (similar to EBNF)=
.

PROPOSAL

ANTLR and Java based D parser for IDE usage
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

Luca Boasso

Abstract
--------

The project aims to implement an ANTLR parser for the D programming languag=
e and
the consequent integration with the DDT Eclipse-based IDE.
The parser will be designed to be reused in other IDEs or tools.

Rationale
---------

There are different IDEs for the D programming language. The purpose of thi=
s
project proposal is to write a parser for a superset of the D programming
language (including v1 and v2) that is tailored for IDEs needs.
The new parser will be designed to be modular and abstracted from any
particular IDE implementation detail, so that it can be used in different I=
DEs
or with tools that need an abstract syntax tree of the source code for furt=
her
analysis.

Particular care will be taken to integrate the new parser with the DDT
Eclipse-based IDE so that this project will be useful in the short-term.
The DDT project needs a new parser up-to-date with the latest D syntax, wit=
h
a better error recovery and improved performance [0].
Specifically, a good error recovery strategy is an essential feature in an =
IDE.
The parser should be able to restore itself to a state where processing of =
the
input can continue with reasonable hopes that further processing will provi=
de
meaningful diagnostic information.

Thanks to this integration it will be possible to understand the appropriat=
e
interface for the parser so that in the long-term the same code could be us=
ed
in different projects.

I will use the ANTLR parser generator for this project. This parser generat=
or
has been proven to be a valuable tool with advanced features for tree
construction and tree manipulations that cuts development time [1]. The LL(=
*)
parsing algorithm on which ANTLR is based upon allows efficient parsing of
complex grammar with good error handling and unrestricted grammar actions [=
2].

The use of a parser generator allows the creations of parsers in different
programming languages.
This project will focus on the creation of a Java parser. Since ANTLR suppo=
rt
many target languages [3], it will not be so difficult to  generate a parse=
r in
the original implementation language of the IDE. Eg. Generate a C++ parser =
for
the D language that will be used in the IDE  written in C++.

Furthermore, updates of the D grammar are reflected in a more convenient wa=
y
through modifications of the ANTLR grammar of D, than through a modificatio=
n of
a hand-written parser.
In particular, one of the problems faced by DDT developers was to keep thei=
r
parser up-to-date with the reference one (DMD parser) [4].
It is time-consuming and error-prone to manually port and mantain the DMD p=
arser
written in C++ to another language, instead most of the modification will b=
e
handled by ANTLR.

In addition, easy modification of the D language syntax encourages
experimentation for the benefit of the language's evolution.

Finally in the process of writing a new parser eventual misunderstanding or
inconsistency of the D language reference and documentations will be addres=
sed.

A good set of test will be created to guarantee the compatibility of the ne=
w
parser with the official language definition and the DMD parser.

Timeline
--------

This is a tentative timeline to be further discussed with the help of the
community.
I am committed to dedicate substantially to this project knowing that I als=
o
have to pass some exams.
I estimate that I could spend initially approximately 30h/week.
After the exam session I will work full-time on this project.

- April 25 =96 May 23: Community Bonding Period
  Since I am new in the D community I will spend some time learning how to
  contribute while following the guideline of the community and the
DDT project.
  I will start reviewing the language reference asking for clarifications
  when needed.
  Once I have got an overall understanding I will write the production rule=
s of
  a superset of the D grammar in the ANTLR grammar notation (similar to EBN=
F).

- May 23 =96 July 11: Developing phase I
  The correctness of the parser is of paramount importance.
  I will create many tests to exercise the parser (at this point just a
  "recognizer") obtained as output from ANTLR.
  Once I am confident with the parser conforms to the language reference an=
d
  recognizes the same language as the parser in DMD, I will enhance it with=
 AST
  construction rules.
  At this point, I need to discuss with the DDT team the type of AST that h=
as to
  be built for IDEs purposes, and confirm which annotations are most useful
  (eg. source ranges).

- July 11 =96 August 15: Developing phase II
  In this phase I will create unit tests to verify the correctness of the
  generated trees and I will focus on the integration of the parser with th=
e DDT
  project.
  In the remaining time I will provide good error recovery to the parser an=
d I
  will improve the overall performance.

- August 15 - August 22: Final phase
  I will use this last week to polish the code and improve the documentatio=
n.
  As a final task, I will think about how support for incremental parsing c=
an be
  added in the future.

About me
--------

I am Luca Boasso, I am a computer engineering student of Polytechnic Univer=
sity
of Turin [5].
I won a scholarship for a double-degree program and obtained the first mast=
er
in France under Telecom ParisTech [6] in 2010.
After having an internship in Panasonic R&D in Cupertino [7], I am currentl=
y
finishing my last semester in Italy to receive my second master.
I have been always had a passion for programming language design and
implementations.
I study and use in my spare time several programming languages.
I have become a recent supporter of the D programming language and I am ama=
zed
by its expressiveness, power and net improvement over C++.
In my master thesis, I designed a domain specific language for SAP lab Fran=
ce
and I wrote several parsers by hand and also with the help of ANTLR.

Contact Information
-------------------

Skype: luca_378
Email: luke.boasso gmail.com

References
----------

[0]   http://code.google.com/a/eclipselabs.org/p/ddt/wiki/DevelopmentGuide
[1]   http://www.antlr.org/why.html
[2]   http://www.antlr.org/papers/LL-star-PLDI11.pdf
[3]   http://www.antlr.org/wiki/display/ANTLR3/Code+Generation+Targets
[4]   http://www.digitalmars.com/d/archives/digitalmars/D/ide/Future_of_Des=
cent_
      and_D_Eclipse_IDE_635.html
[5]   https://secure.wikimedia.org/wikipedia/en/wiki/Polytechnic_University=
_of
      _Turin
[6]   https://secure.wikimedia.org/wikipedia/en/wiki/TELECOM_ParisTech
[7]   http://www.research.panasonic.com/

On 3/29/11, Bruno Medeiros <brunodomedeiros+spam com.gmail> wrote:
 On 28/03/2011 01:52, Luca Boasso wrote:
 Sorry for my late draft proposal, I'm currently moving so I didn't
 have enough time this days.
 I would be glad to have your opinion.

 Thank you

 <DRAFT PROPOSAL>

 Rationale
 ---------

 There are different IDEs for the D programming language. The purpose of
 this
 project proposal is to write a parser for the D programming language (v1
 and v2)
 that is tailored for IDEs needs. The new parser will be designed to be
 modular
 and abstracted from any particular IDE implementation detail, so that it
 can be
 used in different IDEs or with tools that need an abstract syntax tree o=

f
 the
 source code for further analysis.
 Particular care will be taken to integrate the new parser with the DDT
 Eclipse-based IDE so that this project will be useful in the short-term.

  >
 The DDT project needs a new parser up-to-date with the latest D syntax,
 with
 a better error recovery and improved performance.
 Thanks to this integration it will be possible to understand the
 appropriate
 interface for the parser so that in the long-term the same code could be
 used in
 different projects.

 I will use the ANTLR parser generator for this project. This parser
 generator
 has been proven to be a valuable tool with advanced features for tree
 construction and  tree manipulations that cuts development time [1]. The
 LL(*)
 parsing algorithm  on which ANTLR is based upon allows efficient parsing
 of
 complex grammar with good error handling and unrestricted grammar action=

s
 [2].

 The use of a parser generator allows the creations of parsers in differe=

nt
 programming languages. This project will focus on the creation of a Java
 parser.
 Since ANTLR support many target languages [3], it will not be so difficu=

lt
 to
 generate a parser in the original implementation language of the IDE.
 Eg. Generate a C++ parser for the D language that will be used in the ID=

E
 written in C++.

 Furthermore, updates of the D grammar are reflected in a more convenient
 way
 through modifications of the ANTLR grammar of D, than through a
 modification of
 a hand-written parser.
 In particular, one of the problems faced by DDT developers was to keep
 their
 parser up-to-date with the reference one (DMD parser) [4].
 It is time-consuming and error-prone to manually port the DMD parser
 written in
 C++ to another language, instead most of the modification will be handle=

d
 by
 ANTLR.

 In addition, easy modification of the D language syntax encourages
 experimentation for the benefit of the language's evolution.

 Finally in the process of writing a new parser eventual misunderstanding
 or
 inconsistency of the D language reference and documentations will be
 addressed.
 A good set of test will be created to guarantee the compatibility of the
 new
 parser with the official language definition and the DMD parser.

 Like Andrei said, and as is already mentioned in this proposal, I think
 the focus of this parser project should to integrate with DDT, so that
 we can have something directly useful at the conclusion of the project.
 And also to validate that the parser is worthwhile for IDE usage.
 Fortunately this is not contrary to the other goals of making the
 grammar reusable for other ANTLR-based parsers coded in another
 language, or to make the D parser reusable in other Java-based projects.
 The DDT AST classes (and the basic semantic engine) are already isolated
 in their own bundle/module, conceptually independent of any Eclipse code
 (there a few minor coded dependencies that are trivially removable).

 The proposal text looks good to me, but one missing thing that I think
 is key to consider is error recovery. The current parser (Descent/DMD)
 is already fairly good at this, (although it could be improved in some
 regards). The new ANTLR parser would not need to be as good as DMD, but
 it should have good recovery at least in same basic IDE usage cases. So
 for example:

 /* block structure stuff: */
 void func() {
    blah(
 }
 // the parser should still recover successfully and parse the rest of //
 the file after func

 Recovery inside statements, and some other use cases are also very
 important, but this can be discussed in more detail later, my point now
 is just that the consideration of the syntax recovery should be present
 to the proposal. (just mention it, no need to write much about it)

 Some other comments relating to implementation and design details:

 Once I have got an overall understanding I will write the production
    rules of the D grammar (v1 and v2) in the ANTLR grammar notation
 (similar to
    EBNF).

 Hum, I am inclined to think that having two separate grammars for each
 version of D is not the best approach. For starters, even for D2 there
 are not one, but many version of the language, even with regards to just
 parsing D2. True, we may choose to not support those previous versions,
 and focus only on D2 as of TDPL, but it is still important to be mindful
 of this. Also because there might be additions to the syntax of D2.
 And here IDE development differs somewhat from a compiler. In a compiler
 you would just change the parser code to the latest version of the
 language. And so the latest compiler only supports the latest version of
 the language. However, in the IDE you ideally want the latest version of
 the IDE to support *all* previous versions of the language. Or at least
 all versions that users might still want to code in.
 So it is better perhaps to have just one grammar that is a superset of
 D1 and D2, (and then afterward have some "syntax" validator on the
 AST/tokens to make sure it is valid to a given language version)

 On 28/03/2011 01:52, Luca Boasso wrote:
  >    At this point, I need to discuss with the DDT team the type of AST
 that has to
  >    be built for IDEs purposes, and confirm which annotations are most
 useful
  >    (eg. source ranges).

 As for the AST that should be generated, you can already see how it
 should (mostly) be, by looking here:
 http://code.google.com/a/eclipselabs.org/p/ddt/source/browse/#hg%2Forg.ds=

ource.ddt.dtool%2Fsrc%2Fdtool%2Fast
 That AST is generally what the parser should generate, although minor
 adjustments and changes might be necessary or desirable, yes.
 There are also some parser tests there, but they are very few and limited=

.
 I also have some comments for the timeline but I'll leave that for
 another post.

 --
 Bruno Medeiros - Software Engineer

Mar 29 2011

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 29/03/2011 19:51, Luca Boasso wrote:
 Timeline
 --------

 This is a tentative timeline to be further discussed with the help of the
 community.
 I am committed to dedicate substantially to this project knowing that I also
 have to pass some exams.
 I estimate that I could spend initially approximately 30h/week.
 After the exam session I will work full-time on this project.

 - April 25 � May 23: Community Bonding Period
    Since I am new in the D community I will spend some time learning how to
    contribute while following the guideline of the community and the
 DDT project.
    I will start reviewing the language reference asking for clarifications
    when needed.
    Once I have got an overall understanding I will write the production rules
of
    a superset of the D grammar in the ANTLR grammar notation (similar to EBNF).

 - May 23 � July 11: Developing phase I
    The correctness of the parser is of paramount importance.
    I will create many tests to exercise the parser (at this point just a
    "recognizer") obtained as output from ANTLR.
    Once I am confident with the parser conforms to the language reference and
    recognizes the same language as the parser in DMD, I will enhance it with
AST
    construction rules.
    At this point, I need to discuss with the DDT team the type of AST that has
to
    be built for IDEs purposes, and confirm which annotations are most useful
    (eg. source ranges).

 - July 11 � August 15: Developing phase II
    In this phase I will create unit tests to verify the correctness of the
    generated trees and I will focus on the integration of the parser with the
DDT
    project.
    In the remaining time I will provide good error recovery to the parser and I
    will improve the overall performance.

 - August 15 - August 22: Final phase
    I will use this last week to polish the code and improve the documentation.
    As a final task, I will think about how support for incremental parsing can
be
    added in the future.

In line with my previous comments on the proposal, I have some comments 
regarding the timeline as well. They are somewhat general comments, it 
may not be that worthwhile to go into much detail in the timeline aspect 
unless the proposal is actually accepted.

There is not much point in writing tests for a language-recognizer only 
parser, in other words, a test that only checks if the parser recognizes 
the source as valid or not. We can just feed a lot of existing valid 
source files(like Phobos, Tango, etc.) and check that the parser 
validates it correctly. (That doesn't test the *invalid* syntax cases, 
but that's a less important case for an IDE parser than making sure it 
is correct for the *valid* syntax cases)

The other thing is that AST generation with all the necessary info is 
probably going to be the most significant aspect of this project, in 
terms of effort required. And to implement the AST actions, I suspect it 
might be necessary (or at least desirable) to change the language 
grammar to better suit the actions that generate the AST.
So with this in mind, I think it would be better that, instead of doing 
a complete D language recognizer first and then adding the AST 
generation functionality, what should be done first is a AST-generating 
parser for a very limited D-like subset language (for example, a 
language with just variable, class, and function/function-parameter 
declarations), and then when we have this, to start expanding the 
grammar until it supports D1/D2 and has all the extra minutiae.
The point of this is develop a prototype with the essential and more 
difficult aspects of the parser (AST generation, source ranges, some 
error correction) as soon as possible, and the extra stuff afterwards, 
instead of the other way around.

-- 
Bruno Medeiros - Software Engineer

Apr 04 2011

Luca Boasso <luke.boasso gmail.com> writes:

Thank you for your comments.

Here the updated timeline, I'm always looking for advices:

- April 25 =96 May 23: Community Bonding Period
  Since I am new in the D community I will spend some time learning how to
  contribute while following the guideline of the community and the DDT pro=
ject.
  I will start reviewing the language reference asking for clarifications
  when needed.
  Once I have got an overall understanding I will write the production rule=
s of
  a subset of the D grammar(D0) in the ANTLR grammar notation (similar to E=
BNF).
  Since the AST generation functionality is a key factor for a correct
  integration with DDT, I will enhance the D0 parser with AST construction
  rules from the beginning.
  At this point, I need to discuss with the DDT team the type of AST that h=
as to
  be built for IDEs purposes, and confirm which annotations are most useful
  (eg. source ranges).

- May 23 =96 July 11: Developing phase I
  A fully functional D0 parser will be integrated in DDT.
  Once the integration is complete I will augment the parser to handle a
  superset of the D1 and D2 grammars.
  To check the correctness of the parser, it will be tested with existing a=
nd
  large D code base (like Phobos, Tango, the Andrei's TDPL book source
  code...).
  Subsequently I will modify the tree construction rules to reflect the cha=
nges
  in the syntax.

- July 11 =96 August 15: Developing phase II
  In this phase I will create unit tests to verify the correctness of the
  generated trees and I will focus on the remain aspects of the integration
  with the DDT project.
  In the remaining time I will provide good error recovery to the parser an=
d I
  will improve the overall performance.

- August 15 - August 22: Final phase
  I will use this last week to polish the code and improve the documentatio=
n.
  As a final task, I will think about how support for incremental parsing c=
an be
  added in the future.

On 4/4/11, Bruno Medeiros <brunodomedeiros+spam com.gmail> wrote:
 On 29/03/2011 19:51, Luca Boasso wrote:
 Timeline
 --------

 This is a tentative timeline to be further discussed with the help of th=


e
 community.
 I am committed to dedicate substantially to this project knowing that I
 also
 have to pass some exams.
 I estimate that I could spend initially approximately 30h/week.
 After the exam session I will work full-time on this project.

 - April 25 =96 May 23: Community Bonding Period
    Since I am new in the D community I will spend some time learning how
 to
    contribute while following the guideline of the community and the
 DDT project.
    I will start reviewing the language reference asking for clarificatio=


ns
    when needed.
    Once I have got an overall understanding I will write the production
 rules of
    a superset of the D grammar in the ANTLR grammar notation (similar to
 EBNF).

 - May 23 =96 July 11: Developing phase I
    The correctness of the parser is of paramount importance.
    I will create many tests to exercise the parser (at this point just a
    "recognizer") obtained as output from ANTLR.
    Once I am confident with the parser conforms to the language referenc=


e
 and
    recognizes the same language as the parser in DMD, I will enhance it
 with AST
    construction rules.
    At this point, I need to discuss with the DDT team the type of AST th=


at
 has to
    be built for IDEs purposes, and confirm which annotations are most
 useful
    (eg. source ranges).

 - July 11 =96 August 15: Developing phase II
    In this phase I will create unit tests to verify the correctness of t=


he
    generated trees and I will focus on the integration of the parser wit=


h
 the DDT
    project.
    In the remaining time I will provide good error recovery to the parse=


r
 and I
    will improve the overall performance.

 - August 15 - August 22: Final phase
    I will use this last week to polish the code and improve the
 documentation.
    As a final task, I will think about how support for incremental parsi=


ng
 can be
    added in the future.

 In line with my previous comments on the proposal, I have some comments
 regarding the timeline as well. They are somewhat general comments, it
 may not be that worthwhile to go into much detail in the timeline aspect
 unless the proposal is actually accepted.

 There is not much point in writing tests for a language-recognizer only
 parser, in other words, a test that only checks if the parser recognizes
 the source as valid or not. We can just feed a lot of existing valid
 source files(like Phobos, Tango, etc.) and check that the parser
 validates it correctly. (That doesn't test the *invalid* syntax cases,
 but that's a less important case for an IDE parser than making sure it
 is correct for the *valid* syntax cases)

 The other thing is that AST generation with all the necessary info is
 probably going to be the most significant aspect of this project, in
 terms of effort required. And to implement the AST actions, I suspect it
 might be necessary (or at least desirable) to change the language
 grammar to better suit the actions that generate the AST.
 So with this in mind, I think it would be better that, instead of doing
 a complete D language recognizer first and then adding the AST
 generation functionality, what should be done first is a AST-generating
 parser for a very limited D-like subset language (for example, a
 language with just variable, class, and function/function-parameter
 declarations), and then when we have this, to start expanding the
 grammar until it supports D1/D2 and has all the extra minutiae.
 The point of this is develop a prototype with the essential and more
 difficult aspects of the parser (AST generation, source ranges, some
 error correction) as soon as possible, and the extra stuff afterwards,
 instead of the other way around.

 --
 Bruno Medeiros - Software Engineer

Apr 04 2011

Luca Boasso <luke.boasso gmail.com> writes:

Hello,
I will move from California to Italy tomorrow so I will not be able to
read new posts.
As soon as I arrive home I'll read and respond to any reviews.

Thank you

Luca Boasso

On 3/29/11, Luca Boasso <luke.boasso gmail.com> wrote:
 Hello,

 thank you very much for your useful comments.

 I have updated the proposal version in the www.google-melange.com.
 I post here the changes and the updated version.

 Changes:
 - There are different IDEs for the D programming language. The purpose of
 this
 project proposal is to write a parser for a superset of the D programming
 language (including v1 and v2) that is tailored for IDEs needs.

 -Specifically, a good error recovery strategy is an essential feature in =

an
 IDE.
 The parser should be able to restore itself to a state where processing o=

f
 the
 input can continue with reasonable hopes that further processing will
 provide
 meaningful diagnostic information.

 -Once I have got an overall understanding I will write the production rul=

es
 of
 a superset of the D grammar in the ANTLR grammar notation (similar to
 EBNF).

 PROPOSAL

 ANTLR and Java based D parser for IDE usage
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
 Luca Boasso

 Abstract
 --------

 The project aims to implement an ANTLR parser for the D programming langu=

age
 and
 the consequent integration with the DDT Eclipse-based IDE.
 The parser will be designed to be reused in other IDEs or tools.

 Rationale
 ---------

 There are different IDEs for the D programming language. The purpose of
 this
 project proposal is to write a parser for a superset of the D programming
 language (including v1 and v2) that is tailored for IDEs needs.
 The new parser will be designed to be modular and abstracted from any
 particular IDE implementation detail, so that it can be used in different
 IDEs
 or with tools that need an abstract syntax tree of the source code for
 further
 analysis.

 Particular care will be taken to integrate the new parser with the DDT
 Eclipse-based IDE so that this project will be useful in the short-term.
 The DDT project needs a new parser up-to-date with the latest D syntax,
 with
 a better error recovery and improved performance [0].
 Specifically, a good error recovery strategy is an essential feature in a=

n
 IDE.
 The parser should be able to restore itself to a state where processing o=

f
 the
 input can continue with reasonable hopes that further processing will
 provide
 meaningful diagnostic information.

 Thanks to this integration it will be possible to understand the
 appropriate
 interface for the parser so that in the long-term the same code could be
 used
 in different projects.

 I will use the ANTLR parser generator for this project. This parser
 generator
 has been proven to be a valuable tool with advanced features for tree
 construction and tree manipulations that cuts development time [1]. The
 LL(*)
 parsing algorithm on which ANTLR is based upon allows efficient parsing o=

f
 complex grammar with good error handling and unrestricted grammar actions
 [2].

 The use of a parser generator allows the creations of parsers in differen=

t
 programming languages.
 This project will focus on the creation of a Java parser. Since ANTLR
 support
 many target languages [3], it will not be so difficult to  generate a par=

ser
 in
 the original implementation language of the IDE. Eg. Generate a C++ parse=

r
 for
 the D language that will be used in the IDE  written in C++.

 Furthermore, updates of the D grammar are reflected in a more convenient
 way
 through modifications of the ANTLR grammar of D, than through a modificat=

ion
 of
 a hand-written parser.
 In particular, one of the problems faced by DDT developers was to keep
 their
 parser up-to-date with the reference one (DMD parser) [4].
 It is time-consuming and error-prone to manually port and mantain the DMD
 parser
 written in C++ to another language, instead most of the modification will
 be
 handled by ANTLR.

 In addition, easy modification of the D language syntax encourages
 experimentation for the benefit of the language's evolution.

 Finally in the process of writing a new parser eventual misunderstanding =

or
 inconsistency of the D language reference and documentations will be
 addressed.

 A good set of test will be created to guarantee the compatibility of the
 new
 parser with the official language definition and the DMD parser.

 Timeline
 --------

 This is a tentative timeline to be further discussed with the help of the
 community.
 I am committed to dedicate substantially to this project knowing that I
 also
 have to pass some exams.
 I estimate that I could spend initially approximately 30h/week.
 After the exam session I will work full-time on this project.

 - April 25 =96 May 23: Community Bonding Period
   Since I am new in the D community I will spend some time learning how t=

o
   contribute while following the guideline of the community and the
 DDT project.
   I will start reviewing the language reference asking for clarifications
   when needed.
   Once I have got an overall understanding I will write the production ru=

les
 of
   a superset of the D grammar in the ANTLR grammar notation (similar to
 EBNF).

 - May 23 =96 July 11: Developing phase I
   The correctness of the parser is of paramount importance.
   I will create many tests to exercise the parser (at this point just a
   "recognizer") obtained as output from ANTLR.
   Once I am confident with the parser conforms to the language reference
 and
   recognizes the same language as the parser in DMD, I will enhance it wi=

th
 AST
   construction rules.
   At this point, I need to discuss with the DDT team the type of AST that
 has to
   be built for IDEs purposes, and confirm which annotations are most usef=

ul
   (eg. source ranges).

 - July 11 =96 August 15: Developing phase II
   In this phase I will create unit tests to verify the correctness of the
   generated trees and I will focus on the integration of the parser with =

the
 DDT
   project.
   In the remaining time I will provide good error recovery to the parser =

and
 I
   will improve the overall performance.

 - August 15 - August 22: Final phase
   I will use this last week to polish the code and improve the
 documentation.
   As a final task, I will think about how support for incremental parsing
 can be
   added in the future.

 About me
 --------

 I am Luca Boasso, I am a computer engineering student of Polytechnic
 University
 of Turin [5].
 I won a scholarship for a double-degree program and obtained the first
 master
 in France under Telecom ParisTech [6] in 2010.
 After having an internship in Panasonic R&D in Cupertino [7], I am
 currently
 finishing my last semester in Italy to receive my second master.
 I have been always had a passion for programming language design and
 implementations.
 I study and use in my spare time several programming languages.
 I have become a recent supporter of the D programming language and I am
 amazed
 by its expressiveness, power and net improvement over C++.
 In my master thesis, I designed a domain specific language for SAP lab
 France
 and I wrote several parsers by hand and also with the help of ANTLR.

 Contact Information
 -------------------

 Skype: luca_378
 Email: luke.boasso gmail.com

 References
 ----------

 [0]   http://code.google.com/a/eclipselabs.org/p/ddt/wiki/DevelopmentGuid=

e
 [1]   http://www.antlr.org/why.html
 [2]   http://www.antlr.org/papers/LL-star-PLDI11.pdf
 [3]   http://www.antlr.org/wiki/display/ANTLR3/Code+Generation+Targets
 [4]
 http://www.digitalmars.com/d/archives/digitalmars/D/ide/Future_of_Descent=

_
       and_D_Eclipse_IDE_635.html
 [5]
 https://secure.wikimedia.org/wikipedia/en/wiki/Polytechnic_University_of
       _Turin
 [6]   https://secure.wikimedia.org/wikipedia/en/wiki/TELECOM_ParisTech
 [7]   http://www.research.panasonic.com/

 On 3/29/11, Bruno Medeiros <brunodomedeiros+spam com.gmail> wrote:
 On 28/03/2011 01:52, Luca Boasso wrote:
 Sorry for my late draft proposal, I'm currently moving so I didn't
 have enough time this days.
 I would be glad to have your opinion.

 Thank you

 <DRAFT PROPOSAL>

 Rationale
 ---------

 There are different IDEs for the D programming language. The purpose of
 this
 project proposal is to write a parser for the D programming language (v=

1
 and v2)
 that is tailored for IDEs needs. The new parser will be designed to be
 modular
 and abstracted from any particular IDE implementation detail, so that i=

t
 can be
 used in different IDEs or with tools that need an abstract syntax tree
 of
 the
 source code for further analysis.
 Particular care will be taken to integrate the new parser with the DDT
 Eclipse-based IDE so that this project will be useful in the short-term=

.
  >
 The DDT project needs a new parser up-to-date with the latest D syntax,
 with
 a better error recovery and improved performance.
 Thanks to this integration it will be possible to understand the
 appropriate
 interface for the parser so that in the long-term the same code could b=

e
 used in
 different projects.

 I will use the ANTLR parser generator for this project. This parser
 generator
 has been proven to be a valuable tool with advanced features for tree
 construction and  tree manipulations that cuts development time [1]. Th=

e
 LL(*)
 parsing algorithm  on which ANTLR is based upon allows efficient parsin=

g
 of
 complex grammar with good error handling and unrestricted grammar
 actions
 [2].

 The use of a parser generator allows the creations of parsers in
 different
 programming languages. This project will focus on the creation of a Jav=

a
 parser.
 Since ANTLR support many target languages [3], it will not be so
 difficult
 to
 generate a parser in the original implementation language of the IDE.
 Eg. Generate a C++ parser for the D language that will be used in the
 IDE
 written in C++.

 Furthermore, updates of the D grammar are reflected in a more convenien=

t
 way
 through modifications of the ANTLR grammar of D, than through a
 modification of
 a hand-written parser.
 In particular, one of the problems faced by DDT developers was to keep
 their
 parser up-to-date with the reference one (DMD parser) [4].
 It is time-consuming and error-prone to manually port the DMD parser
 written in
 C++ to another language, instead most of the modification will be
 handled
 by
 ANTLR.

 In addition, easy modification of the D language syntax encourages
 experimentation for the benefit of the language's evolution.

 Finally in the process of writing a new parser eventual misunderstandin=

g
 or
 inconsistency of the D language reference and documentations will be
 addressed.
 A good set of test will be created to guarantee the compatibility of th=

e
 new
 parser with the official language definition and the DMD parser.

 Like Andrei said, and as is already mentioned in this proposal, I think
 the focus of this parser project should to integrate with DDT, so that
 we can have something directly useful at the conclusion of the project.
 And also to validate that the parser is worthwhile for IDE usage.
 Fortunately this is not contrary to the other goals of making the
 grammar reusable for other ANTLR-based parsers coded in another
 language, or to make the D parser reusable in other Java-based projects.
 The DDT AST classes (and the basic semantic engine) are already isolated
 in their own bundle/module, conceptually independent of any Eclipse code
 (there a few minor coded dependencies that are trivially removable).

 The proposal text looks good to me, but one missing thing that I think
 is key to consider is error recovery. The current parser (Descent/DMD)
 is already fairly good at this, (although it could be improved in some
 regards). The new ANTLR parser would not need to be as good as DMD, but
 it should have good recovery at least in same basic IDE usage cases. So
 for example:

 /* block structure stuff: */
 void func() {
    blah(
 }
 // the parser should still recover successfully and parse the rest of //
 the file after func

 Recovery inside statements, and some other use cases are also very
 important, but this can be discussed in more detail later, my point now
 is just that the consideration of the syntax recovery should be present
 to the proposal. (just mention it, no need to write much about it)

 Some other comments relating to implementation and design details:

 Once I have got an overall understanding I will write the production
    rules of the D grammar (v1 and v2) in the ANTLR grammar notation
 (similar to
    EBNF).

 Hum, I am inclined to think that having two separate grammars for each
 version of D is not the best approach. For starters, even for D2 there
 are not one, but many version of the language, even with regards to just
 parsing D2. True, we may choose to not support those previous versions,
 and focus only on D2 as of TDPL, but it is still important to be mindful
 of this. Also because there might be additions to the syntax of D2.
 And here IDE development differs somewhat from a compiler. In a compiler
 you would just change the parser code to the latest version of the
 language. And so the latest compiler only supports the latest version of
 the language. However, in the IDE you ideally want the latest version of
 the IDE to support *all* previous versions of the language. Or at least
 all versions that users might still want to code in.
 So it is better perhaps to have just one grammar that is a superset of
 D1 and D2, (and then afterward have some "syntax" validator on the
 AST/tokens to make sure it is valid to a given language version)

 On 28/03/2011 01:52, Luca Boasso wrote:
  >    At this point, I need to discuss with the DDT team the type of AST
 that has to
  >    be built for IDEs purposes, and confirm which annotations are most
 useful
  >    (eg. source ranges).

 As for the AST that should be generated, you can already see how it
 should (mostly) be, by looking here:
 http://code.google.com/a/eclipselabs.org/p/ddt/source/browse/#hg%2Forg.d=

source.ddt.dtool%2Fsrc%2Fdtool%2Fast
 That AST is generally what the parser should generate, although minor
 adjustments and changes might be necessary or desirable, yes.
 There are also some parser tests there, but they are very few and
 limited.

 I also have some comments for the timeline but I'll leave that for
 another post.

 --
 Bruno Medeiros - Software Engineer

Mar 30 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 03/30/2011 11:20 PM, Luca Boasso wrote:
 Hello,
 I will move from California to Italy tomorrow so I will not be able to
 read new posts.
 As soon as I arrive home I'll read and respond to any reviews.


 Thank you

 Luca Boasso

Have a safe trip!

Andrei

Mar 30 2011

Luca Boasso <luke.boasso gmail.com> writes:

Thanks Andrei, I'm back in my home country :)
Does somebody has any other constructive critique on my proposal?
I think I have polished the points that Bruno brought up, maybe the
timeline should be adjusted.
Any suggestions?

Thank you very much

Luca Boasso



On 3/31/11, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 On 03/30/2011 11:20 PM, Luca Boasso wrote:
 Hello,
 I will move from California to Italy tomorrow so I will not be able to
 read new posts.
 As soon as I arrive home I'll read and respond to any reviews.


 Thank you

 Luca Boasso

 Have a safe trip!

 Andrei

Apr 01 2011

D Programming

C/C++ Programming

Other

digitalmars.D - [GSOC Draft Proposal] ANTLR and Java based D parser for IDE usage