
digitalmars.D - D Parsing (again)/ D grammar

reply "Vladimir Kazanov" <vkazanov inbox.ru> writes:
Hello, everyone!

Recently, I have been doing some experimental work related to 
generalized parser generators. It is not yet ready for a public 
release, I am not even sure it will ever be. Anyway, right now I 
am considering adding D as a target for parser generation. 
Actually, it's more than that: the grand target is to be able to 
generate a D language parser in D. With Python, for example, it 
is rather easy, as I can just mechanically transform the original 
grammar into a suitable form and be done with it.

The generator itself is quite powerful; theoretically it should
be able to handle any context-free grammar (see
http://dotat.at/tmp/gll.pdf for the theory).

There is a grammar page on dlang.org 
(http://dlang.org/grammar.html). I have also found a related 
discussion in the forum 
(http://forum.dlang.org/thread/bwsofbnigfbrxwouiobj forum.dlang.org). 
 From the discussion I found out that D parser is a hand-made 
RD-parser with "a few tricks"(c).

So, my questions:

1. How realistic is the grammar? Is it a real one or just a 
sketch of sorts? Is it something developers use to build the 
parser?

2. There's also a D grammar for Pegged. How good is it? Can it be 
used as a starting point to build a parser?
Oct 02 2014
next sibling parent reply Daniel Kozak via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Thu, 02 Oct 2014 13:49:10 +0000,
Vladimir Kazanov via Digitalmars-d <digitalmars-d puremagic.com>
wrote:

 Hello, everyone!
 
 Recently, I have been doing some experimental work related to 
 generalized parser generators. It is not yet ready for a public 
 release, I am not even sure it will ever be. Anyway, right now I 
 am considering adding D as a target for parser generation. 
 Actually, it's more than that: the grand target is to be able to 
 generate a D language parser in D. With Python, for example, it 
 is rather easy, as I can just mechanically transform the original 
 grammar into a suitable form and be done with it.
 
 The generator itself is quite powerful; theoretically it should 
 be able to handle any context-free grammar (see 
 http://dotat.at/tmp/gll.pdf for the theory).
 
 There is a grammar page on dlang.org 
 (http://dlang.org/grammar.html). I have also found a related 
 discussion in the forum 
 (http://forum.dlang.org/thread/bwsofbnigfbrxwouiobj forum.dlang.org). 
  From the discussion I found out that the D parser is a hand-made 
 RD-parser with "a few tricks"(c).
 
 So, my questions:
 
 1. How realistic is the grammar? Is it a real one or just a 
 sketch of sorts? Is it something developers use to build the 
 parser?
 
 2. There's also a D grammar for Pegged. How good is it? Can it be 
 used as a starting point to build a parser?
https://github.com/Hackerpilot/DGrammar
Oct 02 2014
parent "Vladimir Kazanov" <vkazanov inbox.ru> writes:
On Thursday, 2 October 2014 at 14:08:09 UTC, Daniel Kozak via 
Digitalmars-d wrote:

 https://github.com/Hackerpilot/DGrammar
Thank you, this definitely looks like something I was looking for. Any known pitfalls here..?
Oct 02 2014
prev sibling parent reply "Ola Fosheim Grøstad" writes:
On Thursday, 2 October 2014 at 13:49:12 UTC, Vladimir Kazanov 
wrote:
 The generator itself is quite powerful; theoretically it should 
 be able to handle any context-free grammar (see 
 http://dotat.at/tmp/gll.pdf for the theory).
Cool, GLL is the way to go IMO, but I am also looking at Earley parsers. What is the advantage of GLL over Earley if you use a parser generator? I think they are both O(n³) worst case, or something like that?
 From the discussion I found out that D parser is a hand-made 
 RD-parser with "a few tricks"(c).
I think D is close to LL(2) for the most part. But I suppose a GLL parser could allow keywords to be used as symbol names in most cases? That would be nice.
Oct 02 2014
parent reply "Vladimir Kazanov" <vkazanov inbox.ru> writes:
On Thursday, 2 October 2014 at 15:01:13 UTC, Ola Fosheim Grøstad 
wrote:
 Cool, GLL is the way to go IMO, but I am also looking at 
 Earley parsers. What is the advantage of GLL over Earley if you 
 use a parser generator? I think they are both O(n³) worst case, or 
 something like that?
They are somewhat similar in terms of asymptotic complexity on complicated examples; the constant factor is better, though. But there's a nice property of all generalized parsers: on the LL (for GLL) and LR (for GLR) parts of a grammar they go almost as fast as plain LL/LR parsers do. On ambiguities they slow down, of course.

There are four properties I really like:

1. GLL should be faster than Earley's (even the modern incarnations of it), but this is something I have yet to test.

2. It is fully general.

3. The automatically generated code repeats the original grammar structure - the same way recursive descent parsers do (see the toy sketch below).

4. The core parser is still that simple LL/RD parser I can practically debug.

This comes at a price, as usual... I would not call it obvious :-) But nobody can say that modern Earley flavours are trivial either.
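To make point 3 concrete, here is what that shape looks like in D: one function per grammar rule, each calling the functions of the rules it references. To be clear, this is plain hand-written recursive descent over an invented toy grammar, meant only to illustrate the structural correspondence - it is not the generator's actual output and not the GLL machinery itself (GLL additionally threads descriptors and a graph-structured stack through such functions to handle ambiguity and left recursion).

    import std.ascii : isDigit;

    // Toy grammar:
    //   Sum     <- Product ('+' Product)*
    //   Product <- Number ('*' Number)*
    //   Number  <- [0-9]+
    struct ToyParser
    {
        string input;
        size_t pos;

        // Sum <- Product ('+' Product)*
        bool parseSum()
        {
            if (!parseProduct()) return false;
            while (peek('+')) { ++pos; if (!parseProduct()) return false; }
            return true;
        }

        // Product <- Number ('*' Number)*
        bool parseProduct()
        {
            if (!parseNumber()) return false;
            while (peek('*')) { ++pos; if (!parseNumber()) return false; }
            return true;
        }

        // Number <- [0-9]+
        bool parseNumber()
        {
            auto start = pos;
            while (pos < input.length && input[pos].isDigit) ++pos;
            return pos > start;
        }

        bool peek(char c) { return pos < input.length && input[pos] == c; }
    }

    void main()
    {
        auto p = ToyParser("2*3+4");
        assert(p.parseSum() && p.pos == p.input.length);
    }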
 From the discussion I found out that D parser is a hand-made 
 RD-parser with "a few tricks"(c).
I think D is close to LL(2) for the most part. But I suppose a GLL parser could allow keywords to be used as symbol names in most cases? That would be nice.
This is possible, I guess, the same way people do it in GLR parsers.
Oct 02 2014
next sibling parent reply "Cliff" <cliff.s.hudson gmail.com> writes:
On Thursday, 2 October 2014 at 15:47:04 UTC, Vladimir Kazanov 
wrote:
 On Thursday, 2 October 2014 at 15:01:13 UTC, Ola Fosheim 
 Grøstad wrote:
 Cool, GLL is the way to go IMO, but I am also looking at 
 Earley parsers. What is the advantage of GLL over Earley if 
 you use a parser generator? I think they are both O(n³) worst 
 case, or something like that?
They are somewhat similar in terms of asymptotic complexity on complicated examples; the constant factor is better, though. But there's a nice property of all generalized parsers: on the LL (for GLL) and LR (for GLR) parts of a grammar they go almost as fast as plain LL/LR parsers do. On ambiguities they slow down, of course.

There are four properties I really like:

1. GLL should be faster than Earley's (even the modern incarnations of it), but this is something I have yet to test.

2. It is fully general.

3. The automatically generated code repeats the original grammar structure - the same way recursive descent parsers do.

4. The core parser is still that simple LL/RD parser I can practically debug.

This comes at a price, as usual... I would not call it obvious :-) But nobody can say that modern Earley flavours are trivial either.
 From the discussion I found out that D parser is a hand-made 
 RD-parser with "a few tricks"(c).
I think D is close to LL(2) for the most part. But I suppose a GLL parser could allow keywords to be used as symbol names in most cases? That would be nice.
This is possible, I guess, the same way people do it in GLR parsers.
What has steered you down the path of writing your own parser generator as opposed to using an existing one such as ANTLR? Were there properties you wanted that it didn't have, or performance, or...?
Oct 02 2014
parent reply "Vladimir Kazanov" <vkazanov inbox.ru> writes:
On Thursday, 2 October 2014 at 17:17:53 UTC, Cliff wrote:

 What has steered you down the path of writing your own parser 
 generator as opposed to using an existing one such as ANTLR?  
 Were there properties you wanted that it didn't have, or 
 performance, or...?
Like I said in the introductory post, this is a personal experiment of sorts. I am aware of most alternatives, such as ANTLR's ALL(*) and many, MANY others. :) And I would never write something like this myself as part of my full-time job. But right now I am writing an article on generalized parsers, toying with the implementations I could lay my hands on and implementing others. GLL is a rather exotic LL flavor which looks attractive in theory. I want to see it in practice.
Oct 02 2014
parent "Cliff" <cliff.s.hudson gmail.com> writes:
On Thursday, 2 October 2014 at 17:43:45 UTC, Vladimir Kazanov 
wrote:
 On Thursday, 2 October 2014 at 17:17:53 UTC, Cliff wrote:

 What has steered you down the path of writing your own parser 
 generator as opposed to using an existing one such as ANTLR?  
 Were there properties you wanted that it didn't have, or 
 performance, or...?
Like I said in the introductory post, this is a personal experiment of sorts. I am aware of most alternatives, such as ANTLR's ALL(*) and many, MANY others. :) And I would never write something like this myself as part of my full-time job. But right now I am writing an article on generalized parsers, toying with the implementations I could lay my hands on and implementing others. GLL is a rather exotic LL flavor which looks attractive in theory. I want to see it in practice.
Very cool - post the GitHub or equivalent when you get the chance (assuming you are sharing). This is an area of interest for me as well.
Oct 02 2014
prev sibling parent reply "Ola Fosheim Grøstad" writes:
On Thursday, 2 October 2014 at 15:47:04 UTC, Vladimir Kazanov 
wrote:
 3. The automatically generated code repeats the original 
 grammar structure - the same way recursive decent parsers do.

 4. The core parser is still that simple LL/RD parser I can 
 practically debug.
This sounds nice! Does this mean that it would be possible to use your parser generator to create a skeleton which is then manipulated manually, or are there non-local complexities that make manual edits risky? One reason I believe GLL is the right way to go is that I think RD makes it easier to generate good error messages suitable for display (to the end user).
Oct 02 2014
parent reply "Vladimir Kazanov" <vkazanov inbox.ru> writes:
On Thursday, 2 October 2014 at 17:39:55 UTC, Ola Fosheim Grøstad 
wrote:
 On Thursday, 2 October 2014 at 15:47:04 UTC, Vladimir Kazanov 
 wrote:
 3. The automatically generated code repeats the original 
 grammar structure - the same way recursive decent parsers do.

 4. The core parser is still that simple LL/RD parser I can 
 practically debug.
This sounds nice! Does this mean that it would be possible to use your parser generator to create a skeleton which is then manipulated manually, or are there non-local complexities that make manual edits risky?
Chances are that I will be able to get the original GLL parser generator from one of the algorithm's authors (Adrian Johnstone). He's really helpful here. From that point, I will only have to add a frontend for generating a concrete parser, starting with Python - it already has a fully working grammar. Hopefully, I will also be able to find a suitable grammar for D; it is always a pleasure to play with the language.

Otherwise, I will continue my current effort - implementing the GLL parser generator in D.

Anyway, right now, from what I see in the papers, it looks like it is quite possible, yes. Again, it requires a bit of understanding (nothing special, really), but compared to dumb LALR-style tables it's nothing.
 One reason I believe GLL is the right way to go is that I think 
 RD makes it easier to generate good error messages suitable for 
 display (to the end user).
This is one of the reasons I prefer LL/RD parsers :-) They are easy to follow.
Oct 02 2014
next sibling parent reply Philippe Sigaud via Digitalmars-d <digitalmars-d puremagic.com> writes:
 Chances are that I will be able to get the original GLL parser generator
 from one of the algorithm's authors (Adrian Johnstone). He's really helpful here.
 From that point, I will only have to add a frontend for generating a
 concrete parser, starting with Python - it already has a fully working
 grammar. Hopefully, I will also be able to find a suitable grammar for D, it
 is always a pleasure to play with the language.

 Otherwise, I will continue my current effort - implementing the GLL parser
 generator in D.
I did that during this summer, almost to the point where it was self-sustained (that is, the GLL parser generator was used to generate a parser for grammar files). I chose a range interface: the input is a string, and the parse forest is output as a range of parse trees (sketched below). I loved seeing it generate the different possible parse trees on ambiguous grammars, and accept left- and right-recursive grammars! GLL is a very nice algorithm.

Alas, even with some shortcuts on Kleene stars it was quite slow. I tried to use threads (spawning worker threads on alternatives), but that did not change the timings much.

I could make it generate a parser for JSON and compared it to the jsonx module that Sönke presented here. Bah, it was 1000 times slower (which is all relative: it still takes only a few ms to parse a big JSON file, but still far from the microseconds it should need). Pegged was faster than this GLL generator, but of course still far slower than jsonx. [Of course, Pegged can cheat: it can parse your file at compile time, resulting in 0-µs parsing time at runtime :-)]

Also, the way I coded it I hit CTFE limits and could not have it parse at compile time. A shame, really.
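For readers who have not seen that kind of interface before, a minimal sketch of the idea in D follows. The names (ParseTree, ParseForest, parse) are invented for the example and are not the project's actual API; the point is only the shape: a string goes in, and the alternative trees come out lazily as an input range.

    import std.range.primitives : isInputRange;

    struct ParseTree
    {
        string rule;          // grammar rule that produced this node
        string matched;       // slice of the input the node covers
        ParseTree[] children;
    }

    // Lazily enumerates the alternative trees of an (ambiguous) parse.
    struct ParseForest
    {
        ParseTree[] trees;    // stand-in for the real packed-forest machinery
        size_t i;

        @property bool empty() { return i >= trees.length; }
        @property ParseTree front() { return trees[i]; }
        void popFront() { ++i; }
    }

    static assert(isInputRange!ParseForest);

    // The generated parser would have this overall shape: string in, forest out.
    ParseForest parse(string input)
    {
        // a real GLL engine would build the shared forest here
        return ParseForest([ParseTree("Start", input)]);
    }

    void main()
    {
        foreach (tree; parse("1+2*3"))
        {
            // the first tree discovers the shared results; later trees reuse them
        }
    }

One nice property of the range view is that the caller only pays for the trees it actually iterates over.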
 This is one of the reasons I prefer LL/RD parsers :-) They are easy to
 follow.
I would like to try a LALR compile-time generator and compile-time parser. I'm pretty sure LALR tables could be expanded directly into D code instead of being read during parsing.

Philippe
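To make the "tables expanded as code" idea concrete, here is a tiny sketch. The two-state transition fragment is made up (it is not a real LALR table for any grammar); it only contrasts the usual data-driven lookup with the same rows emitted directly as D code, which the compiler can then inline and turn into jump tables.

    enum Token { num, plus, eof }

    // Table-driven form: the generator emits data, the parser indexes it at run time.
    immutable int[Token.max + 1][2] transition = [
        //  num  plus  eof
        [    1,   -1,  -1 ],  // state 0
        [   -1,    2,   0 ],  // state 1
    ];

    int nextFromTable(int state, Token t)
    {
        return transition[state][t];
    }

    // "Expanded" form: the generator emits the same rows as ordinary code.
    int nextFromCode(int state, Token t)
    {
        switch (state)
        {
            case 0:
                return t == Token.num ? 1 : -1;
            case 1:
                if (t == Token.plus) return 2;
                if (t == Token.eof)  return 0;
                return -1;
            default:
                return -1;
        }
    }

    void main()
    {
        foreach (s; 0 .. 2)
            foreach (t; [Token.num, Token.plus, Token.eof])
                assert(nextFromTable(s, t) == nextFromCode(s, t));
    }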
Oct 02 2014
parent reply "Vladimir Kazanov" <vkazanov inbox.ru> writes:
On Thursday, 2 October 2014 at 18:20:36 UTC, Philippe Sigaud via 
Digitalmars-d wrote:
 I did that during this summer, almost to the point it was
 self-sustained (that is, the GLL parser generator was used to 
 generate
 a parser for grammars files). I chose a range interface: input 
 is a
 string, the parse forest is output as a range of parse trees.
Nice! Is it public? Github?
 Alas, even with some shortcuts on Kleene stars it was quite slow. I
 tried to use threads (spawning worker threads on alternatives), but
 that did not change the timings much.
AFAIK, multithreading is a bad idea in parsing. Actually, in the gll-combinators Scala project they have similar speed problems. I don't know if it's a fundamental problem of the algorithm or implementation details that lead to the slowdowns.
Oct 02 2014
parent reply Philippe Sigaud via Digitalmars-d <digitalmars-d puremagic.com> writes:
 I did that during this summer, almost to the point it was
 self-sustained (that is, the GLL parser generator was used to generate
 a parser for grammars files). I chose a range interface: input is a
 string, the parse forest is output as a range of parse trees.
Nice! Is it public? Github?
No github repo. I could put it alongside Pegged, I suppose. I used git internally, though. I was a bit disheartened by the speed I got, so I did not publish it nor announce it here.

Note also that I saw it as an alternative engine for my own Pegged project, so I used the same way of defining grammars (some prefixes and suffixes for dropping nodes in the parse tree and so on).

I can send you the code (well, the entire repo), if you wish. I did not touch it for the past 3 months, so I don't remember anymore what state it was in :-(. Looking at the code now, it seems I'm still using Pegged to parse the initial grammar. Bootstrapping did not go well.

Send me an email at firstname . lastname gmail.com (philippe sigaud)
 Halas, even with some shortcuts on Kleene stars it was quite slow. I
 tried to use threads (spawning worker threads on alternatives), but
 that did not change the timings much.
AFAIK, multithreading is a bad idea in parsing.
I think, as with many things in CS, people developed parsing algorithms before multicores existed. Spawning threads or fibers for some alternatives (A | B) seems nice to me; I got interesting results with a purely multithreaded parser that spawned children for each choice.
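A rough sketch of that kind of experiment in D, using std.parallelism (tryA and tryB are hypothetical stand-ins for the sub-parsers of the two alternatives, not code from any real generator):

    import std.parallelism : task, taskPool;

    // Hypothetical sub-parsers for the two alternatives of (A | B).
    bool tryA(string input) { return input.length && input[0] == 'a'; }
    bool tryB(string input) { return input.length && input[0] == 'b'; }

    bool parseAorB(string input)
    {
        auto a = task!tryA(input);  // run alternative A on a worker thread
        taskPool.put(a);
        immutable b = tryB(input);  // run alternative B on this thread meanwhile
        return a.yieldForce || b;   // join, then combine the results
    }

    void main()
    {
        assert(parseAorB("abc"));
        assert(parseAorB("b"));
        assert(!parseAorB("c"));
    }

In my experiments the scheduling overhead ate most of the gains on alternatives this fine-grained, which is why I'd like to try fibers next.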
 Actually, in the gll-combinators Scala project they have similar speed
 problems. I don't know if it's a fundamental problem of the algorithm or
 implementation details that lead to the slowdowns.
Well, when all resulting implementations have speed problems... I found an article by the GLL authors explaining how they coded their algorithm for more speed, but I couldn't wrap my head around it. Philippe
Oct 02 2014
next sibling parent reply "Vladimir Kazanov" <vkazanov inbox.ru> writes:
On Thursday, 2 October 2014 at 20:28:47 UTC, Philippe Sigaud via 
Digitalmars-d
 No github repo. I could put it alongside Pegged, I suppose. I 
 used git
 internally, though.

  I was a bit disheartened by the speed I got, so did not 
 publish nor
 announced it here.
I have a huge collection of projects I never published :-) We all do, I guess.
 I think, as many things in CS, that people developed parsing
 algorithms before multicores existed.
 Spawning threads or fibers for some alternatives (A | B) seems 
 nice to
 me. I got interesting results with a purely multithreaded 
 parser that
 spawned children for each choice.
No, this is not exactly what I mean. Multithreading can be perfectly fine in some cases, very fruitful sometimes. Say, in NLP, when one has to process long or highly ambiguous strings, and the parse tree can become huge and is of importance in itself... Yes. In programming language parsing this is just a small step towards further stages within a longer pipeline; it has to be fast enough to make multithreading at this step overkill.
 Well, when all resulting implementations have speed problems...
Well... This is why I want to study the original implementation. I read the story of red-black trees some time ago - it took a while to get them right, and more to make them elegant. BTW, there's an interesting piece of related work that should probably be taken as a positive example of generalized parsing: the Elkhound parser generator. It uses a hybrid LALR/GLR approach, thus achieving better performance. There's much more to it; I didn't go too deep into it.
 I found an article by the GLL authors explaining how they coded 
 their
 algorithm for more speed, but I couldn't wrap my head around it.
By now, I have read Tomita's original GLR paper, the ANTLR ALL(*) paper, a few recent GLR papers, three papers on GLL and a few related ones. It took... a while. I sort of understand the idea, but I'm still not sure about the details :-) What's the name of the paper you read? "Modelling GLL Parser Implementation"?
Oct 02 2014
parent reply Philippe Sigaud via Digitalmars-d <digitalmars-d puremagic.com> writes:
 I have a huge collection of projects I never published :-) We all do, I
 guess.
Oh, the ratio is more like 100 projects for one published...
 No, this is not exactly what I mean. Multithreading can be perfectly fine in
 some cases, very fruitful sometimes. Say, in NLP, when one has to process
 long or highly ambiguous strings, and the parse tree can become huge and is
 of importance in itself... Yes. In programming language parsing this is just
 a small step towards further stages within a longer pipeline. It has to be
 fast enough to make multithreading on this step an overkill.
I don't know. Using fibers, I'm hoping to get interesting results one day. I got hit by a workstorm before trying fibers. OS threads were a bit too heavy and didn't work that well.
 BTW, there's an interesting piece of related work that should probably be taken
 as a positive example of generalized parsing: the Elkhound parser generator.
 It uses a hybrid LALR/GLR approach, thus achieving better performance.
 There's much more to it; I didn't go too deep into it.
Yes, Elkhound is interesting, their approach is nice. But it gave me the impression of having been abandoned for a few years?
 I found an article by the GLL authors explaining how they coded their
 algorithm for more speed, but I couldn't wrap my head around it.
By now, I have read the original Tomita's GLR paper, Antlr ALL(*) paper, a few recent GLR papers, three papers on GLL and a few related ones . It took... A while. I sort of understand the idea, but still not sure about the details :-)
ALL(*) is on my todo list. I tried to implement it in spring, but got bogged down in the details. Even the white paper has some imprecisions when you really sit down and try to code it. I could have a look at the ANTLR v4 source, but I wanted a clean start, right from the white paper.
 What's the name of the paper you read? "Modelling GLL Parser
 Implementation"?
Yes.
Oct 02 2014
parent reply "Vladimir Kazanov" <vkazanov inbox.ru> writes:
On Friday, 3 October 2014 at 05:10:45 UTC, Philippe Sigaud via 
Digitalmars-d wrote:


 Yes, Elkhound is interesting, their approach is nice. But it gave me
 the impression of having been abandoned for a few years?
Don't know and don't care, really. All I know is that Scott sort of managed to deal with the main problems of applying generalized parsers by avoiding them most of the time :)) Maybe a bad sign... After all, most modern parser generators and parser combinators do not use GLR, although it does sound interesting in theory. Something tells me one has to stress this word here: *theory* :-)
 By now, I have read the original Tomita's GLR paper, Antlr 
 ALL(*) paper, a
 few recent GLR papers, three papers on GLL and a few related 
 ones . It
 took... A while. I sort of understand the idea, but still not 
 sure about the
 details :-)
ALL(*) is on my todo list. I tried to implement it in Spring, but got bogged down in the details. Even the white paper has some imprecisions when you really sit down and try to code it. I could have a look at ANTLR v4 source, but wanted to have a clean start, right from the white paper.
 What's the name of the paper you read? "Modelling GLL Parser
 Implementation"?
Yes.
Scientists... "The algorithm is hard to implement... Okay, let's invent an esoteric paper-only language to explain things to people" :-) Thanks a lot, by the way! I've just skimmed through the code and the README... You did not use the packed forest representation, did you..?
Oct 02 2014
parent reply Philippe Sigaud via Digitalmars-d <digitalmars-d puremagic.com> writes:
 Thanks a lot, by the way!

 I've just skimmed through the code and the README... You did not use the
 packed forest representation, did you..?
Sorry for the microscopic documentation (Pegged is more documented than that...), it was a 'for me only' project.

The forest is packed, in the sense that common nodes are re-used and shared among parse trees: all intermediate parse results from any grammar part are stored and used to produce the parse nodes (a toy illustration of this sharing follows at the end of this message). The range view gives access to parse trees one after another, but the global parse forest is present in the grammar object (or rather, generated and completed during the parse process: each new parse result completes the parse forest).

It has a strange effect on how parsing time is distributed among the parse results: if you time the different parse trees, you'll see that the first one might take maybe 40% of the entire parsing time all by itself, because it has to discover all parse results alone. The following trees are computed very rapidly, since the intermediate parse results are already known. Of course, once the parse trees begin to deviate from the first ones, the process slows down again since they have to explore as-yet-unused rules and input slices.

I'm not sure the previous paragraph is clear... Imagine you have 10 different parse trees. The time could be distributed like this (tree number, share of total parse time):

1     40%
2      2%
3      3%
4      3%
5      5%
6      6%
7      8%
8     10%
9     11%
10    12%
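A toy illustration of the sharing described above (the names are invented; the real data structures are different): intermediate results are memoised per (rule, input position), so only the first tree that needs a given result pays for computing it, and every later tree reads it back for free.

    // Toy memoisation of intermediate parse results, keyed by (rule, position).
    struct Key { string rule; size_t pos; }

    struct SharedResults
    {
        size_t[][Key] memo;   // (rule, pos) -> end positions reachable from there

        size_t[] parse(string rule, size_t pos)
        {
            if (auto hit = Key(rule, pos) in memo)
                return *hit;                      // later trees land here for free
            auto ends = computeEnds(rule, pos);   // only the first tree pays this
            memo[Key(rule, pos)] = ends;
            return ends;
        }

        size_t[] computeEnds(string rule, size_t pos)
        {
            // placeholder for the actual parsing work
            return [pos + 1];
        }
    }

    void main()
    {
        SharedResults cache;
        auto first  = cache.parse("Expr", 0);  // computed
        auto second = cache.parse("Expr", 0);  // memo hit
        assert(first == second);
    }

That is also why the timing profile above is so front-loaded: the first tree fills the cache, and the rest mostly read from it.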
Oct 03 2014
parent reply "Vladimir Kazanov" <vkazanov inbox.ru> writes:
On Friday, 3 October 2014 at 16:20:29 UTC, Philippe Sigaud via 
Digitalmars-d wrote:
 Thanks a lot, by the way!

 I've just skimmed through the code and the README... You did 
 not use the
 packed forest representation, did you..?
Sorry for the microscopic documentation (Pegged is more documented than that...), it was a 'for me only' project.
This is perfectly fine with me: I think I should be okay with the theory behind the code, and your style isn't cryptic.
 I'm not sure the previous paragraph is clear...
I got the point.
 Imagine you have 10 different parse trees. The time could be
 distributed like this:


 1                          40%
 2                          2
 3                          3
 4                          3
 5                          5
 6                          6
 7                          8
 8                          10
 9                          11
 10                        12
Interesting, indeed. Anyway, thank you very much for the code. The weekend is coming -> I'll play with your implementation and see if there are any improvements possible.
Oct 03 2014
parent reply Philippe Sigaud via Digitalmars-d <digitalmars-d puremagic.com> writes:
 Anyway, thank you very much for the code. The weekend is coming -> I'll play
 with your implementation and see if there any improvements possible.
Be sure to keep me informed of any enormous mistake I made. I tried Appender and other concatenation means, without big success. Btw, I saw on the ML that using byKeys.front() is very slow. Use keys[0] or somesuch instead.
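For reference, the two idioms side by side (byKey is how druntime spells it; whether one is actually slower will depend on the compiler and runtime version, so treat the speed claim as something to benchmark rather than a given):

    void main()
    {
        int[string] aa = ["alpha" : 1, "beta" : 2];

        auto k1 = aa.byKey.front;  // lazy range over the keys, take the first one
        auto k2 = aa.keys[0];      // eager: allocates an array of all the keys first

        assert(k1 in aa);
        assert(k2 in aa);
    }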
Oct 03 2014
parent "Vladimir Kazanov" <vkazanov inbox.ru> writes:
On Friday, 3 October 2014 at 17:24:43 UTC, Philippe Sigaud via 
Digitalmars-d wrote:

 Be sure to keep me informed of any enormous mistake I made. I 
 tried
 Appender and other concatenation means, without big success.
I am not sure if there are any. Maybe GLL just IS impractical, after all. Right now only one thing is for sure: the generalized parsing fruit is not a low-hanging one. Yep, I will let you know. This is sort of a personal challenge now :)
 Btw, I saw on the ML that using byKeys.front() is very slow. Use
 keys[0] or somesuch instead.
Hm... Funny. Did you look into it..?
Oct 03 2014
prev sibling parent reply "Vladimir Kazanov" <vkazanov inbox.ru> writes:
On Thursday, 2 October 2014 at 20:28:47 UTC, Philippe Sigaud via 
Digitalmars-d wrote:
 I did that during this summer, almost to the point it was
 self-sustained (that is, the GLL parser generator was used to 
 generate
 a parser for grammars files). I chose a range interface: 
 input is a
 string, the parse forest is output as a range of parse trees.
Nice! Is it public? Github?
No github repo. I could put it alongside Pegged, I suppose. I used git internally, though. I was a bit disheartened by the speed I got, so I did not publish it nor announce it here.

Note also that I saw it as an alternative engine for my own Pegged project, so I used the same way of defining grammars (some prefixes and suffixes for dropping nodes in the parse tree and so on).

I can send you the code (well, the entire repo), if you wish. I did not touch it for the past 3 months, so I don't remember anymore what state it was in :-(. Looking at the code now, it seems I'm still using Pegged to parse the initial grammar. Bootstrapping did not go well.

Send me an email at firstname . lastname gmail.com (philippe sigaud)
Check the mailbox, Thank you
Oct 02 2014
parent Philippe Sigaud via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Thu, Oct 2, 2014 at 11:19 PM, Vladimir Kazanov via Digitalmars-d
<digitalmars-d puremagic.com> wrote:

 Check the mailbox,

 Thank you
I sent it to you. I was asleep, sorry :-)
Oct 02 2014
prev sibling parent reply "Ola Fosheim Grøstad" writes:
On Thursday, 2 October 2014 at 18:06:04 UTC, Vladimir Kazanov 
wrote:
 Chances are that I will be able to get the original GLL parser 
 generator from one of the algorithm's authors (Adrian Johnstone). 
 He's really helpful here. From that point, I will only have to 
 add a frontend for generating a concrete parser, starting with 
 Python - it already has a fully working grammar. Hopefully, I 
 will also be able to find a suitable grammar for D, it is 
 always a pleasure to play with the language.
Nice! If you manage to get a GLL generator for D (or C++) going I'd love to try it out. The only general parser generator I've tested so far is the Tomita-style GLR parser called DParser, but IIRC the documentation could use some improvement: http://sourceforge.net/projects/dparser/
Oct 02 2014
parent "Vladimir Kazanov" <vkazanov inbox.ru> writes:
On Thursday, 2 October 2014 at 21:26:15 UTC, Ola Fosheim Grøstad 
wrote:

 Nice! If you manage to get a GLL generator for D (or C++) going 
 I'd love to try it out.
You will definitely hear about it here :-) I don't know yet if the parser itself will be worth the trouble, but a more or less complete blog post with tests/results/examples will definitely see the light of day.
 The only general parser generator I've tested so far is the 
 Tomita-style GLR parser called DParser, but IIRC the 
 documentation could need some improvement:

 http://sourceforge.net/projects/dparser/
Yes, I know this one. A scannerless GLR parser. GLRs are trendy these days :) It's easy to find many more; there's even an Emacs mode using GLR for Ada parsing.
Oct 02 2014