
digitalmars.D - std.d.lexer : voting thread

reply "Dicebot" <public dicebot.lv> writes:
After brief discussion with Brian and gathering data from the 
review thread, I have decided to start voting for `std.d.lexer` 
inclusion into Phobos.

-----------------------------------------------------

All relevant information can be found here: 
http://wiki.dlang.org/Review/std.d.lexer (it includes link to 
post-review change set and some clarifications by Brian)

Review thread is here: 
http://forum.dlang.org/post/jsnhlcbulwyjuqcqoepe forum.dlang.org

-----------------------------------------------------

Instructions for voters

 The goal of the vote is to allow the Review Manager to decide if the
 community agrees on the inclusion of the submission.
 
    Place further discussion of the library in the official 
 review thread.
        If replying to an opinion stated during a vote, copy all 
 relevant
        context and post in the official review thread.
 
    If you would like to see the proposed module included into 
 Phobos
        Vote Yes
    If one condition must be met
        Vote Yes explicitly stating it is under a condition and 
 what condition.
        You may specify an improvement you'd like to see, but be 
 sure to state
        it is not a condition/showstopper.
    Otherwise
        Vote No
        A brief reason should be provided though details on what 
 needs
        improvement should be placed in the official review 
 thread.
(c) wiki.dlang.org/Review/Process

-----------------------------------------------------

If you need to ask any last-moment questions before making your decision, please do so in the review thread (linked at the beginning of this post).

Voting will last until next weekend (Oct 12 23:59 GMT +0).

Thanks for your attention.
Oct 02 2013
next sibling parent =?UTF-8?B?IkFuZHLDqSI=?= <andre andre.to> writes:
Yes!
Oct 02 2013
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-10-02 16:41, Dicebot wrote:
 After brief discussion with Brian and gathering data from the review
 thread, I have decided to start voting for `std.d.lexer` inclusion into
 Phobos.
Yes.

Not a condition, but I would prefer the default exception thrown to be a subclass of Exception rather than Exception itself.

-- 
/Jacob Carlborg
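For readers unfamiliar with the pattern: a dedicated subclass costs only a few lines in D. The name LexerException and its fields below are purely illustrative, not part of the proposed std.d.lexer API:

```d
// Hypothetical exception subclass; name and fields are illustrative only,
// not taken from the proposed std.d.lexer API.
class LexerException : Exception
{
    size_t line;    // source position where lexing failed
    size_t column;

    this(string msg, size_t line = 0, size_t column = 0,
         string file = __FILE__, size_t codeLine = __LINE__)
    {
        super(msg, file, codeLine);
        this.line = line;
        this.column = column;
    }
}

unittest
{
    // Callers can then catch the specific type instead of bare Exception.
    try
        throw new LexerException("unterminated string literal", 42, 7);
    catch (LexerException e)
        assert(e.line == 42 && e.column == 7);
}
```

That way user code can distinguish lexer failures from unrelated Exceptions with an ordinary catch clause.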
Oct 02 2013
parent "Dejan Lekic" <dejan.lekic gmail.com> writes:
On Wednesday, 2 October 2013 at 18:41:32 UTC, Jacob Carlborg 
wrote:
 On 2013-10-02 16:41, Dicebot wrote:
 After brief discussion with Brian and gathering data from the 
 review
 thread, I have decided to start voting for `std.d.lexer` 
 inclusion into
 Phobos.
Yes. Not a condition but I would prefer the default exception being thrown not to be Exception but a subclass.
Yes, I agree with Jacob. Btw, you have a "Yes, if" vote here. :)
Oct 04 2013
prev sibling next sibling parent =?UTF-8?B?IsOYaXZpbmQi?= <oivind.loe gmail.com> writes:
On Wednesday, 2 October 2013 at 14:41:56 UTC, Dicebot wrote:
 After brief discussion with Brian and gathering data from the 
 review thread, I have decided to start voting for `std.d.lexer` 
 inclusion into Phobos.

[...]
Yes! :)
Oct 02 2013
prev sibling next sibling parent Justin Whear <justin economicmodeling.com> writes:
Yes.

I see this effort driving great advances in D's tooling ecosystem.
Oct 02 2013
prev sibling next sibling parent "Daniel Kozak" <kozzi11 gmail.com> writes:
On Wednesday, 2 October 2013 at 14:41:56 UTC, Dicebot wrote:
 After brief discussion with Brian and gathering data from the 
 review thread, I have decided to start voting for `std.d.lexer` 
 inclusion into Phobos.

[...]
Yes :)
Oct 02 2013
prev sibling next sibling parent Mike Parker <aldacron gmail.com> writes:
Yes!
Oct 03 2013
prev sibling next sibling parent "Chris" <wendlec tcd.ie> writes:
On Wednesday, 2 October 2013 at 14:41:56 UTC, Dicebot wrote:
 After brief discussion with Brian and gathering data from the 
 review thread, I have decided to start voting for `std.d.lexer` 
 inclusion into Phobos.

[...]
Yes.
Oct 03 2013
prev sibling next sibling parent "Namespace" <rswhite4 googlemail.com> writes:
Yes
Oct 03 2013
prev sibling next sibling parent reply "Dicebot" <public dicebot.lv> writes:
Yes.

( I have not found any rules that prohibit review manager from 
voting :) )
Oct 03 2013
parent "Tove" <tove fransson.se> writes:
On Thursday, 3 October 2013 at 11:04:26 UTC, Dicebot wrote:
 Yes.

 ( I have not found any rules that prohibit review manager from 
 voting :) )
I'd love to say yes, since I've been dreaming of the day when we finally have a lexer... but I decided to put my yes under the condition that it can lex itself using CTFE.

My first attempt at adding an "import(__FILE__)" unittest failed with v2.063.2:

Error: memcpy cannot be interpreted at compile time, because it has no available source code
lexer.d(1966): called from here: move(lex)
lexer.d(454): called from here: r.this(lexerSource(range), config)

Maybe this is fixed in HEAD, though?
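For reference, the experiment amounts to something like the following sketch. It assumes the proposed byToken API, requires compiling with -J. so import(__FILE__) can read the source, and whether it compiles at all depends on the CTFE limitation above being fixed:

```d
// Sketch of a "lex thyself" CTFE unittest. Assumes the proposed
// std.d.lexer API (byToken over a byte slice) and compilation with -J.
// so that import(__FILE__) can read this file at compile time.
unittest
{
    import std.d.lexer;

    enum source = import(__FILE__);   // file contents as a compile-time string

    enum tokenCount = () {
        size_t n;
        foreach (token; byToken(cast(const(ubyte)[]) source))
            ++n;
        return n;
    }();

    static assert(tokenCount > 0);    // forces the whole lexer through CTFE
}
```

The static assert is what pushes the entire lexing loop through the compile-time interpreter, which is exactly where the memcpy error above surfaces.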
Oct 03 2013
prev sibling next sibling parent reply "nazriel" <spam dzfl.pl> writes:
On Wednesday, 2 October 2013 at 14:41:56 UTC, Dicebot wrote:
 After brief discussion with Brian and gathering data from the 
 review thread, I have decided to start voting for `std.d.lexer` 
 inclusion into Phobos.

 -----------------------------------------------------
[...]

 Thanks for your attention.
Yes.

(Btw, does anyone have benchmarks of std.d.lexer? I remember that Brian was benchmarking his module quite a lot in order to catch up with DMD's lexer, but I can't find the links in the IRC logs. I wonder if he achieved his goal in this regard.)
Oct 03 2013
next sibling parent reply "Brian Schott" <briancschott gmail.com> writes:
On Thursday, 3 October 2013 at 19:07:03 UTC, nazriel wrote:
 (Btw, someone got benchmarks of std.d.lexer?
 I remember that Brain was benchmarking his module quite a lot 
 in order to catch up with DMD's lexer but I can't find links in 
 IRC logs. I wonder if he achieved his goal in this regard)
The most recent set of timings that I have can be found here:
https://raw.github.com/Hackerpilot/hackerpilot.github.com/master/experimental/std_lexer/images/times4.png

They're a bit old at this point, but not much has changed in the lexer internals. I can try running another set of benchmarks soon. (The hardest part is hacking DMD to just do the lexing.)

The times on the X-axis are milliseconds.
Oct 03 2013
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/3/13 12:47 PM, Brian Schott wrote:
 [...]
I see we're considerably behind dmd. If improving performance would come at the price of changing the API, it may be sensible to hold off adoption for a bit.

Andrei
Oct 03 2013
next sibling parent reply "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 3 October 2013 at 20:11:02 UTC, Andrei Alexandrescu 
wrote:
 [...]
I see we're considerably behind dmd. If improving performance would come at the price of changing the API, it may be sensible to hold off adoption for a bit. Andrei
Considerably? They look very similar to me. dmd is just slightly winning.
Oct 03 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/3/13 1:15 PM, Brad Anderson wrote:
 On Thursday, 3 October 2013 at 20:11:02 UTC, Andrei Alexandrescu wrote:
 [...]
Considerably? They look very similar to me. dmd is just slightly winning.
To me 10% is considerable.

Andrei
Oct 03 2013
parent reply "Dicebot" <public dicebot.lv> writes:
Please express your opinion in a clear "Yes", "No" or "Yes, if" 
form. I can't really interpret discussions into voting results.

Of course, you and Walter also have "veto" votes in addition but 
it needs to be said explicitly.
Oct 03 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/3/13 3:03 PM, Dicebot wrote:
 Please express your opinion in a clear "Yes", "No" or "Yes, if" form. I
 can't really interpret discussions into voting results.

 Of course, you and Walter also have "veto" votes in addition but it
 needs to be said explicitly.
That's why I renamed the thread! I didn't vote.

Andrei
Oct 03 2013
parent "Dicebot" <public dicebot.lv> writes:
On Thursday, 3 October 2013 at 22:18:13 UTC, Andrei Alexandrescu 
wrote:
 On 10/3/13 3:03 PM, Dicebot wrote:
 Please express your opinion in a clear "Yes", "No" or "Yes, 
 if" form. I
 can't really interpret discussions into voting results.

 Of course, you and Walter also have "veto" votes in addition 
 but it
 needs to be said explicitly.
That's why I renamed the thread! I didn't vote. Andrei
I mean I will be forced to ignore your opinion in its current form when making the voting summary, and I will feel very uneasy about it :)

(damn, this review manager thing gets much more stressful toward the end!)
Oct 03 2013
prev sibling next sibling parent "Dejan Lekic" <dejan.lekic gmail.com> writes:
On Thursday, 3 October 2013 at 20:11:02 UTC, Andrei Alexandrescu 
wrote:
 [...]
I see we're considerably behind dmd. If improving performance would come at the price of changing the API, it may be sensible to hold off adoption for a bit. Andrei
Quite frankly, I (or better said, many of us) need a COMPLETE D lexer that is UP TO DATE. std.lexer should be exactly that if it is a Phobos module, and that is all that matters. Performance optimizations can come later.

So what if its API changes? We, who have used D2 since the very beginning, are used to it! API changes can be done smoothly, with phase-out stages. People would be informed which pieces of the API will become deprecated, and it is their responsibility to fix their code to reflect such changes. All that is needed is a little bit of planning...
Oct 04 2013
prev sibling parent reply "Brian Schott" <briancschott gmail.com> writes:
On Thursday, 3 October 2013 at 20:11:02 UTC, Andrei Alexandrescu 
wrote:
 I see we're considerably behind dmd. If improving performance 
 would come at the price of changing the API, it may be sensible 
 to hold off adoption for a bit.

 Andrei
The old benchmarks measured total program run time. I ran a new set of benchmarks, placing stopwatch calls around just the lexing code to bypass any slowness caused by druntime startup. I also made a similar modification to DMD.

Here's the result:
https://raw.github.com/Hackerpilot/hackerpilot.github.com/master/experimental/std_lexer/images/times5.png

I suspect that I've made an error in the benchmarking due to how much faster std.d.lexer is than DMD now, so I've uploaded what I have to Github:
https://github.com/Hackerpilot/lexerbenchmark
Oct 04 2013
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-10-04 13:28, Brian Schott wrote:

 Here's the result:

 https://raw.github.com/Hackerpilot/hackerpilot.github.com/master/experimental/std_lexer/images/times5.png


 I suspect that I've made an error in the benchmarking due to how much
 faster std.d.lexer is than DMD now, so I've uploaded what I have to Github.

 https://github.com/Hackerpilot/lexerbenchmark
If these results are correct, me like :)

-- 
/Jacob Carlborg
Oct 04 2013
prev sibling next sibling parent Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Brian Schott wrote:
 On Thursday, 3 October 2013 at 20:11:02 UTC, Andrei Alexandrescu wrote:
 I see we're considerably behind dmd. If improving performance would
 come at the price of changing the API, it may be sensible to hold off
 adoption for a bit.

 Andrei
[...]
Interestingly, DMD is only faster when lexing std.datetime. That is a relatively big file, so maybe the slowness is related to small buffering in std.d.lexer?
Oct 04 2013
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
04-Oct-2013 15:28, Brian Schott пишет:
 On Thursday, 3 October 2013 at 20:11:02 UTC, Andrei Alexandrescu wrote:
 I see we're considerably behind dmd. If improving performance would
 come at the price of changing the API, it may be sensible to hold off
 adoption for a bit.

 Andrei
[...]
I'm suspicious of:

printf("%s\t%f\n", srcname, (total / 200.0) / (1000 * 100));

Plus I think clock_gettime often has too coarse a resolution (I'd use gettimeofday as more reliable). Also check core\time.d TickDuration.currSystemTick, as it uses CLOCK_MONOTONIC on *nix. You should do the same to make timings meaningful.

-- 
Dmitry Olshansky
Oct 11 2013
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, October 11, 2013 12:56:14 Dmitry Olshansky wrote:
 04-Oct-2013 15:28, Brian Schott пишет:
 [...]
 I'm suspicious of:
 printf("%s\t%f\n", srcname, (total / 200.0) / (1000 * 100));

 Plus I think clock_gettime often has too coarse a resolution (I'd use
 gettimeofday as more reliable). Also check core\time.d
 TickDuration.currSystemTick as it uses CLOCK_MONOTONIC on *nix. You
 should do the same to make timings meaningful.
Why not just use std.datetime's benchmark or StopWatch? Though looking at lexerbenchmark.d it looks like he's using StopWatch rather than clock_gettime directly, and there are no printfs, so I don't know what code you're referring to here. From the looks of it, he's basically reimplemented std.datetime.benchmark in lexerbenchmark.d and probably should have just used benchmark instead.

- Jonathan M Davis
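For anyone reproducing the measurements, the Phobos route suggested here is only a few lines with the 2013-era std.datetime API (lexFile below is a stand-in for the code under test, not a real function from the benchmark repo):

```d
import std.datetime : AutoStart, StopWatch, benchmark;
import std.stdio : writefln;

// Stand-in for the lexing code being measured.
void lexFile()
{
    // ... run std.d.lexer (or DMD's lexer) over the input here ...
}

void main()
{
    // Manual timing around just the code of interest; StopWatch uses a
    // monotonic clock internally, so wall-clock adjustments don't skew it.
    auto sw = StopWatch(AutoStart.yes);
    lexFile();
    sw.stop();
    writefln("single run: %s ms", sw.peek().msecs);

    // Or let std.datetime.benchmark do the looping: 100 runs of lexFile,
    // returning the total elapsed time for the whole batch.
    auto results = benchmark!lexFile(100);
    writefln("100 runs: %s ms total", results[0].msecs);
}
```

(In later D releases these moved to std.datetime.stopwatch and report Duration rather than TickDuration.)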
Oct 11 2013
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
11-Oct-2013 13:07, Jonathan M Davis пишет:
 On Friday, October 11, 2013 12:56:14 Dmitry Olshansky wrote:
 04-Oct-2013 15:28, Brian Schott пишет:
 [...]
 Why not just use std.datetime's benchmark or StopWatch? Though looking at lexerbenchmark.d it looks like he's using StopWatch rather than clock_gettime directly, and there are no printfs, so I don't know what code you're referring to here. From the looks of it, he's basically reimplemented std.datetime.benchmark in lexerbenchmark.d and probably should have just used benchmark instead.
Cause it's C++ damn it! ;)
 - Jonathan M Davis
-- Dmitry Olshansky
Oct 11 2013
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, October 11, 2013 13:53:29 Dmitry Olshansky wrote:
 11-Oct-2013 13:07, Jonathan M Davis пишет:
 [...]
 Why not just use std.datetime's benchmark or StopWatch? [...]
 Cause it's C++ damn it! ;)
Your comments would make perfect sense for C++, but lexerbenchmark.d is in D. And I don't know what else you could be talking about, because that's all I see referenced here.

- Jonathan M Davis
Oct 11 2013
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
11-Oct-2013 14:58, Jonathan M Davis пишет:
 On Friday, October 11, 2013 13:53:29 Dmitry Olshansky wrote:
 11-Oct-2013 13:07, Jonathan M Davis пишет:
 [...]
Cause it's C++ damn it! ;)
Your comments would make perfect sense for C++, but lexerbenchmark.d is in D. And I don't know what else you could be talking about, because that's all I see referenced here.
I was actually looking at dmd.diff in the linked repo:
https://github.com/Hackerpilot/lexerbenchmark/blob/master/dmd.diff

lexerbenchmark.d uses StopWatch.
 - Jonathan M Davis
-- Dmitry Olshansky
Oct 11 2013
prev sibling parent "David Nadlinger" <code klickverbot.at> writes:
On Thursday, 3 October 2013 at 19:47:28 UTC, Brian Schott wrote:
 The most recent set of timings that I have can be found here: 
 https://raw.github.com/Hackerpilot/hackerpilot.github.com/master/experimental/std_lexer/images/times4.png
How exactly were these figures obtained? Based on the graphs, I'd guess that you measured the execution time of a complete program (as LDC, which has a slightly higher startup overhead in druntime, overtakes GDC for larger inputs).

If that's the case, DMD might be at an unfair advantage in this benchmark, as it doesn't need to run all the druntime startup code, which is not a lot, but still. And indeed, its advantage seems to shrink for large inputs, although I don't want to imply that this could be the only reason.

David
Oct 03 2013
prev sibling parent "Dicebot" <public dicebot.lv> writes:
On Thursday, 3 October 2013 at 19:07:03 UTC, nazriel wrote:
 On Wednesday, 2 October 2013 at 14:41:56 UTC, Dicebot wrote:
 ...
Please keep "btw"s in separate thread :)
Oct 03 2013
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Wednesday, 2 October 2013 at 14:41:56 UTC, Dicebot wrote:
 If you need to ask any last moment questions before making your 
 decision, please do it in last review thread (linked in 
 beginning of this post).

 Voting will last until the next weekend (Oct 12 23:59 GMT +0)

 Thanks for your attention.
I sadly have to vote no in the current state.

It is really needed to be able to reuse the same pool of identifiers across several lexings (otherwise tooling built around this lexer won't be able to manage mixins properly, unless it rolls its own identifier pool on top of the lexer's). This requires the interface to change, so it can't be introduced in a later version without major breakage.

I'd vote yes if the above condition is met, or to integrate the current module as experimental (not in std).
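A reusable pool along these lines could look roughly like the following sketch; the IdentifierPool name and its interface are illustrative only, not taken from the proposed API:

```d
// Illustrative sketch of a shareable identifier pool that survives
// across several lexing passes; none of these names are from std.d.lexer.
struct IdentifierPool
{
    private string[immutable(ubyte)[]] interned;

    // Returns the pool's canonical copy of `id`, adding it on first sight.
    string intern(const(ubyte)[] id)
    {
        if (auto p = id in interned)
            return *p;
        immutable key = id.idup;
        auto s = cast(string) key;
        interned[key] = s;
        return s;
    }
}

unittest
{
    // One pool shared across two (simulated) lexing passes, as would be
    // needed when re-lexing mixin-generated code:
    IdentifierPool pool;
    auto a = pool.intern(cast(const(ubyte)[]) "mixinTemp");
    auto b = pool.intern(cast(const(ubyte)[]) "mixinTemp");
    assert(a is b);   // the same canonical string both times
}
```

The point of contention is that the lexer would have to accept such a pool as a parameter, which is the interface change that cannot be retrofitted later without breakage.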
Oct 03 2013
prev sibling next sibling parent reply Martin Nowak <code dawg.eu> writes:
On 10/02/2013 04:41 PM, Dicebot wrote:
 After brief discussion with Brian and gathering data from the review
 thread, I have decided to start voting for `std.d.lexer` inclusion into
 Phobos.
I also have to vote no for now.

My biggest concern is that the lexer incorporates a string pool, something that isn't strictly part of lexing. IMO this is a major design flaw and a possible performance/memory issue. It is buried in the API because byToken takes const(byte)[], i.e. mutable data, but each Token carries a string value, so it always requires a copy. For stream-oriented lexing, e.g. token highlighting, no string pool is required at all. Instead the value of a Token should be something like take(input.save, lengthOfToken).

Why was the Tok!">>=", Tok!"default" idea turned down? This leaves us with undesirable names like Tok.shiftRightAssign and Tok.default_.

There are a few smaller issues that haven't yet been addressed, but of course this can be done during the merge code review.

Adding it as an experimental module would be a good idea.
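For context, the turned-down idea maps a token's spelling to its enum member through a template, roughly like this (an illustrative sketch with made-up enum members, not the module's actual implementation):

```d
// Sketch of the Tok!"..." idea: resolve a token's spelling to an enum
// member at compile time. Names here are illustrative only.
enum TokenType { identifier, shiftRightAssign, default_ /* ... */ }

template Tok(string spelling)
{
    static if (spelling == ">>=")
        enum Tok = TokenType.shiftRightAssign;
    else static if (spelling == "default")
        enum Tok = TokenType.default_;
    else
        static assert(false, "unknown token spelling: " ~ spelling);
}

// The spelled form and the named form denote the same value:
static assert(Tok!">>=" == TokenType.shiftRightAssign);
static assert(Tok!"default" == TokenType.default_);
```

A real implementation would generate the static-if chain (or a compile-time lookup) from the full token table rather than listing cases by hand, so D keywords and operator spellings never need mangled identifier names.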
Oct 03 2013
next sibling parent Martin Nowak <code dawg.eu> writes:
On 10/04/2013 04:57 AM, Martin Nowak wrote:
 I also have to vote with no for now.
And working in CTFE can't be easily given up either.
Oct 03 2013
prev sibling next sibling parent reply "David Nadlinger" <code klickverbot.at> writes:
On Friday, 4 October 2013 at 02:57:41 UTC, Martin Nowak wrote:
 Adding it as experimental module would be a good idea.
I would be in favor of adding such community-reviewed but not-quite-there-yet libraries to a special category on the DUB registry instead. It would also solve the visibility problem, and apart from the fact that it isn't really clear what being an »experimental« module would entail, having it as a package also allows for faster updates not reliant on the core release schedule. David
Oct 03 2013
parent Robert <jfanatiker gmx.at> writes:
I created https://github.com/phobos-x/phobosx for this; it is also in
the dub registry.

It could be used, until something more official is established.

Best regards,

Robert

On Fri, 2013-10-04 at 05:29 +0200, David Nadlinger wrote:
 On Friday, 4 October 2013 at 02:57:41 UTC, Martin Nowak wrote:
 Adding it as experimental module would be a good idea.
 I would be in favor of adding such community-reviewed but 
 not-quite-there-yet libraries to a special category on the DUB 
 registry instead.

 It would also solve the visibility problem, and apart from the 
 fact that it isn't really clear what being an »experimental« 
 module would entail, having it as a package also allows for 
 faster updates not reliant on the core release schedule.

 David
Oct 04 2013
prev sibling parent "Dejan Lekic" <dejan.lekic gmail.com> writes:
 Why was the Tok!">>=", Tok!"default" idea turned down. This 
 leaves us with undesirable names like Tok.shiftRightAssign, 
 Tok.default_.
Martin, that is truly a matter of taste. I, for one, do not like Tok!">>=" - too many special characters there for my taste. To me it looks like part of a weird Perl script.
Oct 04 2013
prev sibling next sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Wednesday, 2 October 2013 at 14:41:56 UTC, Dicebot wrote:
 After brief discussion with Brian and gathering data from the 
 review thread, I have decided to start voting for `std.d.lexer` 
 inclusion into Phobos.
No. Let's iron out the issues first, both interface and possible performance issues.
Oct 04 2013
prev sibling next sibling parent reply "ilya-stromberg" <ilya-stromberg-2009 yandex.ru> writes:
On Wednesday, 2 October 2013 at 14:41:56 UTC, Dicebot wrote:
 After brief discussion with Brian and gathering data from the 
 review thread, I have decided to start voting for `std.d.lexer` 
 inclusion into Phobos.
No.

I really want to see `std.d.lexer` in Phobos, but I have too many conditions.

Documentation issues:
- please specify the parsing algorithm that you used for `std.d.lexer`. As I understand from the review thread, you implement a `GLR parser` - please document it (correct me if I'm wrong). Also, add a link to the algorithm description, for example to Wikipedia: http://en.wikipedia.org/wiki/GLR_parser It helps to understand how `std.d.lexer` works. Also, please add best-case and worst-case time complexity (for example, from O(n) to O(n^3)), and best-case and worst-case memory complexity.
- please add more usage examples. Currently you have only one big example of how to generate HTML markup of D code. Try to add a simple example for every function.
- explicitly specify functions that can throw: add a `Throws:` block for them and specify the conditions under which they can throw.

UTF-16/UTF-32 support:
- why does the standard `std.d.lexer` support only UTF-8, but not UTF-16/UTF-32? The official lexing specification allows all of them. Conversion from UTF-16/UTF-32 to UTF-8 is not an option due to performance issues. If Phobos string functions are too slow, please file a bug. If Phobos hasn't got the necessary functions, please file an enhancement request. I think it's a serious issue that affects all string utilities (like std.xml or std.json), not only `std.d.lexer`.

Exception handling:
- please use `ParseException` as the default exception, not `Exception`.

Codestyle:
- I don't like the `TokenType` enum. You can use Tok!">>=" and `static if` to compare the token string to the `TokenType` enum. So you will not lose performance, because the string parsing will be done at compile time.

Not a condition, but a wishlist:
- implement a low-level API, not only the high-level range-based API. I hope it can help increase performance for applications that really need it.
- add the ability to use `std.d.lexer` at compile time.
Oct 04 2013
parent reply "Craig Dillabaugh" <craig.dillabaugh gmail.com> writes:
On Friday, 4 October 2013 at 09:41:49 UTC, ilya-stromberg wrote:
 On Wednesday, 2 October 2013 at 14:41:56 UTC, Dicebot wrote:
 After brief discussion with Brian and gathering data from the 
 review thread, I have decided to start voting for 
 `std.d.lexer` inclusion into Phobos.
No. I really want to see `std.d.lexer` in Phobos, but have too many conditions. Documentation issues:
clip
 - please add more usage examples. Currently you have only one 
 big example how generate HTML markup of D code. Try to add a 
 simple example for every function.
clip Woah! A simple example for every function? Then it would put the rest of the Phobos documents to shame :o)
Oct 04 2013
next sibling parent reply "ilya-stromberg" <ilya-stromberg-2009 yandex.ru> writes:
On Friday, 4 October 2013 at 14:30:12 UTC, Craig Dillabaugh wrote:
 On Friday, 4 October 2013 at 09:41:49 UTC, ilya-stromberg wrote:
 On Wednesday, 2 October 2013 at 14:41:56 UTC, Dicebot wrote:
 After brief discussion with Brian and gathering data from the 
 review thread, I have decided to start voting for 
 `std.d.lexer` inclusion into Phobos.
No. I really want to see `std.d.lexer` in Phobos, but have too many conditions. Documentation issues:
clip
 - please add more usage examples. Currently you have only one 
 big example how generate HTML markup of D code. Try to add a 
 simple example for every function.
clip Woah! A simple example for every function? Then it would put the rest of the Phobos documents to shame :o)
I said: "TRY to add". But yes, I feel that `std.d.lexer` doesn't have enough documentation.
Oct 04 2013
parent "Craig Dillabaugh" <craig.dillabaugh gmail.com> writes:
On Friday, 4 October 2013 at 16:03:25 UTC, ilya-stromberg wrote:
 On Friday, 4 October 2013 at 14:30:12 UTC, Craig Dillabaugh 
 wrote:
 On Friday, 4 October 2013 at 09:41:49 UTC, ilya-stromberg 
 wrote:
 On Wednesday, 2 October 2013 at 14:41:56 UTC, Dicebot wrote:
 After brief discussion with Brian and gathering data from 
 the review thread, I have decided to start voting for 
 `std.d.lexer` inclusion into Phobos.
No. I really want to see `std.d.lexer` in Phobos, but have too many conditions. Documentation issues:
clip
 - please add more usage examples. Currently you have only one 
 big example how generate HTML markup of D code. Try to add a 
 simple example for every function.
clip Woah! A simple example for every function? Then it would put the rest of the Phobos documents to shame :o)
I said: "TRY to add". But yes, I feel that `std.d.lexer` don't have enough documentation.
I think it was a good idea ... it just sort of jumped out at me as the Phobos documentation tends to be missing lots of examples. Thus the smiley on the end.
Oct 04 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Oct 04, 2013 at 04:30:11PM +0200, Craig Dillabaugh wrote:
 On Friday, 4 October 2013 at 09:41:49 UTC, ilya-stromberg wrote:
[...]
- please add more usage examples. Currently you have only one big
example how generate HTML markup of D code. Try to add a simple
example for every function.
clip Woah! A simple example for every function? Then it would put the rest of the Phobos documents to shame :o)
The rest of Phobos docs *should* be put to shame. Except maybe for a few exceptions here and there, most of Phobos docs are far too scant, and need some serious TLC with many many more code examples. T -- Customer support: the art of getting your clients to pay for your own incompetence.
Oct 04 2013
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/4/13 7:30 AM, Craig Dillabaugh wrote:
 On Friday, 4 October 2013 at 09:41:49 UTC, ilya-stromberg wrote:
 On Wednesday, 2 October 2013 at 14:41:56 UTC, Dicebot wrote:
 After brief discussion with Brian and gathering data from the review
 thread, I have decided to start voting for `std.d.lexer` inclusion
 into Phobos.
No. I really want to see `std.d.lexer` in Phobos, but have too many conditions. Documentation issues:
clip
 - please add more usage examples. Currently you have only one big
 example how generate HTML markup of D code. Try to add a simple
 example for every function.
clip Woah! A simple example for every function? Then it would put the rest of the Phobos documents to shame :o)
I would say matters that are passable for now and easy to improve later without disruption don't necessarily preclude approval. Andrei
Oct 04 2013
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/2/13 7:41 AM, Dicebot wrote:
 After brief discussion with Brian and gathering data from the review
 thread, I have decided to start voting for `std.d.lexer` inclusion into
 Phobos.
Thanks to all involved for the work, first of all Brian.

I have the proverbial good news and bad news. The only bad news is that I'm voting "no" on this proposal. But there's plenty of good news:

1. I am not attempting to veto this, so just consider it a normal vote when tallying.

2. I do vote for inclusion in the /etc/ package for the time being.

3. The work is good and the code valuable, so even in case my suggestions (below) are followed, virtually all the code pulp that gets work done can be reused.

Vision
======

I'd been following the related discussions for a while, but I made up my mind today as I was working on a C++ lexer. The C++ lexer is for Facebook's internal linter. I'm translating the lexer from C++.

Before long I realized two simple things. First, I can't reuse anything from Brian's code (without copying it and doing surgery on it), although it is extremely similar to what I'm doing.

Second, I figured that it is almost trivial to implement a simple, generic, and reusable (across languages and tasks) static trie searcher that takes a compile-time array with all tokens and keywords and returns the token at the front of a range with minimum comparisons.

Such a trie searcher is not intelligent, but is very composable and extremely fast. It is just smart enough to do maximum munch (e.g. it interprets "==" and "foreach" as one token each, not two), but is not smart enough to distinguish an identifier "whileTrue" from the keyword "while" (it claims "while" was found and stops right at the beginning of "True" in the stream). This is for generality, so applications can define how identifiers work (e.g. Lisp allows "-" in identifiers but D doesn't, etc.). The trie finder doesn't do numbers or comments either. No regexen of any kind.

The beauty of it all is that all of these more involved bits (many of which are language specific) can be implemented modularly and trivially as a postprocessing step after the trie finder.
For example the user specifies "/*" as a token to the trie finder. Whenever a comment starts, the trie finder will find and return it; then the user implements the alternate grammar of multiline comments.

To encode the tokens returned by the trie, we must do away with definitions such as

enum TokenType : ushort { invalid, assign, ... }

These are fine for a tokenizer written in C, but are needless duplication from a D perspective. I think a better approach is:

struct TokenType { string symbol; ... }

TokenType tok(string s)() {
    static immutable string interned = s;
    return TokenType(interned);
}

Instead of associating token types with small integers, we associate them with string addresses. (For efficiency we may use pointers to zero-terminated strings, but I don't think that's necessary.) Token types are interned by design, i.e. to compare two tokens for equality it suffices to compare the strings with "is" (this can be extended to general identifiers, not only statically-known tokens). Then, each token type has a natural representation that doesn't require the user to remember the name of the token. The left shift token is simply tok!"<<" and is application-global.

The static trie finder does not even build a trie - it simply generates a bunch of switch statements. The signature I've used is:

Tuple!(size_t, size_t, Token) staticTrieFinder(alias TokenTable, R)(R r) {

It returns a tuple with (a) whitespace characters before token, (b) newlines before token, and (c) the token itself, returned as tok!"whatever". To use for C++:

alias CppTokenTable = TypeTuple!(
    "~", "(", ")", "[", "]", "{", "}", ";", ",", "?",
    "<", "<<", "<<=", "<=", ">", ">>", ">>=",
    "%", "%=", "=", "==", "!", "!=", "^", "^=",
    "*", "*=", ":", "::", "+", "++", "+=",
    "&", "&&", "&=", "|", "||", "|=",
    "-", "--", "-=", "->", "->*",
    "/", "/=", "//", "/*", "\\", ".", "'", "\"",
    "and", "and_eq", "asm", "auto", ...
);

Then the code uses staticTrieFinder!([CppTokenTable])(range).
Of course, it's also possible to define the table itself as an array. I'm exploring right now in search of the most advantageous choices.

I think the above would be a true lexer in the D spirit:

- exploits D's string templates to essentially define non-alphanumeric symbols that are easy to use and understand, not confined to predefined tables (that enum!) and cheap to compare;

- exploits D's code generation abilities to generate really fast code using inlined trie searching;

- offers an API that is generic, flexible, and infinitely reusable.

If what we need at this point is a conventional lexer for the D language, std.d.lexer is the ticket. But I think it wouldn't be difficult to push our ambitions way beyond that. What say you?

Andrei
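The interning idea above can be condensed into a small runnable sketch (my reconstruction, not Andrei's actual code): each distinct instantiation of tok owns one static immutable string, so equal token types share one address program-wide.

```d
struct TokenType
{
    string symbol;
}

/// Each template instantiation gets its own static immutable string,
/// so every tok!"x" call returns the same interned address.
TokenType tok(string s)()
{
    static immutable string interned = s;
    return TokenType(interned);
}

void main()
{
    auto a = tok!"<<";
    auto b = tok!"<<";
    // Equality of interned symbols is a cheap identity check:
    assert(a.symbol is b.symbol);
    assert(a.symbol !is tok!"<<=".symbol);
}
```

Because comparison is pointer identity rather than string contents, this keeps the cost of a small-integer enum while the token's "name" remains its own spelling.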
Oct 04 2013
next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Saturday, 5 October 2013 at 00:24:22 UTC, Andrei Alexandrescu 
wrote:
 Vision
 ======

 I'd been following the related discussions for a while, but I 
 have made up my mind today as I was working on a C++ lexer 
 today. The C++ lexer is for Facebook's internal linter. I'm 
 translating the lexer from C++.

 Before long I realized two simple things. First, I can't reuse 
 anything from Brian's code (without copying it and doing 
 surgery on it), although it is extremely similar to what I'm 
 doing.

 Second, I figured that it is almost trivial to implement a 
 simple, generic, and reusable (across languages and tasks) 
 static trie searcher that takes a compile-time array with all 
 tokens and keywords and returns the token at the front of a 
 range with minimum comparisons.

 Such a trie searcher is not intelligent, but is very composable 
 and extremely fast. It is just smart enough to do maximum munch 
 (e.g. interprets "==" and "foreach" as one token each, not 
 two), but is not smart enough to distinguish an identifier 
 "whileTrue" from the keyword "while" (it claims "while" was 
 found and stops right at the beginning of "True" in the 
 stream). This is for generality so applications can define how 
 identifiers work (e.g. Lisp allows "-" in identifiers but D 
 doesn't etc). The trie finder doesn't do numbers or comments 
 either. No regexen of any kind.

 The beauty of it all is that all of these more involved bits 
 (many of which are language specific) can be implemented 
 modularly and trivially as a postprocessing step after the trie 
 finder. For example the user specifies "/*" as a token to the 
 trie finder. Whenever a comment starts, the trie finder will 
 find and return it; then the user implements the alternate 
 grammar of multiline comments.
That is more or less how SDC's lexer works. You pass it 2 AAs: one mapping strings to token types, and one mapping strings to the names of functions that return the actual token (for instance to handle /*), and finally a fallback for when nothing matches. A giant 3-headed monster mixin is created from these data. That has been really handy so far.
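A toy reconstruction of that table-to-mixin scheme (invented names; SDC's real code is far more elaborate): a compile-time table of lexemes is turned into the source of a switch statement, which is then mixed in.

```d
enum Tok { invalid, plus, star }

// Compile-time table of lexeme / enum-member-name pairs.
enum string[2][] table = [["+", "plus"], ["*", "star"]];

// CTFE function that builds a switch statement from the table.
string makeSwitch()
{
    string code = "switch (c) {\n";
    foreach (entry; table)
        code ~= "case '" ~ entry[0] ~ "': return Tok." ~ entry[1] ~ ";\n";
    code ~= "default: return Tok.invalid;\n}";
    return code;
}

// The generated switch is mixed in, so the table alone drives the lexer.
Tok lexOne(char c)
{
    mixin(makeSwitch());
}

void main()
{
    assert(lexOne('+') == Tok.plus);
    assert(lexOne('*') == Tok.star);
    assert(lexOne('x') == Tok.invalid);
}
```

Changing the lexical grammar then means editing the table, not the control flow.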
 If what we need at this point is a conventional lexer for the D 
 language, std.d.lexer is the ticket. But I think it wouldn't be 
 difficult to push our ambitions way beyond that. What say you?
Yup, I do agree.
Oct 04 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 10/4/2013 5:24 PM, Andrei Alexandrescu wrote:
 Such a trie searcher is not intelligent, but is very composable and extremely
 fast.
Well, boys, I reckon this is it — benchmark combat toe to toe with the cooders. Now look, boys, I ain't much of a hand at makin' speeches, but I got a pretty fair idea that something doggone important is goin' on around there. And I got a fair idea the kinda personal emotions that some of you fellas may be thinkin'. Heck, I reckon you wouldn't even be human bein's if you didn't have some pretty strong personal feelin's about benchmark combat. I want you to remember one thing, the folks back home is a-countin' on you and by golly, we ain't about to let 'em down. I tell you something else, if this thing turns out to be half as important as I figure it just might be, I'd say that you're all in line for some important promotions and personal citations when this thing's over with. That goes for ever' last one of you regardless of your race, color or your creed. Now let's get this thing on the hump - we got some benchmarkin' to do.
Oct 04 2013
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-10-05 02:24, Andrei Alexandrescu wrote:

 Thanks all involved for the work, first of all Brian.

 I have the proverbial good news and bad news. The only bad news is that
 I'm voting "no" on this proposal.

 [Snip]
Is this something in between a hand-written lexer and an automatically generated one?

I think we can have both: a hand-written lexer, specifically targeted at D, that is very fast, and then a more general lexer that can be used for many languages.

I have to say I think this is a bit unfair, to dump this huge thing in the voting thread. You haven't made a single post in the discussion thread and now you're coming with these big suggestions in the voting thread.

-- 
/Jacob Carlborg
Oct 05 2013
next sibling parent reply Artur Skawina <art.08.09 gmail.com> writes:
On 10/05/13 13:45, Jacob Carlborg wrote:
 I think we can have both. A hand written lexer, specifically targeted for D
that is very fast. Then a more general lexer that can be used for many
languages.
The assumption that a hand-written lexer will be much faster than a generated one is wrong. If there's any significant perf difference then it's just a matter of improving the generator. An automatically generated lexer will be much more flexible (the source spec can be reused without a single modification for anything from an intelligent LOC-like counter or a syntax highlighter to a compiler), easier to maintain/review and less buggy.

Compare the perf numbers previously posted here for the various lexers with:

$ time ./tokenstats stats std/datetime.d
Lexed 1589336 bytes, found 461315 tokens, 13770 keywords, 65946 identifiers.
Comments: Line: 958 ~40.16  Block: 1 ~16  Nesting: 534 ~441.7  [count avg_len]
0m0.010s user  0m0.001s system  0m0.011s elapsed  99.61% CPU

$ time ./tokenstats dump-no-io std/datetime.d
0m0.013s user  0m0.001s system  0m0.014s elapsed  99.78% CPU

'tokenstats' is built from a PEG-like spec plus a bit of CT magic. The generator supports inline rules written in D too, but the only ones actually written in D are for defining what an identifier is, matching EOLs and handling DelimitedStrings.

Initially, performance was not a consideration at all and there's some very low-hanging fruit in there; there's still room for improvement. Unfortunately, the language and compiler situation has prevented me from doing any work on this for the last half year or so. The code won't work with any current compiler and needs a lot of cleanups (which I have been planning to do /after/ updating the tooling, which seems very unlikely to be possible now), hence it's not in a releasable state. [1]

artur

[1] If anyone wants to play with it, use as a reference etc and isn't afraid of running a binary, a linux x86 one can be gotten from http://d-h.st/xtX
The only really useful functionality is 'tokenstats dump file.d', which will dump all found tokens with line and column numbers. It's just a tool I've been using for identifying regressions and benching.
Oct 05 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-10-05 19:52, Artur Skawina wrote:

 The assumption, that a hand-written lexer will be much faster than a generated
 one, is wrong.
I never said that the generated one would be slow. I only said that the hand written would be fast :) -- /Jacob Carlborg
Oct 06 2013
parent Artur Skawina <art.08.09 gmail.com> writes:
On 10/06/13 10:57, Jacob Carlborg wrote:
 On 2013-10-05 19:52, Artur Skawina wrote:
 
 The assumption, that a hand-written lexer will be much faster than a generated
 one, is wrong.
I never said that the generated one would be slow. I only said that the hand written would be fast :)
I know, but you said that having both is an option -- that would not make sense unless there's a significant advantage.

A lexer is really a rather trivial piece of software; there's not much room for improvement over the obvious "fetch-a-character, use-it-to-determine-a-new-state, repeat-until-done, return the found state (== matched token)" approach. So the core of an efficient hand-written lexer will not be very different from this:

http://repo.or.cz/w/girtod.git/blob/refs/heads/lexer:/mainloop.d

That is already ~2kLOC and it's *just* the top-level loop; it does not include handling of nontrivial tokens (it matches just keywords, punctuators and identifiers). Could a handwritten lexer be faster? Not by much, and any trick that would help the manually-written one could also be used by the generator. In fact, working on the generator is much easier than dealing with this kind of fragile hand-tuned mess.

Imagine changing the lexical grammar a bit, or introducing a new kind of literal. With a more declarative solution this only involves a local change spanning a few lines and is relatively risk-free. Updating a handwritten lexer would involve many more changes, often in several different areas, and lots of opportunities for making mistakes.
 Would it be able to lex Scala and Ruby? Method names in Scala can contain many
symbols that is not usually allowed in other languages. You can have a method
named "==". In Ruby method names are allowed to end with "=", "?" or "!".
Yes, D makes it easy: you can for example simply define a function that determines what is and what isn't an identifier and pass that as an alias or mixin parameter. "Lexing" binary formats would be possible too :^). A complete D lexer can look as simple as this:

http://repo.or.cz/w/girtod.git/blob/refs/heads/lexer:/dlanglexer.d

which should also give you a good idea of how easy supporting other languages would be. (The "actions" are defined in separate modules, so that the grammars can be reused everywhere.) There's a D PEG lexical grammar in there too, btw.

I forgot to change the subject previously, sorry; I was not trying to influence the voting. I'm just saying that Andrei's approach goes in the right direction (even if I disagree with the details). And IMHO the time before a useful std-lib-worthy lexer infrastructure materializes is measured in months, if not years. So if I were voting I'd probably say "yes" - because waiting for a better, but non-existent, alternative is not going to help anybody.

The hard part of the required work isn't coding - it's the design. If a better solution appears later, it should be able to /replace/ the hand-written one. And in the meantime, the experience from using the less-generic lexer can only help any "new" design.

artur
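The "identifier rule as an alias parameter" idea might look like this in practice (a sketch with invented names, and with Ruby's rules simplified to trailing-character tolerance):

```d
import std.ascii : isAlpha, isAlphaNum;

/// Generic identifier scanner: the host language supplies the
/// start/continue character predicates as alias parameters.
const(char)[] scanIdentifier(alias isStart, alias isRest)(const(char)[] input)
{
    if (input.length == 0 || !isStart(input[0]))
        return null;
    size_t i = 1;
    while (i < input.length && isRest(input[i]))
        ++i;
    return input[0 .. i];
}

void main()
{
    // D-style identifiers: letters, digits and underscore.
    alias dIdent = scanIdentifier!(
        c => c == '_' || isAlpha(c),
        c => c == '_' || isAlphaNum(c));
    assert(dIdent("foo42 bar") == "foo42");

    // Ruby-ish identifiers: '?' and '!' allowed as well
    // (real Ruby only permits them at the end; simplified here).
    alias rubyIdent = scanIdentifier!(
        c => c == '_' || isAlpha(c),
        c => c == '_' || c == '?' || c == '!' || isAlphaNum(c));
    assert(rubyIdent("empty? x") == "empty?");
}
```

The same core loop serves both languages; only the predicates change.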
Oct 07 2013
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdan.org> writes:
Jacob Carlborg <doob me.com> wrote:
 On 2013-10-05 02:24, Andrei Alexandrescu wrote:
 
 Thanks all involved for the work, first of all Brian.
 
 I have the proverbial good news and bad news. The only bad news is that
 I'm voting "no" on this proposal.
 
 [Snip]
Is this something in the middle of a hand written lexer and a lexer automatically generated?
I don't understand this question.
 I think we can have both. A hand written lexer, specifically targeted for
 D that is very fast. Then a more general lexer that can be used for many
languages.
I agree with Artur that this is a fallacy.
 I have to say I think this is a bit unfair to dump this huge thing in the
 voting thread. You haven't made a single post in the discussion thread
 and now you're coming with this big suggestions in the voting thread.
The way I see it, it's unfair of you to claim that. All I did was vote and explain that vote. I was very explicit that I don't want to pull rank or anything. Besides, it was an idea, and such things are hard to time.

I think std.d.lexer is a fine product that works as advertised. But I also believe very strongly that it doesn't exploit D's advantages and that adopting it would lock us into a suboptimal API. I have only strengthened this opinion since yesterday morning.

Andrei
Oct 05 2013
parent reply Jacob Carlborg <doob me.com> writes:
On 2013-10-05 20:45, Andrei Alexandrescu wrote:

 I don't understand this question.

 I think we can have both. A hand written lexer, specifically targeted for
 D that is very fast. Then a more general lexer that can be used for many
languages.
I agree with Artur that this is a fallacy.
I never said that the generated one would be slow. I only said that the hand written would be fast :)
 I have to say I think this is a bit unfair to dump this huge thing in the
 voting thread. You haven't made a single post in the discussion thread
 and now you're coming with this big suggestions in the voting thread.
The way I see it it's unfair of you to claim that. All I did was to vote and to explain that vote. I was very explicit I don't want to pull rank or anything. Besides it was an idea and such things are hard to time. I think std.d.lexer is a fine product that works as advertised. But I also believe very strongly that it doesn't exploit D's advantages and that adopting it would lock us into a suboptimal API. I have strengthened this opinion only since yesterday morning.
I just think that if you were not completely satisfied with the current API or implementation you could have said so in the discussion thread. It would have at least given Brian a chance to do something about it, before the voting began. -- /Jacob Carlborg
Oct 06 2013
next sibling parent reply "ilya-stromberg" <ilya-stromberg-2009 yandex.ru> writes:
On Sunday, 6 October 2013 at 08:59:57 UTC, Jacob Carlborg wrote:
 I just think that if you were not completely satisfied with the 
 current API or implementation you could have said so in the 
 discussion thread. It would have at least given Brian a chance 
 to do something about it, before the voting began.
Maybe we went to the voting too fast, and somebody didn't have enough time to read the documentation and write an opinion?

Maybe we should wait at least 1-2 weeks after the last review before starting a vote? Maybe we should announce an upcoming vote one week prior to starting a new voting thread? I believe that would draw additional attention to the new module and help avoid situations like this.
Oct 06 2013
parent reply "Dicebot" <public dicebot.lv> writes:
On Sunday, 6 October 2013 at 09:37:18 UTC, ilya-stromberg wrote:
 Maybe we should wait at least 1-2 weeks from last review before 
 start a new voting? Maybe we should announce upcoming voting 
 for one week prior to start a new voting thread? I belive that 
 it pays additional attention to the new module and helps avoid 
 situations like this.
There was more than a week between the last comment in the review thread and the start of voting. If you needed more time for review, you should have mentioned it. In the current situation I simply waited until Brian made the post-review changes he personally wanted and moved forward, as it was pretty clear no further input was incoming.

Any formal review may potentially result in a short vote afterwards if no critical issues are found, so I don't think it makes sense to make any additional announcements. There are no special points of attention - if a review was declared and you want to make some input, it should be done right there.

Of course, the review process is as much community-defined as anything else here. You can always define an alternative one and propose it for discussion. Right now, though, I am sticking to the one mentioned in the wiki + some personal common sense for the undefined parts (because I am lazy :)). Also, you can lend a helping hand and manage the next review on your own in a way you find reasonable :P
Oct 06 2013
parent reply "ilya-stromberg" <ilya-stromberg-2009 yandex.ru> writes:
On Sunday, 6 October 2013 at 18:54:55 UTC, Dicebot wrote:
 Any formal review may potentially result in short voting after 
 if no critical issues are found so I don't think it makes sense 
 in making any additional announcements. There are no special 
 points of attention - if review was declared and you want to 
 make some input, it should be done right there.
Yes, but people are lazy. I don't mean all of us, but most people are lazy. Some of us will vote because it's interesting, but will not read or write in the review thread because it takes time. So, an additional announcement of an upcoming vote can help: "Guys, if you want to vote, it's time to read the documentation and write your really cool ideas before voting".
Oct 07 2013
parent reply "simendsjo" <simendsjo gmail.com> writes:
On Monday, 7 October 2013 at 13:29:30 UTC, ilya-stromberg wrote:
 On Sunday, 6 October 2013 at 18:54:55 UTC, Dicebot wrote:
 Any formal review may potentially result in short voting after 
 if no critical issues are found so I don't think it makes 
 sense in making any additional announcements. There are no 
 special points of attention - if review was declared and you 
 want to make some input, it should be done right there.
Yes, but people are lazy. I don't talk about all of us, but most of people are lazy. Somebody of us will vote because it's interesting, but will not read/write review tread because it requests a time. So, additional announce of upcoming voting can help: "Guys, if you want to vote, it's time to read documentation and write your really cool idea before voting".
This is the reason I've not cast any votes for standard modules - I haven't had the time, or don't have the competence, to cast a valid vote. It would be like voting for a political party without knowing where all the parties stand on all the issues.
Oct 07 2013
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, October 07, 2013 17:47:27 simendsjo wrote:
 On Monday, 7 October 2013 at 13:29:30 UTC, ilya-stromberg wrote:
 On Sunday, 6 October 2013 at 18:54:55 UTC, Dicebot wrote:
 Any formal review may potentially result in short voting after
 if no critical issues are found so I don't think it makes
 sense in making any additional announcements. There are no
 special points of attention - if review was declared and you
 want to make some input, it should be done right there.
Yes, but people are lazy. I don't talk about all of us, but most of people are lazy. Somebody of us will vote because it's interesting, but will not read/write review tread because it requests a time. So, additional announce of upcoming voting can help: "Guys, if you want to vote, it's time to read documentation and write your really cool idea before voting".
This is the reason I've not cast any votes for standard modules - I haven't had the time, or don't have the competence, to cast a valid vote. It would be like voting for a political party without knowing where all the parties stand in all cases.
So, it would be like your typical political vote then. ;) - Jonathan M Davis
Oct 07 2013
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/6/13 1:59 AM, Jacob Carlborg wrote:
 I think std.d.lexer is a fine product that works as advertised. But I
 also
 believe very strongly that it doesn't exploit D's advantages and that
 adopting it would lock us into a suboptimal API. I have strengthened this
 opinion only since yesterday morning.
I just think that if you were not completely satisfied with the current API or implementation you could have said so in the discussion thread. It would have at least given Brian a chance to do something about it, before the voting began.
I've always thought we must invest effort into generic lexers and parsers as opposed to ones for dedicated languages, and I have said so several times, most strongly in http://forum.dlang.org/thread/jii1gk$76s$1 digitalmars.com. When discussion and voting had started, I had acquiesced to not interfere because I thought I shouldn't discuss a working design against a hypothetical one. *That* would have been unfair. But now that such a design exists, I think it's fair to bring it up. Andrei
Oct 06 2013
prev sibling parent "ilya-stromberg" <ilya-stromberg-2009 yandex.ru> writes:
On Saturday, 5 October 2013 at 11:45:47 UTC, Jacob Carlborg wrote:
 On 2013-10-05 02:24, Andrei Alexandrescu wrote:

 Thanks all involved for the work, first of all Brian.

 I have the proverbial good news and bad news. The only bad 
 news is that
 I'm voting "no" on this proposal.

 [Snip]
Is this something in the middle of a hand-written lexer and an automatically generated one? I think we can have both: a hand-written lexer, specifically targeted at D, that is very fast, and then a more general lexer that can be used for many languages. I have to say I think this is a bit unfair, to dump this huge thing in the voting thread. You haven't made a single post in the discussion thread and now you're coming with these big suggestions in the voting thread.
I asked the same question about supporting any grammar, not only the D grammar, but Brian did not respond: http://forum.dlang.org/post/itlyubosepuqcchhuwdh forum.dlang.org
Oct 05 2013
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-10-05 02:24, Andrei Alexandrescu wrote:

 Such a trie searcher is not intelligent, but is very composable and
 extremely fast. It is just smart enough to do maximum munch (e.g.
 interprets "==" and "foreach" as one token each, not two), but is not
 smart enough to distinguish an identifier "whileTrue" from the keyword
 "while" (it claims "while" was found and stops right at the beginning of
 "True" in the stream). This is for generality so applications can define
 how identifiers work (e.g. Lisp allows "-" in identifiers but D doesn't
 etc). The trie finder doesn't do numbers or comments either. No regexen
 of any kind.
Would it be able to lex Scala and Ruby? Method names in Scala can contain many symbols that are not usually allowed in other languages. You can have a method named "==". In Ruby, method names are allowed to end with "=", "?" or "!". -- /Jacob Carlborg
Oct 06 2013
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/6/13 2:10 AM, Jacob Carlborg wrote:
 On 2013-10-05 02:24, Andrei Alexandrescu wrote:

 Such a trie searcher is not intelligent, but is very composable and
 extremely fast. It is just smart enough to do maximum munch (e.g.
 interprets "==" and "foreach" as one token each, not two), but is not
 smart enough to distinguish an identifier "whileTrue" from the keyword
 "while" (it claims "while" was found and stops right at the beginning of
 "True" in the stream). This is for generality so applications can define
 how identifiers work (e.g. Lisp allows "-" in identifiers but D doesn't
 etc). The trie finder doesn't do numbers or comments either. No regexen
 of any kind.
Would it be able to lex Scala and Ruby? Method names in Scala can contain many symbols that are not usually allowed in other languages. You can have a method named "==". In Ruby, method names are allowed to end with "=", "?" or "!".
Yes, easily. Have the trie matcher stop upon whatever symbol it detects and then handle the tail with Ruby-specific code. Andrei
Oct 06 2013
prev sibling next sibling parent dennis luehring <dl.soluz gmx.net> writes:
Am 05.10.2013 02:24, schrieb Andrei Alexandrescu:
 Instead of associating token types with small integers, we associate
 them with string addresses. (For efficiency we may use pointers to
 zero-terminated strings, but I don't think that's necessary).
would it also be more efficient to generate one big string out of the token list, containing all tokens concatenated, and use generated string slices for the associated string accesses?

immutable string generated_flat_token_stream = "...publicprivateclass...";

// "public" == generated_flat_token_stream[3..9]

or would that kill caching on today's machines?
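Roughly, the idea seems to be the following (a hypothetical sketch with made-up names, not code from the proposed module):

```d
// All token spellings concatenated into one immutable string;
// each token is then just a slice into that shared storage.
immutable string flatTokens = "publicprivateclass";

immutable string tokPublic  = flatTokens[0 .. 6];   // "public"
immutable string tokPrivate = flatTokens[6 .. 13];  // "private"
immutable string tokClass   = flatTokens[13 .. 18]; // "class"

unittest
{
    // Slices carry (pointer, length) into the parent string, so no
    // extra allocation happens and the data stays contiguous.
    assert(tokPublic == "public");
    assert(tokPrivate.ptr is flatTokens.ptr + 6);
}
```

Whether this helps or hurts caching depends on access patterns, but keeping all spellings in one small contiguous block is generally cache-friendly.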
Oct 06 2013
prev sibling next sibling parent reply "Joseph Rushton Wakeling" <joseph.wakeling webdrake.net> writes:
On Saturday, 5 October 2013 at 00:24:22 UTC, Andrei Alexandrescu 
wrote:
 On 10/2/13 7:41 AM, Dicebot wrote:
 After brief discussion with Brian and gathering data from the 
 review
 thread, I have decided to start voting for `std.d.lexer` 
 inclusion into
 Phobos.
Thanks to all involved for the work, first of all Brian.

I have the proverbial good news and bad news. The only bad news is that I'm voting "no" on this proposal. But there's plenty of good news:

1. I am not attempting to veto this, so just consider it a normal vote when tallying.

2. I do vote for inclusion in the /etc/ package for the time being.

3. The work is good and the code valuable, so even in case my suggestions (below) are followed, virtually all the code pulp that gets the work done can be reused.

Vision
======

I'd been following the related discussions for a while, but I made up my mind today as I was working on a C++ lexer. The C++ lexer is for Facebook's internal linter; I'm translating the lexer from C++. Before long I realized two simple things.

First, I can't reuse anything from Brian's code (without copying it and doing surgery on it), although it is extremely similar to what I'm doing.

Second, I figured that it is almost trivial to implement a simple, generic, and reusable (across languages and tasks) static trie searcher that takes a compile-time array with all tokens and keywords and returns the token at the front of a range with a minimum of comparisons.

Such a trie searcher is not intelligent, but is very composable and extremely fast. It is just smart enough to do maximum munch (e.g. it interprets "==" and "foreach" as one token each, not two), but is not smart enough to distinguish an identifier "whileTrue" from the keyword "while" (it claims "while" was found and stops right at the beginning of "True" in the stream). This is for generality, so applications can define how identifiers work (e.g. Lisp allows "-" in identifiers but D doesn't, etc.). The trie finder doesn't do numbers or comments either. No regexen of any kind.

The beauty of it all is that all of these more involved bits (many of which are language specific) can be implemented modularly and trivially as a postprocessing step after the trie finder.
For example, the user specifies "/*" as a token to the trie finder. Whenever a comment starts, the trie finder will find and return it; then the user implements the alternate grammar of multiline comments.

To encode the tokens returned by the trie, we must do away with definitions such as:

enum TokenType : ushort { invalid, assign, ... }

These are fine for a tokenizer written in C, but are needless duplication from a D perspective. I think a better approach is:

struct TokenType { string symbol; ... }

TokenType tok(string s)() {
    static immutable string interned = s;
    return TokenType(interned);
}

Instead of associating token types with small integers, we associate them with string addresses. (For efficiency we may use pointers to zero-terminated strings, but I don't think that's necessary.) Token types are interned by design, i.e. to compare two tokens for equality it suffices to compare the strings with "is" (this can be extended to general identifiers, not only statically-known tokens). Then, each token type has a natural representation that doesn't require the user to remember the name of the token. The left shift token is simply tok!"<<" and is application-global.

The static trie finder does not even build a trie - it simply generates a bunch of switch statements. The signature I've used is:

Tuple!(size_t, size_t, Token) staticTrieFinder(alias TokenTable, R)(R r) {

It returns a tuple with (a) whitespace characters before the token, (b) newlines before the token, and (c) the token itself, returned as tok!"whatever". To use it for C++:

alias CppTokenTable = TypeTuple!(
    "~", "(", ")", "[", "]", "{", "}", ";", ",", "?",
    "<", "<<", "<<=", "<=", ">", ">>", ">>=", "%", "%=",
    "=", "==", "!", "!=", "^", "^=", "*", "*=", ":", "::",
    "+", "++", "+=", "&", "&&", "&=", "|", "||", "|=",
    "-", "--", "-=", "->", "->*", "/", "/=", "//", "/*",
    "\\", ".", "'", "\"",
    "and", "and_eq", "asm", "auto",
    ...
);

Then the code uses staticTrieFinder!([CppTokenTable])(range).
Of course, it's also possible to define the table itself as an array. I'm exploring right now in search of the most advantageous choices.

I think the above would be a true lexer in the D spirit:

- exploits D's string templates to essentially define non-alphanumeric symbols that are easy to use and understand, not confined to predefined tables (that enum!) and cheap to compare;

- exploits D's code generation abilities to generate really fast code using inlined trie searching;

- offers an API that is generic, flexible, and infinitely reusable.

If what we need at this point is a conventional lexer for the D language, std.d.lexer is the ticket. But I think it wouldn't be difficult to push our ambitions way beyond that. What say you?
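Filled in literally, the interned-token scheme quoted above might look like this (a hypothetical sketch of the elided parts, not code from std.d.lexer or from the actual prototype):

```d
// Token type identified by the address of an interned string
// rather than by a small integer from an enum.
struct TokenType
{
    string symbol;
}

TokenType tok(string s)()
{
    // One static copy per template instantiation: every use of
    // tok!"<<" anywhere in the program yields the same address.
    static immutable string interned = s;
    return TokenType(interned);
}

unittest
{
    auto a = tok!"<<";
    auto b = tok!"<<";
    assert(a.symbol is b.symbol);         // identity comparison suffices
    assert(a.symbol !is tok!"<=".symbol); // different tokens, different addresses
}
```

The point of the design is that comparing two token types is a single pointer comparison, with no application-wide enum to maintain.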
How quickly do you think this vision could be realized? If soon, I'd say it's worth delaying a decision on the current proposed lexer, if not ... well, jam tomorrow, perfect is the enemy of good, and all that ...
Oct 06 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/6/13 5:40 AM, Joseph Rushton Wakeling wrote:
 How quickly do you think this vision could be realized? If soon, I'd say
 it's worth delaying a decision on the current proposed lexer, if not ...
 well, jam tomorrow, perfect is the enemy of good, and all that ...
I'm working on related code, and got all the way there in one day (Friday) with a C++ tokenizer for linting purposes (doesn't open #includes or expand #defines etc; it wasn't meant to). The core generated fragment that does the matching is at https://dpaste.de/GZY3. The surrounding switch statement (also in library code) handles whitespace and line counting. The client code needs to handle by hand things like parsing numbers (note how the matcher stops upon the first digit), identifiers, comments (matcher stops upon detecting "//" or "/*") etc. Such things can be achieved with hand-written code (as I do), other similar tokenizers, DFAs, etc. The point is that the core loop that looks at every character looking for a lexeme is fast. Andrei
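A concrete, hypothetical illustration of what such a generated matching fragment could look like, for just the three tokens "<", "<<" and "<=" with maximum munch (this is not the actual code behind the dpaste link):

```d
// Sketch of a switch-based maximum-munch matcher over a tiny,
// made-up token table: "<", "<<", "<=".
string matchToken(string s)
{
    if (s.length == 0) return null;
    switch (s[0])
    {
    case '<':
        if (s.length > 1)
        {
            switch (s[1])
            {
            case '<': return "<<"; // longest match wins
            case '=': return "<=";
            default:  break;
            }
        }
        return "<";
    default:
        return null; // not in the table; the caller handles
                     // identifiers, numbers, comments, etc.
    }
}

unittest
{
    assert(matchToken("<=x") == "<=");
    assert(matchToken("<a") == "<");
    assert(matchToken("x") is null);
}
```

A real generator would emit nested switches like this for the whole token table at compile time, so the per-character dispatch stays in one tight loop.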
Oct 06 2013
next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 06/10/13 18:07, Andrei Alexandrescu wrote:
 I'm working on related code, and got all the way there in one day (Friday) with
 a C++ tokenizer for linting purposes (doesn't open #includes or expand #defines
 etc; it wasn't meant to).

 The core generated fragment that does the matching is at
https://dpaste.de/GZY3.

 The surrounding switch statement (also in library code) handles whitespace and
 line counting. The client code needs to handle by hand things like parsing
 numbers (note how the matcher stops upon the first digit), identifiers,
comments
 (matcher stops upon detecting "//" or "/*") etc. Such things can be achieved
 with hand-written code (as I do), other similar tokenizers, DFAs, etc. The
point
 is that the core loop that looks at every character looking for a lexeme is
fast.
What I'm getting at is that I'd be prepared to give a vote "no to std, yes to etc" for Brian's d.lexer, _if_ I was reasonably certain that we'd see an alternative lexer module submitted to Phobos within the next month :-)
Oct 06 2013
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
06-Oct-2013 20:07, Andrei Alexandrescu writes:
 On 10/6/13 5:40 AM, Joseph Rushton Wakeling wrote:
 How quickly do you think this vision could be realized? If soon, I'd say
 it's worth delaying a decision on the current proposed lexer, if not ...
 well, jam tomorrow, perfect is the enemy of good, and all that ...
I'm working on related code, and got all the way there in one day (Friday) with a C++ tokenizer for linting purposes (doesn't open #includes or expand #defines etc; it wasn't meant to). The core generated fragment that does the matching is at https://dpaste.de/GZY3. The surrounding switch statement (also in library code) handles whitespace and line counting. The client code needs to handle by hand things like parsing numbers (note how the matcher stops upon the first digit), identifiers, comments (matcher stops upon detecting "//" or "/*") etc. Such things can be achieved with hand-written code (as I do), other similar tokenizers, DFAs, etc. The point is that the core loop that looks at every character looking for a lexeme is fast.
This is something I agree with. I'd call that loop the "dispatcher loop", in the sense that it detects the kind of stuff and forwards to a special hot loop for that case (if any, e.g. skipping comments). BTW it absolutely must be able to do so in one step; the generated code already knows that the token is tok!"//", hence it may call the proper handler right there:

case '/':
    ...
    switch (s[1]) {
    ...
    case '/':
        // it's a pseudo token anyway so instead of
        // t = tok!"//";
        // just _handle_ it!
        t = hookFor!"//"(); // user hook for pseudo-token
        // eats whitespace & returns tok!"comment" or some such,
        // if need be
        break token_scan;
    }

This also helps to get not only "raw" tokens but allows the user to cook extra tokens by hand for special cases that can't be handled by the "dispatcher loop".
 Andrei
-- Dmitry Olshansky
Oct 11 2013
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/11/13 2:17 AM, Dmitry Olshansky wrote:
06-Oct-2013 20:07, Andrei Alexandrescu writes:
 On 10/6/13 5:40 AM, Joseph Rushton Wakeling wrote:
 How quickly do you think this vision could be realized? If soon, I'd say
 it's worth delaying a decision on the current proposed lexer, if not ...
 well, jam tomorrow, perfect is the enemy of good, and all that ...
I'm working on related code, and got all the way there in one day (Friday) with a C++ tokenizer for linting purposes (doesn't open #includes or expand #defines etc; it wasn't meant to). The core generated fragment that does the matching is at https://dpaste.de/GZY3. The surrounding switch statement (also in library code) handles whitespace and line counting. The client code needs to handle by hand things like parsing numbers (note how the matcher stops upon the first digit), identifiers, comments (matcher stops upon detecting "//" or "/*") etc. Such things can be achieved with hand-written code (as I do), other similar tokenizers, DFAs, etc. The point is that the core loop that looks at every character looking for a lexeme is fast.
This is something I agree with. I'd call that loop the "dispatcher loop", in the sense that it detects the kind of stuff and forwards to a special hot loop for that case (if any, e.g. skipping comments). BTW it absolutely must be able to do so in one step; the generated code already knows that the token is tok!"//", hence it may call the proper handler right there:

case '/':
    ...
    switch (s[1]) {
    ...
    case '/':
        // it's a pseudo token anyway so instead of
        // t = tok!"//";
        // just _handle_ it!
        t = hookFor!"//"(); // user hook for pseudo-token
        // eats whitespace & returns tok!"comment" or some such,
        // if need be
        break token_scan;
    }

This also helps to get not only "raw" tokens but allows the user to cook extra tokens by hand for special cases that can't be handled by the "dispatcher loop".
That's a good idea. The only concerns I have are:

* I'm biased toward patterns for laying out efficient code, having hacked on such for the past year. Even discounting for that, I have the feeling that speed is near the top of the list for people who evaluate lexer generators. I fear that too much inline code present inside a fairly large switch statement may hurt efficiency, which is why I'm biased in favor of a "small core loop dispatching upon the first few characters, out-of-line code for handling particular cases that need attention".

* I've grown to be a big fan of the simplicity of the generator. Yes, that also means it's bare on features, but it's simple enough to be used casually for the simplest tasks, ones people wouldn't normally think of using a lexer for. If we add hookFor, it would be great if it didn't impact simplicity a lot.

Andrei
Oct 11 2013
prev sibling next sibling parent reply "David Nadlinger" <code klickverbot.at> writes:
On Saturday, 5 October 2013 at 00:24:22 UTC, Andrei Alexandrescu 
wrote:
 2. I do vote for inclusion in the /etc/ package for the time 
 being.
What is your vision for the future of etc.*, assuming that we are also going to promote DUB (or another package manager) to "official" status soon as well? Personally, I always found etc.* to be on some strange middle ground between official and non-official – Can I expect these modules to stay around for a longer amount of time? Keep API compatibility according to Phobos policies? The fact that e.g. the libcurl C API modules are also in there makes it seem like a grab-bag of random stuff we didn't quite want to put anywhere else, at least to me. The docs aren't really helpful either: »Modules in etc are not standard D modules. They are here because they are experimental, or for some other reason are not quite suitable for std, although they are still useful.« David
Oct 06 2013
next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 06/10/13 18:57, David Nadlinger wrote:
 The docs aren't really helpful either: »Modules in etc are not standard D
 modules. They are here because they are experimental, or for some other reason
 are not quite suitable for std, although they are still useful.«
I actually realized I had no idea about what etc was until the last couple of days, and then I thought -- isn't this really what has just been discussed under the proposed name of stdx? ... and if so, why isn't it being used?
Oct 06 2013
parent reply "David Nadlinger" <code klickverbot.at> writes:
On Sunday, 6 October 2013 at 17:08:25 UTC, Joseph Rushton 
Wakeling wrote:
 isn't this really what has just been discussed under the 
 proposed name of stdx?

 ... and if so, why isn't it being used?
This is exactly why I'm not too thrilled to make another attempt at establishing something like that. ;) David
Oct 06 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/6/13 10:10 AM, David Nadlinger wrote:
 On Sunday, 6 October 2013 at 17:08:25 UTC, Joseph Rushton Wakeling wrote:
 isn't this really what has just been discussed under the proposed name
 of stdx?

 ... and if so, why isn't it being used?
This is exactly why I'm not too thrilled to make another attempt at establishing something like that. ;)
We could improve things on our end by featuring etc documentation more prominently etc. I don't think there's a need to reboot things with stdx. Just improve etc. Andrei
Oct 06 2013
parent reply Brad Roberts <braddr puremagic.com> writes:
On 10/6/13 1:41 PM, Andrei Alexandrescu wrote:
 On 10/6/13 10:10 AM, David Nadlinger wrote:
 On Sunday, 6 October 2013 at 17:08:25 UTC, Joseph Rushton Wakeling wrote:
 isn't this really what has just been discussed under the proposed name
 of stdx?

 ... and if so, why isn't it being used?
This is exactly why I'm not too thrilled to make another attempt at establishing something like that. ;)
We could improve things on our end by featuring etc documentation more prominently etc. I don't think there's a need to reboot things with stdx. Just improve etc. Andrei
I'm largely staying out of this conversation, but there's one area that I think is pretty important: speed of development. By having a less official, more readily committable-to repository, it stands to reason that it'll evolve faster and more fluidly than the phobos code base does or should. Some of it is just that phobos pull requests languish too long, but that's not ALL it is. The bar should be different, not that phobos' bar should be lower. My 2 cents, Brad
Oct 06 2013
parent "Dicebot" <public dicebot.lv> writes:
On Sunday, 6 October 2013 at 21:32:25 UTC, Brad Roberts wrote:
 On 10/6/13 1:41 PM, Andrei Alexandrescu wrote:
 On 10/6/13 10:10 AM, David Nadlinger wrote:
 On Sunday, 6 October 2013 at 17:08:25 UTC, Joseph Rushton 
 Wakeling wrote:
 isn't this really what has just been discussed under the 
 proposed name
 of stdx?

 ... and if so, why isn't it being used?
This is exactly why I'm not too thrilled to make another attempt at establishing something like that. ;)
We could improve things on our end by featuring etc documentation more prominently etc. I don't think there's a need to reboot things with stdx. Just improve etc. Andrei
I'm largely staying out of this conversation, but there's one area that I think is pretty important: speed of development. By having a less official, more readily committable-to repository, it stands to reason that it'll evolve faster and more fluidly than the phobos code base does or should. Some of it is just that phobos pull requests languish too long, but that's not ALL it is. The bar should be different, not that phobos' bar should be lower. My 2 cents, Brad
This. The very point of such a category is to provide a more flexible and still officially approved source for not-yet-there modules. Whatever the reason that prevents a module from straightforward inclusion, it is likely to also be a reason for plenty of commits to that module. Limiting its polishing to the Phobos release model hinders the core rationale for having such a semi-official module list - the ability of the module author to polish it at his own tempo, using more extensive field-test results. I actually kind of think "etc." should be deprecated and eventually removed from Phobos altogether. For C bindings we now have Deimos; for experimental packages it simply does not work that well.
Oct 07 2013
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/6/13 9:57 AM, David Nadlinger wrote:
 On Saturday, 5 October 2013 at 00:24:22 UTC, Andrei Alexandrescu wrote:
 2. I do vote for inclusion in the /etc/ package for the time being.
What is your vision for the future of etc.*, assuming that we are also going to promote DUB (or another package manager) to "official" status soon as well?
I think /etc/ should be a stepping stone to std, just like in C++ boost is for std (and boost's sandbox is for boost). Andrei
Oct 06 2013
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-10-06 22:40, Andrei Alexandrescu wrote:

 I think /etc/ should be a stepping stone to std, just like in C++ boost
 is for std (and boost's sandbox is for boost).
Currently "etc" seems like where C bindings are placed. -- /Jacob Carlborg
Oct 06 2013
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, October 07, 2013 08:36:16 Jacob Carlborg wrote:
 On 2013-10-06 22:40, Andrei Alexandrescu wrote:
 I think /etc/ should be a stepping stone to std, just like in C++ boost
 is for std (and boost's sandbox is for boost).
Currently "etc" seems like where C bindings are placed.
That's what I thought that it was for. I don't remember etc ever really being discussed before, and all it has are C bindings, so the idea that it would hold anything other than C bindings is news to me, though I think that we should probably shy away from putting C bindings in Phobos in general. - Jonathan M Davis
Oct 07 2013
parent reply "SomeDude" <lovelydear mailmetrash.com> writes:
On Monday, 7 October 2013 at 07:12:13 UTC, Jonathan M Davis wrote:
 On Monday, October 07, 2013 08:36:16 Jacob Carlborg wrote:
 On 2013-10-06 22:40, Andrei Alexandrescu wrote:
 I think /etc/ should be a stepping stone to std, just like 
 in C++ boost
 is for std (and boost's sandbox is for boost).
Currently "etc" seems like where C bindings are placed.
That's what I thought that it was for. I don't remember etc ever really being discussed before, and all it has are C bindings, so the idea that it would hold anything other than C bindings is news to me, though I think that we should probably shy away from putting C bindings in Phobos in general. - Jonathan M Davis
The problem is, if these C bindings are removed, the immediate reflex will be to think that Phobos doesn't have the features that were fulfilled by these bindings. So the impulse will be to reinvent the wheel, when these bindings are perfectly okay and do the job well. C bindings are a way to save us time and build upon proven quality libraries. I don't see any problem with C bindings being in the standard library, as long as they are really useful and high quality. The "not invented here" itch is a bad one. The workforce of the community should be directed at real problems and filling real gaps, rather than being wasted on reinventing the wheel merely for aesthetic/ideological reasons. I don't see any need to remove etc.
Oct 12 2013
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, October 12, 2013 11:09:21 SomeDude wrote:
 On Monday, 7 October 2013 at 07:12:13 UTC, Jonathan M Davis wrote:
 On Monday, October 07, 2013 08:36:16 Jacob Carlborg wrote:
 On 2013-10-06 22:40, Andrei Alexandrescu wrote:
 I think /etc/ should be a stepping stone to std, just like
 in C++ boost
 is for std (and boost's sandbox is for boost).
Currently "etc" seems like where C bindings are placed.
That's what I thought that it was for. I don't remember etc ever really being discussed before, and all it has are C bindings, so the idea that it would hold anything other than C bindings is news to me, though I think that we should probably shy away from putting C bindings in Phobos in general. - Jonathan M Davi
The problem is, if these C bindings are removed, the immediate reflex will be to think that Phobos doesn't have the features that were fulfilled by these bindings. So the impulse will be to reinvent the wheel, when these bindings are perfectly okay and do the job well. C bindings are a way to save us time and build upon proven quality libraries. I don't see any problem with C bindings being in the standard library, as long as they are really useful and high quality. The "not invented here" itch is a bad one. The workforce of the community should be directed at real problems and filling real gaps, rather than being wasted on reinventing the wheel merely for aesthetic/ideological reasons. I don't see any need to remove etc.
Deimos is for C bindings, not Phobos. We don't want any more modules in std built on top of C bindings for libraries that aren't guaranteed to be on all of the systems that we support. Having std.net.curl has been very problematic due to the problems with getting a proper version of libcurl to link against on Windows, and there has even been some discussion of removing it entirely. So, there will be no more Phobos modules built on anything like curl or openssl or gcrypt or any other C library which isn't guaranteed to be on all systems. That being the case, there's no point in putting C bindings in Phobos. Deimos was created specifically so that there would be a place to get bindings to C libraries. We may want to make some adjustments to how Deimos is handled, but it's our solution to C bindings, not Phobos: https://github.com/D-Programming-Deimos

druntime should have C bindings for the OSes that we support, but those are the only C bindings that should be in D's standard libraries. Whether we'll remove any that we have is still up for debate, but we're not adding any more. - Jonathan M Davis
Oct 12 2013
next sibling parent Paulo Pinto <pjmlp progtools.org> writes:
Am 13.10.2013 01:11, schrieb Jonathan M Davis:
 On Saturday, October 12, 2013 11:09:21 SomeDude wrote:
 On Monday, 7 October 2013 at 07:12:13 UTC, Jonathan M Davis wrote:
 On Monday, October 07, 2013 08:36:16 Jacob Carlborg wrote:
 On 2013-10-06 22:40, Andrei Alexandrescu wrote:
 I think /etc/ should be a stepping stone to std, just like
 in C++ boost
 is for std (and boost's sandbox is for boost).
Currently "etc" seems like where C bindings are placed.
That's what I thought that it was for. I don't remember etc ever really being discussed before, and all it has are C bindings, so the idea that it would hold anything other than C bindings is news to me, though I think that we should probably shy away from putting C bindings in Phobos in general. - Jonathan M Davi
The problem is, if these C bindings are removed, the immediate reflex will be to think that Phobos doesn't have the features that were fulfilled by these bindings. So the impulse will be to reinvent the wheel, when these bindings are perfectly okay and do the job well. C bindings is a way to save us time and build upon proven quality libraries. I don't see any problem with C bindings being in the standard library, as long as they are really useful and high quality. The "not invented here" itch is a bad one. The workforce of the community should be directed at real problems and filling real gaps, rather than being wasted at reinventing the wheel merely for aethetic/ideological reasons. I don't see any need to remove etc.
Deimos is for C bindings, not Phobos. We don't want any more modules in std built on top of C bindings for libraries that aren't guaranteed to be on all of the systems that we support. Having std.net.curl has been very problematic due to the problems with getting a proper version of libcurl to link against in Windows, and there has even been some discussion of removing it entirely. So, there will be no more Phobos modules built on anything like curl or openssl or gcrypt or any other C library which isn't guaranteed to be on all systems. That being the case, there's no point in putting C bindings in Phobos. Deimos was created specifically so that there wolud be a place to get bindings to C libraries. We may want to make some adjustments to how Deimos is handled, but it's our solution to C bindings, not Phobos: https://github.com/D-Programming-Deimos druntime should have C bindings for the OSes that we support, but that's the only C bindings that should be in D's standard libraries. Whether we'll remove any that we have is still up for debate, but we're not adding any more. - Jonathan M Davis
+1 for removing std.net.curl. -- Paulo
Oct 13 2013
prev sibling parent reply "SomeDude" <lovelydear mailmetrash.com> writes:
On Saturday, 12 October 2013 at 23:12:03 UTC, Jonathan M Davis 
wrote:
 - Jonathan M Davis
OK, for libraries that are not well supported on all platforms, that makes sense.
Oct 13 2013
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, October 13, 2013 19:09:36 SomeDude wrote:
 On Saturday, 12 October 2013 at 23:12:03 UTC, Jonathan M Davis
 
 wrote:
 - Jonathan M Davis
OK, for libraries that are not well supported on all platforms, that makes sense.
Yeah, and because Windows supports basically nothing out of the box except its own OS libraries (e.g. Win32 or WinRT), that means not supporting anything other than C bindings to the OS functions and leaving all other C bindings to something outside of the standard library like Deimos. But if we promote deimos properly (and dub will probably help with this), we should be able to make people aware of where they can find bindings to C libraries and make it less likely that people will reinvent the wheel in D if it's not actually worth doing (though in some cases it is worth doing - e.g. thanks to slicing, well-written D parsing libraries are likely to beat any C/C++ parsing libraries that operate on null-terminated strings). - Jonathan M Davis
Oct 13 2013
prev sibling parent Jordi Sayol <g.sayol yahoo.es> writes:
On 13/10/13 01:11, Jonathan M Davis wrote:
 On Saturday, October 12, 2013 11:09:21 SomeDude wrote:
 On Monday, 7 October 2013 at 07:12:13 UTC, Jonathan M Davis wrote:
 On Monday, October 07, 2013 08:36:16 Jacob Carlborg wrote:
 On 2013-10-06 22:40, Andrei Alexandrescu wrote:
 I think /etc/ should be a stepping stone to std, just like
 in C++ boost
 is for std (and boost's sandbox is for boost).
Currently "etc" seems like where C bindings are placed.
That's what I thought it was for. I don't remember etc ever really being discussed before, and all it has are C bindings, so the idea that it would hold anything other than C bindings is news to me, though I think that we should probably shy away from putting C bindings in Phobos in general.

- Jonathan M Davis
The problem is, if these C bindings are removed, the immediate reflex will be to think that Phobos doesn't have the features that were fulfilled by these bindings. So the impulse will be to reinvent the wheel, when these bindings are perfectly okay and do the job well. C bindings are a way to save us time and build upon proven quality libraries. I don't see any problem with C bindings being in the standard library, as long as they are really useful and high quality. The "not invented here" itch is a bad one. The workforce of the community should be directed at real problems and filling real gaps, rather than being wasted on reinventing the wheel merely for aesthetic/ideological reasons. I don't see any need to remove etc.
Deimos is for C bindings, not Phobos. We don't want any more modules in std built on top of C bindings for libraries that aren't guaranteed to be on all of the systems that we support. Having std.net.curl has been very problematic due to the problems with getting a proper version of libcurl to link against on Windows, and there has even been some discussion of removing it entirely. So, there will be no more Phobos modules built on anything like curl or openssl or gcrypt or any other C library which isn't guaranteed to be on all systems. That being the case, there's no point in putting C bindings in Phobos. Deimos was created specifically so that there would be a place to get bindings to C libraries. We may want to make some adjustments to how Deimos is handled, but it's our solution to C bindings, not Phobos: https://github.com/D-Programming-Deimos druntime should have C bindings for the OSes that we support, but those are the only C bindings that should be in D's standard libraries. Whether we'll remove any that we have is still up for debate, but we're not adding any more. - Jonathan M Davis
+1 for removing std.net.curl too -- Jordi Sayol
Oct 13 2013
prev sibling parent "Dejan Lekic" <dejan.lekic gmail.com> writes:
On Sunday, 6 October 2013 at 20:40:50 UTC, Andrei Alexandrescu 
wrote:
 On 10/6/13 9:57 AM, David Nadlinger wrote:
 On Saturday, 5 October 2013 at 00:24:22 UTC, Andrei 
 Alexandrescu wrote:
 2. I do vote for inclusion in the /etc/ package for the time 
 being.
What is your vision for the future of etc.*, assuming that we are also going to promote DUB (or another package manager) to "official" status soon as well?
I think /etc/ should be a stepping stone to std, just like in C++ boost is for std (and boost's sandbox is for boost). Andrei
Please consider the stdx proposal instead. etc was always used for C bindings...
Oct 07 2013
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/4/13 5:24 PM, Andrei Alexandrescu wrote:
 On 10/2/13 7:41 AM, Dicebot wrote:
 After brief discussion with Brian and gathering data from the review
 thread, I have decided to start voting for `std.d.lexer` inclusion into
 Phobos.
Thanks all involved for the work, first of all Brian. I have the proverbial good news and bad news. The only bad news is that I'm voting "no" on this proposal. But there's plenty of good news.

1. I am not attempting to veto this, so just consider it a normal vote when tallying.

2. I do vote for inclusion in the /etc/ package for the time being.

3. The work is good and the code valuable, so even in the case my suggestions (below) are followed, virtually all of the code that gets work done can be reused.
[snip]

To put my money where my mouth is, I have a proof-of-concept tokenizer for C++ in working state.

http://dpaste.dzfl.pl/d07dd46d

It contains some rather unsavory bits (I'm sure a ctRegex would be nicer for parsing numbers etc), but it works on a lot of code just swell. Most importantly, there's a clear distinction between the generic core and the C++-specific part. It should be obvious how to use the generic matcher for defining a D tokenizer.

Token representation is minimalistic and expressive. Just write tk!"<<" for left shift, tk!"int" for int etc. Typos will be detected during compilation. One does NOT need to define and use TK_LEFTSHIFT or TK_INT; all the generic tokenizer needs is the list of tokens. In return, it offers an efficient trie-based matcher for all tokens.

(Keyword matching is unusual in that keywords are first found by the trie matcher, and then a simple check figures whether more characters follow, e.g. "if" vs. "iffy". Given that many tokenizers use a hashtable anyway to look up all symbols, there's no net loss of speed with this approach.)

The lexer generator compiles fast and should run fast. If not, it should be easy to improve at the matcher level.

Now, what I'm asking for is that std.d.lexer builds on this design instead of the traditional one. At a slight delay, we get the proverbial fishing rod IN ADDITION TO the equally proverbial fish, FOR FREE. It is quite evident there's a bunch of code sharing going on already between std.d.lexer and the proposed design, so it shouldn't be hard to effect the adaptation.

So with this I'm leaving it all within the hands of the submitter and the review manager. I didn't count the votes, but we may have a "yes" majority built up. Since additional evidence has been introduced, I suggest at least a revote. Ideally, there would be enough motivation for Brian to suspend the review and integrate the proposed design within std.d.lexer.

Andrei
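The trie-plus-keyword-check scheme described above can be sketched at the algorithm level. The following is an illustrative Python rendition (not the D code from the paste; all names are mine): a maximal-munch matcher over a token trie, with the "if" vs. "iffy" check applied after a keyword match.

```python
# Sketch of the trie matcher idea: tokens are organized in a trie and the
# matcher takes the longest prefix of the input that is a known token.
# A keyword match additionally checks that no identifier character follows,
# so "iffy" lexes as an identifier, not as the keyword "if".

def build_trie(tokens):
    trie = {}
    for tok in tokens:
        node = trie
        for ch in tok:
            node = node.setdefault(ch, {})
        node[""] = tok  # the "" key marks the end of a complete token
    return trie

def match_longest(trie, text, pos):
    """Return the longest token starting at pos, or None (maximal munch)."""
    node, best, i = trie, None, pos
    while i < len(text) and text[i] in node:
        node = node[text[i]]
        i += 1
        if "" in node:
            best = node[""]
    return best

KEYWORDS = {"if", "int"}

def is_ident_char(ch):
    return ch.isalnum() or ch == "_"

def next_token(trie, text, pos):
    """Trie match, then the keyword check described in the post."""
    tok = match_longest(trie, text, pos)
    if tok in KEYWORDS:
        end = pos + len(tok)
        if end < len(text) and is_ident_char(text[end]):
            return None  # actually part of a longer identifier, e.g. "iffy"
    return tok

trie = build_trie(["<", "<<", "<<=", "if", "int"])
```

For example, `match_longest(trie, "<<=1", 0)` matches `"<<="` rather than `"<"` or `"<<"`, and `next_token(trie, "iffy", 0)` refuses the keyword match because an identifier character follows.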
Oct 07 2013
next sibling parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Tuesday, 8 October 2013 at 00:16:45 UTC, Andrei Alexandrescu 
wrote:
 http://dpaste.dzfl.pl/d07dd46d
I have to say, that `generateCases` function is rather disgusting. I'm really worried about the trend of using string mixins when not necessary, for no apparent gain. Surely you could have used static foreach to generate those cases instead, allowing code that is actually readable. It would probably have much better compile-time performance as well, but that's just speculation.
Oct 07 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/7/13 9:21 PM, Jakob Ovrum wrote:
 On Tuesday, 8 October 2013 at 00:16:45 UTC, Andrei Alexandrescu wrote:
 http://dpaste.dzfl.pl/d07dd46d
I have to say, that `generateCases` function is rather disgusting. I'm really worried about the trend of using string mixins when not necessary, for no apparent gain. Surely you could have used static foreach to generate those cases instead, allowing code that is actually readable. It would probably have much better compile-time performance as well, but that's just speculation.
This is the first shot, and I'm more interested in the API with the implementation to be improved. Your idea sounds great - care to put it in code so we see how it does? Andrei
Oct 07 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/7/13 9:26 PM, Andrei Alexandrescu wrote:
 On 10/7/13 9:21 PM, Jakob Ovrum wrote:
 On Tuesday, 8 October 2013 at 00:16:45 UTC, Andrei Alexandrescu wrote:
 http://dpaste.dzfl.pl/d07dd46d
I have to say, that `generateCases` function is rather disgusting. I'm really worried about the trend of using string mixins when not necessary, for no apparent gain. Surely you could have used static foreach to generate those cases instead, allowing code that is actually readable. It would probably have much better compile-time performance as well, but that's just speculation.
This is the first shot, and I'm more interested in the API with the implementation to be improved. Your idea sounds great - care to put it in code so we see how it does? Andrei
FWIW I just tried this, and it seems to work swell.

int main(string[] args)
{
    alias TypeTuple!(1, 2, 3, 4) tt;
    int a;
    switch (args.length)
    {
        foreach (i, _; tt)
        {
        case i + 1:
            return i * 42;
        }
        default:
            break;
    }
    return 0;
}

Interesting!

Andrei
Oct 07 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/7/13 9:34 PM, Andrei Alexandrescu wrote:
 On 10/7/13 9:26 PM, Andrei Alexandrescu wrote:
 On 10/7/13 9:21 PM, Jakob Ovrum wrote:
 On Tuesday, 8 October 2013 at 00:16:45 UTC, Andrei Alexandrescu wrote:
 http://dpaste.dzfl.pl/d07dd46d
I have to say, that `generateCases` function is rather disgusting. I'm really worried about the trend of using string mixins when not necessary, for no apparent gain. Surely you could have used static foreach to generate those cases instead, allowing code that is actually readable. It would probably have much better compile-time performance as well, but that's just speculation.
This is the first shot, and I'm more interested in the API with the implementation to be improved. Your idea sounds great - care to put it in code so we see how it does? Andrei
FWIW I just tried this, and it seems to work swell.

int main(string[] args)
{
    alias TypeTuple!(1, 2, 3, 4) tt;
    int a;
    switch (args.length)
    {
        foreach (i, _; tt)
        {
        case i + 1:
            return i * 42;
        }
        default:
            break;
    }
    return 0;
}

Interesting!

Andrei
On the other hand, I find it difficult to figure how the needed processing can be done with reasonable ease with just the above. So I guess it's your turn. Andrei
Oct 07 2013
parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Tuesday, 8 October 2013 at 04:37:31 UTC, Andrei Alexandrescu 
wrote:
 So I guess it's your turn.
I was going to cook something up with `groupBy` (taken from the still open!), but the former isn't CTFEable. Blergh. I'm still adamant this is the way to go, but I'm putting away the torch for now.
Oct 08 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/8/13 7:02 AM, Jakob Ovrum wrote:
 On Tuesday, 8 October 2013 at 04:37:31 UTC, Andrei Alexandrescu wrote:
 So I guess it's your turn.
I was going to cook something up with `groupBy` (taken from the still open!), but the former isn't CTFEable. Blergh. I'm still adamant this is the way to go, but I'm putting away the torch for now.
Fair enough. (Again, it would be unfair to compare an existing design against a hypothetical one.) I suspect at some point you will need to generate some custom code, which will come as a string that you need to mixin. But no matter. My most significant bit is, we need a trie lexer generator ONLY from the token strings, no TK_XXX user-provided symbols necessary. If all we need is one language (D) this is a non-issue because the library writer provides the token definitions. If we need to support user-provided languages, having the library manage the string -> small integer mapping becomes essential. Andrei
Oct 08 2013
parent reply Martin Nowak <code dawg.eu> writes:
On 10/08/2013 05:05 PM, Andrei Alexandrescu wrote:
 But no matter. My most significant bit is, we need a trie lexer
 generator ONLY from the token strings, no TK_XXX user-provided symbols
 necessary. If all we need is one language (D) this is a non-issue
 because the library writer provides the token definitions. If we need to
 support user-provided languages, having the library manage the string ->
 small integer mapping becomes essential.
It's good to get rid of the symbol names. You should try to map the strings onto an enum so that final switch works.

final switch (t.type_)
{
case t!"<<":
    break;
// ...
}
Oct 09 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/9/13 6:10 PM, Martin Nowak wrote:
 On 10/08/2013 05:05 PM, Andrei Alexandrescu wrote:
 But no matter. My most significant bit is, we need a trie lexer
 generator ONLY from the token strings, no TK_XXX user-provided symbols
 necessary. If all we need is one language (D) this is a non-issue
 because the library writer provides the token definitions. If we need to
 support user-provided languages, having the library manage the string ->
 small integer mapping becomes essential.
It's good to get rid of the symbol names. You should try to map the strings onto an enum so that final switch works.

final switch (t.type_)
{
case t!"<<":
    break;
// ...
}
Excellent point! In fact one would need to use t!"<<".id instead of t!"<<". I'll work on that next. Andrei
Oct 10 2013
next sibling parent reply "Brian Schott" <briancschott gmail.com> writes:
On Thursday, 10 October 2013 at 17:34:01 UTC, Andrei Alexandrescu 
wrote:
 Excellent point! In fact one would need to use t!"<<".id 
 instead of t!"<<".

 I'll work on that next.


 Andrei
I don't suppose this new lexer is on Github or something. I'd like to help get this new implementation up and running.
Oct 10 2013
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/10/13 2:41 PM, Brian Schott wrote:
 On Thursday, 10 October 2013 at 17:34:01 UTC, Andrei Alexandrescu wrote:
 Excellent point! In fact one would need to use t!"<<".id instead of
 t!"<<".

 I'll work on that next.


 Andrei
I don't suppose this new lexer is on Github or something. I'd like to help get this new implementation up and running.
Thanks for your gracious comeback. I was fearing a "My work is not appreciated, I'm not trying to contribute anymore" etc. The code is part of Facebook's project that I mentioned in the announce forum. I am attempting to open source it, should have an answer soon. Andrei
Oct 10 2013
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
11-Oct-2013 01:41, Brian Schott пишет:
 On Thursday, 10 October 2013 at 17:34:01 UTC, Andrei Alexandrescu wrote:
 Excellent point! In fact one would need to use t!"<<".id instead of
 t!"<<".

 I'll work on that next.


 Andrei
I don't suppose this new lexer is on Github or something. I'd like to help get this new implementation up and running.
Love this attitude! :) Having helped with std.d.lexer before (w.r.t. to performance mostly) I'm inclined to land a hand in perfecting the more generic one. -- Dmitry Olshansky
Oct 11 2013
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
11-Oct-2013 13:52, Dmitry Olshansky пишет:
 11-Oct-2013 01:41, Brian Schott пишет:
 On Thursday, 10 October 2013 at 17:34:01 UTC, Andrei Alexandrescu wrote:
 Excellent point! In fact one would need to use t!"<<".id instead of
 t!"<<".

 I'll work on that next.


 Andrei
I don't suppose this new lexer is on Github or something. I'd like to help get this new implementation up and running.
Love this attitude! :) Having helped with std.d.lexer before (w.r.t. to performance mostly) I'm inclined to land a hand in perfecting the more generic one.
s/land/lend/ -- Dmitry Olshansky
Oct 11 2013
prev sibling parent Martin Nowak <code dawg.eu> writes:
On 10/10/2013 07:34 PM, Andrei Alexandrescu wrote:
 Excellent point! In fact one would need to use t!"<<".id instead of t!"<<"
Either adding an alias this from TokenType to the enum or returning the enum in tk!"<<" would circumvent this. See http://dpaste.dzfl.pl/cdcba00d
Oct 10 2013
prev sibling next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Monday, October 07, 2013 17:16:45 Andrei Alexandrescu wrote:
 So with this I'm leaving it all within the hands of the submitter and
 the review manager. I didn't count the votes, but we may have a "yes"
 majority built up. Since additional evidence has been introduced, I
 suggest at least a revote. Ideally, there would be enough motivation for
 Brian to suspend the review and integrate the proposed design within
 std.d.lexer.
I think that it's worth noting that if this vote passes, it will be the first vote for a Phobos module which passed and had any "no" votes cast against it (at least, if any of the previous modules had any "no" votes, I don't recall them; it's always been overwhelmingly in favor of inclusion). That in and of itself implies that the situation needs further examination. Though maybe it's simply that this particular module is in an area where we have more posters with strong opinions. Also, in general, I tend to think that we should move towards not merging new modules into Phobos as quickly as we have in the past. Whether the "stdx" proposal is the way to go or not is another matter, but I think that we should aim for having modules be more battle-tested before actually becoming full- fledged modules in Phobos. We've had great stuff reviewed and merged thus far, but we also tend to end up having to make minor tweaks to the API or later come to regret including it at all (e.g. std.net.curl). Having some sort of intermediate step prior to full inclusion for at least one or two releases would be a good move IMHO. - Jonathan M Davis
Oct 07 2013
next sibling parent "Brian Schott" <briancschott gmail.com> writes:
On Tuesday, 8 October 2013 at 05:22:32 UTC, Jonathan M Davis 
wrote:
 I think that it's worth noting that if this vote passes, it 
 will be the first
 vote for a Phobos module which passed and had any "no" votes 
 cast against it
 (at least, if any of the previous modules had any "no" votes, I 
 don't recall
 them; it's always been overwhelmingly in favor of inclusion). 
 That in and of
 itself implies that the situation needs further examination. 
 Though maybe it's
 simply that this particular module is in an area where we have 
 more posters
 with strong opinions.
I had noticed this. I'm not sure if a simple majority is good enough for the standard library.
Oct 07 2013
prev sibling next sibling parent "Dicebot" <public dicebot.lv> writes:
On Tuesday, 8 October 2013 at 05:22:32 UTC, Jonathan M Davis 
wrote:
 I think that it's worth noting that if this vote passes, it 
 will be the first
 vote for a Phobos module which passed and had any "no" votes 
 cast against it
 (at least, if any of the previous modules had any "no" votes, I 
 don't recall
 them; it's always been overwhelmingly in favor of inclusion).
Guess what was the main point of my concerns while following this voting thread. Until now there was at most one "No" vote for accepted proposals, and no exact "Yes" vote threshold is defined anywhere. When voting ends I will sum up some formal stats on the topic and, after some hard thinking, will make a separate announcement/topic about possible outcomes.
Oct 08 2013
prev sibling parent Martin Nowak <code dawg.eu> writes:
On 10/08/2013 07:22 AM, Jonathan M Davis wrote:
 We've had great stuff reviewed and merged thus far,
 but we also tend to end up having to make minor tweaks to the API or later
 come to regret including it at all (e.g. std.net.curl). Having some sort of
 intermediate step prior to full inclusion for at least one or two releases
 would be a good move IMHO.
It usually takes me a few months until I get to try a new module, at which point it's mostly already voted on and included. So the current approach doesn't work for me at all.
Oct 09 2013
prev sibling next sibling parent reply "ilya-stromberg" <ilya-stromberg-2009 yandex.ru> writes:
On Tuesday, 8 October 2013 at 00:16:45 UTC, Andrei Alexandrescu 
wrote:
 To put my money where my mouth is, I have a proof-of-concept 
 tokenizer for C++ in working state.

 http://dpaste.dzfl.pl/d07dd46d
Why do you use "\0" as end-of-stream token: /** * All token types include regular and reservedTokens, plus the null * token ("") and the end-of-stream token ("\0"). */ We can have situation when the "\0" is a valid token, for example for binary formats. Is it possible to indicate end-of-stream another way, maybe via "empty" property for range-based API?
Oct 08 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/8/13 11:11 PM, ilya-stromberg wrote:
 On Tuesday, 8 October 2013 at 00:16:45 UTC, Andrei Alexandrescu wrote:
 To put my money where my mouth is, I have a proof-of-concept tokenizer
 for C++ in working state.

 http://dpaste.dzfl.pl/d07dd46d
Why do you use "\0" as end-of-stream token: /** * All token types include regular and reservedTokens, plus the null * token ("") and the end-of-stream token ("\0"). */ We can have situation when the "\0" is a valid token, for example for binary formats. Is it possible to indicate end-of-stream another way, maybe via "empty" property for range-based API?
I'm glad you asked. It's simply a decision by convention. I know no C++ source can contain a "\0", so I append it to the input and use it as a sentinel. A general lexer should take the EOF symbol as a parameter. One more thing: the trie matcher knows a priori (statically) what the maximum lookahead is - it's the maximum of all symbols. That can be used to pre-fill the input buffer such that there's never an out-of-bounds access, even with input ranges. Andrei
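The sentinel trick can be sketched as follows (an illustrative Python rendition with names of my choosing; in Python strings are bounds-checked anyway, so the point is only that the scanning loop itself needs no explicit length test):

```python
# Sketch of sentinel-terminated scanning: appending a character that cannot
# appear in valid input ("\0" for C++ source) guarantees the scanning loop
# stops on the sentinel at the latest, so no "i < len(text)" test is needed
# inside the hot loop.

SENTINEL = "\0"

def scan_identifier(text, pos):
    """Scan an identifier starting at pos; text must end with SENTINEL."""
    i = pos
    # The sentinel is not an identifier character, so this loop always
    # terminates without an explicit bounds check.
    while text[i].isalnum() or text[i] == "_":
        i += 1
    return text[pos:i]

source = "foobar + 1" + SENTINEL
```

Here `scan_identifier(source, 0)` returns `"foobar"`, and a scan started anywhere in the buffer stops on the sentinel even when no other delimiter follows.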
Oct 09 2013
parent "ilya-stromberg" <ilya-stromberg-2009 yandex.ru> writes:
On Wednesday, 9 October 2013 at 07:49:55 UTC, Andrei Alexandrescu 
wrote:
 On 10/8/13 11:11 PM, ilya-stromberg wrote:
 On Tuesday, 8 October 2013 at 00:16:45 UTC, Andrei 
 Alexandrescu wrote:
 To put my money where my mouth is, I have a proof-of-concept 
 tokenizer
 for C++ in working state.

 http://dpaste.dzfl.pl/d07dd46d
Why do you use "\0" as end-of-stream token: /** * All token types include regular and reservedTokens, plus the null * token ("") and the end-of-stream token ("\0"). */ We can have situation when the "\0" is a valid token, for example for binary formats. Is it possible to indicate end-of-stream another way, maybe via "empty" property for range-based API?
I'm glad you asked. It's simply a decision by convention. I know no C++ source can contain a "\0", so I append it to the input and use it as a sentinel. A general lexer should take the EOF symbol as a parameter. One more thing: the trie matcher knows a priori (statically) what the maximum lookahead is - it's the maximum of all symbols. That can be used to pre-fill the input buffer such that there's never an out-of-bounds access, even with input ranges. Andrei
So, it's interesting to see a new improved API, because we need a really generic lexer. I think it's not so difficult.
Oct 09 2013
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/7/13 5:16 PM, Andrei Alexandrescu wrote:
 On 10/4/13 5:24 PM, Andrei Alexandrescu wrote:
 On 10/2/13 7:41 AM, Dicebot wrote:
 After brief discussion with Brian and gathering data from the review
 thread, I have decided to start voting for `std.d.lexer` inclusion into
 Phobos.
Thanks all involved for the work, first of all Brian. I have the proverbial good news and bad news. The only bad news is that I'm voting "no" on this proposal. But there's plenty of good news. 1. I am not attempting to veto this, so just consider it a normal vote when tallying. 2. I do vote for inclusion in the /etc/ package for the time being. 3. The work is good and the code valuable, so even in the case my suggestions (below) are followed, virtually all of the code that gets work done can be reused.
[snip] To put my money where my mouth is, I have a proof-of-concept tokenizer for C++ in working state. http://dpaste.dzfl.pl/d07dd46d
I made an improvement to the way tokens are handled. In the paste above, "tk" is a function. A CTFE-able function that just returns a compile-time constant, but a function nevertheless. To actually reduce "tk" to a compile-time constant in all cases, I changed it as follows:

template tk(string symbol)
{
    import std.range;
    static if (symbol == "")
    {
        // Token ID 0 is reserved for "unrecognized token".
        enum tk = TokenType2(0);
    }
    else static if (symbol == "\0")
    {
        // Token ID max is reserved for "end of input".
        enum tk = TokenType2(
            cast(TokenIDRep) (1 + tokens.length + reservedTokens.length));
    }
    else
    {
        //enum id = chain(tokens, reservedTokens).countUntil(symbol);
        // Find the id within the regular tokens realm
        enum idTokens = tokens.countUntil(symbol);
        static if (idTokens >= 0)
        {
            // Found, regular token. Add 1 because 0 is reserved.
            enum id = idTokens + 1;
        }
        else
        {
            // not found, only chance is within the reserved tokens realm
            enum idResTokens = reservedTokens.countUntil(symbol);
            enum id = idResTokens >= 0
                ? tokens.length + idResTokens + 1 : -1;
        }
        static assert(id >= 0 && id < TokenIDRep.max,
            "Invalid token: " ~ symbol);
        enum tk = TokenType2(id);
    }
}

This is even better now because token types are simple static constants.

Andrei
Oct 09 2013
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
08-Oct-2013 04:16, Andrei Alexandrescu пишет:
 On 10/4/13 5:24 PM, Andrei Alexandrescu wrote:
 To put my money where my mouth is, I have a proof-of-concept tokenizer
 for C++ in working state.

 http://dpaste.dzfl.pl/d07dd46d

 It contains some rather unsavory bits (I'm sure a ctRegex would be nicer
 for parsing numbers etc), but it works on a lot of code just swell.
No - ctRegex as it stands right now is too generic and conservative with the code it generates, so "\d+" would: a) use full Unicode for "Number"; b) keep tabs on where to return, as if there could be ambiguity about how many '\d' it may eat (no maximal munch etc.). The reason is that said '\d+' may be in the middle of some pattern (e.g. 9\d+0), and the fact that it's unambiguous on its own is not exploited. Both are quite suboptimal, and there is a long road I am going to take to have a _general_ solution for both points. One day we will reach that goal though. ATM just hack your way through if the pattern is sooo simple.
 Most importantly, there's a clear distinction between the generic core
 and the C++-specific part. It should be obvious how to use the generic
 matcher for defining a D tokenizer.
 Token representation is minimalistic and expressive. Just write tk!"<<"
 for left shift, tk!"int" for int etc. Typos will be detected during
 compilation. One does NOT need to define and use TK_LEFTSHIFT or TK_INT;
 all needed by the generic tokenizer is the list of tokens. In return, it
 offers an efficient trie-based matcher for all tokens.

 (Keyword matching is unusual in that keywords are first found by the
 trie matcher, and then a simple check figures whether more characters
 follow, e.g. "if" vs. "iffy".
+1

 Given that many tokenizers use a hashtable
 anyway to look up all symbols, there's no net loss of speed with this
 approach.
Yup. The only benefit is a slimmer giant switch. Another "hybrid" option is, instead of a hash-table, to use a generated keyword trie searcher separately as a function. Then just test each identifier with it. This is what std.d.lexer does, and it is quite fast. (awaiting latest benchmarks)
 The lexer generator compiles fast and should run fast. If not, it should
 be easy to improve at the matcher level.

 Now, what I'm asking for is that std.d.lexer builds on this design
 instead of the traditional one. At a slight delay, we get the proverbial
 fishing rod IN ADDITION TO the equally proverbial fish, FOR FREE. It
 is quite evident there's a bunch of code sharing going on already
 between std.d.lexer and the proposed design, so it shouldn't be hard to
 effect the adaptation.
Agreed. Let us take a moment to incorporate a better design.
 So with this I'm leaving it all within the hands of the submitter and
 the review manager. I didn't count the votes, but we may have a "yes"
 majority built up. Since additional evidence has been introduced, I
 suggest at least a revote. Ideally, there would be enough motivation for
 Brian to suspend the review and integrate the proposed design within
 std.d.lexer.


 Andrei
-- Dmitry Olshansky
Oct 11 2013
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/4/2013 5:24 PM, Andrei Alexandrescu wrote:
  [...]
Some points:

1. This is a replacement for the switch statement starting at around line 505 in advance():

https://github.com/Hackerpilot/phobos/blob/9bdb7f97bb8021f3b0d0291896b8fe21a6fead23/std/d/lexer.d

It is not a replacement for the rest of the lexer.

2. Instead of explicit token type enums, such as:

    mod, /// %

it would just be referred to as:

    tok!"%"

Andrei pointed out to me that he has fixed the latter so it resolves to a small integer - meaning it works efficiently as cases in switch statements. This removes my primary objection to it.

3. This level of abstraction combined with efficient generation cannot currently be done in any other language. Hence, it makes for a sweet showcase of what D can do.

Hence, I think we ought to adapt Brian's lexer by replacing the switch with Andrei's trie searcher, and replacing the enum TokenType with the tok!"string" syntax.
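The tok!"%" idea - referring to tokens only by their spelling while still getting small-integer case values - can be sketched at runtime as a lookup into the token list (an illustrative Python rendition; in the actual D design the lookup happens at compile time, so a misspelled token is a compile error rather than a runtime one):

```python
# Runtime sketch of the tok!"..." idea: a token's id is just its position in
# the token list, so the string spelling is the only thing a user ever
# writes, while switches still operate on small integers.

TOKENS = ["%", "<<", "int", "if"]

def tok(symbol):
    """Map a token spelling to its small-integer id."""
    try:
        return TOKENS.index(symbol) + 1  # id 0 reserved for "unrecognized"
    except ValueError:
        # In D this would be a static assert firing at compile time.
        raise ValueError("Invalid token: " + symbol)
```

So `tok("%")` yields a stable small integer usable as a switch case, and an unknown spelling is rejected instead of silently producing a new id.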
Oct 08 2013
next sibling parent "Brad Anderson" <eco gnuk.net> writes:
On Wednesday, 9 October 2013 at 01:27:22 UTC, Walter Bright wrote:
 On 10/4/2013 5:24 PM, Andrei Alexandrescu wrote:
 [...]
Some points: 1. This is a replacement for the switch statement starting at around line 505 in advance() https://github.com/Hackerpilot/phobos/blob/9bdb7f97bb8021f3b0d0291896b8fe21a6fead23/std/d/lexer.d
Github tip: You can link to a specific line by clicking the line number and copying and pasting your new URL.
Oct 08 2013
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/8/13 6:26 PM, Walter Bright wrote:
 On 10/4/2013 5:24 PM, Andrei Alexandrescu wrote:
  [...]
Some points:

1. This is a replacement for the switch statement starting at around line 505 in advance():

https://github.com/Hackerpilot/phobos/blob/9bdb7f97bb8021f3b0d0291896b8fe21a6fead23/std/d/lexer.d

It is not a replacement for the rest of the lexer.

2. Instead of explicit token type enums, such as:

    mod, /// %

it would just be referred to as:

    tok!"%"

Andrei pointed out to me that he has fixed the latter so it resolves to a small integer - meaning it works efficiently as cases in switch statements. This removes my primary objection to it.

3. This level of abstraction combined with efficient generation cannot currently be done in any other language. Hence, it makes for a sweet showcase of what D can do.

Hence, I think we ought to adapt Brian's lexer by replacing the switch with Andrei's trie searcher, and replacing the enum TokenType with the tok!"string" syntax.
Thanks, that's exactly what I had in mind. Also the trie searcher should be exposed by the library so people can implement other languages. Let me make another, more strategic, point. Projects like Rust and Go have dozens of people getting paid to work on them. In the time it takes us to crank one conventional lexer/parser for a language, they can crank five. The answer is we can't win with a conventional approach. We must leverage D's strengths to amplify our speed of execution, and in this context an integrated generic lexer generator is the ticket. There is one thing I neglected to mention, and I apologize for that. Coming with this all on the eve of voting must be quite demotivating for Brian, who's been through all the arduous steps to get his work to production quality. I hope the compensating factor is that the proposed change is a net positive for the greater good. Andrei
Oct 08 2013
next sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Wednesday, 9 October 2013 at 03:55:42 UTC, Andrei Alexandrescu 
wrote:
 On 10/8/13 6:26 PM, Walter Bright wrote:
 On 10/4/2013 5:24 PM, Andrei Alexandrescu wrote:
 [...]
Some points: 1. This is a replacement for the switch statement starting at around line 505 in advance() https://github.com/Hackerpilot/phobos/blob/9bdb7f97bb8021f3b0d0291896b8fe21a6fead23/std/d/lexer.d It is not a replacement for the rest of the lexer. 2. Instead of explicit token type enums, such as: mod, /// % it would just be referred to as: tok!"%" Andrei pointed out to me that he has fixed the latter so it resolves to a small integer - meaning it works efficiently as cases in switch statements. This removes my primary objection to it. 3. This level of abstraction combined with efficient generation cannot be currently done in any other language. Hence, it makes for a sweet showcase of what D can do. Hence, I think we ought to adapt Brian's lexer by replacing the switch with Andrei's trie searcher, and replacing the enum TokenType with the tok!"string" syntax.
Thanks, that's exactly what I had in mind. Also the trie searcher should be exposed by the library so people can implement other languages. Let me make another, more strategic, point. Projects like Rust and Go have dozens of people getting paid to work on them. In the time it takes us to crank one conventional lexer/parser for a language, they can crank five. The answer is we can't win with a conventional approach. We must leverage D's strengths to amplify our speed of execution, and in this context an integrated generic lexer generator is the ticket. There is one thing I neglected to mention, and I apologize for that. Coming with this all on the eve of voting must be quite demotivating for Brian, who's been through all the arduous steps to get his work to production quality. I hope the compensating factor is that the proposed change is a net positive for the greater good. Andrei
Overall, I think this is going in the right direction. However, there is one thing I don't like about that design. When you go through the big switch of death, you match the beginning of the string and then go back to a function that tests where it came from and acts accordingly. That is kind of wasteful. What SDC does is call a function template with the part matched by the big switch of death passed as a template argument. The nice thing about it is that it is easy to transform this compile-time argument into a runtime one by simply forwarding it (which is done to parse identifiers that begin with a keyword, for instance).
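For readers following the tok!"%" discussion above, here is a minimal sketch of how such a template can resolve a token string to a small integral enum value at compile time, making it usable as a switch case label. All names here are hypothetical illustrations, not the actual std.d.lexer API:

```d
// Hypothetical sketch: resolving tok!"%" to a small integer at compile time.
import std.algorithm : countUntil;

// The full operator/keyword list would be generated; three entries for brevity.
enum tokenStrings = ["%", "identifier", "=="];

enum TokenType : ubyte
{
    percent,    // "%"
    identifier, // "identifier"
    equalEqual, // "=="
}

// tok!"%" evaluates to TokenType.percent, an ordinary integral constant.
template tok(string s)
{
    enum long idx = tokenStrings.countUntil(s);
    static assert(idx >= 0, "unknown token: " ~ s);
    enum tok = cast(TokenType) idx;
}

unittest
{
    static assert(tok!"%" == TokenType.percent);
    // Usable as an efficient case label, addressing Walter's objection:
    final switch (tok!"==")
    {
        case tok!"%":          assert(false);
        case tok!"identifier": assert(false);
        case tok!"==":         break;
    }
}
```

The lookup runs entirely via CTFE, so there is no runtime string comparison; by the time the switch is compiled, every `tok!"..."` is just a `ubyte`.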
Oct 08 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/8/13 9:32 PM, deadalnix wrote:
 Overall, I think this is going in the right direction. However, there
 is one thing I don't like about that design.
 
 When you go through the big switch of death, you match the beginning of
 the string and then go back to a function that tests where it came from
 and acts accordingly. That is kind of wasteful.
 
 What SDC does is call a function template with the part matched by the
 big switch of death passed as a template argument. The nice thing about
 it is that it is easy to transform this compile-time argument into a
 runtime one by simply forwarding it (which is done to parse identifiers
 that begin with a keyword, for instance).
I think a bit of code would make all that much clearer. Andrei
Oct 08 2013
parent "deadalnix" <deadalnix gmail.com> writes:
On Wednesday, 9 October 2013 at 04:38:02 UTC, Andrei Alexandrescu 
wrote:
 On 10/8/13 9:32 PM, deadalnix wrote:
 Overall, I think this is going in the right 
 direction. However, there
 is one thing I don't like about that design.
 
 When you go through the big switch of death, you match the 
 beginning of
 the string and then go back to a function that 
 tests where
 it came from and acts accordingly. That is kind of wasteful.
 
 What SDC does is call a function template with the 
 part matched
 by the big switch of death passed as a template argument. The 
 nice thing
 about it is that it is easy to transform this compile-time 
 argument into
 a runtime one by simply forwarding it (which is done to parse 
 identifiers
 that begin with a keyword, for instance).
I think a bit of code would make all that much clearer. Andrei
Sure. So here is the lexer generation info (this can be simplified by using the tok!"foobar" thing): http://dpaste.dzfl.pl/7ec225ee Using this info, a huge switch-based boilerplate is generated. Each "leaf" of the huge switch tree calls a function template as follows, passing as a template argument what has been matched so far. You can then proceed as follows: http://dpaste.dzfl.pl/f2f0d22c You may wonder about the "?lexComment". The boilerplate generator understands ? as an indication that lexComment may or may not return a token (depending on lexer configuration) and generates what is needed to handle that (by testing whether the function returns a token, via some static ifs). You obviously end up with a lot of instances of lexIdentifier(string s)(), but these simply forward to lexIdentifier()(string s), and the forwarding function is removed trivially by the inliner.
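A rough sketch of the "?" mechanism described above, where the generated call site uses static introspection to decide whether the sub-lexer produces a token. The names and structure here are hypothetical simplifications; the real generator lives in SDC and the linked pastes:

```d
// Hypothetical sketch: the generated boilerplate inspects the sub-lexer's
// return type and only appends a token when one is actually produced
// (e.g. comments are skipped unless the lexer is configured to keep them).
struct Token { string text; }

enum keepComments = true; // lexer configuration flag

static if (keepComments)
    Token lexComment(string s) { return Token(s); }
else
    void lexComment(string s) { /* consume and discard the comment */ }

// Generated call site for a "?lexComment" entry in the lexer description.
void handleComment(ref Token[] output, string matched)
{
    static if (is(typeof(lexComment(matched)) == Token))
        output ~= lexComment(matched); // sub-lexer yields a token: keep it
    else
        lexComment(matched);           // sub-lexer yields nothing: just skip
}
```

The point is that the same generated boilerplate adapts to either configuration at compile time, with no runtime branch and no change to the lexer description.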
Oct 08 2013
prev sibling parent reply "Brian Schott" <briancschott gmail.com> writes:
On Wednesday, 9 October 2013 at 03:55:42 UTC, Andrei Alexandrescu 
wrote:
 for the greater good.
YOU CALL YOURSELVES A COMMUNITY THAT CARES? http://www.youtube.com/watch?v=yUpbOliTHJY
Oct 08 2013
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/8/13 9:33 PM, Brian Schott wrote:
 On Wednesday, 9 October 2013 at 03:55:42 UTC, Andrei Alexandrescu wrote:
 for the greater good.
YOU CALL YOURSELVES A COMMUNITY THAT CARES? http://www.youtube.com/watch?v=yUpbOliTHJY
I swear I had that in mind when I wrote "the greater good". Awesome movie, and quite fit for the situation :o). Andrei
Oct 08 2013
prev sibling parent "Araq" <rumpf_a web.de> writes:
 3. This level of abstraction combined with efficient generation 
 cannot be currently done in any other language.
This is wrong.
Oct 11 2013
prev sibling next sibling parent reply =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig outerproduct.org> writes:
Yes


I see the token type discussion as a matter of taste and one that could
cleanly be changed using a deprecation path (I wouldn't mind the Tok!str
approach, though). I also see no fundamental reason why the API forbids
extension for shared string tables or table-less lexing. And pure
implementation details (IMO) shouldn't be a primary voting concern. Much
more important is that we don't defer this another year or more for no
good reason (changing pure implementation details or purely extending
the API are no good reasons when there is a solid implementation/API
already).

(sorry for the additional rationale)
Oct 06 2013
parent reply Martin Nowak <code dawg.eu> writes:
On 10/06/2013 10:18 AM, Sönke Ludwig wrote:
 I also see no fundamental reason why the API forbids
 extension for shared string tables or table-less lexing.
The current API requires to copy slices of the const(ubyte)[] input to string values in every token. This can't be done efficiently without a string table. But a string table is unnecessary for many use-cases, so the API has a built-in performance/memory issue.
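To illustrate the trade-off described above: with const(ubyte)[] input, producing a string field in each token forces a per-token copy (or interning through a string table), whereas immutable input could simply be sliced. A minimal sketch, not the actual std.d.lexer API:

```d
struct Token { string text; }

// const input: the underlying bytes may change later, so the token
// must own a copy of the lexeme (one allocation per token, unless a
// string table deduplicates them).
Token makeTokenCopying(const(ubyte)[] input, size_t start, size_t end)
{
    return Token(cast(string) input[start .. end].idup);
}

// immutable input: a slice is already a valid string, so the token
// can alias the source buffer with zero copies and zero allocations.
Token makeTokenSlicing(string input, size_t start, size_t end)
{
    return Token(input[start .. end]);
}
```

This is why accepting immutable input as a special case (as suggested in the reply below from Sönke) would make the string table optional rather than mandatory.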
Oct 09 2013
parent =?UTF-8?B?U8O2bmtlIEx1ZHdpZw==?= <sludwig outerproduct.org> writes:
Am 10.10.2013 03:25, schrieb Martin Nowak:
 On 10/06/2013 10:18 AM, Sönke Ludwig wrote:
 I also see no fundamental reason why the API forbids
 extension for shared string tables or table-less lexing.
The current API requires to copy slices of the const(ubyte)[] input to string values in every token. This can't be done efficiently without a string table. But a string table is unnecessary for many use-cases, so the API has a built-in performance/memory issue.
But it could be extended later to accept immutable input as a special case, thus removing that requirement, if I'm not overlooking something. In that case it still is a pure implementation detail.
Oct 10 2013
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, October 02, 2013 16:41:54 Dicebot wrote:
 After brief discussion with Brian and gathering data from the
 review thread, I have decided to start voting for `std.d.lexer`
 inclusion into Phobos.
I'm going to have to vote no. While Brian has done some great work, I think that it's clear from the discussion that there are still some potential issues (e.g. requiring a string table) that need further discussion and possibly API changes. Also, while I question whether a generated lexer can beat a hand-written one, I think that we really should look at what Andrei's proposing and look at adjusting what Brian has done accordingly - or at least do enough so that we can benchmark the two approaches. As such, accepting the lexer right now doesn't really make sense. However, we may want to make it so that the lexer is in some place of prominence (outside of Phobos - probably on dub but mentioned somewhere on dlang.org) as an _experimental_ module which is clearly marked as not finalized but which is ready for people to use and bang on. That way, we may be able to get some better feedback generated from more real world use. - Jonathan M Davis
Oct 09 2013
parent "Volcz" <volcz kth.se> writes:
On Thursday, 10 October 2013 at 04:33:15 UTC, Jonathan M Davis 
wrote:
 On Wednesday, October 02, 2013 16:41:54 Dicebot wrote:
 After brief discussion with Brian and gathering data from the
 review thread, I have decided to start voting for `std.d.lexer`
 inclusion into Phobos.
I'm going to have to vote no. While Brian has done some great work, I think that it's clear from the discussion that there are still some potential issues (e.g. requiring a string table) that need further discussion and possibly API changes. Also, while I question whether a generated lexer can beat a hand-written one, I think that we really should look at what Andrei's proposing and look at adjusting what Brian has done accordingly - or at least do enough so that we can benchmark the two approaches. As such, accepting the lexer right now doesn't really make sense. However, we may want to make it so that the lexer is in some place of prominence (outside of Phobos - probably on dub but mentioned somewhere on dlang.org) as an _experimental_ module which is clearly marked as not finalized but which is ready for people to use and bang on. That way, we may be able to get some better feedback generated from more real world use. - Jonathan M Davis
Vote: No. Same reason as Jonathan above.
Oct 10 2013
prev sibling next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
02-Oct-2013 18:41, Dicebot wrote:
 After brief discussion with Brian and gathering data from the review
 thread, I have decided to start voting for `std.d.lexer` inclusion into
 Phobos.
I'd have to answer NO. In order to get to a YES state, it needs: a) Use the tok!"==" notation (in line with the generic lexer). It makes it far more convenient in the parser down the road as well. b) Ideally use the generic lexer framework, but that makes for 2 modules to include, so just make it easy to switch to later (no breakage etc.) c) Abstract away the string table, let the user provide his own hooks for that, and provide a default StringCache. d) Allow operation w/o StringTable at all (make it optional), including a "just slice the input" mode. P.S. I'm not a fan of etc.d.lexer. Instead a dub repo seems like a good place for the moment, for those who need it right now. Others may collectively wait for it, or help in getting it to perfection. -- Dmitry Olshansky
Oct 11 2013
prev sibling parent "Dicebot" <public dicebot.lv> writes:
Voting is closed.

Stats and outcome pending.
Oct 13 2013